Function that shows the Unicode code of a character

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,885
Sorry Vladimir, I was out for some days.

I understand your point, and accept it. I have, however, a different opinion on this subject.

Having look on VBA help of AscW() function we can find that:
The AscW function returns the Unicode character code

I can't argue with that. If Microsoft uses it, you certainly can use it too.

All I can say is that I don't agree with it. From what I see, vba strings are simply strings with 16bit characters. If your vba string is to be interpreted with the Unicode standard then it is a Unicode string. The same vba string could, however, be interpreted using any other standard, for ex., a 16bit standard for a far east language.

To me, to say that a string is a Unicode string just because its characters have 16 bits is like saying that a string is a ANSI string just because its characters have 8 bits.
 

Excel Facts

Excel Can Read to You
Customize Quick Access Toolbar. From All Commands, add Speak Cells or Speak Cells on Enter to QAT. Select cells. Press Speak Cells.

ZVI

MrExcel MVP
Joined
Apr 9, 2008
Messages
3,847
Office Version
  1. 2019
  2. 2016
  3. 2010
Platform
  1. Windows
Agree that we've faced here with terminology contradiction.

But in my opinion, to solve such contradiction we can consider that Unicode symbol is unequivocally identified by its main properties:
1. Assignment of Unicode symbol
2. Shape(s) of Unicode symbol
3. Character, identified by the code (2 or more bytes)

In this case the Unicode character is just one of the properties of the Unicode symbol.
Using Unicode character property we should remember that in case of not all properties are taken into account then it is not the Unicode symbol - just its property.

Regards,
Vladimir
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,885
Wow, I see this seems to be one of those never ending discussions :). One of these days we'll have to put a stop to this.


Anyway not even your point 3 is respected.

You can do a simple test. Let's use the character that Jubinell posted, "��". This is a Unicode string with 1 only Unicode character. This is one of the Unicode characters defined with a surrogate pair, in this case the pair D840 DDA2.

Now copy this character to the cell A1

We have now in cell A1 a Unicode string with 1 character.

Let's test it (using the immediate window).

Test 1 - the length

?Len(Range("A1").Value)
2

Now you say that vba strings are Unicode strings and, as you say in your point 3, Unicode characters may have 2 or more bytes. As you can see vba does not recognise it.

I say that vba strings are just sequences of 16bit codes. Since we have in this case 2 16bit codes we get the answer 2, it's OK

Test 2 - the code of the Unicode character

? Hex(AscW(Range("A1").Value))
D840

You say that vba strings are Unicode strings. Where is the rest of the code of this character? As you can see vba does not recognise it.

I say that vba strings are just sequences of 16bit codes. Since we have in this case 2 16bit codes we get the code of the first one, as expected since when you use Asc or AscW with a string with several characters you get the code of the first one. It's OK.

Test 3 - Let's try Mid() to get a character that does not exist(?)

? Hex(AscW(Mid(Range("A1").Value, 2, 1)))
DDA2

You say that vba strings are Unicode strings. How is it possible to use Mid to get the second Unicode character of a string that has only 1?

I say that vba strings are just sequences of 16bit codes. Since we have in this case 2 16bit codes we can use Mid to get the second 16bit code, and get it's value.

Test 4 - Let's try a test in the worksheet

In B1: =Len(A1) -> result: 2

Not even in the worksheet the string is recognised as being a Unicode string with 1 character.

My conclusion: a vba string is just a sequence of characters encoded with 16 bits.
 

ZVI

MrExcel MVP
Joined
Apr 9, 2008
Messages
3,847
Office Version
  1. 2019
  2. 2016
  3. 2010
Platform
  1. Windows
Wow, I see this seems to be one of those never ending discussions :). One of these days we'll have to put a stop to this.
...
My conclusion: a VBA string is just a sequence of characters encoded with 16 bits.
Hi Pedro,
Ok, let it be as is :)
And thank you very much for the growing of my knowledge of Unicode standard.


Jubinell,
Hope that it is not too late - see more details below.

To insert Unicode symbols you can use Microsoft Global Input Method Editors (IMEs).
There is also IME for Office XP which I've successfully tested with Excel 2003 as well (but Excel 2002 XP is installed on my PC).

To insert into a cell the 1st 2-Bytes Unicode symbol 人 , mentioned in your post#11, use Excel menu:
Insert / Symbol / Font = Arial Unicode MS / Character code = 4EBA / from = Unicode(hex) / Insert button

For the 2nd 2-Bytes Unicode symbol 亻 do the same as above but with Character code = 4EBB

For the 3d 4-Bytes symbol ��:
1. Apply Arial Unicode MS font to the cell
2. Select 2 positions of the 3d symbol in the table, mentioned in your post#11
3. Copy & paste the selection into Excel cell - it will be shown as the single character in a cell.

There are additional user defined functions to convert Hex code into Unicode symbol and vice versa.
Two and four bytes Unicode symbols are supported.
You can pass to the Hex2Uni(HexCode) function the Hex code of the Unicode Lookup table (refer to your post#4).
Prefix "0x" of the Hex code is ignored by this function – you can copy Hex code from Unicode Lookup table with that prefix.

Rich (BB code):

Function Uni2Hex(Txt As String) As String
  Dim b() As Byte, i&, j&
  b() = Trim(Txt)
  j = UBound(b)
  For i = 0 To j Step 2
    If i < j Then Uni2Hex = Uni2Hex & Format(Hex(b(i + 1)), "00")
    Uni2Hex = Uni2Hex & Format(Hex(b(i)), "00")
  Next
End Function

Function Hex2Uni(HexCode As String) As String
  Dim b() As Byte, i&, j&, s$
  s = Replace(HexCode, " ", "")
  s = Replace(s, "0x", "")
  j = Len(s)
  If j <= 4 Then
    ReDim b(1 To 4)
    b() = ChrW("&H" & s)
  Else
    ReDim b(1 To 8)
    b() = ChrW("&H" & Mid$(s, 1, j - 4)) & ChrW("&H" & Mid$(s, j - 3))
  End If
  Hex2Uni = b()
End Function

Regards,
Vladimir
 

Jubinell

Board Regular
Joined
Jan 17, 2008
Messages
166

ADVERTISEMENT

Thank you for the help Vlad. I've moved past this and ran into other hair-pulling problems, but I will go back and implement your methods when I have time.

Thank you to both of you for offering interesting insights to this problem.

Regarding Unicode...well I've always thought that it's a good thing to have a Universal standard. But the more I deal with it in the Japanese context, the more I sense that it's more like a cut-and-paste solution to a problem that requires more of a brand new design. Reading more about its developmenet history leads me to think that it's more of an attempt of the American business interests to hastily mend the cultural/linguistic gap between the West of the rest of the world, in order to sell more products of the former without regards to the cultural integrity of the latter. The ones who suffer the most are the countries with the more complex script but less powerful political standing to make sure the representation thereof is dignified enough by their standard.

That is, yes, off the topic. So please excuse me.
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,885
Hi Vladimir

Thank you. I'm not a specialist in Unicode and I learned a lot from our exchange. I did not know about the IME's, I'll check your links.

Jubinell

I've read other opinions expressing that Unicode has not an adequate design, as far as some languages that use ideographic characters are concerned. Let's hope that in time a future standard (for ex. a 32bit standard ) will emerge with a more structured and adequate design of the character map.
 

earthasa

New Member
Joined
Nov 3, 2011
Messages
16

ADVERTISEMENT

I know this is an old post, but it's highlighted on Google, so I'd like to point out that Vladimir's version has an oversight: Format() can only format numbers in decimal format. So Hex2Uni will return incorrect results for bytes from A-F, as Format will see them as text instead of numbers and not pad them. Unicode 0A0A will come full circle as AA (=00AA), for example. Also, the use of Trim() in Uni2Hex prevents the function from returning the Unicode value for space.

Here are versions that will similarly handle surrogate pairs; in fact, I just have them convert the entire string to a hex sequence and vice-versa:
Code:
Function Uni2Hex(Txt As String) As String
Dim n As Long
    For n = 1 To Len(Txt)
        Uni2Hex = Uni2Hex & Right("000" & Hex(AscW(Mid(Txt, n, 1))), 4)
    Next n
End Function

Function Hex2Uni(ByVal Txt As String) As String
Dim n As Long, l As Long
    Txt = Replace(Replace(Replace(LCase(Txt), "0x", ""), "&h", ""), " ", "") 'optional clean-up
    l = WorksheetFunction.Ceiling(Len(Txt), 4)
    If l > 0 Then
        Txt = Right("000" & Txt, l)
        For n = 1 To l Step 4
            Hex2Uni = Hex2Uni & ChrW(CInt("&h" & Mid(Txt, n, 4)))
        Next n
    End If
End Function
 
Last edited:

Ikester

New Member
Joined
Feb 10, 2013
Messages
17
I've been having some regex difficulties WRT Unicode characters and while I haven't quite solved my problem (will post new thread if I can't solve it), this thread has been most informative. Thanks to all!

Just one quibble:
I know this is an old post, but it's highlighted on Google, so I'd like to point out that Vladimir's version has an oversight: Format() can only format numbers in decimal format. So Hex2Uni will return incorrect results for bytes from A-F, as Format will see them as text instead of numbers and not pad them. Unicode 0A0A will come full circle as AA (=00AA), for example. Also, the use of Trim() in Uni2Hex prevents the function from returning the Unicode value for space.

Here are versions that will similarly handle surrogate pairs; in fact, I just have them convert the entire string to a hex sequence and vice-versa:
Rich (BB code):
Function Uni2Hex(Txt As String) As String
Dim n As Long
    For n = 1 To Len(Txt)
        Uni2Hex = Uni2Hex & Right("000" & Hex(AscW(Mid(Txt, n, 1))), 4)
    Next n
End Function

Function Hex2Uni(ByVal Txt As String) As String
Dim n As Long, l As Long
    Txt = Replace(Replace(Replace(LCase(Txt), "0x", ""), "&h", ""), " ", "") 'optional clean-up
    l = Application.WorksheetFunction.Ceiling(Len(Txt), 4)
    If l > 0 Then
        Txt = Right("000" & Txt, l)
        For n = 1 To l Step 4
            Hex2Uni = Hex2Uni & ChrW(CInt("&h" & Mid(Txt, n, 4)))
        Next n
    End If
End Function

I found the same thing WRT to the earlier version of a *Uni2Hex* function using a format(n,"00") structure and worked out a slight change to use the RIGHT() function however like this approach better (seems more straightforward). One problem with the Hex2Uni function above though; the Ceiling() function only works if preceded by "Application." (as marked in red above).
 

vsrawat

New Member
Joined
Nov 30, 2015
Messages
2
Happened to visit this thread found through Google.

Just dropping a line to thank you all for this wonderful thread and discussions.

I found all my answers to the questions I was puzzled with Unicode codes and characters in Excel 2010.

Just to add, since then MS has added functions in excel 2013, 2016
=UNICHAR(66) decimal
=UNICODE("B")

to do the same. However, as I am still on excel 2010, so I found your vba functions very useful.

Thanks to you all.
--
Rawat
 

Forum statistics

Threads
1,148,274
Messages
5,745,797
Members
423,974
Latest member
highvoltageacdc

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top