Validate Character Codepage / Font Entry

stevebor1

New Member
Joined
Apr 8, 2005
Messages
7
Hello,

I have an Excel workbook with 3 columns, English, Japanese, Chinese where the user enters text (basically translations) in each row.

I need to create a macro that validates the Codepage (font set) for each character used in each cell.

Cells under English should validate against the codepage 1252, Japanese - 932 and Chinese -936.

These cells cannot have mixed codepages within each language.

Can someone help with a macro to validate the codepage for each character in each cell that pertains to each column?

I think i need to read the text as a byte array then compare to the codepage for that character with the assigned codepage for that given language....
 

Excel Facts

Select a hidden cell
Somehide hide payroll data in column G? Press F5. Type G1. Enter. Look in formula bar while you arrow down through G.
I have not had experience with other languages in Excel, but do not see why you have a problem

Generally each *cell* in Excel can be formatted to use a particular *font*. It is quite difficult to format individual characters in a cell as a different font. If you need to make sure that a whole column is in a particular font, just select it and format accordingly.

Although there are (were) such things as CodePages for MSDOS operating system (eg. MS-DOS 6) these are not applicable within the Windows operating system, which uses Fonts. The MSDOS screen is what we get by running cmd.exe nowadays, where codepages (I guess) might still be valid.

Nowadays Fonts used are those set up for the machine - available to all applications that can use them. There is a small Windows application called "Character Map" (Start\Programs\Accessories\- or search for charmap.exe) that can be used to view the characters of each font.

Each font contains 256 "characters" (0 to 255) - some of which - such as Tab (Character 9) - do not show anything on screen. The standard alphanumeric 'visible' ones effectively start from character(32) = space, up to character(126) = ~.

The non-standard set of character numbers 127-255 *was* called the "extended character set" and used for things like putting graphic lines and foreign characters (etc) on to the MSDOS screen. This is not a "separate" entity in fonts nowadays.

Although this is the normal font setup, Windows allows us to make up our own font set (with the proper software) - that can be anything (if you look at the Windows "Wingdings" font in Character Map you will see that it uses the same 256 numbers but displays something different.

I wonder if the problems you are getting are related to typographical errors, rather than font problems.

Hope this helps.
 
Upvote 0
I will try to explain my need for this code.

My excel workbook is use for data entry.
The data captured is then exported using an application to an XML file.
The XML is then read by a custom application (UI) and the content (text) is displayed.

The custom application (UI) has seperate language sections. Each language section is assigned a CodePage based on the language selected and only that codepage is used. When the XML is read for that given language, it reads the XML and displays the content based on that CodePage.

------
My trouble is when mixed characters are entered in Excel for the assigned language, then exported to the XML. Since the application only displays characters available in the assigned CodePage, the other characters will be displayed as a "?".

----
The XLS has sheet 1 with 3 columns; English, SimpleChinese, Japanese.
Content is each row with their repected translations

Ray is the never-before-told...; “《灵魂歌王》这部音乐 ...; [雷]という音楽伝記シネマ

Sheet 2 has 2 columns; "Language" and "CodePage"
For each row, the language name and codepage assigned are entered.

English ; 1252
SimpleChinese; 936
Japanese; 932

* i will most likely change to 3 columns and in each column include all the Hex Value for each character in that codepage to referance against, but right now the Hex values are in a seperate txt file.

-----
What i am doing now:

I have put together a macro that gets the cell's content and gets the byte for each character

Sample code:

Dim txt, a, b, c As String
Dim bytes() As Byte
Dim i As Integer

' Display the bytes.
txt = "T"
bytes = txt

For i = LBound(bytes) To UBound(bytes)
a = Hex(bytes(i))
If Len(a) < 2 Then
a = Format(a, "00")
End If
If Len(a) = 1 Then
a = "0" & a
End If

i = i + 1

b = Hex(bytes(i))
If Len(a) < 2 Then
b = Format(b, "00")
End If
If Len(b) = 1 Then
b = "0" & b
End If

txt = txt & b & a


Next i
MsgBox (txt)

This will give me a value "0054". (Had to reverse it because how the bytes are stored in the the XLS)
After i get that character byte value, i need to do a lookup of that codepage that I am in, 1252 and verify that that value exists.

-------------
I hope im making things clearer :)

My goal; verify each charcter in the excel file that the codepage used matches with what's assign for that language, if not alert the user.

Last resort, assign a specific font to use for each cell. Lock the font so it can not be changed. if it cant be locked, alert user if it has changed from what it is supposed to be.

thx
 
Upvote 0
I just tried verifying the font of each character. This does not work. I have place a japanese character in a cell with Arial font. I can then format the cell (or select all the text in the cell) and change the font to Arial.

The japanses character remains there, however when i highlight the character, the font in excel says it is "Arial".

When i run the macro to check each letter, it also says the japanese character is "Arial":

For i = 0 To Len(c.Value)
MsgBox(c.Characters(i, 1).Font.name)

Next i

If i take that same character and copy into a word doc, it now gives me "MS Mincho" which i should have got in excel.

How can i get the actual "stored" font value, and not what is used for displayed in Excel? This is why I am trying to get the actual Hex Byte value instead of "font family" of each character.
 
Upvote 0
I think you are getting a bit tied up with code pages when you should be focusing on individual characters.

Whatever the system, font, or codepage, we have just 255 to deal with. We have already established that any one of the 255 can show anything (or nothing) on screen. Whatever the system the characters will be designated a number from 1 to 255 and converted to something on screen. So you should be looking at character "code numbers".

It may even be that the keyboard is not typing the expected character code. Run the Windows MSDOS command screen (cmd.exe) and type chcp .. Enter
On my system it reports "Active Code Page: 850". This is because I am in the UK and want to see a £ sign, not the # from the (default) US set. I am using an English keyboard too, so the £ sign is in its proper place there too. We can change this via Control Panel "Regional Language Options". You may like to experiment with the options there - you might even be able to temporarily changing your system to one of your other code pages.

You have established that Windows\Excel character codes do not match those you want. In the "olden days" of MSDOS this was not unusual because there was no proper standard. In a way there still isn't, as you are discovering, and we can be grateful to Microsoft because lots of people use it. HTML is another story. Here are some conversion tables for 2 of the main ones ASCII and ANSI, and I found 1252. You should be able to find a comparable table for each of your codepages. You may be able to find a Windows font with characters that match the ones you want. You could make your own Windows font with the correct software and amending a similar existing one.

http://www.alanwood.net/demos/ansi.html
http://www.ascii.cl/htmlcodes.htm
http://www.science.co.il/language/Character-Code.asp?s=1252

As I see it, what you are needing is your own conversion table to match the Microsoft set with your code pages. Once you have your text set up in Excel, or whatever, you then need to do a find/replace to convert the characters that do not match. So 'garbage' appears in Excel - but works out OK in the final result.
To find out the code of a character, copy paste it from a document into an Excel cell, and in another cell put the formula like =CODE(A1). You may even be able to do it with a complete set to make up your conversion table.

Try the code below.
Code:
 '=============================================================================
 '- MAKE A FONT CHARACTERS LIST
 '- Can copy paste the result into another column
 '- ... & change its font to see differences
 '=============================================================================
 Sub Show_Characters()
     For c = 1 To 255
         Cells(c, 1).Value = c
         Cells(c, 2).Value = Chr(c)
     Next
 End Sub
 '=============================================================================
Of course, in my ignorance things may not be as simple as this - but I hope this helps.
If you do find a solution please add it to the message thread.
 
Upvote 0

Forum statistics

Threads
1,214,985
Messages
6,122,606
Members
449,089
Latest member
Motoracer88

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top