This is the result of a long discussion with Olga L. this morning and some research into multibyte encoding. Keep the following in mind when processing strings during localization to Chinese:
- Multibyte is used in HyperLynx for localized strings.
- Multibyte is not related to wide characters (wchar_t, UTF-16) at all, even though a Chinese character in a multibyte string also takes 2 bytes.
- Multibyte is not related to UTF-8 either.
- In the Visual Studio debugger, char strings are always displayed as multibyte text when the Chinese (Simplified) locale is selected in Control Panel.
- Multibyte (MBCS, DBCS) means code page 936 (CP936, also known as GBK, a superset of GB2312) when the Chinese (Simplified) locale is selected in Control Panel.
- gettext's _("Two beer or not to be") returns a multibyte string.
- The tchar.h routines such as _tcsclen and _tcsncpy handle multibyte strings (when _MBCS is defined).
- .po files are written in UTF-8 and converted to multibyte when loaded.
- .rc resource files are written in Windows-1251 (Win1251).
- Chinese .zh-CN.rc resource files are written in CP936.
- Some MFC GUI functions accept multibyte strings; others accept only ANSI or wchar_t * strings.
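Because a CP936 string mixes 1-byte ASCII characters with 2-byte Chinese characters, its length in bytes differs from its length in characters, which is what the multibyte-aware routines (for example _mbslen, which _tcsclen maps to under _MBCS) take care of. The following is only a minimal, portable sketch of that idea, assuming the GBK rule that bytes 0x81 to 0xFE start a two-byte character; the names cp936_is_lead_byte and cp936_char_count are hypothetical, and real Windows code should keep using the CRT routines instead.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, assuming the GBK/CP936 rule that bytes 0x81..0xFE
// start a two-byte character. All other bytes (ASCII 0x00..0x7F, 0x80 and
// 0xFF) are treated as single-byte characters in this sketch.
bool cp936_is_lead_byte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Hypothetical helper: counts characters (not bytes) in a NUL-terminated
// CP936 string, similar in spirit to _mbslen / _tcsclen under _MBCS.
// A lead byte directly followed by the terminating NUL is counted as one
// (truncated) character instead of reading past the end.
std::size_t cp936_char_count(const char* s) {
    assert(s != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    std::size_t count = 0;
    while (*p != 0) {
        if (cp936_is_lead_byte(*p) && p[1] != 0) {
            p += 2;  // lead byte + trail byte form one Chinese character
        } else {
            p += 1;  // ASCII byte (or a dangling lead byte)
        }
        ++count;
    }
    return count;
}
```

For example, cp936_char_count("A\xB0\xA2" "1") counts 3 characters although the string occupies 4 bytes, since 0xB0 0xA2 is the CP936 encoding of U+963F listed in the example.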
For example, take the character U+963F:
- Appearance: 阿
- Unicode block: CJK Unified Ideographs (4E00-9FFF)
- Unicode code point, decimal: 38463
- Unicode code point, hexadecimal: U+963F
- HTML character entity, decimal: &#38463;
- HTML character entity, hexadecimal: &#x963F;
- Keystroke, Windows: Alt+038463
- Keystroke, Macintosh, Unicode Hex Input: Option-963F
- CP936 encoding: B0 A2
- UTF-8 encoding: E9 98 BF
- UTF-16BE encoding: 96 3F
- UTF-16LE encoding: 3F 96
- UTF-32BE encoding: 00 00 96 3F
- UTF-32LE encoding: 3F 96 00 00
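The Unicode byte sequences for U+963F can be reproduced from the code point alone, which shows that UTF-8, UTF-16 and UTF-32 are just different byte layouts of the same code point. CP936 is different: B0 A2 comes from the GBK mapping table and cannot be computed arithmetically (on Windows, WideCharToMultiByte with code page 936 does that lookup). The following is a minimal standard C++ sketch for the Unicode forms only; the names is_scalar_value, utf8_encode, utf16_units and to_bytes are hypothetical, and the input is assumed to be a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: a valid scalar value is <= U+10FFFF and not a surrogate.
bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: UTF-8 uses 1 to 4 bytes depending on the range.
Bytes utf8_encode(char32_t cp) {
    assert(is_scalar_value(cp));
    if (cp < 0x80) return {static_cast<std::uint8_t>(cp)};
    if (cp < 0x800)
        return {static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
    if (cp < 0x10000)
        return {static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
                static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
                static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
    return {static_cast<std::uint8_t>(0xF0 | (cp >> 18)),
            static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)),
            static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
            static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
}

// Hypothetical helper: UTF-16 uses one 16-bit unit below U+10000,
// otherwise a surrogate pair.
std::vector<char16_t> utf16_units(char32_t cp) {
    assert(is_scalar_value(cp));
    if (cp < 0x10000) return {static_cast<char16_t>(cp)};
    const char32_t v = cp - 0x10000;
    return {static_cast<char16_t>(0xD800 + (v >> 10)),
            static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}

// Hypothetical helper: serialises 16- or 32-bit code units as bytes in
// big-endian (BE) or little-endian (LE) order.
template <typename Unit>
Bytes to_bytes(const std::vector<Unit>& units, bool big_endian) {
    Bytes out;
    for (Unit u : units) {
        const std::uint32_t v = u;
        for (std::size_t i = 0; i < sizeof(Unit); ++i) {
            const std::size_t byte = big_endian ? sizeof(Unit) - 1 - i : i;
            out.push_back(static_cast<std::uint8_t>((v >> (8 * byte)) & 0xFF));
        }
    }
    return out;
}
```

UTF-32 needs no helper of its own: the code point is stored in one 4-byte unit, so to_bytes(std::vector<char32_t>{0x963F}, true) yields 00 00 96 3F, and passing false yields the little-endian 3F 96 00 00.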