One Black Bada and his zocks: Why is UTF-8 treated as not multibyte?

It’s a long discussion, and largely a matter of terminology. “Multibyte” is a slippery term, and not the best one.

Visual Studio has three options for the character set:

a) No character set, which means it works OK with single-byte character sets (SBCS) like CP1251 (ru-RU) or CP1252 (en-US);
characters take 1 byte

b) MBCS, which means it works OK with multibyte character sets like CP936;
characters take 1 or 2 bytes, and the GUI accepts such characters if the appropriate locale is selected in Control Panel

c) Unicode, which means working with UTF-16LE;
characters take 2 bytes (4 bytes for characters outside the Basic Multilingual Plane), and the selected locale doesn’t matter
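
To make the byte counts above concrete, here is a minimal sketch in Python (not Visual Studio code) that encodes a few sample characters with the code pages mentioned in the list. The helper name `byte_widths` and the sample strings are my own choices for illustration; Python’s `utf-16-le` codec stands in for what Windows calls “Unicode”.

```python
def byte_widths(text: str, encoding: str) -> list[int]:
    """Return the number of bytes each character of `text` takes in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]

# a) SBCS: every character is exactly one byte.
print("cp1251   ", byte_widths("Zд", "cp1251"))      # [1, 1]

# b) MBCS: ASCII takes one byte, a Chinese character takes two.
print("cp936    ", byte_widths("A中", "cp936"))       # [1, 2]

# c) "Unicode" in the Windows sense (UTF-16LE): two bytes per character,
#    or four for a character outside the Basic Multilingual Plane.
print("utf-16-le", byte_widths("A中", "utf-16-le"))   # [2, 2]
print("utf-16-le", byte_widths("😀", "utf-16-le"))    # [4]
```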

Note that there is no option to work with UTF-8.

There are conversion functions between UTF-8 and MBCS (in the Win32 API, MultiByteToWideChar and WideCharToMultiByte, which go through UTF-16).
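
As a rough illustration of such a conversion, here is a sketch in Python rather than in the Win32 API. The function names `utf8_to_mbcs` and `mbcs_to_utf8` are hypothetical, and CP936 as the default code page is only an assumption for the example. The point is that the conversion goes through Unicode text in the middle: decode from one encoding, then encode into the other.

```python
def utf8_to_mbcs(data: bytes, codepage: str = "cp936") -> bytes:
    """Decode UTF-8 bytes to Unicode text, then encode in the given code page."""
    text = data.decode("utf-8")
    return text.encode(codepage)

def mbcs_to_utf8(data: bytes, codepage: str = "cp936") -> bytes:
    """Decode code-page bytes to Unicode text, then encode as UTF-8."""
    text = data.decode(codepage)
    return text.encode("utf-8")

utf8_bytes = "中文".encode("utf-8")
mbcs_bytes = utf8_to_mbcs(utf8_bytes)

# The same two characters: 3 bytes each in UTF-8, 2 bytes each in CP936.
print(len(utf8_bytes), len(mbcs_bytes))          # 6 4
print(mbcs_to_utf8(mbcs_bytes) == utf8_bytes)    # True
```

A character that the target code page cannot represent makes `encode` raise `UnicodeEncodeError`, so real code would have to decide how to handle that case.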

In Microsoft documentation, the term “multibyte” refers to MBCS. It was hard for me to grasp, and I suppose there could be misunderstanding within the team regarding this term.

Even though it is encoded in a similar way and uses a variable number of bytes per character, UTF-8 is a way of encoding Unicode characters; it is not related to MBCS at all.
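
A small Python sketch of that difference, under my own choice of sample characters and a hypothetical helper `encodable`: UTF-8 can encode any Unicode character, using 1 to 4 bytes, while a code page can only encode the characters in its own repertoire.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = ["A", "д", "中", "😀"]

# UTF-8 is variable-width, but it always encodes a Unicode code point.
for ch in samples:
    print(f"U+{ord(ch):04X} takes {len(ch.encode('utf-8'))} byte(s) in UTF-8")

# A code page covers only its own characters, whatever its byte width.
print(encodable("中", "cp1251"))   # False: not in the Cyrillic code page
print(encodable("😀", "cp936"))    # False: not in the Chinese code page
print(all(encodable(ch, "utf-8") for ch in samples))  # True

# The same character has different bytes in UTF-8 and in CP936.
print("中".encode("utf-8") != "中".encode("cp936"))   # True
```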

In order to have our own virare and maina (It., “hoist” and “lower”), that is, a shared set of terms, we’ve agreed to use “multibyte” to mean MBCS.

Denis

Thursday, September 5, 2013