This article is rated Start-class on Wikipedia's content assessment scale.
Now that we have an understandable explanation in English from Sun, does anyone fancy expanding this article? Plugwash 17:43, 22 January 2006 (UTC)
One thing to beware of in using the Sun article - it has an error where it mentions Unicode 2.1 instead of 1.1 as basis for GBK. I checked elsewhere at length, and the Wikipedia GBK page is actually correct. BTW I did the recent updates to GBK, GB2312 and GB18030. -- Richard Donkin 06:39, 27 January 2006 (UTC)
According to http://www.sac.gov.cn/, the 2005 version of this standard was released on Nov. 8, 2005, effective May 1, 2006. -- Skyfiler 22:30, 23 February 2006 (UTC)
>which is easily sufficient to cover Unicode's 1,114,112 (17*65536) code points.
This count includes the 2,048 surrogate code points, which don't need to be encoded; e.g. in UTF-16 you can't encode U+D800, as it isn't a valid code point for a real character.
Where does this figure come from? I understand that UTF-16 can encode all code points, but it only covers 1,112,064:
The BMP is 0x10000 - (0xE000 - 0xD800), i.e. not counting the surrogate code points, = 0xF800 (63,488). The rest have 20 bits in UTF-16 four-byte sequences (0x10000 is subtracted) = 2^20 = 1,048,576 (0x100000). 0xF800 + 0x100000 = 0x10F800 (1,112,064).
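For anyone who wants to check the arithmetic above, here is a small Python sketch. The variable and function names are my own, chosen only for illustration; the numbers are the ones from the comment. It also shows that Python's standard UTF-16 codec refuses to encode a lone surrogate such as U+D800.

```python
# Rough check of the code point counts discussed above (names are illustrative).
SURROGATES = 0xE000 - 0xD800            # surrogate code points in the BMP
TOTAL_CODE_POINTS = 17 * 0x10000        # 17 planes of 65,536 code points each

bmp_without_surrogates = 0x10000 - SURROGATES
supplementary = 2 ** 20                 # 20 payload bits in a UTF-16 surrogate pair
encodable = bmp_without_surrogates + supplementary


def can_encode_utf16(code_point: int) -> bool:
    """Return True if Python's UTF-16 codec accepts this code point."""
    try:
        chr(code_point).encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


print(f"all code points:        {TOTAL_CODE_POINTS:,}")
print(f"surrogates:             {SURROGATES:,}")
print(f"BMP without surrogates: {bmp_without_surrogates:,}")
print(f"encodable in UTF-16:    {encodable:,} (0x{encodable:X})")
print("U+D800 encodable?", can_encode_utf16(0xD800))
```

So the two figures differ by exactly the 2,048 surrogates: 1,114,112 counts every code point, 1,112,064 counts only those that can actually be encoded.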
It appears that the mapping table, while probably based on more or less official mappings, refers to an older version of Unicode, and, taking the section further up on the page into account, perhaps also of GB18030, and lacks several mappings that have now become available. Notably, it maps several characters present in GB18030 to characters in Unicode's Private Use Area, although according to Kenneth Whistler of Unicode, Inc, these characters were already mapped in Unicode 4.1. The same appears true of the mapping table included with my copy of Ubuntu Linux, which may or may not be the same table.
It would appear that no up-to-date table is available in the public domain, however, so this may be the most up-to-date table that's available. At any rate, I think we should keep our eyes open in case a more recent table surfaces. Rōnin 20:07, 21 February 2007 (UTC)
I have just created a table at GB 18030#PUA (moved from my user page), with some generally acceptable references. -- Artoria 2e5 emits crap 06:49, 11 September 2016 (UTC)
Can we see a section or subsection on compatibility with other formats, or put the information in the lede? It sounds like the code page is compatible with Unicode, but I don't know for sure if what I'm reading is worded such that other information I'm not seeing would contradict it in some situations. It would be much less confusing if it were stated exactly where all the relevant information will be. ᛭ LokiClock ( talk) 12:29, 14 March 2011 (UTC)
Not sure about on the OS side, but Chinese sites (Sina, Taobao to name a few) have recently been moving towards UTF-8, usually from GB 2312. -- 207.38.206.45 ( talk) 00:25, 26 December 2015 (UTC)
I'm not an expert, but digging on the web, and the box to the right side of the page in this article, suggests the text in the History section uses the value 1E37 when it should be 1E3F. A GB18030 expert should check this and fix it. — Preceding unsigned comment added by 72.179.1.38 ( talk) 05:22, 3 June 2018 (UTC) Ooops.... I'll swear it said 1E37 a few minutes ago in the History section. Now I only see 1E3F, so it appears to be corrected.
To answer my own question: I know it will work with all of them, since it covers all of Unicode, and no, it's not actually commonly used in Japan. But is it as space-efficient as, or more space-efficient than, the other encodings used in Asia? It seems better than UTF-16, at least for those languages. UTF-8 isn't bad either (assuming some ASCII mixed in, e.g. for HTML, but the same goes for this format). comp.arch ( talk) 15:17, 9 September 2021 (UTC)
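One rough way to get a feel for the space question is to measure a few strings with Python's built-in codecs. This is only a sketch: the sample strings and the `encoded_sizes` helper are my own, and a real comparison would need representative text rather than three-word samples.

```python
# Compare encoded sizes of a few sample strings (samples chosen for illustration).
def encoded_sizes(text, encodings=("gb18030", "utf-8", "utf-16-le")):
    """Return a dict mapping each encoding name to the byte length of text."""
    return {enc: len(text.encode(enc)) for enc in encodings}


samples = {
    "Chinese": "中文",
    "Chinese in HTML": "<p>中文</p>",
    "Japanese kana": "かな",
    "supplementary plane": "\U00020000",
}

for label, text in samples.items():
    print(f"{label:20} {encoded_sizes(text)}")
```

For common Chinese characters, GB 18030 and UTF-16 both use two bytes per character and UTF-8 uses three, while ASCII markup costs one byte in GB 18030 and UTF-8 but two in UTF-16. Characters outside the BMP take four bytes in all three.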
What exactly is this supposed to tell us? Full support for UTF-8, for example, mandates supporting all (valid) code points outside the BMP, so the PRC deciding to mandate support for certain ones is not more “catastrophic” than UTF-8 support (and UTF-8 had already existed for several years before the publication of GB 18030). So in what respect was this a “move of historic significance”? Was there something that did not let developers “get away” with supporting only BMP code points in the PRC, while not fully supporting UTF-8 was fine? -- 2A02:8108:50BF:C694:C133:96B0:222E:BF9A ( talk) 21:15, 18 June 2022 (UTC)
In light of this recent update, it might be interesting to know, and to clarify in the article, to what extent GB 18030 continues to be used in parallel with or in preference to UTF-8. Earlier on there was some suggestion that Chinese websites were moving to UTF-8, but what about most Chinese hosts (PCs, tablets, smartphones, etc.)? Do most Chinese users use UTF-8 or GB 18030? If the former, then exactly what is the continued relevance of GB 18030? If the latter, then how does most of China using GB 18030 work in practice, given that most of the world seems to be moving to UTF-8? ReadOnlyAccount ( talk) 07:54, 15 October 2023 (UTC)