This is the talk page for discussing improvements to the Unicode article. This is not a forum for general discussion of the article's subject.
This level-5 vital article is rated B-class on Wikipedia's content assessment scale.
Text and/or other creative content from this version of Unicode was copied or moved into incubator:Wp/nod/ᩀᩪᨶᩥᨣᩰ᩠ᨯ with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
A security advisory has recently been released by two researchers, one from the University of Cambridge and the other affiliated with both Cambridge and the University of Edinburgh, in which they assert that carefully crafted computer source code can be used to introduce vulnerabilities into apparently harmless programs. Some security groups (like the one for the Rust language) are already taking measures and issuing their own security advisories.
I think this is something that affects Unicode, as source code is one of the main applications of the standard. What do you think would be a good way to introduce it into the article?
Bruno Unna ( talk) 12:02, 1 November 2021 (UTC)
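For anyone who wants to see what such a check might look like: the comment above does not name the mechanism, so the sketch below assumes the advisory concerns Unicode bidirectional control characters hidden in source text. The function name `find_bidi_controls` and the table `BIDI_CONTROLS` are made up for illustration; this is a rough scan, not what any compiler actually implements.

```python
# Bidirectional formatting characters (assumed to be the ones of concern).
BIDI_CONTROLS = {
    "\u202A": "LRE", "\u202B": "RLE", "\u202C": "PDF",
    "\u202D": "LRO", "\u202E": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def find_bidi_controls(source: str):
    """Return (line, column, abbreviation) for each bidi control found.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits

if __name__ == "__main__":
    sample = "a = 1\nb = '\u202E'  # hidden override\n"
    for hit in find_bidi_controls(sample):
        print(hit)
```

A scan like this only flags the characters; deciding whether a given occurrence is legitimate (for example inside a right-to-left string literal) would still need human review.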
Hello M.h.gholamii ( talk) 19:24, 14 July 2022 (UTC)
ko M.h.gholamii ( talk) 19:24, 14 July 2022 (UTC)
I have been designing text glyphs for electrical symbols and electronic components. I design them in the FontCreator program using Unicode encoding, but after I export the font and copy and paste a symbol I designed into phone apps, it does not work and appears as a question mark. What is the solution? (Note: this topic is important for article development. I want to design symbols for non-electrical shapes as well, not only for the field of electricity, and I want them to be text rather than thumbnails.) Mohmad Abdul sahib talk☎ talk 18:15, 18 April 2022 (UTC)
I am adding new blocks & data to Wikidata now. Assuming no DAB needed here, the pages are:
DePiep ( talk) 16:10, 13 September 2022 (UTC)
A proposal has been opened at WP:COMP § Taskforce WP Unicode – proposal. Please take a look. DePiep ( talk) 09:35, 2 October 2022 (UTC)
The lead claims that there are currently 149 186 characters in the Standard. That's confusing! Is that actual characters or does it include unprintable code points? I know what a code point is, my point is that the lead shouldn't confuse code points with characters. (I also argue that a "control character" isn't 'really' a character, not a grapheme, but that's a fight for somewhere else.) Writing about Unicode without an early clear explanation of what a code point is, is -I think- awful pedagogy. In fact, I don't think code point - a fundamental aspect of Unicode - is even defined in the article!!!! Wow, just wow.
I also would like someone to verify that Unicode has characters for color. I believe that's wrong/false/misleading. I am aware that certain emoji can be modified by a code point to change some of its color. As far as I know, this is only true with a very small set of code points, and a very very small set of colors (I don't actually know if the colors are well-defined, I'd expect so, but...). These aren't colors, but are color modifiers for those other code points. 174.130.71.156 ( talk) 16:00, 13 December 2022 (UTC)
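To make the code point versus character distinction in the two paragraphs above concrete, here is a small sketch using Python's standard `unicodedata` module. The helper name `describe` and the sample string are illustrative choices, not anything from the article; the point is only that a control code point and an emoji modifier are both code points with their own general categories.

```python
import unicodedata

def describe(text: str):
    """List (code point, general category, name) for each code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# One visible emoji, but two code points: a base emoji plus a modifier.
thumbs_up_medium = "\U0001F44D\U0001F3FD"

if __name__ == "__main__":
    for row in describe(thumbs_up_medium):
        print(row)
    # A control code point: category "Cc", and no character name at all.
    print(describe("\x07"))
```

The modifier's category is `Sk` (modifier symbol), which fits the reading that it modifies another code point rather than standing for a color by itself.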
The offending sentence is: "The Unicode standard defines three and several other encodings exist, all in practice variable-length encodings." Sure, you could strain to interpret that to mean "all but UTF-32", but let's keep it clear: it clearly implies all encodings are variable length. Wikipedia's own article on UTF-32 says it is fixed length. (Because it only needs to use 21 of the 32 bits for Unicode code points, it is very inefficient and, afaik, rarely used.) But rarely used is not the same as "doesn't exist", and "all are variable" clearly implies it doesn't exist. I'd have to look again: are there really three variable-length Unicode encodings? I can only think of UTF-8 and UTF-16, plus some others that afaik are not "defined" in the Unicode standard (like GB18030) or that are obsolete (like UTF-7). Replace "all" with "all common encodings" or something similar, and mention UTF-32. 174.130.71.156 ( talk) 11:43, 15 December 2022 (UTC)
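The fixed versus variable length point can be checked directly. The sketch below encodes a few sample code points with Python's built-in codecs for UTF-8, UTF-16 and UTF-32 (the three encoding forms the Standard defines) and counts the bytes. The helper name `encoded_lengths` and the choice of samples are mine; the little-endian codec variants are used only so that no byte order mark is counted.

```python
def encoded_lengths(ch: str) -> dict:
    """Bytes needed for one code point in each encoding form (no BOM)."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# U+0041, U+00E9, U+20AC, U+1F600: chosen to hit each UTF-8 length.
SAMPLES = ["\u0041", "\u00E9", "\u20AC", "\U0001F600"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 takes 1 to 4 bytes and UTF-16 takes 2 or 4 bytes depending on the code point, while UTF-32 takes 4 bytes for every sample, which is what "fixed length" means here.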
I object to the reversal by Peter M. Brown, citing WP:ITALICTITLE inappropriately. I'd say that the name, a noun, should not be in italics.
ITALICTITLE refers to the name of a work, i.e. the work itself (play, periodical, book). However, the Unicode standard is a standard, not a book etc., nor even its publication. The Standard is an abstraction: the set of rules. It is a proper noun, full stop. The key point is that the article title denotes the subject: the standard, not the book. DePiep ( talk) 17:04, 21 April 2023 (UTC)
I don't know if it would be manageable, but Unicode clearly does not have all commonly used symbols. A simple example is the very commonly used 'slash marks' used to count. Most reading this will be familiar with the sequence /, //, ///, ////, and //// with the crossmark (strike-through) diagonal (top left to bottom right) rather than horizontal. (This is typical in the USA, I understand European convention is slightly different). I request the editors to consider the addition of a list of missing (but documented) symbols.
40.142.183.146 ( talk) 11:49, 9 June 2023 (UTC)
Unicode 16 is set to release in September 2024. I think the following (con)scripts definitely need to be encoded:
94.180.80.9 ( talk) 07:31, 9 July 2023 (UTC)
@ Spitzak: In the text "for example, ḗ (precomposed e with macron and acute above) and ḗ (e followed by the combining macron above and combining acute above) should be rendered identically", the "e" is followed by two distinct combining characters, but they are rendered at a single location. I inserted a space to cause them to display as two separate characters, and Spitzak reverted the change with the comment "They are supposed to be combined". In context, I don't understand how it makes sense to combine them, since the text refers to them individually. -- Shmuel (Seymour J.) Metz Username:Chatul ( talk) 21:40, 16 October 2023 (UTC)
"for example, ḗ (precomposed e with macron and acute above) and e followed by the combining macron above and combining acute above should be rendered identically". Alternatively,
"for example, ḗ (precomposed e with macron and acute above) and eōó (e followed by the combining macron above and combining acute above) should be rendered identically". -- Shmuel (Seymour J.) Metz Username:Chatul ( talk) 19:25, 18 October 2023 (UTC)
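Since the two forms are hard to tell apart on screen, a short sketch may help the discussion. It builds both sequences from explicit escapes and compares them with Python's standard `unicodedata.normalize`; the variable names are illustrative. It says nothing about how the article should display the example, only that the two sequences are canonically equivalent.

```python
import unicodedata

# Precomposed: U+1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE.
precomposed = "\u1E17"
# Decomposed: e + U+0304 COMBINING MACRON + U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0304\u0301"

def code_points(text: str):
    """Spell out a string as a list of U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in text]

if __name__ == "__main__":
    print(code_points(precomposed), code_points(decomposed))
    # Different code point sequences, so a plain comparison is False.
    print(precomposed == decomposed)
    # After normalization they compare equal.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
    print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

Writing the example with escapes, as here, is one way to refer to the combining marks individually without depending on how a renderer stacks them.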
Welcome, I want the Kurdistan flag on my keyboard 85.94.240.91 ( talk) 23:28, 2 November 2023 (UTC)
@ Spitzak, I'm also really not sure what you're talking about exactly. Microsoft seems to have the definition of "Unicode" in line with that of the rest of the world. [1] If they sometimes use "Unicode" as a shorthand for "UTF-16" (the way many people use it as a shorthand for "UTF-8"), then the page I just linked seems to do any theoretical disambiguation work, and doesn't really leave us wondering whether they're somehow creating an ambiguity problem for us to solve. Remsense 诉 02:28, 8 March 2024 (UTC)
isTextUnicode which returns false for UTF-8. There are a number of other examples where "Unicode" means the 16-bit interface. Spitzak ( talk) 06:19, 8 March 2024 (UTC)
In Microsoft Windows, the Unicode support is limited to UTF-16. -- Shmuel (Seymour J.) Metz Username:Chatul ( talk) 15:47, 8 March 2024 (UTC)