This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
This paragraph was added to the start of the article:
I removed it because it is inaccurate (it overplays Unicode-as-a-standard rather than Unicode as a consortium that produces lots of standards), confusing (its mention of ASCII is not clearly historical), and adds no information that isn't already in the article. I assume, however, that it was added because someone thought the existing first paragraph was unclear, so I'm open to suggestions about how to improve it. -- Lee Daniel Crocker 12:20, 9 November 2001 (UTC)
I think section 0 which currently reads could be improved by changing: In computing, Unicode is the international standard whose goal is to specify a code matching every character needed by every written human language, including many dead languages in small scholarly use, to a single unique integer number, called a code point.
To: In computing, Unicode is the international standard whose goal is to specify a code matching every character needed by every written human language, including many dead languages in small scholarly use such as foo and bar as well as [some other good example, perhaps a made-up language?], to a single unique integer number, called a code point.
I think the intro would be better with two examples there. Furthermore, I think "is the international standard" should be "is an international standard", or has it been approved by a major authority as the standard? -- Ævar Arnfjörð Bjarmason 17:44, 2004 Oct 5 (UTC)
And, arguing about section 0, what about "in internationalization of software"? A Thai programmer writing a program with a Thai user interface for Thai customers doesn't fit the definition of internationalization at all. -- Pjacobi 20:30, 5 Oct 2004 (UTC)
The interesting thing for most people is that it provides a way to store text in any language in a computer. Starting off by mentioning "unique integer numbers" doesn't make Unicode easier to understand. Even as a computer programmer, I have a bit of trouble reading that sentence and understanding what it means. And it's not really true as given; "characters" in Unicode are a polite fiction. Many characters (Maltese "ie", Lakota p with bar above, many Khmer characters) are more than one character in Unicode-ese. Going to rewrite boldly. -- Prosfilaes 21:47, 11 Oct 2004 (UTC)
I can't believe that Unicode 5.0 doesn't have American Sign Language characters. Linguistic research during the past thirty years has demonstrated that American Sign Language (and indeed any of the world's indigenous sign languages) meets all of the requirements for human languages: it is a rule-governed, grammatical symbol system that changes over time and that members of a community share. —Preceding unsigned comment added by 64.40.46.183 (talk) 22:34, 19 September 2007 (UTC)
I am not a techie! Nevertheless I can see the usefulness of much of the material available in Unicode. Neither am I the sort of anti-techie that complains that anything in other than plain-Jane unaccented English alphabetical characters must be thrown out of Wikipedia, or that articles should not be displaying meaningless question marks. I was visiting the chess page, and someone there has made a valiant effort to produce diagrams of how the pieces move by using only ordinary keyboard characters. I'm sure that he would not take it as a sign of disrespect when I say that it looks like shit.
I'm sure that most of us would like to see the special symbols, letters, or Chinese characters at the appropriate time and place. At the same time I understand that for many Wikipedians there are technical reasons which prevent their hardware from dealing with this material (e.g. limited memory). Then there are others for whom only the appropriate software is missing. Even some of the people with hardware restrictions may be able to handle Greek or Russian, though probably not Chinese. In cases where I've tried to find the code, I've ended up wading through reams of technical discussions. These discussions may be very interesting, but they don't provide a solution to my immediate problem.
The practical suggestion may be a notice at the head of any article containing symbols not in ISO 8859-1 saying, in effect, "This article contains non-standard characters. You may download these characters by activating this LINK". Eclecticology
In cases where I've tried to find the code, ...
What exactly were you looking for? Do you have the Unicode value, and you're looking for a typical glyph (like an ASCII chart)? Are you looking for the Unicode value?
These discussions may be very interesting, but they don't provide a solution to my immediate problem.
What exactly is your "immediate problem"?
Is there a reason to use '<code>foo</code> <code>bar</code> <code>baz</code> ...' instead of '<code>foo bar baz ...</code>'? -- Miciah
Isn't there a UTF-7? Or is it an invention of Microsoft (it's in .NET)? CGS 21:54, 16 Sep 2003 (UTC).
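For reference, UTF-7 is not a Microsoft invention: it is specified in RFC 2152 as a "mail-safe" transformation format that keeps every byte within 7-bit ASCII. A minimal sketch using Python's built-in `utf-7` codec (the sample string is arbitrary and chosen only for illustration) shows non-ASCII text being carried as ASCII-only bytes and decoded back losslessly:

```python
# Encode a mixed string with Python's standard "utf-7" codec.
# Non-ASCII characters are emitted as "+...-" runs of modified Base64.
text = "Unicode £1 → ok"
encoded = text.encode("utf-7")

print(encoded)
print(all(b < 0x80 for b in encoded))   # every output byte is 7-bit ASCII
print(encoded.decode("utf-7") == text)  # the round trip is lossless
```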
The oldest of Unicode's encodings is UTF-16, a variable-length encoding that uses either one or two 16-bit words, manifesting on most platforms as 2 or 4 8-bit bytes, for each character. {NB: This can't be true; UCS-2 has to predate UTF-16!}
66.44.102.169 wrote "{NB: This can't be true; UCS-2 has to predate UTF-16!}" in the article. UTF-16 was previously UCS-2, but I'm not sure that makes the statement untrue as such. I reworded it anyway. Angela.
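The difference between the two is the "one or two 16-bit words" part: UCS-2 uses exactly one 16-bit unit per character and so stops at U+FFFF, while UTF-16 represents code points above U+FFFF as a surrogate pair. The sketch below computes the code units by hand (the function name `utf16_code_units` is hypothetical, written only for this illustration) and compares the byte count with Python's own `utf-16-le` codec:

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units (16-bit words) for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x10000:
        return [cp]                 # BMP: one 16-bit unit, same as UCS-2
    v = cp - 0x10000                # 20 bits left to distribute
    high = 0xD800 + (v >> 10)       # lead (high) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)      # trail (low) surrogate: bottom 10 bits
    return [high, low]


for ch in ["A", "é", "\u20ac", "\U0001D11E"]:
    units = utf16_code_units(ord(ch))
    print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in units],
          len(ch.encode("utf-16-le")), "bytes")
```

Only the last character (U+1D11E, outside the Basic Multilingual Plane) needs two units, i.e. 4 bytes.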
Wasn't Unicode created to encode all languages - not just 'human' languages? In the future then, why couldn't Unicode conceivably be used to encode extraterrestrial languages as well?(well, why not? hehehehe) Therefore, shouldn't the 'human' be removed from this page? One possible alternative: Unicode is the international standard created, whose goal is to specify a code matching every character needed by every known written language to a single unique integer number, called a code point.
One of classicists' issues with Unicode has been the omission of the LATIN Y WITH MACRON characters. While the omission has been corrected in Unicode 3, most user agents don't know to render anything for that codepoint. Somewhere in that story is an issue that perhaps might make sense in the article -- either the omission of the letter, or the outdated support available by user agents (I don't see Microsoft rushing to update its fonts and packaging them as an update to Windows or Internet Explorer just to comply with recent standards).
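For concreteness, the letters in question are U+0232 LATIN CAPITAL LETTER Y WITH MACRON and U+0233 LATIN SMALL LETTER Y WITH MACRON. A small sketch with Python's `unicodedata` module (assuming a Python build whose Unicode database includes these characters, as any modern one does) shows the precomposed code point, its canonical decomposition, and that a base "y" plus COMBINING MACRON normalizes to the same character. Whether either form actually displays still depends on the font and renderer, which is the user-agent problem described above.

```python
import unicodedata

precomposed = "\u0233"  # small y with macron as a single code point

print(unicodedata.name(precomposed))
print(unicodedata.decomposition(precomposed))  # base letter + combining mark

# The same letter written as "y" followed by U+0304 COMBINING MACRON;
# NFC normalization folds the pair into the precomposed code point.
sequence = "y\u0304"
print(unicodedata.normalize("NFC", sequence) == precomposed)
```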
"UNIX-like operating systems such as GNU/Linux, BSD and Mac OS X have adopted Unicode, more specifically UTF-8, as the basis of representation of multilingual text."
Mac OS X stores a lot of text in UTF-8, but the other UTFs are also supported throughout the system and widely used. I agree that UTF-8 is currently the most widely used Unicode encoding (because it is the most legacy-compatible encoding), and that is important enough to mention in the leading section, but perhaps it should be rephrased so that it doesn't mislead the reader into believing those OSes don't support other kinds of Unicode? — David Remahl 04:16, 9 Sep 2004 (UTC)
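To illustrate why supporting "other kinds of Unicode" is not a different character set: UTF-8, UTF-16 and UTF-32 are just different byte serializations of the same code points, so converting between them is lossless. A minimal Python sketch (the sample string is arbitrary) encodes one string in each form and compares sizes:

```python
text = "Grüße, 世界"

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    assert data.decode(codec) == text  # same code points in every form
    print(f"{codec:10} {len(data):3} bytes")
```

UTF-8 keeps ASCII characters as single unchanged bytes, which is the legacy compatibility mentioned above, at the cost of 2 to 4 bytes for everything else.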
This phrase has appeared recently without much discussion. "Unicode is the most complete character set, and one of the largest." Could anyone give justification? -- Taku 06:14, Oct 12, 2004 (UTC)
I am convinced that Unicode is probably the largest and most complete character set, but can we still ignore criticism of Unicode? What I often hear about Unicode is that it is inadequate for handling old texts or texts containing obsolete characters. Maybe most of the criticism is pointless or the result of misunderstandings, but I still hear it, and I don't think we should make a general statement that not everyone agrees with. Unicode is meant to be the largest and the most complete, but whether it really is so is disputed, even if that dispute is nonsense in actuality. -- Taku 22:41, Oct 12, 2004 (UTC)
I reorganized the sections and continued work on the leading section. I think the four new big sections make good sense: origin and development, mapping and encoding, process and issues, and in use. In addition to this, we probably need:
If I have some time, I will try to address them, but you can also help me. Finally, I'm sorry for the late reply to the question of Unicode being the largest and most complete. I slightly reworded the mention. Please edit further if you think it necessary. -- Taku 20:42, Oct 17, 2004 (UTC)
Thank you for giving the concrete example. Yes, I specifically asked for it. I apologize for replying in flame-war style. -- Pjacobi 14:41, 18 Oct 2004 (UTC)
Many documents in non-western languages, for instance, are still represented in other character sets. Which languages? Which character sets? In this generality it doesn't help. Please state the languages and character sets used. And remember, GB18030 is now fully harmonized with Unicode and cannot be considered a different character set, but rather a Unicode encoding form standardized by somebody other than ISO or the Unicode Consortium, namely the Guobiao. -- Pjacobi 22:23, 17 Oct 2004 (UTC)
It's fine. I was just puzzled about what upset you so much. As a matter of fact, I am neither a backer of Unicode nor a detractor. I am only interested in making the article informative for those who have questions about Unicode. It's very surprising that many people, even computer programmers, don't know much about Unicode. The article could be a help for them. -- Taku 15:54, Oct 23, 2004 (UTC)
In the section that talks about pre-composed characters vs. composing with several codepoints, how about mentioning that this capability opens up lots of opportunities for phishing once URLs in UTF-8 are more universally accepted? For example, once accented characters are common in website addresses, links with a pre-composed "è" and a separate "e" plus an accent will point to different sites, but look identical to the user (in fact the intent is for them to look the same). I don't know if this info belongs here, but it's an interesting tidbit. Rlobkovsky 00:06, 6 Dec 2004 (UTC)
-- Jordi· ✆ 12:26, 28 Dec 2004 (UTC)
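The pre-composed/decomposed pair from the phishing comment above can be shown directly. The two strings below differ as code point sequences but are canonically equivalent, and normalizing both to the same form (NFC here) makes them compare equal. Canonically equivalent spellings like this are what normalization exists for; IDNA2003's Nameprep step, for instance, applies NFKC to domain labels. This is a minimal sketch, and the helper name `nfc` is hypothetical:

```python
import unicodedata

precomposed = "\u00e8"   # è as one code point
decomposed = "e\u0300"   # e followed by U+0300 COMBINING GRAVE ACCENT

print(precomposed == decomposed)          # different code point sequences
print(len(precomposed), len(decomposed))  # 1 vs 2 code points


def nfc(s: str) -> str:
    """Normalize to Normalization Form C (composed) before comparing."""
    return unicodedata.normalize("NFC", s)


print(nfc(precomposed) == nfc(decomposed))  # equal after normalization
```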