It's been my understanding for about 3 years now that English Wikipedia prefers ASCII Roman numerals to precomposed characters, for ease of typing, more consistent search results, less confusing copy-and-paste, and broader compatibility with fonts typically used by English speakers. In the 2020-11-01 database dump, I see for example 1,412,537 instances of "III" but only 288 instances of "Ⅲ". Since we don't use vertical text, this preference seems to align with what I found in the Unicode standard (quoting from Unicode 7.0.0, Chapter 22, p. 754):
I've removed perhaps a few hundred of these, and the first objection I've gotten came recently from Struthious Bandersnatch, who is unpersuaded by this reasoning and says that the language currently in Wikipedia:Manual of Style/Mathematics#Special symbols means that precomposed Roman numeral characters are preferred, which he intends to add to articles he edits. We've brought the discussion here to see if there is in fact a consensus on this issue and to document it. I would propose adding something like this to that MOS section:
What do you think? -- Beland ( talk) 03:16, 22 November 2020 (UTC)
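Readers weighing the "compatibility character" question can inspect the Unicode Character Database directly. The minimal Python sketch below uses only the standard-library `unicodedata` module. For a few precomposed Roman numerals it prints the name, the decomposition mapping and the NFKC-normalized form. The helper name `compat_form` is illustrative and is not part of any Wikipedia tooling.

```python
import unicodedata

def compat_form(text: str) -> str:
    """Return the NFKC (compatibility) normalization of text.

    NFKC replaces characters that have a <compat> decomposition, such as
    U+2162 ROMAN NUMERAL THREE, with the characters they decompose to.
    """
    return unicodedata.normalize("NFKC", text)

# U+2162 ROMAN NUMERAL THREE, U+2175 SMALL ROMAN NUMERAL SIX,
# U+216B ROMAN NUMERAL TWELVE
for ch in ["\u2162", "\u2175", "\u216B"]:
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.decomposition(ch),
          repr(compat_form(ch)))
```

The `<compat>` tag in each mapping marks a compatibility decomposition in the Unicode sense. By itself, it says nothing about which form a style guide should prefer.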
“Many characters in the Unicode Standard are used to represent numbers or numeric expressions. Some characters are used exclusively in a numeric context; other characters can be used both as letters and numerically, depending on context. The notational systems for numbers are equally varied. They range from the familiar decimal notation to non-decimal systems, such as Roman numerals.”
The section Beland cites above does not actually say why it is that, for "most purposes, it is preferable" to use letters to represent Roman numerals; it seems to me that the reason could simply be that they are easier to type. Apart from the rotation behavior in vertical writing systems, it also doesn't say what the other minority purposes are. (I'm noticing now that our article claims Unicode Roman numerals are "for compatibility only", but cites this to the preceding version of this document, which as far as I can see does not actually say this either.)
similar-looking ASCII or punctuation symbols. Beland did not seem to think this was particularly notable in our previous discussion, but I'd be curious to see numbers on how consistently it can be done since humans can't do it reliably, particularly given the famous (moderately famous? okay, maybe just famous to me) case of the Indian news anchor who was fired after reading the name of the president of China, Xi Jinping, as "Eleven Jinping" on the day he had arrived on a diplomatic visit.
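The disambiguation problem raised here, as in the Xi Jinping example, can be shown in a few lines of Python. A purely pattern-based test accepts the pronoun "I", the surname "Xi" and the English word "mix" as well-formed Roman numerals. The regular expression is the common textbook pattern for numerals from 1 to 3999 in subtractive notation. It is an illustration only, not the method any particular NLP system uses.

```python
import re

# Common textbook pattern for well-formed Roman numerals 1..3999
# (subtractive notation only, e.g. IV rather than IIII).
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
                   re.IGNORECASE)

def looks_like_roman(token: str) -> bool:
    """True if the whole token is a well-formed Roman numeral, ignoring case."""
    return bool(token) and ROMAN.fullmatch(token) is not None

tokens = ["I", "Xi", "mix", "Civil", "VI", "Jinping"]
print([t for t in tokens if looks_like_roman(t)])
```

Only context (surrounding words, capitalization conventions, domain knowledge) can tell which of the matches are actually numerals.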
“less convenient” in any substantive way; and quite frankly it just seems lazy to me on all fronts, search engines and spell-checking and otherwise, to simply normalize the data ahead of time rather than adding one rule and preserving information in the source. It's like color-quantizing a bunch of source images because you don't want to bother writing code that handles multiple bit depths. You realize that you're implicitly arguing that it would be better for NLP and TTS systems not to contain a simple rule like that, and hence to be unable to handle Unicode Roman numerals?
On the matter of terminology: you, specifically, are the one claiming that Unicode Roman numerals merely represent pre-composed versions of Latin letter combinations. Note that even in the Wikipedia article you've linked to about compatibility characters, there's a “citation needed” tag on this claim, and in fact the article says,
“...in certain academic circles the use of Roman numerals as distinct from Latin letters that share the same glyphs would be no different from the use of Cuneiform numerals or ancient Greek numerals. Collapsing the Roman numeral characters to Latin letter characters eliminates a semantic distinction.”
If you seriously do not understand why it's biased to make that claim in a policy thread I haven't commented in yet, after I've explicitly said that the Unicode Consortium documents you're linking to do not say that, without any counterargument or better sourcing for your claim, and to use this wording, which favors your desired conclusion, in framing the question itself when introducing a policy discussion on a talk page, I kind of wonder whether you're a good candidate for making such proposals. Perhaps you should ask a neutral third party to make the proposal in this sort of situation.
David Eppstein, it's great that it looks indistinguishable to you, but as I said, to me, with my system's combination of browsers and built-in fonts, [†] it does look different, and better. Hence the benefit is better styling, from my point of view at least, and what I'm saying is that these arguments simply aren't overwhelming enough to justify eradicating my styling preferences from Wikipedia.
Now if there were an accessibility benefit, I'd find that a persuasive argument. But web accessibility is something I know a fair bit about (in fact, I've worked on Section 508 compliance issues in web content management systems in the U.S. since the last century), and no one has actually presented any evidence here that converting Unicode numbers to a bunch of undifferentiated Latin letters provides improved accessibility.
about personal behavior that lack evidence; it's you who are saying you don't understand the bias inherent in your own words, then linking to mainspace articles where the claim you're making is marked “citation needed”. No need to discuss it anymore if you will stop portraying your position in this policy discussion (the position not simply that I have my preferences and you have your preferences which are better, but that your preferences are so superlatively correct as to exclude my preferences as even being an acceptable option or variation) as a mere unbiased reflection of what Unicode Consortium documents say, or what orthodox programming practice would dictate.
If the developers of a(n unnamed) search engine don't invest developer time in something as simple as an equivalence like this, or don't even invest time in thinking about how their product should handle this situation, then they've done a poor job of making a search engine. I mean, they've basically made a search engine that doesn't handle Unicode. I'd be inclined to audit for Y2K bugs and UNIX epoch problems too. It's not a problem for Wikipedia to solve by constraining our styling decisions.
And I'm sorry, but you simply have not demonstrated that a naïve system is going to be unable to handle Unicode Roman numerals properly. At all. The reason a naïve system could handle them, and do so virtually effortlessly, is that, as I've said (and as the article you linked to said, which I quoted above), these Unicode numbers and their Unicode Latin letter equivalents are not simply fungible: the number code points contain additional information, which is removed when they're converted to letters, and that removal is one of the things I'm objecting to. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 11:18, 27 November 2020 (UTC)
“A naive system that only uses the character encoding to semantically interpret Roman numerals is going to fail over 99% of the time on that particular task.”
Oh, come on. This is not even a good faith argument, after you've proposed that the “easiest thing to do from a machine readability perspective” as a “naïve” system is removing numeric information from source data and then using a possibly-infinite number of ad hoc rules for translating letters into Roman numerals. A naïve NLP system built in a 1930s teletype machine (video on YouTube), mechanically, would also mostly fail, btw.
But “easiest” and “most effective” are different criteria; I specifically asked why one rule, or even a vanishingly small handful of rules for that matter, for processing characters already encoded as numbers would make a material difference in ease of implementation (...implementation of all of these products which aren't part of Wikipedia, for which you have yet to explain why we would need to conform our styling rules to their vendors' needs, if indeed your claim about ease of implementation is even valid, though it seems trivially untrue to me); why this “would result in a salient difference in how easy it is to do anything.” This is a blatant case of moving the goalposts, and a rather rhetorically clumsy one at that.
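Once a run of Basic Latin letters is known to be a Roman numeral, computing its value needs only the usual subtractive rule. The hypothetical helper below is a minimal Python sketch of that step. It assumes its input is already a valid numeral and does no validation. It does not address the harder part discussed above, which is deciding whether a token such as "I" or "Xi" is a numeral at all.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral: str) -> int:
    """Convert a Basic Latin Roman numeral such as 'XIV' to an integer.

    A symbol smaller than the one after it is subtracted (IV = 4);
    otherwise it is added. The input is assumed to be well-formed.
    """
    upper = numeral.upper()
    total = 0
    for i, ch in enumerate(upper):
        value = VALUES[ch]
        if i + 1 < len(upper) and value < VALUES[upper[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("MCMXCIV"))
```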
“...both "poorly written" (which in this case would apparently include Google) and "well-written" search engines...”
You have simply repeated a previous, uncited claim you made on your user talk page, which I challenged then, without any actual citation here either. As with so many other assertions, you haven't demonstrated that any behavior of the Google Search Engine is the result of not investing developer time (on the part of Google, of all companies, which is not exactly resource-poor when it comes to developer time for its flagship product) or of failing to even “invest time in thinking about how their product should handle this situation.”
The definition of the robustness principle you link to reads,
“Be conservative in what you do, be liberal in what you accept from others”
...so just how exactly does that even remotely describe your approach here? (Or paraphrase into a heuristic that fixing a problem “should happen at both ends”? That's pretty much the diametric opposite of the concept.) How is destroying information in your source data to permit an unexplained, supposedly “easiest” implementation of these various automated systems interacting with Wikipedia content, which don't care about style at all anyways, “conservative”?
The introduction to the Unicode Consortium document's section on numerals says, again,
“The notational systems for numbers are equally varied. They range from the familiar decimal notation to non-decimal systems, such as Roman numerals.”
...which is what's clear: Unicode Roman numerals are a notation system for numbers, and no matter how many times you call them pre-composed versions of letters, merely for compatibility, or insert the term into the talk page section header here (or link to a Wikipedia article about encoding compatibility which marks the same claim with “citation needed”...), that doesn't change anything. What's “strained” is claiming that a single sentence which is verging on a footnote (and which still explicitly says that these are Forms of Numbers, anyways) overrides and excludes the definition of these code points in the document's own introductory section about numerals, and overrides any autonomy Wikipedia would have for determining its own styling choices.
And it's also strained to act like you aren't obligated by Wikipedia practices to present a policy change proposal from an NPOV, when you're putting yourself forward as a superlative authority on “common practice” of styling decisions for Roman numerals; and tbh acting that way gainsays the authority you've arrogated to yourself. (Obviously, your judgment about these kinds of styling decisions wasn't authoritative for the type foundries that designed the fonts installed on my computer, who intended for these glyphs to be used and put developer time into it.) -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 07:06, 28 November 2020 (UTC)
“I doubt that any style guide anywhere explicitly recommends using the precomposed Roman numerals in English prose, but I'm open to being proven wrong.”
I explicitly said at your talk page, (Edit:) “I am, of course, not advocating for using these characters as letters, like in "triⅵal" or something, but exclusively numerically.”
“It is wrongheaded to think that we can have a single different character for every possible number that could appear as a roman numeral.”
You could go tell that to whoever is arguing that, wherever they are, because they aren't participating in this discussion. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:27, 30 November 2020 (UTC)
“best argument yet” is that the arguments attempting to support your position so far have involved things like linking to a Wikipedia article where one of your central claims is marked "citation needed" and paraphrasing the robustness principle into its exact opposite.
Again, web browsers are products from vendors other than Wikipedia or the WMF. It's not our job to improve them or compensate for the failings of those vendors.
I notice that if I go to the Wikipedia article Letterlike Symbols in Firefox, searching for "h" does not find the Planck constant symbol and searching for "K" does not find the Kelvin symbol. Similarly, at Mathematical Alphanumeric Symbols none of the glyphs that are definitely representing Latin letters are found by searching for their Plane 0, Row 00 look-alikes.
So if this was your best argument, after all the above writing, color me unimpressed. You would need to demonstrate that it has anything to do with Roman numerals in particular rather than Roman numerals plus everything else §Special symbols also covers.
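The Kelvin-sign and Planck-constant observations come down to whether a page search normalizes text before comparing. The minimal Python sketch below applies NFKC normalization and case folding to both strings. It shows how such a search could behave. It is an assumption for illustration, not a description of what Firefox actually does, and the function name is hypothetical.

```python
import unicodedata

def normalized_find(haystack: str, needle: str) -> bool:
    """Substring test after NFKC normalization and case folding, so that
    compatibility characters match the letters they decompose to."""
    def norm(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return norm(needle) in norm(haystack)

print("K" in "5 \u212A")                       # plain search: KELVIN SIGN is not 'K'
print(normalized_find("5 \u212A", "K"))        # matches after normalization
print(normalized_find("\u210E\u03BD", "h"))    # PLANCK CONSTANT matches 'h'
print(normalized_find("Henry \u2167", "VIII")) # ROMAN NUMERAL EIGHT matches 'VIII'
```

Because normalization can change string lengths, a real find-in-page feature would also need to map match positions back to the original text. This sketch only reports whether a match exists.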
“And in fact I expect most such non-naive systems would perform better if they decomposed such characters early in processing.”
I'd invite you to provide a citation... I don't think I have to know much about NLP to say that unless a system is able to correctly distinguish Roman numeral collections of Latin letters from non-Roman-numeral ones 100% of the time, Unicode Roman numerals in source material can have only positive utility. But again, it doesn't matter, because this sort of system is not a Wikipedia or WMF product.
“in English NLP systems, it's common to destroy the information encoded in capitalization by lowercasing all inputs [...] Notice, for example, that Google gives an identical page of results on Shakespeare's play whether you search for "hamlet" or "Hamlet"”
The main reason for Google to be case-insensitive is that it halves the size of the index necessary (and combinatorically, the reduction is much greater than half). Punctuation is ignored for similar reasons.
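Here is a toy Python sketch of the case-folding point. The index key is a hypothetical illustration and does not reflect how Google's index is actually built. In it, five surface forms collapse to two keys. That shrinks the index, but it also merges the numeral "MIX" (1009) with the word "mix".

```python
def index_key(term: str) -> str:
    """Hypothetical search-index key: case-fold so that 'Hamlet', 'hamlet'
    and 'HAMLET' all share one index entry."""
    return term.casefold()

terms = ["Hamlet", "hamlet", "HAMLET", "MIX", "mix"]
keys = sorted({index_key(t) for t in terms})
print(keys)
```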
“...then what's the benefit we are getting from the precomposed encoding? If you think there is a benefit, perhaps a specific example would help explain.”
Uh, better aesthetic style? We're in a discussion on a talk page for the Wikipedia Manual of Style, remember? Not that you didn't already know that was one of the benefits I'd proposed, or that you've demonstrated that other benefits (not to mention the existing damn rules in the style guide) aren't valid; this is all such empty rhetorical posturing.
I'll also point out, again, that it is completely absurd to refer to "Ⅴ" as a "pre-composed" version of "V"; you are hammering a square peg into a round hole here. But by all means, continue to demonstrate the inherent ridiculousness and prejudiced nature of your exhibition. I believe the word "wrongheaded" arose above, and it didn't apply to anything else in the conversation...
And as far as style guides other than the one you're proposing changing right now (proposing changing to reflect something you yourself are saying other style guides do not say), the AP Stylebook gives guidance to, for example, not use brackets because they supposedly can't be transmitted over news wires. Which, I'll bet anything, is based on some technical problem present in twentieth-century technology that tied back to nineteenth-century telegraph encoding practices. So if by some remote chance you actually go looking for evidence to cite, and in the even more unlikely event under a joint probability distribution that you actually find a style guide which says something about Unicode-specific character encoding practices rather than 19th-century telegraph stuff, please also bring evidence that the authors even remotely know what the hell they're talking about when it comes to this realm of technical topics.
The typography term for the difference between "ae" and "æ" is that the latter is called a ligature. And you're still using the term "ASCII encoding" too...
If you're going to switch from claiming you're trying “...to dispel the idea that encoding Roman numerals as precomposed Unicode characters some of the time will allow a naive system to handle them properly” to saying that a naïve system is “one that relies on the character encoding to differentiate between Roman numerals and pronouns like "I"” (which would mean, under this new definition, that handling Unicode Roman numerals properly is the one thing a "naïve" system can actually do), and yet suggest that I'm the one who is confused? I'm just going to say it: in addition to neutral talk page proposals, Beland, you also seem to be out of your depth when it comes to the intersection of character encodings, typography, web styling, and UX and accessibility. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:27, 30 November 2020 (UTC)
“specific phrasing” of your proposed addition accurately is not some sort of transactional favor where you then get to call me rude and uncivil for pointing out your sloppy use of terminology at the same time you're trying to install an MOS rule demanding that Wikipedia editors use individual characters to your exacting preferences and specifications.
And furthermore, on the subject of your concomitant inattention to detail, the phrase “lack of knowledge”, which I've supposedly been saying “over and over again”, and even just the word "knowledge", appears only in your own comment above in this thread. But it is an accurate self-description; if indeed you are knowledgeable in the technical areas I list above, you are failing to convey that by even using terminology correctly. All while insisting on your superlative personal insight into what this Wikipedia guideline should say, while demonstrating at best superficial familiarity with Wikipedia policies and guidelines in general.
“What internal system, if any, would benefit from a "machine readable" encoding?”
What internal system wouldn't, if you yourself have admitted that it takes only a few lines of code in any programming language to create a "naïve NLP system" (rather overkill terminology-wise IMO, but it works) that can handle Roman numerals properly encoded in Unicode, but that it requires a system with a potentially-infinite number of ad hoc rules to do the job with Basic Latin letters?
I mean... this is the basic definition of the term "machine readable". I also don't get why you keep putting that phrase in quotes past the use–mention distinction... you're almost treating it like it's an unfamiliar term, or doesn't mean what our article machine-readable data says:
Machine readable is not synonymous with digitally accessible. A digitally accessible document may be online, making it easier for humans to access via computers, but its content is much harder to extract, transform, and process via computer programming logic if it is not machine-readable.
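To make the "machine readable" point concrete: the Unicode Character Database records a numeric value for each precomposed Roman numeral, and Python's standard `unicodedata.numeric` exposes it. The sketch below is minimal, and its helper name is hypothetical. It works one character at a time, so a numeral spelled with several code points, whether precomposed or Basic Latin, gets no combined value from it.

```python
import unicodedata

def numeral_value(ch: str):
    """Return the numeric value the Unicode Character Database assigns to a
    single character, or None if it has none (e.g. the Latin letter 'X')."""
    return unicodedata.numeric(ch, None)

# ROMAN NUMERAL THREE, ROMAN NUMERAL TWELVE, SMALL ROMAN NUMERAL ONE THOUSAND,
# then two Basic Latin letters for comparison
for ch in ["\u2162", "\u216B", "\u217F", "X", "I"]:
    print(f"U+{ord(ch):04X}", numeral_value(ch))
```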
“I don't think I or most editors would ever agree that the user experience on Wikipedia's own web site doesn't matter or doesn't justify Wikipedia making an effort to improve it.”
Then get the entire Wikipedia:Manual of Style/Mathematics § Special symbols section done away with. You can't cherry-pick Unicode Roman numerals to apply this usability quibble (which, I'll bet, no actual user has ever complained about anyways, not even to browser vendors, at least not with mathematical content) to: doing so is, as with so many other arguments made here, fallacious.
And I'd note that it's not just §Special symbols you need to work on changing if your concern about Ctrl+F browser page-specific searching is in any way whatsoever real instead of just more chaff thrown up in the process of trying to get your way: in Firefox, if I search for "sinxdx" it doesn't find that sequence in §Using LaTeX markup. (Which of course makes sense, since Wikipedia currently uses a plugin which renders LaTeX to images.)
“If we can't identify any style guides that recommend non-ASCII Roman numerals, and we can't find any reliable sources that actually use them”
Tricksy Hobbitses. You're trying to translate the absence of style guides which recommend against the use of Unicode Roman numerals into a positive reason to add a MOS rule prohibiting them, which is a clumsy argument from silence (or more realistically an argument from ignorance, because I doubt you've actually gone and looked at anywhere near all style guides to ascertain such an absence). Are you guys playing logical fallacy bingo? Or going through the list of fallacies article and checking them off, or something?
And how would you know whether reliable sources, particularly printed reliable sources, use Roman numeral Unicode characters? Even if this were a valid argument in the first place (if our citation formatting has never had to conform to the intricate specifications of the many organizations making bux off of selling such things, why would our Unicode character encoding of Roman numerals need to conform to anyone else's not-even-explicitly-specified practice?), you have not exactly demonstrated yourself willing to put much effort into doing research.
“I'm curious how you got started using these characters yourself. Did you see them used in a publication that you respect, or did you find out about them in a computing context and decide to start using them in articles on your own?”
Well, did you do a survey of publications before encoding Roman numerals the way you do it? Surely, if you can put the question to me, you're willing to answer it yourself.
As far as pages which don't currently follow §Special symbols for the Kelvin and Planck constant symbols, again, propose changing the whole thing if you think it's fundamentally invalid.
“It would certainly make no sense to me to allow non-ASCII characters in non-math articles”
For you to allow editors to use "non-ASCII" characters? As I said on your talk page, you do not have any such power to overturn Wikipedia policy or guidelines by personal fiat. And notice that, if you're using the standard editor, right below the main editing field are a dropdown and a bunch of buttons which allow anyone to insert "non-ASCII" characters, including for example "™", which could easily be reproduced with the HTML <sup>TM</sup>.
“I don't find anything wrong with the appearance of ASCII Roman numerals”
...right, and that's a valid styling point of view. No one is saying it isn't. What you have not even begun to do here is demonstrate that your styling viewpoint is so virtuous and superior that it must exclude all other styling points of view, to the degree that, for Roman numerals, Wikipedia:Manual of Style/Mathematics § Special symbols should be changed to say the complete opposite of what it says now: to go from saying that the rule of thumb is that characters and character sequences with mathematical significance should be represented by Unicode code points which encode that mathematical significance specifically, rather than by visually similar glyphs, to saying that Roman numerals must mandatorily be represented only with Basic Latin Unicode code points.
Between acting like you don't know what the term "aesthetic style" would mean in a Manual of Style discussion where I've repeatedly brought up fonts and even type foundries, and all of the other see-no-evil, hear-no-evil, speak-no-evil behavior on display here, this is all taking on the aspect of King Canute shouting at the tides. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 19:02, 1 December 2020 (UTC)
“As far as I know, Wikipedia doesn't have any internal NLP systems that are attempting to parse the numerical values of Roman numerals, and thus there are none that would benefit from making them "machine readable".”
...then, of course, you are actually rendering all of your own quibbles about benefits from the supposed “easiest” things for NLP systems to do invalid as well.
“If you have no objection to the phrasing "ASCII letters should be used instead of precomposed Unicode characters", then that's what the RFC will propose. If you don't express a preference for a different phrasing now, please don't argue later that the phrasing is defective and thus the proposal should be abandoned entirely.”
I of course wrote at great length above about why using the term “ASCII” for Unicode Basic Latin characters is inaccurate and inappropriate.
If you want to call the entire community down to look at an example of you trying to rewrite a Wikipedia guideline using terminology from before even 1969's RFC 20, that's your business. I'm sure I will have a delightful discussion, among other things, reminiscing about old character encoding times with my fellow neckbeards.
My preference for phrasing is the current guideline, as it stands, unchanged, as I have stated repeatedly. You, of all people, are in no position to try to place any prior restraints on what sorts of arguments I can make, when you simply ignore my requests to follow basic Wikipedia procedural guidelines if you don't feel like it. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 00:30, 8 December 2020 (UTC)
“...a later objection as to the quality of the wording would be an argument made in bad faith.”
Right, so what you've got are current and past objections to the quality of your wording. Your concern that bad faith arguments not be made (evidently by repeating objections you're already aware of, which definitely has nothing whatsoever to do with the concept of "bad faith") is touching. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
I've just encountered a second serious accessibility issue. I just ran the proposed MOS change through a text-to-speech system. One ASCII character sequence, "vi", is read aloud as "vee eye", but the non-ASCII equivalent, "ⅵ" (U+2175), is read aloud as "letter two one seven five". That means if we don't want Roman numerals to be essentially gibberish to some people with visual impairments, we should stick with the ASCII characters. -- Beland ( talk) 04:13, 9 December 2020 (UTC)
“You can't cherry-pick Unicode Roman numerals to apply this usability quibble (which, I'll bet, no actual user has ever complained about anyways, not even to browser vendors, at least not with mathematical content) to: doing so is, as with so many other arguments made here, fallacious.”
-- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
We've been discussing above whether or not any reputable style guides or reliable sources encode Roman numerals in non-ASCII characters. If anyone knows of any, please share! Personally, I don't recall ever having seen that in professional English publications, though it's not always obvious if you're not explicitly looking. Perhaps half of the very small number of non-ASCII Roman numerals in English Wikipedia are actually in Japanese text. One way to approach this is simply to come up with a list of reliable sources and do a site search to find a page with a Roman numeral. I've checked a few, though obviously the more that are checked, the more reliable the sampling is. Feel free to round out the list below with more sources you find reliable; and if you can find a style guide that even mentions this issue, that would be illuminating. -- Beland ( talk) 04:21, 2 December 2020 (UTC)
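Precomposed numerals are hard to tell apart from letters by eye, so a survey like this is easier if it scans for the code points themselves. The minimal Python sketch below assumes the page text has already been fetched, and its function name is hypothetical. It flags any character whose Unicode name contains "ROMAN NUMERAL", which should pick up the precomposed Roman numerals in Unicode's Number Forms block.

```python
import unicodedata

def find_precomposed_roman(text: str):
    """Yield (index, code point, name) for each character whose Unicode
    name contains 'ROMAN NUMERAL'. Plain Basic Latin letters never match."""
    for i, ch in enumerate(text):
        name = unicodedata.name(ch, "")
        if "ROMAN NUMERAL" in name:
            yield i, f"U+{ord(ch):04X}", name

sample = "Henry VIII and Henry \u2167"
print(list(find_precomposed_roman(sample)))
```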
“...if our citation formatting has never had to conform to the intricate specifications of the many organizations making bux off of selling [style guides], why would our Unicode character encoding of Roman numerals need to conform to anyone else's not-even-explicitly-specified practice?”
Slapping together a list of web sites that supposedly don't use Unicode Roman numeral notation does not make your point, either. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 00:30, 8 December 2020 (UTC)
“If we can't identify any style guides that recommend non-ASCII Roman numerals, and we can't find any reliable sources that actually use them...”
What I said was,
“And how would you know whether reliable sources, particularly printed reliable sources, use Roman numeral Unicode characters? Even if this were a valid argument in the first place (if our citation formatting has never had to conform to the intricate specifications of the many organizations making bux off of selling such things, why would our Unicode character encoding of Roman numerals need to conform to anyone else's not-even-explicitly-specified practice?) you have not exactly demonstrated yourself willing to put much effort into doing research.”
You certainly have not proved that the printed New York Times does not use Unicode Roman numerals at any stage of its typesetting process, nor the web site either. And even if you could, it still wouldn't matter, because "bunch of undocumented practices which may have nothing to do with styling" does not equal "Beland gets what they want in disagreements over Wikipedia styling guidelines, which specify things to a much lower technical level than anyone else does anyways".
If you're worried that the appearance of properly-numeric-notation-encoded Roman numerals will be some sort of unexpected shock to the reader, I've responded to that genre of quibble already, but it looks like you have an argument with our friend David in that case:
...the two variations look identical on my screen; I would guess (another guess) that this is because the browser converts the precomposed ones to ASCII internally, so there is no actual benefit to precomposition for people who are just reading Wikipedia in browsers...
“...we do tend to pick and choose from general-audience style guides and the practices of cited sources, and many editors who participate in MOS discussions find such evidence persuasive. If you personally don't, that's fine.”
Of course, not only is this not much of an actual tendency of ours (again, citation formatting), but you have not presented any evidence whatsoever from style guides, nor shown that your handwavy pointing to some web pages demonstrates anything to do with styling practices. Much less any persuasive evidence. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
“I don't see how print sources are relevant to this question.”
Because print sources are implementing aesthetic styling of Roman numerals, as would be embedded images or hoary old Flash .swfs in "explainers" on the ancient static HTML NYT pages from the last century that are still kicking around if you follow the right links.
What I would like for Wikipedia is, of course, the Manual of Style to be followed. It is correct, as I have said again and again, that I don't think we should be following the supposed interpolated unwritten styling rules of other organizations; I think we should be following the Wikipedia Manual of Style. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 15:00, 12 December 2020 (UTC)
“Your response is about visual appearance, but your complaint is about character encoding”
I don't have a "complaint"; I support the Manual of Style as it currently reads and have asked you to comply with it. You are complaining that all editors are not mandatorily required to follow your styling preferences, and wish the MOS to be changed to require that.
Anyone reading the above conversation can easily see that from my very first sentence I've characterized the use of Roman numeral Unicode code points as a “valid style variation”, and that I directly answered your repeated WP:HUH? questions about what benefits I could possibly see by specifying aesthetic benefits, among others.
“Surely it's irrelevant to readers if the encoding was changed in some intermediate version of the typesetting process that isn't even the finished product.”
But, what, readers do care about which Unicode code points are used to represent Roman numerals in these Unicode web pages?
“I'm not advocating for any particular visual appearance, so evidence of how Roman numerals appear in print is not relevant to any argument I'm making.”
Then how exactly are Roman numerals encoded from collections of Basic Latin Unicode code points “what [readers] will be expecting and familiar with”? You are making so many simultaneously contradictory arguments, even within the same sub-threads here, that even you seem to be having difficulty keeping track of them.
“Though I don't see anywhere you actually specified what you're looking for in a finished web page or Wikipedia print version?”
What the MOS says. For the bazillionth time. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 14:46, 21 December 2020 (UTC)
or
- Leave room for flexibility (or: Avoid instruction creep). [...]
- Don't be prescriptive. Devolve responsibility. [...]
- Consult widely – make a special effort to engage potential critics of the new guideline, engage them and get them to help find the middle ground early.
"Good faith"... you see that as compatible with not presenting the discussion of a policy change from a neutral point of view? Or compatible with your talk of whether you're going to allow editors to apply this consensus Wikipedia guideline to use non-ASCII characters in non-math articles, a guideline which was initially proposed in 2005? Amazing how quickly you're able to flip from, "My wild guesses about a supposedly universal utilitarian un-thought-out internet-wide practice equate to an iron-clad ineluctable undocumented Wikipedia styling rule from which there must be no variation!" to "It's just a thirteen year old essay, made from even older pages that used to be in the Help: namespace, which could totally merely be a coincidence instead of representative of standard Wikipedia practices and procedures, so it doesn't count!" But sure, regale me with how your creative interpretations apply to the more concise procedural policy WP:PGCHANGE. As far as,
we have two editors in favor of the proposal and one opposed, surely an editor with such overweening faith in your own insight into WP:P&G knows what I'm going to say in response, right? Wikipedia:Consensus,
"Consensus on Wikipedia does not mean unanimity (which is ideal but not always achievable), nor is it the result of a vote" (my emphasis), its explanatory supplement Wikipedia:Polling is not a substitute for discussion § Policy and guidelines, and Wikipedia:What Wikipedia is not § Wikipedia is not a democracy. Also, another important bit of policy from Wikipedia:Consensus § In talk pages:
"The quality of an argument is more important than whether it represents a minority or a majority view. The arguments 'I just don't like it' and 'I just like it' usually carry no weight whatsoever."
Let's not forget that your self-identified "best argument" above for your de novo addition of a mandatory styling rule to the MOS turned out to be something not even specific to Unicode Roman numerals at all. If you still seriously want to proceed further here, and don't want to revise your previous ah, approach to achieving consensus, at all, sure, I'll write a summary. What length should we aim for? (standard third-party word count tool linked on noticeboards, for convenience) As far as an RfC, you're welcome to go ahead with that if you want (while notifying me and following procedures, of course), but you're the one who wants to change the existing guideline.
"It sounds like your preference is to run an RFC." You are firmly in WP:IDIDNTHEARTHAT territory at this point. My repeatedly stated preference, at your talk page and here, is for this "established and rather more reasonable guideline" to remain as it is. It's unprofessional, undignified, and further clumsy rhetoric to persist in pretending that your desire to change this guideline and arrogation that you won't allow editors to use Unicode Roman numerals amongst Unicode Basic Latin characters is somehow my wish. Be your own person and take responsibility for your own actions. And on the same theme, WP:RFCBRIEF / WP:RFCNEUTRAL—shortcuts pointing to the same section Wikipedia:Requests for comment § Statement should be neutral and brief—apply to you as the editor making the request for a comment from the community, not me. And I certainly see no reason to be any more neutral in any responding comment I might make than you have been above. Also, lest you try to act as if it's unfair, I will point out the specific arguments you've made here, on your user talk page, and your general rhetorical behavior here as well if I choose. And if you do not follow policies, guidelines, and procedures both to the letter and in spirit, or even just don't follow orthodox practice, or again try to make up extra rules and claim you're merely following them so as to put your thumb on the scale, or any of the other rhetorical crap you've been pulling in this talk page section, I will point those things out as stridently as I choose. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 00:30, 8 December 2020 (UTC)
actively avoiding the question above, but seems to have petered out on responding to questions about their own use of fallacies? Sure, let's have an RfC if that's what you guys want. By the rules, though. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
Well, given the majority of editors in the discussion so far want to adopt the change, "leave things as they are" is not an option unless more editors weigh in to support that. It sounded like you were volunteering to write the neutral summary for the RFC, but it wasn't entirely clear, which is why I repeated back to you my understanding. Since it now seems you are declining to do so, and there is majority support for an RFC, here is my draft, which you can inspect for neutrality:
As mentioned above, here's an improved version of the proposed addition to the MOS:
Before the RFC starts, feel free to propose any tweaks that would make you happier in the event that this is adopted. (It's unfortunately usually difficult to get RFC participants to come back and give a second opinion on an amended option.) -- Beland ( talk) 00:55, 9 December 2020 (UTC)
"because policies and guidelines are sensitive and complex, users should take care over any edits, to be sure they are faithfully reflecting the community's view"
Silly me to think that, after all of this discussion, you might be able to think of some view you weren't faithfully reflecting. I keep over-estimating you. Nice attempt to declare that there are no options other than what you want, I guess, but the Gish gallop is for unstructured debate about things like creationism, not written, change-managed, behavioral-P&G-governed Wikipedia policy discussion. So, me characterizing your ah, requests, related to rewording your desired mandatory rule changes to this guideline as "Asking me to do your work for you and write the 'specific phrasing' of your proposed addition accurately", or my response to your inquiry about "formal third-party closure" of this thread, which I explicitly separated from my remarks "As far as an RfC..." are things you heard as "volunteering to write the neutral summary" that "wasn't entirely clear", eh? Right. I'm definitely not objecting to an RfC at all, just insisting that policies, guidelines, and procedures be followed. WP:RFCBRIEF / WP:RFCNEUTRAL isn't an excuse to take a tabula rasa approach to the RfC, as though we haven't had the above discussion; as it says,
"If you have lots to say on the issue, give and sign a brief statement in the initial description and publish the page, then edit the page again and place additional comments below your first statement and timestamp. If you feel that you cannot describe the issue neutrally, you may either ask someone else to write the question or summary, or simply do your best and leave a note asking others to improve it."
Your RfC should explicitly state that you wish to overrule MOS:STYLERET in these cases, excluding all other styling variations like plain Unicode and I'm assuming things like MathML character entities when MathML is implemented (see here for example), or if not, you should say so. To faithfully reflect the views you are aware of, at least mention our differing opinions of better styling, the instances we investigated where popular search engines do and do not handle them properly, machine readability, the usability issues you brought up and my responses to them, and the absence of any external style guides speaking to the matter either of us have been able to find. As I've said, for clarity, and because you are specifically talking about character encoding and not the string comparison algorithm of RFC 20 or some W3C documents, I think that the term "Basic Latin" linked to the article Basic Latin (Unicode block) should be used in place of ASCII, in any sentence addressing character encoding such as this—particularly a sentence that's going to appear in Wikipedia P&G, where we use technical terminology carefully. The wording proposed to be included in the guideline itself should also emphasize that it's really, actually trying to mandate sequences of Latin letters in lieu of specific numerical notation encoding, since this is proposed to follow the §Special symbols rule of thumb saying that mathematical versions of symbols should be used when glyphs are similar. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
It may be helpful to discuss your planned RfC question on the talk page before starting the RfC, to see whether other editors have ideas for making it clearer or more concise.
"I [...] am following the 'talk first' approach which is also described there" (the alternative being boldly making an edit one expects to be unchallenged, which in this case I'd simply have reverted anyways and you knew this after the discussion on your user talk page: it is not some virtuous thing to refrain from starting an edit war on a policy guideline page, it's pretty much just minimal expected proper editor conduct), and which also reads,
Because Wikipedia practice exists in the community through consensus...—does not govern talk page discussions seeking consensus to change the text of a guideline, what policy or guideline does govern such discussions? When it's a matter of rules that would restrict the behavior and Wikipedia editing practices of other people it seems like you can't wait to conjure them out of thin air and grasp at straws for a way to impose your own will through them—but when it comes to any rules which would apply to your own behavior, it's WP:HUH? -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 15:00, 12 December 2020 (UTC)
"As the quoted policy refers to Wikipedia namespace edits and not talk page edits, as I said..." What you said was "I [...] am following the 'talk first' approach which is also described there." You have not at any point attempted to faithfully reflect the community's views: you haven't even faithfully reflected what the guideline currently says. WP:RM#CM literally says,
"Unlike other request processes on Wikipedia, such as Requests for comment, nominations need not be neutral. Make your point as best you can; use evidence (such as Google Ngrams and pageview statistics) and refer to applicable policies and guidelines, especially our article titling policy and the guideline on disambiguation and primary topics. [...] Requesters should feel free to notify any other Wikiproject or noticeboard that might be interested in the move request, as long as this notification is neutral."
(My emphasis.) You're seriously trying to suggest that, while the very non-P&G page you quote explicitly says that other procedures are expected to be neutral for mainspace pages, and that even comments mentioning the existence of requested mainspace article move discussions must be neutral, but you can just say whatever you want in a proposal to change P&G, even though the governing policy WP:PGCHANGE explicitly refers to "faithfully reflecting the community's view". I don't believe, in all the years I've worked on Wikipedia, that I've ever brought up the Wikipedia:Wikilawyering essay in a discussion. But this would appear to be an appropriate point to do so. Don't try to misrepresent what I've said as being that PGCHANGE proposals can't be persuasive, because that's clearly not what I've said—I pointed you to an entire essay on the subject of changing P&G Wikipedia:How to contribute to Wikipedia guidance § General recommendations written by other editors. You offhandedly dismissed it as "an essay that doesn't even necessarily have community consensus", but you can't claim I haven't thoroughly and specifically justified my statements about P&G and how this process is supposed to work. Yes, "it really shouldn't matter who spoke first"... it shouldn't, IF everyone is participating in good faith, weighing arguments in good faith, and neutrally, faithfully seeking to reflect the community's view and arrive at consensus. But you have explicitly chosen not to do that in this discussion and I'm not just going to assume you'll follow P&G in subsequent discussions. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 14:46, 21 December 2020 (UTC)
Neutral summary:
This RFC proposes adding the following to the end of Wikipedia:Manual of Style/Mathematics#Special symbols:
Related style guidelines:
-- Beland ( talk) 04:16, 21 December 2020 (UTC)
The following arguments in favor are mostly summarized from the above discussion and were written by Beland with suggestions from other editors... (moved to #Roman Numerals RFC)
I'm assuming you would like to write the arguments against, Struthious Bandersnatch? Anything to add or change, David Eppstein?
I did not add anything explicitly about MathML. Based on the page you linked to, SB, MathML appears to be using the "Unicode characters in the U+21XX range", which are already mentioned. -- Beland ( talk) 04:16, 21 December 2020 (UTC)
"As a rule of thumb, specific mathematical symbols shall be used, not similar-looking ASCII or punctuation symbols, even if corresponding glyphs are indistinguishable." It should do this in the initial summary.
"either encoding is acceptable, and MOS:STYLERET would [prohibits] changing any given instance from one to the other" are the current guidelines in force.
"Using a web browser" should instead begin with, "Like most preferred styling approaches in MOS:MATHS, using a web browser..."
"Some screenreaders pronounce" should instead begin with, "Some screenreaders pronounce most code points referred to by Wikipedia:Manual of Style/Mathematics § Special symbols in an overly verbose way which makes for poor accessibility, and this is true of Roman numeral code points..."
[NLP systems encountering Unicode Roman numerals] appearing rarely or some of the time probably results in worse performance than not appearing at all—as I've said repeatedly above, citation needed. Repeating this again and again with no evidence does not make it true, and of course doing so for the 𝑛th time, or offering your own third-party system which you won't have designed to support all Unicode characters if you're opposed to that as some sort of generalization "we" have "seen", is not doing your best to present opinions on the subject neutrally.
[A]lmost all people who read Wikipedia do not use browsers with working support for MathML—this isn't true. Firefox/Gecko supports MathML and so does WebKit. With Edge switching over to be (WebKit-derived) Blink-based last year—Microsoft Edge § Anaheim (2019–present)—all major browsers now contain at least the code to support MathML. Even for browsers with MathML support turned off or older browsers lacking support, extremely mature javascript polyfills/shims like MathJax are available to enable rendering. So the path to MathML is pretty firm; it's only "delusional" if one assigns no importance to accessibility and other benefits. MathJax has a variety of accessibility measures but the specific concern you and Beland voiced about Ctrl+f doesn't seem to work, or works differently, from native MathML, in my cursory testing. (Which would appear to affirm that progress towards native MathML support in browsers and in Wikipedia will be optimal for that particular aspect of accessibility.)
@ Struthious Bandersnatch: Happy New Year! We haven't heard from you on this topic in a while. I was hoping that you would write a summary of your arguments in your own words, as you originally promised, because they are best presented by someone who actually believes in them. Non-response can't be a veto in favor of the status quo, so the RFC will proceed either way. Rather than running the RFC without a summary of arguments against, I have drafted my own summary of your arguments below. Feel free to throw it away completely and express your ideas in your own way, or tweak it if you think some is worth keeping. If there's no response in a week or so, I'll go ahead with the RFC. -- Beland ( talk) 19:56, 7 January 2021 (UTC)
(moved to #Roman Numerals RFC)
This discussion is a new instance of many similar past discussions about non-ASCII Unicode characters and symbols. Examples that I remember include ellipses (…), radical sign (√), blackboard bold (ℝ), function composition (∘), integer exponents (x²), fractions (1⁄2), but the list is certainly not complete. From all these discussions, I arrive at the following suggestion for the manual of style:
The use of non-ASCII Unicode characters and symbols is discouraged unless there is no convenient equivalent in plain text or LaTeX (in mathematical formulas), or when talking about them. (Typo and grammar fixed as suggested below. D.Lazard ( talk) 10:14, 10 January 2021 (UTC))
The rationale for this is
For the present discussion, I can add that the semantics of a Roman numeral is based on the fact that it is a sequence of digits represented by Latin letters. The combined Unicode symbols destroy this semantics. D.Lazard ( talk) 14:12, 9 January 2021 (UTC)
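To make the scope of the proposed discouragement concrete, here is a minimal Python sketch that lists each distinct non-ASCII character in a piece of wikitext together with its code point and Unicode character name, using only the standard unicodedata module. The function name non_ascii_report is hypothetical, not an existing tool. It only reports characters; deciding which of them have a "convenient equivalent in plain text or LaTeX" remains an editorial judgment.

```python
import unicodedata


def non_ascii_report(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each distinct
    non-ASCII character in text, in order of first appearance."""
    seen = set()
    report = []
    for ch in text:
        if ord(ch) > 0x7F and ch not in seen:
            seen.add(ch)
            report.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return report


sample = "If f \u2218 g is defined on \u211d, then x\u00b2 \u2026 \u00bd"
for ch, cp, name in non_ascii_report(sample):
    print(cp, name)
# U+2218 RING OPERATOR
# U+211D DOUBLE-STRUCK CAPITAL R
# U+00B2 SUPERSCRIPT TWO
# U+2026 HORIZONTAL ELLIPSIS
# U+00BD VULGAR FRACTION ONE HALF
```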
@ D.Lazard: and other interested editors...
So favoring ASCII characters over non-ASCII would mean:
I would support all of those preferences.
Wikipedia:Manual of Style/Mathematics#Multiplication sign prefers U+00D7 × MULTIPLICATION SIGN or &times; (and ⋅ where appropriate). I'd lean away from changing × to the ASCII letter "x", just because it's typographically distinct and there are a very large number of instances. If there is consensus in favor of keeping ×, it would be good to note it explicitly as an exception. I would support changing U+2715 ✕ MULTIPLICATION X, U+2A09 ⨉ N-ARY TIMES OPERATOR, and U+2A2F ⨯ VECTOR OR CROSS PRODUCT to U+00D7 × MULTIPLICATION SIGN for the same reasons as we prefer ASCII characters, like find-in-page consistency. Where these characters do appear, they are usually not used "correctly" according to how the Unicode standard defines the semantics. Even though U+00D7 is in a slightly higher character range, it's much more widely used than the others, and is more easily accessed because it is on the special character list in every Wikipedia edit window (for desktop browsers).
I would also support converting all instances of "&times;" to "×" since the difference with "x" is pretty obvious, and we almost always already do this anyway.
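The conversions to U+00D7 proposed above could be sketched as a simple translation table. This is an illustration under assumptions, not an existing bot: the names TIMES_VARIANTS and normalize_times are hypothetical, and a real cleanup would still need a human to check each instance, since U+2A2F in particular may occasionally have been chosen deliberately for a cross product.

```python
# Hypothetical cleanup table: three less common "times" characters,
# each mapped to U+00D7 MULTIPLICATION SIGN.
TIMES_VARIANTS = {
    "\u2715": "\u00d7",  # U+2715 MULTIPLICATION X
    "\u2a09": "\u00d7",  # U+2A09 N-ARY TIMES OPERATOR
    "\u2a2f": "\u00d7",  # U+2A2F VECTOR OR CROSS PRODUCT
}
_TIMES_TABLE = str.maketrans(TIMES_VARIANTS)


def normalize_times(text: str) -> str:
    """Replace the variant characters with U+00D7; everything else,
    including the ASCII letter "x", is left untouched."""
    return text.translate(_TIMES_TABLE)


print(normalize_times("3 \u2715 4"))  # 3 × 4
```

A translation table keeps the mapping explicit and reviewable, which matters more here than speed.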
For the record, in the December 20, 2020, database dump, I see:
Favoring LaTeX markup over non-ASCII Unicode characters is an interesting but much more complicated question which I would like to discuss sometime soon. I'm going to defer that for now, since the ASCII preference alone is pretty complicated. Given the very long discussion we've already had and the complicated arguments made, I'd like to proceed with the Roman numerals RFC as planned, to get an explicit consensus on that. Either after that or in parallel, I think we should discuss flipping Wikipedia:Manual of Style/Mathematics#Special symbols to prefer ASCII symbols, which as mentioned, would affect asterisk, colon, equals, tilde, and perhaps others. If there is no opposition on this talk page, would we want to just make the change, or would we want to do a formal RFC on that, given there must have been a pre-existing consensus to write the current rule? Would we want to make flipping Wikipedia:Manual of Style/Mathematics#Minus sign to prefer hyphen-minus a separate discussion? Lump it in with the rest? Maybe do a single RFC but ask editors if it should be kept as an exception? -- Beland ( talk) 20:40, 15 January 2021 (UTC)
OK, a couple of points:
Michael Hardy ( talk) 05:44, 19 January 2021 (UTC)
The use of non-ASCII Unicode characters and symbols is discouraged unless there is no convenient equivalent in plain text or LaTeX (in mathematical formulas), or when talking about them. This does not apply to the non-mathematical use of these symbols and to symbols that are commonly used outside mathematics, such as the minus and the multiplication signs.
Should markup for Roman numerals be restricted by the Manual of Style to Basic Latin (ASCII) letters only (like "VII") and exclude characters in the U+21XX range (like "Ⅶ")? -- 19:45, 26 January 2021 (UTC)
This RFC proposes adding the following to the end of Wikipedia:Manual of Style/Mathematics#Special symbols:
Related style guidelines:
-- Beland ( talk) 19:45, 26 January 2021 (UTC)
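If the RFC is adopted, the conversion it describes could be done semi-automatically. The following Python sketch (the helper name roman_to_basic_latin is hypothetical) applies Unicode NFKC normalization, via the standard unicodedata module, only to characters in U+2160..U+2188, so other compatibility characters such as ½ or ² are left alone. U+2160..U+217F have compatibility decompositions to Basic Latin letters (for example U+2167 becomes "VIII", and the small forms become lowercase letters); the remaining archaic forms in that range, such as U+2181 ROMAN NUMERAL FIVE THOUSAND, have no Basic Latin equivalent, so the sketch leaves them in place and reports them for manual review.

```python
import re
import unicodedata

# Roman numeral characters in the Number Forms block.
ROMAN_RANGE = re.compile("[\u2160-\u2188]")


def roman_to_basic_latin(text: str) -> tuple[str, list[str]]:
    """Convert precomposed Roman numerals to Basic Latin letters.

    Returns the converted text and a list of code points that have no
    compatibility decomposition and were therefore left unchanged.
    """
    unconverted: list[str] = []

    def replace(match: re.Match) -> str:
        ch = match.group()
        latin = unicodedata.normalize("NFKC", ch)
        if latin == ch:  # e.g. U+2181, which NFKC does not decompose
            unconverted.append(f"U+{ord(ch):04X}")
        return latin

    return ROMAN_RANGE.sub(replace, text), unconverted


print(roman_to_basic_latin("Henry \u2167 and \u00bd"))
# ('Henry VIII and ½', [])
```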
The following arguments in favor are mostly summarized from the above subsections and were written by Beland with suggestions from other editors.
The following arguments against were written by Beland (who does not endorse them) as a summary of points made by Struthious Bandersnatch (who has not commented on this phrasing).
<math>...</math>
markup. (And many other non-mathematical symbols in common use on Wikipedia.) We can't make a rule for Roman numerals only; we would have to change the "rule of thumb" to favor ASCII characters for all math symbols. Find-in-page problems for <math>...</math>
markup could be fixed more generally with
MathML improvements.

Wikipedia:Manual of Style/Mathematics#Fractions says that precomposed fractions like ½ cause accessibility problems. However, in the discussion at Wikipedia:Categories for discussion/Log/2021 March 3#Category:10¼ in gauge railways in England, Graham87, who uses a screenreader, says these characters do not cause problems. Is anyone aware of any specific accessibility problems caused by these characters, or should that claim be removed? I do know search engines don't always handle them well, and though that may impede access, that's not what we generally mean when we say "accessibility". -- Beland ( talk) 19:13, 10 September 2021 (UTC)
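One concrete way precomposed fractions can trip up search is visible with Python's standard unicodedata module: NFKC normalization of U+00BD ½ produces "1", U+2044 FRACTION SLASH, "2", which is still not the same string as the ASCII "1/2" a reader is likely to type. This only shows what the Unicode normalization data says; it is not a claim about how any particular search engine or screenreader behaves.

```python
import unicodedata

half = "\u00bd"  # U+00BD VULGAR FRACTION ONE HALF
nfkc = unicodedata.normalize("NFKC", half)

print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+0031', 'U+2044', 'U+0032']
print(nfkc == "1/2")                      # False: U+2044 is not U+002F "/"
print(unicodedata.numeric(half))          # 0.5
```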
The word "zeroth" appears in over 500 articles. It is potentially unfamiliar to users outside the Anglophone countries. Some English speakers might need to pause or re-read the word to infer the intended pronunciation and hence the meaning when it is written as "zeroth".
Should the word be written as "zero'th", "zero-th" or "0th"; should the "th" be in superscript; or should a link to the Wiktionary page for 'zeroth' be added to clarify what is meant?
Sesquivalent ( talk) 19:01, 26 September 2021 (UTC)
Doing some cleanup work, I just discovered that LaTeX-based double-stroke blackboard bold doesn't work for numbers when using "mathbb". There is a workaround using "text" but it leaves a lot of space after the number. Conversion to regular bold is a possibility, but not where the notation itself is being explained. What's the preferred solution (a) when discussing the notation itself, and (b) when using the notation?
Markup examples:
Articles currently affected:
-- Beland ( talk) 19:53, 10 January 2022 (UTC)
\unicode[STIXGeneral]{x1D7D9}
etc. It might be possible to add non-standard macros for these if it's really required. --
Salix alba (
talk):
19:11, 11 January 2022 (UTC)
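Salix alba's \unicode workaround can be generated mechanically, because the mathematical double-struck digits are contiguous in Unicode, starting at U+1D7D8 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO (so x1D7D9 in the example above is DIGIT ONE). The Python helper below is a hypothetical sketch that only builds the markup string; whether a given rendering mode of Wikipedia's math extension accepts \unicode with a font argument is an assumption this sketch does not check.

```python
import unicodedata

DOUBLE_STRUCK_ZERO = 0x1D7D8  # U+1D7D8 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO


def double_struck_markup(digits: str, font: str = "STIXGeneral") -> str:
    """Build \\unicode[font]{x...} markup for each ASCII digit in digits."""
    if not digits or any(d not in "0123456789" for d in digits):
        raise ValueError("expected a non-empty string of ASCII digits")
    return "".join(
        f"\\unicode[{font}]{{x{DOUBLE_STRUCK_ZERO + int(d):X}}}" for d in digits
    )


print(double_struck_markup("1"))  # \unicode[STIXGeneral]{x1D7D9}
print(unicodedata.name(chr(DOUBLE_STRUCK_ZERO + 1)))
# MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
```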
MOS:MATH#PUNC currently says: "Similarly, if the conventional punctuation rules would require a question mark, comma, semicolon, or other punctuation at that place, the formula must have that punctuation at the end." We also have
MOS:PUNCTSPACE: "In normal text, never put a space before a comma, semicolon, colon, period/full stop, question mark, or exclamation mark". However, it might be unclear whether mathematical formulas are "normal text", :–) and unfortunately some people insert spaces (\,
and even ~
) before such punctuation marks. I think it would be useful to add to
MOS:MATH#PUNC a short phrase against this practice (with a link to
MOS:PUNCTSPACE). Any objections or better ideas? —
Mikhail Ryazanov (
talk)
19:55, 30 October 2021 (UTC)
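A check for the practice described above could look like the following Python sketch. The helper name flag_spaced_punctuation and the exact list of spacing commands are assumptions for illustration (it covers \, \; \: "\ " ~ \quad and \qquad, not \hspace or other constructs), and it only flags candidates for a human to review rather than editing anything.

```python
import re

# Body of each <math>...</math> element (attributes such as display="block" allowed).
MATH = re.compile(r"<math[^>]*>(.*?)</math>", re.DOTALL)
# An explicit TeX spacing command directly before final punctuation.
SPACED_PUNCT = re.compile(r"(?:\\[,;: ]|~|\\q?quad\b)\s*[.,;:?!]\s*$")


def flag_spaced_punctuation(wikitext: str) -> list[str]:
    """Return the <math> bodies whose closing punctuation mark is preceded
    by an explicit spacing command."""
    return [
        m.group(1)
        for m in MATH.finditer(wikitext)
        if SPACED_PUNCT.search(m.group(1))
    ]


sample = r"<math>a^2 + b^2 = c^2 \,.</math> and <math>x = 1.</math>"
print(flag_spaced_punctuation(sample))  # ['a^2 + b^2 = c^2 \\,.']
```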
Mathematics is written in sentences. Often the subject or the verb of the sentence is a mathematical symbol rather than a word. Copyediting, therefore, requires the ability to determine which part of speech is represented by the various symbols. In §3.2.1 there is a listing of mathematical symbols according to their grammatical function.
EXAMPLES:
The example is a complete sentence with , , and acting as nouns, as a conjunction, and as the verb. This is, of course, a relatively simple example but the same principles apply to the more complicated situations.
Authors of mathematics almost invariably write in sentences but sometimes do not punctuate correctly. Although it is not universal practice to punctuate various sections of a display, it often adds to the clarity of the writing. For the most part in AMS publications, mathematical equations are punctuated, with the occasional exception of diagrams, matrices, and determinants. For example, when several separate equations are displayed, it is AMS practice to separate them by inserting a comma or other appropriate punctuation at the end of each line of the display.
When the mathematics in a paragraph is abundant, punctuation needs to be considered with more care than usual. A common mistake, for instance, is for an author to neglect to punctuate an equation that comes at the end of the typed line in a manuscript, even when the next line begins with a separate equation.
...
Specific suggestions are made in the sections below concerning spelling and punctuation. To help a copy editor maintain consistency in punctuation, several guidelines based on AMS practice are proposed; another publisher might well use different criteria. Rules of grammar are not cited because their use in writing mathematical research is no different from their use in other types of writing.
In general, the copy editor should make the manuscript correct if the grammar or punctuation is definitely wrong. In cases where there is more than one correct method, the copy editor sometimes must make a choice to maintain consistency.
12.5: Words versus mathematical symbols in text
In general, mathematical symbols may be used in text in lieu of words, and such statements as “” should not be rewritten as “ is greater than or equal to zero.” Nonetheless, symbols should not be used as a shorthand for words if the result is awkward or ungrammatical. In the phrase
the vectors ,
the condition “” is better expressed in words:
the nonzero vectors
or
the vectors , all nonzero,
depending on the emphasis desired. Moreover, logical symbols should generally not appear in text:
a minimum value of the function on the interval
should be replaced by
there exists a minimum value of the function on the interval
or
the function has a minimum value on the interval .
12.18: Mathematical expressions and punctuation
Mathematical expressions, whether run in with the text or displayed on a separate line, are grammatically part of the text in which they appear. Thus, expressions must be edited not only for correct presentation of the mathematical characters but also for correct grammar in the sentence. For example, if several expressions appear in a single display, they should be separated by commas or semicolons. For example,
Consecutive lines of a single multiline expression, however, should not be punctuated: Expressions must carry ending punctuation if they end a sentence. All ending punctuation and the commas and semicolons separating expressions should be aligned horizontally on the baseline, even when preceded by constructs such as subscripts, superscripts, or fractions.
<code>
tags, which provide unambiguous rendering). —
Mikhail Ryazanov (
talk)
20:07, 3 November 2021 (UTC)
$48 + 5 = 53$.
). The
MOS's current position on that is something horribly complicated that
the MOS itself doesn't even follow.
XOR'easter (
talk)
11:00, 2 November 2021 (UTC)
$48 + 5 = 53$.
is indeed how it's written in LaTeX, and this markup produces no extra space. I don't remember seeing any professionally typeset publication with extra spaces in display formulas either, so I don't know where you got the idea that it should be there. —
Mikhail Ryazanov (
talk)
20:07, 3 November 2021 (UTC)
+
and −
is default in LaTeX, so spaced-away minus signs are possible as a matter of course, and a spaced away leading minus sign can easily be accidentally created. (Leading minus signs typically should concatenate onto the following variable or bracket, with only a 'hair' space.)
\ \, \; ~ \quad \qquad
but only one way to subtract space: \!
and then only a tiny amount. TeX was designed to assign aesthetic responsibility to the human typesetter. It's up to you to express yourself clearly with your notation; the math renderer will help, but only to a minimal degree.
I replaced the text:
with the text:
I may be mistaken, but as I understand it, the issue is to not have different symbols for the same thing (even using a different font) in the same article. If a whole section consistently uses unique variables (unique by both symbol and intended meaning) then there should be no objection.
My possibly mistaken understanding is that if a symbol is in a different font then it is not allowed (e.g. in one section, but R in another); likewise disallowed is a change of notation for the same or nearly the same object between two sections. So for example, if a spacecraft's velocity in one place, but same spacecraft, same velocity elsewhere in the same article would be disallowed.
|quote=
item in a <ref>
, or a clearly delineated quote in a footnote, just as long as the formula is expressed in the article's notation where it is used in the article's own text. Astro-Tom-ical ( talk) 11:46, 4 March 2022 (UTC)
I know that specific algebraic structures should be written upright (with operatorname) while unspecified algebraic structures should be written in italics, e.g. as in ; my question is whether the same applies to other structures/mathematical objects, e.g. topological spaces/manifolds: should the n-sphere be denoted by or ? Joel Brennan ( talk) 18:30, 21 March 2022 (UTC)
similar-looking ASCII or punctuation symbols. Beland did not seem to think this was particularly notable in our previous discussion, but I'd be curious to see numbers on how consistently it can be done since humans can't do it reliably, particularly given the famous (moderately famous? okay, maybe just famous to me) case of the Indian news anchor who was fired after reading the name of the president of China, Xi Jinping, as "Eleven Jinping" on the day he had arrived on a diplomatic visit.
"less convenient" in any substantive way; and quite frankly it just seems lazy to me on all fronts, search engines and spell-checking and otherwise, to simply normalize the data ahead of time rather than adding one rule and preserving information in the source. It's like going and color-quantizing a bunch of source images because you don't want to bother to write code handling multiple bit depths. You realize that you're implicitly arguing that it would be better for NLP and TTS systems to not contain a simple rule like that, and hence be unable to handle Unicode Roman numerals? On the matter of terminology—you, specifically, are the one claiming that Unicode Roman numerals merely represent pre-composed versions of Latin letter combinations. Note that even in the Wikipedia article you've linked to about compatibility characters, there's a “citation needed” tag on this claim—and in fact the article says,
...in certain academic circles the use of Roman numerals as distinct from Latin letters that share the same glyphs would be no different from the use of Cuneiform numerals or ancient Greek numerals. Collapsing the Roman numeral characters to Latin letter characters eliminates a semantic distinction.
If you seriously do not understand why it's biased to make that claim in a policy thread I haven't commented in yet, and also after I've explicitly said that the Unicode Consortium documents you're linking to do not say that—without any counterargument or presenting better sourcing for your claim, and to use this wording which favors your desired conclusion in framing the question itself when introducing a policy discussion on a talk page, I kind of wonder whether you're a good candidate for making such proposals. Perhaps you should ask a neutral third party to make the proposal in this sort of situation. David Eppstein, it's great that it looks indistinguishable to you, but as I said, to me, with my system's combination of browsers and built-in fonts, [†] it does look different, and better. Hence the benefit is better styling, from my point of view at least, and what I'm saying is that these arguments simply aren't overwhelming enough to justify eradicating my styling preferences from Wikipedia. Now if there was an accessibility benefit, I'd find that a persuasive argument. But web accessibility is something I know a fair bit about—in fact, I've worked on Section 508 compliance issues in web content management systems in the U.S. since the last century—and no one has actually presented any evidence here that converting Unicode numbers to a bunch of undifferentiated Latin letters provides improved accessibility.
about personal behavior that lack evidence; it's you who are saying you don't understand the bias inherent in your own words, then linking to mainspace articles where the claim you're making is marked “citation needed”. No need to discuss it anymore if you will stop portraying your position in this policy discussion—the position not simply that I have my preferences, and you have your preferences which are better, but that your preferences are so superlatively correct as to exclude my preferences as even being an acceptable option or variation—as a mere unbiased reflection of what Unicode Consortium documents say, or what orthodox programming practice would dictate. If the developers of a(n unnamed) search engine don't invest developer time in something as simple as an equivalence like this, or don't even invest time in thinking about how their product should handle this situation, then they've done a poor job of making a search engine. I mean, they've basically made a search engine that doesn't handle Unicode. I'd be inclined to audit for Y2K bugs and UNIX epoch problems too. It's not a problem for Wikipedia to solve by constraining our styling decisions. And I'm sorry but you simply have not demonstrated that a naïve system is going to be unable to handle Unicode Roman numerals properly. At all. The reason why a naïve system could handle them, and do so virtually effortlessly, is because as I've said (and the article you linked to said, which I quoted above) these Unicode numbers and their Unicode Latin letter equivalents are not simply fungible—the number code points contain additional information, which is being removed when they're converted to letters, which is one of the things I'm objecting to. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 11:18, 27 November 2020 (UTC)
A naive system that only uses the character encoding to semantically interpret Roman numerals is going to fail over 99% of the time on that particular task.
Oh, come on. This is not even a good faith argument, after you've proposed that the “easiest thing to do from a machine readability perspective” as a “naïve” system is removing numeric information from source data and then using a possibly-infinite number of ad hoc rules for translating letters into Roman numerals. A naïve NLP system built in a 1930s teletype machine (video on YouTube), mechanically, would also mostly fail btw. But “easiest” and “most effective” are different criteria; I specifically asked why one rule, or even a vanishingly-small handful of rules for that matter, for processing characters already encoded as numbers would make for a material difference in ease of implementation (...implementation of all of these products which aren't part of Wikipedia, for which you have yet to explain why we would need to conform our styling rules to their vendors' needs, if indeed your claim about ease of implementation is even valid, though it seems trivially untrue to me)—why this would result in a salient difference in how easy it is to do anything. This is a blatant case of moving the goalposts, and a rather rhetorically clumsy one at that.
...both "poorly written" (which in this case would apparently include Google) and "well-written" search engines...
You have simply repeated a previous, uncited claim you made on your user talk page, which I challenged then, without any actual citation here either. As with so many other assertions, you haven't demonstrated that any behavior of the Google Search Engine is the result of not investing developer time (on the part of Google, of all companies—not exactly resource-poor when it comes to developer time for their flagship product) or failing to even invest time in thinking about how their product should handle this situation. The definition of the robustness principle you link to reads,
Be conservative in what you do, be liberal in what you accept from others
...so just how exactly does that even remotely describe your approach here? (Or paraphrase into a heuristic that fixing a problem should happen at both ends?—that's pretty much the diametric opposite of the concept.) How is destroying information in your source data to permit an unexplained, supposedly-“easiest” implementation of these various automated systems interacting with Wikipedia content which don't care about style at all anyways, “conservative”? The introduction to the Unicode Consortium document's section on numerals says, again,
The notational systems for numbers are equally varied. They range from the familiar decimal notation to non-decimal systems, such as Roman numerals.
...which is what's clear: Unicode Roman numerals are a notation system for numbers, and no matter how many times you call them pre-composed versions of letters, merely for compatibility, or if you insert the term into the talk page section header here (or link to a Wikipedia article about encoding compatibility which marks the same claim with “citation needed”...) that doesn't change anything. What's “strained” is claiming that a single sentence which is verging on a footnote (which still explicitly says that these are Forms of Numbers, anyways) overrides and excludes the definition of these code points in the document's own introductory section about numerals and overrides any autonomy Wikipedia would have for determining its own styling choices. And it's also strained to act like you aren't obligated by Wikipedia practices to present a policy change proposal from an NPOV, when you're putting yourself forward as a superlative authority on "common practice" of styling decisions for Roman numerals; and tbh acting that way gainsays the authority you've arrogated to yourself. (Obviously, your judgment about these kinds of styling decisions wasn't authoritative for the type foundries that designed the fonts installed on my computer, who intended for these glyphs to be used and put developer time into it.) -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 07:06, 28 November 2020 (UTC)
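The "one rule" versus "ad hoc rules" point above can be made concrete with a minimal Python sketch. It assumes standard subtractive notation for values from 1 to 3999. The regular expression and the name `parse_ascii_roman` are illustrative only and are not taken from any system mentioned in this thread. The sketch shows that turning a Basic Latin letter sequence into a number takes little code. It also shows that the letters alone cannot say whether a token is a numeral at all: the pronoun "I", the word "MIX", and abbreviations like "CD" or "MD" all parse as valid numerals.

```python
import re
from typing import Optional

# Standard-form (subtractive) Roman numerals from 1 to 3999,
# written with Basic Latin capital letters only.
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def parse_ascii_roman(token: str) -> Optional[int]:
    """Return the value of a standard-form Roman numeral, or None if invalid."""
    # The regex also matches the empty string, so reject that explicitly.
    if not token or not ROMAN_RE.fullmatch(token):
        return None
    total = 0
    for i, ch in enumerate(token):
        value = VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4, CM = 900).
        if i + 1 < len(token) and VALUES[token[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


# Every token except "Hello" is syntactically valid, so the parser alone
# cannot tell whether "I" is a pronoun or "MIX" is a verb; that needs context.
for token in ["XIV", "I", "MIX", "CD", "MD", "XL", "Hello"]:
    print(token, parse_ascii_roman(token))
```

Deciding which valid-looking tokens are really numerals is the context-dependent part both sides of the thread are arguing about. The sketch does not settle that question.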
I doubt that any style guide anywhere explicitly recommends using the precomposed Roman numerals in English prose, but I'm open to being proven wrong.
I explicitly said at your talk page,
I am, of course, not advocating for using these characters as letters, like in “triⅵal” or something, but exclusively numerically.
(Edit:) "It is wrongheaded to think that we can have a single different character for every possible number that could appear as a roman numeral." You could go tell that to whoever is arguing that, wherever they are, because they aren't participating in this discussion. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:27, 30 November 2020 (UTC)
"best argument yet" is that the arguments attempting to support your position so far have involved things like linking to a Wikipedia article where one of your central claims is marked "citation needed" and paraphrasing the robustness principle into its exact opposite. Again, web browsers are products from vendors other than Wikipedia or the WMF. It's not our job to improve them or compensate for the failings of those vendors. I notice that if I go to the Wikipedia article Letterlike Symbols in Firefox, searching for "h" does not find the Planck constant symbol and searching for "K" does not find the Kelvin symbol. Similarly, at Mathematical Alphanumeric Symbols none of the glyphs that are definitely representing Latin letters are found by searching for their Plane 0, Row 00 look-alikes. So if this was your best argument, after all the above writing, color me unimpressed. You would need to demonstrate that it has anything to do with Roman numerals in particular rather than Roman numerals plus everything else §Special symbols also covers.
And in fact I expect most such non-naive systems would perform better if they decomposed such characters early in processing.
I'd invite you to provide a citation... I don't think I have to know much about NLP to say that unless a system is able to correctly distinguish Roman numeral collections of Latin letters from non-Roman-numeral ones 100% of the time, Unicode Roman numerals in source material can have only positive utility. But again, it doesn't matter, because this sort of system is not a Wikipedia or WMF product.
...in English NLP systems, it's common to destroy the information encoded in capitalization by lowercasing all inputs [...] Notice, for example, that Google gives an identical page of results on Shakespeare's play whether you search for "hamlet" or "Hamlet"
The main reason for Google to be case-insensitive is that it halves the size of the index necessary (and combinatorically, the reduction is much greater than half). Punctuation is ignored for similar reasons.
...then what's the benefit we are getting from the precomposed encoding?
If you think there is a benefit, perhaps a specific example would help explain.
Uh, better aesthetic style? We're in a discussion in a talk page for the Wikipedia Manual of Style, remember? Not that you didn't know that was one of the benefits I'd already proposed, or that you've demonstrated other benefits—not to mention the existing damn rules in the style guide—aren't valid; this is all such empty rhetorical posturing. I'll also point out, again, that it is completely absurd to refer to "Ⅴ" as a "pre-composed" version of "V"—you are hammering a square peg into a round hole here. But by all means, continue to demonstrate the inherent ridiculousness and prejudiced nature of your exhibition. I believe the word "wrongheaded" arose above, and it didn't apply to anything else in the conversation... And as far as style guides other than the one you're proposing changing right now—proposing changing to reflect something you yourself are saying other style guides do not say—the AP Stylebook gives guidance to, for example, not use brackets because they supposedly can't be transmitted over news wires. Which, I'll bet anything, is based on some technical problem present in twentieth-century technology that tied back to nineteenth-century telegraph encoding practices. So if by some remote chance you actually go looking for evidence to cite, and in the even more unlikely event under a joint probability distribution that you actually find a style guide which says something about Unicode-specific character encoding practices rather than 19th-century telegraph stuff, also please bring evidence that the authors even remotely know what the hell they're talking about when it comes to this realm of technical topics. The typography term for the difference between "ae" and "æ" is that the latter is called a ligature. And you're still using the term "ASCII encoding" too... If you're going to switch from claiming you're trying “...to dispel the idea that encoding Roman numerals as precomposed Unicode characters some of the time will allow a naive system to handle them properly” to saying that a naïve system is “one that relies on the character encoding to differentiate between Roman numerals and pronouns like "I"”—which would mean, under this new definition, that handling Unicode Roman numerals properly is the one thing a "naïve" system can actually do—and yet suggest that I'm the one who is confused? I'm just going to say it: in addition to neutral talk page proposals, Beland, you also seem to be out of your depth when it comes to the intersection of character encodings, typography, web styling, and UX and accessibility. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:27, 30 November 2020 (UTC)
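The suggestion above that non-naive systems would perform better "if they decomposed such characters early in processing" corresponds to Unicode compatibility normalization (NFKC). Python exposes this through its standard `unicodedata` module. Below is a minimal sketch, assuming Python 3; the helper name `fold_compat` is hypothetical. It shows what the normalization does. It also shows the information loss objected to in this thread: after NFKC the numeral is just ordinary Latin letters with no numeric value.

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Hypothetical preprocessing step: apply NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


for ch in ["\u2165", "\u216B"]:  # ROMAN NUMERAL SIX, ROMAN NUMERAL TWELVE
    folded = fold_compat(ch)
    # Before folding, the character carries a numeric value in the Unicode
    # Character Database; the letters it folds to carry none.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"numeric={unicodedata.numeric(ch)} -> {folded!r}")

# Canonical normalization (NFC) leaves the precomposed numerals untouched;
# only the compatibility forms (NFKC/NFKD) decompose them.
print(unicodedata.normalize("NFC", "\u2165") == "\u2165")
```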
specific phrasing of your proposed addition accurately is not some sort of transactional favor where you then get to call me rude and uncivil for pointing out your sloppy use of terminology at the same time you're trying to install an MOS rule demanding that Wikipedia editors use individual characters to your exacting preferences and specifications. And furthermore, on the subject of your concomitant inattention to detail, the phrase "lack of knowledge", which I've supposedly been saying "over and over again", and even just the word "knowledge", appears only in your own comment above in this thread. But it is an accurate self-description; if indeed you are knowledgeable in the technical areas I list above you are failing to convey that by even using terminology correctly. All while insisting on your superlative personal insight into what this Wikipedia guideline should say, while demonstrating at best superficial familiarity with Wikipedia policies and guidelines in general.
What internal system, if any, would benefit from a "machine readable" encoding?
What internal system wouldn't, if you yourself have admitted that it takes only a few lines of code in any programming language to create a "naïve NLP system" (rather overkill terminology-wise IMO, but it works) that can handle Roman numerals properly encoded in Unicode, but requires a system with a potentially-infinite number of ad hoc rules to do the job with Basic Latin letters? I mean... this is the basic definition of the term "machine readable". I also don't get why you keep putting that phrase in quotes past the use–mention distinction... you're almost treating it like it's an unfamiliar term, or doesn't mean what our article machine-readable data says:
Machine readable is not synonymous with digitally accessible. A digitally accessible document may be online, making it easier for humans to access via computers, but its content is much harder to extract, transform, and process via computer programming logic if it is not machine-readable.
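The "few lines of code" claim above can be illustrated with a minimal Python sketch. It assumes the only goal is to read values off precomposed characters in the Number Forms range U+2160 to U+2188, using `unicodedata.numeric` from the standard library. The function name `roman_numeral_values` and the range check are assumptions for illustration. The sketch reads each character on its own, so a numeral written as several precomposed characters would still need ordinary additive and subtractive logic.

```python
import unicodedata
from typing import List, Tuple

# Assumed range: the Roman numeral characters of the Number Forms block.
ROMAN_FIRST, ROMAN_LAST = 0x2160, 0x2188


def roman_numeral_values(text: str) -> List[Tuple[str, int]]:
    """Return (character, value) for each precomposed Roman numeral in text."""
    found = []
    for ch in text:
        if ROMAN_FIRST <= ord(ch) <= ROMAN_LAST:
            value = unicodedata.numeric(ch, None)
            if value is not None:  # skip code points in the range with no numeric value
                found.append((ch, int(value)))
    return found


# The Basic Latin pronoun "I" is ignored; only the precomposed numeral is reported.
print(roman_numeral_values("Henry \u2167 and I"))
```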
I don't think I or most editors would ever agree that the user experience on Wikipedia's own web site doesn't matter or doesn't justify Wikipedia making an effort to improve it.
Then get the entire Wikipedia:Manual of Style/Mathematics § Special symbols section done away with. You can't cherry-pick Unicode Roman numerals to apply this usability quibble (which, I'll bet, no actual user has ever complained about anyways, not even to browser vendors, at least not with mathematical content) to: doing so is, as with so many other arguments made here, fallacious. And I'd note that it's not just §Special symbols you need to work on changing if your concern about Ctrl+f browser page-specific searching is in any way whatsoever real instead of just more chaff thrown up in the process of trying to get your way: in Firefox if I search for "sinxdx" it doesn't find that sequence in §Using LaTeX markup. (Which of course makes sense, since Wikipedia currently uses a plugin which renders LaTeX to images.)
If we can't identify any style guides that recommend non-ASCII Roman numerals, and we can't find any reliable sources that actually use them
Tricksy Hobbitses. You're trying to translate the absence of style guides which recommend against the use of Unicode Roman numerals into a positive reason to add a MOS rule prohibiting them, which is a clumsy argument from silence (or more realistically an argument from ignorance because I doubt you've actually gone and looked at anywhere near all style guides to ascertain such an absence.) Are you guys playing logical fallacy bingo? Or going through the list of fallacies article and checking them off, or something? And how would you know whether reliable sources, particularly printed reliable sources, use Roman numeral Unicode characters? Even if this were a valid argument in the first place (if our citation formatting has never had to conform to the intricate specifications of the many organizations making bux off of selling such things, why would our Unicode character encoding of Roman numerals need to conform to anyone else's not-even-explicitly-specified practice?) you have not exactly demonstrated yourself willing to put much effort into doing research.
I'm curious how you got started using these characters yourself. Did you see them used in a publication that you respect, or did you find out about them in a computing context and decide to start using them in articles on your own?
Well, did you do a survey of publications before encoding Roman numerals the way you do it? Surely, if you can put the question to me, you're willing to answer it yourself. As far as pages which don't currently follow §Special symbols for the Kelvin and Planck constant symbols, again, propose changing the whole thing if you think it's fundamentally invalid.
It would certainly make no sense to me to allow non-ASCII characters in non-math articles
For you to allow editors to use "non-ASCII" characters? As I said on your talk page, you do not have any such power to overturn Wikipedia policy or guidelines by personal fiat. And notice that, if you're using the standard editor, right below the main editing field are a dropdown and a bunch of buttons which allow anyone to insert "non-ASCII" characters, including for example "™", which could easily be reproduced with the HTML <sup>TM</sup>.
I don't find anything wrong with the appearance of ASCII Roman numerals
...right, and that's a valid styling point of view. No one is saying it isn't. What you have not even begun to do here is demonstrate that your styling viewpoint is so virtuous and superior that it must exclude all other styling points of view, to the degree that for Roman numerals, Wikipedia:Manual of Style/Mathematics § Special symbols should be changed to say the complete opposite of what it says now—to go from saying that the rule of thumb is, characters and character sequences with mathematical significance should be represented by Unicode code points which encode that mathematical significance specifically rather than visually similar glyphs, to saying that Roman numerals must mandatorily be only represented with Basic Latin Unicode code points. Between acting like you don't know what the term "aesthetic style" would mean in a Manual of Style discussion where I've repeatedly brought up fonts and even type foundries, and all of the other see no evil, hear no evil, speak no evil behavior on display here, this is all taking on the aspect of King Canute shouting at the tides. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 19:02, 1 December 2020 (UTC)
As far as I know, Wikipedia doesn't have any internal NLP systems that are attempting to parse the numerical values of Roman numerals, and thus there are none that would benefit from making them "machine readable".
...then, of course, you are actually rendering all of your own quibbles about benefits from the supposed “easiest” things for NLP systems to do invalid as well.
If you have no objection to the phrasing "ASCII letters should be used instead of precomposed Unicode characters", then that's what the RFC will propose. If you don't express a preference for a different phrasing now, please don't argue later that the phrasing is defective and thus the proposal should be abandoned entirely.
I of course wrote at great length above about why using the term “ASCII” for Unicode Basic Latin characters is inaccurate and inappropriate. If you want to call the entire community down to look at an example of you trying to rewrite a Wikipedia guideline using terminology from before even 1969's RFC 20, that's your business. I'm sure I will have a delightful discussion, among other things, reminiscing about old character encoding times with my fellow neckbeards. My preference for phrasing is the current guideline, as it stands, unchanged, as I have stated repeatedly. You, of all people, are in no position to try to place any prior restraints on what sorts of arguments I can make, when you simply ignore my requests to follow basic Wikipedia procedural guidelines if you don't feel like it. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 00:30, 8 December 2020 (UTC)
...a later objection as to the quality of the wording would be an argument made in bad faith.—Right, so what you've got are current and past objections to the quality of your wording. Your concern that bad faith arguments not be made—evidently by repeating objections you're already aware of, which definitely has nothing whatsoever to do with the concept of "bad faith"—is touching. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
I've just encountered a second serious accessibility issue. I just ran the proposed MOS change through a text-to-speech system. One ASCII character sequence is read aloud as "vee eye", but the non-ASCII equivalent is read aloud as "letter two one seven five". That means if we don't want Roman numerals to be essentially gibberish to some people with visual impairments, we should stick with the ASCII characters. -- Beland ( talk) 04:13, 9 December 2020 (UTC)
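The spoken "two one seven five" looks like a hexadecimal code point being read out. That is an assumption, since the comment does not show which character was tested. If so, U+2175 is SMALL ROMAN NUMERAL SIX, the precomposed counterpart of the letters "vi" that were read as "vee eye". The minimal Python sketch below uses only the standard `unicodedata` module. It identifies the character and shows one hypothetical fallback a speech system could use instead of reading the raw code point. `spoken_fallback` is an illustrative name, not the API of any real screen reader.

```python
import unicodedata


def spoken_fallback(ch: str) -> str:
    """Hypothetical fallback: speak a character's Unicode name, or its hex code point."""
    name = unicodedata.name(ch, None)
    if name is not None:
        return name.lower()
    # Characters without a name (e.g. unassigned code points) fall back to hex.
    return f"code point {ord(ch):04x}"


numeral = "\u2175"
print(f"U+{ord(numeral):04X}")                         # the hexadecimal code point
print(spoken_fallback(numeral))                        # the Unicode character name
print(unicodedata.normalize("NFKC", numeral) == "vi")  # its compatibility equivalent
```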
You can't cherry-pick Unicode Roman numerals to apply this usability quibble (which, I'll bet, no actual user has ever complained about anyways, not even to browser vendors, at least not with mathematical content) to: doing so is, as with so many other arguments made here, fallacious.
-- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
We've been discussing above whether or not any reputable style guides or reliable sources encode Roman numerals in non-ASCII characters. If anyone knows of any, please share! Personally, I don't recall ever having seen that in professional English publications, though it's not always obvious if you're not explicitly looking. Maybe half of the very small number of instances of non-ASCII Roman numerals in English Wikipedia are actually in Japanese text. One way to approach this is just to come up with a list of reliable sources and do a site search to find a page with a Roman numeral. I've checked a few, though obviously the more that are checked, the more reliable the sampling is. Feel free to round out the list below with more sources you find reliable; or, if you can find a style guide that even mentions this issue, that would be illuminating. -- Beland ( talk) 04:21, 2 December 2020 (UTC)
...if our citation formatting has never had to conform to the intricate specifications of the many organizations making bux off of selling [style guides], why would our Unicode character encoding of Roman numerals need to conform to anyone else's not-even-explicitly-specified practice?
Slapping together a list of web sites that supposedly don't use Unicode Roman numeral notation does not make your point, either. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 00:30, 8 December 2020 (UTC)
"If we can't identify any style guides that recommend non-ASCII Roman numerals, and we can't find any reliable sources that actually use them..." What I said was,
And how would you know whether reliable sources, particularly printed reliable sources, use Roman numeral Unicode characters? Even if this were a valid argument in the first place (if our citation formatting has never had to conform to the intricate specifications of the many organizations making bux off of selling such things, why would our Unicode character encoding of Roman numerals need to conform to anyone else's not-even-explicitly-specified practice?) you have not exactly demonstrated yourself willing to put much effort into doing research.
You certainly have not proved that the printed New York Times does not use Unicode Roman numerals at any stage of its typesetting process, nor the web site either. And even if you could, it still wouldn't matter, because "bunch of undocumented practices which may have nothing to do with styling" does not equal "Beland gets what they want in disagreements over Wikipedia styling guidelines, which specify things to a much lower technical level than anyone else does anyways". If you're worried that the appearance of properly-numeric-notation-encoded Roman numerals will be some sort of unexpected shock to the reader, I've responded to that genre of quibble already, but it looks like you have an argument with our friend David in that case:
...the two variations look identical on my screen; I would guess (another guess) that this is because the browser converts the precomposed ones to ASCII internally, so there is no actual benefit to precomposition for people who are just reading Wikipedia in browsers...
...we do tend to pick and choose from general-audience style guides and the practices of cited sources, and many editors who participate in MOS discussions find such evidence persuasive. If you personally don't, that's fine.—Of course, not only is this not much of an actual tendency of ours—again, citation formatting—you have not presented any evidence whatsoever from style guides, nor that your handwavy pointing to some web pages demonstrates anything to do with styling practices. Much less any persuasive evidence. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
I don't see how print sources are relevant to this question.—because print sources are implementing aesthetic styling of Roman numerals, as would be embedded images or hoary old Flash .swfs in "explainers" on the ancient static HTML NYT pages from the last century that are still kicking around if you follow the right links. What I would like for Wikipedia is, of course, the Manual of Style to be followed—it is correct, as I have said again and again, that I don't think we should be following the supposed interpolated unwritten styling rules of other organizations—I think we should be following the Wikipedia Manual of Style. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 15:00, 12 December 2020 (UTC)
Your response is about visual appearance, but your complaint is about character encoding
I don't have a "complaint"; I support the Manual of Style as it currently reads and have asked you to comply with it. You are complaining that all editors are not mandatorily required to follow your styling preferences, and wish the MOS to be changed to require that. Anyone reading the above conversation can easily see that from my very first sentence I've characterized the use of Roman numeral Unicode code points as a valid style variation and directly answered your repeated WP:HUH? questions about what benefits I could possibly see by specifying aesthetic benefits, among others.
Surely it's irrelevant to readers if the encoding was changed in some intermediate version of the typesetting process that isn't even the finished product.
But, what, readers do care about which Unicode code points are used to represent Roman numerals in these Unicode web pages?
I'm not advocating for any particular visual appearance, so evidence of how Roman numerals appear in print is not relevant to any argument I'm making.
Then how exactly are Roman numerals encoded from collections of Basic Latin Unicode code points "what [readers] will be expecting and familiar with"? You are making so many simultaneously-contradictory arguments, even within the same sub-threads here; even you seem to be having difficulty keeping track of them.
Though I don't see anywhere you actually specified what you're looking for in a finished web page or Wikipedia print version?—what the MOS says. For the bazillionth time. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 14:46, 21 December 2020 (UTC)
or
- Leave room for flexibility (or: Avoid instruction creep). [...]
- Don't be prescriptive. Devolve responsibility. [...]
- Consult widely – make a special effort to engage potential critics of the new guideline, engage them and get them to help find the middle ground early.
"Good faith"... you see that as compatible with not presenting the discussion of a policy change from a neutral point of view? Or compatible with your talk of whether you're going to allow editors to apply this consensus Wikipedia guideline to use "non-ASCII characters in non-math articles", a guideline which was initially proposed in 2005? Amazing how quickly you're able to flip from, "My wild guesses about a supposedly universal utilitarian un-thought-out internet-wide practice equate to an iron-clad ineluctable undocumented Wikipedia styling rule from which there must be no variation!" to "It's just a thirteen-year-old essay, made from even older pages that used to be in the Help: namespace, which could totally merely be a coincidence instead of representative of standard Wikipedia practices and procedures, so it doesn't count!" But sure, regale me with how your creative interpretations apply to the more concise procedural policy WP:PGCHANGE. As far as,
"we have two editors in favor of the proposal and one opposed", surely an editor with such overweening faith in your own insight into WP:P&G knows what I'm going to say in response, right? Wikipedia:Consensus,
"Consensus on Wikipedia does not mean unanimity (which is ideal but not always achievable), nor is it the result of a vote" (my emphasis), its explanatory supplement Wikipedia:Polling is not a substitute for discussion § Policy and guidelines, and Wikipedia:What Wikipedia is not § Wikipedia is not a democracy. Also, another important bit of policy from Wikipedia:Consensus § In talk pages:
The quality of an argument is more important than whether it represents a minority or a majority view. The arguments "I just don't like it" and "I just like it" usually carry no weight whatsoever.
Let's not forget that your self-identified "best argument" above for your de novo addition of a mandatory styling rule to the MOS turned out to be something not even specific to Unicode Roman numerals at all. If you still seriously want to proceed further here, and don't want to revise your previous, ah, approach to achieving consensus, at all, sure, I'll write a summary. What length should we aim for? (standard third-party word count tool linked on noticeboards, for convenience) As far as an RfC, you're welcome to go ahead with that if you want (while notifying me and following procedures, of course), but you're the one who wants to change the existing guideline.
"It sounds like your preference is to run an RFC."
You are firmly in WP:IDIDNTHEARTHAT territory at this point. My repeatedly stated preference, at your talk page and here, is for this established and rather more reasonable guideline to remain as it is. It's unprofessional, undignified, and further clumsy rhetoric to persist in pretending that your desire to change this guideline and arrogation that you won't allow editors to use Unicode Roman numerals amongst Unicode Basic Latin characters is somehow my wish. Be your own person and take responsibility for your own actions.
And on the same theme, WP:RFCBRIEF / WP:RFCNEUTRAL (shortcuts pointing to the same section, Wikipedia:Requests for comment § Statement should be neutral and brief) apply to you as the editor making the request for a comment from the community, not me. And I certainly see no reason to be any more neutral in any responding comment I might make than you have been above. Also, lest you try to act as if it's unfair, I will point out the specific arguments you've made here, on your user talk page, and your general rhetorical behavior here as well if I choose.
And if you do not follow policies, guidelines, and procedures both to the letter and in spirit, or even just don't follow orthodox practice, or again try to make up extra rules and claim you're merely following them so as to put your thumb on the scale, or any of the other rhetorical crap you've been pulling in this talk page section, I will point those things out as stridently as I choose. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 00:30, 8 December 2020 (UTC)
actively avoiding the question above, but seems to have petered out on responding to questions about their own use of fallacies? Sure, let's have an RfC if that's what you guys want. By the rules, though. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
Well, given the majority of editors in the discussion so far want to adopt the change, "leave things as they are" is not an option unless more editors weigh in to support that. It sounded like you were volunteering to write the neutral summary for the RFC, but it wasn't entirely clear, which is why I repeated back to you my understanding. Since it now seems you are declining to do so, and there is majority support for an RFC, here is my draft, which you can inspect for neutrality:
As mentioned above, here's an improved version of the proposed addition to the MOS:
Before the RFC starts, feel free to propose any tweaks that would make you happier in the event that this is adopted. (It's unfortunately usually difficult to get RFC participants to come back and give a second opinion on an amended option.) -- Beland ( talk) 00:55, 9 December 2020 (UTC)
Silly me to think that, after all of this discussion, you might be able to think of some view you weren't faithfully reflecting. I keep over-estimating you.
Nice attempt to declare that there are no options other than what you want, I guess, but the Gish gallop is for unstructured debate about things like creationism, not written, change-managed, behavioral-P&G-governed Wikipedia policy discussion.
So, me characterizing your ah, requests, related to rewording your desired mandatory rule changes to this guideline as "because policies and guidelines are sensitive and complex, users should take care over any edits, to be sure they are faithfully reflecting the community's view", "Asking me to do your work for you and write the 'specific phrasing' of your proposed addition accurately", or my response to your inquiry about "formal third-party closure" of this thread, which I explicitly separated from my remarks "As far as an RfC...", are things you heard as "volunteering to write the neutral summary" that "wasn't entirely clear", eh? Right.
I'm definitely not objecting to an RfC at all, just insisting that policies, guidelines, and procedures be followed. WP:RFCBRIEF / WP:RFCNEUTRAL isn't an excuse to take a tabula rasa approach to the RfC, as though we haven't had the above discussion; as it says,
If you have lots to say on the issue, give and sign a brief statement in the initial description and publish the page, then edit the page again and place additional comments below your first statement and timestamp. If you feel that you cannot describe the issue neutrally, you may either ask someone else to write the question or summary, or simply do your best and leave a note asking others to improve it.
It may be helpful to discuss your planned RfC question on the talk page before starting the RfC, to see whether other editors have ideas for making it clearer or more concise.
Your RfC should explicitly state that you wish to overrule MOS:STYLERET in these cases, excluding all other styling variations like plain Unicode and I'm assuming things like MathML character entities when MathML is implemented (see here for example), or if not, you should say so. To faithfully reflect the views you are aware of, at least mention our differing opinions of better styling, the instances we investigated where popular search engines do and do not handle them properly, machine readability, the usability issues you brought up and my responses to them, and the absence of any external style guides speaking to the matter either of us have been able to find.
As I've said, for clarity, and because you are specifically talking about character encoding and not the string comparison algorithm of RFC 20 or some W3C documents, I think that the term "Basic Latin" linked to the article Basic Latin (Unicode block) should be used in place of ASCII in any sentence addressing character encoding such as this, particularly a sentence that's going to appear in Wikipedia P&G, where we use technical terminology carefully. The wording proposed to be included in the guideline itself should also emphasize that it's really, actually trying to mandate sequences of Latin letters in lieu of specific numerical notation encoding, since this is proposed to follow the §Special symbols rule of thumb saying that mathematical versions of symbols should be used when glyphs are similar. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 21:53, 9 December 2020 (UTC)
"I [...] am following the 'talk first' approach which is also described there" (the alternative being boldly making an edit one expects to be unchallenged, which in this case I'd simply have reverted anyways and you knew this after the discussion on your user talk page: it is not some virtuous thing to refrain from starting an edit war on a policy guideline page, it's pretty much just minimal expected proper editor conduct), and which also reads,
"Because Wikipedia practice exists in the community through consensus..."
If that policy does not govern talk page discussions seeking consensus to change the text of a guideline, what policy or guideline does govern such discussions?
When it's a matter of rules that would restrict the behavior and Wikipedia editing practices of other people, it seems like you can't wait to conjure them out of thin air and grasp at straws for a way to impose your own will through them; but when it comes to any rules which would apply to your own behavior, it's WP:HUH? -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 15:00, 12 December 2020 (UTC)
"As the quoted policy refers to Wikipedia namespace edits and not talk page edits, as I said..."
What you said was "I [...] am following the 'talk first' approach which is also described there." You have not at any point attempted to faithfully reflect the community's views: you haven't even faithfully reflected what the guideline currently says. WP:RM#CM literally says,
Unlike other request processes on Wikipedia, such as Requests for comment, nominations need not be neutral. Make your point as best you can; use evidence (such as Google Ngrams and pageview statistics) and refer to applicable policies and guidelines, especially our article titling policy and the guideline on disambiguation and primary topics. [...] Requesters should feel free to notify any other Wikiproject or noticeboard that might be interested in the move request, as long as this notification is neutral.
(My emphasis.) You're seriously trying to suggest that, while the very non-P&G page you quote explicitly says that other procedures are expected to be neutral for mainspace pages, and that even comments mentioning the existence of requested mainspace article move discussions must be neutral, you can just say whatever you want in a proposal to change P&G, even though the governing policy WP:PGCHANGE explicitly refers to "faithfully reflecting the community's view". I don't believe, in all the years I've worked on Wikipedia, that I've ever brought up the Wikipedia:Wikilawyering essay in a discussion. But this would appear to be an appropriate point to do so.
Don't try to misrepresent what I've said as being that PGCHANGE proposals can't be persuasive, because that's clearly not what I've said; I pointed you to an entire essay on the subject of changing P&G, Wikipedia:How to contribute to Wikipedia guidance § General recommendations, written by other editors. You offhandedly dismissed it as "an essay that doesn't even necessarily have community consensus", but you can't claim I haven't thoroughly and specifically justified my statements about P&G and how this process is supposed to work.
Yes,
it really shouldn't matter who spoke first... it shouldn't, IF everyone is participating in good faith, weighing arguments in good faith, and neutrally, faithfully seeking to reflect the community's view and arrive at consensus. But you have explicitly chosen not to do that in this discussion and I'm not just going to assume you'll follow P&G in subsequent discussions. -- ‿Ꞅtruthious 𝔹andersnatch ͡ |℡| 14:46, 21 December 2020 (UTC)
Neutral summary:
This RFC proposes adding the following to the end of Wikipedia:Manual of Style/Mathematics#Special symbols:
Related style guidelines:
-- Beland ( talk) 04:16, 21 December 2020 (UTC)
The following arguments in favor are mostly summarized from the above discussion and were written by Beland with suggestions from other editors... (moved to #Roman Numerals RFC)
I'm assuming you would like to write the arguments against, Struthious Bandersnatch? Anything to add or change, David Eppstein?
I did not add anything explicitly about MathML. Based on the page you linked to, SB, MathML appears to be using the "Unicode characters in the U+21XX range", which are already mentioned. -- Beland ( talk) 04:16, 21 December 2020 (UTC)
"As a rule of thumb, specific mathematical symbols shall be used, not similar-looking ASCII or punctuation symbols, even if corresponding glyphs are indistinguishable."
It should do this in the initial summary.
"either encoding is acceptable, and MOS:STYLERET would [prohibits] changing any given instance from one to the other" are the current guidelines in force.
"Using a web browser" should instead begin with, "Like most preferred styling approaches in MOS:MATHS, using a web browser..."
"Some screenreaders pronounce" should instead begin with, "Some screenreaders pronounce most code points referred to by Wikipedia:Manual of Style/Mathematics § Special symbols in an overly verbose way which makes for poor accessibility, and this is true of Roman numeral code points..."
"[NLP systems encountering Unicode Roman numerals] appearing rarely or some of the time probably results in worse performance than not appearing at all": as I've said repeatedly above, citation needed. Repeating this again and again with no evidence does not make it true, and of course doing so for the 𝑛th time, or offering your own third-party system which you won't have designed to support all Unicode characters if you're opposed to that as some sort of generalization "we" have "seen", is not doing your best to present opinions on the subject neutrally.
"[A]lmost all people who read Wikipedia do not use browsers with working support for MathML": this isn't true. Firefox/Gecko supports MathML and so does WebKit. With Edge having switched last year to the (WebKit-derived) Blink engine (see Microsoft Edge § Anaheim (2019–present)), all major browsers now contain at least the code to support MathML. Even for browsers with MathML support turned off, or older browsers lacking support, extremely mature JavaScript polyfills/shims like MathJax are available to enable rendering.
So the path to MathML is pretty firm; it's only "delusional" if one assigns no importance to accessibility and other benefits. MathJax has a variety of accessibility measures, but in my cursory testing the specific concern you and Beland voiced, Ctrl+F find-in-page, either doesn't seem to work with it or works differently than with native MathML. (Which would appear to affirm that progress towards native MathML support in browsers and in Wikipedia will be optimal for that particular aspect of accessibility.)
@ Struthious Bandersnatch: Happy New Year! We haven't heard from you on this topic in a while. I was hoping that you would write a summary of your arguments in your own words, as you originally promised, because they are best presented by someone who actually believes in them. Non-response can't be a veto in favor of the status quo, so the RFC will proceed either way. Rather than running the RFC without a summary of arguments against, I have drafted my own summary of your arguments below. Feel free to throw it away completely and express your ideas in your own way, or tweak it if you think some is worth keeping. If there's no response in a week or so, I'll go ahead with the RFC. -- Beland ( talk) 19:56, 7 January 2021 (UTC)
(moved to #Roman Numerals RFC)
This discussion is a new instance of many similar past discussions about non-ASCII Unicode characters and symbols. Examples that I remember include ellipses (…), radical sign (√), blackboard bold (ℝ), function composition (∘), integer exponents (x²), fractions (1⁄2), but the list is certainly not complete. From all these discussions, I arrive at the following suggestion for the manual of style:
The use of non-ASCII Unicode characters and symbols is discouraged unless there is no convenient equivalent in plain text or LaTeX (in mathematical formulas), or when talking about them. (Typo and grammar fixed as suggested below. D.Lazard ( talk) 10:14, 10 January 2021 (UTC))
The rationale for this is
For the present discussion, I can add that the semantics of a Roman numeral is based on the fact that it is a sequence of digits represented by Latin letters. The combined Unicode symbols destroy this semantics. D.Lazard ( talk) 14:12, 9 January 2021 (UTC)
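D.Lazard's point that a Roman numeral is a sequence of digits written as Latin letters can be made concrete. The Python sketch below (function names are hypothetical, not part of any Wikipedia tool) parses a numeral letter by letter with the usual subtractive rule. It handles "VII" but fails on the single code point "Ⅶ" (U+2166) unless that is decomposed first. NFKC normalization does the decomposition, because the Unicode Character Database gives U+2160 through U+217F compatibility decompositions into the corresponding Latin letters.

```python
import unicodedata

# Values of the Latin letters used as Roman numeral digits.
LETTER_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Parse a Roman numeral written as a sequence of Latin letters.

    A digit smaller than the one following it is subtracted (IV = 4);
    otherwise it is added. Well-formedness (e.g. rejecting "IIII") is not
    checked. Raises KeyError for any character that is not one of the letters.
    """
    values = [LETTER_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def roman_to_int_normalized(numeral: str) -> int:
    """Apply NFKC first, so precomposed numerals such as U+2166 become "VII"."""
    return roman_to_int(unicodedata.normalize("NFKC", numeral))


print(roman_to_int("VII"))                # 7
print(roman_to_int_normalized("\u2166"))  # 7 (U+2166 ROMAN NUMERAL SEVEN)
try:
    roman_to_int("\u2166")
except KeyError:
    print("U+2166 is not a sequence of Latin letters")
```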
@ D.Lazard: and other interested editors...
So favoring ASCII characters over non-ASCII would mean:
I would support all of those preferences.
Wikipedia:Manual of Style/Mathematics#Multiplication sign prefers U+00D7 × MULTIPLICATION SIGN or &times; (and ⋅ where appropriate). I'd lean away from changing × to the ASCII letter "x", just because it's typographically distinct and there are a very large number of instances. If there is consensus in favor of keeping ×, it would be good to note it explicitly as an exception. I would support changing U+2715 ✕ MULTIPLICATION X, U+2A09 ⨉ N-ARY TIMES OPERATOR, and U+2A2F ⨯ VECTOR OR CROSS PRODUCT to U+00D7 × MULTIPLICATION SIGN for the same reasons as we prefer ASCII characters, like find-in-page consistency. Where these characters do appear, they are usually not used "correctly" according to how the Unicode standard defines the semantics. Even though U+00D7 is in a slightly higher character range, it's much more widely used than the others, and is more easily accessed because it is on the special character list in every Wikipedia edit window (for desktop browsers).
I would also support converting all instances of "&times;" to "×" since the difference with "x" is pretty obvious, and we almost always already do this anyway.
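If the consolidation suggested above were adopted, the mechanical part would be a small character mapping. The following is a minimal Python sketch. It assumes that only the three code points and the &times; entity named in the comment are in scope, and the function name is hypothetical. Deciding whether a particular U+2A2F really denotes an ordinary product, rather than a cross product, would still need human review.

```python
# Hypothetical cleanup: fold the "times"-like code points named above into
# U+00D7 MULTIPLICATION SIGN and decode the &times; entity.
TIMES_TABLE = str.maketrans({
    "\u2715": "\u00d7",  # MULTIPLICATION X
    "\u2a09": "\u00d7",  # N-ARY TIMES OPERATOR
    "\u2a2f": "\u00d7",  # VECTOR OR CROSS PRODUCT
})


def normalize_times(wikitext: str) -> str:
    # Only &times; is decoded; other entities such as &lt; are left alone,
    # and the ASCII letter "x" is never touched.
    return wikitext.replace("&times;", "\u00d7").translate(TIMES_TABLE)


print(normalize_times("3 &times; 4 and 5 \u2715 6"))  # 3 × 4 and 5 × 6
```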
For the record, in the December 20, 2020, database dump, I see:
Favoring LaTeX markup over non-ASCII Unicode characters is an interesting but much more complicated question which I would like to discuss sometime soon. I'm going to defer that for now, since the ASCII preference alone is pretty complicated. Given the very long discussion we've already had and the complicated arguments made, I'd like to proceed with the Roman numerals RFC as planned, to get an explicit consensus on that. Either after that or in parallel, I think we should discuss flipping Wikipedia:Manual of Style/Mathematics#Special symbols to prefer ASCII symbols, which as mentioned, would affect asterisk, colon, equals, tilde, and perhaps others. If there is no opposition on this talk page, would we want to just make the change, or would we want to do a formal RFC on that, given there must have been a pre-existing consensus to write the current rule? Would we want to make flipping Wikipedia:Manual of Style/Mathematics#Minus sign to prefer hyphen-minus a separate discussion? Lump it in with the rest? Maybe do a single RFC but ask editors if it should be kept as an exception? -- Beland ( talk) 20:40, 15 January 2021 (UTC)
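Mechanically, flipping the special-symbols rule of thumb to prefer ASCII would amount to a lookalike substitution, as in the hypothetical Python sketch below. The specific code points are an assumption of this sketch. The comment names asterisk, colon, equals and tilde but not the exact characters, so equals is omitted rather than guessed. The minus sign is behind a flag, mirroring the open question of whether it should be handled as a separate discussion.

```python
# Assumed lookalike pairs (math code point -> ASCII); not quoted from the MOS.
MATH_TO_ASCII = str.maketrans({
    "\u2217": "*",  # ASTERISK OPERATOR
    "\u2236": ":",  # RATIO
    "\u223c": "~",  # TILDE OPERATOR
})
MINUS_TO_HYPHEN = str.maketrans({"\u2212": "-"})  # MINUS SIGN -> HYPHEN-MINUS


def prefer_ascii(text: str, include_minus: bool = False) -> str:
    """Replace the assumed math lookalikes with ASCII; optionally the minus sign too."""
    text = text.translate(MATH_TO_ASCII)
    if include_minus:
        text = text.translate(MINUS_TO_HYPHEN)
    return text


print(prefer_ascii("f \u2217 g"))                         # f * g
print(prefer_ascii("x \u2212 1", include_minus=True))     # x - 1
```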
OK, a couple of points:
Michael Hardy ( talk) 05:44, 19 January 2021 (UTC)
The use of non-ASCII Unicode characters and symbols is discouraged unless there is no convenient equivalent in plain text or LaTeX (in mathematical formulas), or when talking about them. This does not apply to the non-mathematical use of these symbols, nor to symbols that are commonly used outside mathematics, such as the minus and the multiplication signs.
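A rule like this is easier to apply if editors have a report of which non-ASCII characters an article actually contains. The following is a minimal Python sketch of such a report. The helper is hypothetical, not an existing Wikipedia tool. It counts non-ASCII characters and skips the minus and multiplication signs that the proposal exempts. It only flags candidates, because the proposal still allows symbols that have no convenient plain-text or LaTeX equivalent.

```python
import unicodedata
from collections import Counter

# Exemptions named in the proposal: the minus and multiplication signs.
EXEMPT = {"\u2212", "\u00d7"}


def non_ascii_report(text: str) -> list[tuple[str, str, int]]:
    """List non-exempt non-ASCII characters as (code point, name, count), most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > 0x7F and ch not in EXEMPT)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.most_common()
    ]


print(non_ascii_report("Henry \u2167: 3 \u00d7 4 \u2212 1"))
# [('U+2167', 'ROMAN NUMERAL EIGHT', 1)]
```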
Should markup for Roman numerals be restricted by the Manual of Style to Basic Latin (ASCII) letters only (like "VII") and exclude characters in the U+21XX range (like "Ⅶ")? -- 19:45, 26 January 2021 (UTC)
This RFC proposes adding the following to the end of Wikipedia:Manual of Style/Mathematics#Special symbols:
Related style guidelines:
-- Beland ( talk) 19:45, 26 January 2021 (UTC)
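If the proposal is adopted, existing precomposed numerals would need converting. Here is a Python sketch that works under stated assumptions, with a hypothetical function name. It rewrites only U+2160 through U+217F, which have NFKC compatibility decompositions to Latin letters. It leaves U+2180 through U+2188 untouched: these are archaic forms such as U+2180 ROMAN NUMERAL ONE THOUSAND C D, which, as far as I know, have no such decomposition. The sketch also illustrates the find-in-page issue raised in the discussion, where searching for "VIII" does not match "Ⅷ".

```python
import re
import unicodedata

# Precomposed Roman numerals that decompose to Latin letters under NFKC.
ROMAN_COMPAT = re.compile("[\u2160-\u217f]+")


def ascii_roman(wikitext: str) -> str:
    """Rewrite each run of precomposed Roman numerals as Basic Latin letters."""
    return ROMAN_COMPAT.sub(
        lambda m: unicodedata.normalize("NFKC", m.group(0)), wikitext
    )


text = "Henry \u2167 and Type \u2161a"
print("VIII" in text)      # False: a plain-text search misses the precomposed form
print(ascii_roman(text))   # Henry VIII and Type IIa
```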
The following arguments in favor are mostly summarized from the above subsections and were written by Beland with suggestions from other editors.
The following arguments against were written by Beland (who does not endorse them) as a summary of points made by Struthious Bandersnatch (who has not commented on this phrasing).
<math>...</math> markup. (And many other non-mathematical symbols in common use on Wikipedia.) We can't make a rule for Roman numerals only; we would have to change the "rule of thumb" to favor ASCII characters for all math symbols. Find-in-page problems for <math>...</math> markup could be fixed more generally with MathML improvements.
Wikipedia:Manual of Style/Mathematics#Fractions says that precomposed fractions like ½ cause accessibility problems. However, in the discussion at Wikipedia:Categories for discussion/Log/2021 March 3#Category:10¼ in gauge railways in England, Graham87, who uses a screenreader, says these characters do not cause problems. Is anyone aware of any specific accessibility problems caused by these characters, or should that claim be removed? I do know search engines don't always handle them well, and though that may impede access, that's not what we generally mean when we say "accessibility". -- Beland ( talk) 19:13, 10 September 2021 (UTC)
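On the search point, one illustration of why precomposed fractions can behave oddly in text processing is the compatibility decomposition Unicode assigns to ¼ (U+00BC): 1, U+2044 FRACTION SLASH, 4. Under NFKC folding, "10¼" therefore becomes "101⁄4", with the whole-number digits and the fraction digits run together. This is a minimal Python illustration only. Whether any particular search engine applies NFKC is not assumed here.

```python
import unicodedata

gauge = "10\u00bc in gauge"  # "10¼ in gauge"

# NFKC replaces the precomposed fraction with 1, U+2044 FRACTION SLASH, 4.
folded = unicodedata.normalize("NFKC", gauge)

print(folded == "101\u20444 in gauge")         # True
print(unicodedata.numeric("\u00bc"))           # 0.25
print("10 1/4" in gauge, "10 1/4" in folded)   # False False
```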
The word "zeroth" appears in over 500 articles. It is potentially unfamiliar to users outside the Anglophone countries. Some English speakers might need to pause or re-read the word to infer the intended pronunciation and hence the meaning when it is written as "zeroth".
Should the word be written as "zero'th", "zero-th" or "0th"; should the "th" be in superscript; or should a link to the Wiktionary page for 'zeroth' be added to clarify what is meant?
Sesquivalent ( talk) 19:01, 26 September 2021 (UTC)
Doing some cleanup work, I just discovered that LaTeX-based double-struck blackboard bold doesn't work for numbers when using "mathbb". There is a workaround using "text", but it leaves a lot of space after the number. Conversion to regular bold is a possibility, but not where the notation itself is being explained. What's the preferred solution for a. discussing the notation itself, and b. using the notation?
Markup examples:
Articles currently affected:
-- Beland ( talk) 19:53, 10 January 2022 (UTC)
\unicode[STIXGeneral]{x1D7D9} etc. It might be possible to add non-standard macros for these if it's really required. -- Salix alba (talk): 19:11, 11 January 2022 (UTC)
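Salix alba's workaround identifies the double-struck digits by code point: U+1D7D8 through U+1D7E1 are MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO through NINE. The small Python sketch below uses hypothetical helper names. It produces either the plain characters or macros in the \unicode[STIXGeneral]{x…} form quoted above. Note that \unicode is a MathJax extension; whether Wikipedia's math renderer accepts it is not assumed here.

```python
DOUBLE_STRUCK_ZERO = 0x1D7D8  # MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO
DIGITS = "0123456789"


def double_struck(text: str) -> str:
    """Map ASCII digits to double-struck code points; other characters pass through."""
    return "".join(
        chr(DOUBLE_STRUCK_ZERO + int(ch)) if ch in DIGITS else ch for ch in text
    )


def unicode_macros(digits: str, font: str = "STIXGeneral") -> str:
    """Emit one \\unicode[font]{xHEX} macro per ASCII digit."""
    if not digits or any(ch not in DIGITS for ch in digits):
        raise ValueError("expected a non-empty string of ASCII digits")
    return "".join(
        f"\\unicode[{font}]{{x{DOUBLE_STRUCK_ZERO + int(ch):X}}}" for ch in digits
    )


print(unicode_macros("1"))  # \unicode[STIXGeneral]{x1D7D9}
```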
MOS:MATH#PUNC currently says: "Similarly, if the conventional punctuation rules would require a question mark, comma, semicolon, or other punctuation at that place, the formula must have that punctuation at the end." We also have MOS:PUNCTSPACE: "In normal text, never put a space before a comma, semicolon, colon, period/full stop, question mark, or exclamation mark". However, it might be unclear whether mathematical formulas are "normal text", :–) and unfortunately some people insert spaces (\, and even ~) before such punctuation marks. I think it would be useful to add to MOS:MATH#PUNC a short phrase against this practice (with a link to MOS:PUNCTSPACE). Any objections or better ideas? — Mikhail Ryazanov (talk) 19:55, 30 October 2021 (UTC)
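A phrase like the one proposed could be paired with a simple check. The Python sketch below uses hypothetical function names. It flags spacing commands placed directly before a formula's final punctuation inside <math>...</math>, and can also remove them. The set of commands is an assumption: \, and ~ come from the comment, and \; \: \quad \qquad and a backslash-space are added. The \! command is left out because it subtracts space. The regex only inspects the very end of each formula.

```python
import re

# Assumed set of spacing commands to catch (\! subtracts space and is not included).
SPACE_CMD = r"(?:\\,|\\;|\\:|\\ |~|\\quad|\\qquad)"
# One or more spacing commands, then the final punctuation mark, then </math>.
SPACED_PUNCT = re.compile(r"(?:\s*" + SPACE_CMD + r")+\s*([,.;:?!])\s*</math>")


def find_spaced_punctuation(wikitext: str) -> list[str]:
    """Return each offending formula ending, for review by an editor."""
    return [m.group(0) for m in SPACED_PUNCT.finditer(wikitext)]


def remove_space_before_punctuation(wikitext: str) -> str:
    """Keep the punctuation inside the formula but drop the space before it."""
    return SPACED_PUNCT.sub(r"\1</math>", wikitext)


print(remove_space_before_punctuation("<math>a^2 + b^2 = c^2 \\, ,</math>"))
# <math>a^2 + b^2 = c^2,</math>
```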
Mathematics is written in sentences. Often the subject or the verb of the sentence is a mathematical symbol rather than a word. Copyediting, therefore, requires the ability to determine which part of speech is represented by the various symbols. In §3.2.1 there is a listing of mathematical symbols according to their grammatical function.
EXAMPLES:
The example is a complete sentence with , , and acting as nouns, as a conjunction, and as the verb. This is, of course, a relatively simple example but the same principles apply to the more complicated situations.
Authors of mathematics almost invariably write in sentences but sometimes do not punctuate correctly. Although it is not universal practice to punctuate various sections of a display, it often adds to the clarity of the writing. For the most part in AMS publications, mathematical equations are punctuated, with the occasional exception of diagrams, matrices, and determinants. For example, when several separate equations are displayed, it is AMS practice to separate them by inserting a comma or other appropriate punctuation at the end of each line of the display.
When the mathematics in a paragraph is abundant, punctuation needs to be considered with more care than usual. A common mistake, for instance, is for an author to neglect to punctuate an equation that comes at the end of the typed line in a manuscript, even when the next line begins with a separate equation.
...
Specific suggestions are made in the sections below concerning spelling and punctuation. To help a copy editor maintain consistency in punctuation, several guidelines based on AMS practice are proposed; another publisher might well use different criteria. Rules of grammar are not cited because their use in writing mathematical research is no different from their use in other types of writing.
In general, the copy editor should make the manuscript correct if the grammar or punctuation is definitely wrong. In cases where there is more than one correct method, the copy editor sometimes must make a choice to maintain consistency.
12.5: Words versus mathematical symbols in text
In general, mathematical symbols may be used in text in lieu of words, and such statements as “” should not be rewritten as “ is greater than or equal to zero.” Nonetheless, symbols should not be used as a shorthand for words if the result is awkward or ungrammatical. In the phrase
the vectors ,
the condition “” is better expressed in words:
the nonzero vectors
or
the vectors , all nonzero,
depending on the emphasis desired. Moreover, logical symbols should generally not appear in text:
a minimum value of the function on the interval
should be replaced by
there exists a minimum value of the function on the interval
or
the function has a minimum value on the interval .
12.18: Mathematical expressions and punctuation
Mathematical expressions, whether run in with the text or displayed on a separate line, are grammatically part of the text in which they appear. Thus, expressions must be edited not only for correct presentation of the mathematical characters but also for correct grammar in the sentence. For example, if several expressions appear in a single display, they should be separated by commas or semicolons. For example,
Consecutive lines of a single multiline expression, however, should not be punctuated: Expressions must carry ending punctuation if they end a sentence. All ending punctuation and the commas and semicolons separating expressions should be aligned horizontally on the baseline, even when preceded by constructs such as subscripts, superscripts, or fractions.
<code> tags, which provide unambiguous rendering). — Mikhail Ryazanov (talk) 20:07, 3 November 2021 (UTC)
$48 + 5 = 53$.
). The MOS's current position on that is something horribly complicated that the MOS itself doesn't even follow. XOR'easter (talk) 11:00, 2 November 2021 (UTC)
"$48 + 5 = 53$." is indeed how it's written in LaTeX, and this markup produces no extra space. I don't remember seeing any professionally typeset publication with extra spaces in display formulas either, so I don't know where you got the idea that it should be there. — Mikhail Ryazanov (talk) 20:07, 3 November 2021 (UTC)
Spacing around + and − is the default in LaTeX, so spaced-away minus signs are possible as a matter of course, and a spaced-away leading minus sign can easily be created accidentally. (Leading minus signs typically should concatenate onto the following variable or bracket, with only a 'hair' space.)
There are many ways to add space, \ \, \; ~ \quad \qquad, but only one way to subtract space, \!, and then only a tiny amount. TeX was designed to assign aesthetic responsibility to the human typesetter. It's up to you to express yourself clearly with your notation; the math renderer will help, but only to a minimal degree.
I replaced the text:
with the text:
I may be mistaken, but as I understand it, the issue is to not have different symbols for the same thing (even using a different font) in the same article. If a whole section consistently uses unique variables (unique by both symbol and intended meaning) then there should be no objection.
My possibly mistaken understanding is that if a symbol is in a different font then it is not allowed (e.g. in one section, but R in another); likewise disallowed is a change of notation for the same or nearly the same object between two sections. So for example, if a spacecraft's velocity in one place, but same spacecraft, same velocity elsewhere in the same article would be disallowed.
|quote= item in a <ref>, or a clearly delineated quote in a footnote, just as long as the formula is expressed in the article's notation where it is used in the article's own text. Astro-Tom-ical (talk) 11:46, 4 March 2022 (UTC)
I know that specific algebraic structures should be written upright (with operatorname) while unspecified algebraic structures should be written in italics, e.g. as in ; my question is whether the same applies to other structures/mathematical objects, e.g. topological spaces/manifolds: should the n-sphere be denoted by or ? Joel Brennan ( talk) 18:30, 21 March 2022 (UTC)