This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
A rhyme scheme is a pattern that appears in the lines of a poem, and generally letters are used to notate the pattern, for example "ABAB CDCD". Sometimes these sequences are very long, the longest I could find on Wikipedia being "abacabadabacabaeabacabadabacabafabacabadabacabaeabacabadabacaba". There is considerable variation in how this notation is capitalized and punctuated:
In some cases, the main article says that the notation requires a specific capitalization. For example, "aBaBccDDeFFeGG" distinguishes masculine and feminine rhymes with lower vs. upper case. I think it would be nice, for human readability reasons, to settle on a consistent style for this notation. To me the sequences are nicely distinguished from sentence prose when they are either in quotes, all caps, or both. I'm open to whatever poetry-editing editors want to do, but for the sake of having a starting point for discussion, how about the below? I have asked for input from Wikipedia talk:WikiProject Poetry. -- Beland ( talk) 22:00, 25 July 2018 (UTC)
Unless otherwise required by a specific notation, rhyme schemes should generally be written when appearing in prose:
Example: This poem uses the "ABAB" rhyming pattern.
I'd like to see, at the least, evidence for A1 or A2 before we even think about embarking on such a debate, because if MOS does not need to have a rule on something, then it needs to not have a rule on that thing. E Eng 21:55, 26 July 2018 (UTC)
For A1, Rhyme scheme uses both lowercase-in-quotes and uppercase-no-quote styles in its prose, but mostly uses uppercase when explaining the different notations. I think this looks ugly and unprofessional because it is inconsistent, and the inconsistency continues when compared to other poetry pages, found by a moss scan:
-- Beland ( talk) 02:23, 28 July 2018 (UTC)
"abab"
and ABAB CDCD EFEF GHGH
to describe traditional rhymes, which should be using the same subnotation. We could also just say something like "Unless differences between uppercase and lowercase letters are being used in a meaningful way, all letters should be uppercase" and "Unless spacing or punctuation is being used in a meaningful way (such as to separate stanzas), patterns should be written without spaces or dashes; spaces are preferred over dashes when indicating groups of lines." --
Beland ( talk) 07:32, 7 August 2018 (UTC)

"I need some guidance from folks who do care about this stuff so I can correctly program my database scanner and advise other editors how to fix this type of problem" – How about if you just don't do that? That would save us all a huge amount of trouble. E Eng 22:44, 6 August 2018 (UTC)
Someone else is going to have to take over. This is hopeless. E Eng 03:21, 7 August 2018 (UTC)
@ David Eppstein: Did you have any thoughts on the alternatives raised in my comment from 07:32, 7 August 2018, above? -- Beland ( talk) 17:52, 12 August 2018 (UTC)
Well, I have a practical problem in front of me I'm trying to solve, which will require investing some work, regardless of the outcome of this discussion. David, you expressed a concern that any guideline Wikipedia puts out would need to follow academic sources. That's an actionable concern. First, it's a debatable question that has pros and cons that can be explored. We also started to look at potential sources to follow, which I think are inconsistent enough to say we can set our own style to some degree. We currently disagree about how to proceed; the only way I know how to come to consensus is to discuss arguments pro and con on their merits and compare opinions and try to find compromise. There are also alternatives to modifying the Manual of Style; for example, I could advise editors who are looking at these instances to check articles for sources and see if there is a particular style being used for a particular reason, and we could develop practices bottom-up rather than top-down. I was hoping to get at least local consensus before posting an RFC, but if neither of you wish to attempt to reach consensus among us, I can just go ahead and open that now and see what other editors have to say. -- Beland ( talk) 03:09, 13 August 2018 (UTC)
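Here is a rough sketch of how a scanner might apply the two alternative rules Beland floated above: use uppercase unless case carries meaning, and drop spaces and dashes unless they group lines (with spaces preferred over dashes). It assumes a human has already decided, for each poem, whether case and grouping are meaningful. The function name and its flags are hypothetical and not part of any existing moss code.

```python
import re

def normalize_rhyme_scheme(scheme: str, case_is_meaningful: bool = False,
                           keep_groups: bool = True) -> str:
    """Normalize a rhyme-scheme string such as "abab-cdcd".

    Assumption: case_is_meaningful and keep_groups are judgments made by a
    human editor; this code cannot infer them from the string alone.
    """
    # Rule 1: uppercase everything unless case encodes something, e.g.
    # masculine vs. feminine rhymes in "aBaBccDDeFFeGG".
    if not case_is_meaningful:
        scheme = scheme.upper()
    # Rule 2: collapse each run of spaces/dashes. If the separators mark
    # stanzas or line groups, keep one space (preferred over a dash);
    # otherwise remove them entirely.
    separator = " " if keep_groups else ""
    return re.sub(r"[\s\-]+", separator, scheme.strip())

print(normalize_rhyme_scheme("abab-cdcd"))                   # ABAB CDCD
print(normalize_rhyme_scheme("a-b-a-b", keep_groups=False))  # ABAB
```

Passing `case_is_meaningful=True` leaves a notation like "aBaBccDDeFFeGG" untouched, which matches the exception both rules carve out.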
Greetings all, I'm currently updating the style-checking code that reports to Wikipedia:Typo Team/moss, and I need some clarity on which HTML character entity references (things like &amp;) are allowed or preferred. Variations that are not allowed or which are disfavored would be brought to the attention of human editors, along with other suspected style and spelling errors. There are occasional mentions of such entities in the Manual of Style, but no general rules that I could find. I would propose the following:
(edited to reflect the below comments)
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, "&euro;" is the same as "&#8364;", "&#x20AC;", or including the character "€" directly. For a comprehensive list, see List of XML and HTML character entity references. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
What do folks think? -- Beland ( talk) 19:39, 14 July 2018 (UTC)
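To illustrate the equivalence described in the proposal, this short Python sketch decodes the named, decimal, and hexadecimal references for the euro sign with the standard library's `html.unescape`. It only shows how an HTML parser resolves the notation. MediaWiki's own parser is not written in Python, so treat this as a demonstration, not as Wikipedia's implementation.

```python
import html

# The three reference styles from the proposal, plus the literal character.
forms = ["&euro;", "&#8364;", "&#x20AC;", "€"]

# html.unescape resolves named, decimal and hexadecimal references alike;
# a literal character passes through unchanged.
decoded = [html.unescape(f) for f in forms]
print(decoded)                    # ['€', '€', '€', '€']
print(f"U+{ord(decoded[0]):04X}")  # U+20AC
```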
The proposal should be revised to make it clear how it relates to the advice already in the MOS at WP:MOS#Keep markup simple, which notes that &Alpha; is explicit whereas Α (the upper-case form of Greek α) may be misidentified as the Latin A. Also, the proposal should indicate where this addition would go into the MOS; context matters.
The proposal contains the statement "The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards." That's only partially true; in the version I use, there are a variety of special characters to choose from, but when I hover over them, there isn't any little hint that pops up telling me what the name of the character is. So it is hard to be sure if a character is an n dash or a minus. In another case, it's hard to tell a prime from an apostrophe. I've learned to tell an n dash from a hyphen, but I'll bet there's lots of editors who can't. Jc3s5h ( talk) 22:18, 14 July 2018 (UTC)
(Edited to reflect the below discussion)
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#8364;, &#x20AC;, or including the character € directly. For a comprehensive list, see List of XML and HTML character entity references [2].
In choosing between the numeric reference, named reference, and direct character methods, Wikipedia never uses the numeric reference when a named reference is available, and it usually prefers direct character input over named references (and edits in this direction are made by semi-automated systems like AutoWikiBrowser). For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup. These preferences are detailed in the table below, and some instances where a given character is preferably not used at all (except where that character is itself the topic of discussion) are noted. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
Category | Preferred forms | Exceptions and notes |
---|---|---|
ASCII characters | ! " % & ' + < = > [ ] | Sometimes proximity to other characters causes misinterpretation of &, <, >, [, ], or ' as part of HTML markup or wiki markup. In these cases, use &amp;, &lt;, &gt;, &#91;, &#93; or &#39;. |
Latin and Germanic letters | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ Œ œ Š š Ÿ | Instead of ligatures (Æ, æ, Œ, œ) write two separate letters, except in proper names and in text in languages in which they are standard – see Wikipedia:Manual of Style § Ligatures. |
Greek letters | Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϑ ϒ ϖ | When written standalone (not part of a Greek word with other Greek characters), the following can be used to reduce confusion with similar-looking Latin alphabet letters: Α Β Ε Ζ Η Ι Κ Μ Ν Ο Ρ Τ Υ Χ κ ο ρ . μ (mu) and Σ (sigma) are nearly identical to µ (micro) and ∑ (sum), but the other characters are not used in Wikipedia so there is no potential for confusion.
|
Quote marks | ‘ ’ ‚ “ ” „ ´ ′ ″
| ASCII quote marks are generally preferred. Wikipedia:Manual of Style/Dates and numbers § Specific units says not to use ′ and ″ for inches and feet. |
Dashes | –/– —/— ― ­
|
― is not used by Wikipedia. For more info on ­ (optional hyphen) see
MOS:SHY.
|
Whitespace and non-printing |       ‌ ‍ ‎ ‏
|
  ,   , ‌ , and ‍ are generally unnecessary. For more info on text direction, see
MOS:RTL.
|
Math | × ÷ √ ∝ ∝ ¬ ± ∂ ∇ ℵ ℜ ℑ ℘ ∀ ∃ ∈ ∉ ∋ ∅ ∏ ∑ ∠ ∧ (∧ confused with ^) ∨ (∨ confused with v) ∩ ∪ ∫ ∴ ∼ ≅ ≈ ≠ ≡ ≤ ≥ ⊂ ⊃ ⊄ ⊆ ⊇ ⊕ ⊗ ⊥ ⌈ ⌉ ⌊ ⌋ ⟨ (⟨ confused with <) ⟩ (⟩ confused with >)
| In some cases TeX markup is preferred to Unicode characters; see Wikipedia:Manual of Style/Mathematics § Typesetting of mathematical formulae. × (×) is used in article titles and also for hybrid species. ∑ (sum) should not be used; Wikipedia uses the nearly identical Σ (sigma). |
Currency | ¢ £ ¤ ¥ € $ | |
Non-English punctuation | ¿ ¡ « » ‹ ›
| ‹ and › are not used by Wikipedia; < and > can be used instead. |
Dots | · • ⋅ …
| "..." is preferred to "…"; see MOS:ELLIPSIS. Wiki markup should be used instead of these for lists; see Wikipedia:Manual of Style/Lists § List layout. |
Diacritics | ¨ ¸ ‾ ˜ ˆ | |
Arrows | ← ↑ → ↓ ↔ ↵ ⇐ ⇑ ⇒ ⇓ ⇔ | |
Other symbols | ¦ § © ® ™ ° µ ¶ † ‡ ƒ ‰ ◊ ♠ ♣ ♥ ♦ | µ (micro) is not used by Wikipedia; use μ (lowercase Greek letter mu) instead - see Wikipedia:Manual of Style/Dates and numbers § Specific units |
Superscript and subscript | ¹ ² ³ ª º | Do not use Unicode subscripts and superscripts like these for numbers, per Wikipedia:Manual of Style/Superscripts and subscripts; use <sup> and <sub> instead. |
Fractions | ¼ ½ ¾ ⁄
| These are not used unless discussing the characters themselves; for alternatives, see Wikipedia:Manual of Style/Dates and numbers § Fractions and ratios. |
Above is a draft of a definitive list of whether the HTML reference or the character itself should be used, as suggested by other editors above. I noticed a few things:
∼
) and ~ (ASCII tilde) seem to be used interchangeably, but ∼
itself is used very rarely. -- Beland ( talk) 08:12, 15 July 2018 (UTC)
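A scanner following the draft's rule that a numeric reference is never used when a named one exists could flag candidates along these lines. This is a hypothetical Python sketch: `flag_numeric_refs` is an invented name, and it only knows the HTML 4 names in the standard library's `html.entities.codepoint2name`, so HTML5-only names such as &lbrack; are not suggested.

```python
import re
from html.entities import codepoint2name

# Decimal (&#8722;) or hexadecimal (&#x2212;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def flag_numeric_refs(wikitext: str) -> list[tuple[str, str]]:
    """Return (numeric reference, suggested named reference) pairs."""
    findings = []
    for match in NUMERIC_REF.finditer(wikitext):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)  # None if no HTML 4 name exists
        if name is not None:
            findings.append((match.group(0), f"&{name};"))
    return findings

print(flag_numeric_refs("5&#8722;3, &#x20AC;10, &#91;1&#93;"))
# [('&#8722;', '&minus;'), ('&#x20AC;', '&euro;')]
```

The bracket references are not flagged because "[" and "]" have no HTML 4 name, which fits the draft table's advice to keep them escaped when they would otherwise be read as wiki markup.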
"usually prefers direct character input over named references" – That's too sweeping. I can see this is gonna take a lot of discussion. For starters, pinging David Eppstein for his thoughts on literal or symbolic for math symbols (not meaning to imply there's one simple answer to that). Not pinging SM because he'll find his way here without doubt and his user name is too hard to get right and it's late and I'm tired. E Eng 08:32, 15 July 2018 (UTC)
&minus; as otherwise it's too difficult to distinguish from &ndash;. Otherwise I don't feel strongly, but I know I have seen legions of random AWB users replace &times; (e.g.) by its Unicode character. So we should not encourage replacements that go the other way. — David Eppstein ( talk) 16:30, 15 July 2018 (UTC)
μ as the metric prefix for micro. I know some Unicode characters were created for obscure reasons such that Wikipedia has no interest in using those characters; I infer from its low numerical code value that µ (U+00B5, &micro;) exists as a way of coding the micro symbol that was used in some pre-Unicode character codes that didn't provide for most Greek letters, to permit round-tripping between those older character codes and Unicode. According to the Unicode Consortium, the Greek letter character is preferred [1]. Maybe use the Greek letter mu directly, whether in a Greek word, the archaic stand-alone symbol for micrometer, or the metric prefix, and explicitly encourage editors to replace µ (U+00B5) with μ (U+03BC). Jc3s5h ( talk) 10:31, 15 July 2018 (UTC)
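The distinction Jc3s5h draws can be checked with Python's `unicodedata` module. The sketch below prints the official names of the two code points and shows that the micro sign folds to mu under NFKC normalization. The `replace_micro_sign` helper is a hypothetical name; it is preferred here over blanket NFKC because NFKC also rewrites many other characters (for example, superscript ² becomes 2).

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # µ; per the comment above, kept for legacy round-tripping
GREEK_MU = "\u03bc"    # μ, the Greek letter recommended instead

for ch in (MICRO_SIGN, GREEK_MU):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+00B5 MICRO SIGN
# U+03BC GREEK SMALL LETTER MU

# The micro sign has a compatibility decomposition to mu, so NFKC folds it.
assert unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU

def replace_micro_sign(text: str) -> str:
    """Swap U+00B5 for U+03BC and leave every other character alone."""
    return text.replace(MICRO_SIGN, GREEK_MU)

print(replace_micro_sign("a 5 \u00b5m filter"))  # a 5 μm filter
```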
References
EEng made a good find, that &dollar; was missing. It turns out that this is because List of XML and HTML character entity references only goes up to HTML 4, and HTML 5 has a ton more, listed here. Given the length of the resulting table if we include all of them, maybe we should just say "use the character itself except for those listed below" and list the ones where named references should be used? (And maybe continue to list the characters that should not be used at all?) -- Beland ( talk) 03:53, 16 July 2018 (UTC)
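Python's standard library happens to mirror the HTML 4 versus HTML 5 split described above: `html.entities.name2codepoint` holds only the HTML 4 names, while `html.entities.html5` holds the full HTML5 set (keys include the trailing semicolon). A small sketch, assuming a reasonably recent Python 3:

```python
import html
from html.entities import html5, name2codepoint

print("dollar" in name2codepoint)  # False: &dollar; is not an HTML 4 entity
print(html5["dollar;"])            # $
print(html.unescape("&dollar;"))   # $ (unescape uses the HTML5 table)

# HTML5 defines far more names than HTML 4 did.
print(len(html5) > len(name2codepoint))  # True
```

A checker built only on an HTML 4 list would therefore treat &dollar; as a misspelling, which is how the gap was noticed.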
a-zA-Z0-9`~!@#$%^&*()-_=+[]{};':",./<>? should be given via &foo; or {some template}. Also, the table mixes advice on how to express various characters with advice on whether and when to use various characters. Not saying that's bad, just worth noting. E Eng 04:12, 16 July 2018 (UTC)
á. More generally, I am in favor of using Unicode characters over HTML entities or templates in most cases, with exceptions for characters like & (when written next to something that would cause it to expand to a different entity) or − (because there is too much possibility for confusion with other dash-like characters). Also, as an aside, the text above about avoiding ligatures is too strong; when these characters occur in the standard spelling of a name (e.g.), we should write them that way even when we are writing in English. — David Eppstein ( talk) 04:25, 16 July 2018 (UTC)
–
by a ratio of about 10.6:1.

"Well, you were using your guess that the numbers were the other way around" – No, you're mixing up two different things. I conjectured that ndashes and mdashes, together, make up the bulk (counting each use separately) of all these not-on-the-keyboard characters; that was without regard to how those characters were expressed (literal vs. symbolic).
"that the UI is the way that it is may be an indication that there is not great support for using – and friends" – WP's facilities and interfaces are full of debris that's little used or even "impossible" to use (e.g. template parameters that want to present information that an RfC has determined should never be presented). Trying to infer how things are spozed to be based on things you see in the UI will get you way off track very, very fast.
"I can generally tell the difference between dashes of different lengths" – So can I, easily in the rendered page, but in the wikisource only with a bit of effort, if I make a point of looking. It's that last bit that's the rub: in the rendered page an ndash vs. mdash look like – vs. —, but in the wikisource they're much more similar, i.e. – vs. —. (What you see in that sentence may depend on your skin, so your mileage may vary.) Thus it's easy in copyediting to not notice that the wrong one is present, and that's why symbolic names should be used instead. (If we really cared we'd suggest that hyphens be rendered as &hyp; as well. I actually tried that once in an article but got laughed off the stage, so we'll just have to live with using the literal -. What I usually do is when I see e.g. a date range like 1899-1920, I just change the literal hypheny-dashy thing that's there to &ndash;, so that I know it's the right thing.)

"I haven't heard a good argument for why those shouldn't just be used directly" – Clearly a quotation in a language using a non-Roman script should just present that text literally. For everything else, there are a lot of pros and cons relating to how many different special symbols are used (in a given article), the extent to which each one is used repeatedly, how potentially confusable they are for one another or for something else not even used on the page, the likely sophistication of editors who might work on the article, and a lot more. Here's a random example: WP:MOSNUM says arcminutes should be denoted by a prime and not an apostrophe or a single quote, i.e. ′ but not ‘ or '. Once again, you have to be looking to notice if the wrong one is there; thus MOSNUM suggests that the markup &prime; be used to save editors squinting. Unfortunately, different considerations come into play for different symbols, so separate analyses are needed in each case. That's why I predicted this discussion would take a long time.
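When the wikitext makes lookalikes hard to tell apart, printing each character's code point and Unicode name removes the guesswork. A minimal Python sketch for the dash-like and apostrophe/prime-like characters discussed above; the `describe` helper is a hypothetical name, and the characters are written as escapes so the source itself is unambiguous.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

# Dash-like and apostrophe/prime-like lookalikes.
LOOKALIKES = ["-\u2013\u2014\u2212", "'\u2018\u2019\u2032"]

for group in LOOKALIKES:
    for ch in group:
        print(describe(ch))
```

The first group prints HYPHEN-MINUS, EN DASH, EM DASH and MINUS SIGN (U+002D, U+2013, U+2014, U+2212); the second prints APOSTROPHE, LEFT and RIGHT SINGLE QUOTATION MARK, and PRIME (U+0027, U+2018, U+2019, U+2032).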
As for the general direction of the advice, using characters directly seems to be the recommended best practice for web development generally. It's more WYSIWYG and easier for web editors to read and think about. It also fits the goal of not forcing editors to learn HTML in order to be able to use Wikipedia; they can just input and edit these characters in the same way they do elsewhere like Word or phone apps or other web sites. We also have a UI right below the text-being-edited box which encourages people to add the characters directly; it would be weird if the advice is to generally use the references because that's not what the system is designed to encourage. The escaping system was originally designed to allow input of special characters that were part of SGML or HTML itself (like angle brackets). Later it became a way to work around the limitations of ASCII. But modern web sites all use Unicode now, as does Wikipedia, so it's a bit of an obsolete workaround. I think any system where you have to learn a special language for telling a computer something is less user-friendly than a system where you can express your intention in the way you would express it to other humans. -- Beland ( talk) 06:29, 16 July 2018 (UTC)
General comment: This discussion may affect WP:CHECKWIKI error 11. The error is currently deactivated. -- 11:10, 16 July 2018 (UTC)
<code>...</code> or perhaps with {{kbd}}, whatever looks better (semantically, it can be either – it's code when viewed in the wikitext but also input when you're entering it). If we don't like any of the faint-background effects, use bare <kbd>...</kbd>, which just uses monospace. I would go with <code> because the table already uses a light grey and it blends in well, while also not requiring any template calls.–
—/—
". Try: "– (–
), — (—
Also, "For more info on (optional hyphen) see MOS:SHY" is a misuse of parentheses (round brackets), seemingly for some kind of emphasis. Should just remove them.
; like  
it is generally only used for kerning in templates and such; there is usually not any reason to manually insert either into an article. "‹ and › are not used by Wikipedia; < and > can be used instead" is wrong; they are not the same character and should not be confused. If we need to illustrate French quotation style, etc., use the correct characters, not less-than and greater-than, which serve an entirely different purpose. This is pretty much exactly like hyphen vs. dash vs. minus.

Posted to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references
Proposed as new subsection titled "HTML character entity references" under Wikipedia:Manual of Style § Miscellaneous, replacing the second paragraph of "Keep markup simple".
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#8364;, &#x20AC;, or including the character € directly.
On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Edits favoring these conventions are made by semi-automated systems like AutoWikiBrowser. For a comprehensive list of available named references, see [3].
Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.
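One possible reading of these rules as code is sketched in Python below, under loud assumptions. The `KEEP_NAMED` allow-list is a hypothetical stand-in for the exceptions in the proposal's tables: it covers the minus sign and a few invisible characters, but not the context-dependent Greek-letter rule. Escapes of characters that can collide with HTML or wiki markup are left untouched. This is not the moss or AutoWikiBrowser implementation.

```python
import html
import re
from html.entities import codepoint2name

# Hypothetical allow-list: names kept as references because the literal
# character is invisible or easily confused (minus vs. en dash, etc.).
KEEP_NAMED = {"minus", "nbsp", "shy", "zwj", "zwnj", "lrm", "rlm",
              "thinsp", "ensp", "emsp"}

# Characters whose escapes may be deliberate (they can collide with HTML or
# wiki markup), so their references are left exactly as written.
MARKUP_SIGNIFICANT = set("&<>[]|'")

ENTITY_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def normalize_refs(wikitext: str) -> str:
    """Prefer direct characters, keep allow-listed names, drop numeric refs."""
    def replace(match):
        ref = match.group(0)
        ch = html.unescape(ref)
        if len(ch) != 1:
            # Unknown name (left undecoded) or a multi-character reference.
            return ref
        if ch in MARKUP_SIGNIFICANT:
            return ref
        name = codepoint2name.get(ord(ch))
        if name in KEEP_NAMED:
            return f"&{name};"  # e.g. &#8722; becomes &minus;
        return ch               # e.g. &eacute; becomes é
    return ENTITY_REF.sub(replace, wikitext)

print(normalize_refs("5&#8722;3, caf&eacute;, &#8364;10, a &amp; b"))
# 5&minus;3, café, €10, a &amp; b
```

Unknown names such as `&bogus;` pass through unchanged, so a scanner could report them separately as probable misspellings rather than silently rewriting them.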
Characters to avoid | | ||
Avoid | Instead use | Note |
---|---|---|
… (…) | ... (i.e. three periods) | See MOS:ELLIPSIS. |
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ | Equivalent Latin letters (I II i ii) | See MOS:ROMANNUM. |
Unicode fractions like ¼ ½ ¾ ⁄ | {{frac}}, {{sfrac}} | See MOS:FRAC. |
Unicode subscripts and superscripts like ¹ ⁺ ⁿ ₁ ₊ ₙ | <sup></sup> <sub></sub> | See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate. |
µ (µ )
|
μ (μ )
|
See MOS:NUM#Specific units |
Ligatures like Æ æ Œ œ | Separate letters (AE ae OE oe) | Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES. |
∑ (∑ ) ∏ (∏ ) ― (― )
| Σ (Σ) Π (Π) — (—) | (Not to be confused with \sum and \prod, which are used within <math> blocks.) |
‘ (‘ ) ’ (’ ) ‚ (‚ ) “ (“ ) ” (” ) „ („ ) ´ (´ ) ′ (′ ) ″ (″ ) ` (` )
| Straight quotes (" and ') | Use {{coord}}, {{prime}} and {{pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS. |
‹ (‹ ) › (› ) « (« ) » (» )
| Use ⟨ and ⟩ for math notation. | In foreign quotations, normalize angle quote marks to straight quotes, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT. |
       
|
Normal space | These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking ( ) and regular spaces are normally sufficient. Exceptions:
MOS:ACRO,
MOS:NBSP.
|
In vertical lists
|
*
|
Proper wiki markup should be used to create vertical lists. See HELP:LIST#List basics. |
‍ ‌
|
see note | Used in certain foreign-language words, see zero-width joiner/ zero-width non-joiner. Should be avoided elsewhere. |
₤ | £ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) | MOS:CURRENCY; find broken instances |
Potentially confusing or technically problematic characters | | ||
Category | Coded form (direct form) | Notes |
---|---|---|
Miscellany | & (& ) < (< ) > (> ) [ ( ) ] ( ) ' (' ) | (| )
| Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{'}} and {{!}} or {{pipe}}. See also character-substitution templates and WP:ENCODE. |
Greek letters | Α (Α ) Β (Β ) Ε (Ε ) Ζ (Ζ ) Η (Η ) Ι (Ι ) Κ (Κ ) Μ (Μ ) Ν (Ν ) Ο (Ο ) Ρ (Ρ ) Τ (Τ ) Υ (Υ ) Χ (Χ ) κ (κ ) ο (ο ) ρ (ρ )
| In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters. |
Quotes | ‘ (‘ ) ’ (’ ) ‚ (‚ ) “ (“ ) ” (” ) „ („ ) ´ (´ ) ′ (′ ) ″ (″ ) ` (` )
| Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters. |
Apostrophe-like | ' ` ′ ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ |
|
Dashes, minuses, hyphens | – (– ) — (— ) − (− ) - (hyphen) ­ (soft hyphen)
| Can be confused with one another. For dashes and minuses, both forms are used (as well as {{endash}} and {{emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines. |
Whitespace |         ‍ ‌
| In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP. |
Non-printing | ‎ ‏
| In direct form these are nearly impossible to identify. See MOS:RTL. |
Mathematics-related | ∧ (∧ ) ∨ (∨ ) ⟨ (⟨ ) ⟩ (⟩ )
| Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{angbr}} instead of ⟨ and ⟩. |
Dots | ⋅ (⋅) · (·) • (•) | Can be confused with one another. Interpuncts (·) are common in horizontal lists and to indicate syllables in words. Multiplication dots (⋅) are used for math. In practice, the dots are used directly instead of the HTML entities. |
FTR, as of the July 1, 2018 database dump, [ is used about 329 times and &lbracket; is used about 91 times, so I picked the more common one. -- Beland ( talk) 15:04, 18 July 2018 (UTC)
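Counts like these can be gathered by tallying every reference spelling found in the wikitext. This is a hypothetical Python sketch that assumes the page texts have already been extracted from the database dump; the dump-reading step is not shown, and `count_refs` is an invented name.

```python
import re
from collections import Counter

# Any named, decimal or hexadecimal character reference, e.g. &lbrack; &#91; &#x5B;
ENTITY_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def count_refs(pages):
    """Tally each distinct reference spelling across an iterable of wikitext strings."""
    counts = Counter()
    for wikitext in pages:
        counts.update(ENTITY_REF.findall(wikitext))
    return counts

# Toy input standing in for page texts from a dump.
sample = ["&#91;1&#93; and &#91;2&#93;", "&lbrack;note&rbrack;"]
print(count_refs(sample).most_common())
```

Because each spelling is counted separately, comparing two ways of writing the same character (a numeric reference versus a named one) is just a matter of looking up both keys.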
I posted this to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references (there's another section there that talks about Unicode PUA and RTL characters) and cross-referenced from Wikipedia:Manual of Style § Miscellaneous. Feel free to edit the live version as needed. -- Beland ( talk) 05:56, 20 July 2018 (UTC)
Might be worth adding a comment in the Greek notes that the same sort of thing applies to Cyrillic letters that look like Latin and Greek ones; use the entity codes for clarity when discussing particular characters, but use the Unicode in actual Russian, Ukrainian, etc. words. We probably needn't dwell on the details, since there's another proposal open for centralizing all the scattered Cyrillic-related material to one page. Then again, that's mostly to be about transliteration, so maybe the Greek section in the table should be Greek and Cyrillic? — SMcCandlish ☏ ¢ 😼 04:11, 22 July 2018 (UTC)
So after I posted the tables proposed above, David Eppstein reverted, with the edit summary "what part of "I think you should be more patient"..."Try proposing something narrower and more specific" do you not understand?".
I think I did not see those remarks by David Eppstein and SMcCandlish because they were posted in the discussion ("Fraction slash" below) about the "Slashes" section of the main MOS page, which I did not check for comments before updating the "Text formatting" MOS subpage. SMcCandlish wanted a one-word change to the "Slashes" section, which he implemented. I think David Eppstein was commenting on the change he reverted, as he then wrote:
"Bludgeon" sounds pretty ugly and mean. I started a project to spell-check all Wikipedia, which is intended to improve its readability and credibility. Along the way I noticed that editors have also occasionally misspelled HTML character entity references. I thought as long as we're cleaning up the misspellings, we might as well clean up any undesirable forms, because right now we don't seem to be representing them consistently. I started this discussion because I couldn't find any guidance in the Manual of Style to help me write the code to correctly flag undesirable forms vs. ignore desirable forms.
Mediawiki markup uses this part of HTML syntax, and if we have a preferred form for these things we'd want to communicate that to editors, and the Manual of Style is the place to document choices of style rather than technical how-to for the benefit of editors, so I don't understand the criticism that this is not the right place for this sort of guideline. Especially since Wikipedia:Manual of Style#Keep markup simple already discusses exactly this point, and the other sections linked from the proposed tables also address which characters are preferred.
We already encourage editors to make edits that have no reader-visible changes but do have editor-visible changes intended to make wikitext easier to read and thus articles easier to edit. That's the whole point of Wikipedia:WikiProject Wikify and wikification. I do agree there are some edits that don't improve readability all that much that aren't that worthwhile on their own, like changing "==xx==" to "== xx ==". This seems less trivial than that. I'd also note we have Wikipedia:HTML5, a project which is doing nothing but replacing obsolete HTML tags with newer ones, with hopefully no user-visible changes.
There are fewer than 20,000 articles that even have HTML character entity references at all, less than 3.5% of all articles. Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal, and in reality it will probably take months or years to manually change all the instances, if that's what we want to do. At worst, editors who notice these changes happening will be educated about the desired way of doing things, and be more likely to input characters that way when adding new text.
Given that editors seem to use characters a lot more than references, and given that characters are built into the Wikipedia UI, it seems a lot less disruptive to move toward characters than away from them.
To illustrate the difference it makes to editors, consider an editor who comes across "S&atilde;o Paulo" in wikitext. To most people who are not web developers, that looks like a typographical error. Some English-speaking people might correct it to "Sao Paulo", which is often seen in English, or, getting the idea there might be an accent there, to "Sáo Paulo", which is incorrect. "São Paulo" is what Portuguese speakers are expecting to see: it's what they type with their keyboards, and it's what appears in Word docs and on the Portuguese Wikipedia and on Google Translate, and in the readable parts of other web sites. With "São Paulo", everyone knows exactly what's going on, and there's no need to waste time doing a search on the meaning of "atilde" or "ã" or whatnot.
If I were making the rules, I think I'd keep it simple and say to use characters directly except for otherwise invisible characters and those that cause technical problems when used directly. I'd actually be fine if we used ASCII hyphens for all of our dashes, but I'm not complaining if people who can see the difference on their monitors want to upgrade some of them to emdashes to make things look pretty as in the golden years of paper typography. That would make a much smaller table than the one proposed above, but given that other editors seem to feel more strongly about making it easy to tell the difference between certain lookalike characters, I think that table now represents a pretty good compromise. Leaving dashes and quotes as they are takes the biggest chunks of potential work off the table, anyway.
Given that this is proposing a simple general rule and then listing all the desirable exceptions to it, I'm not sure that a narrower proposal would make sense. The volume of comments has been relatively small, so having multiple discussions about the same topic it seems would just burn more editor time. I am, however, open to actionable suggestions. -- Beland ( talk) 08:03, 22 July 2018 (UTC)
"Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal" – then there are some things you really don't understand; if you made changes like this to 3% of articles in one day, or one week, or even one month, you'd be strung up by your URLs.
I haven't been following that last week of discussion so I don't know where we are and what the open issues are, but if you want this to see the light of day you need to be prepared to keep plugging for quite some time to work through all the details with all interested parties (not that I even know how to find them). I've gone through an effort like this myself elsewhere in MOS and it can be an exhausting task, though you will be quite rightly congratulated by all in the end if you can pull it off, because it will be a very useful achievement for the project. E Eng 19:05, 23 July 2018 (UTC)
How do other editors feel about David Eppstein's proposal for a rule that "such changes be made only as part of other substantive changes to articles"? Personally, I don't see the need for that, given the arguments I made above, but of course I'll implement whatever the consensus is. -- Beland ( talk) 20:46, 23 July 2018 (UTC)
This is a whole lot of stuff being discussed at once. I'll cover it in the order in which I'm seeing it come up above:
... not &hellip; or …, at MOS:ELLIPSIS; and use μ or &mu; not µ, at MOS:UNITSYMBOLS; and so on), so the idea that it's off-topic or out-of-scope for MoS doesn't fly. B) MoS has already been updated with a footnote against automated "enforcement" of MoS stuff, including cross-references to the COSMETICBOT policy and to ArbCom decisions about it. The fact that someone could go on a bot-mediated enforcement rampage is not an argument against MoS having line-items about various stuff; the fact that we have rules against doing that is already sufficient to address the rare problem. That someone just lost their AWB access as a result of doing something like that should discourage a repeat. Rules do not need 100% compliance to be useful, nor does failure to achieve 100% compliance mean they're insufficient; otherwise civil society would be impossible.
And what was once an okay idea can become a poor one over time as circumstances change. E.g., the cutover last month to a new HTML linter for the parser broke all kinds of stuff that used to be "okay" or "we don't care", but which is no longer okay, and thus we now do care. The most obvious of these is that unclosed inline elements used to be forcibly closed at the opening of a block element and this is no longer the case, resulting in badly broken, mis-rendering HTML in at least tens of thousands of pages. People have been cleaning this up, including with semi-automation tools like AWB and JWB, yet no one is having a shit-fit about it. People will have shit-fits about such activity if it's PoV pushing (e.g. changing all "U.S." to "US", or changing all unspaced em-dash parenthesizing to use spaced en dashes), but they don't lose it over technical cleanup. Another example is that <br> breaks the output of at least two of the available edit-mode syntax highlighters, and needs to be changed to <br />; I've already fixed one "Help:"-namespace page from the 2000s that was recommending <br>, and there are probably some others that need fixing in this regard.
— SMcCandlish ☏ ¢ 😼 16:38, 26 July 2018 (UTC)
Whitespace other than the non-breaking and regular space should be avoided in prose.
– to &ndash;: I actually proposed that several years ago for the same reason, and did not get consensus. Apparently the average editor, with their fonts, can see the difference clearly, and people were dismissive of the idea because the editing tools below the edit window provide a – button for directly inserting the Unicode character. I think, therefore, this is a lost cause. Editors having trouble seeing the difference between –, —, −, and - need to use WP:User CSS or their browser's font settings to use a font for editing that works better for them. I wrote instructions on how to do this at Help:User style#User CSS for a monospaced coding font. It's not absolutely perfect; the minus and hyphen are still hard to distinguish. If I find a better, free coding font than Roboto Mono I'll put it at the front of the font stack.
"Apparently the average editor, with their fonts, can see the difference clearly." I suspect instead that the great majority of editors don't even know there is a difference (and just use hyphen), most of those who know the difference are inserting directly using the click-to-insert gizmo but don't really notice or care what it looks like in the edit window since they never look back, and the very small number of us who are copyediting and checking these things have learned to deal somehow with the difficulty of distinguishing them. In my case, wherever I see a direct/literal character which should be an ndash but I'm not sure it is, I just change it to {ndash} so I know it's right. But I'd rather we encouraged editors to use a coded form in the first place to save that trouble. Unfortunately that would create a new flashpoint for my next point, which is...
MoS doesn't constrain editors in any way as to adding new material– You know that and I know that, but as sure as day follows night someone's gonna paste in a direct rho, someone else is gonna change that to &rho; (as recommended in the table), and the first guy's gonna change it back, saying "I like it this way." Having said that, looking over the whole table now I don't see very many cases where that might happen (unless we adopt a recommendation to use coded forms of ndash and mdash), but I still think the wider this is advertised for comment in advance, the less trouble there will be.
"Humor is Mankind's greatest blessing." — Mark Twain". Also not a rule; it just looks better. Neither of these uses is vital. But they're not objectionable. So, we have a handful of use cases we can document, and then discourage it otherwise. Put it in a footnote, probably. I'm a big fan of footnoting "there are some geeky exceptions" stuff instead of clouding the central advice. On horizontal marks: Well, you can try proposing glyph-to-code conversion if you want, but don't hold your breath. With my font tweaking solution, I have no difficulty at all telling en dashes and hyphens apart, in rendered or source view. "The wider this is advertised": Sure, but not while we're still banging on it just with 3 or 4 people. Iron out the obvious kinks, or, even more surely than day follows night, people will "strongly oppose" the whole thing on the basis of some nitpick we should have already anticipated. — SMcCandlish ☏ ¢ 😼 20:58, 27 July 2018 (UTC)
"These are sometimes used for precision positioning in templates but should not be used in prose. Use either non-breaking (&nbsp;) or regular spaces." So who's OK with my formulation "These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient" (with or without a footnote as suggested by SM)? I'm fine with the rest of what SM has said. E Eng 21:27, 27 July 2018 (UTC)
I feel like keeping to the spirit of Wikipedia:Manual of Style#Keep markup simple means saying that &thinsp and &hairsp should not be used around italics, dashes, and §, since either a regular space or no space works just fine. And I agree with that general approach; HTML is not well-suited to pixel-perfect character control, and as long as there are no horribly ugly problems like actually-overlapping characters I don't think we should fuss about that sort of small thing. This sort of layout issue may be better addressed by making web browsers render text more beautifully than by throwing in a bunch of site-specific directives.
If we were to start putting &thinsp around, say, emdashes, then I think that would be a good argument for doing that in an {{ emdash}} template, since we'd want it everywhere consistently. I don't think it's a good idea to do that sort of fine-control typography on an article-by-article basis, since then it will not be done consistently.
If {{endash}}, &ndash;, and – all do exactly the same thing with no fancy spacing, I can see an argument for having two different ways to do it (one HTML-free and one for easier identification), but three ways seems like too many, when two of them serve almost exactly the same purpose.
That said, I'd rather publish the new tables with some of the rows marked as disputed/under discussion than hold the whole thing until there's consensus on every single part, so at least we can start making progress on the items that everyone agrees on, which seems like 95% of it. -- Beland ( talk) 02:16, 28 July 2018 (UTC)
keeping to the spirit of Wikipedia:Manual of Style#Keep markup simple means saying that &thinsp and &hairsp should not be used around italics, dashes, and §– No, what the linked guideline says is "Other things being equal, keep markup simple... Use HTML and CSS markup sparingly". That's not "should not be used".
HTML is not well-suited to pixel-perfect character control, and as long as there are no horribly ugly problems like actually-overlapping characters– It may not be well-suited, but at times we need to do the best we can, and we're not talking about "pixel-perfect". David Eppstein's example is an excellent one in which neither regular space nor no space is at all acceptable.
I'd rather publish the new tables with some of the rows marked as disputed/under discussion– Well, I think we have our hands full just coming up with tables which faithfully and uncontroversially centralize what is now scattered all over creation. And that would be quite an achievement. Changes to what's being recommended should be a follow-on effort.
@ David Eppstein: @ EEng: Given the above discussion, do either of you have any remaining objections to posting the revised guidelines? -- Beland ( talk) 00:07, 5 August 2018 (UTC)
Assuming we're still doing this, let's while we're at it do something about this insane pile of technical minutiae: WP:How_to_make_dashes. E Eng 23:48, 4 August 2018 (UTC)
E=mc<sup>2</sup> copy-pastes as E=mc2, and can be used in citation templates without boogering the COinS output. I'm wondering if this conflicts with anything in MOS:NUM and MOS:TM, and the main MoS page. If so, we need to figure out how to reconcile that. — SMcCandlish ☏ ¢ 😼 22:56, 27 July 2018 (UTC)
H<sub>2</sub>O is no harder than copy-pasting H₂O. This is also an accessibility concern, as screen-readers will often choke on Unicode superscripts. Headbomb { t · c · p · b} 14:16, 5 August 2018 (UTC)
EEng keeps messing with the table layout, forcing them to take huge amounts of vertical space, breaking consistency and scaling/zoom functionality, and forcing unnatural breaks for, AFAICT, no real reason other than personal preference. What looks better, [5] + [6] (inline) or [7] + [8] (random vertical breaks)? Headbomb { t · c · p · b} 10:52, 27 July 2018 (UTC)
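To make the <sub>/<sup> point concrete, here is a minimal Python sketch. It is not an existing Wikipedia or AWB feature, and the helper name `to_wiki_markup` is hypothetical. It assumes that a character counts as a superscript or subscript when its Unicode compatibility decomposition carries the `<super>` or `<sub>` tag, and rewrites runs of such characters as wiki markup.

```python
import unicodedata


def to_wiki_markup(text: str) -> str:
    """Rewrite Unicode superscript/subscript characters as <sup>/<sub> markup.

    Classification relies on the '<super>' and '<sub>' compatibility
    decomposition tags; consecutive characters of the same kind share one
    tag pair, and NFKC normalization recovers the plain character.
    """
    out = []
    current = None  # "sup", "sub", or None while outside a tag
    for ch in text:
        decomposition = unicodedata.decomposition(ch)
        if decomposition.startswith("<super>"):
            kind = "sup"
        elif decomposition.startswith("<sub>"):
            kind = "sub"
        else:
            kind = None
        if kind != current:
            if current:
                out.append(f"</{current}>")  # close the previous run
            if kind:
                out.append(f"<{kind}>")  # open a new run
            current = kind
        out.append(unicodedata.normalize("NFKC", ch) if kind else ch)
    if current:
        out.append(f"</{current}>")
    return "".join(out)


print(to_wiki_markup("H₂O and E=mc²"))  # H<sub>2</sub>O and E=mc<sup>2</sup>
```

Some symbols that are not mathematical superscripts, such as ™, also carry a `<super>` decomposition and would be converted too, so any output of a sketch like this would still need human review against MOS:TM and MOS:NUM.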
Proposed for posting to Wikipedia:Manual of Style/Text formatting § HTML character entity references and replacing the second paragraph of "Keep markup simple" at Wikipedia:Manual of Style § Miscellaneous with a link to this new section.
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#8364;, &#x20AC;, or including the character € directly.
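The equivalence in the euro-sign example can be checked with Python's standard `html` module, which decodes named, decimal, and hexadecimal references. This is only an illustration of the browser-side behaviour described above; MediaWiki's own parser is not involved here.

```python
import html

# The three reference forms from the euro-sign example, plus the raw character.
forms = ["&euro;", "&#8364;", "&#x20AC;", "€"]
decoded = [html.unescape(form) for form in forms]

print(decoded)  # ['€', '€', '€', '€']
```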
On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and &eacute; should be used instead of &#233;. For a comprehensive list of available named references, see [9].
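The "named over numeric" rule can be sketched in Python as follows. This is a hypothetical helper, not part of any Wikipedia tool, and it assumes Python's HTML 4.01 name table (`html.entities.codepoint2name`) as a stand-in for the set of names MediaWiki accepts; the two lists are not guaranteed to match. Per the note on cosmetic edits in this draft, a helper like this would only be run alongside substantive changes.

```python
import re
from html.entities import codepoint2name

# Matches decimal (&#8722;) and hexadecimal (&#x2212;) references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def prefer_named(text: str) -> str:
    """Rewrite numeric character references as named ones when a name exists.

    References with no HTML 4.01 name are left untouched.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        return f"&{name};" if name else match.group(0)

    return NUMERIC_REF.sub(replace, text)


print(prefer_named("5 &#8722; 3, caf&#233;, &#x2212;1, &#9731;"))
# 5 &minus; 3, caf&eacute;, &minus;1, &#9731;
```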
Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.
Please note: It is always OK, whether using manual or semi-automated means, to fix broken HTML entities by replacing them with characters or correct HTML entities (whichever is preferred in the specific case). (Fully automated fixes would need bot approval.) However, when changing existing text from a disfavored to favored form, especially when making large numbers of changes, WP:MEATBOT asks that editors making manual edits please pay attention to the context and be aware of exceptions to the guidelines. When using automated and semi-automated tools, remember that WP:COSMETICBOT and WP:AWBRULES ask that these tools not be used to make changes of this type unless accompanied by a more substantive (reader-visible) change. Check Wikipedia error 11 is disabled for this reason.
Characters to avoid | | ||
Avoid | Instead use | Note |
---|---|---|
… (&hellip;) | ... (i.e. 3 periods) | See MOS:ELLIPSIS. |
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ | Latin letters equivalent (I II i ii) | MOS:ROMANNUM |
Unicode fractions like ¼ ½ ¾ ⁄ | {{frac}}, {{sfrac}} | See MOS:FRAC. |
Unicode subscripts and superscripts like ¹ ⁺ ⁿ ₁ ₊ ₙ | <sup></sup> <sub></sub> | See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate. |
µ (&micro;) | μ (&mu;) | See MOS:NUM#Specific units |
Ligatures like Æ æ Œ œ | Separate letters (AE ae OE oe) | Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES. |
∑ (&sum;) ∏ (&prod;) ― (&#8213;) | Σ (&Sigma;) Π (&Pi;) — (&mdash;) | (Not to be confused with \sum and \prod, which are used within <math> blocks.) |
‘ (&lsquo;) ’ (&rsquo;) ‚ (&sbquo;) “ (&ldquo;) ” (&rdquo;) „ (&bdquo;) ´ (&acute;) ′ (&prime;) ″ (&Prime;) ` (&#96;) | Straight quotes (" and ') | Use {{coord}}, {{prime}} and {{pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS. |
‹ (&lsaquo;) › (&rsaquo;) « (&laquo;) » (&raquo;) | Use ⟨ and ⟩ for math notation. | In foreign quotations normalize angle quote marks to straight, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT. |
       
|
Normal space | These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient. Exceptions: MOS:ACRO, MOS:NBSP. |
In vertical lists | * | Proper wiki markup should be used to create vertical lists. See HELP:LIST#List basics. |
&zwj; &zwnj; | see note | Used in certain foreign-language words; see zero-width joiner/zero-width non-joiner. Should be avoided elsewhere. |
₤ | £ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) | MOS:CURRENCY; find broken instances |
Potentially confusing or technically problematic characters | | ||
Category | Coded form (direct form) | Notes |
---|---|---|
Miscellany | &amp; (&) &lt; (<) &gt; (>) &#91; ([) &#93; (]) &#39; (') &#124; (|) | Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{'}} and {{!}} or {{pipe}}. See also character-substitution templates and WP:ENCODE. |
Greek letters | &Alpha; (Α) &Beta; (Β) &Epsilon; (Ε) &Zeta; (Ζ) &Eta; (Η) &Iota; (Ι) &Kappa; (Κ) &Mu; (Μ) &Nu; (Ν) &Omicron; (Ο) &Rho; (Ρ) &Tau; (Τ) &Upsilon; (Υ) &Chi; (Χ) &kappa; (κ) &omicron; (ο) &rho; (ρ) | In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters. |
Quotes | &lsquo; (‘) &rsquo; (’) &sbquo; (‚) &ldquo; (“) &rdquo; (”) &bdquo; („) &acute; (´) &prime; (′) &Prime; (″) &#96; (`) | Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters. |
Apostrophe-like | ' ` ′ ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ | |
Dashes, minuses, hyphens | &ndash; (–) &mdash; (—) &minus; (−) - (hyphen) &shy; (soft hyphen) | Can be confused with one another. For dashes and minuses, both forms are used (as well as {{endash}} and {{emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines. |
Whitespace |         ‍ ‌
|
In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP. |
Non-printing | &lrm; &rlm; | In direct form these are nearly impossible to identify. See MOS:RTL. |
Mathematics-related | &and; (∧) &or; (∨) &lang; (⟨) &rang; (⟩) | Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{angbr}} instead of &lang; (⟨) / &rang; (⟩). |
Dots | &sdot; (⋅) &middot; (·) &bull; (•) | Can be confused with one another. Interpuncts (·) are common in horizontal lists and to indicate syllables in words. Multiplication dots (⋅) are used for math. In practice, the dots are used directly instead of the HTML entities. |
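Several rows in the table above exist because lookalike or invisible characters cannot be told apart by eye in the edit window. As a rough illustration (a hypothetical helper, not an existing editing tool), Python's standard `unicodedata` module can name each code point so that the difference becomes explicit:

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character so lookalikes and invisibles show up."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Latin A vs Greek Alpha, Latin p vs Greek rho, hyphen vs en dash vs minus,
# space vs no-break space, and the invisible zero-width joiner.
sample = "A\u0391 p\u03c1 -\u2013\u2212 \u00a0\u200d"
for line in describe(sample):
    print(line)
```

For example, the first two lines printed are `U+0041 LATIN CAPITAL LETTER A` and `U+0391 GREEK CAPITAL LETTER ALPHA`, even though the two glyphs usually look identical.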
@ David Eppstein: I thought your opinions might have changed or been refined in response to the comments by SMcCandlish in the discussion of the third draft. SMcCandlish said some interesting things about how to formulate advice against disruptive editing, which I think helped evolve my position. I've tried to integrate both your views in the new paragraph in the above fourth draft. How does that sound to you? -- Beland ( talk) 23:35, 6 August 2018 (UTC)
€ to &euro; or – to &ndash; (or vice versa) without strong consensus to do so per WP:COSMETICBOT. Some of those could end up as minor WP:GENFIXES (and only when "Unicodify page" is manually enabled), but that's already an option for a lot of things. I think (not sure) AWB exposes invisible characters (non-breaking spaces to &nbsp;, for instance), and I know for a fact that WP:WikED does it. I'd support changing obscure hex (&#x20AC;) and dec (&#8364;) codes to their regular (€) or readable (&euro;) equivalents on a character-per-character basis, though. Headbomb { t · c · p · b} 02:28, 7 August 2018 (UTC)
I find what Headbomb says about this sort of misbehavior being disallowed more convincing than your rationales for why you think it should happen, but whatever. There are also still specific problems with your draft.
— David Eppstein ( talk) 06:49, 10 August 2018 (UTC)
-- Beland ( talk) 07:22, 10 August 2018 (UTC)
FTR, I just advertised this proposal on Wikipedia talk:WikiProject Mathematics and Wikipedia talk:WikiProject Science, since these are the topic areas with the most articles that would eventually have to be changed to reach full compliance with this new recommendation. -- Beland ( talk) 06:54, 10 August 2018 (UTC)
@ David Eppstein: You requested evidence above; evidence was given. What are the implications of those findings for you? -- Beland ( talk) 08:32, 15 August 2018 (UTC)
Beland, I thought you said you weren't going to go about making trivial changes that don't alter what the reader sees. [12] E Eng 22:30, 7 August 2018 (UTC)
we should not make invisible and semantics-neutral changes to articles except as part of more substantive edits to the same articles. I continue to be concerned that you seem to be on some kind of uniformity-for-uniformity's-sake crusade, and that your machinery for "automatically find misspellings, mistakes in English grammar, and violations of the Wikipedia:Manual of Style" ( WP:Typo_Team/moss) will lead to mindless gnomish "corrections" of things that were right in the first place or simply don't need to be changed. E Eng 11:54, 12 August 2018 (UTC)
This is certainly something that should be left up to the individual editor, for various good reasons.
One good reason is that... there is no one clear correct or better way.
A second good reason is that adding another needless rule bogs down the MOS with more detail and makes it harder to learn and harder to use.
A third good reason is that creating a rule means enforcement, it puts interactions about the matter into an enforcement mode where editors are playing rules cop with other editors and this is not as functional as peer-to-peer interactions.
A fourth good reason is that there's zero evidence that it matters to the reader.
A fifth good reason is that micromanaging editors to this level is demoralizing and not how you attract and nurture a staff of volunteer editors – for instance we have a stupid micromanaging rule that I have to write "in June 1940" and not "in June of 1940" which is how I naturally write, and every stupid micromanaging rule like this is just another reason to just say screw it. As the Bible says "Thou shalt not muzzle the ox that treadeth out the corn" ( 1 Timothy 5:18, paraphrased from Deuteronomy 25:4) which updated means "Let the editor who did the actual work of looking up the refs and writing the friggen thing -- you know, the actual work of the project -- be at least allowed the satisfaction of presenting it as she thinks best, within reasonable constraints"...
This means different articles will do it differently. This annoys a certain type of editor. Oh well...
Please read that carefully and think about it. E Eng 23:19, 12 August 2018 (UTC)
@ EEng: I don't think we should blame the teacher in this example for failing to find the phrase in question because they have not chosen the words a "sensible" person would. A lot of times it makes sense to choose the rarest thing to search on because TF-IDF ranks that highly. A sensible person would expect the search engine to find the phrase in question regardless of whether the searcher picked your words or mine.
In real life, I'm a programmer, and I often need to look up operators, which are usually punctuation. For privacy reasons and because they are a potential competitor, I generally prefer to avoid using Google, and if I'm searching Wikipedia I use Wikipedia's internal search engine. In the case of special characters, that is often mandatory. For example, if you search Google for "site:en.wikipedia.org 0.75 ‰" you will also not see Great Western main line in the top search results either, because Google drops ‰ from that search entirely. I'm also a linguist, and sometimes I need to research symbols in various languages. For example, if I'm doing machine translation work with French, I might need to know more about how « is used by that language. Right now if I do a full-text search for that on the Wikipedia site, I only get one article in the search results. It's Guillemet and that's very helpful, but if I search for &laquo; there are dozens more results. If I search for "«" or "« site:en.wikipedia.org" on Google, I don't get Guillemet at all. I can file a bug report with Google that they may or may not do anything about, but I can fix Wikipedia's search engine right now by converting all the &laquo; to «.
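The &laquo;-to-« conversion described here can be sketched in Python. This is a hypothetical helper, not an existing bot or AWB rule. It assumes Python's HTML5 name table (`html.entities.html5`) as the list of known names, and the `KEEP_CODED` set is an illustrative guess at references the draft tables want left coded, not an agreed list. It would also decode isolated Greek-letter references such as &rho; that the draft prefers to keep coded, so its output needs human review, and per WP:COSMETICBOT it should not be run as a standalone automated change.

```python
import re
from html.entities import html5

# Named references the draft tables suggest keeping coded (markup-significant,
# invisible, or dash/minus forms where both styles are accepted).
# Illustrative only, not an agreed list.
KEEP_CODED = {"amp", "lt", "gt", "nbsp", "shy", "zwj", "zwnj", "lrm", "rlm",
              "ndash", "mdash", "minus"}

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_named(text: str) -> str:
    """Replace named references such as &laquo; with the character itself,
    leaving KEEP_CODED names and names Python does not know untouched."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        key = name + ";"
        if name in KEEP_CODED or key not in html5:
            return match.group(0)
        return html5[key]

    return NAMED_REF.sub(replace, text)


print(decode_named("&laquo;Bonjour&raquo; &amp; caf&eacute;&nbsp;!"))
# «Bonjour» &amp; café&nbsp;!
```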
If the number of editors this affects is "vanishingly small" in comparison to the size of the project, then the number of changes needed to implement the proposed guidelines is similarly small, and thus the amount of disruption is similarly small. If we have consensus that such changes are either neutral or small improvements (opinions range) but no one thinks they are negative or would want to undo them, and I'm willing to put in the work to do them, then what's the problem?
As for Herostratus' wisdom, I think I agree with most or all those points applied to the situation upon which they are commenting. English is complicated and people can tell the difference between clear and unclear prose without having an enormous rulebook. But most of these arguments don't jibe for me with this case, which is not about how to phrase English prose.
As for the second reason, it's a valid concern that the Manual of Style not get too long. I don't think anyone actually reads it end-to-end, though, so this is not my biggest worry. When I use it, I tend to be looking for the section that answers a particular question that's come up. It seems to me like the most logical place to put this information, but I'm open to putting it somewhere else if it's considered too obscure for a general audience. Would that be preferable? -- Beland ( talk) 02:43, 13 August 2018 (UTC)
failed to give a plausible example of what's not working now which all this will fix. E Eng 23:06, 17 August 2018 (UTC)
insource:"search_term") work for an exact search (some exceptions apply, mostly punctuation), whereas the regular expression delimiters will actually execute a regex search (which takes longer). Probably the bug you've run into is a result of search folding, were I a guessing man. -- Izno ( talk) 02:58, 19 August 2018 (UTC)
Hi all, not sure where the best place to ask this question is, but quickly: If we find a reference like this, how are we supposed to present the title of the article in our reference formatting?
The broad question is, how much (if any) work do we do to conform the reference's title to our MOS? Thanks, Cyphoidbomb ( talk) 14:43, 19 August 2018 (UTC)
Reading through Italian Greyhound, I noticed that Italian and Greyhound were both capitalised. I see why 'Italian' would be, but 'Greyhound' didn't seem right, so I started changing them. I just checked pages for other dog breeds, though, and noted that they are the same; see Whippet, Greyhound, Sloughi. I've looked at MOS:COMMONNAMES and it does not give any justification for this. Are there any other rules I'm missing, or should I edit these pages to conform with normal capitalisation rules? Girth Summit ( talk) 18:23, 23 August 2018 (UTC)
Radiohead have a song called Go to Sleep, which is listed as "Go To Sleep." on some tracklists. (In fact, this goes for every track on their album Hail to the Thief.) Should the period be included when we mention the song on Wikipedia? My vote would be no, because I see it as a stylization, but I can't find anything in the MOS that specifically backs me up or shoots me down. MOS:CONFORMTITLE might apply but it doesn't seem black and white. Popcornduff ( talk) 09:39, 25 August 2018 (UTC)
In an article, the main intro (lede) summarizes the article overall.
But it may have a subsection with its own intro (lede) and subsections. For example:
==Political activity controversies== [Summary of section/section lede] == Involvement in X== ... == Donations to Y== ... == Consultancy to Z== ...
My question is hard to express exactly so I've worded it a few different ways in the hope it expresses the underlying concern.
Thanks for any help and insight. FT2 ( Talk | email) 11:00, 25 August 2018 (UTC)
==Political activity controversies==
General information on political activity and anything not covered in the subsections below.
=== Involvement in X===
X specific material
====2016 exposé====
What he did re X
====2017 impeachment attempt====
Moves to impeach him for the exposé
=== Donations to Y===
Y specific material
HTH, Martin of Sheffield ( talk) 12:00, 25 August 2018 (UTC)
Where a large number of articles are very closely related and contain sections that are interchangeable or nearly so, should Wikipedia encourage use of a boilerplate system instead of copying the same section into three hundred or more articles? I think the "template" system might end up being too rigid. Any suggestions? Can this be done? Or should we keep on cutting and pasting such sections? Thanks! Collect ( talk) 18:55, 3 August 2018 (UTC)
{{unreferenced section}} on lots of Family tree templates. In those cases I also included instructions on how those templates could contain self-contained inline citations using {{efn-lr}} and {{notelist-lr}} (see for example Template:Kennedy family tree).
@ Deacon Vorbis: This came about as a result of a discussion at Talk:List of chemical elements#spelling, after I queried why aluminium and sulfur appeared in the same article. I was referred to WP:ALUM. That made it clear that the internationally accepted spellings should be used. It doesn't just refer to article titles, as it states "... even if they conflict with the other national spelling varieties used in the article." As I'm sure I'm not the only person who's not aware of that convention, I believe it should be listed as an exception to ENGVAR. Voice of Clam (formerly Optimist on the run) ( talk) 14:20, 24 August 2018 (UTC)
In the context of automobile articles, we would be pushing it uphill to get Americans to talk about aluminum wheels. Very few of the editors on automobile articles would know anything about chemistry or know about the WP policies for chemistry. Indeed, I only found out about WP:ALUM today and I have been contributing for over 10 years, have been working in various engineering related fields for 30 years and remember most of my high school chemistry. Stepho talk 03:58, 25 August 2018 (UTC)
we would be pushing it uphill to get Americans to talk about aluminum wheels– say what? E Eng 04:37, 25 August 2018 (UTC)
What does IUPAC mean when it says "The alternative spelling 'aluminum' is commonly used." and "The alternative spelling 'cesium' is commonly used." in its Recommendations (Table I of the Red Book)? DrKay ( talk) 07:41, 25 August 2018 (UTC)
Opinions are needed at Talk:Health and appearance of Michael Jackson#Structure. The latest discussion in that section concerns whether or not articles should have see also sections and whether what MOS:MED states at Wikipedia:Manual of Style/Medicine-related articles#Standard appendices should apply to what to do with this particular article's See also section. A permalink for the discussion is here. Flyer22 Reborn ( talk) 23:43, 29 August 2018 (UTC)
A requested move of –30– (The Wire) to -30- (The Wire) was just relisted due to lack of input. Regulars of this page are the only people I can think of who are knowledgeable of and interested in en-dashes vis-a-vis hyphens, so I notify you of the RM. (I assume my notice is neutral because I am neutral and don't care which line is used here.) -sche ( talk) 16:02, 30 August 2018 (UTC)
Do we have a rule about using bold, capital letters, exclamation points, color, font size, etc., to give one section on a talk page more prominence? Because whatever I want to say is obviously far more important than anything anyone else has to say... (: -- Guy Macon ( talk) 20:20, 23 August 2018 (UTC)
"try to avoid using bold markup", but it's hard to see what reasonable justification someone could give for using bold in section headings. -- tronvillain ( talk) 20:51, 23 August 2018 (UTC)
...but is it worth adding a specific rule forbidding it? I hear little birds, and they are chirping WP:CREEP, WP:CREEP. On the other hand, look at the current table of contents at Wikipedia:Village pump (technical)... -- Guy Macon ( talk) 23:34, 23 August 2018 (UTC)
WP:JG doesn't seem to have a strong opinion one way or the other, if I'm reading it right. I just added a Japanese figure whose death date I converted to the Gregorian calendar using an online tool. I don't actually know whether all the other dates included in the list are Julian or Gregorian. Is there a rule here? It seems like lists like that should be internally consistent. The only solid example I could think of off the top of my head, where English Wikipedia would definitely list two specific people in different countries who died on the same date, according to different calendars so that they actually died several days apart, was the famous Cervantes/Shakespeare mess here, which explicitly notes that the Shakespeare date is OS. But are all dates for 11th-century Europe assumed to be Julian? Or what? Should I change the Japanese date to Julian? Hijiri 88 ( 聖 やや) 11:59, 1 September 2018 (UTC)
In cases like this, does MOS have any guidance on what is "better"? Gråbergs Gråa Sång ( talk) 08:31, 3 September 2018 (UTC)
Many addresses, for places large and small, have the city, state, and ZIP code of the post office that delivers their mail. It is not so unusual for this to disagree with the actual city boundary. Sometimes this matters, such as for statements in articles about specific cities. Gah4 ( talk) 06:42, 4 September 2018 (UTC)
A rhyme scheme is a pattern that appears in the lines of a poem, and generally letters are used to notate the pattern, for example "ABAB CDCD". Sometimes these sequences are very long, the longest I could find on Wikipedia being "abacabadabacabaeabacabadabacabafabacabadabacabaeabacabadabacaba". There is considerable variation in how this notation is capitalized and punctuated:
In some cases, the main article says that the notation requires a specific capitalization. For example, "aBaBccDDeFFeGG" distinguishes masculine and feminine rhymes with lower vs. upper case. I think it would be nice, for human readability reasons, to settle on a consistent style for this notation. To me the sequences are nicely distinguished from sentence prose when they are either in quotes, all caps, or both. I'm open to whatever poetry-editing editors want to do, but for the sake of having a starting point for discussion, how about the below? I have asked for input from Wikipedia talk:WikiProject Poetry. -- Beland ( talk) 22:00, 25 July 2018 (UTC)
Unless otherwise required by a specific notation, rhyme schemes should generally be written when appearing in prose:
Example: This poem uses the "ABAB" rhyming pattern.
I'd like to see, at the least, evidence for A1 or A2 before we even think about embarking on such a debate, because if MOS does not need to have a rule on something, then it needs to not have a rule on that thing. E Eng 21:55, 26 July 2018 (UTC)
For A1, Rhyme scheme uses both lowercase-in-quotes and uppercase-no-quote styles in its prose, but mostly uses uppercase when explaining the different notations. I think this looks ugly and unprofessional because it is inconsistent, and the inconsistency continues when compared to other poetry pages, found by a moss scan:
-- Beland ( talk) 02:23, 28 July 2018 (UTC)
"abab"
and ABAB CDCD EFEF GHGH
to describe traditional rhymes, which should be using the same subnotation. We could also just say something like "Unless differences between uppercase and lowercase letters are being used in a meaningful way, all letters should be uppercase" and "Unless spacing or punctuation is being used in a meaningful way (such as to separate stanzas), patterns should be written without spaces or dashes; spaces are preferred over dashes when indicating groups of lines." -- Beland ( talk) 07:32, 7 August 2018 (UTC)
I need some guidance from folks who do care about this stuff so I can correctly program my database scanner and advise other editors how to fix this type of problem– How about if you just don't do that? That would save us all a huge amount of trouble. E Eng 22:44, 6 August 2018 (UTC)
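For illustration only, the two rules Beland floated at 07:32, 7 August 2018 could be checked mechanically roughly like this. This is a minimal sketch, not code from the moss scanner; `normalize_rhyme_scheme` is a hypothetical helper, and it assumes that any space or dash in a pattern marks a stanza group (so groups are kept, with single spaces preferred over dashes).

```python
import re


def normalize_rhyme_scheme(pattern: str, case_is_meaningful: bool = False) -> str:
    """Hypothetical helper sketching the proposed style rule.

    - Uppercase all letters unless case carries meaning (e.g. masculine
      vs. feminine rhymes, as in "aBaBccDDeFFeGG").
    - Treat any run of spaces and/or dashes as a group (stanza) separator
      and render each separator as a single space.
    """
    if not case_is_meaningful:
        pattern = pattern.upper()
    # Split on separator runs and drop empty pieces from leading/trailing ones.
    groups = [g for g in re.split(r"[\s-]+", pattern) if g]
    return " ".join(groups)


print(normalize_rhyme_scheme("abab-cdcd"))   # ABAB CDCD
print(normalize_rhyme_scheme("ABAB  CDCD"))  # ABAB CDCD
print(normalize_rhyme_scheme("aBaBccDDeFFeGG", case_is_meaningful=True))  # aBaBccDDeFFeGG
```

Whether case is "meaningful" cannot be detected from the string alone, which is why the sketch leaves it to the caller (that is, to a human editor checking the article's sources).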
Someone else is going to have to take over. This is hopeless. E Eng 03:21, 7 August 2018 (UTC)
@ David Eppstein: Did you have any thoughts on the alternatives raised in my comment from 07:32, 7 August 2018, above? -- Beland ( talk) 17:52, 12 August 2018 (UTC)
Well, I have a practical problem in front of me I'm trying to solve, which will require investing some work, regardless of the outcome of this discussion. David, you expressed a concern that any guideline Wikipedia puts out would need to follow academic sources. That's an actionable concern. First, it's a debatable question that has pros and cons that can be explored. We also started to look at potential sources to follow, which I think are inconsistent enough to say we can set our own style to some degree. We currently disagree about how to proceed; the only way I know how to come to consensus is to discuss arguments pro and con on their merits and compare opinions and try to find compromise. There are also alternatives to modifying the Manual of Style; for example, I could advise editors who are looking at these instances to check articles for sources and see if there is a particular style being used for a particular reason, and we could develop practices bottom-up rather than top-down. I was hoping to get at least local consensus before posting an RFC, but if neither of you wish to attempt to reach consensus among us, I can just go ahead and open that now and see what other editors have to say. -- Beland ( talk) 03:09, 13 August 2018 (UTC)
Greetings all, I'm currently updating the style-checking code that reports to Wikipedia:Typo Team/moss, and I need some clarity on which HTML character entity references (things like &amp;) are allowed or preferred. Variations that are not allowed or which are disfavored would be brought to the attention of human editors, along with other suspected style and spelling errors. There are occasional mentions of such entities in the Manual of Style, but no general rules that I could find. I would propose the following:
(edited to reflect the below comments)
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, "&euro;" is the same as "&#8364;", "&#x20AC;", or including the character "€" directly. For a comprehensive list, see List of XML and HTML character entity references. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
What do folks think? -- Beland ( talk) 19:39, 14 July 2018 (UTC)
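As a quick check of the equivalence described in the proposal, Python's standard `html.unescape` decodes all three reference styles for the euro sign to the same character. This is only a sketch for verification; it says nothing about how MediaWiki itself parses entities.

```python
import html

# The three reference styles for the euro sign, plus the character itself.
forms = {
    "named": "&euro;",
    "decimal": "&#8364;",
    "hexadecimal": "&#x20AC;",
    "direct": "€",
}
decoded = {style: html.unescape(text) for style, text in forms.items()}

for style, char in decoded.items():
    print(f"{style:12} {forms[style]:10} -> {char} (U+{ord(char):04X})")
```

Every row prints U+20AC, and 0x20AC is 8364 in decimal, which is why the decimal and hexadecimal forms name the same character.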
The proposal should be revised to make it clear how it relates to the advice already in the MOS at WP:MOS#Keep markup simple, which says, for example, that &Alpha; is explicit whereas Α (the upper-case form of Greek α) may be misidentified as the Latin A.
Also the proposal should indicate where this addition would go into the MOS; context matters.
The proposal contains the statement "The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards." That's only partially true; in the version I use, there are a variety of special characters to choose from, but when I hover over them, there isn't any little hint that pops up telling me what the name of the character is. So it is hard to be sure if a character is an n dash or a minus. In another case, it's hard to tell a prime from an apostrophe. I've learned to tell an n dash from a hyphen, but I'll bet there are lots of editors who can't. Jc3s5h ( talk) 22:18, 14 July 2018 (UTC)
(Edited to reflect the below discussion)
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#8364;, &#x20AC;, or including the character € directly. For a comprehensive list, see List of XML and HTML character entity references [2].
In choosing between the numeric reference, named reference, and direct character methods, Wikipedia never uses the numeric reference when a named reference is available, and it usually prefers direct character input over named references (and edits in this direction are made by semi-automated systems like AutoWikiBrowser). For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup. These preferences are detailed in the table below, and some instances where a given character is preferably not used at all (except where that character is itself the topic of discussion) are noted. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
Category | Preferred forms | Exceptions and notes |
---|---|---|
ASCII characters | ! " % & ' + < = > [ ] | Sometimes proximity to other characters causes misinterpretation of & , < , > , , , or ' as part of HTML markup or wiki markup. In these cases, use & , < , > , [ , ] or ' . |
Latin and Germanic letters | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ Œ œ Š š Ÿ | Instead of ligatures (Æ, æ, Œ, œ) write two separate letters, except in proper names and in text in languages in which they are standard – see Wikipedia:Manual of Style § Ligatures. |
Greek letters | Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϑ ϒ ϖ | When written standalone (not part of a Greek word with other Greek characters), the following can be used to reduce confusion with similar-looking Latin alphabet letters: Α Β Ε Ζ Η Ι Κ Μ Ν Ο Ρ Τ Υ Χ κ ο ρ . μ (mu) and Σ (sigma) are nearly identical to µ (micro) and ∑ (sum), but the other characters are not used in Wikipedia so there is no potential for confusion.
|
Quote marks | ‘ ’ ‚ “ ” „ ´ ′ ″
| ASCII quote marks are generally preferred. Wikipedia:Manual of Style/Dates and numbers § Specific units says not to use ′ and ″ for inches and feet. |
Dashes | –/– —/— ― ­
|
― is not used by Wikipedia. For more info on ­ (optional hyphen) see
MOS:SHY.
|
Whitespace and non-printing |       ‌ ‍ ‎ ‏
|
  ,   , ‌ , and ‍ are generally unnecessary. For more info on text direction, see
MOS:RTL.
|
Math | × ÷ √ ∝ ∝ ¬ ± ∂ ∇ ℵ ℜ ℑ ℘ ∀ ∃ ∈ ∉ ∋ ∅ ∏ ∑ ∠ ∧ (∧ confused with ^) ∨ (∨ confused with v) ∩ ∪ ∫ ∴ ∼ ≅ ≈ ≠ ≡ ≤ ≥ ⊂ ⊃ ⊄ ⊆ ⊇ ⊕ ⊗ ⊥ ⌈ ⌉ ⌊ ⌋ ⟨ (⟨ confused with <) ⟩ (⟩ confused with >)
|
In some cases TeX markup is preferred to Unicode characters; see
Wikipedia:Manual of Style/Mathematics § Typesetting of mathematical formulae. × (× ) is used in article titles and also for hybrid species. ∑ (sum) should not be used; Wikipedia uses the nearly identical Σ (sigma).
|
Currency | ¢ £ ¤ ¥ € $ | |
Non-English punctuation | ¿ ¡ « » ‹ ›
| ‹ and › are not used by Wikipedia; < and > can be used instead. |
Dots | · • ⋅ …
| "..." is preferred to "…"; see MOS:ELLIPSIS. Wiki markup should be used instead of these for lists; see Wikipedia:Manual of Style/Lists § List layout. |
Diacritics | ¨ ¸ ‾ ˜ ˆ | |
Arrows | ← ↑ → ↓ ↔ ↵ ⇐ ⇑ ⇒ ⇓ ⇔ | |
Other symbols | ¦ § © ® ™ ° µ ¶ † ‡ ƒ ‰ ◊ ♠ ♣ ♥ ♦ | µ (micro) is not used by Wikipedia; use μ (lowercase Greek letter mu) instead - see Wikipedia:Manual of Style/Dates and numbers § Specific units |
Superscript and subscript | ¹ ² ³ ª º | Do not use Unicode subscripts and superscripts like these for numbers, per Wikipedia:Manual of Style/Superscripts and subscripts; use <sup> and <sub> instead. |
Fractions | ¼ ½ ¾ ⁄
| These are not used unless discussing the characters themselves; for alternatives, see Wikipedia:Manual of Style/Dates and numbers § Fractions and ratios. |
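To make the "never use the numeric reference when a named reference is available" rule concrete, here is a rough sketch of how a scanner might flag it. It is hypothetical (not the actual moss code) and uses Python's `html.entities.codepoint2name`, which only covers HTML 4 names, so HTML5-only names such as &dollar; are out of its scope.

```python
from __future__ import annotations

import re
from html.entities import codepoint2name

# Matches decimal (&#8722;) and hexadecimal (&#x2212;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def suggest_named_refs(wikitext: str) -> list[tuple[str, str]]:
    """Hypothetical check: list numeric references that have an HTML 4 name.

    Returns (found, suggested) pairs; references with no HTML 4 name
    (e.g. U+2010 HYPHEN) are left alone.
    """
    suggestions = []
    for match in NUMERIC_REF.finditer(wikitext):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        if name is not None:
            suggestions.append((match.group(0), f"&{name};"))
    return suggestions


print(suggest_named_refs("5 &#8722; 3, 2 &#x2212; 1, and &#x2010;"))
# [('&#8722;', '&minus;'), ('&#x2212;', '&minus;')]
```

The sketch only handles the numeric-versus-named step; deciding when a named reference should instead become the direct character (é rather than &eacute;) depends on the exception list being discussed here.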
Above is a draft of a definitive list of whether the HTML reference or the character itself should be used, as suggested by other editors above. I noticed a few things:
∼) and ~ (ASCII tilde) seem to be used interchangeably, but ∼ itself is used very rarely. -- Beland ( talk) 08:12, 15 July 2018 (UTC)
usually prefers direct character input over named references– That's too sweeping. I can see this is gonna take a lot of discussion. For starters, pinging David Eppstein for his thoughts on literal or symbolic for math symbols (not meaning to imply there's one simple answer to that). Not pinging SM because he'll find his way here without doubt and his user name is too hard to get right and it's late and I'm tired. E Eng 08:32, 15 July 2018 (UTC)
&minus; as otherwise it's too difficult to distinguish from &ndash;. Otherwise I don't feel strongly, but I know I have seen legions of random AWB users replace &times; (e.g.) by its Unicode character. So we should not encourage replacements that go the other way. — David Eppstein ( talk) 16:30, 15 July 2018 (UTC)
μ as the metric prefix for micro. I know some Unicode characters were created for obscure reasons, such that Wikipedia has no interest in using them; I infer from its low numerical code value that µ (U+00B5, &micro;) exists as a way of coding the micro symbol that was used in some pre-Unicode character codes that didn't provide for most Greek letters, to permit round-tripping between those older character codes and Unicode. According to the Unicode Consortium, the Greek letter character is preferred [1]. Maybe use the Greek letter mu directly, whether in a Greek word, the archaic stand-alone symbol for micrometer, or the metric prefix, and explicitly encourage editors to replace µ (U+00B5) with μ (U+03BC). Jc3s5h ( talk) 10:31, 15 July 2018 (UTC)
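The distinction Jc3s5h describes can be seen with Python's standard `unicodedata` module: the two characters have different names and code points, and Unicode's NFKC normalization folds the micro sign into Greek mu. A sketch, assuming a targeted replacement is what editors would want (full NFKC would also rewrite superscripts, ligatures, and fractions); `replace_micro_sign` is a hypothetical helper.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # µ, the Latin-1 (ISO 8859-1) micro sign
GREEK_MU = "\u03bc"    # μ, GREEK SMALL LETTER MU

print(unicodedata.name(MICRO_SIGN))  # MICRO SIGN
print(unicodedata.name(GREEK_MU))    # GREEK SMALL LETTER MU
print(MICRO_SIGN == GREEK_MU)        # False

# NFKC maps the micro sign to Greek mu (a compatibility decomposition);
# NFC leaves it unchanged.
print(unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU)  # True


def replace_micro_sign(text: str) -> str:
    """Hypothetical cleanup: swap U+00B5 for U+03BC and touch nothing else."""
    return text.replace(MICRO_SIGN, GREEK_MU)


print(replace_micro_sign("5 \u00b5m"))  # 5 μm
```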
References
EEng made a good find, that &dollar; was missing. It turns out that this is because List of XML and HTML character entity references only goes up to HTML 4, and HTML 5 has a ton more, listed here. Given the length of the resulting table if we include all of them, maybe we should just say "use the character itself except for those listed below" and list the ones where named references should be used? (And maybe continue to list the characters that should not be used at all?) -- Beland ( talk) 03:53, 16 July 2018 (UTC)
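The HTML 4 versus HTML5 gap is easy to confirm with Python's standard `html.entities` tables: `name2codepoint` holds only the HTML 4 names, while `html5` holds the much larger HTML5 set (keyed with the trailing semicolon). A small sketch for checking whether a given name is HTML5-only:

```python
import html
from html.entities import html5, name2codepoint

# name2codepoint: HTML 4 entity names only (no trailing semicolons).
# html5: HTML5 names, keyed with a trailing semicolon where required.
print("euro" in name2codepoint)          # True  (HTML 4 has &euro;)
print("dollar" in name2codepoint)        # False (&dollar; is new in HTML5)
print(html5["dollar;"])                  # $
print(html.unescape("&dollar;"))         # $
print(len(name2codepoint) < len(html5))  # True
```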
a-zA-Z0-9`~!@#$%^&*()-_=+[]{};':",./<>? should be given via &foo; or {some template}. Also, the table mixes advice on how to express various characters with advice on whether and when to use various characters. Not saying that's bad, just worth noting. E Eng 04:12, 16 July 2018 (UTC)
á. More generally I am in favor of using unicodes over html entities or templates in most cases, with exceptions for characters like &amp; (when written next to something that would cause it to expand to a different entity) or &minus; (because there is too much possibility for confusion with other dash-like characters). Also, as an aside, the text above about avoiding ligatures is too strong; when these characters occur in the standard spelling of a name (e.g.), we should write them that way even when we are writing in English. — David Eppstein ( talk) 04:25, 16 July 2018 (UTC)
– by a ratio of about 10.6:1.
Well, you were using your guess that the numbers were the other way around– No, you're mixing up two different things. I conjectured that ndashes and mdashes, together, make up the bulk (counting each use separately) of all these not-on-the-keyboard characters; that was without regard to how those characters were expressed (literal vs. symbolic).
that the UI is the way that it is may be an indication that there is not great support for using – and friends– WP's facilities and interfaces are full of debris that's little used or even "impossible" to use (e.g. template parameters that want to present information that an RfC has determined should never be presented). Trying to infer how things are spozed to be based on things you see in the UI will get you way off track very, very fast.
I can generally tell the difference between dashes of different lengths– So can I, easily in the rendered page, but in the wikisource only with a bit of effort, if I make a point of looking. It's that last bit that's the rub: in the rendered page an ndash vs. mdash look like – vs. —, but in the wikisource they're much more similar, i.e. – vs. —. (What you see in that sentence may depend on your skin, so your mileage may vary.) Thus it's easy in copyediting to not notice that the wrong one is present, and that's why symbolic names should be used instead. (If we really cared we'd suggest that hyphens be rendered as &hyp; as well. I actually tried that once in an article but got laughed off the stage, so we'll just have to live with using the literal -. What I usually do is when I see e.g. a date range like 1899-1920, I just change the literal hypheny-dashy thing that's there to &ndash;, so that I know it's the right thing.)
I haven't heard a good argument for why those shouldn't just be used directly– Clearly a quotation in a language using a non-Roman script should just present that text literally. For everything else, there are a lot of pros and cons relating to how many different special symbols are used (in a given article), the extent to which each one is used repeatedly, how potentially confusable they are with one another or with something else not even used on the page, the likely sophistication of editors who might work on the article, and a lot more. Here's a random example: WP:MOSNUM says arcminutes should be denoted by a prime and not an apostrophe or a single quote, i.e. ′ but not ‘ or '. Once again, you have to be looking to notice if the wrong one is there; thus MOSNUM suggests that the markup &prime; be used to save editors squinting. Unfortunately different considerations come into play for different symbols, so separate analyses are needed in each case. That's why I predicted this discussion would take a long time.
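To show why these dash-like characters are hard to tell apart in a monospace edit box, the sketch below prints each one's code point, Unicode name, and HTML 4 entity name (if any) using Python's standard `unicodedata` and `html.entities` modules. It only illustrates the lookalike problem; it does not decide which form an article should use.

```python
import unicodedata
from html.entities import codepoint2name

# Hyphen-like characters that are easy to confuse in wikitext.
DASHLIKE = ["-", "\u2010", "\u2013", "\u2014", "\u2212", "\u2015"]

for ch in DASHLIKE:
    name = codepoint2name.get(ord(ch))
    ref = f"&{name};" if name else "(no HTML 4 name)"
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch):15}  {ref}")
```

Only the en dash, em dash, and minus sign have HTML 4 names (&ndash;, &mdash;, &minus;); the horizontal bar's &horbar; exists only in HTML5.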
As for the general direction of the advice, using characters directly seems to be the recommended best practice for web development generally. It's more WYSIWYG and easier for web editors to read and think about. It also fits the goal of not forcing editors to learn HTML in order to be able to use Wikipedia; they can just input and edit these characters in the same way they do elsewhere like Word or phone apps or other web sites. We also have a UI right below the text-being-edited box which encourages people to add the characters directly; it would be weird if the advice is to generally use the references because that's not what the system is designed to encourage. The escaping system was originally designed to allow input of special characters that were part of SGML or HTML itself (like angle brackets). Later it became a way to work around the limitations of ASCII. But modern web sites all use Unicode now, as does Wikipedia, so it's a bit of an obsolete workaround. I think any system where you have to learn a special language for telling a computer something is less user-friendly than a system where you can express your intention in the way you would express it to other humans. -- Beland ( talk) 06:29, 16 July 2018 (UTC)
General comment: This discussion may affect WP:CHECKWIKI error 11. The error is currently deactivated. -- 11:10, 16 July 2018 (UTC)
<code>...</code> or perhaps with {{ kbd}}, whatever looks better (semantically, it can be either – it's code when viewed in the wikitext but also input when you're entering it). If we don't like any of the faint-background effects, use bare <kbd>...</kbd>, which just uses monospace. I would go with <code> because the table already uses a light grey and it blends in well, while also not requiring any template calls.–
—/—
". Try: "– (–
), — (—
),". Also, "For more info on (optional hyphen) see MOS:SHY" is a misuse of parentheses (round brackets), seeming for some kind of emphasis. Should just remove them. 
; like  
it is generally only used for kerning in templates and such; there is usually not any reason to manually insert either into an article.
"‹ and › are not used by Wikipedia; < and > can be used instead" is wrong; they are not the same characters and should not be confused. If we need to illustrate French quotation style, etc., use the correct characters, not less-than and greater-than, which serve an entirely different purpose. This is pretty much exactly like hyphen vs. dash vs. minus.
Posted to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references
Proposed as new subsection titled "HTML character entity references" under Wikipedia:Manual of Style § Miscellaneous, replacing the second paragraph of "Keep markup simple".
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#8364;, &#x20AC;, or including the character € directly.
On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Edits favoring these conventions are made by semi-automated systems like AutoWikiBrowser. For a comprehensive list of available named references, see [3].
Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.
Characters to avoid | | ||
Avoid | Instead use | Note |
---|---|---|
… (… ) | ... (i.e. 3 periods) | See MOS:ELLIPSIS. |
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ
| Equivalent Latin letters (I II i ii) | MOS:ROMANNUM |
Unicode fractions like ¼ ½ ¾ ⁄
|
{{ frac}}, {{ sfrac}} | See MOS:FRAC. |
Unicode subscripts and superscripts like ¹ ⁺ ⁿ ₁ ₊ ₙ
| <sup></sup> <sub></sub> | See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate. |
µ (µ )
|
μ (μ )
|
See MOS:NUM#Specific units |
Ligatures like Æ æ Œ œ
| Separate letters (AE ae OE oe) | Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES. |
∑ (∑ ) ∏ (∏ ) ― (― )
|
Σ (Σ ) Π (Π ) — (— )
|
(Not to be confused with \sum and \prod, which are used within <math> blocks.) |
‘ (‘ ) ’ (’ ) ‚ (‚ ) “ (“ ) ” (” ) „ („ ) ´ (´ ) ′ (′ ) ″ (″ ) ` (` )
| Straight quotes (" and ') | Use {{ coord}}, {{ prime}} and {{ pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS. |
‹ (‹ ) › (› ) « (« ) » (» ) | Use ⟨ and ⟩ for math notation. | In foreign quotations normalize angle quote marks to straight, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT. |
       
|
Normal space | These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking ( ) and regular spaces are normally sufficient. Exceptions: MOS:ACRO, MOS:NBSP. |
In vertical lists | * | Proper wiki markup should be used to create vertical lists. See Help:List#List basics. |
‍ ‌
|
see note | Used in certain foreign-language words; see zero-width joiner / zero-width non-joiner. Should be avoided elsewhere. |
₤ | £ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) | MOS:CURRENCY; find broken instances |
Potentially confusing or technically problematic characters | | ||
Category | Coded form (direct form) | Notes |
---|---|---|
Miscellany | & (& ) < (< ) > (> ) [ ( ) ] ( ) ' (' ) | (| )
| Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{ '}} and {{ !}} or {{ pipe}}. See also character-substitution templates and WP:ENCODE. |
Greek letters | Α (Α ) Β (Β ) Ε (Ε ) Ζ (Ζ ) Η (Η ) Ι (Ι ) Κ (Κ ) Μ (Μ ) Ν (Ν ) Ο (Ο ) Ρ (Ρ ) Τ (Τ ) Υ (Υ ) Χ (Χ ) κ (κ ) ο (ο ) ρ (ρ )
|
In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters. |
Quotes | ‘ (‘ ) ’ (’ ) ‚ (‚ ) “ (“ ) ” (” ) „ („ ) ´ (´ ) ′ (′ ) ″ (″ ) ` (` )
| Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters. |
Apostrophe-like | ' ` ′ ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ |
|
Dashes, minuses, hyphens | – (– ) — (— ) − (− ) - (hyphen) ­ (soft hyphen)
| Can be confused with one another. For dashes and minuses, both forms are used (as well as {{ endash}} and {{ emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{ hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines. |
Whitespace |         ‍ ‌
| In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP. |
Non-printing | ‎ ‏
| In direct form these are nearly impossible to identify. See MOS:RTL. |
Mathematics-related | ∧ (∧ ) ∨ (∨ ) ⟨ (⟨ ) ⟩ (⟩ )
| Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{ angbr}} instead of ⟨ ) / (⟩ ) |
Dots | ⋅ (⋅ ) · (· ) • (• ) | Can be confused with one another. Interpuncts (· ) are common in horizontal lists and to indicate syllables in words. Multiplication dots (⋅ ) are used for math. In practice, the dots are used directly instead of the HTML entities. |
FTR, as of the July 1, 2018 database dump, [ is used about 329 times and &lbracket; is used about 91 times, so I picked the more common one. -- Beland ( talk) 15:04, 18 July 2018 (UTC)
I posted this to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references (there's another section there that talks about Unicode PUA and RTL characters) and cross-referenced from Wikipedia:Manual of Style § Miscellaneous. Feel free to edit the live version as needed. -- Beland ( talk) 05:56, 20 July 2018 (UTC)
Might be worth adding a comment in the Greek notes that the same sort of thing applies to Cyrillic letters that look like Latin and Greek ones; use the entity codes for clarity when discussing particular characters, but use the Unicode in actual Russian, Ukrainian, etc. words. We probably needn't dwell on the details, since there's another proposal open for centralizing all the scattered Cyrillic-related material to one page. Then again, that's mostly to be about transliteration, so maybe the Greek section in the table should be Greek and Cyrillic? — SMcCandlish ☏ ¢ 😼 04:11, 22 July 2018 (UTC)
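To illustrate the Cyrillic point, the sketch below lists the code point and Unicode name of Latin A, Greek Alpha, and Cyrillic A, which render almost identically. `describe` is a hypothetical helper; the last line assumes HTML5's &Acy; entity (U+0410), which Python's `html.unescape` recognizes, would serve as the "entity code" for a standalone Cyrillic letter.

```python
from __future__ import annotations

import html
import unicodedata


def describe(text: str) -> list[str]:
    """Hypothetical helper: code point and Unicode name of each character."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]


# Latin A, Greek Alpha, and Cyrillic A look the same on screen.
for line in describe("A\u0391\u0410"):
    print(line)

# HTML5 defines named references for both lookalikes.
print(html.unescape("&Alpha;") == "\u0391", html.unescape("&Acy;") == "\u0410")
```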
So after I posted the tables proposed above, David Eppstein reverted, with the edit summary "what part of "I think you should be more patient"..."Try proposing something narrower and more specific" do you not understand?".
I think I did not see those remarks by David Eppstein and SMcCandlish because they were posted in the discussion ("Fraction slash" below) about the "Slashes" section of the main MOS page, which I did not check for comments before updating the "Text formatting" MOS subpage. SMcCandlish wanted a one-word change to the "Slashes" section, which he implemented. I think David Eppstein was commenting on the change he reverted, as he then wrote:
"Bludgeon" sounds pretty ugly and mean. I started a project to spell-check all Wikipedia, which is intended to improve its readability and credibility. Along the way I noticed that editors have also occasionally misspelled HTML character entity references. I thought as long as we're cleaning up the misspellings, we might as well clean up any undesirable forms, because right now we don't seem to be representing them consistently. I started this discussion because I couldn't find any guidance in the Manual of Style to help me write the code to correctly flag undesirable forms vs. ignore desirable forms.
Mediawiki markup uses this part of HTML syntax, and if we have a preferred form for these things we'd want to communicate that to editors, and the Manual of Style is the place to document choices of style rather than technical how-to for the benefit of editors, so I don't understand the criticism that this is not the right place for this sort of guideline. Especially since Wikipedia:Manual of Style#Keep markup simple already discusses exactly this point, and the other sections linked from the proposed tables also address which characters are preferred.
We already encourage editors to make edits that have no reader-visible changes but do have editor-visible changes intended to make wikitext easier to read and thus articles easier to edit. That's the whole point of Wikipedia:WikiProject Wikify and wikification. I do agree there are some edits that don't improve readability all that much that aren't that worthwhile on their own, like changing "==xx==" to "== xx ==". This seems less trivial than that. I'd also note we have Wikipedia:HTML5, a project which is doing nothing but replacing obsolete HTML tags with newer ones, with hopefully no user-visible changes.
There are fewer than 20,000 articles that even have HTML character entity references at all, fewer than 3.5% of all articles. Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal, and in reality it will probably take months or years to manually change all the instances, if that's what we want to do. At worst, editors who notice these changes happening will be educated about the desired way of doing things, and be more likely to input characters that way when adding new text.
Given that editors seem to use characters a lot more than references, and given that characters are built into the Wikipedia UI, it seems a lot less disruptive to move toward characters than away from them.
To illustrate the difference it makes to editors, consider an editor who comes across "S&atilde;o Paulo" in wikitext. To most people who are not web developers, that looks like a typographical error. Some English-speaking people might correct it to "Sao Paulo", which is often seen in English, or, getting the idea there might be an accent there, to "Sáo Paulo", which is incorrect. "São Paulo" is what Portuguese speakers expect to see; it's what they type with their keyboards, and it's what appears in Word docs and on the Portuguese Wikipedia and on Google Translate, and in the readable parts of other web sites. With "São Paulo", everyone knows exactly what's going on, and there's no need to waste time doing a search on the meaning of "atilde" or "ã" or whatnot.
If I were making the rules, I think I'd keep it simple and say to use characters directly except for otherwise invisible characters and those that cause technical problems when used directly. I'd actually be fine if we used ASCII hyphens for all of our dashes, but I'm not complaining if people who can see the difference on their monitors want to upgrade some of them to emdashes to make things look pretty as in the golden years of paper typography. That would make a much smaller table than the one proposed above, but given that other editors seem to feel more strongly about making it easy to tell the difference between certain lookalike characters, I think that table now represents a pretty good compromise. Leaving dashes and quotes as they are takes the biggest chunks of potential work off the table, anyway.
Given that this is proposing a simple general rule and then listing all the desirable exceptions to it, I'm not sure that a narrower proposal would make sense. The volume of comments has been relatively small, so having multiple discussions about the same topic it seems would just burn more editor time. I am, however, open to actionable suggestions. -- Beland ( talk) 08:03, 22 July 2018 (UTC)
Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal– then there are some things you really don't understand; if you made changes like this to 3% of articles in one day, or one week, or even one month, you'd be strung up by your URLs.
I haven't been following that last week of discussion so I don't know where we are and what the open issues are, but if you want this to see the light of day you need to be prepared to keep plugging for quite some time to work through all the details with all interested parties (not that I even know how to find them). I've gone through an effort like this myself elsewhere in MOS and it can be an exhausting task, though you will be quite rightly congratulated by all in the end if you can pull it off, because it will be a very useful achievement for the project. E Eng 19:05, 23 July 2018 (UTC)
How do other editors feel about David Eppstein's proposal for a rule that "such changes be made only as part of other substantive changes to articles"? Personally, I don't see the need for that, given the arguments I made above, but of course I'll implement whatever the consensus is. -- Beland ( talk) 20:46, 23 July 2018 (UTC)
This is a whole lot of stuff being discussed at once. I'll cover it in the order in which I'm seeing it come up above:
... not &hellip; or …, at MOS:ELLIPSIS; and use μ or &mu; not µ, at MOS:UNITSYMBOLS; and so on), so the idea that it's off-topic or out-of-scope for MoS doesn't fly. B) MoS has already been updated with a footnote against automated "enforcement" of MoS stuff, including cross-references to the COSMETICBOT policy and to ArbCom decisions about it. The fact that someone could go on a bot-mediated enforcement rampage is not an argument against MoS having line-items about various stuff; the fact that we have rules against doing that is already sufficient to address the rare problem. The fact that someone just lost their AWB access as a result of doing something like that should discourage a repeat. Rules do not need 100% compliance to be useful, nor does failure to achieve 100% compliance mean they're insufficient; otherwise civil society would be impossible.
And what was once an okay idea can become a poor one over time as circumstances change. E.g., the cutover last month to a new HTML linter for the parser broke all kinds of stuff that used to be "okay" or "we don't care", but which is no longer okay, and thus we now do care. The most obvious of these is that unclosed inline elements used to be forcibly closed at the opening of a block element and this is no longer the case, resulting in badly broken, mis-rendering HTML in at least tens of thousands of pages. People have been cleaning this up, including with semi-automation tools like AWB and JWB, yet no one is having a shit-fit about it. People will have shit-fits about such activity if it's PoV pushing (e.g. changing all "U.S." to "US", or changing all unspaced em-dash parenthesizing to use spaced en dashes), but they don't lose it over technical cleanup. Another example is that <br> breaks the output of at least two of the available edit-mode syntax highlighters, and needs to be changed to <br />; I've already fixed one "Help:"-namespace page from the 2000s that was recommending <br>, and there are probably some others that need fixing in this regard.
— SMcCandlish ☏ ¢ 😼 16:38, 26 July 2018 (UTC)
Whitespace other than the non-breaking and regular space should be avoided in prose.
– to &ndash;: I actually proposed that several years ago for the same reason, and did not get consensus. Apparently the average editor, with their fonts, can see the difference clearly, and people were dismissive of the idea because the editing tools below the edit window provide a – button for directly inserting the Unicode character. I think, therefore, this is a lost cause. Editors having trouble seeing the difference between –, —, −, and - need to use WP:User CSS or their browser's font settings to use a font for editing that works better for them. I wrote instructions on how to do this at Help:User style#User CSS for a monospaced coding font. It's not absolutely perfect; the minus and hyphen are still hard to distinguish. If I find a better, free coding font than Roboto Mono, I'll put it at the front of the font stack.
Apparently the average editor, with their fonts, can see the difference clearly. I suspect instead that the great majority of editors don't even know there is a difference (and just use a hyphen); most of those who know the difference insert the characters directly using the click-to-insert gizmo but don't really notice or care what they look like in the edit window, since they never look back; and the very small number of us who are copyediting and checking these things have learned to deal somehow with the difficulty of distinguishing them. In my case, wherever I see a direct/literal character which I know should be an ndash but I'm not sure, I just change it to {ndash} so I know it's right. But I'd rather we encouraged editors to use a coded form in the first place to save that trouble. Unfortunately that would create a new flashpoint for my next point, which is...
MoS doesn't constrain editors in any way as to adding new material– You know that and I know that, but as sure as day follows night someone's gonna paste in a direct rho, someone else is gonna change that to &rho; (as recommended in the table), and the first guy's gonna change it back, saying "I like it this way." Having said that, looking over the whole table now I don't see very many cases where that might happen (unless we adopt a recommendation to use coded forms of ndash and mdash) but I still think the wider this is advertised for comment in advance the less trouble there will be.
"Humor is Mankind's greatest blessing." — Mark Twain". Also not a rule; it just looks better. Neither of these uses is vital. But they're not objectionable. So, we have a handful of use cases we can document, and then discourage it otherwise. Put it in a footnote, probably. I'm a big fan of footnoting "there are some geeky exceptions" stuff instead of clouding the central advice. On horizontal marks: Well, you can try proposing glyph-to-code conversion if you want, but don't hold your breath. With my font tweaking solution, I have no difficulty at all telling en dashes and hyphens apart, in rendered or source view. "The wider this is advertised": Sure, but not while we're still banging on it just with 3 or 4 people. Iron out the obvious kinks, or even more surely than day follows night, people will "strongly oppose" the whole thing on the basis of some nitpick we should have already anticipated. — SMcCandlish ☏ ¢ 😼 20:58, 27 July 2018 (UTC)
These are sometimes used for precision positioning in templates but should not be used in prose. Use either non-breaking (&nbsp;) or regular spaces. So who's OK with my formulation "These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient" (with or without a footnote as suggested by SM)? I'm fine with the rest of what SM has said. E Eng 21:27, 27 July 2018 (UTC)
I feel like keeping to the spirit of Wikipedia:Manual of Style#Keep markup simple means saying that &thinsp and &hairsp should not be used around italics, dashes, and §, since either a regular space or no space works just fine. And I agree with that general approach; HTML is not well-suited to pixel-perfect character control, and as long as there are no horribly ugly problems like actually-overlapping characters I don't think we should fuss about that sort of small thing. This sort of layout issue may be better addressed by making web browsers render text more beautifully than by throwing in a bunch of site-specific directives.
If we were to start putting &thinsp around, say, emdashes, then I think that would be a good argument for doing that in an {{ emdash}} template, since we'd want it everywhere consistently. I don't think it's a good idea to do that sort of fine-control typography on an article-by-article basis, since then it will not be done consistently.
If {{ endash}}, &ndash;, and – all do exactly the same thing with no fancy spacing, I can see an argument for having two different ways to do it (one HTML-free and one for easier identification), but three ways seems like too many, when two of them serve almost exactly the same purpose.
That said, I'd rather publish the new tables with some of the rows marked as disputed/under discussion than hold the whole thing until there's consensus on every single part, so at least we can start making progress on the items that everyone agrees on, which seems like 95% of it. -- Beland ( talk) 02:16, 28 July 2018 (UTC)
keeping to the spirit of Wikipedia:Manual of Style#Keep markup simple means saying that &thinsp and &hairsp should not be used around italics, dashes, and §– No, what the linked guideline says is "Other things being equal, keep markup simple... Use HTML and CSS markup sparingly". That's not "should not be used".
HTML is not well-suited to pixel-perfect character control, and as long as there are no horribly ugly problems like actually-overlapping characters– It may not be well-suited, but at times we need to do the best we can, and we're not talking about "pixel-perfect". David Eppstein's example is an excellent one in which neither regular space nor no space is at all acceptable.
I'd rather publish the new tables with some of the rows marked as disputed/under discussion– Well, I think we have our hands full just coming up with tables which faithfully and uncontroversially centralize what is now scattered all over creation. And that would be quite an achievement. Changes to what's being recommended should be a follow-on effort.
@ David Eppstein: @ EEng: Given the above discussion, do either of you have any remaining objections to posting the revised guidelines? -- Beland ( talk) 00:07, 5 August 2018 (UTC)
Assuming we're still doing this, let's while we're at it do something about this insane pile of technical minutiae: WP:How_to_make_dashes. E Eng 23:48, 4 August 2018 (UTC)
E=mc<sup>2</sup> copy-pastes as E=mc2, and can be used in citation templates without boogering the COinS output. I'm wondering if this conflicts with anything in MOS:NUM and MOS:TM, and the main MoS page. If so, we need to figure out how to reconcile that. — SMcCandlish ☏ ¢ 😼 22:56, 27 July 2018 (UTC)
H<sub>2</sub>O is no harder than copy-pasting H₂O. This is also an accessibility concern, as screen readers will often choke on Unicode superscripts. Headbomb { t · c · p · b} 14:16, 5 August 2018 (UTC)
EEng keeps messing with the table layout, forcing them to take huge amounts of vertical space, breaking consistency, scaling/zoom functionality, and forcing unnatural breaks for, AFAICT, no real reason but personal preferences. What looks better, [5] + [6] (inline) or [7] + [8] (random vertical breaks)? Headbomb { t · c · p · b} 10:52, 27 July 2018 (UTC)
Proposed for posting to Wikipedia:Manual of Style/Text formatting § HTML character entity references and replacing the second paragraph of "Keep markup simple" at Wikipedia:Manual of Style § Miscellaneous with a link to this new section.
HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#8364;, &#x20AC;, or including the character € directly.
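The equivalence of the three forms in the euro-sign example can be checked outside a browser. This is a minimal illustration using Python's standard-library `html.unescape`, which decodes named, decimal, and hexadecimal references; it is not how MediaWiki's own parser is implemented.

```python
import html

# The euro sign written three ways: by name, by decimal, and by hexadecimal code point.
references = ["&euro;", "&#8364;", "&#x20AC;"]

# html.unescape decodes each reference to the character it stands for.
decoded = [html.unescape(ref) for ref in references]
print(decoded)  # ['€', '€', '€']

# Decimal 8364 and hexadecimal 20AC are the same code point, U+20AC.
assert ord("€") == 8364 == 0x20AC
```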
On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and &eacute; should be used instead of &#233;. For a comprehensive list of available named references, see [9].
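What "prefer the named reference when one exists" means mechanically can be sketched as follows. This assumes Python's `html.entities.codepoint2name` (HTML 4 names only) as a stand-in for the list of available names; `prefer_named` is a hypothetical helper for illustration, and per the COSMETICBOT note in this draft it is not meant to be run as an unattended bot.

```python
import re
from html.entities import codepoint2name  # HTML 4 code point -> entity name, e.g. 8722 -> "minus"

# Matches decimal (&#8722;) and hexadecimal (&#x2212;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def prefer_named(text):
    """Replace numeric references with the named reference for the same code point,
    when one exists; references with no named equivalent are left untouched."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        return f"&{name};" if name else match.group(0)
    return NUMERIC_REF.sub(repl, text)

print(prefer_named("x &#8722; y, caf&#233;, &#x2212;1, &#9731;"))
# x &minus; y, caf&eacute;, &minus;1, &#9731;
```

The snowman (U+2603) has no HTML 4 name, so its numeric reference survives, which mirrors the "if a named reference is available" condition.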
Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.
Please note: It is always OK, whether using manual or semi-automated means, to fix broken HTML entities by replacing them with characters or correct HTML entities (whichever is preferred in the specific case). (Fully automated fixes would need bot approval.) However, when changing existing text from a disfavored to favored form, especially when making large numbers of changes, WP:MEATBOT asks that editors making manual edits please pay attention to the context and be aware of exceptions to the guidelines. When using automated and semi-automated tools, remember that WP:COSMETICBOT and WP:AWBRULES ask that these tools not be used to make changes of this type unless accompanied by a more substantive (reader-visible) change. Check Wikipedia error 11 is disabled for this reason.
Characters to avoid
Avoid | Instead use | Note |
---|---|---|
… (&hellip;) | ... (i.e. 3 periods) | See MOS:ELLIPSIS. |
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ | Equivalent Latin letters (I II i ii) | See MOS:ROMANNUM. |
Unicode fractions like ¼ ½ ¾ ⁄ | {{ frac}}, {{ sfrac}} | See MOS:FRAC. |
Unicode subscripts and superscripts like ¹ ⁺ ⁿ ₁ ₊ ₙ | <sup></sup> <sub></sub> | See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate. |
µ (&micro;) | μ (&mu;) | See MOS:NUM#Specific units. |
Ligatures like Æ æ Œ œ | Separate letters (AE ae OE oe) | Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES. |
∑ (&sum;) ∏ (&prod;) ― (&#8213;) | Σ (&Sigma;) Π (&Pi;) — (&mdash;) | (Not to be confused with \sum and \prod, which are used within <math> blocks.) |
‘ (&lsquo;) ’ (&rsquo;) ‚ (&sbquo;) “ (&ldquo;) ” (&rdquo;) „ (&bdquo;) ´ (&acute;) ′ (&prime;) ″ (&Prime;) ` (&#96;) | Straight quotes (" and ') | Use {{ coord}}, {{ prime}} and {{ pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS. |
‹ (&lsaquo;) › (&rsaquo;) « (&laquo;) » (&raquo;) | Use ⟨ and ⟩ for math notation. | In foreign quotations normalize angle quote marks to straight, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT. |
       
| Normal space | These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient. Exceptions: MOS:ACRO, MOS:NBSP. |
In vertical lists | * | Proper wiki markup should be used to create vertical lists. See HELP:LIST#List basics. |
&zwj; &zwnj; | See note | Used in certain foreign-language words; see zero-width joiner/ zero-width non-joiner. Should be avoided elsewhere. |
₤ | £ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) | MOS:CURRENCY; find broken instances |
Potentially confusing or technically problematic characters
Category | Coded form (direct form) | Notes |
---|---|---|
Miscellany | &amp; (&) &lt; (<) &gt; (>) &#91; ([) &#93; (]) &#39; (') &#124; (pipe) | Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{ '}} and {{ !}} or {{ pipe}}. See also character-substitution templates and WP:ENCODE. |
Greek letters | &Alpha; (Α) &Beta; (Β) &Epsilon; (Ε) &Zeta; (Ζ) &Eta; (Η) &Iota; (Ι) &Kappa; (Κ) &Mu; (Μ) &Nu; (Ν) &Omicron; (Ο) &Rho; (Ρ) &Tau; (Τ) &Upsilon; (Υ) &Chi; (Χ) &kappa; (κ) &omicron; (ο) &rho; (ρ) | In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters. |
Quotes | &lsquo; (‘) &rsquo; (’) &sbquo; (‚) &ldquo; (“) &rdquo; (”) &bdquo; („) &acute; (´) &prime; (′) &Prime; (″) &#96; (`) | Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters. |
Apostrophe-like | ' ` ′ ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ | |
Dashes, minuses, hyphens | &ndash; (–) &mdash; (—) &minus; (−) - (hyphen) &shy; (soft hyphen) | Can be confused with one another. For dashes and minuses, both forms are used (as well as {{ endash}} and {{ emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{ hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines. |
Whitespace |         ‍ ‌
|
In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP. |
Non-printing | &lrm; &rlm; | In direct form these are nearly impossible to identify. See MOS:RTL. |
Mathematics-related | &and; (∧) &or; (∨) &#10216; (⟨) &#10217; (⟩) | Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{ angbr}} instead of &#10216; (⟨) / &#10217; (⟩). |
Dots | &sdot; (⋅) &middot; (·) &bull; (•) | Can be confused with one another. Interpuncts (·) are common in horizontal lists and to indicate syllables in words. Multiplication dots (⋅) are used for math. In practice, the dots are used directly instead of the HTML entities. |
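The table's recurring note, that lookalike and invisible characters are nearly impossible to tell apart in direct form, can be made concrete with a small diagnostic. This is a hypothetical helper, not a feature of Wikipedia, AWB, or WikEd; it assumes only Python's standard `unicodedata` module and names every non-ASCII character in a string.

```python
import unicodedata

def describe_non_ascii(text):
    """List each non-ASCII character with its code point and Unicode name,
    so lookalikes and invisible characters become visible."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            name = unicodedata.name(ch, "<unnamed>")
            report.append(f"U+{ord(ch):04X} {name}")
    return report

# "Ohm" spelled with a Greek capital omicron, then a no-break space,
# an en dash, and a zero-width joiner; all look harmless in an edit box.
sample = "\u039Fhm\u00A0\u2013\u200D"
for line in describe_non_ascii(sample):
    print(line)
# U+039F GREEK CAPITAL LETTER OMICRON
# U+00A0 NO-BREAK SPACE
# U+2013 EN DASH
# U+200D ZERO WIDTH JOINER
```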
@ David Eppstein: I thought your opinions might have changed or been refined in response to the comments by SMcCandlish in the discussion of the third draft. SMcCandlish said some interesting things about how to formulate advice against disruptive editing, which I think helped evolve my position. I've tried to integrate both your views in the new paragraph in the above fourth draft. How does that sound to you? -- Beland ( talk) 23:35, 6 August 2018 (UTC)
&euro; to € or &ndash; to – (or vice versa) without strong consensus to do so per WP:COSMETICBOT. Some of those could end up as minor WP:GENFIXES (and only when "Unicodify page" is manually enabled), but that's already an option for a lot of things. I think (not sure) AWB exposes invisible characters (non-breaking spaces to &nbsp; for instance), and I know for a fact that WP:WikED does it. I'd support changing obscure hex (&#x20AC;) and dec (&#8364;) codes to their regular (€) or readable (&euro;) equivalents on a character-per-character basis though. Headbomb { t · c · p · b} 02:28, 7 August 2018 (UTC)
I find what Headbomb says about this sort of misbehavior being disallowed more convincing than your rationales for why you think it should happen, but whatever. There are also still specific problems with your draft.
— David Eppstein ( talk) 06:49, 10 August 2018 (UTC)
-- Beland ( talk) 07:22, 10 August 2018 (UTC)
FTR, I just advertised this proposal on Wikipedia talk:WikiProject Mathematics and Wikipedia talk:WikiProject Science, since these are the topic areas with the most articles that would eventually have to be changed to reach full compliance with this new recommendation. -- Beland ( talk) 06:54, 10 August 2018 (UTC)
@ David Eppstein: You requested evidence above; evidence was given. What are the implications of those findings for you? -- Beland ( talk) 08:32, 15 August 2018 (UTC)
Beland, I thought you said you weren't going to go about making trivial changes that don't alter what the reader sees. [12] E Eng 22:30, 7 August 2018 (UTC)
we should not make invisible and semantics-neutral changes to articles except as part of more substantive edits to the same articles. I continue to be concerned that you seem to be on some kind of uniformity-for-uniformity's-sake crusade, and that your machinery for "automatically find misspellings, mistakes in English grammar, and violations of the Wikipedia:Manual of Style" ( WP:Typo_Team/moss) will lead to mindless gnomish "corrections" of things that were right in the first place or simply don't need to be changed. E Eng 11:54, 12 August 2018 (UTC)
This is certainly something that should be left up to the individual editor, for various good reasons.
One good reason is that... there is no one clear correct or better way.
A second good reason is that adding another needless rule bogs down the MOS with more detail and makes it harder to learn and harder to use.
A third good reason is that creating a rule means enforcement, it puts interactions about the matter into an enforcement mode where editors are playing rules cop with other editors and this is not as functional as peer-to-peer interactions.
A fourth good reason is that there's zero evidence that it matters to the reader.
A fifth good reason is that micromanaging editors to this level is demoralizing and not how you attract and nurture a staff of volunteer editors – for instance we have a stupid micromanaging rule that I have to write "in June 1940" and not "in June of 1940" which is how I naturally write, and every stupid micromanaging rule like this is just another reason to just say screw it. As the Bible says "Thou shalt not muzzle the ox that treadeth out the corn" ( 1 Timothy 5:18, paraphrased from Deuteronomy 25:4) which updated means "Let the editor who did the actual work of looking up the refs and writing the friggen thing -- you know, the actual work of the project -- be at least allowed the satisfaction of presenting it as she thinks best, within reasonable constraints"...
This means different articles will do it differently. This annoys a certain type of editor. Oh well...
Please read that carefully and think about it. E Eng 23:19, 12 August 2018 (UTC)
@ EEng: I don't think we should blame the teacher in this example for failing to find the phrase in question because they have not chosen the words a "sensible" person would. A lot of times it makes sense to choose the rarest thing to search on because TF-IDF ranks that highly. A sensible person would expect the search engine to find the phrase in question regardless of whether the searcher picked your words or mine.
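Why searching on the rarest term helps can be shown with a toy TF-IDF calculation. This is a minimal sketch assuming the textbook weighting tf × log(N/df); Wikipedia's actual search engine uses a more elaborate ranking, and the four-document corpus below is invented for illustration.

```python
import math

def idf(term, documents):
    """Inverse document frequency, log(N / df): the fewer documents contain a term, the larger it is."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0

def tf_idf(term, doc, documents):
    """Raw term count in one document, weighted by the term's IDF across all documents."""
    return doc.count(term) * idf(term, documents)

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["the", "gradient", "of", "the", "line", "is", "0.75", "‰"],
    ["the", "main", "line", "runs", "west"],
    ["the", "station", "is", "busy"],
    ["the", "track", "is", "level"],
]

for term in ["the", "line", "‰"]:
    print(term, round(tf_idf(term, docs[0], docs), 3))
# the 0.0
# line 0.693
# ‰ 1.386
```

"the" occurs in every document, so it carries no weight even though it appears twice in the first one, while "‰" occurs in only one document and scores highest. A search engine that drops the symbol, as Beland reports Google doing, throws away the most discriminating term.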
In real life, I'm a programmer, and I often need to look up operators, which are usually punctuation. For privacy reasons and because they are a potential competitor, I generally prefer to avoid using Google, and if I'm searching Wikipedia I use Wikipedia's internal search engine. In the case of special characters, that is often mandatory. For example, if you search Google for "site:en.wikipedia.org 0.75 ‰" you will not see Great Western main line in the top search results either, because Google drops ‰ from that search entirely. I'm also a linguist, and sometimes I need to research symbols in various languages. For example, if I'm doing machine translation work with French, I might need to know more about how « is used by that language. Right now if I do a full-text search for that on the Wikipedia site, I only get one article in the search results. It's Guillemet and that's very helpful, but if I search for &laquo; there are dozens more results. If I search for "«" or "« site:en.wikipedia.org" on Google, I don't get Guillemet at all. I can file a bug report with Google that they may or may not do anything about, but I can fix Wikipedia's search engine right now by converting all the &laquo; to «.
If the number of editors this affects is "vanishingly small" in comparison to the size of the project, then the number of changes needed to implement the proposed guidelines is similarly small, and thus the amount of disruption is similarly small. If we have consensus that such changes are either neutral or small improvements (opinions range) but no one thinks they are negative or would want to undo them, and I'm willing to put in the work to do them, then what's the problem?
As for Herostratus' wisdom, I think I agree with most or all of those points as applied to the situation upon which they are commenting. English is complicated and people can tell the difference between clear and unclear prose without having an enormous rulebook. But to me most of these arguments don't jibe with this case, which is not about how to phrase English prose.
As for the second reason, it's a valid concern that the Manual of Style not get too long. I don't think anyone actually reads it end-to-end, though, so this is not my biggest worry. When I use it, I tend to be looking for the section that answers a particular question that's come up. It seems to me like the most logical place to put this information, but I'm open to putting it somewhere else if it's considered too obscure for a general audience. Would that be preferable? -- Beland ( talk) 02:43, 13 August 2018 (UTC)
failed to give a plausible example of what's not working now which all this will fix. E Eng 23:06, 17 August 2018 (UTC)
insource:"search_term") work for an exact search (some exceptions apply, mostly punctuation), whereas the regular expression delimiters will actually execute a regex search (which takes longer). Probably the bug you've run into is a result of search folding, were I a guessing man. -- Izno ( talk) 02:58, 19 August 2018 (UTC)
Hi all, not sure where the best place to ask this question is, but quickly: If we find a reference like this, how are we supposed to present the title of the article in our reference formatting?
The broad question is, how much (if any) work do we do to conform the reference's title to our MOS? Thanks, Cyphoidbomb ( talk) 14:43, 19 August 2018 (UTC)
Reading through Italian Greyhound, I noticed that Italian and Greyhound were both capitalised. I see why 'Italian' would be, but 'Greyhound' didn't seem right, so I started changing them. I just checked pages for other dog breeds though, and noted that they are the same - see Whippet, Greyhound, Sloughi. I've looked at MOS:COMMONNAMES and it does not give any justification for this - are there any other rules I'm missing, or should I edit these pages to conform with normal capitalisation rules? Girth Summit ( talk) 18:23, 23 August 2018 (UTC)
Radiohead have a song called Go to Sleep, which is listed as "Go To Sleep." on some tracklists. (In fact, this goes for every track on their album Hail to the Thief.) Should the period be included when we mention the song on Wikipedia? My vote would be no, because I see it as a stylization, but I can't find anything in the MOS that specifically backs me up or shoots me down. MOS:CONFORMTITLE might apply but it doesn't seem black and white. Popcornduff ( talk) 09:39, 25 August 2018 (UTC)
In an article, the main intro (lede) summarizes the article overall.
But it may have a subsection with its own intro (lede) and subsections. For example:
==Political activity controversies==
[Summary of section/section lede]
== Involvement in X==
...
== Donations to Y==
...
== Consultancy to Z==
...
My question is hard to express exactly so I've worded it a few different ways in the hope it expresses the underlying concern.
Thanks for any help and insight. FT2 ( Talk | email) 11:00, 25 August 2018 (UTC)
==Political activity controversies==
General information on political activity and anything not covered in the subsections below.
=== Involvement in X===
X specific material
====2016 exposé====
What he did re X
====2017 impeachment attempt====
Moves to impeach him for the exposé
=== Donations to Y===
Y specific material
HTH, Martin of Sheffield ( talk) 12:00, 25 August 2018 (UTC)
Where a large number of articles are very closely related and contain sections that are interchangeable, or nearly so, should Wikipedia encourage use of a boilerplate system instead of copying the same section across three hundred or more articles? I think the "template" system might end up being too rigid. Any suggestions? Can this be done? Or should we keep on cut-and-pasting such sections? Thanks! Collect ( talk) 18:55, 3 August 2018 (UTC)
{{ unreferenced section}} on lots of Family tree templates. In those cases I also included instructions on how those templates could contain self-contained inline citations using {{ efn-lr}} and {{ notelist-lr}} (see for example Template:Kennedy family tree).
@ Deacon Vorbis: This came about as a result of a discussion at Talk:List of chemical elements#spelling, after I queried why aluminium and sulfur appeared in the same article. I was referred to WP:ALUM. That made it clear that the internationally accepted spellings should be used. It doesn't just refer to article titles, as it states "... even if they conflict with the other national spelling varieties used in the article." As I'm sure I'm not the only person who's not aware of that convention, I believe it should be listed as an exception to ENGVAR. Voice of Clam (formerly Optimist on the run) ( talk) 14:20, 24 August 2018 (UTC)
In the context of automobile articles, we would be pushing it uphill to get Americans to talk about aluminum wheels. Very few of the editors on automobile articles would know anything about chemistry or know about the WP policies for chemistry. Indeed, I only found out about WP:ALUM today and I have been contributing for over 10 years, have been working in various engineering related fields for 30 years and remember most of my high school chemistry. Stepho talk 03:58, 25 August 2018 (UTC)
we would be pushing it uphill to get Americans to talk about aluminum wheels– say what? E Eng 04:37, 25 August 2018 (UTC)
What does IUPAC mean when it says "The alternative spelling 'aluminum' is commonly used." and "The alternative spelling 'cesium' is commonly used." in its Recommendations (Table I of the Red Book)? DrKay ( talk) 07:41, 25 August 2018 (UTC)
Opinions are needed at Talk:Health and appearance of Michael Jackson#Structure. The latest discussion in that section concerns whether or not articles should have "See also" sections and whether what MOS:MED states at Wikipedia:Manual of Style/Medicine-related articles#Standard appendices should apply to what to do with this particular article's See also section. A permalink for the discussion is here. Flyer22 Reborn ( talk) 23:43, 29 August 2018 (UTC)
A requested move of –30– (The Wire) to -30- (The Wire) was just relisted due to lack of input. Regulars of this page are the only people I can think of who are knowledgeable of and interested in en-dashes vis-a-vis hyphens, so I notify you of the RM. (I assume my notice is neutral because I am neutral and don't care which line is used here.) -sche ( talk) 16:02, 30 August 2018 (UTC)
Do we have a rule about using bold, capital letters, exclamation points, color, font size, etc., to give one section on a talk page more prominence? Because whatever I want to say is obviously far more important than anything anyone else has to say... (: -- Guy Macon ( talk) 20:20, 23 August 2018 (UTC)
"try to avoid using bold markup", but it's hard to see what reasonable justification someone could give for using bold in section headings. -- tronvillain ( talk) 20:51, 23 August 2018 (UTC)
...but is it worth adding a specific rule forbidding it? I hear little birds, and they are chirping WP:CREEP, WP:CREEP. On the other hand, look at the current table of contents at Wikipedia:Village pump (technical)... -- Guy Macon ( talk) 23:34, 23 August 2018 (UTC)
WP:JG doesn't seem to have a strong opinion one way or the other, if I'm reading it right. I just added a Japanese figure whose death date I converted to the Gregorian calendar using an online tool. I don't actually know whether all the other dates included in the list are Julian or Gregorian. Is there a rule here? It seems like lists like that should be internally consistent. The only solid example I could think of off the top of my head, where English Wikipedia would definitely list two specific people in different countries who died on the same date, according to different calendars so that they actually died several days apart, was the famous Cervantes/Shakespeare mess here, which explicitly notes that the Shakespeare date is OS. But are all dates for 11th-century Europe assumed to be Julian? Or what? Should I change the Japanese date to Julian? Hijiri 88 ( 聖 やや) 11:59, 1 September 2018 (UTC)
In cases like this, does MOS have any guidance on what is "better"? Gråbergs Gråa Sång ( talk) 08:31, 3 September 2018 (UTC)
Many addresses, for places large and small, have the city, state, and ZIP code based on the post office that delivers the mail. It is not so unusual for this to disagree with the actual city boundary. Sometimes this matters, such as in articles about specific cities. Gah4 ( talk) 06:42, 4 September 2018 (UTC)