From Wikipedia, the free encyclopedia

Move

Wikipedia:Article size → Wikipedia:Manual of Style (article size): Consolidating naming per Wikipedia_talk:Manual_of_Style#Poll Gnevin ( talk) 16:28, 24 May 2010 (UTC)

Fine by me. -- Eraserhead1 < talk> 17:30, 24 May 2010 (UTC)
Oppose: Per the descriptions in our List of guidelines, this strikes me much more as an editing guideline ("non-content advice about categorization, navigation or other how-to-edit advice") than a style guideline ("advice on writing style, formatting, grammar, and more").— DCGeist ( talk) 22:56, 25 May 2010 (UTC)
Remove from the MOS? Gnevin ( talk) 12:33, 26 May 2010 (UTC)
Rebranded as editing guideline, its natural category.— DCGeist ( talk) 18:53, 26 May 2010 (UTC)
Yeah, this is definitely not a style page. — SMcCandlish Talk⇒ ʕ(Õلō Contribs. 09:08, 17 September 2010 (UTC)

Time to revisit the technical problems argument, advise against splitting most long list articles

It's getting toward 2011 now, and I'm feeling more and more that we need to revisit the technical side of this. At some point WP has to stop catering to broken, obsolete technology like browsers from the 1990s and early 2000s. While "reader fatigue" is a real issue, and WP:SUMMARY provides a way to solve that problem, not all articles are intended to be read from top to bottom, and some will become user-unfriendly and downright editor-hateful if split into multiple pages. I have anecdotal but to me rather strong evidence that the technical concerns are essentially obsolete: a long and very heavily linked-to glossary list article has had absolutely zero length-related problems reported in over 4.5 years.

The most obvious cases are list articles of various sorts, including glossaries. People (other than really bored people with way too much time on their hands) do not usually try to read such articles from top to bottom; they load the page and search for the term they are interested in, if a #-link didn't bring them directly to it from another page. Splitting such pages makes in-page search more difficult, and actually frustrates readers' ability to find information. E.g. if I search for "foo" in a long one-page list, I may find a "foo" entry, and/or various mentions of "foo" as applied in several contexts, in various other entries, while if the article is split, I might not find a "foo" entry because it's in another page, and assume there's no information on the topic, and/or I may miss a lot of contextual information about "foo" because I have not realized there is more of it in another article in the split series.

I have put off converting Glossary of cue sports terms, one of the articles I have worked on the most, into a split article for some time (it's been tagged with {{ Longish}} for 2.5+ years), because I have yet to see one single case of someone's browser accidentally truncating the page, a user reporting a crash or other technical problem, or any reader suggesting that the document is too long for simple human usability reasons. This despite the article being over 240K, being linked to (usually many times at different entries) in almost all cue sports articles, and being edited nearly two-thousand times, by registered and anon users from all over the world, with greatly varying levels of technological currency/obsolescence. I've also resisted splitting because {{ Cuegloss}} would have to be redone in a very complex way that would so complicate use that most editors who bother to use it now to create helpful glossary links for non-billiards-expert users would surely abandon it. I can no longer see any good reason for (and do see several good reasons against) splitting this or any similar article, even though until recently I have long been tinkering with test code for splitting the article and adapting the templates that work with it, to comply with this guideline's article length advice.

At any rate, if after 1900+ edits by hundreds of users the page has never been truncated by a browser that can't handle long textedit fields, this strongly suggests that the truncation concern is no longer a valid one in anything near significant numbers of incidents; such browsers are today so rare that the odds of it happening are now so low that it need not be even mentioned here, and if it does happen, it will be obvious and someone will fix it.

I propose that a partial rewrite is also in order to strongly suggest that most types of list articles remain unsplit, either regardless of length, or unless longer than X where this variable is some number we arrive at that is very much more than the current number, like maybe 1MB. Lists that are easily divided into clear sub-topical sections each with numerous entries could be given as a clear exception, something that perhaps should be split after 100K or so (such as events relating to some topic in the 1700s, 1800s, 1900s, or vehicles manufactured by Ford, BMW, Toyota, etc.). But most list articles, including glossaries, are not divisible logically this way, only arbitrarily, and WP:SUMMARY cannot logically apply to them. They are not intended for start-to-end reading, but for in-page searching. Meanwhile, splitting them not only greatly impedes such searching, it makes creation and use of tools that work with such articles (e.g. Template:Cuegloss) much more difficult.

SMcCandlish Talk⇒ ʕ(Õلō Contribs. 10:03, 17 September 2010 (UTC)

The only thing I would add to the above excellent summary is that this particular article is high use. It gets approximately 600 views per day. Guessing that about 80% of those views are not repeats, we're talking about 175,000 different people viewing the article per year without a single length complaint (600 × 0.8 = 480 per day; 480 × 365 ≈ 175,000).-- Fuhghettaboutit ( talk) 12:08, 17 September 2010 (UTC)
Cool. I didn't know about that stats tool, or had forgotten about it. — SMcCandlish Talk⇒ ʕ(Õلō Contribs. 18:05, 17 September 2010 (UTC)
I agree and would cite the additional problem of repeating footnotes in long lists. The large (190kb) List of islands of Maine features some footnotes repeated 5 times, 10 times, in one case close to 40 times. The only feasible way to split up that table would be arbitrarily by letters of the alphabet, i.e. Maine Islands from A-G, H-Q, R-Z etc. But that would leave the footnotes a hopeless tangle on separate pages. So I am going to modify the guidance on the main page to suggest that lists be broken into separate pages only when the organizational logic of the list suggests it. ElijahBosley (talk ☞) 16:40, 21 November 2010 (UTC)
Sounds good. -- Eraserhead1 < talk> 16:48, 21 November 2010 (UTC)
I removed this sentence in follow-up to this thread. — W F C— 11:02, 27 February 2011 (UTC)
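The viewer estimate given earlier in this thread can be checked with a few lines of Python. This is only a sketch of that arithmetic: the 600 views per day and the 80% non-repeat share are the figures quoted in the thread (the latter explicitly a guess), and the function name is made up for illustration. It also assumes, as the original estimate does, that nobody returns on a later day.

```python
VIEWS_PER_DAY = 600        # figure quoted in the thread
NON_REPEAT_SHARE = 0.8     # the thread's guess, not a measured value
DAYS_PER_YEAR = 365


def estimated_viewers_per_year(views_per_day, non_repeat_share, days=DAYS_PER_YEAR):
    """Multiply daily views by the guessed non-repeat share and by days per year."""
    return round(views_per_day * non_repeat_share * days)


yearly_estimate = estimated_viewers_per_year(VIEWS_PER_DAY, NON_REPEAT_SHARE)
print(f"Estimated different viewers per year: {yearly_estimate:,}")
```

The result is 175,200, which matches the "about 175,000" quoted in the thread.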

"Article Length" and "Long article"

I keep these two as redirects to "Wikipedia talk:Article size" because I had significant trouble finding the article in the first place.-- Jax 0677 ( talk) 12:15, 9 January 2011 (UTC)

"hit preview to see the page size warning"?

I tried following the instructions at Wikipedia:Article size#Measuring "readable prose" size with the Time travel article, but I didn't see any "page size warning" in the preview. Did I miss it, or do you only get such a warning if the size is above a certain limit? Either way I think the specifics of where and when such a warning appears should be mentioned in this section. Hypnosifl ( talk) 04:33, 3 March 2011 (UTC)

The page size notice was removed a little while ago. I have updated the page. SilkTork * YES! 11:15, 6 May 2011 (UTC)

Exceptions: Lists, Tables

The guide appears to be referring to standalone lists - "Lists, tables, and articles summarizing certain fields are exceptions." Though, surely, the advice relating to not splitting certain lists would also apply to embedded lists. If a list is constructed in such a way that splitting it or summarising it would be inappropriate, then it doesn't really matter if the list is a standalone or is embedded; if it shouldn't be split, then it shouldn't be split.

What does "articles summarizing certain fields" refer to?

Possible new wording: Lists, tables, and material summarizing certain fields are exceptions. If there is no "natural" way to split long lists or tables, it may be best to leave them intact. They act as summaries and starting points and in the case of some broad subjects or lists either do not have a natural division point or are more easily word-searched as a single set. This is especially the case when buttressing cites are repeated throughout the list or table. In such cases, the list or table should nonetheless be kept as short as feasible.

Does the paragraph regarding "Major subsections..." belong in the Exceptions: Lists, Tables section? SilkTork * Tea time 18:49, 26 June 2011 (UTC)

"If a list is constructed in such a way that splitting it or summarising it would be inappropriate, then it doesn't really matter if the list is a standalone or is embedded," -- I have not yet seen a list that could not be summarised and embedded lists can always be "split". In such a case "splitting" just wouldn't involve breaking the embedded list into parts, but scooping it out of the article and creating a stand-alone list with the embedded list as its core.

As far as your suggestion goes, I don't have an opinion yet. I find this to be one of the most puzzling guidelines on Wikipedia, particularly since it's been labeled an "editing guideline". Good raise 08:29, 27 June 2011 (UTC)

I'm working on the guideline now, seeing if it can be made clearer. I take your point that lists can be summarised. I think the Exceptions section is pointing to certain material that cannot easily be summarised or split per WP:Summary style. I understand the thinking that a summary is already condensed, but as professional writers, journalists, students and teachers know, even summarised material can be reduced. Doing a précis was once a standard English language task; when doing professional writing and journalism one works to word lengths, rather than what the writer feels the topic "needs" (and much writing tends to benefit from cutting back to the bare essentials - encyclopedic writing benefits more than most); and students who can précis their notes down into short manageable bites find that of immense benefit. I think there is a balance to be struck between being excessive and being elliptical - though this would apply to all forms of writing within Wikipedia articles, not just summaries or lists.
I think the main point of this guideline is that articles should not be so long that they overwhelm the reader, so when material becomes too long or detailed, it should be split out into a sub-article per WP:Summary style, and that process can carry on for as long as there is useful and notable information. So we have The Rolling Stones with a section on Band members which splits off into Mick Jagger which has a section on Albums which splits off into Primitive Cool which has a track listing which splits off into Let's Work (Mick Jagger song).
An informed decision needs to be made as to what information should be in a parent article, and which is better contained within a sub article. I feel this guideline should be helping editors to decide when to summarise and split, how to summarise and split, and what to summarise and split. In a sense, as well as being stand-alone advice on article size, it also stands between Wikipedia:Layout and Wikipedia:Summary style, as it includes elements of both.
I'm wondering not only why lists are exceptions as regards summarising, but also why they are considered not to be part of the "readable prose". Readers will read and study lists if they contain important information. If a list contains information that is considered not to be part of the essential reading in the article, then one would question why the list is there in the first place. Lists should not be purely decorative, and when a list deals in excessive statistics it rubs up against WP:NOTSTATS. SilkTork * Tea time 10:06, 27 June 2011 (UTC)
I suspect lists (or rather tables) are not considered as "readable prose" to simplify measuring of "article size". Perhaps the original author reckoned most lists would require an inconsiderable amount of time to study compared to the length of article source text necessary to generate them. As for what this guideline ought to do... While I appreciate the value of an editing guideline helping editors (newer ones in particular) making the kind of decisions you name, I'd just as much like to have a style guideline describing where community wide consensus lies in regards to how long or short articles should be. Presently this page tries to be both and is neither very well. Good raise 12:56, 27 June 2011 (UTC)

Size no longer viewable in history?

The "How to find articles by size" section says "You can find the size of a page including the markup in kilobytes [kb] from the page history". I have tried this for several long articles (examples [1] [2]) and I do not see anything on the history pages indicating article size. Am I just missing it, or was this feature removed? Does one have to use one of the "external tools" to find an article's size now? Should this guideline be updated? -- IllaZilla ( talk) 17:31, 1 September 2011 (UTC)

They look fine to me; the first one shows "10:52, August 29, 2011 IllaZilla (talk | contribs | block) (107,104 bytes)" which is definitely showing the size. Wizardman Operation Big Bear 18:48, 1 September 2011 (UTC)
D'oh! I must be blind...I didn't think to look down the revisions list, I was looking for something along the top or the left, like when you used to open the edit window and it would say "this article is xx kb" up top. However, it does seem that the size is now displayed in bytes rather than kb, so perhaps a minor change in the wording is needed. -- IllaZilla ( talk) 18:59, 1 September 2011 (UTC)

Deprecating the WP:SIZERULE shortcut

On 6 September I introduced the WP:SIZEGUIDE shortcut as a replacement for WP:SIZERULE, on the grounds that the criteria are not a hard-and-fast rule, and because "readable prose size" is all too often mistaken for "article size". It was reverted on 19 September without a rebuttal of my reason for doing so. Per WP:BRD, I am starting this discussion to see whether that was one editor, or whether there is a wider consensus to continue referring to it as a "rule". — WFC— 09:51, 20 September 2011 (UTC)

I agree with your change, it's definitely just a guideline (and one that many of WP's best articles ignore, as a look at User:Dr pda/Featured article statistics will reveal) not a rule. Wasted Time R ( talk) 11:26, 21 September 2011 (UTC)
While it may be just a guideline, the section is titled "A rule of thumb", not "A guide of thumb", which is why WP:SIZERULE was used. It really doesn't matter what we use since it's pretty obvious to anyone that it's not a rule and WP:SIZERULE is not likely to be deleted. -- AussieLegend ( talk) 14:02, 17 July 2012 (UTC)

Unsplitting

In discussions over the articles iOS version history, Android version history, and History of iOS jailbreaking, I've noticed this guideline being trotted out as a reason that, despite failing policy (in the former two, flagrantly and inherently violating WP:NOT#DIRECTORY, in the latter, systemic violations of WP:V throughout the parent article leading to blind application of the guideline), the articles should remain separate if split due to SIZE, which leads to an interesting question: if split articles fall below 32-40KB of readable prose, should they be merged back? And on the point of the latter article, should cleanup be recommended before splitting? Sceptre ( talk) 23:38, 22 October 2011 (UTC)

Edit request

Include redirect WP:AS in "Shortcuts" box. 71.146.20.62 ( talk) 03:44, 28 November 2011 (UTC)

Images as part of the total download of an article to a browser

Images have been discussed here before with regard to the total size of an article. Even though an image may only have a few tens of characters of text in the edit box, the thumbnail of that image will require a few hundred kb of download bandwidth. The TOOLONG guideline should give some indication of the upper limit of acceptable image use. For instance, this version of the article List of American Civil War Generals (Confederate) contains 237 thumbnail images, each requiring about 162 kb in my 1024x768 browser window. This makes for a very unwieldy article of more than 38 Mb! Let's add a paragraph about the TOOLONG problems associated with too much bandwidth taken by images. Binksternet ( talk) 20:25, 31 March 2012 (UTC)
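The total quoted above follows directly from the two figures in the comment. A quick sketch of the arithmetic, assuming "kb" and "Mb" here mean kilobytes and megabytes and that 1 megabyte is counted as 1000 kilobytes (the per-thumbnail size is the poster's own measurement, not verified here):

```python
THUMBNAIL_COUNT = 237      # thumbnails in the cited article version
KB_PER_THUMBNAIL = 162     # approximate size reported by the poster

# Assumption: decimal units, 1 MB = 1000 kB.
total_kb = THUMBNAIL_COUNT * KB_PER_THUMBNAIL
total_mb = total_kb / 1000

print(f"Total image download: {total_kb:,} kB (about {total_mb:.1f} MB)")
```

That is 38,394 kB, or roughly 38.4 MB, which is consistent with "more than 38 Mb" in the comment.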

Is there any feature to automatically split off an article into pages?

Apologies if this is the wrong venue to ask such a question, if somewhere in meta might be better to ask, but, sometimes, for very long pages, like the RfC for Mohammad images, or some old archives, it significantly slows down my browser when viewing them. If only there was some kind of feature that could allow me to set it so that when viewing pages or page histories, it would automatically break off the page at a user-specified amount (like, break off to "page 2" or "page 3" etc. if the next section makes the page exceed 200kb). This is especially a problem in very old archive pages where it would be inconvenient to break apart a page despite being long. Does such a feature exist?-- New questions? 18:21, 10 April 2012 (UTC)

No. If you create a book or download as pdf you convert it into a pdf that you can view a page at a time, but you still have to download the whole article. If you are looking at old archives they can be split into smaller archives. Commonly archives are split somewhere between 60k and 150k bytes. This is the talk page, though, not the article. Wikipedia:Requests for comment/Muhammad images is a single closed discussion that goes on for a whopping 933,207 bytes, and can certainly be split into sections. Normally RfC's are maintained intact while they are open, but for logistic reasons when they get over 200k I would argue for putting them into subsections. Apteva ( talk) 21:29, 19 September 2012 (UTC)
Please see Wikipedia:Requests for comment/Muhammad images/Intro. That discussion has been split up into sections for anyone who wishes to read it more easily. Apteva ( talk) 22:05, 22 September 2012 (UTC)
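As the reply says, no such feature exists. The rule described in the question (start a new page when the next section would push the current page past a user-chosen size) can still be sketched in Python. This is a hypothetical illustration only, not a MediaWiki function: `paginate_sections` and its parameters are made-up names, and sections are assumed to be plain strings measured in UTF-8 bytes.

```python
def paginate_sections(sections, max_bytes):
    """Group consecutive sections into pages, breaking before a section
    that would make the current page exceed max_bytes.

    A single section larger than max_bytes still gets a page of its own,
    because sections are never cut in the middle.
    """
    pages = []
    current_page = []
    current_size = 0
    for text in sections:
        size = len(text.encode("utf-8"))
        # Break only if the page already holds something; otherwise an
        # oversized section would produce an empty page.
        if current_page and current_size + size > max_bytes:
            pages.append(current_page)
            current_page = []
            current_size = 0
        current_page.append(text)
        current_size += size
    if current_page:
        pages.append(current_page)
    return pages


# Toy example: four "sections" of 120, 100, 50 and 300 bytes, 200-byte limit.
example_sections = ["a" * 120, "b" * 100, "c" * 50, "d" * 300]
example_pages = paginate_sections(example_sections, max_bytes=200)
print([len(page) for page in example_pages])
```

Breaking only at section boundaries keeps each thread intact, at the cost of some pages being over the limit when one section alone is too large.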

Mobile

I notice the article does much to address questions concerning antique (turn of the century) computers and browsers, but doesn't mention something new, namely Help:Mobile access. I do nearly all my editing on five-year-old hardware with a nice big screen and DSL connection, but much of my reading is away from home, on my palm sized smartphone or my hand sized Android tablet. These automatically go to the .m. mobile page which shows only the lead and the top section titles until I tap the title.

Alas, such accommodation becomes inadequate when the article is long. Either the list of sections is too long, or it inadequately guides me to the desired information, or each section upon opening overwhelms my ability to understand it on the little screen, or all of the above. And where there's no Wi-Fi and 3G coverage is poor, it takes a long time to load the page. Surely I'm not unusual among readers in facing these problems, and the number of affected users will only increase with the popularity of smartphones and smaller tablet computers (even with the relatively large iPad it's somewhat a problem). Do we need a new section? Jim.henderson ( talk) 17:42, 27 April 2012 (UTC)

Discussion about split of large articles at an arbitrary point

I have raised a query at Wikipedia:Village_pump_(policy)#Splitting_articles_arbitrarily about the bit in WP:SIZE#Very large articles where it says very large articles may be split arbitrarily. I think this is okay for very large lists but not articles and see no sense in this end run round notability. I believe if an article is large enough to require splitting there will always be subtopics which satisfy notability. Dmcq ( talk) 00:52, 24 May 2012 (UTC)

I think arbitrarily splitting mainspace articles should be avoided at all costs. I agree that there are subtopics that can satisfy notability (but not "always"). Is there a page that this is in relation to? -- Alan Liefting ( talk - contribs) 01:02, 24 May 2012 (UTC)
The guideline at Wikipedia:Article_size#Very_long_articles needs rewording:
If possible, such (V)ery large articles should be split . If possible, split the content into logically separate articles. If necessary, split the article arbitrarily. Avoid arbitrarily splitting mainspace articles unless there is a demonstrated technical problem loading the page on at least one major browser. If you do split an article arbitrarily, be careful to link the resulting parts to each other. For non-mainspace articles, consider splitting off the top and bottom parts of the article and transcluding them into the split parts.
Probably should add something about summary style as well -- Alan Liefting ( talk - contribs) 01:11, 24 May 2012 (UTC)
I've asked for some examples at VPP where this arbitrary split business makes sense other than a stand alone list. Still waiting but there's a lot of theoretical waffling. I can't see the point without having a clear need; WP:IINFO about indiscriminate information covers anything else I think. Dmcq ( talk) 02:14, 24 May 2012 (UTC)

The first issue here is that Dmcq is arguing against 5 years of consensus on the verbatim phrasing "If necessary, split the article arbitrarily", and that this text should be preserved on that ground alone. Many many editors have read that sentence and understood that "arbitrarily" means, not "randomly" or "at an arbitrary point", but basically "according to some clear local-consensus method". Therefore there is no need to change the wording now.

The second issue is that Dmcq has affirmed the POV that the many many subarticles split according to some clear local-consensus method are actually "notable", such as Later life of Isaac Newton (arbitrarily starting 1693), House and Senate career of John McCain, until 2000 (arbitrarily 1981-2000), and Cultural impact of the Guitar Hero series (arbitrary subset of notable topic). If more examples are needed I can oblige. Another POV is that these are not notable but widely accepted because they are spinouts. Given Dmcq's POV, if the word "arbitrarily" is understood as it has been for 5 years, there is no reason to change it, because the articles will by the POV's definition be "notable"; and given the alternate (stricter) POV of notability, the word "arbitrarily" is very necessary to permit articles that fail this strict N standard.

Now, as to Dmcq's Alan's wording (assuming we remove the stray "them" in the last line), the last sentence seems to be chopped up for no reason; the rest assumes that it is always possible to split the content into logically separate articles. Dmcq has rejected all counterexamples, but they at least prove that it is at least sometimes colorable that the content is not split logically and notably. I don't think this assumption necessary in case there should arise a consensus that there is no notable way to split a very long article.

Further, Dmcq has opened the same discussion on two talk pages for some reason; I have invited the VPP to centralize here.

I think the whole problem is that Dmcq is reading "arbitrarily" as "at an arbitrary point", which is a novel or original reading of the guideline. There is no evidence the guideline needs adjustment. JJB 02:58, 24 May 2012 (UTC)

Please discuss at VPP where this was raised as a centralized issue rather than at one of the separate guidelines and where people were directed. Dmcq ( talk) 03:02, 24 May 2012 (UTC)
By the way that was not my wording above, and I don't agree with the new wording anyway. Firstly though one should decide on the central issue of notability of the split off articles which is the discussion at VPP. Dmcq ( talk) 03:05, 24 May 2012 (UTC)
I have yet to wade my way through the discussion at VPP but any discussion of this guideline should be made on this page. The notability of a split out article is covered elsewhere and obviously a split of a larger article would be into a sibling topic or topics that have notability as a standalone article. Is there a wording that you can suggest? -- Alan Liefting ( talk - contribs) 03:50, 24 May 2012 (UTC)
Can someone more knowledgeable about venue just be bold, make a good case of where it should be (audience, draw, etc), and copy/paste the conversation over? At this point it matters more it's not in different places than where it ends up. We can always change venue later. Thanks. Agent00f ( talk) 04:35, 24 May 2012 (UTC)
We are discussing this guideline so therefore this is the correct venue. I will slap an {rfc} on it. -- Alan Liefting ( talk - contribs) 05:03, 24 May 2012 (UTC)
JJB, longevity of a guideline is not a reason to keep it. Also, the guideline should use the commonly accepted use of the word "arbitrary". It prevents confusion. -- Alan Liefting ( talk - contribs) 03:50, 24 May 2012 (UTC)

Alan, sorry I didn't realize that was your wording. Longevity of a guideline is a silent consensus that it works for many editors. Yes, the word "arbitrarily" could be clarified based on your concerns, but the assumption that it's always possible to split logically should not be suddenly added to the text. Possible proposed change: For instance, the meaning of the sentence could be clarified as, "If this is not possible, split the article according to local consensus." But the rest should stand for the reasons above.

Dmcq, the discussion I previously linked shows that there is no supermajority consensus on notability of spinoffs, so this will not be decided at VPP today. The fact is that we have many nonnotable spinoffs, and they often survive AFD (or more commonly are never nommed). The stated rationales vary (one reason for unclear consensus): sometimes it's SNG shoehorning, sometimes it's a relatively loose N affirmation like your own, sometimes it's recognized as a spinoff, sometimes it's recognized as a pointy nom that would imbalance a set, sometimes it's a merge that affirms the spinoff principle. Since you seem to define "notable" as including most of the adhoc local-consensus splits as well as allowing the various nonnotable large-list splits (although that would include the list of poker events), I really don't know that there is an issue for VPP besides your finding the word "arbitrarily" to be ambiguous. JJB 04:36, 24 May 2012 (UTC)

The vernacular def of arbitrary is vague, and people often conflate it with random. Probably best to clarify that it's actually based on domain knowledge first, then gradually moving to less desirable methods if that's not possible. That's probably how it tends to work in practice, but given how playing semantics is popular, better be clear than have some big fight over just how random the process should be. Agent00f ( talk) 04:51, 24 May 2012 (UTC)
The whole issue needs to be revisited anyway. The entire rationale for arbitrary splits was that some articles were too long to be edited in certain browsers. Those browsers are now ridiculously obsolete, and no longer pose a technical problem worth mentioning. WP:SUMMARY is all we need any longer. When normal-prose articles (i.e. non-list, narrative articles intended to be read from start to finish) become unwieldy, SUMMARY provides a clear roadmap for how to split them up for better reader usability. For long lists, e.g. glossary articles, no one is going to sit and read them from top to bottom; their principal modes of use are a) being in-page-searched for specific entries and b) having specific entries linked to from other articles; splitting them destroys the very foundation of their functions in Wikipedia. At least one consensus discussion has already concluded that we should stop splitting such articles. — SMcCandlish   Talk⇒ ɖ∘¿¤þ   Contrib. 02:42, 29 May 2012 (UTC)
Shall we just delete the complete section headed "Very long articles"? The following section headed "Web browsers which have problems with long articles" is sufficient to cover any possible occurrences of browsers having issues with long articles. -- Alan Liefting ( talk - contribs) 03:43, 29 May 2012 (UTC)
Looking through it I can't see anything in the whole 'Technical issues' section which I feel needs to be kept. 400k sounds far larger than the 100k in the almost certainly should split bit even if it was a big issue nowadays. An article has already become too big long before that is reached. Arbitrary split is I think only for lists and that is covered elsewhere. No-one is going to try splitting normal articles into anything except logical sections and the summary style guideline covers doing that. Dmcq ( talk) 09:35, 29 May 2012 (UTC)
I think the continuing value of "arbitrarily" is in pointing us to not arguing about which split is most logical. Maybe it could be replaced with an indicator saying something like past semilogical splits that look arbitrary need not be rewritten, and as long as new splits are semilogical they're fine. Recognize that some of the section is dated but some needs preserving. JJB 16:13, 29 May 2012 (UTC)

There may or may not be technical reasons for splitting an article (not all access is via machines with large memories, virtual or otherwise), but there are definitely economic and other reasons for doing so. Many people have to pay for every byte they download, either because they live in a country where the telecoms use that model for charging for broadband access, or because their mobile service provider charges that way for mobile hand-held devices (phones, tablets etc). There is also the case when a person accessing the net is connected via a free wifi service with a data limit on downloading (e.g. at a library). -- PBS ( talk) 11:36, 31 May 2012 (UTC)

I've never heard of splitting an article and don't see any examples where it was done. However, the proposed language does make sense to avoid unnecessarily contentious discussions and messy articles. CarolMooreDC 16:47, 7 June 2012 (UTC)
Basically it just means splitting out big subsections like the later life of Newton when he concentrated on alchemy or the early life of Mitt Romney before he became a presidential candidate. Dmcq ( talk) 22:53, 7 June 2012 (UTC)

I'll try just deleting the whole technical issues section. The 400k limit in it is far larger than the recommended size limits anyway. We could add a bit in the size section about larger sizes causing problems with slow connections as well as causing readability problems but that's about it I think. Dmcq ( talk) 22:53, 7 June 2012 (UTC)

I've kept bits about download speed and size for mobile phones and problems with slow connection speed. Solutions are left to the splitting section. Dmcq ( talk) 23:04, 7 June 2012 (UTC)

We could bring it back, replacing the tersely ambiguous "arbitrary" with something clumsier and more precise such as "regardless of other considerations" and the overly precise "400K" with something longer and vaguer such as "hundreds of kilobytes". Jim.henderson ( talk) 13:40, 10 June 2012 (UTC)

What is 'it' exactly and why? What would it convey that you think is missing or needs to be said? Dmcq ( talk) 16:37, 10 June 2012 (UTC)
The stuff about defunct browsers has been added back. Does anyone see a point in having historical information in this guideline? Also, does anyone know what a 'non-mainspace article' is? Dmcq ( talk) 20:00, 11 June 2012 (UTC)

RfC: Should the summary style guideline quote WP:Notability and if so in what place

You are invited to join the discussion at Wikipedia talk:Summary style#RfC: Should the summary style guideline quote WP:Notability and if so in what place.

This RfC is to decide the specific changes discussed at Wikipedia:VPP#Splitting_articles_arbitrarily. This may affect the notability of subarticles and is related to the RfC above. Dmcq ( talk) 19:09, 1 June 2012 (UTC)

"The Biggest Loser South Africa" article

I would like to start a discussion about how to split the Biggest Loser South Africa article. In my opinion, it should either be a minimum number of tables, or should be split into several smaller articles, as 600 kB is ridiculous. Thoughts?-- Jax 0677 ( talk) 18:51, 10 June 2012 (UTC)

I think there's plenty of scope for irony. Unfortunately, I can't understand how the page is laid out, so I don't know how it should be divided. I would however lay down a bet on the fact that a huge amount of that information is just repeating trivia from the show and doesn't belong in an encyclopaedia anyway. CMD ( talk) 19:04, 10 June 2012 (UTC)
Well okay, let's ignore for the moment that it all seems to pretty much go against WP:NOTREPOSITORY, WP:NOTEVERYTHING, WP:NOTSTATSBOOK and WP:UNDUE. Where did all that data come from? The references don't seem to have them. And if the references had them we could just summarize and point to the references. So first step is {{ citation needed}}. Dmcq ( talk) 20:45, 10 June 2012 (UTC)
Many of the tables were created by an anonymous user. I say we give it a week just like other things. If nothing is done, then we can eliminate all of the tables until references are inserted.-- Jax 0677 ( talk) 21:35, 10 June 2012 (UTC)
Sounds like a plan to me. Dmcq ( talk) 22:25, 10 June 2012 (UTC)

Issues with this guideline?

With recent events, such as deletion of Ashton Kutcher on Twitter, Personal life of Jennifer Lopez, and Rihanna on Twitter, bad splitting (and awful transclusion) of List of Codename: Kids Next Door episodes, and bad use of Template:very long, is there something generally wrong with this guideline? Is it consistent with other policies and guidelines? -- George Ho ( talk) 00:00, 4 July 2012 (UTC)

Like what? What is the problem you see? Dmcq ( talk) 00:12, 4 July 2012 (UTC)
The "Rule of thumb": the Human rights article is over 100kb, even after splits, and... no condensation or further splitting is needed. Nevertheless, I am not sure if there is something wrong with this guideline anymore; I'm too frustrated that I don't know what to do. The "No need for haste" is getting ignored more often per List of Codename: Kids Next Door episodes. -- George Ho ( talk) 00:21, 4 July 2012 (UTC)
The prose size of Human rights is only 54kB. It's big (and oh wow, what a table of contents), but not above the 60kB limit given here (although I reckon further condensation would definitely be useful). CMD ( talk) 00:34, 4 July 2012 (UTC)
So no issues with this guideline? If no issues, then how can this guideline be consistent with WP:What Wikipedia is not and cases of Twitter articles and unnecessary personal forks of people? -- George Ho ( talk) 02:21, 4 July 2012 (UTC)
Please explain your point in more detail like I asked you to. I don't know what your point is. As far as I'm concerned what you have written so far could have been just strung together by a chatbot, that is how little meaning I have extracted from your question about consistency.
As to the Human rights article I definitely think it could do with cutting it down and summarizing more. All the subtopics are notable so there is no problem about moving things out into other articles. With a smaller article a person would be able to read it all easier and then just click on the links for the bits they want to know about. Wikipedia is an internet encyclopaedia, more use should be made of links.
The Human rights article is 111kb as seen by the history, which means the guideline definitely suggests splitting. That confirms my own feeling that the article is oversize and should have bits split out better. Dmcq ( talk) 08:04, 4 July 2012 (UTC)
Careful, that number is not usually considered very big at all. Article size on the history list can be deceptive and apparently too big mainly because references can be very extensive without that making the article unreadable. The human rights article has a quite long list of references.
Breaking articles up, and ruthlessly pruning them generally damages articles, information generally falls down the cracks between the subarticles.
At 54k readable text it's borderline, it could do with being perhaps only slightly shorter, it's not really oversize.
Basically, it's not that big, you could leave it alone with a clean conscience or give it a light pruning. Teapeat ( talk) 12:29, 4 July 2012 (UTC)
Pruning means discarding. No discarding is involved in splitting off and summarizing. In fact splitting off a notable subtopic tends to give them more space to grow without people complaining about the length and them being unreadable. My feeling about the Human rights article is that it has definitely become too big and is not easily read as an entity. There are too many little bits which do not relate to the main topic directly but only through subtopics. I find the substantive rights section rather worrying in particular as there seems to be no basis for the decision on what to include in it. And when I looked at Universal Declaration of Human Rights it is a straight copy of the declaration with no analysis and no links to articles about the subjects covered whereas it could provide a good basis for structuring references to individual rights. Dmcq ( talk) 13:07, 4 July 2012 (UTC)
Look, you have permission to edit the text if you think it's too long. These rules of thumb indicate that it's a bit too long, but they're only rules of thumb: ultimately it depends on the article. Teapeat ( talk) 14:14, 4 July 2012 (UTC)

As Dmcq said, I must respond. Writing about one topic can result in a big article. Nevertheless, writing about a subtopic must be consistent with applicable policies and guidelines; otherwise, a subtopic article may be at risk of deletion, like Ashton Kutcher on Twitter. I'll rephrase the "consistency" part: Does this guideline have to mention WP:What Wikipedia is not when it comes to articles of topics and subtopics? Why can't this guideline mention any other policies and guidelines? -- George Ho ( talk) 14:24, 4 July 2012 (UTC)

It can, but they're really implicit. If something isn't due enough to have a large section on the main article, and isn't notable enough to stand up on its own, it should probably not be on wikipedia. CMD ( talk) 15:59, 4 July 2012 (UTC)
Why not explicit explanation? Sometimes, people tend to take size too seriously without considering consequences, like Twitter articles. -- George Ho ( talk) 16:05, 4 July 2012 (UTC)
Such as what? I don't know the circumstances surrounding the twitter articles, so I can't really help with them at the moment. CMD ( talk) 16:12, 4 July 2012 (UTC)
Twitter aside, as I realized, notability is not easy to define, so notability is not a good example to include in this guideline. It might have been included before, but notability is often misinterpreted as a guarantee of a "valuable article", so I guess it is best not to include it again. Therefore, what about adding a section of what Wikipedia is and is not, so readers may not be forced into reading them further? If that's not it, what about mentioning WP:Manual of Style? -- George Ho ( talk) 16:23, 4 July 2012 (UTC)

Rule of thumb (4 July)

I've changed the rule of thumb to refer to the markup size as given in the history. It is pretty obvious that people have used this normally and mean this. There would be no point talking about limits on sortable tables otherwise as readable prose doesn't include tables according to the bit at the beginning. Dmcq ( talk) 14:07, 4 July 2012 (UTC)

No, you just messed up the intro, I reverted it.
Look, the really common problem we have with this guideline (as opposed to your specific article) is that people assume that any article with a wiki markup size of 100k needs savage pruning, but more than half of that can be references and other things that just don't count. They then try to shrink the article by about 50%, which is usually far, far too much. Teapeat ( talk) 14:14, 4 July 2012 (UTC)
The more technical issues, like wiki markup size and browser limits can be problematic, but much, much less often. Teapeat ( talk) 14:14, 4 July 2012 (UTC)
I changed the intro because it doesn't reflect reality. People are not using the script mentioned in this to measure article sizes, they use the markup size. The guidelines should reflect reality. If you don't agree with that then perhaps we could set up an RfC to resolve the issue. If you would like to state a case for prose size rather than markup size that would be good. Dmcq ( talk) 14:17, 4 July 2012 (UTC)
The table states that it only applies to readable text size. It's completely impractical to use markup size because every time you added references and other markup you would have to shrink the prose size to compensate. Teapeat ( talk) 14:22, 4 July 2012 (UTC)
This is a guideline not a policy. There is no 'have to' about it. The rule is a 'rule of thumb'. It is supposed to be easy to follow. The markup roughly gives the amount of information in an article. If an article has large numbers of citations that outweigh the straight text then it should still be split. Just because it is a citation does not stop it costing money or taking time to download to a phone or cause it to print on fewer pages. The readability bit talks about 50k being an upper limit whilst the rule of thumb says 100k. That is plenty of room for citations. Dmcq ( talk) 14:29, 4 July 2012 (UTC)
It's definitely the readable prose size that people use. For example in FAC or GAN discussions about whether an article is too long or not, it's always readable prose size that is used, not the total markup size that the 'history' command shows. Yes, you have to install the User talk:Dr pda/prosesize.js tool to get the readable prose size, but most serious editors doing reviewed article work do that. Wasted Time R ( talk) 14:32, 4 July 2012 (UTC)
Okay I will set up an RfC on this question. Dmcq ( talk) 14:34, 4 July 2012 (UTC)
Seriously don't bother. It won't go the way you want; and I don't even understand why you want it; if you want to reduce the size of the particular article you're concerned about, go ahead, the guideline indicates it's a bit too big anyway (the target is no bigger than 50k, and it's currently at almost 55k). Teapeat ( talk) 14:43, 4 July 2012 (UTC)

RfC: Should the rule of thumb for article size refer to readable prose size or markup size?

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

Should WP:SIZERULE, a rule of thumb guideline for saying an article is too large, refer to the 'markup size' of an article given by the size in its history - the number of bytes downloaded when an edit is done, or should it refer to the 'readable prose size' given by the text excluding any footnotes and reference sections ("see also", "external links", bibliography, etc.); diagrams and images; tables and lists; Wikilinks and external URLs; and formatting and mark-up as given by the script User:Dr_pda/prosesize? There is a discussion above in Wikipedia talk:Article size#Rule of thumb. Dmcq ( talk) 14:52, 4 July 2012 (UTC)

  • Markup size I believe this is how the rule of thumb has normally been used. The markup includes lists and it is double the 50k mentioned in Wikipedia:Article size#Readability issues as being an upper limit for readable prose size, 100k allows plenty of room for citations. There would have been no point mentioning tables in that section if people meant readable prose as tables are not included in that. Dmcq ( talk) 14:52, 4 July 2012 (UTC)
  • Readable prose size. This has always been the meaning, such as in FAC and GAN discussions when issues of whether an article is too long has arisen. Note that User:Dr pda/Featured article statistics has a bunch of statistics of lengths of featured articles; the stats all use readable prose size, not markup size. And if you look at these back-links, you can see all the times these readable-prose-size-based statistics have been referenced in FAC, GAN, and other talk page discussions. Wasted Time R ( talk) 15:03, 4 July 2012 (UTC)
    • That gives about 95 references. There are about 6500 references to WP:Article size. The question is what have all those other ones done? Dmcq ( talk) 15:11, 4 July 2012 (UTC)
  • Readable prose size, call for snow close This RFC is a waste of everyone's time, using markup size would discourage proper referencing of articles since references take up lots of markup space but aren't readable prose. The guideline already explains why readable prose is a useful thing to limit, and readable prose is already routinely being used in FA and GA reviews. This is a very bad idea indeed. Teapeat ( talk) 15:05, 4 July 2012 (UTC)
    • Why aren't you using 50k as the limit for readable prose size? Dmcq ( talk) 15:15, 4 July 2012 (UTC)
      • The 50k readable prose size mention comes from being roughly 10,000 words, but in experience 10,000 words more often equates to 60k readable prose size, so 60k is the point where discussion usually begins. And in any case, this is just a guideline, not a hard limit. As you can see from that list of featured articles, there are plenty that go over 60k readable prose size, and more are being added all the time. Wasted Time R ( talk) 15:24, 4 July 2012 (UTC)
        • So you are not in fact following this guideline or believe this guideline should have the figures in Wikipedia:Article size#Readability issues upped quite a bit to double the amount? Dmcq ( talk) 15:31, 4 July 2012 (UTC)
          • I'm stating two bits of reality: the rule of thumb has always been interpreted as being readable prose size, and the whole guideline has always been interpreted as just that and not a hard limit. Anyway, do you have any objection if I invite editors at Wikipedia talk:Featured article candidates and Wikipedia talk:Good article nominations to comment here? If your proposal of a 100k cap on markup size goes into effect, that means many articles will have to undergo a drastic cutting down. Wasted Time R ( talk) 15:55, 4 July 2012 (UTC)
  • What about page load size? There are two reasons we should be limiting article size: one for reader attention span which is about readable prose, and one for the bandwidth the article takes to load into the browser. The second metric should include the size of image thumbnails, adding considerably to the plain text and markup. Page load size is the only other metric worth considering; markup size as a measure of bandwidth is incomplete. For what it's worth, I think we should have two limits: one plainly stated for readable prose, and another for page load. Binksternet ( talk) 16:02, 4 July 2012 (UTC)
    • Because it is a rule of thumb which any user can check easily rather than some complicated thing requiring experts to measure. The various measures are fairly loosely linked. The section on readable prose size says 50k, actual articles tend to be limited at about 100k in the size of the file edited. The download size is quite a bit more and if you include images it goes up again, plus one probably wouldn't want to include the sizes of the scripts and style pages as they would normally be cached after a couple of accesses. Dmcq ( talk) 16:12, 4 July 2012 (UTC)
  • Prose size Prose size can be extremely different to markup size. Obviously they're going to correlate somewhat, but not nearly closely enough for one to estimate the other well. Prose size gives a rough estimate of the time a reader has to spend reading to finish the entire article, and that seems to be the entire point behind the guideline, establishing a consistent standard of a comfortable reading time. That's not to say we can't establish a separate limit on markup size or anything like that, if that's feasible. (I don't know what script I'm running to do it, but Page size is in my toolbox.) CMD ( talk) 16:03, 4 July 2012 (UTC)
    • Then please explain 'They also apply less strongly to list articles, especially if splitting them would require breaking up a sortable table.' The readable prose size of a table given by that script is zero. Plus there are other issues like time to download and cost for mobiles. Plus could somebody please explain why the 50k in Wikipedia talk:Article size#Rule of thumb which is the section which justifies this idea is being ignored? Dmcq ( talk) 16:21, 4 July 2012 (UTC)
      • For instance we have for List of bus routes in London
        File size: 565 kB
        Prose size (including all HTML code): 7882 B
        References (including all HTML code): 168 B
        Wiki text: 185 kB
        Prose size (text only): 3973 B (653 words) "readable prose size"
        References (text only): 9 B
        Do people really mean this is fine by this guideline because it only includes 3973 bytes when in fact the markup size is 185 kB? Dmcq ( talk) 16:30, 4 July 2012 (UTC)
      • List articles aren't read in the same way normal ones are. I'm willing to bet that readers will just go through the top 10 or so, or use ctrl+F if they're finding something. As I said, I have nothing against a separate limit on markup size, or perhaps a separate list guideline. My response to this RfC was based on its idea of changing the current Sizerule from prose to just markup, because I think a prose limit is quite important for keeping our articles concise and engaging. CMD ( talk) 16:31, 4 July 2012 (UTC)
    • Then look at the last few articles that were tagged with {{ too long}} at the top and ignoring list articles. Economy of Pakistan has markup size 98kB and prose size 52kB so it has a problem according to Wikipedia:Article size#Readability issues and a problem by rule of thumb viewed as markup size but not when viewed as prose size. Socialism gives 120kB and 63 kB. Talcott Parsons gives 167 kB and 101 kB - too big by both. Phil Keaggy 55 kB and 38 kB, Clara Bow 66 kB and 28 kB, Capitol records 44 kB and 32 kB, Humanitarianism 111 kB and 84 kB, Kunming 112 kB and 68 kB, Stress (mechanics) 77 kB and the script failed, Odisha 104 kB and 67 kB, Commodity Futures Modernization Act of 2000 213 kB and 46 kB.
      The thing I take from this is that people get worried about articles being too long at a point long before the prose length is 100 kB. Commodity Futures Modernization Act of 2000 is an example where there aren't tables but the citation sizes bump the file size up considerably - have a look at the citations and see if there is a problem with them! The talk about the citations is no excuse for using prose size instead of the markup size as far as I can see. Dmcq ( talk) 17:13, 4 July 2012 (UTC)
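To make the comparison in the comment above easier to check, here is a minimal sketch in Python. It assumes the kB figures quoted there (markup size first, readable prose size second), leaves out Stress (mechanics) because the script failed on it, and uses the 100 kB rule-of-thumb figure and the 50 kB readability figure as the two thresholds. The names SIZES_KB and over_limit are made up for illustration and are not part of any Wikipedia tool.

```python
# Sizes in kB as quoted in the comment above: (markup size, readable prose size).
# Stress (mechanics) is omitted because no prose size was reported for it.
SIZES_KB = {
    "Economy of Pakistan": (98, 52),
    "Socialism": (120, 63),
    "Talcott Parsons": (167, 101),
    "Phil Keaggy": (55, 38),
    "Clara Bow": (66, 28),
    "Capitol records": (44, 32),
    "Humanitarianism": (111, 84),
    "Kunming": (112, 68),
    "Odisha": (104, 67),
    "Commodity Futures Modernization Act of 2000": (213, 46),
}

RULE_OF_THUMB_KB = 100  # figure from the rule of thumb section
READABILITY_KB = 50     # figure from the readability issues section

MARKUP, PROSE = 0, 1    # positions inside each (markup, prose) pair


def over_limit(sizes, index, limit_kb):
    """Return the sorted article names whose size at `index` exceeds limit_kb."""
    return sorted(name for name, pair in sizes.items() if pair[index] > limit_kb)


if __name__ == "__main__":
    print("markup > 100 kB:", over_limit(SIZES_KB, MARKUP, RULE_OF_THUMB_KB))
    print("prose  > 100 kB:", over_limit(SIZES_KB, PROSE, RULE_OF_THUMB_KB))
    print("prose  >  50 kB:", over_limit(SIZES_KB, PROSE, READABILITY_KB))
```

On these figures, six of the ten articles exceed 100 kB of markup, only one exceeds 100 kB of prose, and six exceed 50 kB of prose.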
      • I removed this template from 150 pages a while back, including Education in Singapore, Cat, Human rights, and Social Security (United States). I don't think readability is that bad at all or problematic. -- George Ho ( talk) 17:25, 4 July 2012 (UTC)
        • What's your point? That you are happier with longer articles than other people? Dmcq ( talk) 17:48, 4 July 2012 (UTC)
          • I was pointing out the possible misuse of "very long" tag; that's all. Does "Cat" article have to be condensed or tagged as "very long" just because it's "long"? -- George Ho ( talk) 18:18, 4 July 2012 (UTC)
            • I think personally the cat article really could do with trimming and moving bits out. A particular problem I see with it is that the summaries in the cat article seem to be developed independently of the subtopic articles. For example cat genetics is a stub article but there is a bit on it plus a separate section on taxonomy and evolution which pointed to cat evolution which was a small section in cat gap. The Health section is also a mess developed independently of the cat health article. The whole behaviour section was a larger and even more messy version of the same problem. Keeping the size in check would help avoid problems like these. Dmcq ( talk) 22:52, 4 July 2012 (UTC)
              • If you want to tag it as "very long" or {{ overly detailed}}, go ahead. Still, I wonder if rule of thumb is really helpful at this time or in the past. -- George Ho ( talk) 23:02, 4 July 2012 (UTC)
                • You're just missing the point. Overly detailed is not appropriate, what is appropriate is that the section on cat genetics should have a summary corresponding to the lead of the cat genetics and cat gap articles. However the cat genetics article remains a stub and stuff is being shoved into the cat article. This is what I mean about WP:NOTPAPER. The cat article is being developed as a book rather than a page in a hyperlinked encyclopaedia rather than following WP:SS guidelines for instance. Dmcq ( talk) 14:24, 7 July 2012 (UTC)
Unfortunately, for you, I like the article the way it is. And I'm amazed that everybody here wants to keep a "Rule of Thumb" section. I don't know how that section is related to the quality of this article. I get a feeling that, without a "Rule of Thumb", this "guideline" would become nothing more than an essay. Unfortunately, for me, I must condense it in favor of reducing length if that section were kept. Would this affect my writing quality? -- George Ho ( talk) 14:27, 7 July 2012 (UTC)
Thanks for addressing the issue instead of just stating a preference. I have put in a new section at #Why wikitext instead of prose text size to try and address the question fully as I see it. No I was not intending the rule of thumb should go, I just want it to be fit for purpose. I don't understand what you mean about "Unfortunately, for me, I must condense it in favor of reducing length if that section were kept. Would this affect my writing quality?". If you like that article as it is I guess you are saying you prefer a monolithic article built like a chapter of a book rather than using links. Both ways can have good writing but I would point again at WP:NOTPAPER as encouraging writing for an internet based medium. Dmcq ( talk) 15:37, 7 July 2012 (UTC)
I'll rephrase: Must I change my writing ability for the sake of length and condensation? Must everybody else? To me "Rule of thumb" helps "Article size" page become a guideline; without it, its guideline status would be doomed to failure. I couldn't and wouldn't let loading and reading issues get to me. I hope these issues are too minor to everyone; in fact, if too lengthy, anybody can resolve one issue or another by editing or addressing one problem of an article in the talk page, not here. -- George Ho ( talk) 16:29, 7 July 2012 (UTC)
If you cannot adjust your style to the internet as outlined in WP:NOTPAPER then possibly somebody else can fix the problems. It is not necessary that everybody be able to do everything well, people have different talents. However if your style is such that you would resist other people putting the stuff into subtopics when articles become large because you want everything in one large article rather than use hyperlinks, then you would definitely be acting against the express consensus in the policy. The issues are not minor. Dmcq ( talk) 23:00, 7 July 2012 (UTC)
  • Both markup size and readable prose size have important ramifications. Prose size has to do with striking a balance between informing and exhausting the reader. Markup size has to do with keeping the page's bandwidth requirements within a reasonable size for people who have limited access to the internet. There should be two plainly stated limits for these measurements. Binksternet ( talk) 17:57, 4 July 2012 (UTC)
    • I gave a few figures above for Economy of Pakistan etc., are there any there or ones you know of where you think the prose size said something useful which the markup size does not indicate just as well? We are talking about a rule of thumb. Dmcq ( talk) 22:58, 4 July 2012 (UTC)
      • See for example Mulholland Drive (film), which is 53 kB (9,063 words) readable prose size and 93,884 bytes in markup size. Now look at John McCain, which is 54 kB (8,832 words) readable prose size but 164,027 bytes in markup size. Both are FA articles and although very close to each other in readable prose size, their markup sizes are very different. Why? Mulholland Drive has 121 footnotes, while John McCain has 331 (our political BLPs tend to be heavily cited, for obvious reasons). Under your proposed guideline change, John McCain would have to undergo a 40 percent reduction in size, when in reality there's nothing wrong with it. Wasted Time R ( talk) 10:33, 5 July 2012 (UTC)
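The "40 percent" figure in the comment above can be checked with a short sketch in Python. It assumes that the proposed 100k cap means 100,000 bytes of markup (taking it as 102,400 bytes would give a slightly smaller cut). The name required_reduction is made up for illustration.

```python
def required_reduction(markup_bytes, cap_bytes=100_000):
    """Fraction of the markup that must be cut to fit under cap_bytes.

    Returns 0.0 when the article is already within the cap.
    The 100,000-byte default is an assumed reading of the "100k" cap.
    """
    if markup_bytes <= cap_bytes:
        return 0.0
    return 1 - cap_bytes / markup_bytes


if __name__ == "__main__":
    # Markup sizes quoted in the comment above.
    for title, size in [("Mulholland Drive (film)", 93_884), ("John McCain", 164_027)]:
        print(f"{title}: {required_reduction(size):.1%} reduction needed")
```

Under that assumption John McCain would need a cut of roughly 39 percent, which matches the "40 percent" quoted, while Mulholland Drive (film) would need none.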
        • I would consider both articles as at the limit for readable prose, which is in accord with what Wikipedia:Article size#Readability issues says about 50kB of readable prose as being at about the limit. However the section Wikipedia:Article size#Rule of thumb gives a limit of 100KB. Do you really think that these two articles are only about half the size of an article which is too long? That is what you are saying if you say the rule of thumb applies to readable prose instead of the 50kB of the readability issues section. Yes if the 100KB was interpreted as markup size then the John McCain article would be considered as too long whereas Mulholland Drive would scrape in. That certainly accords far better with my assessment of the size of these articles as being at the top limit. The John McCain article has a very large number of citations many of which have text associated so a case could be made for it being kept as a unit but personally I feel it should be smaller. Why for instance does it have a big section Early life and military career, 1936–1981 which contains practically the whole of the subtopic Early life and military career of John McCain instead of summarizing it better? The same with House and Senate career of John McCain, until 2000? Have these people never heard of just summarizing a subtopic in the main article in accord with WP:SS? I get the feeling the people there don't really know what hyperlinks are in aid of or trust them, I don't know how it ever became a featured article. Dmcq ( talk) 11:57, 5 July 2012 (UTC)
          • You're wrong about your ratios. The "Early life and military career, 1936–1981" section of John McCain is 9.8 kB (1626 words) readable prose size. The Early life and military career of John McCain subarticle is 46 kB (7728 words) readable prose size. That's a 1-to-4.7 ratio, which is hardly "contains practically the whole of the subtopic" as you claim. Doing any less in the main article would shortchange the reader about one of the most important, and most written-about, periods of McCain's life. The ratio for the other section/subarticle you mention is roughly similar. As for the people on that article not knowing what they are doing, I'm the primary author of all of these McCain articles, and I assure you I did know what I was doing. If you are really convinced that John McCain should be stripped of its FA status because its markup size is over 100kB, put it up at WP:FAR and I'll see you there. Wasted Time R ( talk) 12:35, 5 July 2012 (UTC)
            • Sticking so much into the main article is just wrong. Wikipedia is not a paper encyclopaedia with pages one after the other. If a reader wants to look at that they only need to click on the link and the main points could have been summarized far better like they are in the lead to the subtopic. What has been done is that the main article has been cluttered up with stuff that shouldn't be there. Problems have been created by people not choosing the amount of detail to put into the main article properly. It is a badly structured article. What we've got is an article which has been grown unnecessarily to some limit rather than one that has a reasonable structure. There is no clear limit on the size of the early life part, it was obviously split off because the article was too big but then something went wrong. It is not a summary and it is not a proper description. If the article was done properly people scanning it linearly would be able to read a short description of the early life and then know whether to click on the link or not. What is there makes reading difficult.
              I repeat again since this article clearly has the problem - Wikipedia is not a paper encyclopaedia. That is the very first section of WP:NOT. Notice also the statement there "Keeping articles to a reasonable size is important for Wikipedia's accessibility, especially for dial-up and mobile browser readers, since it directly affects page download time (see Wikipedia:Article size)." The idea of readable prose is not covered there and it refers to this guideline. Dmcq ( talk) 13:08, 5 July 2012 (UTC)
  • Readable prose per the comments made by others, above. -- Dweller ( talk) 14:04, 6 July 2012 (UTC)
    • It would be nice if somebody could elaborate on their support or give a reason why one should not consider 50k as being a strong hint of an article being too long as per the section about readable prose size or say what on earth the business about lists in the section about the rule of thumb is in aid of or why WP:NOTPAPER only talks about download time. Dmcq ( talk) 15:07, 6 July 2012 (UTC)
  • The meaning has always been total load size. The underlying reason for the rule is that, despite the advertising claims of ISPs and high-bandwidth device manufacturers, many of our readers still read Wikipedia articles on slow connections such as dial-up. The number of high-speed users is increasing but oddly, the number of low-speed users is holding steady. The footnotes, references, bibliographies, embedded templates, etc, all feed into the total pageload time and must be considered as part of the burden on readers. We should not deliberately create problems for those readers unnecessarily. We can reasonably debate about whether the 50k limit is the right balancing point but it is disingenuous to try to arbitrarily define large blocks of content away and ignore the real problems that it creates. Rossami (talk) 17:58, 6 July 2012 (UTC)
I don't believe so. On that measure Phoenix Park would already be starting to go over the 100k limit in the rule of thumb but I don't think many would consider it as anywhere near a limit. (download file 102Kb, markup size 23kB, prose 10KB). On the other hand Venus (410 kB, 107 kB, 44 kB) would be considered as being double the 200kB at which one should start to consider splitting according to the technical section, at the limit according to the markup size, but not yet near being considered for division according to readable prose size rule of thumb despite the readability issues section saying 50kB is right on the limit of the average concentration span of 40 to 50 minutes - so really anything more is practically definitely wasting bandwidth compared to organizing the material differently. Dmcq ( talk) 21:48, 6 July 2012 (UTC)
Well, factually, the original size used back in 2003 was actually the wikitext size. For example see: [3] GliderMaven ( talk) 01:05, 7 July 2012 (UTC)
Over the years, this seems to have changed to being readable text size; and the recommended size has increased, and a rationale about readability has been added. For example by 2006, the recommendation has increased and certainly seems to exclude markup, which presumably means it's by then referring to readable text: [4] GliderMaven ( talk) 01:05, 7 July 2012 (UTC)
  • Mostly readable prose size: I think we have to be conscious of bandwidth issues too, but it's mainly to keep articles on topic and to a readable level of detail, with proportionate weight. Shooterwalker ( talk) 23:48, 6 July 2012 (UTC)
  • Readable prose is what really matters. — Kusma ( t· c) 11:55, 7 July 2012 (UTC)
  • Readable prose size is what matters in terms of the educational use of Wikipedia. Beyond that it is worth limiting overall size to facilitate access by easy download, particularly in disadvantaged regions with slow internet connection, but that is a distinct and secondary issue, as it mostly relates to non-prose (images, tables, etc.). -- ELEKHH T 08:28, 12 July 2012 (UTC)
  • Readable Prose size — These rules of thumb only apply to readable prose (found by counting the words) and not to wiki markup size (as found on history lists or other means).  Brendon is  here 14:23, 16 July 2012 (UTC)
  • Readable Prose size - Though noting that certain transclusions of prose, and also some content-displaying templates could be included. (Though obviously not template coding, nor navboxes or infoboxes or the like). - jc37 16:01, 16 July 2012 (UTC)
  • Mostly readable prose size - or we'll have the crazy situation where good referencing and an eye-relieving quote box punishes the article (through forcing important info out into subarticles), whereas poor referencing and walls of text encourages even more of the same bad. I wouldn't mind an additional recommendation for markup size, but that's for a different and IMO secondary concern (e.g. because image sizes matter much more than extra markup sizes). – sgeureka tc 13:17, 23 July 2012 (UTC)
Why does everyone here completely ignore the fact that the current upper limit on prose text is something that shouldn't be reached anyway? One should have looked at splitting the article long before that is reached. This interpretation is a very rigid and damaging one. Dmcq ( talk) 19:41, 23 July 2012 (UTC)
Probably because that hasn't been raised. All people are saying is that there should be a limit on prose length, in response to your RfC question. What the limit should be is another discussion. CMD ( talk) 05:46, 24 July 2012 (UTC)
Well my feeling is people here haven't the foggiest idea what rule of thumb means, which is mentioned in the title: "It is an easily learned and easily applied procedure for approximately calculating or recalling some value, or for making some determination." The meaning given in Wikipedia is accurate. Whereas prose text size is not an easy measure, it doesn't apply properly for a large percentage of articles, and it is being applied absolutely at the limit which is far above where the guideline indicated. Dmcq ( talk) 08:54, 24 July 2012 (UTC)
And my experience is that on Wikipedia, "guideline" is understood by a significant proportion of editors (even established ones) to mean "gospel". Take my recent attempt to change the shortcut to WP:SIZEGUIDE for instance, quickly reverted in favour of WP:SIZERULE. If the problem is that pages take too long to load, images and citation templates are far more problematic than the amount of text: deal with those. If the problem is that some articles are simply far too long, then readable prose is the appropriate measure. — WFC— 10:08, 24 July 2012 (UTC)
Is there closure on this issue now? Does anything need to be added to this guideline to make this a bit more clear, to avoid future disputes? Shooterwalker ( talk) 03:16, 3 August 2012 (UTC)
  • Readable Prose size - did think momentarily about closing, but realised I'm not impartial so voted instead. Soon we'll all have fast enough connections not to worry too much about page size anyway...it's about measuring what folks read and (presumably) their attention span.... Casliber ( talk · contribs) 14:36, 4 August 2012 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Changing the nutshell?

"Articles should not be either too big or too small."

Is this the right way to describe articles, or should this nutshell be reworded? -- George Ho ( talk) 03:25, 6 July 2012 (UTC)

Rule of thumb

How is one supposed to calculate the Readable Prose size. Can the method be included in the article somewhere? Op47 ( talk) 13:00, 6 July 2012 (UTC)

There's a script one can include referenced in the Readability issues. I've had it crash on me after just outputting the first line but it normally works okay. Treating it as a simple thing for a 'rule of thumb' is pretty silly though I think, and the size articles are allowed using it are quite a bit larger than many people normally think of as too large when they stick that template in. I'd like to see some statistics on the sizes of featured articles and see what the FAC editors have been up to. Dmcq ( talk) 13:25, 6 July 2012 (UTC)
I see a bit in the guideline about size not being a reason to remove stuff. Sort of, but not quite true either. Size is a very good indicator that trivia is being put into an article if it can't be split out into subtopics that have some notability. It doesn't apply that way for lists but lists can be arbitrarily split for instance by ones starting A-E so the bits don't become too huge. Dmcq ( talk) 13:29, 6 July 2012 (UTC)
Now the "Content removal" is becoming disputed, unless it's not... -- George Ho ( talk) 15:41, 6 July 2012 (UTC)
I get the feeling you are approaching the guideline as something legalistic rather than as a guideline. If an article gets very long then yes people do start wondering about the content. It may be that legalistically size is not a reason for removal but the article should be inspected if it is too large and stuff would often be removed that would be kept in a smaller article. Policies are supposed to describe what is done. So yes I disagree with the wording. Dmcq ( talk) 17:52, 6 July 2012 (UTC)
Now with all discussions attempting to keep that section by proposing amendments, why am I the only one who feels that this section is causing nothing but harm to Wikipedia, even with such amendments? -- George Ho ( talk) 01:34, 9 July 2012 (UTC)

Why wikitext instead of prose text size

The major thing one should do if things are too long is see if large subtopics can be cut out as detailed in WP:SS. If there are no major subtopics and the article is very long then it is worth wondering whether there is sufficient weight for the inclusion of some of the contents. Personally I haven't seen any very long articles not filled with trivia or where large bits shouldn't be cut out into notable subtopics. The only ones that aren't like that are list articles which have special provisions for splitting.

I don't think the prose text argument for the rule of thumb holds any weight because the WP:Article size#Readability issues says 50k is a limit and yet the rule of thumb is being taken as meaning there are no such issues till the prose text goes over 100k. People should be thinking about size before getting to 50k and should have it definitely in mind over 50k, 100k is a very negative aspect of a featured article.

Wikitext size tends to be about twice the size of prose text for featured articles though it can be more if citations have a lot of extra text in them, and it can be much more if there are tables since prose text ignores table sizes. The advantage of wikitext is that it is directly supported by Special:Longpages and the article history whereas prose text is not well supported. Thus if it is a reasonable measure it is a better measure as a rule of thumb. As to the current rule of thumb if 100k was interpreted as wikitext it would put prose size at about 50k before one should definitely consider splitting which is about right by the readability issues section.

The other point in wikitext's favour is a consideration of a major problem in Wikipedia. People are dumping in databases of sports results, election results, manga characters etc etc into Wikipedia. These articles come out as only a few hundred bytes as far as prose text is concerned but often involve downloading over a megabyte and quite long times to display even ignoring mobile browsing. Rule of thumb is supposed to be something easy, if featured article people want to polish an article that is another business and they can take time over it, but dealing with the great mass of rubbish being stuck into Wikipedia requires more everyday tools.

For straightforward articles which are okay, wikitext is as good as prose text as a rule of thumb, and for everyday use talking about articles which are too large consisting of huge tables prose text is simply useless. As to lists they have their own rules and can be split fairly arbitrarily but their rules stop them being misused as database dumps in quite the same way as normal articles. Dmcq ( talk) 15:26, 7 July 2012 (UTC)

By the way I have also just set up WP:VPT#Section viewing to start thinking about coping better with long articles on mobile devices. Probably somebody else has been at this sort of thing before but getting changes in to the wiki software isn't that easy. Dmcq ( talk) 15:44, 7 July 2012 (UTC)

Proposed load size rule of thumb

Ok, so I did some research, I read this article: Loading today's sites over dialup about load times.

I also played around with this web analyser tool: [5], I pointed it at a few article pages, the main page, and a couple of featured articles.

It looks like, roughly, an article that is about 1 megabyte of load size takes about 2-2.5 minutes on a 56K modem. The main page is about 80K and takes about 20 seconds. Yesterday's Calgary Stampede FA was 222K and would load in about 51 seconds, and the rocket page is 500K, 1:52 on a 56K modem (note that long articles tend to load a bit quicker than you would expect because of queue latency at the webserver that hurts short articles more).

I don't have any reason to think that those articles are particularly large, but I tentatively suggest we write down 1 megabyte to limit the maximum size, and to have a rule of thumb that it's all good up to 250K (a load time of under a minute).

To put this in perspective, according to that article the average size on the wider web is a bit over 1M and the load time on a 56K modem is about 2 minutes 30, so although 2 minutes is a long load, it's still better than average.

I mean in an ideal world I would prefer everything to load in 5 seconds on 56K modem, but that's not going to happen, I don't think people want an encyclopedia web page that looks like it's 1993. So we have to be a bit reasonable.

webpage load size   Recommendation
<250K               good
250K-500K           acceptable
500K-1M             consider shrinking or splitting
>1M                 should be split

So I would suggest that we add that, in addition to the rule of thumb on prose length. Does that sound OK? GliderMaven ( talk) 16:12, 7 July 2012 (UTC)
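The arithmetic behind the proposed bands can be sketched roughly as follows. This is only an illustration, assuming K means 1,000 bytes, M means 1,000,000 bytes, and a modem that sustains its full nominal 56,000 bit/s; the function names and the handling of the band edges are assumptions, not part of the proposal. Because latency and protocol overhead are ignored, the result is a lower bound, so the load times quoted from the analyser tool come out longer. Under these assumptions 1 megabyte needs about 143 seconds, within the 2-2.5 minute range mentioned above.

```python
# Illustrative sketch, not a measurement tool.
# Assumptions: K = 1,000 bytes, M = 1,000,000 bytes, and a modem that
# sustains its full nominal rate of 56,000 bit/s with no overhead.
MODEM_BITS_PER_SECOND = 56_000


def ideal_load_seconds(size_bytes, bits_per_second=MODEM_BITS_PER_SECOND):
    """Lower bound on transfer time: payload bits divided by line rate."""
    return size_bytes * 8 / bits_per_second


def classify_load_size(size_bytes):
    """Map a page load size onto the bands of the proposed table.

    How the exact boundary values are assigned is an assumption here.
    """
    if size_bytes < 250_000:
        return "good"
    if size_bytes < 500_000:
        return "acceptable"
    if size_bytes <= 1_000_000:
        return "consider shrinking or splitting"
    return "should be split"


if __name__ == "__main__":
    # Approximate load sizes quoted in the discussion.
    examples = [
        ("Main page", 80_000),
        ("Calgary Stampede", 222_000),
        ("Rocket", 500_000),
        ("1 megabyte page", 1_000_000),
    ]
    for name, size in examples:
        seconds = ideal_load_seconds(size)
        print(f"{name}: at least {seconds:.0f} s, {classify_load_size(size)}")
```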

The biggest problem I think with that is that a rule of thumb should be something easy, and measuring the file size is not easy. Download file size can differ between whether maths is displayed as png or Tex for MathJax and only people with some technical nous can check it. It is much easier to find than prose text size but it is still a problem. That is why I'd prefer the wikitext size as shown in the history pages for every revision.
As to the actual sizes comparisons the relative sizes tend to be about 8 download bytes to every 2 wikitext bytes to every 1 prose text byte. The major difference is if there are tables prose text ignores them, you'll be amused (?) to learn the prose text script says there is (0 bytes) of prose on the main page! On that basis your advice is roughly equivalent to the prose text guideline and at least catches the huge tables that have been pushed by some sports database dumpers.
If people could actually consider acceptable as meaning that and consider shrinking or splitting as really meaning to consider that articles that long have some problems I would be happy, but it seems they are taking maximum as meaning okay. It should be I feel that in a FAC consider splitting really means that and that special pleading should need to be done for accepting an article in that range as a featured article.
Converting the table above with the 8:2:1 equivalence to wikitext and putting in some encouragement to make smaller gives:
wikitext size   What to do
< 30k           normally too small to split
< 60k           good readability
60k - 120k      acceptable but splitting may be helpful
120k - 250k     can have readability issues, consider shrinking or splitting
> 250k          almost certainly should be split
This would make it easier to judge without special tools and makes it clearer that the maximum is a maximum and not an optimum. Dmcq ( talk) 23:57, 7 July 2012 (UTC)
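As a rough sketch of the conversion described above, the load-size thresholds can be divided by the quoted 8:2:1 ratio. The helper names are hypothetical and the ratio is the approximate equivalence stated in the comment, not a measured constant. Dividing 250K, 500K and 1M by four gives 62.5k, 125k and 250k, which round to the 60k, 120k and 250k bands in the wikitext table.

```python
# Hypothetical helper names. The 8:2:1 ratio (download : wikitext : prose)
# is the rough equivalence quoted in the discussion, not a measured constant.
DOWNLOAD_PER_WIKITEXT = 8 / 2  # download bytes per wikitext byte
DOWNLOAD_PER_PROSE = 8 / 1     # download bytes per prose byte


def download_to_wikitext(download_bytes):
    """Estimate wikitext size from download size using the 8:2 ratio."""
    return download_bytes / DOWNLOAD_PER_WIKITEXT


def download_to_prose(download_bytes):
    """Estimate readable prose size from download size using the 8:1 ratio."""
    return download_bytes / DOWNLOAD_PER_PROSE


def classify_wikitext(size_bytes):
    """Bands from the wikitext table; boundary handling is an assumption."""
    if size_bytes < 30_000:
        return "normally too small to split"
    if size_bytes < 60_000:
        return "good readability"
    if size_bytes <= 120_000:
        return "acceptable but splitting may be helpful"
    if size_bytes <= 250_000:
        return "can have readability issues, consider shrinking or splitting"
    return "almost certainly should be split"


if __name__ == "__main__":
    for load in (250_000, 500_000, 1_000_000):
        wikitext = download_to_wikitext(load)
        prose = download_to_prose(load)
        print(f"load {load} -> wikitext {wikitext:.0f}, prose {prose:.0f}")
```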
Oh sure, it's a lot easier to do that, but it just doesn't work, it's not in any way equivalent, and there are no stable conversion ratios you can use. The trouble is that most of the load time and load size is due to loading images, but images take up virtually no wikitext, just a hundred bytes or so. But they balloon up when you load the page to many kilobytes, by a factor of 50 or more. Also references are very bulky in terms of wikitext but don't add much to the load time. GliderMaven ( talk) 00:13, 8 July 2012 (UTC)
Your idea just doesn't work at all. GliderMaven ( talk) 00:13, 8 July 2012 (UTC)
For the Calgary Stampede file size of 260k is the size of the individual html corresponding to the wikitext without images. In that case the wikitext was 72k and the prose 37k so in fact it was quite close to the ratio 8:2:1. I see rocket is one of those for which the prosesize script failed so I can only give the first two figures of 362k and 119k which is also close enough to 8:2 for an easy rule of thumb. We're not talking about something exact.
As to image sizes, that is not included in the file size of 260k of Calgary Stampede that you quoted so you weren't including them yourself. In fact the images for that come to about 628kB. One doesn't notice images so much as they get loaded last normally and you don't see the empty space except by scrolling quickly to the bottom. For rocket the total size of the images is 340kB. If one were to load rocket with an empty cache, not even all the javascript and css, one would load about a megabyte. Coming from another page about 700kB needs to be loaded.
I agree image size is an important consideration but people don't notice it so much in time, it is mainly a price cost to mobile browsers. I think a separate section would be needed about images and the main way to cope is not have such large ones and to use summary style with subtopics, i.e. for people to realize this is an internet encyclopaedia. Dmcq ( talk) 11:27, 8 July 2012 (UTC)
Those numbers are just not true. The prosesize tool doesn't give you load times or load sizes. You have to use a proper web analysis tool like the one I already linked to. The breakdown of Calgary Stampede loading it gives is that it's 222K as follows GliderMaven ( talk) 13:44, 8 July 2012 (UTC)
URL: http://en.wikipedia.org/wiki/Calgary_Stampede
Title: Calgary Stampede - Wikipedia, the free encyclopedia
Date: Report run on Sun Jul 8 09:07:20 EDT 2012
Diagnosis
Global Statistics
Total HTTP Requests: 33
Total Size: 222714 bytes
Object Size Totals
Object type     Size (bytes)   Download @ 56K (seconds)   Download @ T1 (seconds)
HTML:           52324          10.63                      0.48
HTML Images:    124925         29.50                      5.26
CSS Images:     15941          3.78                       0.68
Total Images:   140866         33.28                      5.94
Javascript:     23997          5.78                       1.13
CSS:            5527           1.30                       0.23
Multimedia:     0              0.00                       0.00
Other:          0              0.00                       0.00
As you can see 2/3 of it is in images, the HTML markup is only 52K. As I pointed out, the load size is dominated by images, which are loaded as thumbnails, but they will take up virtually no wikitext at all. GliderMaven ( talk) 13:44, 8 July 2012 (UTC)
The Rocket page analyses as follows, as you can see it's 500K:
URL: http://en.wikipedia.org/wiki/Rocket
Title: Rocket - Wikipedia, the free encyclopedia
Date: Report run on Sun Jul 8 09:30:45 EDT 2012
Diagnosis
Global Statistics
Total HTTP Requests: 79
Total Size: 500598 bytes
Object Size Totals
Object type     Size (bytes)   Download @ 56K (seconds)   Download @ T1 (seconds)
HTML:           81986          16.54                      0.63
HTML Images:    367824         86.91                      15.55
CSS Images:     15941          3.78                       0.68
Total Images:   383765         90.69                      16.23
Javascript:     29320          7.04                       1.36
CSS:            5527           1.30                       0.23
Multimedia:     0              0.00                       0.00
Other:          0              0.00                       0.00
You're basically, repeatedly looking at a surrogate number, wikitext, but there's no reliable correlation at all with the actual things that people actually care about, like page load time or reading time. GliderMaven ( talk) 13:44, 8 July 2012 (UTC)
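The image share in the two reports above can be checked with a short sketch. It takes the reported byte counts at face value; the dictionary layout and function names are assumptions made for illustration. On these figures images account for roughly 63% of the Calgary Stampede load and roughly 77% of the Rocket load.

```python
# Byte counts copied from the two analyser reports; the data layout and
# function names are illustrative assumptions.
REPORTS = {
    "Calgary Stampede": {
        "total": 222714,
        "html": 52324,
        "images": 140866,  # "Total Images" = HTML images + CSS images
        "javascript": 23997,
        "css": 5527,
    },
    "Rocket": {
        "total": 500598,
        "html": 81986,
        "images": 383765,
        "javascript": 29320,
        "css": 5527,
    },
}


def component_sum(report):
    """Add up every component except the reported total."""
    return sum(size for kind, size in report.items() if kind != "total")


def image_share(report):
    """Fraction of the reported total size taken up by images."""
    return report["images"] / report["total"]


if __name__ == "__main__":
    for name, report in REPORTS.items():
        consistent = component_sum(report) == report["total"]
        print(f"{name}: images {image_share(report):.1%} of load, "
              f"components match total: {consistent}")
```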
I wasn't looking at wikitext, I was looking at the actual sizes. I will list the images loaded specifically for Calgary stampede and their sizes so you can check that your figures are wrong:
13.2K 220px-Stampede_chuckwagon_race.JPG
13.4K 220px-Patsy_Rodgers_stage_coach_1a.jpg
13.7K 220px-Steerwrestling-c01.jpg
14.1K 220px-Calgary_Stampede_Logo.svg.png
14.3K 220px-Calgarystampede.jpg
15.8K 220px-StampedeRodeo2002.JPG
15.9K 350px-Saddledome_from_Calgary_Tower.JPG
16.2K 220px-1923_Calgary_Stampede_parade.jpg
21.6K 200px-Barrel-Racing-Szmurlo.jpg
30.2K 220px-Bull-Riding-Szmurlo.jpg
31.0K 220px-Program_for_1912_Calgary_Stampede.jpg
73.8K 220px-Sale_Pelletier_ice_show.png
76.8K 250px-Chinook_Stampede_Breakfast.png
80.3K 220px-Stampede_Protest.png
83.0K 220px-Stampede_Midway_2011.png
91.0K 220px-Indian_Village.png
total 604.3K for page specific images. You can see from this that this sort of thing is not suitable for a 'rule of thumb' without a lot more work to make an easy tool. Dmcq ( talk) 15:36, 8 July 2012 (UTC)
It looks like the webtool is getting a bad response from the wikimedia servers perhaps; it's only getting a 5k rejection message instead of the actual image. So the actual size of the images is going to be significantly bigger. This only underlines how bad your idea of using wikitext size is; it correlates very, very poorly with anything the user actually cares about; as everyone else keeps telling you; even when the tool very significantly underestimated the image size it still dominated load size. GliderMaven ( talk) 22:52, 8 July 2012 (UTC)
I've explained above about the gradual loading of images leading to an improved perceived response and people don't worry so much about the empty boxes anyway when they know they will be filled later. Images should be dealt with as a separate problem as they can mostly be adjusted independently of the text. As I said before the main fix always is to use content splitting and we need an easy rule of thumb we don't need something exact. As you have demonstrated above total download per page is not an easy measure. Dmcq ( talk) 00:11, 9 July 2012 (UTC)
No, we need a realistic measure; one that includes the size of image thumbnails and other files. If there is a way to automate such a measurement then we need to encourage the server side people to help us make it easy. First, though, is accuracy. Second is ease. Binksternet ( talk) 01:12, 9 July 2012 (UTC)
What would one be accurate about and why is it important to be accurate about it? The featured article people with their prose size do have a point, the first consideration should be readability. It is just they have ignored the real limit and pushed something that is hard for most editors and causes real problems when dealing with fanatics shoving in their databases of sports facts that really makes prose text unworkable and bad for general use so we need something that is fairly consistent with what is really wanted there. A tool that outputs total download size, total article specific images or other downloaded media size, total article specific html text size, wikitext size plus the prose text size would be nice, but we do need a simple rule of thumb to get people looking deeper and image size + text size just is not that as it can totally swamp text size problems, whereas if the text size is okay it is normally fairly easy to adjust for image size problems. Dmcq ( talk) 11:36, 9 July 2012 (UTC)
I'm in favor of three plainly stated size limits: readable prose (requires a tool), HTML markup (seen at a glance), and page load (with image thumbnails; needs a new tool). Binksternet ( talk) 13:05, 9 July 2012 (UTC)
Dmcq, you keep pushing markup size, but as everyone here has confirmed, it's the least meaningful number to use. The User:Dr pda/prosesize tool is easy to install and with one click gives all three sizes: readable prose, markup, and load size. If it's so important that everybody be looking at sizes all the time, then the best action would be for this tool to be included in all user accounts by default. I see from your user page that you've worked in the computer field - so you may be familiar with the simple Unix "wc" command. It prints the number of lines, words, and characters in a file. It doesn't just print the number of words and leave the user to make an estimate from that on the number of lines or characters. Wasted Time R ( talk) 04:08, 8 July 2012 (UTC)
Why are featured article candidates not penalized for going over 50k prose text size if readable size is so important? Why will you not engage with the fact that the main problem is people pushing enormous wadges of text with trivia into Wikipedia and that the featured articles even with being pushed to silly limits are not the really big problem. My experience with computers tells me I should make things simple. I have cut out or hidden features in products rather than release them to save on support costs and made sure examples included features users should use rather than showing arcane things. Sometimes a user would be told about a hidden feature if they said about a problem it could fix but keep it simple stupid is the right way to do things for something that says 'the encyclopaedia anyone can edit'. There is no point showing three figures instead of one for a simple rule of thumb. And prose text style has far too many problems compared to its use for readable prose which the FAC editors seem to be ignoring anyway. Dmcq ( talk) 08:55, 8 July 2012 (UTC)
I don't agree with you that there is any problem in the first place. I think the featured articles that go over 60kb / 10,000 words of readable prose size do so for a good reason and I don't think they're full of trivia. I also don't think your model of hypertext and link clicking is how readers actually use Wikipedia in many cases. If you look at the page view stats ( http://stats.grok.se/) for June 2012, for example, Barack Obama got 645,000 views while United States Senate career of Barack Obama got less than 3,000 views, Barack Obama social policy got less than 3,000, Economic policy of Barack Obama got 4,000, and Foreign policy of the Barack Obama administration got 3,500. Paul McCartney got 429,000 views while Paul McCartney's musical career got 2,000 views and Personal relationships of Paul McCartney got 7,000. To use your example, Cat got 406,000 views while Cat gap got 2,000, Cat genetics got 1,000, and Cat health got 3,000. These 100-to-1 or worse readership drop-off ratios are common, I've seen them across many time periods and article/subarticle combinations. So if there's something important about any of these topics, editors know it had better go in the main article, otherwise 99 percent of their readers will never see it. (I can't prove it, but I think readers mostly reach WP articles from search engines or from clicking from one main topic to another, and not from drilling down within a topic. It would be great to actually see a use study of how people reach and navigate WP.) As for making things simple to use and 'the encyclopaedia anyone can edit', that ship has already sailed. The unsuspecting reader who decides to click "Edit" for the first time is hit with a blizzard of inscrutable infobox templates and other markup. How article sizes are shown is the least of their problems ... Wasted Time R ( talk) 11:47, 8 July 2012 (UTC)
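The drop-off ratios can be worked out from the approximate view counts quoted above, as in the sketch below. Counts given as "less than" a figure are entered at that figure, so the real ratios for those pairs would be somewhat higher; the data layout and function names are assumptions for illustration only.

```python
# Approximate June 2012 view counts as quoted in the comment. Counts stated
# as "less than N" are entered as N, so those ratios are lower bounds.
VIEWS = {
    "Barack Obama": (645_000, {
        "United States Senate career of Barack Obama": 3_000,
        "Barack Obama social policy": 3_000,
        "Economic policy of Barack Obama": 4_000,
        "Foreign policy of the Barack Obama administration": 3_500,
    }),
    "Paul McCartney": (429_000, {
        "Paul McCartney's musical career": 2_000,
        "Personal relationships of Paul McCartney": 7_000,
    }),
    "Cat": (406_000, {
        "Cat gap": 2_000,
        "Cat genetics": 1_000,
        "Cat health": 3_000,
    }),
}


def dropoff_ratio(main_views, sub_views):
    """How many main-article views there are per subarticle view."""
    return main_views / sub_views


def all_ratios(views):
    """Return (main article, subarticle, ratio) for every quoted pair."""
    return [
        (main, sub, dropoff_ratio(main_views, sub_views))
        for main, (main_views, subs) in views.items()
        for sub, sub_views in subs.items()
    ]


if __name__ == "__main__":
    for main, sub, ratio in all_ratios(VIEWS):
        print(f"{main} -> {sub}: about {ratio:.0f} to 1")
```

On these rounded figures most pairs come out well above 100 to 1, while the pair with the most-viewed subarticle (7,000 views) comes out nearer 61 to 1.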
I did not say featured articles were full of trivia. I said the main problem about size was elsewhere and we really needed a simple size rule for where people were dumping huge amounts of trivia into Wikipedia. Very often they stick it into big tables which come out as zero size by prose text.
What I said about featured articles is that they should consider size as being a negative factor long before they get to the 100K. There was a justification for using prose size above on the basis that when they got to the limit with a wikitext limit they would be over it if they added a citation. There was no thought that if they have got to such a situation splitting should have been considered a long time earlier and an argument should have been given why more than 50k prose text without splitting was good.
Now you come up with this argument that one should cram as much as possible into a page because they often don't click down to subtopics. What on earth makes you think they read all the top level article anyway? Do you really think they are going to be sitting there for more than an hour reading the business? All that is happening is they get pages slower and Wikipedia wastes resources sending out stuff that people don't look at. Putting in more simply obscures the important bits.
If pages were smaller and loaded faster then users would click more. Have a look at [6] for instance about what happens as page load time goes up. As it says even a 1 second delay decreases customer satisfaction by 16% and 40% abandon a website if it takes more than 3 seconds to load. Think gnat about attention span rather than cramming stuff in. Dmcq ( talk) 15:56, 8 July 2012 (UTC)
I agree with you that we don't know what happens when readers come to long articles. We don't know how often they just read the lead section and then leave, or how often they look at the table of contents and jump to a section that they are especially interested in, or how often they jump around different sections reading some and skimming others, or how often they read the whole thing through start to finish, or how often they read some now and come back to the article at some later time. We also don't know how often they get frustrated with the load time and abandon the whole thing at the outset. I'd love to see a usage study that shed some light on all this. But one thing that we do know is how often they click through to certain kinds of drill-down subarticles, and it's 1 percent or less of the time. Given that, authors will take their chances with longer articles. As for your belief that if page loads are faster in general, people will click through more, I haven't seen that myself. For example, Samuel Taylor Coleridge is 27 kB readable / 47kB markup / 152 kB load size and loads very quickly; it had 41,491 views last month. But Early life of Samuel Taylor Coleridge had only 362 page views, again a 100-to-1 type ratio. Wasted Time R ( talk) 23:16, 8 July 2012 (UTC)
Perhaps people aren't as interested in Coleridge's early life as that of Obama or Paul McCartney? Anyway there is quite a bit about his early life in the article instead of it just being a couple of paragraphs like the lead of the subtopic - perhaps they read enough there if they were interested? Another way of interpreting it also is that a large section about early life is always downloaded which 99 out of 100 readers aren't interested in. However what I do know is that people will go round a site much more if its response is quick and that they don't like having a lot of stuff they're not interested in. How many people who read about Coleridge are really interested in his life rather than his poetry? Not a very high percentage is my guess and yet most of that article is about that. In fact I wonder how many read beyond the lead. Dmcq ( talk) 00:21, 9 July 2012 (UTC)

Note that the example given in a previous thread, Wikipedia talk:Article size#Images as part of the total download of an article to a browser, is 38MB (!!) worth of page load because of the hundreds of thumbnail images. Pages like that must be cut down by taking away the images or by splitting. Our guideline must recommend an upper limit for that kind of silliness. 1MB seems reasonable. Binksternet ( talk) 01:19, 9 July 2012 (UTC)

Yes there definitely does need to be some guidance about image size, I think it should be a separate section from the text rule of thumb though and that getting a proper handle on the text size would ameliorate much of the difficulty. Even now if you look at List of American Civil War Generals (Confederate) which was the article in question if you apply the prosesize script it says it only has 3135 bytes! That shows how useless the prose text size is for general use. The wikitext occupies 247kB and the file size 385 kB. This is an example where the file size to wikitext size is quite reasonable - I don't know why the ratio normally tends to be about 4 to 1. I still think it is too long but that ratio does make a case for also having a section on file size as an additional measure which could be used in special pleading to say a page isn't too long when the rule of thumb indicates it is, this is the same sort of status I'd give to prose text size. Dmcq ( talk) 08:35, 9 July 2012 (UTC)
I'm in favor of three plainly stated size limits: readable prose (requires a tool), HTML markup (seen at a glance), and page load (with image thumbnails; needs a new tool). Binksternet ( talk) 13:05, 9 July 2012 (UTC)
Actually it is wikitext that is seen at a glance, it gets expanded by templates and html tags inserted to something typically about four times the size though it may be less than twice or can be more if lots of templates are used, that doesn't matter too much I don't think in this context though. I would typify the sizes as being good for
Page load with thumbnails (probably should not include cached javascript and css) for the total time and money overhead before a page is fully loaded. This would mainly be used in arguments that there were too many thumbnails or they were too large. We'd need good guidelines about that - currently I don't know any and WP:PERF just says not to worry about that, and WP:IMPROVING hasn't a thought in the world about it but does say ' The most important point is that now it is highly encouraged to talk, learn, and worry about performance issues before an article becomes a nightmare for admins to rescue.' which completely contradicts the other essay.
wikitext (possibly including transcluded pages which aren't templates) as a simple guide for saying pages are getting too long when dealing with problems like sports statistics. An alternative is if we can get the html file size for a standard environment easily.
prose text size for the featured article editors as an indicator of readability. If you can phrase that better please do.
Unfortunately most of the editors here seem to be arguing for what I would consider the least useful measure for general purposes as being the main rule of thumb measure. I just hope that doesn't become general knowledge or it will be exploited to death, we better get some other guidelines in before that happens. Dmcq ( talk) 16:10, 10 July 2012 (UTC)
Just use the 'Page length (in bytes)' as reported from the Page Information link on every page. Anything else is too obscure or complicated. Kaldari ( talk) 06:09, 7 December 2012 (UTC)

Measure for measure

Seems to me all three major ways of measuring size are relevant to different users. The majority of users are presumably readers who found their way via Web search and are ignorant of the article's topic. For those who use a desktop or large laptop screen and fast wired connection, quick comprehensibility is the main design consideration, which makes readable prose size the proper measure. For those using their mobile phone screen as I often do when merely reading, or a small tablet, small prose size is even more important for avoiding getting lost in an article with sections either too large or too numerous, and download size also becomes important for those of us with slow mobile radio connections.

Editors ignorant of the fine points discussed here, who will remain the majority of editors for an indefinite period, only know markup size, because that's what's in the watchlist entry. Those who edit on a small screen or slow connection are again even more interested in markup size. And of course many readers who use small mobile screens, including me often, will be reading one or another of the "Mobile Web" versions and sometimes the official Wikipedia mobile app or an unofficial one, most of which will present pictures with a smaller thumbnail than the "Desktop version" that the majority of deskbound readers use. So, yeah, all these methods of setting limits ought to be taken into consideration, but markup size is the only one the majority of editors will use until the others are as easily reported as that one. Jim.henderson ( talk) 03:13, 19 July 2012 (UTC)

I'd certainly like an easy way to measure the size of the images as well as the markup size. However, I think the prose text size is a red herring, as the size here is way beyond what is given in the readability issues; the section saying size is no reason to chop things out is also a misleading one. The prose text also gives no guideline on lists or tables or citation notes which are mini articles in themselves. The place seems to be dominated by people with an interest in cramming as much in as they can and can't or won't see the problems they cause. I view the guideline as unfit for any purpose at the moment whether it be for featured article assessment or for stopping people sticking in huge lists of wrestling bout results and every action of every monster in Dungeons and Dragons. Dmcq ( talk) 08:22, 19 July 2012 (UTC)
Propose changes right now since you are addressing issues with "Content removal" section. -- George Ho ( talk) 14:25, 19 July 2012 (UTC)
I would have something like 'Size is not of itself a reason to remove content. List articles may be split arbitrarily. If a non-list article becomes very long without being able to be shortened by splitting off notable subtopics that is a strong indication that trivial details have been included. Consensus should be shown that excessive trivia are not included if an article is grown beyond the normal article size guidelines. It is normally easier not to split off subtopics whilst actively developing the basic structure of an article.' Not marvellous but perhaps a start for discussion. Dmcq ( talk) 16:00, 19 July 2012 (UTC)
Working off that, I think we should change "List articles may be split arbitrarily." to "List articles may be split arbitrarily, although if a logical split is viable that is preferred." I also think we should follow on from the splitting sentence at the end with "If an article is developed but too long, consider what information is more helpful to give an overview to the reader, in line with WP:Summary style, and move excess information to articles devoted to specific subtopics." This should hopefully encourage the shifting of content, rather than its simple removal. CMD ( talk) 16:44, 19 July 2012 (UTC)

Originally the only measure for size was the byte count. The words "readable prose" were introduced in 2004 to point out that tables, lists, and markup were not to be included, but the suggested counts were not adjusted to match. [7] Prior to that it was clear that the only count that was used was the byte count - see [8] and [9] to see that this article is 15 kb (that was before you could just click history to find out the current size). In fact the suggested sizes have increased, not decreased, while using a measure that gives a smaller size. This has compounded the problem of pages being too long. Apteva ( talk) 20:06, 19 September 2012 (UTC)

"Content removal" section

This section is now disputed because changes have been proposed and because the amount of subtopic information in an article dedicated to a main topic may be either appropriate or excessive. To set up a straw poll, you can create a subheading below with an RFC tag. -- George Ho ( talk) 16:52, 19 July 2012 (UTC)

RfCs aren't meant to be used as straw polls. Also, rather than creating a new section, perhaps we could keep the discussion in the section above? CMD ( talk) 17:31, 19 July 2012 (UTC)
Haven't you read the OP of the above section? We can't turn the above section into a catch-all for many things or change its subject. -- George Ho ( talk) 17:40, 19 July 2012 (UTC)
The above section is five posts long. It's hardly committed to a specific cause. CMD ( talk) 17:47, 19 July 2012 (UTC)
My impression was that summarizing a section would be a preferable way to deal with a long article, especially if splitting a section out would violate WP:what Wikipedia is not (or if that section is so unreferenced as to fail the notability guideline). Depending on who you ask, such "summarizing" would be seen as "removal". But regardless, I don't think this section is accurate, and I think it needs to be rephrased or removed. Shooterwalker ( talk) 03:20, 3 August 2012 (UTC)
Correct. It is not very well worded. In that context summarizing is removing. WP uses summary style, and expects that long sections will be split into sub articles leaving a summary paragraph. The section before this one, "Splitting an article" deals with that subject. The section in question deals with simply cutting out words to make an article shorter. I also fail to see the need for putting the template on the article. Just fix the wording and discuss it. This is not a major dispute that no one can agree on, which is what that template connotes. Apteva ( talk) 22:17, 19 September 2012 (UTC)

Origin of Marimba

Marimba is a musical instrument made and played by the Lozi people of the western province of Zambia — Preceding unsigned comment added by 101.119.24.76 ( talk) 01:35, 11 August 2012 (UTC)

Template:Size

Am I using Template:Size correctly, on the massively oversized List of historic places in Quebec? Seeing the equally massive logo it places atop, I'm unsure. Should this go on the Talk page? The template documentation is unclear, at least, to me. Shawn in Montreal ( talk) 15:42, 23 August 2012 (UTC)

Talk page, yes. Article, NO!! As for the template itself, the documentation needs a better, more consistent explanation. -- George Ho ( talk) 16:34, 23 August 2012 (UTC)
Hmm. When I move the template to the Article talk page (which is otherwise empty) it reads zero bytes. So it's not registering the size of the article at all. I guess I shouldn't use the thing at all? Shawn in Montreal ( talk) 17:21, 23 August 2012 (UTC)
After much digging... the template you want is: Template:Very long (or one of the SeeAlso or Splitting templates listed in its docs).
We should probably list that template (and possibly some of the related templates) in this policy page? -- Quiddity ( talk) 21:05, 23 August 2012 (UTC)
Tangentially: These templates seem to be related, but are currently unused: {{ Size}}, {{ Pages}}, {{ Pages-size}}, {{ PageInfo}} - I'm not sure if we should merge them all into one, or delete them all, or what? -- Quiddity ( talk) 21:05, 23 August 2012 (UTC)
The first thing to do would be to fix {{ Size}} so that it works properly. It has a parameter that is supposed to select either a big (70px) or a small (35px) exclamation mark that is not working so you get the full 323 px.
{{#ifexpr:   <!---1---> {{PAGESIZE:{{FULLPAGENAME}}|R}} >= 102400 
|<!---2---> 
[[File:Ui Yellowexclamation.png| {{#ifeq: {{{big|no}}} | yes | 70px |35px}} 
link= Template:Longish]]
Apteva ( talk) 20:51, 19 September 2012 (UTC)
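
The logic that the quoted fragment appears to be aiming for can be sketched in Python. This is only an illustration of the intended behaviour as described in the comment above (show an icon once the page reaches 102400 bytes, 70px wide if `big` is "yes", otherwise 35px); the function name `warning_icon_width` is hypothetical and this is not the template's actual implementation.

```python
from typing import Optional

# Threshold taken from the quoted fragment: 102400 bytes (100 * 1024).
THRESHOLD_BYTES = 102400


def warning_icon_width(page_size_bytes: int, big: str = "no") -> Optional[str]:
    """Return the icon width to use, or None if the page is below the threshold.

    Mirrors the two checks in the fragment: a numeric comparison on the
    page size, then a string comparison on the 'big' parameter.
    """
    if page_size_bytes < THRESHOLD_BYTES:
        return None
    return "70px" if big == "yes" else "35px"


print(warning_icon_width(50_000))          # below the threshold
print(warning_icon_width(150_000))         # default small icon
print(warning_icon_width(150_000, "yes"))  # large icon requested
```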

Move

Wikipedia:Article size → Wikipedia:Manual of Style (article size) — Consolidating naming per Wikipedia_talk:Manual_of_Style#Poll Gnevin ( talk) 16:28, 24 May 2010 (UTC)

Fine by me. -- Eraserhead1 < talk> 17:30, 24 May 2010 (UTC)
Oppose: Per the descriptions in our List of guidelines, this strikes me much more as an editing guideline ("non-content advice about categorization, navigation or other how-to-edit advice") than a style guideline ("advice on writing style, formatting, grammar, and more").— DCGeist ( talk) 22:56, 25 May 2010 (UTC)
Remove from the MOS? Gnevin ( talk) 12:33, 26 May 2010 (UTC)
Rebranded as editing guideline, its natural category.— DCGeist ( talk) 18:53, 26 May 2010 (UTC)
Yeah, this is definitely not a style page. — SMcCandlish Talk⇒ ʕ(Õلō Contribs. 09:08, 17 September 2010 (UTC)

Time to revisit the technical problems argument, advise against splitting most long list articles

It's getting toward 2011 now, and I'm feeling more and more that we need to revisit the technical side of this. At some point WP has to stop catering to broken, obsolete technology like browsers from the 1990s and early 2000s. While "reader fatigue" is a real issue, and WP:SUMMARY provides a way to solve that problem, not all articles are intended to be read from top to bottom, and will become user-unfriendly and downright editor-hateful if split into multiple pages. I have anecdotal but to me rather strong evidence that the technical concerns are essentially obsolete: a long and very heavily linked-to glossary list article has had absolutely zero length-related problems reported in over 4.5 years.

The most obvious cases are list articles of various sorts, including glossaries. People (other than really bored people with way too much time on their hands) do not usually try to read such articles from top to bottom; they load the page and search for the term they are interested in, if a #-link didn't bring them directly to it from another page. Splitting such pages makes in-page search more difficult, and actually frustrates readers' ability to find information. E.g. if I search for "foo" in a long one-page list, I may find a "foo" entry, and/or various mentions of "foo" as applied in several contexts, in various other entries, while if the article is split, I might not find a "foo" entry because it's in another page, and assume there's no information on the topic, and/or I may miss a lot of contextual information about "foo" because I have not realized there is more of it in another article in the split series.

I have put off converting Glossary of cue sports terms, one of the articles I have worked on the most, into a split article for some time (it's been tagged with {{ Longish}} for 2.5+ years), because I have yet to see one single case of someone's browser accidentally truncating the page, a user reporting a crash or other technical problem, or any reader suggesting that the document is too long for simple human usability reasons. This despite the article being over 240K, being linked to (usually many times at different entries) in almost all cue sports articles, and being edited nearly two-thousand times, by registered and anon users from all over the world, with greatly varying levels of technological currency/obsolescence. I've also resisted splitting because {{ Cuegloss}} would have to be redone in a very complex way that would so complicate use that most editors who bother to use it now to create helpful glossary links for non-billiards-expert users would surely abandon it. I can no longer see any good reason for (and do see several good reasons against) splitting this or any similar article, even though until recently I have long been tinkering with test code for splitting the article and adapting the templates that work with it, to comply with this guideline's article length advice.

At any rate, if after 1900+ edits by hundreds of users the page has never been truncated by a browser that can't handle long textedit fields, this strongly suggests that the truncation concern is no longer a valid one in anything near significant numbers of incidents; such browsers are today so rare that the odds of it happening are now so low that it need not be even mentioned here, and if it does happen, it will be obvious and someone will fix it.

I propose that a partial rewrite is also in order to strongly suggest that most types of list articles remain unsplit, either regardless of length, or unless longer than X where this variable is some number we arrive at that is very much more than the current number, like maybe 1MB. Lists that are easily divided into clear sub-topical sections each with numerous entries could be given as a clear exception, something that perhaps should be split after 100K or so (such as events relating to some topic in the 1700s, 1800s, 1900s, or vehicles manufactured by Ford, BMW, Toyota, etc.). But most list articles, including glossaries, are not divisible logically this way, only arbitrarily, and WP:SUMMARY cannot logically apply to them. They are not intended for start-to-end reading, but for in-page searching. Meanwhile, splitting them not only greatly impedes such searching, it makes creation and use of tools that work with such articles (e.g. Template:Cuegloss) much more difficult.

SMcCandlish Talk⇒ ʕ(Õلō Contribs. 10:03, 17 September 2010 (UTC)

The only thing I would add to the above excellent summary is that this particular article is high use. It gets approximately 600 views per day. Guessing that about 80% of those views are not repeats, we're talking about 175,000 different people viewing the article per year without a single length complaint (600 x .8 = 480; 480 x 365 = 175,200).-- Fuhghettaboutit ( talk) 12:08, 17 September 2010 (UTC)
Cool. I didn't know about that stats tool, or had forgotten about it. — SMcCandlish Talk⇒ ʕ(Õلō Contribs. 18:05, 17 September 2010 (UTC)
I agree and would cite the additional problem of repeating footnotes in long lists. The large (190kb) List of islands of Maine features some footnotes repeated 5 times, 10 times, in one case close to 40 times. The only feasible way to split up that table would be arbitrarily by letters of the alphabet, i.e. Maine Islands from A-G, H-Q, R-Z etc. But that would leave the footnotes a hopeless tangle on separate pages. So I am going to modify the guidance on the main page to suggest that lists be broken into separate pages only when the organizational logic of the list suggests it. ElijahBosley (talk ☞) 16:40, 21 November 2010 (UTC)
Sounds good. -- Eraserhead1 < talk> 16:48, 21 November 2010 (UTC)
I removed this sentence in follow-up to this thread. — W F C— 11:02, 27 February 2011 (UTC)
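
The yearly view estimate quoted in this thread can be checked with a quick calculation. The following is a minimal Python sketch using only the figures given above (about 600 views per day, a guess of roughly 80% non-repeat views); the function name and the integer-percentage parameter are choices made for this example, not part of any stats tool.

```python
def estimated_distinct_views_per_year(daily_views: int,
                                      non_repeat_percent: int,
                                      days: int = 365) -> int:
    """Estimate yearly non-repeat views from a daily count and a guessed share.

    Integer arithmetic is used so the result is exact for whole-number inputs.
    """
    non_repeat_per_day = daily_views * non_repeat_percent // 100
    return non_repeat_per_day * days


# Figures from the comment above: 600 views/day, about 80% not repeats.
print(estimated_distinct_views_per_year(600, 80))
```

The 80% share is the commenter's guess, so the result is an order-of-magnitude estimate and not a measured count of distinct readers.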

"Article Length" and "Long article"

I keep these two as redirects to "Wikipedia talk:Article size" because I had significant trouble finding the article in the first place.-- Jax 0677 ( talk) 12:15, 9 January 2011 (UTC)

"hit preview to see the page size warning"?

I tried following the instructions at Wikipedia:Article size#Measuring "readable prose" size with the Time travel article, but I didn't see any "page size warning" in the preview. Did I miss it, or do you only get such a warning if the size is above a certain limit? Either way I think the specifics of where and when such a warning appears should be mentioned in this section. Hypnosifl ( talk) 04:33, 3 March 2011 (UTC)

The page size notice was removed a little while ago. I have updated the page. SilkTork * YES! 11:15, 6 May 2011 (UTC)

Exceptions: Lists, Tables

The guide appears to be referring to standalone lists - "Lists, tables, and articles summarizing certain fields are exceptions." Though, surely, the advice relating to not splitting certain lists would also apply to embedded lists. If a list is constructed in such a way that splitting it or summarising it would be inappropriate, then it doesn't really matter if the list is a standalone or is embedded, if it shouldn't be split, then it shouldn't be split.

What does "articles summarizing certain fields" refer to?

Possible new wording: Lists, tables, and material summarizing certain fields are exceptions. If there is no "natural" way to split long lists or tables, it may be best to leave them intact. They act as summaries and starting points and in the case of some broad subjects or lists either do not have a natural division point or are more easily word-searched as a single set. This is especially the case when buttressing cites are repeated throughout the list or table. In such cases, the list or table should nonetheless be kept as short as feasible.

Does the paragraph regarding "Major subsections..." belong in the Exceptions: Lists, Tables section? SilkTork * Tea time 18:49, 26 June 2011 (UTC)

"If a list is constructed in such a way that splitting it or summarising it would be inappropriate, then it doesn't really matter if the list is a standalone or is embedded," -- I have not yet seen a list that could not be summarised and embedded lists can always be "split". In such a case "splitting" just wouldn't involve breaking the embedded list into parts, but scooping it out of the article and creating a stand-alone list with the embedded list as its core.

As far as your suggestion goes, I don't have an opinion yet. I find this to be one of the most puzzling guidelines on Wikipedia, particularly since it's been labeled an "editing guideline". Good raise 08:29, 27 June 2011 (UTC)

I'm working on the guideline now, seeing if it can be made clearer. I take your point that lists can be summarised. I think the Exceptions section is pointing to certain material that cannot easily be summarised or split per WP:Summary style. I understand the thinking that a summary is already condensed, but as professional writers, journalists, students and teachers know, even summarised material can be reduced. Doing a précis was once a standard English language task; when doing professional writing and journalism one works to word lengths, rather than what the writer feels the topic "needs" (and much writing tends to benefit from cutting back to the bare essentials - encyclopedic writing benefits more than most); and students who can précis their notes down into short manageable bites find that of immense benefit. I think there is a balance to be struck between being excessive and being elliptical - though this would apply to all forms of writing within Wikipedia articles, not just summaries or lists.
I think the main point of this guideline is that articles should not be so long that they overwhelm the reader, so when material becomes too long or detailed, it should be split out into a sub-article per WP:Summary style, and that process can carry on for as long as there is useful and notable information. So we have The Rolling Stones with a section on Band members which splits off into Mick Jagger which has a section on Albums which splits off into Primitive Cool which has a track listing which splits off into Let's Work (Mick Jagger song).
An informed decision needs to be made as to what information should be in a parent article, and which is better contained within a sub article. I feel this guideline should be helping editors to decide when to summarise and split, how to summarise and split, and what to summarise and split. In a sense, as well as being stand-alone advice on article size, it also stands between Wikipedia:Layout and Wikipedia:Summary style, as it includes elements of both.
I'm wondering not only why lists are exceptions as regards summarising, but also why they are considered not to be part of the "readable prose". Readers will read and study lists if they contain important information. If a list contains information that is considered not to be part of the essential reading in the article, then one would question why the list is there in the first place. Lists should not be purely decorative, and when a list deals in excessive statistics it rubs up against WP:NOTSTATS. SilkTork * Tea time 10:06, 27 June 2011 (UTC)
I suspect lists (or rather tables) are not considered as "readable prose" to simplify measuring of "article size". Perhaps the original author reckoned most lists would require an inconsiderable amount of time to study compared to the length of article source text necessary to generate them. As for what this guideline ought to do... While I appreciate the value of an editing guideline helping editors (newer ones in particular) make the kind of decisions you name, I'd just as much like to have a style guideline describing where community-wide consensus lies in regard to how long or short articles should be. Presently this page tries to be both and is neither very well. Good raise 12:56, 27 June 2011 (UTC)

Size no longer viewable in history?

The "How to find articles by size" section says "You can find the size of a page including the markup in kilobytes [kb] from the page history". I have tried this for several long articles (examples [1] [2]) and I do not see anything on the history pages indicating article size. Am I just missing it, or was this feature removed? Does one have to use one of the "external tools" to find an article's size now? Should this guideline be updated? -- IllaZilla ( talk) 17:31, 1 September 2011 (UTC)

They look fine to me; the first one shows "10:52, August 29, 2011 IllaZilla (talk | contribs | block) (107,104 bytes)" which is definitely showing the size. Wizardman Operation Big Bear 18:48, 1 September 2011 (UTC)
D'oh! I must be blind...I didn't think to look down the revisions list, I was looking for something along the top or the left, like when you used to open the edit window and it would say "this article is xx kb" up top. However, it does seem that the size is now displayed in bytes rather than kb, so perhaps a minor change in the wording is needed. -- IllaZilla ( talk) 18:59, 1 September 2011 (UTC)

Deprecating the WP:SIZERULE shortcut

On 6 September I introduced the WP:SIZEGUIDE shortcut as a replacement for WP:SIZERULE, on the grounds that the criteria are not a hard-and-fast rule, and because "readable prose size" is all too often mistaken for "article size". It was reverted on 19 September without a refutation of my reason for doing so. Per WP:BRD, I am starting this discussion to see whether that was one editor, or whether there is a wider consensus to continue referring to it as a "rule". — WFC— 09:51, 20 September 2011 (UTC)

I agree with your change, it's definitely just a guideline (and one that many of WP's best articles ignore, as a look at User:Dr pda/Featured article statistics will reveal) not a rule. Wasted Time R ( talk) 11:26, 21 September 2011 (UTC)
While it may be just a guideline, the section is titled "A rule of thumb", not "A guide of thumb", which is why WP:SIZERULE was used. It really doesn't matter what we use since it's pretty obvious to anyone that it's not a rule and WP:SIZERULE is not likely to be deleted. -- AussieLegend ( talk) 14:02, 17 July 2012 (UTC)

Unsplitting

In discussions about the articles iOS version history, Android version history, and History of iOS jailbreaking, I've noticed this guideline being trotted out as a reason that, despite failing policy (in the former two, flagrantly and inherently violating WP:NOT#DIRECTORY; in the latter, systemic violations of WP:V throughout the parent article leading to blind application of the guideline), the articles should remain separate if split due to SIZE, which leads to an interesting question: if split articles fall below 32-40KB of readable prose, should they be merged back? And on the point of the latter article, should cleanup be recommended before splitting? Sceptre ( talk) 23:38, 22 October 2011 (UTC)

Edit request

Include redirect WP:AS in "Shortcuts" box. 71.146.20.62 ( talk) 03:44, 28 November 2011 (UTC)

Images as part of the total download of an article to a browser

Images have been discussed here before with regard to the total size of an article. Even though an image may only have a few tens of characters of text in the edit box, the thumbnail of that image can require up to a few hundred KB of download bandwidth. The TOOLONG guideline should give some indication of the upper limit of acceptable image use. For instance, this version of the article List of American Civil War Generals (Confederate) contains 237 thumbnail images, each requiring about 162 KB in my 1024x768 browser window. This makes for a very unwieldy article of more than 38 MB! Let's add a paragraph about the TOOLONG problems associated with too much bandwidth taken by images. Binksternet ( talk) 20:25, 31 March 2012 (UTC)
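
The download estimate in the comment above is simple multiplication, sketched here in Python. The figures (237 thumbnails at about 162 KB each) are the commenter's; the function name is hypothetical, and treating 1 MB as 1000 KB is an assumption made so the result lines up with the "more than 38 MB" figure quoted.

```python
def total_thumbnail_download_kb(thumbnail_count: int, kb_per_thumbnail: int) -> int:
    """Return the combined download size of all thumbnails, in kilobytes."""
    return thumbnail_count * kb_per_thumbnail


total_kb = total_thumbnail_download_kb(237, 162)
# Assumption: 1 MB = 1000 KB (decimal units).
print(f"{total_kb} KB, roughly {total_kb / 1000:.1f} MB")
```

This counts only the thumbnails; the markup and page chrome would add to it, but by a much smaller amount.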

Is there any feature to automatically split off an article into pages?

Apologies if this is the wrong venue to ask such a question, if somewhere in meta might be better to ask, but, sometimes, for very long pages, like the RfC for Mohammad images, or some old archives, it significantly slows down my browser when viewing them. If only there was some kind of feature that could allow me to set it so that when viewing pages or page histories, it would automatically break off the page at a user-specified amount (like, break off to "page 2" or "page 3" etc. if the next section makes the page exceed 200kb). This is especially a problem in very old archive pages where it would be inconvenient to break apart a page despite being long. Does such a feature exist?-- New questions? 18:21, 10 April 2012 (UTC)

No. If you create a book or download as pdf you convert it into a pdf that you can view a page at a time, but you still have to download the whole article. If you are looking at old archives they can be split into smaller archives. Commonly archives are split somewhere between 60k and 150k bytes. This is the talk page, though, not the article. Wikipedia:Requests for comment/Muhammad images is a single closed discussion that goes on for a whopping 933,207 bytes, and can certainly be split into sections. Normally RfC's are maintained intact while they are open, but for logistic reasons when they get over 200k I would argue for putting them into subsections. Apteva ( talk) 21:29, 19 September 2012 (UTC)
Please see Wikipedia:Requests for comment/Muhammad images/Intro. That discussion has been split up into sections for anyone who wishes to read it more easily. Apteva ( talk) 22:05, 22 September 2012 (UTC)
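
The behaviour asked about in this thread (start a new "page" whenever the next section would push the current one past a user-chosen size) can be sketched as a small greedy routine. This is purely illustrative Python under stated assumptions: no such feature is described in the replies above, the function name `paginate_sections` is invented, and section sizes are assumed to be known in bytes.

```python
from typing import List, Tuple

Section = Tuple[str, int]  # (section title, size in bytes)


def paginate_sections(sections: List[Section], limit_bytes: int) -> List[List[Section]]:
    """Group consecutive sections into pages without exceeding limit_bytes.

    A section larger than the limit is not split; it gets a page to itself.
    """
    pages: List[List[Section]] = []
    current: List[Section] = []
    current_size = 0
    for title, size in sections:
        # Break before this section if adding it would exceed the limit.
        if current and current_size + size > limit_bytes:
            pages.append(current)
            current, current_size = [], 0
        current.append((title, size))
        current_size += size
    if current:
        pages.append(current)
    return pages


# Hypothetical section sizes, with the 200 KB limit mentioned in the question.
example = [("A", 120_000), ("B", 90_000), ("C", 50_000), ("D", 300_000)]
for number, page in enumerate(paginate_sections(example, 200_000), start=1):
    print(f"page {number}:", [title for title, _ in page])
```

Breaking only at section boundaries keeps each discussion intact, which matches the point made above that archives are normally split between sections and not in the middle of one.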

Mobile

I notice the article does much to address questions concerning antique (turn of the century) computers and browsers, but doesn't mention something new, namely Help:Mobile access. I do nearly all my editing on five-year-old hardware with a nice big screen and DSL connection, but much of my reading is away from home, on my palm sized smartphone or my hand sized Android tablet. These automatically go to the .m. mobile page which shows only the lead and the top section titles until I tap the title.

Alas, such accommodation becomes inadequate when the article is long. Either the list of sections is too long, or it inadequately guides me to the desired information, or each section upon opening overwhelms my ability to understand it on the little screen, or all the above. And where there's no Wi-Fi and 3g coverage is poor, it takes a long time to load the page. Surely I'm not unusual among readers in facing these problems, and the number of affected users will only increase with the popularity of smartphones and smaller tablet computers (even with the relatively large iPad it's somewhat a problem). Do we need a new section? Jim.henderson ( talk) 17:42, 27 April 2012 (UTC)

Discussion about split of large articles at an arbitrary point

I have raised a query at Wikipedia:Village_pump_(policy)#Splitting_articles_arbitrarily about the bit in WP:SIZE#Very large articles where it says very large articles may be split arbitrarily. I think this is okay for very large lists but not articles and see no sense in this end run round notability. I believe if an article is large enough to require splitting there will always be subtopics which satisfy notability. Dmcq ( talk) 00:52, 24 May 2012 (UTC)

I think arbitrarily splitting mainspace articles should be avoided at all costs. I agree that there are subtopics that can satisfy notability (but not "always"). Is there a page that this is in relation to? -- Alan Liefting ( talk - contribs) 01:02, 24 May 2012 (UTC)
The guideline at Wikipedia:Article_size#Very_long_articles needs rewording:
If possible, such (V)ery large articles should be split . If possible, split the content into logically separate articles. If necessary, split the article arbitrarily. Avoid arbitrarily splitting mainspace articles unless there is a demonstrated technical problem loading the page on at least one major browser. If you do split an article arbitrarily, be careful to link the resulting parts to each other. For non-mainspace articles, consider splitting off the top and bottom parts of the article and transcluding them into the split parts.
Probably should add something about summary style as well -- Alan Liefting ( talk - contribs) 01:11, 24 May 2012 (UTC)
I've asked for some examples at VPP where this arbitrary split business makes sense other than a stand alone list. Still waiting but there's a lot of theoretical waffling. I can't see the point without having a clear need; WP:IINFO about indiscriminate information covers anything else I think. Dmcq ( talk) 02:14, 24 May 2012 (UTC)

The first issue here is that Dmcq is arguing against 5 years of consensus on the verbatim phrasing "If necessary, split the article arbitrarily", and that this text should be preserved on that ground alone. Many many editors have read that sentence and understood that "arbitrarily" means, not "randomly" or "at an arbitrary point", but basically "according to some clear local-consensus method". Therefore there is no need to change the wording now.

The second issue is that Dmcq has affirmed the POV that the many many subarticles split according to some clear local-consensus method are actually "notable", such as Later life of Isaac Newton (arbitrarily starting 1693), House and Senate career of John McCain, until 2000 (arbitrarily 1981-2000), and Cultural impact of the Guitar Hero series (arbitrary subset of notable topic). If more examples are needed I can oblige. Another POV is that these are not notable but widely accepted because they are spinouts. Given Dmcq's POV, if the word "arbitrarily" is understood as it has been for 5 years, there is no reason to change it, because the articles will by the POV's definition be "notable"; and given the alternate (stricter) POV of notability, the word "arbitrarily" is very necessary to permit articles that fail this strict N standard.

Now, as to Dmcq's Alan's wording (assuming we remove the stray "them" in the last line), the last sentence seems to be chopped up for no reason; the rest assumes that it is always possible to split the content into logically separate articles. Dmcq has rejected all counterexamples, but they at least prove that it is at least sometimes colorable that the content is not split logically and notably. I don't think this assumption necessary in case there should arise a consensus that there is no notable way to split a very long article.

Further, Dmcq has opened the same discussion on two talk pages for some reason; I have invited the VPP to centralize here.

I think the whole problem is that Dmcq is reading "arbitrarily" as "at an arbitrary point", which is a novel or original reading of the guideline. There is no evidence the guideline needs adjustment. JJB 02:58, 24 May 2012 (UTC)

Please discuss at VPP where this was raised as a centralized issue rather than at one of the separate guidelines and where people were directed. Dmcq ( talk) 03:02, 24 May 2012 (UTC)
By the way that was not my wording above, and I don't agree with the new wording anyway. Firstly though one should decide on the central issue of notability of the split off articles which is the discussion at VPP. Dmcq ( talk) 03:05, 24 May 2012 (UTC)
I have yet to wade my way through the discussion at VPP but any discussion of this guideline should be made on this page. The notability of a split out article is covered elsewhere and obviously a split of a larger article would be into a sibling topic or topics that have notability as a standalone article. Is there a wording that you can suggest? -- Alan Liefting ( talk - contribs) 03:50, 24 May 2012 (UTC)
Can someone more knowledgeable about venue just be bold, make a good case of where it should be (audience, draw, etc), and copy/paste the conversation over? At this point it matters more it's not in different places than where it ends up. We can always change venue later. Thanks. Agent00f ( talk) 04:35, 24 May 2012 (UTC)
We are discussing this guideline so therefore this is the correct venue. I will slap an {rfc} on it. -- Alan Liefting ( talk - contribs) 05:03, 24 May 2012 (UTC)
JJB, longevity of a guideline is not a reason to keep it. Also, the guideline should use the commonly accepted use of the word " arbitrary". It prevents confusion. -- Alan Liefting ( talk - contribs) 03:50, 24 May 2012 (UTC)

Alan, sorry I didn't realize that was your wording. Longevity of a guideline is a silent consensus that it works for many editors. Yes, the word "arbitrarily" could be clarified based on your concerns, but the assumption that it's always possible to split logically should not be suddenly added to the text. Possible proposed change: For instance, the meaning of the sentence could be clarified as, "If this is not possible, split the article according to local consensus." But the rest should stand for the reasons above.

Dmcq, the discussion I previously linked shows that there is no supermajority consensus on notability of spinoffs, so this will not be decided at VPP today. The fact is that we have many nonnotable spinoffs, and they often survive AFD (or more commonly are never nommed). The stated rationales vary (one reason for unclear consensus): sometimes it's SNG shoehorning, sometimes it's a relatively loose N affirmation like your own, sometimes it's recognized as a spinoff, sometimes it's recognized as a pointy nom that would imbalance a set, sometimes it's a merge that affirms the spinoff principle. Since you seem to define "notable" as including most of the ad hoc local-consensus splits as well as allowing the various nonnotable large-list splits (although that would include the list of poker events), I really don't know that there is an issue for VPP besides your finding the word "arbitrarily" to be ambiguous. JJB 04:36, 24 May 2012 (UTC)

The vernacular def of arbitrary is vague, and people often conflate it with random. Probably best to clarify that it's actually based on domain knowledge first, then gradually moving to less desirable methods if that's not possible. That's probably how it tends to work in practice, but given how popular playing semantics is, better to be clear than to have some big fight over just how random the process should be. Agent00f ( talk) 04:51, 24 May 2012 (UTC)
The whole issue needs to be revisited anyway. The entire rationale for arbitrary splits was that some articles were too long to be edited in certain browsers. They are now ridiculously obsolete, and no longer pose a technical problem worth mentioning. WP:SUMMARY is all we need any longer. When normal-prose articles (i.e. non-list, narrative articles intended to be read from start to finish) become unwieldy, SUMMARY provides a clear roadmap for how to split them up for better reader usability. For long lists, e.g. glossary articles, no one is going to sit and read them from top to bottom; their principal modes of use are a) being in-page-searched for specific entries and b) having specific entries linked to from other articles; splitting them destroys the very foundation of their functions in Wikipedia. At least one consensus discussion has already concluded that we should stop splitting such articles. — SMcCandlish   Talk⇒ ɖ∘¿¤þ   Contrib. 02:42, 29 May 2012 (UTC)
Shall we just delete the complete section headed "Very long articles"? The following section headed "Web browsers which have problems with long articles" is sufficient to cover any possible occurrences of browsers having issues with long articles. -- Alan Liefting ( talk - contribs) 03:43, 29 May 2012 (UTC)
Looking through it I can't see anything in the whole 'Technical issues' section which I feel needs to be kept. 400k sounds far larger than the 100k in the almost certainly should split bit even if it was a big issue nowadays. An article has already become too big long before that is reached. Arbitrary split is I think only for lists and that is covered elsewhere. No-one is going to try splitting normal articles into anything except logical sections and the summary style guideline says about doing that. Dmcq ( talk) 09:35, 29 May 2012 (UTC)
I think the continuing value of "arbitrarily" is in pointing us to not arguing about which split is most logical. Maybe it could be replaced with an indicator saying something like past semilogical splits that look arbitrary need not be rewritten, and as long as new splits are semilogical they're fine. Recognize that some of the section is dated but some needs preserving. JJB 16:13, 29 May 2012 (UTC)

There may or may not be technical reasons for splitting an article (not all access is via machines with large memories, virtual or otherwise), but there are definitely economic and other reasons for doing so. Many people have to pay for every byte they download (either because they live in a country where the Telecoms use that model for charging for broadband access), or because their mobile service provider charges that way for mobile hand held devices (phones, tablets etc). There is also the case when a person accessing the net is connected via a free wifi service with a data limit on downloading (eg at a library). -- PBS ( talk) 11:36, 31 May 2012 (UTC)

I've never heard of splitting an article and don't see any examples where it was done. However, the proposed language does make sense to avoid unnecessarily contentious discussions and messy articles. CarolMooreDC 16:47, 7 June 2012 (UTC)
Basically it just means splitting out big subsections like the later life of Newton when he concentrated on alchemy or the early life of Mitt Romney before he became a presidential candidate. Dmcq ( talk) 22:53, 7 June 2012 (UTC)

I'll try just deleting the whole technical issues section. The 400k limit in it is far larger than the recommended size limits anyway. We could add a bit in the size section about larger sizes causing problems with slow connections as well as causing readability problems but that's about it I think. Dmcq ( talk) 22:53, 7 June 2012 (UTC)

I've kept bits about download speed and size for mobile phones and problems with slow connection speed. Solutions are left to the splitting section. Dmcq ( talk) 23:04, 7 June 2012 (UTC)

We could bring it back, replacing the tersely ambiguous "arbitrary" with something clumsier and more precise such as "regardless of other considerations" and the overly precise "400K" with something longer and vaguer such as "hundreds of kilobytes". Jim.henderson ( talk) 13:40, 10 June 2012 (UTC)

What is 'it' exactly and why? What would it convey that you think is missing or needs to be said? Dmcq ( talk) 16:37, 10 June 2012 (UTC)
The stuff about defunct browsers has been added back. Does anyone see a point in having historical information in this guideline? Also, does anyone know what a 'non-mainspace article' is? Dmcq ( talk) 20:00, 11 June 2012 (UTC)

RfC: Should the summary style guideline quote WP:Notability and if so in what place

You are invited to join the discussion at Wikipedia talk:Summary style#RfC: Should the summary style guideline quote WP:Notability and if so in what place.

This RfC is to decide the specific changes discussed at Wikipedia:VPP#Splitting_articles_arbitrarily. This may affect the notability of subarticles and is related to the RfC above. Dmcq ( talk) 19:09, 1 June 2012 (UTC)

"The Biggest Loser South Africa" article

I would like to start a discussion about how to split the Biggest Loser South Africa article. In my opinion, it should either be a minimum number of tables, or should be split into several smaller articles, as 600 kB is ridiculous. Thoughts?-- Jax 0677 ( talk) 18:51, 10 June 2012 (UTC)

I think there's plenty of scope for irony. Unfortunately, I can't understand how the page is laid out, so I don't know how it should be divided. I would however lay down a bet on the fact that a huge amount of that information is just repeating trivia from the show and doesn't belong in an encyclopaedia anyway. CMD ( talk) 19:04, 10 June 2012 (UTC)
Well okay, let's ignore for the moment that it all seems to pretty much go against WP:NOTREPOSITORY, WP:NOTEVERYTHING, WP:NOTSTATSBOOK and WP:UNDUE. Where did all that data come from? The references don't seem to have them. And if the references had them we could just summarize and point to the references. So first step is {{citation needed}}. Dmcq ( talk) 20:45, 10 June 2012 (UTC)
Many of the tables were created by an anonymous user. I say we give it a week just like other things. If nothing is done, then we can eliminate all of the tables until references are inserted.-- Jax 0677 ( talk) 21:35, 10 June 2012 (UTC)
Sounds like a plan to me. Dmcq ( talk) 22:25, 10 June 2012 (UTC)

Issues with this guideline?

With recent events, such as deletion of Ashton Kutcher on Twitter, Personal life of Jennifer Lopez, and Rihanna on Twitter, bad splitting (and awful transclusion) of List of Codename: Kids Next Door episodes, and bad use of Template:very long, is there something generally wrong with this guideline? Is it consistent with other policies and guidelines? -- George Ho ( talk) 00:00, 4 July 2012 (UTC)

Like what? What is the problem you see? Dmcq ( talk) 00:12, 4 July 2012 (UTC)
The "Rule of thumb": the Human rights article is over 100kb, even after splits, and... no condensation or further splitting is needed. Nevertheless, I am not sure if there is something wrong with this guideline anymore; I'm so frustrated that I don't know what to do. The "No need for haste" is getting ignored more often per List of Codename: Kids Next Door episodes. -- George Ho ( talk) 00:21, 4 July 2012 (UTC)
The prose size of Human rights is only 54kB. It's big (and oh wow, what a table of contents), but not above the 60kB limit given here (although I reckon further condensation would definitely be useful). CMD ( talk) 00:34, 4 July 2012 (UTC)
So no issues with this guideline? If no issues, then how can this guideline be consistent with WP:What Wikipedia is not and cases of Twitter articles and unnecessary personal forks of people? -- George Ho ( talk) 02:21, 4 July 2012 (UTC)
Please explain your point in more detail like I asked you to. I don't know what your point is. As far as I'm concerned what you have written so far could have been just strung together by a chatbot, that is how little meaning I have extracted from your question about consistency.
As to the Human rights article I definitely think it could do with cutting it down and summarizing more. All the subtopics are notable so there is no problem about moving things out into other articles. With a smaller article a person would be able to read it all easier and then just click on the links for the bits they want to know about. Wikipedia is an internet encyclopaedia, more use should be made of links.
The Human rights article is 111kb as seen by the history, which means the guideline definitely suggests splitting. That confirms my own feeling that the article is oversize and should have bits split out better. Dmcq ( talk) 08:04, 4 July 2012 (UTC)
Careful, that number is not usually considered very big at all. Article size on the history list can be deceptive and apparently too big mainly because references can be very extensive without that making the article unreadable. The human rights article has a quite long list of references.
Breaking articles up and ruthlessly pruning them generally damages articles; information tends to fall down the cracks between the subarticles.
At 54k readable text it's borderline, it could do with being perhaps only slightly shorter, it's not really oversize.
Basically, it's not that big, you could leave it alone with a clean conscience or give it a light pruning. Teapeat ( talk) 12:29, 4 July 2012 (UTC)
Pruning means discarding. No discarding is involved in splitting off and summarizing. In fact splitting off a notable subtopic tends to give them more space to grow without people complaining about the length and them being unreadable. My feeling about the Human rights article is that it has definitely become too big and is not easily read as an entity. There are too many little bits which do not relate to the main topic directly but only through subtopics. I find the substantive rights section rather worrying in particular as there seems to be no basis for the decision on what to include in it. And when I looked at Universal Declaration of Human Rights it is a straight copy of the declaration with no analysis and no links to articles about the subjects covered whereas it could provide a good basis for structuring references to individual rights. Dmcq ( talk) 13:07, 4 July 2012 (UTC)
Look, you have permission to edit the text if you think it's too long. These rules of thumb indicate that it's a bit too long, but they're only rules of thumb: ultimately it depends on the article. Teapeat ( talk) 14:14, 4 July 2012 (UTC)

As Dmcq said, I must respond. Writing about one topic can result in a big article. Nevertheless, writing about a subtopic must be consistent with applicable policies and guidelines; otherwise, a subtopic article may be at risk of deletion, like Ashton Kutcher on Twitter. I'll rephrase the "consistency" part: Does this guideline have to mention WP:What Wikipedia is not when it comes to articles of topics and subtopics? Why can't this guide mention any other policies and guidelines? -- George Ho ( talk) 14:24, 4 July 2012 (UTC)

It can, but they're really implicit. If something isn't due enough to have a large section on the main article, and isn't notable enough to stand up on its own, it should probably not be on wikipedia. CMD ( talk) 15:59, 4 July 2012 (UTC)
Why not explicit explanation? Sometimes, people tend to take size too seriously without considering consequences, like Twitter articles. -- George Ho ( talk) 16:05, 4 July 2012 (UTC)
Such as what? I don't know the circumstances surrounding the twitter articles, so I can't really help with them at the moment. CMD ( talk) 16:12, 4 July 2012 (UTC)
Twitter aside, as I realized, notability is not easy to define, so notability is not a good example to include in this guideline. It might have been included before, but notability is often misinterpreted as a guarantee of a "valuable article", so I guess it is best not to include it again. Therefore, what about adding a section of what Wikipedia is and is not, so readers may not be forced into reading them further? If that's not it, what about mentioning WP:Manual of Style? -- George Ho ( talk) 16:23, 4 July 2012 (UTC)

Rule of thumb (4 July)

I've changed the rule of thumb to refer to the markup size as given in the history. It is pretty obvious that people have used this normally and mean this. There would be no point talking about limits on sortable tables otherwise as readable prose doesn't include tables according to the bit at the beginning. Dmcq ( talk) 14:07, 4 July 2012 (UTC)

No, you just messed up the intro, I reverted it.
Look, the really common problem we have with this guideline (as opposed to your specific article) is that people assume that any article with a wiki markup size of 100k needs savage pruning, but more than half of that can be references and other things that just don't count. They then try to shrink the article by about 50%, which is usually far, far too much. Teapeat ( talk) 14:14, 4 July 2012 (UTC)
The more technical issues, like wiki markup size and browser limits can be problematic, but much, much less often. Teapeat ( talk) 14:14, 4 July 2012 (UTC)
I changed the intro because it doesn't reflect reality. People are not using the script mentioned in this to measure article sizes, they use the markup size. The guidelines should reflect reality. If you don't agree with that then perhaps we could set up an RfC to resolve the issue. If you would like to state a case for prose size rather than markup size that would be good. Dmcq ( talk) 14:17, 4 July 2012 (UTC)
The table states that it only applies to readable text size. It's completely impractical to use markup size because every time you added references and other markup you would have to shrink the prose size to compensate. Teapeat ( talk) 14:22, 4 July 2012 (UTC)
This is a guideline not a policy. There is no 'have to' about it. The rule is a 'rule of thumb'. It is supposed to be easy to follow. The markup roughly gives the amount of information in an article. If an article has large numbers of citations that outweigh the straight text then it should still be split. Just because it is a citation does not stop it costing money or taking time to download to a phone, or cause it to print on fewer pages. The readability bit talks about 50k being an upper limit whilst the rule of thumb says 100k. That is plenty of room for citations. Dmcq ( talk) 14:29, 4 July 2012 (UTC)
It's definitely the readable prose size that people use. For example in FAC or GAN discussions about whether an article is too long or not, it's always readable prose size that is used, not the total markup size that the 'history' command shows. Yes, you have to install the User talk:Dr pda/prosesize.js tool to get the readable prose size, but most serious editors doing reviewed article work do that. Wasted Time R ( talk) 14:32, 4 July 2012 (UTC)
Okay I will set up an RfC on this question. Dmcq ( talk) 14:34, 4 July 2012 (UTC)
Seriously don't bother. It won't go the way you want; and I don't even understand why you want it; if you want to reduce the size of the particular article you're concerned about, go ahead, the guideline indicates it's a bit too big anyway (the target is no bigger than 50k, and it's currently at almost 55k). Teapeat ( talk) 14:43, 4 July 2012 (UTC)

RfC: Should the rule of thumb for article size refer to readable prose size or markup size?

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

Should WP:SIZERULE, a rule of thumb guideline for saying an article is too large, refer to the 'markup size' of an article given by the size in its history - the number of bytes downloaded when an edit is done, or should it refer to the 'readable prose size' given by the text excluding any footnotes and reference sections ("see also", "external links", bibliography, etc.); diagrams and images; tables and lists; Wikilinks and external URLs; and formatting and mark-up as given by the script User:Dr_pda/prosesize? There is a discussion above in Wikipedia talk:Article size#Rule of thumb. Dmcq ( talk) 14:52, 4 July 2012 (UTC)
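
To make the two measurements in the question concrete, here is a rough Python sketch of how they can diverge for the same page. It is an illustration only: the regular expressions are a crude approximation and are not the method used by the User:Dr_pda/prosesize script, and the names strip_markup, size_report and SAMPLE are hypothetical, invented for this example.

```python
import re

# Hypothetical sample wikitext: one heading, one sentence with a footnote,
# one list item and one table.
SAMPLE = (
    "== History ==\n"
    "The '''cat''' is a [[Felidae|small carnivore]]."
    "<ref name=\"a\">{{cite web|url=http://example.org|title=Cats}}</ref>\n"
    "* a list item\n"
    "{| class=\"wikitable\"\n| cell\n|}\n"
)


def strip_markup(wikitext: str) -> str:
    """Return an approximation of the readable prose in a piece of wikitext."""
    text = wikitext
    # Drop footnotes: self-closing <ref ... /> first, then paired <ref>...</ref>.
    text = re.sub(r"<ref[^>]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop templates, innermost first, until nothing changes.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop tables of the form {| ... |}.
    text = re.sub(r"\{\|.*?\|\}", "", text, flags=re.DOTALL)
    # Drop heading lines and list lines (assumption: these are not prose).
    text = re.sub(r"^(=+.*=+|[*#;:].*)$", "", text, flags=re.MULTILINE)
    # Keep only the visible label of internal and external links.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)
    # Remove bold and italic quote markup, then collapse whitespace.
    text = re.sub(r"'{2,}", "", text)
    return " ".join(text.split())


def size_report(wikitext: str) -> dict:
    """Compare the markup size with the approximate prose size, in bytes."""
    return {
        "markup_bytes": len(wikitext.encode("utf-8")),
        "prose_bytes": len(strip_markup(wikitext).encode("utf-8")),
    }
```

Calling size_report(SAMPLE) returns both byte counts side by side. In the sample, the footnote, list item and table all count toward the markup size but contribute nothing to the prose size, which is the gap the two positions in this RfC disagree about.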

  • Markup size I believe this is how the rule of thumb has normally been used. The markup includes lists and it is double the 50k mentioned in Wikipedia:Article size#Readability issues as being an upper limit for readable prose size, 100k allows plenty of room for citations. There would have been no point mentioning tables in that section if people meant readable prose as tables are not included in that. Dmcq ( talk) 14:52, 4 July 2012 (UTC)
  • Readable prose size. This has always been the meaning, such as in FAC and GAN discussions when issues of whether an article is too long has arisen. Note that User:Dr pda/Featured article statistics has a bunch of statistics of lengths of featured articles; the stats all use readable prose size, not markup size. And if you look at these back-links, you can see all the times these readable-prose-size-based statistics have been referenced in FAC, GAN, and other talk page discussions. Wasted Time R ( talk) 15:03, 4 July 2012 (UTC)
    • That gives about 95 references. There are about 6500 references to WP:Article size. The question is what have all those other ones done? Dmcq ( talk) 15:11, 4 July 2012 (UTC)
  • Readable prose size, call for snow close This RFC is a waste of everyone's time, using markup size would discourage proper referencing of articles since references take up lots of markup space but aren't readable prose. The guideline already explains why readable prose is a useful thing to limit, and readable prose is already routinely being used in FA and GA reviews. This is a very bad idea indeed. Teapeat ( talk) 15:05, 4 July 2012 (UTC)
    • Why aren't you using 50k as the limit for readable prose size? Dmcq ( talk) 15:15, 4 July 2012 (UTC)
      • The 50k readable prose size mention comes from being roughly 10,000 words, but in experience 10,000 words more often equates to 60k readable prose size, so 60k is the point where discussion usually begins. And in any case, this is just a guideline, not a hard limit. As you can see from that list of featured articles, there are plenty that go over 60k readable prose size, and more are being added all the time. Wasted Time R ( talk) 15:24, 4 July 2012 (UTC)
        • So you are not in fact following this guideline or believe this guideline should have the figures in Wikipedia:Article size#Readability issues upped quite a bit to double the amount? Dmcq ( talk) 15:31, 4 July 2012 (UTC)
          • I'm stating two bits of reality: the rule of thumb has always been interpreted as being readable prose size, and the whole guideline has always been interpreted as just that and not a hard limit. Anyway, do you have any objection if I invite editors at Wikipedia talk:Featured article candidates and Wikipedia talk:Good article nominations to comment here? If your proposal of a 100k cap on markup size goes into effect, that means many articles will have to undergo a drastic cutting down. Wasted Time R ( talk) 15:55, 4 July 2012 (UTC)
  • What about page load size? There are two reasons we should be limiting article size: one for reader attention span which is about readable prose, and one for the bandwidth the article takes to load into the browser. The second metric should include the size of image thumbnails, adding considerably to the plain text and markup. Page load size is the only other metric worth considering; markup size as a measure of bandwidth is incomplete. For what it's worth, I think we should have two limits: one plainly stated for readable prose, and another for page load. Binksternet ( talk) 16:02, 4 July 2012 (UTC)
    • Because it is a rule of thumb which any user can check easily rather than some complicated thing requiring experts to measure. The various measures are fairly loosely linked. The section on readable prose size says 50k, actual articles tend to be limited at about 100k in the size of the file edited. The download size is quite a bit more and if you include images it goes up again, plus one probably wouldn't want to include the sizes of the scripts and style pages as they would normally be cached after a couple of accesses. Dmcq ( talk) 16:12, 4 July 2012 (UTC)
  • Prose size Prose size can be extremely different to markup size. Obviously they're going to correlate somewhat, but not nearly closely enough for one to estimate the other well. Prose size gives a rough estimate of the time a reader has to spend reading to finish the entire article, and that seems to be the entire point behind the guideline, establishing a consistent standard of a comfortable reading time. That's not to say we can't establish a separate limit on markup size or anything like that, if that's feasible. (I don't know what script I'm running to do it, but Page size is in my toolbox.) CMD ( talk) 16:03, 4 July 2012 (UTC)
    • Then please explain 'They also apply less strongly to list articles, especially if splitting them would require breaking up a sortable table.' The readable prose size of a table given by that script is zero. Plus there are other issues like time to download and cost for mobiles. Plus could somebody please explain why the 50k in Wikipedia talk:Article size#Rule of thumb which is the section which justifies this idea is being ignored? Dmcq ( talk) 16:21, 4 July 2012 (UTC)
      • For instance we have for List of bus routes in London
        File size: 565 kB
        Prose size (including all HTML code): 7882 B
        References (including all HTML code): 168 B
        Wiki text: 185 kB
        Prose size (text only): 3973 B (653 words) "readable prose size"
        References (text only): 9 B
        Do people really mean this is fine by this guideline because it only includes 3973 bytes when in fact the markup size is 185 kB? Dmcq ( talk) 16:30, 4 July 2012 (UTC)
      • List articles aren't read in the same way normal ones are. I'm willing to bet that readers will just go through the top 10 or so, or use ctrl+F if they're finding something. As I said, I have nothing against a separate limit on markup size, or perhaps a separate list guideline. My response to this RfC was based on its idea of changing the current Sizerule from prose to just markup, because I think a prose limit is quite important for keeping our articles concise and engaging. CMD ( talk) 16:31, 4 July 2012 (UTC)
    • Then look at the last few articles that were tagged with {{too long}} at the top and ignoring list articles. Economy of Pakistan has markup size 98kB and prose size 52kB so it has a problem according to Wikipedia:Article size#Readability issues and a problem by rule of thumb viewed as markup size but not when viewed as prose size. Socialism gives 120kB and 63 kB. Talcott Parsons gives 167 kB and 101 kB - too big by both. Phil Keaggy 55 kB and 38 kB, Clara Bow 66 kB and 28 kB, Capitol records 44 kB and 32 kB, Humanitarianism 111 kB and 84 kB, Kunming 112 kB and 68 kB, Stress (mechanics) 77 kB and the script failed, Odisha 104 kB and 67 kB, Commodity Futures Modernization Act of 2000 213 kB and 46 kB.
      The thing I take from this is that people get worried about articles being too long at a point long before the prose length is 100 kB. Commodity Futures Modernization Act of 2000 is an example where there aren't tables but the citation sizes bump the file size up considerably - have a look at the citations and see if there is a problem with them! The talk about the citations is no excuse for using prose size instead of the markup size as far as I can see. Dmcq ( talk) 17:13, 4 July 2012 (UTC)
      • I removed this template from 150 pages a while back, including Education in Singapore, Cat, Human rights, and Social Security (United States). I don't think readability is that bad at all or problematic. -- George Ho ( talk) 17:25, 4 July 2012 (UTC)
        • What's your point? That you are happier with longer articles than other people? Dmcq ( talk) 17:48, 4 July 2012 (UTC)
          • I was pointing out the possible misuse of "very long" tag; that's all. Does "Cat" article have to be condensed or tagged as "very long" just because it's "long"? -- George Ho ( talk) 18:18, 4 July 2012 (UTC)
            • I think personally the cat article really could do with trimming and moving bits out. A particular problem I see with it is that the summaries in the cat article seem to be developed independently of the subtopic articles. For example cat genetics is a stub article but there is a bit on it plus a separate section on taxonomy and evolution which pointed to cat evolution which was a small section in cat gap. The Health section is also a mess developed independently of the cat health article. The whole behaviour section was a larger and even more messy version of the same problem. Keeping the size in check would help avoid problems like these. Dmcq ( talk) 22:52, 4 July 2012 (UTC)
              • If you want to tag it as "very long" or {{overly detailed}}, go ahead. Still, I wonder if rule of thumb is really helpful at this time or in the past. -- George Ho ( talk) 23:02, 4 July 2012 (UTC)
                • You're just missing the point. Overly detailed is not appropriate, what is appropriate is that the section on cat genetics should have a summary corresponding to the lead of the cat genetics and cat gap articles. However the cat genetics article remains a stub and stuff is being shoved into the cat article. This is what I mean about WP:NOTPAPER. The cat article is being developed as a book rather than as a page in a hyperlinked encyclopaedia, instead of following WP:SS guidelines for instance. Dmcq ( talk) 14:24, 7 July 2012 (UTC)
Unfortunately, for you, I like the article the way it is. And I'm amazed that everybody here wants to keep a "Rule of Thumb" section. I don't know how that section is related to the quality of this article. I get a feeling that, without a "Rule of Thumb", this "guideline" would become nothing more than an essay. Unfortunately, for me, I must condense it in favor of reducing length if that section were kept. Would this affect my writing quality? -- George Ho ( talk) 14:27, 7 July 2012 (UTC)
Thanks for addressing the issue instead of just stating a preference. I have put in a new section at #Why wikitext instead of prose text size to try and address the question fully as I see it. No I was not intending the rule of thumb should go, I just want it to be fit for purpose. I don't understand what you mean about "Unfortunately, for me, I must condense it in favor of reducing length if that section were kept. Would this affect my writing quality?". If you like that article as it is I guess you are saying you prefer a monolithic article built like a chapter of a book rather than using links. Both ways can have good writing but I would point again at WP:NOTPAPER as encouraging writing for an internet based medium. Dmcq ( talk) 15:37, 7 July 2012 (UTC)
I'll rephrase: Must I change my writing ability for the sake of length and condensation? Must everybody else? To me "Rule of thumb" helps "Article size" page become a guideline; without it, its guideline status would be doomed to failure. I couldn't and wouldn't let loading and reading issues get to me. I hope these issues are too minor to everyone; in fact, if too lengthy, anybody can resolve one issue or another by editing or addressing one problem of an article in the talk page, not here. -- George Ho ( talk) 16:29, 7 July 2012 (UTC)
If you cannot adjust your style to the internet as outlined in WP:NOTPAPER then possibly somebody else can fix the problems. It is not necessary that everybody be able to do everything well, people have different talents. However if your style is such that you would resist other people putting the stuff into subtopics when article become large because you want everything in one large article rather than use hyperlinks, then you would definitely be acting against the express consensus in the policy. The issues are not minor. Dmcq ( talk) 23:00, 7 July 2012 (UTC)
  • Both markup size and readable prose size have important ramifications. Prose size has to do with striking a balance between informing and exhausting the reader. Markup size has to do with keeping the page's bandwidth requirements within a reasonable size for people who have limited access to the internet. There should be two plainly stated limits for these measurements. Binksternet ( talk) 17:57, 4 July 2012 (UTC)
    • I gave a few figures above for Economy of Pakistan etc., are there any there or ones you know of where you think the prose size said something useful which the markup size does not indicate just as well? We are talking about a rule of thumb. Dmcq ( talk) 22:58, 4 July 2012 (UTC)
      • See for example Mulholland Drive (film), which is 53 kB (9,063 words) readable prose size and 93,884 bytes in markup size. Now look at John McCain, which is 54 kB (8,832 words) readable prose size but 164,027 bytes in markup size. Both are FA articles and although very close to each other in readable prose size, their markup sizes are very different. Why? Mulholland Drive has 121 footnotes, while John McCain has 331 (our political BLPs tend to be heavily cited, for obvious reasons). Under your proposed guideline change, John McCain would have to undergo a 40 percent reduction in size, when in reality there's nothing wrong with it. Wasted Time R ( talk) 10:33, 5 July 2012 (UTC)
        • I would consider both articles as at the limit for readable prose, which is in accord with what Wikipedia:Article size#Readability issues says about 50kB of readable prose as being at about the limit. However the section Wikipedia:Article size#Rule of thumb gives a limit of 100KB. Do you really think that these two articles are only about half the size of an article which is too long? That is what you are saying if you say the rule of thumb applies to readable prose instead of the 50kB of the readability issues section. Yes if the 100KB was interpreted as markup size then the John McCain article would be considered as too long whereas Mulholland Drive would scrape in. That certainly accords far better with my assessment of the size of these articles as being at the top limit. The John McCain article has a very large number of citations many of which have text associated so a case could be made for it being kept as a unit but personally I feel it should be smaller. Why for instance does it have a big section Early life and military career, 1936–1981 which contains practically the whole of the subtopic Early life and military career of John McCain instead of summarizing it better? The same with House and Senate career of John McCain, until 2000? Have these people never heard of just summarizing a subtopic in the main article in accord with WP:SS? I get the feeling the people there don't really know what hyperlinks are in aid of or trust them, I don't know how it ever became a featured article. Dmcq ( talk) 11:57, 5 July 2012 (UTC)
          • You're wrong about your ratios. The "Early life and military career, 1936–1981" section of John McCain is 9.8 kB (1626 words) readable prose size. The Early life and military career of John McCain subarticle is 46 kB (7728 words) readable prose size. That's a 1-to-4.7 ratio, which is hardly "contains practically the whole of the subtopic" as you claim. Doing any less in the main article would shortchange the reader about one of the most important, and most written-about, periods of McCain's life. The ratio for the other section/subarticle you mention is roughly similar. As for the people on that article not knowing what they are doing, I'm the primary author of all of these McCain articles, and I assure you I did know what I was doing. If you are really convinced that John McCain should be stripped of its FA status because its markup size is over 100kB, put it up at WP:FAR and I'll see you there. Wasted Time R ( talk) 12:35, 5 July 2012 (UTC)
            • Sticking so much into the main article is just wrong. Wikipedia is not a paper encyclopaedia with pages one after the other. If a reader wants to look at that they only need to click on the link and the main points could have been summarized far better like they are in the lead to the subtopic. What has been done is that the main article has been cluttered up with stuff that shouldn't be there. Problems have been created by people not choosing the amount of detail to put into the main article properly. It is a badly structured article. What we've got is an article which has been grown unnecessarily to some limit rather than one that has a reasonable structure. There is no clear limit on the size of the early life part, it was obviously split off because the article was too big but then something went wrong. It is not a summary and it is not a proper description. If the article was done properly people scanning it linearly would be able to read a short description of the early life and then know whether to click on the link or not. What is there makes reading difficult.
              I repeat again since this article clearly has the problem - Wikipedia is not a paper encyclopaedia. That is the very first section of WP:NOT. Notice also the statement there "Keeping articles to a reasonable size is important for Wikipedia's accessibility, especially for dial-up and mobile browser readers, since it directly affects page download time (see Wikipedia:Article size)." The idea of readable prose is not covered there and it refers to this guideline. Dmcq ( talk) 13:08, 5 July 2012 (UTC)
  • Readable prose per the comments made by others, above. -- Dweller ( talk) 14:04, 6 July 2012 (UTC)
    • It would be nice if somebody could elaborate on their support or give a reason why one should not consider 50k as being a strong hint of an article being too long as per the section about readable prose size or say what on earth the business about lists in the section about the rule of thumb is in aid of or why WP:NOTPAPER only talks about download time. Dmcq ( talk) 15:07, 6 July 2012 (UTC)
  • The meaning has always been total load size. The underlying reason for the rule is that, despite the advertising claims of ISPs and high-bandwidth device manufacturers, many of our readers still read Wikipedia articles on slow connections such as dial-up. The number of high-speed users is increasing but oddly, the number of low-speed users is holding steady. The footnotes, references, bibliographies, embedded templates, etc, all feed into the total pageload time and must be considered as part of the burden on readers. We should not deliberately create problems for those readers unnecessarily. We can reasonably debate about whether the 50k limit is the right balancing point but it is disingenuous to try to arbitrarily define large blocks of content away and ignore the real problems that it creates. Rossami (talk) 17:58, 6 July 2012 (UTC)
I don't believe so. On that measure Phoenix Park would already be starting to go over the 100k limit in the rule of thumb but I don't think many would consider it as anywhere near a limit. (download file 102Kb, markup size 23kB, prose 10KB). On the other hand Venus (410 kB, 107 kB, 44 kB) would be considered as being double the 200kB at which one should start consider splitting according to the technical section, at the limit according to the markup size, but not yet near considered for division according to readable prose size rule of thumb despite the readability issues section saying 50kB is right on the limit of the average concentration span of 40 to 50 minutes - so really anything more is practically definitely wasting bandwidth compared to organizing the material differently. Dmcq ( talk) 21:48, 6 July 2012 (UTC)
Well, factually, the original size used back in 2003 was actually the wikitext size. For example see: [3] GliderMaven ( talk) 01:05, 7 July 2012 (UTC)
Over the years, this seems to have changed to being readable text size; and the recommended size has increased, and a rationale about readability has been added. For example by 2006, the recommendation has increased and certainly seems to exclude markup, which presumably means it's by then referring to readable text: [4] GliderMaven ( talk) 01:05, 7 July 2012 (UTC)
  • Mostly readable prose size: I think we have to be conscious of bandwidth issues too, but it's mainly to keep articles on topic and to a readable level of detail, with proportionate weight. Shooterwalker ( talk) 23:48, 6 July 2012 (UTC)
  • Readable prose is what really matters. — Kusma ( t· c) 11:55, 7 July 2012 (UTC)
  • Readable prose size is what matters in terms of the educational use of Wikipedia. Beyond that it is worth limiting overall size to facilitate access by easy download, particularly in disadvantaged regions with slow internet connections, but that is a distinct and secondary issue, as it mostly relates to non-prose (images, tables, etc.). -- ELEKHH T 08:28, 12 July 2012 (UTC)
  • Readable Prose size — These rules of thumb only apply to readable prose (found by counting the words) and not to wiki markup size (as found on history lists or other means).  Brendon is  here 14:23, 16 July 2012 (UTC)
  • Readable Prose size - Though noting that certain transclusions of prose, and also some content-displaying templates could be included. (Though obviously not template coding, nor navboxes or infoboxes or the like). - jc37 16:01, 16 July 2012 (UTC)
  • Mostly readable prose size - or we'll have the crazy situation where good referencing and an eye-relieving quote box punishes the article (through forcing important info out into subarticles), whereas poor referencing and walls of text encourages even more of the same bad. I wouldn't mind an additional recommendation for markup size, but that's for a different and IMO secondary concern (e.g. because image sizes matter much more than extra markup sizes). – sgeureka tc 13:17, 23 July 2012 (UTC)
Why does everyone here completely ignore the fact that the current upper limit on prose text is something that shouldn't be reached anyway? One should have looked at splitting the article long before that is reached. This interpretation is a very rigid and damaging one. Dmcq ( talk) 19:41, 23 July 2012 (UTC)
Probably because that hasn't been raised. All people are saying is that there should be a limit on prose length, in response to your RfC question. What the limit should be is another discussion. CMD ( talk) 05:46, 24 July 2012 (UTC)
Well my feeling is people here haven't the foggiest idea what rule of thumb means, which is mentioned in the title: "It is an easily learned and easily applied procedure for approximately calculating or recalling some value, or for making some determination." The meaning given in Wikipedia is accurate. Whereas prose text size is not an easy measure, it doesn't apply properly for a large percentage of articles, and it is being applied absolutely at the limit which is far above where the guideline indicated. Dmcq ( talk) 08:54, 24 July 2012 (UTC)
And my experience is that on Wikipedia, "guideline" is understood by a significant proportion of editors (even established ones) to mean "gospel". Take my recent attempt to change the shortcut to WP:SIZEGUIDE for instance, quickly reverted in favour of WP:SIZERULE. If the problem is that pages take too long to load, images and citation templates are far more problematic than the amount of text: deal with those. If the problem is that some articles are simply far too long, then readable prose is the appropriate measure. — WFC— 10:08, 24 July 2012 (UTC)
Is there closure on this issue now? Does anything need to be added to this guideline to make this a bit more clear, to avoid future disputes? Shooterwalker ( talk) 03:16, 3 August 2012 (UTC)
  • Readable Prose size - did think momentarily about closing, but realised I'm not impartial so voted instead. Soon we'll all have fast enough connections not to worry too much about page size anyway...it's about measuring what folks read and (presumably) their attention span.... Casliber ( talk · contribs) 14:36, 4 August 2012 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Changing the nutshell?

"Articles should not be either too big or too small."

Is this the right way to put it, or must this nutshell be reworded? -- George Ho ( talk) 03:25, 6 July 2012 (UTC)

Rule of thumb

How is one supposed to calculate the Readable Prose size? Can the method be included in the article somewhere? Op47 ( talk) 13:00, 6 July 2012 (UTC)

There's a script one can include referenced in the Readability issues. I've had it crash on me after just outputting the first line but it normally works okay. Treating it as a simple thing for a 'rule of thumb' is pretty silly though I think, and the size articles are allowed using it are quite a bit larger than many people normally think of as too large when they stick that template in. I'd like to see some statistics on the sizes of featured articles and see what the FAC editors have been up to. Dmcq ( talk) 13:25, 6 July 2012 (UTC)
I see a bit in the guideline about size not being a reason to remove stuff. Sort of, but not quite true either. Size is a very good indicator that trivia is being put into an article if it can't be split out into subtopics that have some notability. It doesn't apply that way for lists but lists can be arbitrarily split for instance by ones starting A-E so the bits don't become too huge. Dmcq ( talk) 13:29, 6 July 2012 (UTC)
Now the "Content removal" is becoming disputed, unless it's not... -- George Ho ( talk) 15:41, 6 July 2012 (UTC)
I get the feeling you are approaching the guideline as something legalistic rather than as a guideline. If an article gets very long then yes people do start wondering about the content. It may be that legalistically size is not a reason for removal but the article should be inspected if it is too large and stuff would often be removed that would be kept in a smaller article. Policies are supposed to describe what is done. So yes I disagree with the wording. Dmcq ( talk) 17:52, 6 July 2012 (UTC)
Now with all discussions attempting to keep that section by proposing amendments, why am I the only one who feels that this section is causing nothing but harm to Wikipedia, even with such amendments? -- George Ho ( talk) 01:34, 9 July 2012 (UTC)

Why wikitext instead of prose text size

The major thing one should do if things are too long is see if large subtopics can be cut out as detailed in WP:SS. If there are no major subtopics and the article is very long then it is worth wondering whether there is sufficient weight for the inclusion of some of the contents. Personally I haven't seen any very long articles not filled with trivia or where large bits shouldn't be cut out into notable subtopics. The only ones that aren't like that are list articles which have special provisions for splitting.

I don't think the prose text argument for the rule of thumb holds any weight because the WP:Article size#Readability issues says 50k is a limit and yet the rule of thumb is being taken as meaning there are no such issues till the prose text goes over 100k. People should be thinking about size before getting to 50k and should have it definitely in mind over 50k, 100k is a very negative aspect of a featured article.

Wikitext size tends to be about twice the size of prose text for featured articles though it can be more if citations have a lot of extra text in them, and it can be much more if there are tables since prose text ignores table sizes. The advantage of wikitext is that it is directly supported by Special:Longpages and the article history whereas prose text is not well supported. Thus if it is a reasonable measure it is a better measure as a rule of thumb. As to the current rule of thumb if 100k was interpreted as wikitext it would put prose size at about 50k before one should definitely consider splitting which is about right by the readability issues section.

The other point in wikitext's favour is a consideration of a major problem in Wikipedia. People are dumping in databases of sports results, election results, manga characters etc etc into Wikipedia. These articles come out as only a few hundred bytes as far as prose text is concerned but often involve downloading over a megabyte and quite long times to display even ignoring mobile browsing. Rule of thumb is supposed to be something easy, if featured article people want to polish an article that is another business and they can take time over it, but dealing with the great mass of rubbish being stuck into Wikipedia requires more everyday tools.

For straightforward articles which are okay wikitext is as good as prose text as a rule of thumb and for everyday use talking about articles which are too large consisting of huge tables prose text is simply useless. As to lists they have their own rules and can be split fairly arbitrarily but their rules stop them being misused as database dumps in quite the same way as normal articles. Dmcq ( talk) 15:26, 7 July 2012 (UTC)

By the way I have also just set up WP:VPT#Section viewing to start thinking about coping better with long articles on mobile devices. Probably somebody else has been at this sort of thing before but getting changes in to the wiki software isn't that easy.. Dmcq ( talk) 15:44, 7 July 2012 (UTC)

Proposed load size rule of thumb

Ok, so I did some research, I read this article: Loading today's sites over dialup about load times.

I also played around with this web analyser tool: [5], I pointed it at a few article pages, the main page, and a couple of featured articles.

It looks like, roughly, an article that is about 1 megabyte of load size takes about 2-2.5 minutes on a 56K modem. The main page is about 80K and takes about 20 seconds. Yesterday's Calgary Stampede FA was 222K and would load in about 51 seconds, and the rocket page is 500K, 1:52 on a 56K modem (note that long articles tend to load a bit quicker than you would expect because of queue latency at the webserver that hurts short articles more.)

I don't have any reason to think that those articles are particularly large, but I tentatively suggest we write down 1 megabyte to limit the maximum size, and to have a rule of thumb that it's all good up to 250K (a load time of under a minute).

To put this in perspective, according to that article the average size on the wider web is a bit over 1M and the load time on a 56K modem is about 2 minutes 30, so although 2 minutes is a long load, it's still above average.

I mean in an ideal world I would prefer everything to load in 5 seconds on 56K modem, but that's not going to happen, I don't think people want an encyclopedia web page that looks like it's 1993. So we have to be a bit reasonable.

Webpage load size    Assessment
< 250K               good
250K - 500K          acceptable
500K - 1M            consider shrinking or splitting
> 1M                 should be split

So I would suggest that we add that, in addition to the rule of thumb on prose length. Does that sound OK? GliderMaven ( talk) 16:12, 7 July 2012 (UTC)
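As an illustration only, the thresholds in the table above could be applied mechanically along the following lines. This is a minimal sketch, not part of the proposal: the function names are hypothetical, "K" is assumed to mean 1,000 bytes, and each boundary value is assumed to fall into the larger band. The download estimate is a naive lower bound that ignores latency and protocol overhead, which is why the measured times quoted above are longer.

```python
# Hypothetical helpers; thresholds are taken from the proposed table.
# Assumption: "K" means 1,000 bytes (the proposal does not say which).
K = 1000


def classify_load_size(size_bytes: int) -> str:
    """Map a total page load size in bytes to the proposed assessment."""
    if size_bytes < 250 * K:
        return "good"
    if size_bytes < 500 * K:
        return "acceptable"
    if size_bytes <= 1000 * K:
        return "consider shrinking or splitting"
    return "should be split"


def naive_download_seconds(size_bytes: int, link_bits_per_second: int = 56_000) -> float:
    """Lower-bound transfer time: payload bits divided by nominal link speed."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # Approximate sizes quoted in the comment above.
    for name, size in [("Main page", 80 * K),
                       ("Calgary Stampede", 222 * K),
                       ("Rocket", 500 * K)]:
        seconds = naive_download_seconds(size)
        print(f"{name}: {classify_load_size(size)}, at least {seconds:.0f} s at 56K")
```

The real figures depend on the number of HTTP requests and on server latency, so a measurement tool is still needed for anything more than a rough banding.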

The biggest problem I think with that is that a rule of thumb should be something easy, and measuring the file size is not easy. Download file size can differ between whether maths is displayed as png or Tex for MathJax and only people with some technical nous can check it. It is much easier to find than prose text size but it is still a problem. That is why I'd prefer the wikitext size as shown in the history pages for every revision.
As to the actual sizes comparisons the relative sizes tend to be about 8 download bytes to every 2 wikitext bytes to every 1 prose text byte. The major difference is if there are tables prose text ignores them, you'll be amused (?) to learn the prose text script says there is (0 bytes) of prose on the main page! On that basis your advice is roughly equivalent to the prose text guideline and at least catches the huge tables that have been pushed by some sports database dumpers.
If people could actually consider acceptable as meaning that and consider shrinking or splitting as really meaning to consider that articles that long have some problems I would be happy, but it seems they are taking maximum as meaning okay. It should be I feel that in a FAC consider splitting really means that and that special pleading should need to be done for accepting an article in that range as a featured article.
Converting the table above with the 8:2:1 equivalence to wikitext and putting in some encouragement to make smaller gives:
Wikitext size    What to do
< 30k            normally too small to split
< 60k            good readability
60k - 120k       acceptable but splitting may be helpful
120k - 250k      can have readability issues, consider shrinking or splitting
> 250k           almost certainly should be split
This would make it easier to judge without special tools and makes it clearer that the maximum is a maximum and not an optimum. Dmcq ( talk) 23:57, 7 July 2012 (UTC)
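The conversion described in this comment can be sketched as follows, assuming the 8:2:1 download : wikitext : prose equivalence holds. The function names are illustrative only, and the ratio is the commenter's rough estimate rather than a measured constant.

```python
# Ratio quoted in the comment: 8 download bytes : 2 wikitext bytes : 1 prose byte.
# Treat it as a rough assumption, not a measured constant.
DOWNLOAD, WIKITEXT, PROSE = 8, 2, 1


def download_to_wikitext(download_k: float) -> float:
    """Convert a download size to the equivalent wikitext size (same units)."""
    return download_k * WIKITEXT / DOWNLOAD


def download_to_prose(download_k: float) -> float:
    """Convert a download size to the equivalent readable prose size."""
    return download_k * PROSE / DOWNLOAD


if __name__ == "__main__":
    # Load-size limits from the earlier proposal, in thousands of bytes.
    for limit_k in (250, 500, 1000):
        print(f"{limit_k}K download ~ {download_to_wikitext(limit_k)}K wikitext "
              f"~ {download_to_prose(limit_k)}K prose")
```

Under that assumption the 250K, 500K and 1M load-size limits map to 62.5K, 125K and 250K of wikitext, which the table rounds to 60k, 120k and 250k.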
Oh sure, it's a lot easier to do that, but it just doesn't work, it's not in any way equivalent, and there are no stable conversion ratios you can use. The trouble is that most of the load time and load size is due to loading images, but images take up virtually no wikitext, just a hundred bytes or so. But they balloon up when you load the page to many kilobytes, by a factor of 50 or more. Also references are very bulky in terms of wikitext but don't add much to the load time. GliderMaven ( talk) 00:13, 8 July 2012 (UTC)
Your idea just doesn't work at all. GliderMaven ( talk) 00:13, 8 July 2012 (UTC)
For the Calgary Stampede file size of 260k is the size of the individual html corresponding to the wikitext without images. In that case the wikitext was 72k and the prose 37k so in fact it was quite close to the ratio 8:2:1. I see rocket is one of those for which the prosesize script failed so I can only give the first two figures of 362k and 119k which is also close enough to 8:2 for an easy rule of thumb. We're not talking about something exact.
As to image sizes, that is not included in the file size of 260k of Calgary Stampede that you quoted so you weren't including them yourself. In fact the images for that come to about 628kB. One doesn't notice images so much as they get loaded last normally and you don't see the empty space except by scrolling quickly to the bottom. For rocket the total size of the images is 340kB. If one were to load rocket with an empty cache, not even all the javascript and css, one would load about a megabyte. Coming from another page about 700kB needs to be loaded.
I agree image size is an important consideration but people don't notice it so much in time, it is mainly a price cost to mobile browsers. I think a separate section would be needed about images and the main way to cope is not have such large ones and to use summary style with subtopics, i.e. for people to realize this is an internet encyclopaedia. Dmcq ( talk) 11:27, 8 July 2012 (UTC)
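To see how close the quoted figures come to 8:2:1, the sizes can be rescaled so that the wikitext term equals 2. This is a small sketch using only the numbers given in this comment; the function name is hypothetical, and the prose size for Rocket is omitted because the script failed on it.

```python
from typing import Optional, Tuple


def scale_to_wikitext_2(html_k: float, wikitext_k: float,
                        prose_k: Optional[float] = None
                        ) -> Tuple[float, float, Optional[float]]:
    """Rescale (html, wikitext, prose) sizes so that wikitext becomes 2."""
    factor = 2 / wikitext_k
    prose = None if prose_k is None else prose_k * factor
    return html_k * factor, 2.0, prose


if __name__ == "__main__":
    # Figures in thousands of bytes, as quoted in the comment above.
    calgary = scale_to_wikitext_2(260, 72, 37)
    rocket = scale_to_wikitext_2(362, 119)  # prose size not available
    print("Calgary Stampede:", [round(x, 2) for x in calgary])
    print("Rocket:", round(rocket[0], 2), rocket[1])
```

On these figures Calgary Stampede comes out at roughly 7.2 : 2 : 1.0 and Rocket at roughly 6.1 : 2, so whether that counts as "close enough" depends on how loose the rule of thumb is meant to be.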
Those numbers are just not true. The prosesize tool doesn't give you load times or load sizes. You have to use a proper web analysis tool like the one I already linked to. The breakdown of Calgary Stampede loading it gives is that it's 222K as follows GliderMaven ( talk) 13:44, 8 July 2012 (UTC)
URL: http://en.wikipedia.org/wiki/Calgary_Stampede
Title: Calgary Stampede - Wikipedia, the free encyclopedia
Date: Report run on Sun Jul 8 09:07:20 EDT 2012
Diagnosis
Global Statistics
Total HTTP Requests: 33
Total Size: 222714 bytes
Object Size Totals
Object type      Size (bytes)   Download @ 56K (seconds)   Download @ T1 (seconds)
HTML:            52324          10.63                      0.48
HTML Images:     124925         29.50                      5.26
CSS Images:      15941          3.78                       0.68
Total Images:    140866         33.28                      5.94
Javascript:      23997          5.78                       1.13
CSS:             5527           1.30                       0.23
Multimedia:      0              0.00                       0.00
Other:           0              0.00                       0.00
As you can see 2/3 of it is in images, the HTML markup is only 52K. As I pointed out, the load size is dominated by images, which are loaded as thumbnails, but they will take up virtually no wikitext at all. GliderMaven ( talk) 13:44, 8 July 2012 (UTC)
The Rocket page analyses as follows, as you can see it's 500K:
URL: http://en.wikipedia.org/wiki/Rocket
Title: Rocket - Wikipedia, the free encyclopedia
Date: Report run on Sun Jul 8 09:30:45 EDT 2012
Diagnosis
Global Statistics
Total HTTP Requests: 79
Total Size: 500598 bytes
Object Size Totals
Object type      Size (bytes)   Download @ 56K (seconds)   Download @ T1 (seconds)
HTML:            81986          16.54                      0.63
HTML Images:     367824         86.91                      15.55
CSS Images:      15941          3.78                       0.68
Total Images:    383765         90.69                      16.23
Javascript:      29320          7.04                       1.36
CSS:             5527           1.30                       0.23
Multimedia:      0              0.00                       0.00
Other:           0              0.00                       0.00
You're basically, repeatedly looking at a surrogate number, wikitext, but there's no reliable correlation at all with the actual things that people actually care about, like page load time or reading time. GliderMaven ( talk) 13:44, 8 July 2012 (UTC)
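The share of each report that is images can be worked out directly from the byte counts in the two reports above. The sketch below copies those figures; the dictionary layout and function name are illustrative, and it also checks that the listed components add up to the reported totals.

```python
# Byte counts copied from the two analyser reports above.
REPORTS = {
    "Calgary Stampede": {"total": 222714, "html": 52324, "images": 140866,
                         "javascript": 23997, "css": 5527},
    "Rocket": {"total": 500598, "html": 81986, "images": 383765,
               "javascript": 29320, "css": 5527},
}


def image_share(report: dict) -> float:
    """Fraction of the total load size that is images."""
    return report["images"] / report["total"]


def components_sum(report: dict) -> int:
    """Sum of the non-zero object types listed in a report."""
    return sum(report[key] for key in ("html", "images", "javascript", "css"))


if __name__ == "__main__":
    for page, report in REPORTS.items():
        print(f"{page}: images are {image_share(report):.0%} of "
              f"{report['total']} bytes; components sum to {components_sum(report)}")
```

On these numbers images account for about 63% of the Calgary Stampede load and about 77% of the Rocket load, consistent with the "2/3" figure given above for the first page.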
I wasn't looking at wikitext, I was looking at the actual sizes. I will list the images loaded specifically for Calgary stampede and their sizes so you can check that your figures are wrong:
13.2K 220px-Stampede_chuckwagon_race.JPG
13.4K 220px-Patsy_Rodgers_stage_coach_1a.jpg
13.7K 220px-Steerwrestling-c01.jpg
14.1K 220px-Calgary_Stampede_Logo.svg.png
14.3K 220px-Calgarystampede.jpg
15.8K 220px-StampedeRodeo2002.JPG
15.9K 350px-Saddledome_from_Calgary_Tower.JPG
16.2K 220px-1923_Calgary_Stampede_parade.jpg
21.6K 200px-Barrel-Racing-Szmurlo.jpg
30.2K 220px-Bull-Riding-Szmurlo.jpg
31.0K 220px-Program_for_1912_Calgary_Stampede.jpg
73.8K 220px-Sale_Pelletier_ice_show.png
76.8K 250px-Chinook_Stampede_Breakfast.png
80.3K 220px-Stampede_Protest.png
83.0K 220px-Stampede_Midway_2011.png
91.0K 220px-Indian_Village.png
total 604.3K for page specific images. You can see from this that this sort of thing is not suitable for a 'rule of thumb' without a lot more work to make an easy tool. Dmcq ( talk) 15:36, 8 July 2012 (UTC)
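The total quoted here can be checked by summing the listed sizes. A minimal sketch, with the figures copied from the list above and the variable names purely illustrative:

```python
# Thumbnail sizes in K, in the order listed above.
IMAGE_SIZES_K = [
    13.2, 13.4, 13.7, 14.1, 14.3, 15.8, 15.9, 16.2,
    21.6, 30.2, 31.0, 73.8, 76.8, 80.3, 83.0, 91.0,
]

# Round to one decimal place to absorb floating-point noise in the sum.
total_k = round(sum(IMAGE_SIZES_K), 1)

if __name__ == "__main__":
    print(f"{len(IMAGE_SIZES_K)} images, {total_k}K in total")
```

The sum agrees with the 604.3K stated in the comment.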
It looks like the webtool is getting a bad response from the wikimedia servers perhaps; it's only getting a 5k rejection message instead of the actual image. So the actual size of the images is going to be significantly bigger. This only underlines how bad your idea of using wikitext size is; it correlates very, very poorly with anything users actually care about, as everyone else keeps telling you; even when the tool very significantly underestimated the image size it still dominated load size. GliderMaven ( talk) 22:52, 8 July 2012 (UTC)
I've explained above about the gradual loading of images leading to an improved perceived response and people don't worry so much about the empty boxes anyway when they know they will be filled later. Images should be dealt with as a separate problem as they can mostly be adjusted independently of the text. As I said before the main fix always is to use content splitting and we need an easy rule of thumb we don't need something exact. As you have demonstrated above total download per page is not an easy measure. Dmcq ( talk) 00:11, 9 July 2012 (UTC)
No, we need a realistic measure; one that includes the size of image thumbnails and other files. If there is a way to automate such a measurement then we need to encourage the server side people to help us make it easy. First, though, is accuracy. Second is ease. Binksternet ( talk) 01:12, 9 July 2012 (UTC)
What would one be accurate about and why is it important to be accurate about it? The featured article people with their prose size do have a point, the first consideration should be readability. It is just they have ignored the real limit and pushed something that is hard for most editors and causes real problems when dealing with fanatics shoving in their databases of sports facts that really makes prose text unworkable and bad for general use so we need something that is fairly consistent with what is really wanted there. A tool that outputs total download size, total article specific images or other downloaded media size, total article specific html text size, wikitext size plus the prose text size would be nice, but we do need a simple rule of thumb to get people looking deeper and image size + text size just is not that as it can totally swamp text size problems, whereas if the text size is okay it is normally fairly easy to adjust for image size problems. Dmcq ( talk) 11:36, 9 July 2012 (UTC)
I'm in favor of three plainly stated size limits: readable prose (requires a tool), HTML markup (seen at a glance), and page load (with image thumbnails; needs a new tool). Binksternet ( talk) 13:05, 9 July 2012 (UTC)
Dmcq, you keep pushing markup size, but as everyone here has confirmed, it's the least meaningful number to use. The User:Dr pda/prosesize tool is easy to install and with one click gives all three sizes: readable prose, markup, and load size. If it's so important that everybody be looking at sizes all the time, then the best action would be for this tool to be included in all user accounts by default. I see from your user page that you've worked in the computer field - so you may be familiar with the simple Unix "wc" command. It prints the number of lines, words, and characters in a file. It doesn't just print the number of words and leave the user to make an estimate from that on the number of lines or characters. Wasted Time R ( talk) 04:08, 8 July 2012 (UTC)
Why are featured article candidates not penalized for going over 50k prose text size if readable size is so important? Why will you not engage with the fact that the main problem is people pushing enormous wadges of text with trivia into Wikipedia and that the featured articles even with being pushed to silly limits are not the really big problem. My experience with computers tells me I should make things simple. I have cut out or hidden features in products rather than release them to save on support costs and made sure examples included features users should use rather than showing arcane things. Sometimes a user would be told about a hidden feature if they said about a problem it could fix but keep it simple stupid is the right way to do things for something that says 'the encyclopaedia anyone can edit'. There is no point showing three figures instead of one for a simple rule of thumb. And prose text style has far too many problems compared to its use for readable prose which the FAC editors seem to be ignoring anyway. Dmcq ( talk) 08:55, 8 July 2012 (UTC)
I don't agree with you that there is any problem in the first place. I think the featured articles that go over 60kb / 10,000 words of readable prose size do so for a good reason and I don't think they're full of trivia. I also don't think your model of hypertext and link clicking is how readers actually use Wikipedia in many cases. If you look at the page view stats ( http://stats.grok.se/) for June 2012, for example, Barack Obama got 645,000 views while United States Senate career of Barack Obama got less than 3,000 views, Barack Obama social policy got less than 3,000, Economic policy of Barack Obama got 4,000, and Foreign policy of the Barack Obama administration got 3,500. Paul McCartney got 429,000 views while Paul McCartney's musical career got 2,000 views and Personal relationships of Paul McCartney got 7,000. To use your example, Cat got 406,000 views while Cat gap got 2,000, Cat genetics got 1,000, and Cat health got 3,000. These 100-to-1 or worse readership drop-off ratios are common, I've seen them across many time periods and article/subarticle combinations. So if there's something important about any of these topics, editors know it had better go in the main article, otherwise 99 percent of their readers will never see it. (I can't prove it, but I think readers mostly reach WP articles from search engines or from clicking from one main topic to another, and not from drilling down within a topic. It would be great to actually see a use study of how people reach and navigate WP.) As for making things simple to use and 'the encyclopaedia anyone can edit', that ship has already sailed. The unsuspecting reader who decides to click "Edit" for the first time is hit with a blizzard of inscrutable infobox templates and other markup. How article sizes are shown is the least of their problems ... Wasted Time R ( talk) 11:47, 8 July 2012 (UTC)
I did not say featured articles were full of trivia. I said the main problem about size was elsewhere and we really needed a simple size rule for where people were dumping huge amounts of trivia into Wikipedia. Very often they stick it into big tables which come out as zero size by prose text.
What I said about featured articles is that they should consider size as being a negative factor long before they get to the 100K. There was a justification for using prose size above on the basis that when they got to the limit with a wikitext limit they would be over it if they added a citation. There was no thought that if they have got to such a situation splitting should have been considered a long time earlier and an argument should have been given why more than 50k prose text without splitting was good.
Now you come up with this argument that one should cram as much as possible into a page because they often don't click down to subtopics. What on earth makes you think they read all the top level article anyway? Do you really think they are going to be sitting there for more than an hour reading the business? All that is happening is they get pages slower and Wikipedia wastes resources sending out stuff that people don't look at. Putting in more simply obscures the important bits.
If pages were smaller and loaded faster then users would click more. Have a look at [6] for instance about what happens as page load time goes up. As it says even a 1 second delay decreases customer satisfaction by 16% and 40% abandon a website if it takes more than 3 seconds to load. Think gnat about attention span rather than cramming stuff in. Dmcq ( talk) 15:56, 8 July 2012 (UTC)
I agree with you that we don't know what happens when readers come to long articles. We don't know how often they just read the lead section and then leave, or how often they look at the table of contents and jump to a section that they are especially interested in, or how often they jump around different sections reading some and skimming others, or how often they read the whole thing through start to finish, or how often they read some now and come back to the article at some later time. We also don't know how often they get frustrated with the load time and abandon the whole thing at the outset. I'd love to see a usage study that shed some light on all this. But one thing that we do know is how often they click through to certain kinds of drill-down subarticles, and it's 1 percent or less of the time. Given that, authors will take their chances with longer articles. As for your belief that if page loads are faster in general, people will click through more, I haven't seen that myself. For example, Samuel Taylor Coleridge is 27 kB readable / 47kB markup / 152 kB load size and loads very quickly; it had 41,491 views last month. But Early life of Samuel Taylor Coleridge had only 362 page views, again a 100-to-1 type ratio. Wasted Time R ( talk) 23:16, 8 July 2012 (UTC)
Perhaps people aren't as interested in Coleridge's early life as that of Obama or Paul McCartney? Anyway there is quite a bit about his early life in the article instead of it just being a couple of paragraphs like the lead of the subtopic - perhaps they read enough there if they were interested? Another way of interpreting it is that a large section about early life is always downloaded which 99 out of 100 readers aren't interested in. However what I do know is that people will go round a site much more if its response is quick and that they don't like having a lot of stuff they're not interested in. How many people who read about Coleridge are really interested in his life rather than his poetry? Not a very high percentage is my guess and yet most of that article is about that. In fact I wonder how many read beyond the lead. Dmcq ( talk) 00:21, 9 July 2012 (UTC)

Note that the example given in a previous thread, Wikipedia talk:Article size#Images as part of the total download of an article to a browser, is 38MB (!!) worth of page load because of the hundreds of thumbnail images. Pages like that must be cut down by taking away the images or by splitting. Our guideline must recommend an upper limit for that kind of silliness. 1MB seems reasonable. Binksternet ( talk) 01:19, 9 July 2012 (UTC)

Yes, there definitely does need to be some guidance about image size. I think it should be a separate section from the text rule of thumb though, and that getting a proper handle on the text size would ameliorate much of the difficulty. Even now, if you look at List of American Civil War Generals (Confederate), which was the article in question, and apply the prosesize script, it says it only has 3135 bytes! That shows how useless the prose text size is for general use. The wikitext occupies 247 kB and the file size is 385 kB. This is an example where the ratio of file size to wikitext size is quite reasonable - I don't know why the ratio normally tends to be about 4 to 1. I still think it is too long, but that ratio does make a case for also having a section on file size as an additional measure which could be used in special pleading to say a page isn't too long when the rule of thumb indicates it is. This is the same sort of status I'd give to prose text size. Dmcq ( talk) 08:35, 9 July 2012 (UTC)
I'm in favor of three plainly stated size limits: readable prose (requires a tool), HTML markup (seen at a glance), and page load (with image thumbnails; needs a new tool). Binksternet ( talk) 13:05, 9 July 2012 (UTC)
Actually it is wikitext that is seen at a glance. It gets expanded by templates, and html tags are inserted, to something typically about four times the size, though it may be less than twice or can be more if lots of templates are used; I don't think that matters too much in this context though. I would typify the sizes as being good for:
Page load with thumbnails (probably should not include cached javascript and css) for the total time and money overhead before a page is fully loaded. This would mainly be used in arguments that there were too many thumbnails or they were too large. We'd need good guidelines about that - currently I don't know any, and WP:PERF just says not to worry about that, while WP:IMPROVING hasn't a thought in the world about it but does say 'The most important point is that now it is highly encouraged to talk, learn, and worry about performance issues before an article becomes a nightmare for admins to rescue.', which completely contradicts the other essay.
wikitext (possibly including transcluded pages which aren't templates) as a simple guide for saying pages are getting too long when dealing with problems like sports statistics. An alternative is if we can get the html file size for a standard environment easily.
prose text size for the featured article editors as an indicator of readability. If you can phrase that better please do.
Unfortunately most of the editors here seem to be arguing for what I would consider the least useful measure for general purposes as being the main rule of thumb measure. I just hope that doesn't become general knowledge or it will be exploited to death; we had better get some other guidelines in before that happens. Dmcq ( talk) 16:10, 10 July 2012 (UTC)
Just use the 'Page length (in bytes)' as reported from the Page Information link on every page. Anything else is too obscure or complicated. Kaldari ( talk) 06:09, 7 December 2012 (UTC)
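For illustration, the three measures discussed in this thread can be put side by side for the two articles whose figures were quoted above. This is only a minimal Python sketch: the `SizeReport` name and its fields are hypothetical, the numbers are copied from the comments above rather than measured afresh, and it assumes that the "load size" quoted for one article and the "file size" quoted for the other are the same kind of measurement, and that 3135 bytes can be written as 3.135 kB.

```python
from dataclasses import dataclass


@dataclass
class SizeReport:
    """Sizes in kB as quoted in the discussion (hypothetical container)."""
    title: str
    prose_kb: float      # readable prose size (needs a tool)
    wikitext_kb: float   # markup size, as shown in the page history
    load_kb: float       # quoted load/file size (assumed comparable)

    def load_to_wikitext(self) -> float:
        """How many times larger the delivered file is than the wikitext."""
        return self.load_kb / self.wikitext_kb

    def prose_share(self) -> float:
        """Fraction of the wikitext that counts as readable prose."""
        return self.prose_kb / self.wikitext_kb


reports = [
    SizeReport("Samuel Taylor Coleridge", 27, 47, 152),
    # 3135 bytes of prose, written as 3.135 kB (assumes 1 kB = 1000 bytes)
    SizeReport("List of American Civil War Generals (Confederate)", 3.135, 247, 385),
]

for r in reports:
    print(f"{r.title}: load/wikitext = {r.load_to_wikitext():.2f}, "
          f"prose/wikitext = {r.prose_share():.1%}")
```

On these quoted figures the list article has a much lower load-to-wikitext ratio than the biography, while its prose share is close to zero, which is the point made above about prose size saying little about lists.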

Measure for measure

Seems to me all three major ways of measuring size are relevant to different users. The majority of users are presumably readers who found their way via Web search and are ignorant of the article's topic. For those who use a desktop or large laptop screen and fast wired connection, quick comprehensibility is the main design consideration, which makes readable prose size the proper measure. For those using their mobile phone screen as I often do when merely reading, or a small tablet, small prose size is even more important for avoiding getting lost in an article with sections either too large or too numerous, and download size also becomes important for those of us with slow mobile radio connections.

Editors ignorant of the fine points discussed here, who will remain the majority of editors for an indefinite period, only know markup size, because that's what's in the watchlist entry. Those who edit on a small screen or slow connection are again even more interested in markup size. And of course many readers who use small mobile screens, including me often, will be reading one or another of the "Mobile Web" versions and sometimes the official Wikipedia mobile app or an unofficial one, most of which will present pictures with a smaller thumbnail than the "Desktop version" that the majority of deskbound readers use. So, yeah, all these methods of setting limits ought to be taken into consideration, but markup size is the only one the majority of editors will use until the others are as easily reported as that one. Jim.henderson ( talk) 03:13, 19 July 2012 (UTC)

I'd certainly like an easy way to measure the size of the images as well as the markup size. However I think the prose text size is a red herring, as the size here is way beyond what is given in the readability issues; the section saying size is no reason to chop things out is also a misleading one. The prose text also gives no guideline on lists or tables or citation notes which are mini articles in themselves. The place seems to be dominated by people with an interest in cramming as much in as they can and can't or won't see the problems they cause. I view the guideline as unfit for any purpose at the moment, whether it be for featured article assessment or for stopping people sticking in huge lists of wrestling bout results and every action of every monster in dungeons and dragons. Dmcq ( talk) 08:22, 19 July 2012 (UTC)
Propose changes right now since you are addressing issues with "Content removal" section. -- George Ho ( talk) 14:25, 19 July 2012 (UTC)
I would have something like 'Size is not of itself a reason to remove content. List articles may be split arbitrarily. If a non-list article becomes very long without being able to be shortened by splitting off notable subtopics that is a strong indication that trivial details have been included. Consensus should be shown that excessive trivia are not included if an article is grown beyond the normal article size guidelines. It is normally easier not to split off subtopics whilst actively developing the basic structure of an article.' Not marvellous but perhaps a start for discussion. Dmcq ( talk) 16:00, 19 July 2012 (UTC)
Working off that, I think we should change "List articles may be split arbitrarily." to "List articles may be split arbitrarily, although if a logical split is viable that is preferred." I also think we should follow on from the splitting sentence at the end with "If an article is developed but too long, consider what information is more helpful to give an overview to the reader, in line with WP:Summary style, and move excess information to articles devoted to specific subtopics." This should hopefully encourage the shifting of content, rather than its simple removal. CMD ( talk) 16:44, 19 July 2012 (UTC)

Originally the only measure for size was the byte count. The words "readable prose" were introduced in 2004 to point out that tables, lists, and markup were not to be included, but the suggested counts involved were not adjusted. [7] Prior to that it was clear that the only count that was used was the byte count - see [8] and [9] to see that this article is 15 kb (that was before you could just click history to find out the current size). In fact the suggested sizes have increased, not decreased, while using a measure that gives a smaller size. This has compounded the problem of pages being too long. Apteva ( talk) 20:06, 19 September 2012 (UTC)

"Content removal" section

This section is now disputed because changes have been proposed and because the amount of subtopic information in an article dedicated to the main topic may be either decent or excessive. To establish a straw poll, you can create a subheading below with an RFC tag. -- George Ho ( talk) 16:52, 19 July 2012 (UTC)

RfCs aren't meant to be used as straw polls. Also, rather than creating a new section, perhaps we could keep the discussion in the section above? CMD ( talk) 17:31, 19 July 2012 (UTC)
Haven't you read the OP of the above section? We can't turn the above section into another many-things-in-one thread or change its subject. -- George Ho ( talk) 17:40, 19 July 2012 (UTC)
The above section is five posts long. It's hardly committed to a specific cause. CMD ( talk) 17:47, 19 July 2012 (UTC)
My impression was that summarizing a section would be a preferable way to deal with a long article, especially if splitting a section out would violate WP:what Wikipedia is not (or if that section is so unreferenced as to fail the notability guideline). Depending on who you ask, such "summarizing" would be seen as "removal". But regardless, I don't think this section is accurate, and I think it needs to be rephrased or removed. Shooterwalker ( talk) 03:20, 3 August 2012 (UTC)
Correct. It is not very well worded. In that context summarizing is removing. WP uses summary style, and expects that long sections will be split into sub articles leaving a summary paragraph. The section before this one, "Splitting an article" deals with that subject. The section in question deals with simply cutting out words to make an article shorter. I also fail to see the need for putting the template on the article. Just fix the wording and discuss it. This is not a major dispute that no one can agree on, which is what that template connotes. Apteva ( talk) 22:17, 19 September 2012 (UTC)

Origin of Marimba

Marimba is a musical instrument made and played by the Lozi people of the western province of Zambia — Preceding unsigned comment added by 101.119.24.76 ( talk) 01:35, 11 August 2012 (UTC)

Template:Size

Am I using Template:Size correctly, on the massively oversized List of historic places in Quebec? Seeing the equally massive logo it places atop, I'm unsure. Should this go on the Talk page? The template documentation is unclear, at least, to me. Shawn in Montreal ( talk) 15:42, 23 August 2012 (UTC)

Talk page, yes. Article, NO!! As for the template itself, the documentation needs a better, more consistent explanation. -- George Ho ( talk) 16:34, 23 August 2012 (UTC)
Hmm. When I move the template to the Article talk page (which is otherwise empty) it reads zero bytes. So it's not registering the size of the article at all. I guess I shouldn't use the thing at all? Shawn in Montreal ( talk) 17:21, 23 August 2012 (UTC)
After much digging... the template you want is: Template:Very long (or one of the SeeAlso or Splitting templates listed in its docs).
We should probably list that template (and possibly some of the related templates) in this policy page? -- Quiddity ( talk) 21:05, 23 August 2012 (UTC)
Tangentially: These templates seem to be related, but are currently unused: {{ Size}}, {{ Pages}}, {{ Pages-size}}, {{ PageInfo}} - I'm not sure if we should merge them all into one, or delete them all, or what? -- Quiddity ( talk) 21:05, 23 August 2012 (UTC)
The first thing to do would be to fix {{ Size}} so that it works properly. It has a parameter that is supposed to select either a big (70px) or a small (35px) exclamation mark, but that is not working, so you get the full 323 px.
{{#ifexpr:   <!---1---> {{PAGESIZE:{{FULLPAGENAME}}|R}} >= 102400 
|<!---2---> 
[[File:Ui Yellowexclamation.png| {{#ifeq: {{{big|no}}} | yes | 70px |35px}} 
link= Template:Longish]]
Apteva ( talk) 20:51, 19 September 2012 (UTC)
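
The intended behaviour of the fragment quoted above can be restated outside wikitext. The following is only a rough Python sketch of what the visible part appears to aim for, not the template's actual implementation: the function name `icon_width` is hypothetical, and returning `None` below the threshold is an assumption, since the quoted fragment does not show what the template does in that case.

```python
from typing import Optional

# Threshold taken from the #ifexpr in the quoted fragment (102400 bytes).
THRESHOLD_BYTES = 102400


def icon_width(page_size_bytes: int, big: str = "no") -> Optional[str]:
    """Return the intended icon width, or None when no icon is shown.

    Mirrors the visible logic: show the icon only at or above the
    threshold, at 70px when big is "yes" and 35px otherwise.
    """
    if page_size_bytes < THRESHOLD_BYTES:
        # Assumption: nothing is displayed for pages under the threshold.
        return None
    return "70px" if big == "yes" else "35px"


print(icon_width(247_000))              # large page, default small icon
print(icon_width(247_000, big="yes"))   # large page, big icon requested
print(icon_width(50_000))               # page under the threshold
```

When the template is placed on an otherwise empty talk page, the size it reads is that of the talk page itself, so a check like this would see a value under the threshold, which matches the zero-byte reading reported above.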
