This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Since there's a request to point out inaccuracies, I'd like to point out that Czech and Slovak are separate languages and should not be listed together. While similar, neither qualifies as a dialect of the other, and they have evolved independently (see Slovak language#Relationships to other languages for a discussion of the topic). There are certainly more arguments for listing them separately than there are for listing Bulgarian and Macedonian separately, and unlike in that example, both Czechs and Slovaks will agree that their languages are separate. Since I'm not familiar with the sources used for the article, I'd like to request that someone with knowledge of where to find reliable data separate the two. Thanks. -- Aram գուտանգ 8 July 2005 04:47 (UTC)
This is an argument that's come up repeatedly, and will continue to come up. We have two lists of languages by native speakers. Should we make one a list of languages by cultural identification, separating Czechoslovak and unifying Chinese, and the other a list of languages by the criterion of mutual intelligibility? That should make everyone happy. kwami 2005 July 9 17:51 (UTC)
This is a whole new low with regard to sociolinguistics on Wikipedia: Czech and Slovak are not merely grouped together, but wholly equated. While it is trivially useful for some readers to have an overview of which foreign languages can be grouped together, and (I presume) a fun exercise for linguists, it's also out of touch with reality, because it blatantly ignores the behaviour and thoughts of the people speaking those dialects. You simply cannot claim the high ground of "genetically they're the same, so there!" and expect people to just accept it. -- Joy [shallot] 9 July 2005 19:15 (UTC)
We have two contrary tendencies here: distinguishing languages genealogically, and distinguishing them culturally. This is going to continue to create conflicts until we decide on one or the other, or create two lists. When I first saw this article, Chinese was listed as a single language, but there were half a dozen Italian "dialects" listed as separate languages. That's just silly: Italian has about the diversity of Cantonese. We should go one way or the other. I've tried to make the list somewhat more consistent, but of course haven't been able to do everything. If you don't like the direction I've gone, fine: do something better. But let's at least make it internally consistent. kwami 20:02, 2005 July 9 (UTC)
A couple of points: 1) the Italian dialects are, according to Ethnologue, not even all that closely related to each other. Calling Sardinian an Italian dialect would appear to be technically inaccurate. And, according to Ethnologue, Piemontese, Lombard, and so forth are closer to French and the Langue d'Oc than they are to Italian. As for general standards, I'd suggest that the presumption should be that languages listed as separate languages by Ethnologue should be treated as separate languages. However, if languages listed by Ethnologue as separate languages are often considered to be the same language, especially for political reasons, we should unify them. Thus, Eastern and Western Farsi, or Gheg and Tosk Albanian, should be unified despite being separate languages in Ethnologue. Probably this goes for Arabic as well, if only because differentiating the dialects is so difficult. This should perhaps also be done for some of the Hindi dialects like Awadhi or Haryanvi (but probably not Punjabi or the Bihar dialects). I'm not sure what should be done about Hindi and Urdu, but they should probably be separated out again, as well. Czech and Slovak should definitely be separated, because the languages are listed as separate in Ethnologue and are not normally considered to be the same language. john k 20:16, 9 July 2005 (UTC)
Dear kwami,
I'm glad you're interested in foreign languages, including Slavic languages. It seems we share the same passion :-) It's nice to meet you.
But I need to say something less pleasant: in my opinion you are somewhat misinformed about Czech and Slovak.
Czech and Slovak share most of their vocabulary, but there are some significant differences between them in phonetics and phonology. Slovak grammar is almost identical to Polish grammar, while Czech grammar is quite different (partly due to peculiar phonetic shifts in Czech).
As regards mutual intelligibility, I think you are overlooking an important fact: it depends not only on the similarity of the languages, but also on exposure to the other language. I'll give you an example. I'm a Pole and I live in Poland. I've been to Slovakia many times since I was a kid. For 10 years I've been living 2 kilometres from the Slovak border. I often go to Slovakia to have lunch. As a result, I can understand spoken Slovak, even though I never learned to speak it. I didn't have to learn to speak it, because the Slovaks I talked with could understand Polish. That's because Polish and Slovak are closely related: if one of them is your native language, it's enough to figure out what the main phonetic differences are and learn a few different words to understand the other language. On the other hand, I can't understand Czech, because I rarely hear it. So all you need is some contact with the other language, and it will naturally become intelligible to you. That's what happened in Czechoslovakia. Czechs had a lot of contact with the Slovak language and Slovaks had a lot of contact with the Czech language. For example, both languages alternated on the same TV channel. That's why Czechs and Slovaks can easily understand each other. Slovaks who didn't have the opportunity to hear Czech often, for example because they lived outside Czechoslovakia or were born in the 90s, when there was no longer a common state, find Czech difficult to understand. Of course, they can understand many words, because the two languages are very similar. But they have problems understanding spoken Czech on TV, etc.
The dialectal differentiation is also relevant to mutual intelligibility. Western Slovak dialects are very close to Czech, so they are mutually intelligible with Czech and Moravian dialects. Central Slovak dialects are less similar to Czech and they share many common features with Polish and Slovene (but still Czech is the closest language). Eastern Slovak dialects are more similar to Polish than to literary Czech. The standard Slovak language is primarily based on the central dialects.
You wrote: I don't speak much, but I picked up a little Slovak while in that country. When I went to Czechia, all of a sudden I was speaking Czech. The words coming out of my mouth hadn't changed, only the country I was saying them in had.
Similarly, you could write: I don't speak much, but I picked up a little Spanish while in that country. When I went to Italy, all of a sudden I was speaking Italian. The words coming out of my mouth hadn't changed, only the country I was saying them in had. Spanish and Italian are similar languages. If you use Spanish in Italy, you'll probably be understood (all the more so if the matter is simple and the subject of the message can be guessed from the context). Your interlocutors may even think you are awkwardly trying to speak Italian. Of course, they won't say your Italian is awkward. They'll probably be much more polite. Actually, I had that experience 7 days ago when I was going by train to Bratislava. I had a conversation with a Slovak girl. I spoke an awful mixture of Slovene and Polish with some Slovak words thrown in. I explained that I couldn't really speak Slovak, but I could understand it. She replied, "but you speak Slovak very well"! I said, "oh, no..." but she insisted, "you really speak very well." I said "I can't believe it..." and then she said to another passenger, "he speaks very well, doesn't he?" and the other passenger confirmed. I felt a bit embarrassed and I didn't explain that I hadn't even been speaking Slovak :-)
I'm astonished that some people you talked with claimed that Czech and Slovak are dialects of the same language. I never met such people. 160 years ago this opinion could have been justified, but now that there are two independent standard languages, officially recognized as separate for a long time, it seems anachronistic. Anyway, even if some people share this view (I don't think there are many), I'm sure no linguist claims that Czech and Slovak are the same language. And this is important, because I think Wikipedia is meant to be a source of scientific information.
To sum up, Czech and Slovak are very closely related (they are each other's closest relatives), but they are definitely separate languages. This is quite clear to linguists, even if the question of how to distinguish between separate languages and different dialects of the same language has not been satisfactorily answered by linguistics (the related discussion is interestingly presented by Einar Haugen in his article "Dialect, Language, Nation", American Anthropologist, vol. 68 (1966), pp. 922-935; reprinted in: "Sociolinguistics", ed. by J. B. Pride and J. Holmes, Penguin, 1972).
Kwami, I wish you much success in learning foreign languages.
Best regards. Boraczek 23:46, 12 August 2005 (UTC)
So are we ranking by Native speakers or Total speakers? If it's by total, why does it say otherwise in the first sentence of the page? If it is by Native, then what is this page: List of the most spoken native languages ? Please clarify.-- Zereshk 8 July 2005 22:32 (UTC)
If it is by total, then we should start re-ranking the page accordingly. Right now, it's ranked by native speakers. I'll be working on that.-- Zereshk 9 July 2005 12:02 (UTC)
I'm fine one way or the other. But I don't think getting estimates of 2nd-language speakers would be too difficult. The list can follow any predefined definition. In any case, all we must do is define what we want to be listed here, and stick to it.-- Zereshk 9 July 2005 18:39 (UTC)
I tend to think Chinese and Czech/Slovak should both be broken up. I'd suggest that only in cases like Arabic, where it's not only a complicated question of whether it's one or several languages, but also hard to figure out how exactly to divide it up into separate languages, that we should keep them together. I'd also suggest just redirecting the other page to here, and explaining that this page is listing native speakers. In terms of 2nd language speakers, one problem is how to define "2nd language" - is it anybody who has any knowledge of the language at all? Or is it more specific than that? Is a Dane who speaks some English because he studied it in school the same as a Yoruba who speaks English as a second language and has to use it in everyday communication? Better to avoid the whole question, I think. john k 9 July 2005 19:01 (UTC)
I've added a column for language families. So far, I've mostly only had a chance to add the broadest families (Indo-European, Austronesian, &c.), but hopefully we can add the more specific branches over the next few days. I think this should be useful, especially for the less well known languages that we don't have specific articles about. As far as I can tell, 18 language families (Indo-European, Uralic, Altaic, Afro-Asiatic, Niger-Congo, Nilo-Saharan, Sino-Tibetan, Dravidian, Tai-Kadai, Hmong-Mien, Austro-Asiatic, Austronesian, Japonic, Quechuan, Aymaran, Uto-Aztecan, Mayan, and Tupian) and 1 language isolate (Korean) are represented among the languages with more than one million speakers.
john k 9 July 2005 01:30 (UTC)
The language called "Persian" is known internationally and domestically within Iran as "Farsi."
The CIA World Factbook has a list of the most widely used native languages:
note: percentages are for "first language" speakers only (see reference)
Internet World Stats also has a list of the number of people able to speak each language, including second-language speakers (though it is ordered by internet users).
The English numbers seem a bit high compared with other reports, but they claim to have accurate data. :? -- Bisho 15:33, 14 July 2005 (UTC)
Okay, despite all the talk, no one's fixed this article. I'm reverting to the last version to define language by speaker identification (keeping Akira's edits), since most people feel that intelligibility tests are unworkable. I'm the one who added the Chinese "dialects" in the first place, and I don't have any problem removing my own additions to this article.
By all means, please add the individual Chinese "dialects" back in if you like, but keep the main heading and make a note under it. I might do that myself. No need to go to a lot of work; the info is all in the page history from when I added it the first time. And put Malay, Czechoslovak, and Serbocroatian back in if you like, as additional info; it's all there in the page history.
If our conception of language is to be cultural or self identification, then we shouldn't mix in intelligibility tests, unless it's added as additional information, and cross referenced. We need some sort of consistency in an encyclopedia article, not just whatever feels right for everyone's favorite language. kwami 22:40, 2005 July 14 (UTC)
I say group Chinese and Arabic (except Maltese and other separate Arabic languages), adding a linguistics note in these cases, and group Swiss German with the rest of German. Split Malay/Indonesian, Hindi/Urdu, etc. - Pedro 23:37, 14 July 2005 (UTC)
Well, I have an idea... why don't we group each disputed language under its most commonly used name, e.g. Chinese for the "Chinese" languages and German for the languages spoken in Germany, Austria and most of Switzerland, and then provide a subdivision where one can see the number of speakers per "dialect"/sub-language? That would make navigation a breeze too! Kenkoo1987 12:37, 6 August 2005 (UTC)
The Turkish language has many more speakers in total than this page lists. Azeri, Kyrgyz, Kazakh, Uzbek, Turkmen and other Turkic languages have only dialectal differences from Turkish, and they are called Turkish, as in Azeri Turkish or Kyrgyz Turkish, instead of Azeri or Kyrgyz. So the total number of Turkish speakers is 165.61 million according to this page, and in fact it is almost 250 million, including the Turks living as minorities all around the world and in autonomous Turkic regions, especially in the Russian Federation and China.
The Ethnologue figure of 46.28M native speakers in Turkey is just the 1987 population times 85%, based on a guess that 15% of the population is Kurdish. Actually, that figure is now generally considered an underestimate, and the Kurdish population could be as high as 25-30%. However, the Kurdish article and most of the estimates I've seen place it at approximately 20%. Given that Turkey's population has since increased dramatically to 69.66M, that would be about 56 million native speakers today (80% of 69.66M). The population of Bulgaria has decreased dramatically to 7.45M; at 9.4% ethnic Turk (mostly Turkish speaking), that's a further 700k. Greece: emigration offsets population growth, so perhaps still ~130k. Cyprus: the N. Cyprus population is now 210k. Macedonia: 1982 figure 200k; don't know about now. Uzbekistan: the population increased dramatically from 1979. At current growth rates, calculating back from 1993 (from the demographics chart for Uzbekistan), the 1979 population was ~17.2M. Assuming the same percentage today, that gives ~300k Osmanli speakers. Germany: 2.1M. Netherlands: 200k. France: 140k. All other countries are, I believe, < 100k, though Moldova has 140k Gagauz speakers. Total: 60.0 million, plus Gagauz. In any case, our major uncertainty is the percentage in Turkey: if the Kurds are just a bit more numerous in Turkey, that would offset Turkish-speaking immigrants in the rest of the world. We could maybe guess it's 61 million? I'm putting in 60M native, and assuming 2nd-language speakers are basically the 14M Kurdish population, ~75M total. kwami 05:39, 29 September 2005 (UTC)
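As a check on the tally above, here is a minimal Python sketch that re-adds only the figures quoted in the comment (in millions). It assumes the ~20% Kurdish estimate, i.e. an 80% Turkish-speaking share in Turkey, and leaves out the Gagauz speakers as the comment does; the dictionary keys are just labels.

```python
# Figures in millions, taken from the comment above.
turkey = 69.66 * 0.80    # assumes ~20% of Turkey's population is Kurdish
bulgaria = 7.45 * 0.094  # 9.4% ethnic Turks in Bulgaria

others = {
    "Greece": 0.13,
    "Northern Cyprus": 0.21,
    "Macedonia": 0.20,   # 1982 figure
    "Uzbekistan": 0.30,
    "Germany": 2.1,
    "Netherlands": 0.20,
    "France": 0.14,
}

# Gagauz speakers in Moldova are deliberately excluded, as in the comment.
total = turkey + bulgaria + sum(others.values())

print(f"Turkey: {turkey:.1f}M, Bulgaria: {bulgaria:.2f}M")
print(f"Total (excluding Gagauz): {total:.1f}M")
```

Without first rounding Turkey up to 56M, the sum comes to roughly 59.7M, which is consistent with the ~60M figure entered in the table.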
Hi, I read what you have written; I have a few things to say:
(0) The Kurdish issue is not as easy to figure out as kwami put it. Since the Kurds have never had a country, all these numbers are ambiguous. Even if the numbers are correct, they do not give the number of native Kurdish speakers. Many Kurdish people in Turkey do not know a word of Kurdish, so it is not easy to claim that Kurdish is those people's native language. Since there is no institution in Turkey that gives Kurdish lessons, it is not very plausible that Kurdish people, at least the ones in Turkey, speak Kurdish properly. That is to say, the article starts "This is a list of languages ordered by number of first-language speakers", and Turkish is the native/first language of many Kurds in Turkey, sad but true! Therefore, population analysis is pointless... What we need is qualitative analysis instead of quantitative analysis.
(1) Osmanli is not Turkish. Osmanli is a dead language, a strange combination of Turkish, Arabic and Persian. It cannot be considered Turkish. I can supply you with documents. As a native speaker of Turkish, it is almost impossible for me/us to read a text in Ottoman. Remove the word Osmanli!
(2) I am reluctant to consider the Kyrgyz, Kazakh, and Uzbek languages as Turkish. I can understand the Azeri and Turkmen languages. The others are certainly Turkic languages, but not Turkish! For instance, there are great similarities between Spanish and Portuguese: native speakers of Portuguese and Spanish are close enough to understand each other when they speak, but Portuguese is not Spanish and vice versa. But I do not know if we can take this as a criterion, because Danes also understand what Swedes say :-)))
(2.1) Uzbek: this is an example that I obtained from the Uzbek_language page: "Barcha odamlar erkin, qadr-qimmat va huquqlarda teng bo'lib tug'iladilar. Ular aql va vijdon sohibidirlar va bir-birlari ila birodarlarcha muomala qilishlari zarur." If there is any Turk out there who claims that he/she can understand this text, then OK, let's add the Uzbek language to the list as Turkish, but it is almost impossible to understand. There are only 2 or maybe 3 words that I catch, not more! Native speakers of Turkish may, as I did, check http://uz.wikipedia.org/wiki/Main_Page and try to read the articles. Can you? I admit that there are many similarities, but to be honest I couldn't read those articles.
(2.2) Turkmen_language is very close to Turkish and, by my account, it is reasonable to add it to our list. I did check Wikipedia's Turkmen edition and yes, I can read it.
(2.3) When it comes to the Kyrgyz and Kazakh languages, I really do not know much about them... Since they do not use Latin letters, it is impossible, at least for us, to read what they write.
(2.4) The Yakut people, 363,000 speakers (according to Wikipedia), have been forgotten. I partially understand when those people speak Yakut (Sakha language). I shall not claim that I understand it as well as I understand Azeri, but at least it is worth considering them as well. I posted a question on the Sakha language page. Let's see if we can communicate in Turkish.
(3) Turkish people all around the world must be taken into account as well, especially in Europe/Germany.
(4) Is it really an official language in Bulgaria? I don't think so!
(5) It is not Cyprus! It must at least be written as Northern Cyprus.
(6) Turkish is the first language in Turkey and Northern Cyprus.
(7) Briefly (numbers are taken from wikipedia);
Turkey --> 70M (-10M is accepted owing to Kurds)
Azerbaijan --> 22M (8M in Azerbaijan and 16M in Iran)
Turkmen --> 5.4M (It is 6.4M, -1M due to Turkmens in Turkey)
Germany --> 2M (We need to check that, I am not sure)
Bulgaria --> 0.7M (Hard to say if Turkish is their first language)
Northern Cyprus --> 0.2M
All over --> 100.3M (Without Kyrgyz and Kazakh)
Turkish as first language (let's drop Bulgaria and Kurds) = 89.6M janus_tr 05:00, 19 October 2005 (GMT+1)
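The two totals in the list above can be reproduced with a short Python sketch that re-adds only the quoted figures (in millions). Note that the "All over" total uses the gross 70M for Turkey, and the 10M Kurdish adjustment is only subtracted in the first-language figure, together with Bulgaria. The Azerbaijan line is kept at 22M as written, even though its breakdown (8M + 16M) would give 24M.

```python
# Figures in millions, as listed in the comment above.
KURDISH_ADJUSTMENT = 10.0  # subtracted only in the first-language figure

counts = {
    "Turkey": 70.0,                   # gross figure, before the Kurdish adjustment
    "Azerbaijan (incl. Iran)": 22.0,  # kept as written
    "Turkmen": 5.4,
    "Germany": 2.0,
    "Bulgaria": 0.7,
    "Northern Cyprus": 0.2,
}

all_over = sum(counts.values())
# "Let's drop Bulgaria and Kurds"
first_language = all_over - KURDISH_ADJUSTMENT - counts["Bulgaria"]

print(f"All over: {all_over:.1f}M")
print(f"Turkish as first language: {first_language:.1f}M")
```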
Good. After each message, at least, we make progress. [But please do not write about politics, or these pages get really creepy; it is the weak side of Wikipedia.] Anyway...
Kurdish dispute: We need a reference point; I checked the other wiki articles and found neither a source nor a citation... You say 20%, I say 30%, X says 60%... Today I checked Britannica, which states: "...The largest minority group is the Kurds, who probably make up at least 15 percent of the population." Kurds. (2005). Britannica Student Encyclopedia. Retrieved October 21, 2005, from Encyclopædia Britannica Online http://search.eb.com/ebi/article-9275335 Even Britannica uses the word "probably", and the reason behind it is very simple: there is no scientific work on that. I hope it happens one day, but at least, I think, we can use this. Therefore, (70M x 15) / 100 = 10.5M. What do you say?
Osmanli, Turkish, Anatolian Turkish dilemma: Officially it is called "Turkish", and the "dialect of Istanbul" (which I speak) is regarded as its core. Therefore, if you would like to say something other than Turkish, you may write Turkish (Istanbul). Anatolian Turkish is just a dialect. For instance, near the Black Sea region they speak the Karadeniz dialect, and on the west coast a different dialect, whereas "Osmanli" is misleading and wrong. Osmanli is outside my scope, because it looks like Azeri people could/may understand it. You will easily see why if you ever try to read Osmanli :-)) In Turkey, since 1932, the Türk Dil Kurumu (Turkish Language Association) has been regulating the language. According to this governmental association, the proper form is Istanbul Turkish ("Istanbulise"; I don't know how to spell it); in France they have the same system. Seen thus, "Anatolian Turkish" is deceptive as well. Let's write "Turkish", and since Istanbul Turkish is, officially, Turkish, we may put a note about that and lead readers to www.tdk.gov.tr.
Let's keep the Oguz population separate; I am OK with that.
Until we have a response from the Yakut people, let's set them aside.
No reliable knowledge about Bulgaria... What is it supposed to be, national or official language?! Strange.
If we list the whole Oghuz family, then we have to consider http://en.wikipedia.org/wiki/Tatar_language as well. I have a lot of difficulty figuring it out, but it seems closer than Uzbek. If you want, you can add all these Central Asian Turkic languages to the list.
My conclusion:
Total number = 112.40M : janus 05:43 (GMT+1), October 21, 2005
I will just give a quote from the Constitution of the Republic of Bulgaria (here):
Article 3
Bulgarian shall be the official language of the Republic.
I hope this ends all confusion about the matter. -- Mégara (Мегъра) - D. Mavrov 17:05, 24 April 2006 (UTC)
Thought the New Kypchak language article was interesting. Of course, it's nowhere near reality, but it does show what the conception of a broader standardized Turkic language could be: in this case, Kazakh, Kyrgyz, Tatar, etc, but not Anatolian Turkish or Uzbek. I think we're pretty safe in assuming that Turkic as a whole need not be considered as a language. kwami 10:46, 3 November 2005 (UTC)
I checked the article, as you mentioned "it's nowhere near reality". There'll be so many projects like this one; time shall show us which one(s) shall prevail.
janus 04:00, 5 November 2005 (UTC)
Someone just revised the Korean population upward to 71M, but left no ref. However, this looks about right: S Korea 48.4, N Korea 18-20 (officially 23, not considering the famine), China 1.9 (probably not counting recent refugees), USA 1.8, Japan 0.7, Canada and Australia together 0.1. (Few Russian Koreans still speak the language.) This gives us 70.9 million using the lower estimate for North Korea. Perhaps a million or so more wouldn't be unreasonable, but I don't know how the Wikipedia article gets 78. kwami 06:44, 2005 July 19 (UTC)
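The sum behind the 70.9 million figure can be written out as a small Python sketch. The figures are the ones quoted in the comment (in millions); the second total simply swaps in the upper 20M estimate for North Korea to show the range.

```python
# Korean speaker estimates in millions, as quoted in the comment above.
korean_speakers = {
    "South Korea": 48.4,
    "North Korea": 18.0,  # lower end of the 18-20M range
    "China": 1.9,
    "USA": 1.8,
    "Japan": 0.7,
    "Canada and Australia": 0.1,
}

low_total = sum(korean_speakers.values())
# Replace the lower North Korea estimate with the upper one (20M).
high_total = low_total - korean_speakers["North Korea"] + 20.0

print(f"Low estimate:  {low_total:.1f} million")
print(f"High estimate: {high_total:.1f} million")
```

Even the upper figure (72.9 million) stays well short of the 78 million given in the Wikipedia article.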
Why are there only 20 million native Thai speakers when the population is now 67 million? Although there are several dialects in Thailand, everyone, including old people and adolescents, can use standard Thai.
It is absurd to separate Bosnian and Serbian. Both the written and spoken languages are, as far as I am aware, virtually identical, and about half of the population of Bosnia are Serbs, who would be surprised to learn that they do not speak Serbian. I'd suggest that, given this confusion, we should merge Serbo-Croatian back together into a single language. john k 05:42, 2 August 2005 (UTC)
I just split up the table, both for ease of navigation and for ease of editing. As for the numbers I picked, it's a logarithmic scale: languages with 10^6 (1 million) speakers, 10^6.5 (~3 million) speakers, 10^7 (10 million) speakers, 10^7.5 (~30 million) speakers, and 10^8 (100 million) speakers or more. That way there are similar numbers of entries in each table, though the first is rather shorter and the last somewhat longer than the others. Anyway, that's why the 3 and 30 are there, in case anyone thinks they're odd numbers to use. kwami 08:55, 2005 August 2 (UTC)
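The half-decade boundaries can be computed directly. The Python sketch below is only an illustration (the helper `table_index` is hypothetical, not anything used on the article page): it derives the five cutoffs and maps a speaker count to a table with a binary search over them.

```python
from bisect import bisect_right

# Table boundaries at half-decade steps on a log10 scale:
# 10^6, 10^6.5, 10^7, 10^7.5, 10^8 speakers.
EXPONENTS = [6.0, 6.5, 7.0, 7.5, 8.0]
thresholds = [10 ** e for e in EXPONENTS]


def table_index(speakers: float) -> int:
    """Return the index of the table a language falls into.

    0 is the 1-3 million table and 4 is the 100 million+ table;
    -1 means the language is below the 1 million cutoff.
    """
    return bisect_right(thresholds, speakers) - 1


for t in thresholds:
    print(f"{t:,.0f}")
```

This prints 1,000,000, 3,162,278, 10,000,000, 31,622,777 and 100,000,000, which is where the rounded "3" and "30" come from. A language with 70.9 million speakers, for example, gets `table_index(70.9e6) == 3`, the 30-100 million table.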
There is no "Bajar" language listed in Ethnologue for Malaysia or Indonesia. There are too many speakers for it to be Bajaw, and Banjar should be included in Malay. Any ideas? If not, we should probably delete this. kwami 10:41, 2005 August 2 (UTC)
Why is Maithili split out, but the other Bihari languages (e.g. Bhojpuri) are included in Hindi? I would suggest that the Bihari and Rajasthani languages are perhaps distinct enough, and considered distinct enough, from Hindi to warrant not being included. This in contrast to, say, Awadhi, which is usually considered a dialect. john k 06:08, 4 August 2005 (UTC)
Would someone skilled in m:EasyTimeline be willing to make a chart of these? – Quadell ( talk) ( sleuth) 13:43, August 5, 2005 (UTC)
It says "Indo-European, Slavic, deposed and executed 1314"
What does that mean?
Should we lump all the Berber languages together, as we have with Karen, Chinese, etc.? Just a thought -- kwami
We seem to have come to a general consensus on most things here. I've also verified languages down to 2.3 million speakers with Ethnologue 15, and marked those that need further confirmation (because E does not give figures, etc.) (Basically, all those data with the word "million" in them have been confirmed this way.) So, what do people think: remove the warning, or replace it with a general warning that some data is dated, and that the definitions of many languages are fuzzy? kwami 08:20, 2005 August 9 (UTC)
Yes, that's true. I didn't mean edit war, so much as edit conflict - I was assuming that it was accidental. I'm beginning to think the Berber languages should be combined. john k 23:37, 11 August 2005 (UTC)
Yes, 1% of the population according to Ethnologue 15, presumably expats.
I've been removing 'significant communities' if the language is not native to the country and is less than 1% the population of the country per Ethnologue 15. So far I've covered America, Europe, Oceania. kwami 06:43, 19 September 2005 (UTC)
I haven't even attempted to keep track of all the 'significant communities in' entries that people have been adding. I think a large part of our problem is the title of this article. People read "List of languages by total speakers" and foolishly believe that it is a list of languages by total speakers.
Wanna move this to List of languages by native speaking population or List of languages by number of native speakers (which currently redirects here)? We should probably also rename List of languages by total native speakers and link it to this article as an example of the problems involved. kwami 07:11, 2005 August 13 (UTC)
This article has been renamed after the result of a move request. I have renamed this list of languages by number of native speakers as per the request. I did not do anything with the similarly named list of languages by total native speakers as it is not clear from the above what, if anything, you would want done. Dragons flight 23:18, August 22, 2005 (UTC)
Sylheti should be incorporated into Bengali, shouldn't it? john k 04:55, 17 August 2005 (UTC)
Is Farsi there? I searched for Iran and didn't find anything.
These three Chinese languages are the ONLY ones without numbers shown. Really, I don't think it matters how old the data is if you put a clear date. All of these languages have numbers on their own specific pages, and should be listed here as well. Frencheneesz 13:36, 2005 August 18 (whats UTC?)
I think the number (46 million) on the article page needs to be changed to 77 million to match the country's population, or perhaps a bit less, since there are a few minorities who speak other languages.
The sister article List of languages by total native speakers is a compilation of published lists, such as the CIA and Ethnologue, useful as a source of data. I've suggested renaming it to better reflect its contents on its talk page, but there's been no response. How do people here feel about renaming it, and what would be a good name? "Language population data" maybe? kwami 18:55, 2005 August 30 (UTC)
We currently say that a "significant" presence of a language in a country is 1% of the population. However, that is not what we actually have. There are many languages in India and China that would need to be taken off the list, because they aren't official and are spoken by less than 10-13 million people. It would also be weird to have a language only listed for Burma, when the main population is in China, because it makes up more than 1% of the Burmese population but less than 1% of the Chinese population. So obviously this 1% thing isn't going to work if we take it literally. Or is it only supposed to apply to immigrant languages?
Should we do that, and explicitly say 'immigrant languages'? Or do we want some other criterion? (Someone just added Urdu to the US, and there's no way there are 3 million Urdu speakers there.)
kwami 02:02, 2005 August 31 (UTC)
OK. 1% is a figure that makes a language important, but we also know that 1 million is an important number for a language to survive. So there you go: 1% or 1 million. -- Pedro 19:45, 2 September 2005 (UTC)
Where is Welsh on this Page?
Shouldn't Urdu and Hindi be listed together? Especially if Chinese is listed as one language. Sukh | ਸੁਖ | Talk 20:04, 2 September 2005 (UTC)
Urdu and Hindi are not separate languages and are only considered to be separate languages by rhetoricians who wish to distance Pakistanis from Indians. I would know, I speak them...both? The pronunciation is exactly the same. Each is perfectly understood by speakers of the other.
They are simply written in a different script, and recently religious and socially motivated individuals have sought to bring Sanskrit vocabulary to "Hindi" and Farsi vocabulary to "Urdu" in an attempt to create the impression that speakers are distinct ethnic groups. This is simply not true.
Surely there are enough native Swahili speakers to make this list.
In December 2004 the China Daily released an article describing a survey about how many Mandarin speakers there actually are in China. It turned out 18% spoke Mandarin at home, 42% spoke it at work or school, and 53% could speak it. Since 18% of China's population is about 235 million people, it follows that Mandarin should be placed behind Hindi, English, and Spanish.
We have an edit war going on here, which I think we should resolve here on this discussion page rather than simply reverting each other.
This article is based primarily on Ethnologue 15. Now, Ethnologue is hardly the most reliable resource, and I think we're all aware that we can do better, but at least it provides a modicum of stability to a contentious field. We have changed several languages from what Ethnologue has published, but these changes have been discussed here so that there is basic agreement to the changes.
According to Ethnologue, there are 4.5 million ethnic Assyrians. However, most of these people speak Persian or Arabic as their native/home language. Some may speak some Assyrian as a second language, but that's not what this article is about.
Again, according to Ethnologue, Assyrian has 210,000 native/home speakers out of this ethnic population of 4.5 million. (That is, about 4% of the ethnic population.) There is a similar number of Chaldean speakers, and lesser numbers of other idioms such as Turoyo. If these people consider that they all speak the same language, then they should all be lumped together. However, all the Aramaic languages/dialects total only ~534,000. Since the cutoff point for this article is 1 million, Aramaic/Assyrian/Chaldean/Syriac does not make the cut even if it is considered a single language.
Assyria 90, if you have evidence that the number of speakers is greater than Ethnologue reports, please present that information here. However, I personally will consider any unsubstantiated attempt to list 4.5 million speakers of Assyrian as politically motivated, and will revert it: Mark isn't the only one doing this! If you wish to convince people of your claims, please provide something more than your personal say-so. We can't take seriously the claim that you personally know all 4.5 million Assyrians and have verified their native language. kwami 23:43, 14 September 2005 (UTC)
On my screen, the rightmost column for Chinese spans a page and a half (!). At the same time, we have a second column for the language family, which is just as wide. I suggest removing that column. If I were looking for statistics on numbers of speakers, most likely I wouldn't give a buck about the family of the language. And if I did, I would just click the link for the language and see. Currently the family column takes up important space. If we remove it, the list will become about half as long (at 1024x768). Does anybody disagree? (I'm User:logixoul; I'm having problems with my cookies at the moment) -- 85.130.99.211 20:26, 19 September 2005 (UTC)
Recalculated Portuguese. The main discrepancy is that no one has given a source for the claim that 60% of Angolans are native Portuguese speakers. Since 99.5% of Angolans speak some other language as their native tongue, I find the figure doubtful. Here's my calc: Angola 52k, Cabo Verde 15k, Mozambique 30k, Sao Tome 2.5k, South Africa 617k, India 250k, Macao 2k, Paraguay 636k, Luxembourg 100k, France 750k, Switzerland 86k, Andorra 2k. That's just under 2.6 million, which just offsets the 2.6 million Brazilians who do not speak Portuguese as their native tongue. So, Brazil 186.1M, no adjustment; Portugal 10.6M, less 0.5M non-native, and you get 196M. This agrees with the WA 2005 figure of 195M (which is rounded off to the nearest 5M).
Bengali also is listed with 196M. However, the bulk of that figure is now ten years old. Given that Bangladesh has a high growth rate, it should be well over 196M by now, and therefore I've ranked Bengali ahead of Portuguese. kwami 09:33, 20 September 2005 (UTC)
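The Portuguese tally above can be re-checked with a few lines of Python. This is only a sketch of the arithmetic: the per-country figures, the 0.5M deduction for Portugal, and the 2.6M non-native Brazilians are taken as quoted in the comment and are not independently verified; the variable names are illustrative.

```python
# Native Portuguese speakers outside Brazil and Portugal, in thousands,
# as quoted in the comment above (not independently verified).
outside_brazil_portugal_k = {
    "Angola": 52, "Cabo Verde": 15, "Mozambique": 30, "Sao Tome": 2.5,
    "South Africa": 617, "India": 250, "Macao": 2, "Paraguay": 636,
    "Luxembourg": 100, "France": 750, "Switzerland": 86, "Andorra": 2,
}

other_total_m = sum(outside_brazil_portugal_k.values()) / 1000  # to millions

brazil_m = 186.1            # no adjustment
portugal_m = 10.6 - 0.5     # less non-native speakers
brazil_non_native_m = 2.6   # Brazilians whose native tongue is not Portuguese

# The rest of the world roughly offsets the non-native Brazilians.
total_m = brazil_m + portugal_m + other_total_m - brazil_non_native_m

print(f"Outside Brazil and Portugal: {other_total_m:.2f}M")
print(f"Estimated native speakers: {total_m:.1f}M")
```

The sum outside Brazil and Portugal comes to about 2.54M, and the overall estimate rounds to 196M, so the arithmetic in the comment holds; the script says nothing about whether the underlying figures are right.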
In 1983: native Portuguese speakers: 60% (of the total population); native Portuguese speakers also capable of speaking an African language: 50% (of those). That gives a maximum of 70% speaking African languages (the 40% who are not native Portuguese speakers plus half of the 60% who are). In the capital of Angola, Luanda, 75% are native speakers of Portuguese.
You really don't know the country if you believe that 99.5% of Angolans speak an African language. So check before reverting anything, or before saying that you don't know of any source. You asked me once about the data, but then simply ignored my answer.
Here's a link: http://www.linguaportuguesa.ufrn.br/pt_3.4.a.php - Pedro 23:04, 26 September 2005 (UTC)
I'm having a problem connecting to INE; the site is http://www.ine.gov.mz/ (maybe it is down). You can follow the link to the 1997 census ("censo de 1997") and search for "língua" (language), or something similar...
Embora means several things; in this case it means "meanwhile". "Moradores" (plural) / "morador" (singular) is a person who lives in a given place, in this case Angola. The most common word for a country's population is in fact "habitantes" (inhabitants). -- Pedro 20:36, 27 September 2005 (UTC)
Okay, verified all languages for all countries, per Ethnologue 15, and took out anything less than 1%, with a few exceptions: languages native to one country are listed there regardless; languages slightly under 1% in a second country may be listed if that number is a third or more of the total speakers; and a couple of judgment calls, like leaving Korean in Japan, since Koreans are the most significant minority in mainland Japan even if somewhat under 1%, and Portuguese in Namibia, even though I don't have figures, because there are significant numbers in both Angola and South Africa. I suggest reverting any attempts to add additional countries (like Mongolia for Russian, or the US for Panjabi) unless these additions are supported by sources.
However, sometimes Ethnologue just states that an immigrant language is found in a country without giving figures. Usually this means the numbers are small, but not always, so I may have deleted a few countries I shouldn't have. In other cases, as with Filipino emigrants and Ivory Coast immigrants, numbers are given for nationality but not language. In a couple of these cases I left a country in with a question mark. kwami 13:19, 22 September 2005 (UTC)
Should the Akan languages be unified? Currently Baoule, Anyi, and Brong have separate entries. kwami
We have two contrary tendencies here: distinguishing languages genealogically, and distinguishing them culturally. This is going to continue to create conflicts until we decide on one or the other, or create two lists. When I first saw this article, Chinese was listed as a single language, but there were half a dozen Italian "dialects" listed as separate languages. That's just silly: Italian has about the diversity of Cantonese. We should go one way or the other. I've tried to make the list somewhat more consistent, but of course haven't been able to do everything. If you don't like the direction I've gone, fine: do something better. But let's at least make it internally consistent. kwami 20:02, 2005 July 9 (UTC)
A couple of points: 1) the Italian dialects are, according to Ethnologue, not even all that closely related to each other. Calling Sardinian an Italian dialect would appear to be technically inaccurate. And, according to Ethnologue, Piemontese, Lombardese, and so forth are closer to French and the Langue d'Oc than they are to Italian. As for general standards, I'd suggest that the presumption should be that languages listed as separate languages by Ethnologue should be treated as separate languages. However, if languages listed by Ethnologue as separate languages are often considered to be the same language, especially for political reasons, we should unify them. Thus, Eastern and Western Farsi, or Gheg and Tosk Albanian, should get unified despite being separate languages on Ethnologue. Probably this goes for Arabic as well, if only because differentiating the dialects is so difficult. This should perhaps also be done for some of the Hindi dialects like Awadhi or Haryanvi (but probably not Punjabi or the Bihar dialects). I'm not sure what should be done about Hindi and Urdu, but they should probably be separated out again, as well. Czech and Slovak should definitely be separated, because the languages are listed as separate on Ethnologue, and are not normally considered to be actually the same language. john k 20:16, 9 July 2005 (UTC)
Dear kwami,
I'm glad you're interested in foreign languages, including Slavic languages. It seems we share the same passion :-) It's nice to meet you.
But I need to say something less pleasant: in my opinion you are somewhat misinformed about Czech and Slovak.
Czech and Slovak share most of their vocabulary, but there are some significant differences between them in phonetics and phonology. Slovak grammar is almost identical to Polish grammar, while Czech grammar is quite different (partly due to peculiar phonetic shifts in Czech).
As regards mutual intelligibility, I think you are overlooking an important fact: it depends not only on the similarity of languages, but also on exposure to the other language. I'll give you an example. I'm a Pole and I live in Poland. I've been in Slovakia many times since I was a kid. For 10 years I've been living 2 kilometres from the Slovak border. I often go to Slovakia to have lunch. As a result, I can understand spoken Slovak, even though I never learned to speak it. I didn't have to learn to speak it, because the Slovaks I talked with could understand Polish. That's because Polish and Slovak are closely related, and if one of them is your native language, it's enough to figure out the main phonetic differences and learn a few different words to understand the other language. On the other hand, I can't understand Czech, because I rarely hear it. So what you need is just some contact with the other language, and it will naturally become intelligible to you. That's what happened in Czechoslovakia. Czechs had a lot of contact with the Slovak language and Slovaks had a lot of contact with the Czech language. For example, both languages alternated on the same TV channel. That's why Czechs and Slovaks can easily understand each other. Slovaks who didn't have the opportunity to hear Czech often, for example because they lived outside of Czechoslovakia or were born in the 90s, when there was no longer a common state, find Czech difficult to understand. Of course, they can understand many words, because the two languages are very similar. But they have problems understanding spoken Czech on TV, etc.
The dialectal differentiation is also relevant to mutual intelligibility. Western Slovak dialects are very close to Czech, so they are mutually intelligible with Czech and Moravian dialects. Central Slovak dialects are less similar to Czech and they share many common features with Polish and Slovene (but still Czech is the closest language). Eastern Slovak dialects are more similar to Polish than to literary Czech. The standard Slovak language is primarily based on the central dialects.
You wrote: I don't speak much, but I picked up a little Slovak while in that country. When I went to Czechia, all of a sudden I was speaking Czech. The words coming out of my mouth hadn't changed, only the country I was saying them in had.
Similarly, you could write: I don't speak much, but I picked up a little Spanish while in that country. When I went to Italy, all of a sudden I was speaking Italian. The words coming out of my mouth hadn't changed, only the country I was saying them in had. Spanish and Italian are similar languages. If you use Spanish in Italy, you'll probably be understood (all the more so if the matter is simple and the subject of the message can be guessed from context). Your interlocutors may even think you are awkwardly trying to speak Italian. Of course, they won't say your Italian is awkward; they'll probably be much more polite. Actually, I had that experience 7 days ago when I was going by train to Bratislava. I had a conversation with a Slovak girl. I spoke an awful mixture of Slovene and Polish with some Slovak words thrown in. I explained that I couldn't really speak Slovak, but I could understand it. She replied, "but you speak Slovak very well"! I said, "oh, no..." but she insisted, "you really speak very well." I said "I can't believe it..." and then she said to another passenger, "he speaks very well, doesn't he?" and the other passenger confirmed. I felt a bit embarrassed and I didn't explain that I hadn't even spoken Slovak :-)
I'm astonished that some people you talked with claimed that Czech and Slovak are dialects of the same language. I never met such people. 160 years ago this opinion could be justified, but now that there are two independent standard languages, for a long time officially recognized as separate, it seems anachronistic. Anyway, even if some people share this view (I don't think they are many), I'm sure no linguist claims that Czech and Slovak are the same language. And this is important, because I think Wikipedia is meant to be a source of scientific information.
To sum up, Czech and Slovak are very closely related (they are the closest relatives), but they are definitely separate languages. This is quite clear to linguists, even if the question how to distinguish between separate languages and different dialects of the same language was not satisfactorily answered by linguistics (the related discussion is interestingly presented by Einar Haugen in his article "Dialect, Language, Nation", American Anthropologist, vol. 68 (1966), pp. 922-935; reprinted in: "Sociolinguistics", ed. by J. B. Pride and J. Holmes, Penguin, 1972).
Kwami, I wish you much success in learning foreign languages.
Best regards. Boraczek 23:46, 12 August 2005 (UTC)
So are we ranking by native speakers or total speakers? If it's by total, why does it say otherwise in the first sentence of the page? If it is by native speakers, then what is this page: List of the most spoken native languages? Please clarify. -- Zereshk 8 July 2005 22:32 (UTC)
If it is by total, then we should start re-ranking the page accordingly. Right now, it's ranked by native speakers. I'll be working on that. -- Zereshk 9 July 2005 12:02 (UTC)
I'm fine one way or the other. But I don't think getting estimates of second-language speakers would be too difficult. The list can follow any predefined definition. In any case, all we must do is define what we want to be listed here, and stick to it. -- Zereshk 9 July 2005 18:39 (UTC)
I tend to think Chinese and Czech/Slovak should both be broken up. I'd suggest that only in cases like Arabic, where it's not only a complicated question of whether it's one or several languages, but also hard to figure out how exactly to divide it up into separate languages, that we should keep them together. I'd also suggest just redirecting the other page to here, and explaining that this page is listing native speakers. In terms of 2nd language speakers, one problem is how to define "2nd language" - is it anybody who has any knowledge of the language at all? Or is it more specific than that? Is a Dane who speaks some English because he studied it in school the same as a Yoruba who speaks English as a second language and has to use it in everyday communication? Better to avoid the whole question, I think. john k 9 July 2005 19:01 (UTC)
I've added a column for language families. So far, I've mostly only had a chance to add the broadest families (Indo-European, Austronesian, &c.), but hopefully we can add the more specific branches over the next few days. I think this should be useful, especially for the less well-known languages which we don't have specific articles about. As far as I can tell, 18 language families (Indo-European, Uralic, Altaic, Afro-Asiatic, Niger-Congo, Nilo-Saharan, Sino-Tibetan, Dravidian, Tai-Kadai, Hmong-Mien, Austro-Asiatic, Austronesian, Japonic, Quechuan, Aymaran, Uto-Aztecan, Mayan, and Tupian) and 1 language isolate (Korean) are represented among the languages with more than one million speakers. john k 9 July 2005 01:30 (UTC)
The language called "Persian" is known internationally and domestically within Iran as "Farsi."
The CIA World Factbook has a list of the most widely used native languages:
Note: percentages are for "first language" speakers only. See reference.
Internet World Stats also has a list of the number of people able to speak each language, including second-language speakers (though it is ordered by number of Internet users).
The English numbers seem a bit high compared with other reports, but they claim to have accurate data. :? -- Bisho 15:33, 14 July 2005 (UTC)
Okay, despite all the talk, no one's fixed this article. I'm reverting to the last version to define language by speaker identification (keeping Akira's edits), since most people feel that intelligibility tests are unworkable. I'm the one who added the Chinese "dialects" in the first place, and I don't have any problem removing my own additions to this article.
By all means, please add the individual Chinese "dialects" back in if you like, but keep the main heading and make a note under it. I might do that myself. No need to go to a lot of work; the info is all in the page history from when I added it the first time. And put Malay, Czechoslovak, and Serbocroatian back in if you like, as additional info; it's all there in the page history.
If our conception of language is to be cultural or self identification, then we shouldn't mix in intelligibility tests, unless it's added as additional information, and cross referenced. We need some sort of consistency in an encyclopedia article, not just whatever feels right for everyone's favorite language. kwami 22:40, 2005 July 14 (UTC)
I say group Chinese and Arabic (except Maltese and other separate Arabic languages), adding a linguistics note in these cases; group Swiss German with the rest of German; split Malay/Indonesian, Hindi/Urdu, etc. - Pedro 23:37, 14 July 2005 (UTC)
Well, I have an idea... why don't we group each disputed language under its most commonly used name, e.g. Chinese for the "Chinese" languages and German for the languages spoken in Germany, Austria, and most of Switzerland, and then provide a subdivision under which one can see the number of speakers per "dialect"/sub-language? That would make navigation a breeze too! Kenkoo1987 12:37, 6 August 2005 (UTC)
The Turkish language has many more total speakers than this page states. Azeri, Kyrgyz, Kazakh, Uzbek, Turkmen, and other Turkic languages differ from Turkish only at the level of dialect, and they are referred to as Turkish, e.g. Azeri Turkish or Kyrgyz Turkish instead of Azeri or Kyrgyz. So the total number of Turkish speakers is 165.61 million according to this page, and in fact it is almost 250 million including the Turks living as minorities all around the world and in autonomous Turkic regions, especially in the Russian Federation and China.
The Ethnologue figure of 46.28M native speakers in Turkey is just the 1987 population times 85%, based on a guess that 15% of the population is Kurdish. Actually, that figure is now generally considered an underestimate, and the Kurdish population could be as high as 25-30%. However, the Kurdish article and most of the estimates I've seen place it at approximately 20%. Given that Turkey's population has increased dramatically to 69.66M, that would be 56 million native speakers today. The population of Bulgaria has decreased dramatically to 7.45M; at 9.4% ethnic Turks (mostly Turkish-speaking), that's a further 700k. Greece: emigration offsets population growth, so perhaps still ~130k. Cyprus: the N. Cyprus population is now 210k. Macedonia: 1982 figure 200k; don't know about now. Uzbekistan: population increased dramatically from 1979. At current growth rates, calculating back from 1993 (from the demographics chart for Uzbekistan), the 1979 population was ~17.2M. Assuming the same percentage today, that gives ~300k Osmanli speakers. Germany: 2.1M. Netherlands: 200k. France: 140k. All other countries are, I believe, < 100k, though Moldova has 140k Gagauz speakers. Total: 60.0 million, plus Gagauz. In any case, our major uncertainty is the percentage in Turkey: if the Kurds are just a bit more numerous in Turkey, that would offset Turkish-speaking immigrants in the rest of the world. We could maybe guess it's 61 million? I'm putting in 60M native, and assuming second-language speakers are basically the 14M Kurdish population, ~75M total. kwami 05:39, 29 September 2005 (UTC)
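Since the Kurdish share in Turkey is named as the main uncertainty, a small sketch can show how the total moves with it. The country figures are the ones quoted in the comment (not verified), Bulgaria assumes all 9.4% ethnic Turks speak Turkish natively, and the function name `native_total` is purely illustrative.

```python
# Figures in millions, as quoted in the comment above (estimates, not verified).
TURKEY_POPULATION_M = 69.66
BULGARIA_M = 7.45 * 0.094  # 9.4% ethnic Turks, assumed mostly Turkish-speaking
OTHERS_M = {
    "Greece": 0.13, "Northern Cyprus": 0.21, "Macedonia": 0.20,
    "Uzbekistan": 0.30, "Germany": 2.1, "Netherlands": 0.20, "France": 0.14,
}
GAGAUZ_M = 0.14  # Moldova, kept separate ("plus Gagauz")


def native_total(kurdish_share: float) -> float:
    """Turkish native speakers in millions, excluding Gagauz.

    Turkey contributes its population minus the assumed Kurdish share;
    Bulgaria and the other countries are added as fixed figures.
    """
    turkey_m = TURKEY_POPULATION_M * (1 - kurdish_share)
    return turkey_m + BULGARIA_M + sum(OTHERS_M.values())


# Vary the main uncertainty: the Kurdish share of Turkey's population.
for share in (0.15, 0.20, 0.25, 0.30):
    total = native_total(share)
    print(f"Kurdish share {share:.0%}: {total:.1f}M native, "
          f"{total + GAGAUZ_M:.1f}M with Gagauz")
```

At a 20% share the unrounded total is about 59.7M; the comment's 60.0M comes from rounding Turkey's 55.7M up to 56M. Each five-point change in the Kurdish share moves the total by roughly 3.5M, which dwarfs every other country in the list except Germany.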
Hi, I read what you have written, and I have a few things to say:
(0) The Kurdish issue is not as easy to figure out as kwami put it. Since the Kurds have never had a country, all these numbers are ambiguous. Even if the numbers are correct, they do not tell us the number of native Kurdish speakers. Many Kurdish people in Turkey do not know a word of Kurdish, so it is not easy to claim that Kurdish is their native language. Since there is no institution in Turkey that gives Kurdish lessons, it is not very plausible that Kurdish people, at least the ones in Turkey, speak Kurdish properly. That is to say, the article starts "This is a list of languages ordered by number of first-language speakers", and Turkish is the native/first language of many Kurds in Turkey, sad but true! Therefore, population analysis is pointless... What we need is qualitative analysis instead of quantitative analysis.
(1) Osmanli is not Turkish. Osmanli is a dead language, a strange combination of Turkish, Arabic, and Persian. It cannot be considered Turkish. I can supply you with documents. As a native speaker of Turkish, it is almost impossible for me/us to read a text in Ottoman. Remove the word Osmanli!
(2) I am reluctant to consider the Kyrgyz, Kazakh, and Uzbek languages as Turkish. I can understand the Azeri and Turkmen languages. The others are certainly Turkic languages, but not Turkish! For instance, there are great similarities between Spanish and Portuguese: native speakers of Portuguese and Spanish are close enough to understand each other when they speak, but Portuguese is not Spanish and vice versa. But I do not know if we can take that as a criterion, because Danes also understand what Swedes say :-)))
(2.1) Uzbek: here is an example that I obtained from the Uzbek_language page: "Barcha odamlar erkin, qadr-qimmat va huquqlarda teng bo'lib tug'iladilar. Ular aql va vijdon sohibidirlar va bir-birlari ila birodarlarcha muomala qilishlari zarur." If there is any Turk out there who claims that he/she can understand this text, then OK, let's add the Uzbek language to the list as Turkish, but it is almost impossible to understand. There are only 2 or maybe 3 words that I catch, not more! Native speakers of Turkish may, like I did, check http://uz.wikipedia.org/wiki/Main_Page and try to read the articles. Can you? I admit that there are many similarities, but to be honest I couldn't read those articles.
(2.2) The Turkmen_language is very close to Turkish, and by my account it is convenient to add it to our list. I did check Wikipedia's Turkmen edition, and yes, I can read it.
(2.3) When it comes to the Kyrgyz and Kazakh languages, I really do not know much about them... Since they do not use Latin letters, it is impossible, at least for us, to read what they write.
(2.4) The Yakut people, 363,000 speakers (according to Wikipedia), have been forgotten. I partially understand those people when they speak Yakut (the Sakha language). I shall not claim that I understand it as well as I understand Azeri, but it is at least worth considering them as well. I posted a question to the Sakha language page. Let's see if we can communicate in Turkish.
(3) Turkish people all around the world must be taken into account as well, especially in Europe/Germany.
(4) Is it really an official language in Bulgaria? I don't think so!
(5) It is not Cyprus! It must at least be written as Northern Cyprus.
(6) Turkish is the first language in Turkey and Northern Cyprus.
(7) Briefly (numbers are taken from Wikipedia):
Turkey --> 70M (-10M is accepted owing to Kurds)
Azerbaijan --> 22M (8M in Azerbaijan and 16M in Iran)
Turkmen --> 5.4M (It is 6.4M, -1M due to Turkmens in Turkey)
Germany --> 2M (We need to check that, I am not sure)
Bulgaria --> 0.7M (Hard to say if Turkish is their first language)
Northern Cyprus --> 0.2M
All over --> 100.3M (without Kyrgyz and Kazakh)
Turkish as first language (let's drop Bulgaria and Kurds) = 89.6M janus_tr 05:00, 19 October 2005 (GMT+1)
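The two totals in that list can be reproduced with a short sketch. It uses the figures exactly as listed; note that the "All over" total counts Turkey at 70M (before the Kurdish deduction, which is applied only to the first-language figure), and that the Azerbaijan row is kept at 22M as written even though its stated breakdown (8M + 16M) adds up to 24M. Variable names are illustrative.

```python
# Figures in millions, exactly as listed in the comment above.
figures_m = {
    "Turkey": 70.0,       # before deducting Kurds
    "Azerbaijan": 22.0,   # as written; the 8M + 16M breakdown would give 24M
    "Turkmen": 5.4,
    "Germany": 2.0,
    "Bulgaria": 0.7,
    "Northern Cyprus": 0.2,
}
kurds_in_turkey_m = 10.0

# "All over": simple sum of every row.
all_over_m = sum(figures_m.values())

# First-language figure: drop Bulgaria and the Kurds in Turkey.
first_language_m = all_over_m - figures_m["Bulgaria"] - kurds_in_turkey_m

print(f"All over: {all_over_m:.1f}M")
print(f"As first language: {first_language_m:.1f}M")
```

This gives 100.3M and 89.6M, matching the list; using the 24M breakdown for Azerbaijan instead would raise both totals by 2M.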
Good. After each message, at least, we make progress. [But please do not write about politics; it makes these pages really creepy, and it is the weak side of Wikipedia.] Anyway...
Kurdish dispute: We need a reference point. I checked the other wiki articles: neither a source nor a citation... You say 20%, I say 30%, X says 60%... Today I checked Britannica, where it states: "...The largest minority group is the Kurds, who probably make up at least 15 percent of the population." Kurds. (2005). Britannica Student Encyclopedia. Retrieved October 21, 2005, from Encyclopædia Britannica Online http://search.eb.com/ebi/article-9275335 Even Britannica uses the word "probably", and the reason is very simple: there is no scientific work on that. I hope it happens one day, but for now I think we can use it. Therefore, 70M × 15% = 10.5M. What do you say?
Osmanli, Turkish, Anatolian Turkish dilemma: Officially it is called "Turkish", and the "dialect of Istanbul" (which I speak) is regarded as its core. Therefore, if you would like to say something different from Turkish, you may write Turkish (Istanbul). Anatolian Turkish is just a dialect; for instance, near the Black Sea region they speak the Karadeniz dialect, and on the west coast a different dialect, whereas "Osmanli" is misleading and wrong. Osmanli is out of my scope, because it looks like Azeri people could/may understand it. You will easily see this if you ever try to read Osmanli :-)) In Turkey, since 1932 the Türk Dil Kurumu (Turkish Language Association) has been regulating the language. According to this government body, the proper form is Istanbul Turkish ("Istanbulise", I don't know how to spell it); France has the same system. Seen thus, "Anatolian Turkish" is deceptive as well. Let's write "Turkish", and since Istanbul Turkish is officially Turkish, we may put a note about that and lead readers to www.tdk.gov.tr.
Let's keep the Oghuz population separate; I am OK with that.
Until we have a response from Yakut speakers, let's set them aside.
No reliable knowledge about Bulgaria... What is that supposed to be, a national or an official language?! Strange.
If we list the whole Oghuz family, then we have to consider http://en.wikipedia.org/wiki/Tatar_language as well. I have a lot of difficulty understanding it, but it seems closer than Uzbek. If you want, you can add all these Central Asian Turkic languages to the list.
My conclusion:
Total number = 112.40M : janus 05:43 (GMT+1), October 21, 2005
I will just give a quote from the Constitution of the Republic of Bulgaria (here):
Article 3
Bulgarian shall be the official language of the Republic.
I hope this ends all confusion about the matter. -- Mégara (Мегъра) - D. Mavrov 17:05, 24 April 2006 (UTC)
Thought the New Kypchak language article was interesting. Of course, it's nowhere near reality, but it does show what the conception of a broader standardized Turkic language could be: in this case, Kazakh, Kyrgyz, Tatar, etc, but not Anatolian Turkish or Uzbek. I think we're pretty safe in assuming that Turkic as a whole need not be considered as a language. kwami 10:46, 3 November 2005 (UTC)
I checked the article; as you mentioned, "it's nowhere near reality". There will be many projects like this one; time will show which one(s) prevail. janus 04:00, 5 November 2005 (UTC)
Someone just revised the Korean population upward to 71M, but left no ref. However, this looks about right: S Korea 48.4, N Korea 18-20 (officially 23, not considering the famine), China 1.9 (probably not counting recent refugees), USA 1.8, Japan 0.7, Canada and Australia together 0.1. (Few Russian Koreans still speak the language.) This gives us 70.9 million using the lower estimate for North Korea. Perhaps a million or so more wouldn't be unreasonable, but I don't know how the Wikipedia article gets 78. kwami 06:44, 2005 July 19 (UTC)
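The Korean tally above is a straightforward sum; the sketch below reproduces it with both ends of the 18-20M range for North Korea. The figures are those quoted in the comment and are not independently verified.

```python
# Korean speakers in millions, as quoted in the comment above.
korean_speakers_m = {
    "South Korea": 48.4,
    "North Korea": 18.0,  # lower end of the 18-20M estimate
    "China": 1.9,
    "USA": 1.8,
    "Japan": 0.7,
    "Canada and Australia": 0.1,
}

lower_m = sum(korean_speakers_m.values())
# Same tally with the upper 20M estimate for North Korea.
upper_m = lower_m - korean_speakers_m["North Korea"] + 20.0

print(f"Lower estimate: {lower_m:.1f}M")
print(f"Upper estimate: {upper_m:.1f}M")
```

The range is 70.9M to 72.9M, so even the upper end stays well below the 78M mentioned in the comment.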
Why are there only 20M native Thai speakers listed when the population is 67M right now? Although there are several dialects in Thailand, everyone, including old people and adolescents, can use standard Thai.
It is absurd to separate Bosnian and Serbian. Both the written and spoken languages are, as far as I am aware, virtually identical, and about half of the population of Bosnia are Serbs, who would be surprised to learn that they do not speak Serbian. I'd suggest that, given this confusion, we should merge Serbo-Croatian back together into a single language. john k 05:42, 2 August 2005 (UTC)
I just split up the table both for ease of navigation and for ease of editing. As for the numbers I picked, it's a logarithmic scale: languages with 10^6 (1 million) speakers, 10^6.5 (~3 million) speakers, 10^7 (10 million) speakers, 10^7.5 (~30 million) speakers, and 10^8 (100 million) speakers or more. That way there are similar numbers of entries in each table, though the first is rather shorter and the last somewhat longer than the others. Anyway, that's why the 3 and 30 are there, in case anyone thinks they're odd numbers to use. kwami 08:55, 2005 August 2 (UTC)
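A minimal sketch of how a speaker count would be assigned to one of the five tables on that half-decade logarithmic scale. The function name `table_for` and the table labels are hypothetical; the lower bounds are 10^6, 10^6.5, 10^7, 10^7.5, and 10^8 as described, and the "3" and "30" in the labels are rounded from 10^6.5 ≈ 3.16 million and 10^7.5 ≈ 31.6 million.

```python
import bisect
from typing import Optional

# Exponents of the lower bounds of the five tables.
EXPONENTS = [6.0, 6.5, 7.0, 7.5, 8.0]
THRESHOLDS = [10 ** e for e in EXPONENTS]  # 1M, ~3.16M, 10M, ~31.6M, 100M
# Hypothetical labels; "3" and "30" are rounded from 10^6.5 and 10^7.5.
LABELS = ["1-3 million", "3-10 million", "10-30 million",
          "30-100 million", "100 million or more"]


def table_for(speakers: int) -> Optional[str]:
    """Return the label of the table a language falls into, or None below 1 million."""
    # Index of the highest threshold that the count reaches.
    i = bisect.bisect_right(THRESHOLDS, speakers) - 1
    return LABELS[i] if i >= 0 else None


for bound in THRESHOLDS:
    print(f"{bound:,.0f}")
for count in (800_000, 2_000_000, 5_000_000, 40_000_000, 196_000_000):
    print(count, table_for(count))
```

Comparing against the precomputed thresholds rather than taking `math.log10` of each count avoids floating-point surprises exactly at a boundary such as 10,000,000.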
There is no "Bajar" language listed in Ethnologue for Malaysia or Indonesia. There are too many speakers for it to be Bajaw, and Banjar should be included in Malay. Any ideas? If not, we should probably delete this. kwami 10:41, 2005 August 2 (UTC)
Why is Maithili split out, but the other Bihari languages (e.g. Bhojpuri) are included in Hindi? I would suggest that the Bihari and Rajasthani languages are perhaps distinct enough, and considered distinct enough, from Hindi to warrant not being included. This in contrast to, say, Awadhi, which is usually considered a dialect. john k 06:08, 4 August 2005 (UTC)
Would someone skilled in m:EasyTimeline be willing to make a chart of these? – Quadell ( talk) ( sleuth) 13:43, August 5, 2005 (UTC)
It says "Indo-European, Slavic, deposed and executed 1314"
What does that mean?
Should we lump all the Berber languages together, as we have with Karen, Chinese, etc.? Just a thought -- kwami
We seem to have come to a general consensus on most things here. I've also verified languages down to 2.3 million speakers with Ethnologue 15, and marked those that need further confirmation (because E does not give figures, etc.) (Basically, all those data with the word "million" in them have been confirmed this way.) So, what do people think: remove the warning, or replace it with a general warning that some data is dated, and that the definitions of many languages are fuzzy? kwami 08:20, 2005 August 9 (UTC)
Yes, that's true. I didn't mean edit war, so much as edit conflict - I was assuming that it was accidental. I'm beginning to think the Berber languages should be combined. john k 23:37, 11 August 2005 (UTC)
Yes, 1% of the population according to Ethnologue 15, presumably expats.
I've been removing 'significant communities' if the language is not native to the country and is spoken by less than 1% of the population of the country per Ethnologue 15. So far I've covered America, Europe, Oceania. kwami 06:43, 19 September 2005 (UTC)
I haven't even attempted to keep track of all the 'significant communities in' entries that people have been adding. I think a large part of our problem is the title of this article. People read "List of languages by total speakers" and foolishly believe that it is a list of languages by total speakers.
Wanna move this to List of languages by native speaking population or List of languages by number of native speakers (which currently redirects here)? We should probably also rename List of languages by total native speakers and link it to this article as an example of the problems involved. kwami 07:11, 2005 August 13 (UTC)
This article has been renamed after the result of a move request. I have renamed this list of languages by number of native speakers as per the request. I did not do anything with the similarly named list of languages by total native speakers as it is not clear from the above what, if anything, you would want done. Dragons flight 23:18, August 22, 2005 (UTC)
Sylheti should be incorporated into Bengali, shouldn't it? john k 04:55, 17 August 2005 (UTC)
Is Farsi there? I searched for Iran and didn't find anything.
These three Chinese languages are the ONLY ones without numbers shown. Really, I don't think it matters how out of date the data is if you give a clear date. All of these languages have numbers on their own specific pages, and should be listed here as well. Frencheneesz 13:36, 2005 August 18 (whats UTC?)
I think the number (46 million) in the article page needs to be edited to 77 million to match the country's population, or perhaps a little less, since some minorities speak other languages.
The sister article List of languages by total native speakers is a compilation of published lists, such as the CIA and Ethnologue, useful as a source of data. I've suggested renaming it to better reflect its contents on its talk page, but there's been no response. How do people here feel about renaming it, and what would be a good name? "Language population data" maybe? kwami 18:55, 2005 August 30 (UTC)
We currently say that a "significant" presence of a language in a country is 1% of the population. However, that is not what we actually have. There are many languages in India and China that would need to be taken off the list, because they aren't official and are spoken by fewer than 10 to 13 million people (1% of those countries' populations). It would also be weird to have a language listed only for Burma when the main population is in China, just because it makes up more than 1% of the Burmese population but less than 1% of the Chinese population. So obviously this 1% rule isn't going to work if we take it literally. Or is it only supposed to apply to immigrant languages?
Should we do that, and explicitly say 'immigrant languages'? Or do we want some other criterion? (Someone just added Urdu to the US, and there's no way there are 3 million Urdu speakers there.)
kwami 02:02, 2005 August 31 (UTC)
OK. 1% is a figure that makes a language important, but we also know that 1 million is an important number for a language to survive. So there you go: 1% or 1 million. -- Pedro 19:45, 2 September 2005 (UTC)
Where is Welsh on this Page?
Shouldn't Urdu and Hindi be listed together? Especially if Chinese is listed as one language. Sukh | ਸੁਖ | Talk 20:04, 2 September 2005 (UTC)
Urdu and Hindi are not separate languages and are only considered to be separate languages by rhetoricians who wish to distance Pakistanis from Indians. I would know; I speak them... both? The pronunciation is exactly the same. Each is perfectly understood by speakers of the other.
They are simply written in a different script, and recently religious and socially motivated individuals have sought to bring Sanskrit vocabulary to "Hindi" and Farsi vocabulary to "Urdu" in an attempt to create the impression that speakers are distinct ethnic groups. This is simply not true.
Surely there are enough native Swahili speakers to make this list.
In December 2004 the China Daily released an article describing a survey about how many Mandarin speakers there actually are in China. It turned out 18% spoke Mandarin at home, 42% spoke it at work or school, and 53% could speak it. Since 18% of China's population is about 235 million people, it follows that Mandarin should be placed behind Hindi, English, and Spanish.
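The 235 million figure can be checked with a quick sketch. It assumes a 2004 population for China of roughly 1.3 billion; that population figure is an assumption, not something taken from the China Daily article.

```python
# Assumption: China's population in 2004 was roughly 1.3 billion.
CHINA_POPULATION = 1_300_000_000
# Share of respondents who spoke Mandarin at home, per the survey quoted above.
HOME_SHARE_PERCENT = 18

# Integer arithmetic keeps the result exact.
home_speakers = CHINA_POPULATION * HOME_SHARE_PERCENT // 100
print(f"{home_speakers:,} home speakers of Mandarin")
```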
We have an edit war going on here, which I think we should resolve here on this discussion page rather than simply reverting each other.
This article is based primarily on Ethnologue 15. Now, Ethnologue is hardly the most reliable resource, and I think we're all aware that we can do better, but at least it provides a modicum of stability to a contentious field. We have changed several languages from what Ethnologue has published, but these changes have been discussed here so that there is basic agreement to the changes.
According to Ethnologue, there are 4.5 million ethnic Assyrians. However, most of these people speak Persian or Arabic as their native/home language. Some may speak some Assyrian as a second language, but that's not what this article is about.
Again, according to Ethnologue, Assyrian has 210,000 native/home speakers out of this ethnic population of 4.5 million. (That is, just under 5% of the ethnic population.) There is a similar number of Chaldean speakers, and smaller numbers of speakers of other varieties such as Turoyo. If these people consider that they all speak the same language, then they should all be lumped together. However, all the Aramaic languages/dialects together total only about 534,000. Since the cutoff point for this article is 1 million, Aramaic/Assyrian/Chaldean/Syriac does not make the cut even if it is considered a single language.
Assyria 90, if you have evidence that the number of speakers is greater than Ethnologue reports, please present that information here. However, I personally will consider any unsubstantiated attempt to list 4.5 million speakers of Assyrian as politically motivated, and will revert it: Mark isn't the only one doing this! If you wish to convince people of your claims, please provide something more than your personal say-so. We can't take seriously the claim that you personally know all 4.5 million Assyrians and have verified their native language. kwami 23:43, 14 September 2005 (UTC)
On my screen the rightmost column for Chinese spans a page and a half (!). At the same time, we have a second column for the language family, which is just as wide. I suggest removing that column. If I were looking for statistics on the number of speakers, most likely I wouldn't give a buck about the family of the language, and if I did, I would just click the link for the language. Currently the family column takes up valuable space; if we remove it, the list will become about half as long (at 1024x768). Does anybody disagree? (I'm User:logixoul, I'm having problems with my cookies at the moment) -- 85.130.99.211 20:26, 19 September 2005 (UTC)
Recalculated Portuguese. The main discrepancy is that no one has given a source for the claim that 60% of Angolans are native Portuguese speakers. Since 99.5% of Angolans speak some other language as their native tongue, I find the figure doubtful. Here's my calc: Angola 52k, Cabo Verde 15k, Mozambique 30k, Sao Tome 2.5k, South Africa 617k, India 250k, Macao 2k, Paraguay 636k, Luxembourg 100k, France 750k, Switzerland 86k, Andorra 2k. That's just under 2.6 million, which just offsets the 2.6 million Brazilians who do not speak Portuguese as their native tongue. So, Brazil 186.1M, no adjustment; Portugal 10.6M, less 0.5M non-native, and you get 196M. This agrees with the WA 2005 figure of 195M (which is rounded off to the nearest 5M).
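The recalculation can be reproduced with a short sketch. The figures are the ones listed in the comment, in thousands. Subtracting the 2.6 million non-native Brazilians as a flat figure follows the comment's description of the offset; the variable names are illustrative only.

```python
# Native Portuguese speakers outside Brazil and Portugal, in thousands (figures from the comment).
outside = {
    "Angola": 52, "Cabo Verde": 15, "Mozambique": 30, "Sao Tome": 2.5,
    "South Africa": 617, "India": 250, "Macao": 2, "Paraguay": 636,
    "Luxembourg": 100, "France": 750, "Switzerland": 86, "Andorra": 2,
}
outside_total = sum(outside.values())  # just under 2.6 million

brazil = 186_100              # thousands
brazil_non_native = 2_600     # Brazilians whose native tongue is not Portuguese
portugal = 10_600 - 500       # less 0.5 million non-native speakers

total = brazil - brazil_non_native + portugal + outside_total
print(f"outside Brazil/Portugal: {outside_total:,.1f}k; total: {total / 1000:.1f}M")
```

The non-native Brazilians and the speakers abroad roughly cancel, which is why the total lands close to Brazil plus Portugal alone.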
Bengali also is listed with 196M. However, the bulk of that figure is now ten years old. Given that Bangladesh has a high growth rate, it should be well over 196M by now, and therefore I've ranked Bengali ahead of Portuguese. kwami 09:33, 20 September 2005 (UTC)
In 1983: native Portuguese speakers, 60% of the population; native Portuguese speakers who can also speak an African language, 50% of those. That gives a maximum of 70% speaking African languages. In the capital of Angola, Luanda, 75% are native speakers of Portuguese.
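The 70% maximum follows from the 1983 figures: everyone who is not a native Portuguese speaker, plus the half of native Portuguese speakers who also speak an African language. Here is a minimal sketch using whole percentages. Like the comment, it assumes that every non-native Portuguese speaker speaks an African language, which is why the result is an upper bound.

```python
# 1983 figures from the comment, as whole percentages of Angola's population.
NATIVE_PORTUGUESE = 60
# Share of native Portuguese speakers who can also speak an African language.
BILINGUAL_SHARE_OF_NATIVE = 50

# Assumption: every non-native Portuguese speaker speaks an African language.
non_native = 100 - NATIVE_PORTUGUESE                              # 40
bilingual = NATIVE_PORTUGUESE * BILINGUAL_SHARE_OF_NATIVE // 100  # 30
max_african = non_native + bilingual
print(f"at most {max_african}% speak an African language")
```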
You really don't know the country if you believe that 99.5% of Angolans speak an African language. So please check before reverting anything or saying that you don't know of any source. You asked me once about the data, but then simply ignored my answer.
Here's a link: http://www.linguaportuguesa.ufrn.br/pt_3.4.a.php - Pedro 23:04, 26 September 2005 (UTC)
I'm having trouble connecting to INE; the site is http://www.ine.gov.mz/ (maybe it is down). You can follow the link for "censo de 1997" and search for "língua" or something similar...
Embora means several things, in that case it means "meanwhile", "moradores" (plural), "morador" (singular) is a person that lives in a given place, in this case, Angola. The most common word for countries is in fact "habitantes" (inhabitants).-- Pedro 20:36, 27 September 2005 (UTC)
Okay, verified all languages for all countries, per Ethnologue 15, and took out anything less than 1%, with a few exceptions: languages native to one country are listed there regardless; languages slightly under 1% in a second country may be listed if that number is a third or more of the total speakers; and a couple judgment calls like leaving Korean in Japan, since they are the most significant minority in mainland Japan even if somewhat under 1%; and Portuguese in Namibia, even though I don't have figures, because there are significant numbers in both Angola and South Africa. I suggest reverting any attempts to add additional countries (like Mongolia for Russian, or the US for Panjabi) unless these additions are supported.
However, sometimes Ethnologue just states that an immigrant language is found in a country without giving figures. Usually this means the numbers are small, but not always, so I may have deleted a few countries I shouldn't have. In other cases, as with Filipino emigrants and Ivory Coast immigrants, numbers are given for nationality but not language. In a couple of these cases I left a country in with a question mark. kwami 13:19, 22 September 2005 (UTC)
Should the Akan languages be unified? Currently Baoule, Anyi, and Brong have separate entries. kwami