This is the talk page for discussing improvements to the Human mitochondrial molecular clock article. This is not a forum for general discussion of the article's subject.
This article is rated C-class on Wikipedia's content assessment scale.
This article needs lots of cleanup. For example, what does "capable of directly estimating events greater than 50,000 accurately" in the lead mean? 50,000 brickstones? Years? Generations? The lead is too long and too complicated. Any volunteer? Northfox ( talk) 08:04, 3 January 2010 (UTC)
Before I go into the details, I will just give a brief background to some of the recent activities with these articles. Archaeogenetics is a fast-growing scientific discipline and there have been numerous recent publications concerning the variation in human mitochondrial DNA. Mitochondrial Eve is somewhat of a popular science phenomenon, but in most of the scientific literature the term mitochondrial Eve is avoided; instead she is usually referred to as the human mitochondrial MRCA. This difference has been at the root of some of the problems facing the article. I would prefer that many of these articles be covered from the popular science perspective, because popular science publications are accessible to the general public, which fits well with WP:TECHNICAL. In fact many independent editors who are clearly interested in these articles have complained about the level of technical detail.
Many of the popular science publications are also secondary sources, as opposed to the actual publications they cite, which would be primary sources; see WP:PSTS. The popular books by Stephen Oppenheimer, Bryan Sykes, Spencer Wells and Steve Olson (writer) would provide a useful foundation for the articles.
The alternative route is to use primary data, which is what Pdeitiker had been using. This results in a lot of jargon. I had been working on these articles, but sometime in December I got swept up in the other controversy, so indeed these articles were left half finished. Since Biophys seems interested, it's a welcome distraction.
To answer your question, the first section "Early studies" is not a major issue. I have concerns that it is unnecessarily complex, but it is fairly accurate. I wouldn't mind including it, though in the future it could be re-written. The problems begin in the next section, Sequence based studies. Below is an,
The section is tangential and mostly refers to archeological finds in Africa and the Near East. The archeology sources listed are not directly connected to mitochondrial DNA; in fact they may be purely archeological. This paragraph also includes controversial claims that are treated as factual. The claim in bold is cited to this article, however this article states
~45–75 kya [10, 29], broadly consistent with dates inferred from paleontological
So I don't know where the 52 kya is coming from. The next paragraph states
However, Gonder et al. actually state
Compare this statement with the statement in bold. What has occurred is that a Wikipedian has taken the TMRCA and added an additional standard deviation to come up with the 130,000 to 280,000 year range. Statistically there is nothing wrong with that, just that no such analysis is found in Gonder et al. 2007.
So what we have is some factual information mixed with some original or unverifiable claims. Wapondaponda ( talk) 17:57, 27 July 2010 (UTC)
There are three basic issues that have been identified in the literature that affect molecular clocking.
1. Site-specific mutation rates, methods of determination. To state it quite simply, site-specific mutation rates have not been rigorously determined across a broad range of species. I know this for a fact because the original therian database on which most of this work was done contained only 40 species, and to be fair anthropoids need to be excluded if you are going to use site-specific rates to clock anthropoids.
2. Purifying selection
3. Adaptive selection
Here is the basic problem with 2 and 3. I will state this quite simply: the number of mutations at rare sites in humans exceeds the expectation. This has been noted by 7 authors and has been referenced on the mtEve page and on the arbitration boards. Independently, I have used the site-rate based determinations, and there is indeed an inflection in the rarely mutated sites that flattens out in the more common mt site variants. This inflection is not seen in comparing humans to gorillas, or humans to chimpanzees. It is seen when comparing chimpanzees to chimpanzees at the periphery of their mitochondrial tree, but it is not as severe as that seen in humans.
Soares et al. use generalized statistical methods to estimate the TMRCA in humans, and based on this method they suggest that this 'relative bias' seen in interhuman comparisons is the consequence of purifying selection, and they correct for this purifying selection in arriving at their estimate.
I have examined this issue and have come to a different conclusion. In examining their global mtDNA, and in particular some lineages in Africa, I found a near-significant decrease in the relative bias; in fact certain intercomparisons are flat, as one expects if no purifying selection is present. If adaptive selection is operating, as some authors suggest, the bias should increase as the human environment changes (i.e. migration from warmer areas to colder areas, from diets rich in complex carbs to diets rich in fatty meats). To some degree there is a correlation.
The problem is this: the system they are using is seeded with too few sites to develop good statistics in their a priori database. Type II statistical errors (accepting the null hypothesis when there is a real difference) occur when there is inadequate sample size in either the control or subject sample sets. Gonder et al. noted very clearly that there was a relaxation of selection within Tanzania, but selection was evident everywhere else. If purifying selection is the only cause, then the bias should not occur juxtaposed to the PMRCA of all mtDNA. The problem on the human side of the equation is that the sample size within these 'unbiased' haplogroups is currently too small to develop good statistics.
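To illustrate the sample-size point, here is a minimal sketch of how the Type II error rate of a simple test falls as the sample grows. It uses a one-sided exact binomial test, which is a stand-in chosen for simplicity and is not the method used by Soares et al. or Gonder et al.; the proportions 0.2 and 0.4 and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def type2_error(n, p0, p1, alpha=0.05):
    """Type II error rate of a one-sided exact binomial test of H0: p = p0
    when the true proportion is p1 > p0 (illustrative values only)."""
    # Find the smallest critical count c with P(X >= c | p0) <= alpha.
    c = n + 1
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, p0)
        if tail > alpha:
            break
        c = k
    # Power is the chance of reaching the critical count under the true rate.
    power = sum(binom_pmf(k, n, p1) for k in range(c, n + 1))
    return 1.0 - power

# Hypothetical comparison: the same real difference, small versus large sample.
for n in (10, 30, 100):
    print(n, round(type2_error(n, 0.2, 0.4), 3))
```

With a small sample the test usually fails to reject the null even though the difference is real, which is the failure mode described above.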
Therefore many of the points of view are still valid. The global TMRCA is based on a molecular clock that obviously has examples of purifying selection, IOW the temporal appearance of derived haplotypes that are unstable over the long term. But there are also examples in the database of mtDNA that have evolved to deal with arctic climate and then to deal with tropical climate (e.g. South America), and some of these examples have the highest individual TMRCA relative to the human consensus.
As one of the members of another group pointed out to me recently, Soares also excluded certain groups in his analysis, which may or may not be bad. But if group masking is to be done, then it needs to be done with an eye on adaptive selection, IOW haplotypes that show marked adaptive selection should be masked before applying the purifying selection correction algorithm.
The base leg of the calculation of TMRCA is the C/H LCA, which Soares takes as 6.5 Ma despite information suggesting it lies over a much broader range; he does not include that range in his analysis. White and company have published findings suggesting the C/H LCA is over 7 million years in age. The C/H LCA that has been used is largely defined circularly. The problem with that definition is twofold. 1. Uncertainty about when gibbons and orangutans evolved 2. Adaptive selection in the evolution of Tropical apes from Temperate ape ancestors.
Therefore there is uncertainty in the clock that has been used by Sarich, Wilson, Vigilant and others. The variation is not huge, but it does affect the confidence interval that is stated.
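As a rough sketch of how the anchor feeds into the result: under a strict linear clock, which is a simplifying assumption here and not Soares' actual correction procedure, a TMRCA estimate scales in proportion to the C/H LCA date used to calibrate it. The 200 ka starting value and the function name below are hypothetical; the anchor dates are the ones mentioned in this thread.

```python
def scale_tmrca(tmrca_ka, anchor_used_ma, anchor_alt_ma):
    """Rescale a TMRCA (in ka) linearly when the calibration anchor changes.

    Assumes a strict linear clock, so the estimate is proportional to the anchor.
    """
    return tmrca_ka * anchor_alt_ma / anchor_used_ma

# Hypothetical TMRCA of 200 ka obtained with a 6.5 Ma C/H LCA anchor.
base_tmrca_ka = 200.0
for anchor_ma in (6.0, 6.5, 7.0, 10.0):
    print(anchor_ma, round(scale_tmrca(base_tmrca_ka, 6.5, anchor_ma), 1))
```

Any range on the anchor therefore carries straight through to the TMRCA, on top of whatever sampling error the estimate already has.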
Finally, basal human evolution of mtDNA is largely characterized by chimpanzee-human comparisons, which can be characterized as regionally contained evolution of chimpanzees within equatorial Africa (at least to 2 mya) and evolution of humans either in Africa or in spats in Asia. In addition, the invention of fire, housing and clothing has markedly changed the human environment. This may have resulted in a higher rate of adaptive evolution on the human side.
There has been a tendency in the past for individual authors to idealize confidence; this is the consequence of a focus on certain sources of variation while ignoring other sources. One study of anthropoid evolution found that no matter which anchors one uses, one will always have mtDNA that violate expected branch lengths. This group suggested that anchor times should always be represented as confidence ranges that statistically reconcile branch lengths. Without knowing the sources of variation, this form of data treatment corrects for a large number of variances.
This type of treatment, however, is never done with mtDNA TMRCA analysis and is a major reason why TMRCA ranges have fluctuated greatly over time. MW has been pressing for the removal of TMRCAs that violate his sense of consistency; however, these alternative dates do represent some of the uncertainty created when sources of variation go ignored.
Let me just state: within molecular anthropology the C/H LCA was treated as fact at 4-6 Ma from approximately 1975 to 1995. Now a C/H LCA between 6-10 Ma is considered the most probable. This is what happens when one does not engage statistical variance in an unbiased manner. PB666 yap 16:21, 5 November 2010 (UTC)