This article is rated C-class on Wikipedia's content assessment scale.
Might be handy if this page included some information about removing selection bias - but I don't have the knowledge. Cached 11:42, 5 December 2005 (UTC)
I believe that much of this article as written so far pertains to sampling bias, not selection bias, and hence the article clamors for substantial revision. Properly, selection bias refers to bias in the estimation of the causal effect of a treatment because of heterogeneous selection into the group that receives the treatment. Sampling bias refers to biases in the collection of data. (See also the article on bias (statistics).) Tobacman 20:08, 11 January 2006 (UTC) The stub on censored regression models also helps make this distinction, and given that this is one strategy used to overcome these kinds of biases, it should probably be linked into this article. Albertivan 13:31, 30 March 2007 (UTC)
Another possible distinction is that sampling bias is produced by an accidental or instrumental bias in the sampling technique, as opposed to a deliberate or unconscious manipulation.
I am interested in testing the validity of online polling, under the assumption that the set of participants in a poll consists of the persons who view the website for other reasons and who participate in the poll upon discovering its presence.
I am especially interested in determining the validity of such a sample when it is affected by a self-selection bias, particularly when a subset of participants with a predetermined answer to the poll is recruited by means other than discovery upon visiting the website, such as mutual e-mail notification.
To what degree can such participants bias the results of the poll, in comparison to their relative participation?
Any thoughts on approach?
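One rough way to approach the question is a quick simulation. The sketch below uses made-up numbers, not data from any real poll: organic visitors split roughly evenly on the question, while a self-recruited bloc votes with one predetermined answer, and the reported result shifts in proportion to the bloc's share of total participation.

```python
import random

random.seed(0)

def poll_result(n_organic, n_recruited, p_yes_organic=0.5):
    """Share of 'yes' votes when a self-recruited bloc (all voting
    'yes') joins organically discovered participants."""
    organic_yes = sum(random.random() < p_yes_organic for _ in range(n_organic))
    recruited_yes = n_recruited  # the bloc all gives the predetermined answer
    return (organic_yes + recruited_yes) / (n_organic + n_recruited)

# With no bloc the poll hovers near the organic rate (50% here); each
# added bloc share drags the result toward the bloc's answer.
for n_recruited in (0, 100, 500, 1000):
    share = n_recruited / (1000 + n_recruited)
    print(f"bloc share {share:5.1%} -> poll says yes {poll_result(1000, n_recruited):5.1%}")
```

With a bloc fraction f of total participants, the expected result is roughly 0.5(1 - f) + f, so the distortion grows linearly with the bloc's relative participation.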
Nothing apart from appreciating
Heckman is smart, yes. Why doesn't he make his model clear for public consumption? It's too mathematical.
Stephen mpumwire, Fortfortal
I am changing one word in the first paragraph again. Conclusions drawn from statistical analysis are inductive, not deductive. As such, inductive arguments exhibit a valence of strength. The language of logic dictates that deductive arguments are either valid or invalid, and either sound or unsound. Since only deductive arguments have validity, it does not make sense to refer to statistical inductive arguments as valid. It makes sense to refer to statistical arguments that suffer from a selection bias as weak. Weak arguments are inductive in nature and are not likely to preserve truth. Kanodin 09:35, 9 July 2007 (UTC)
Sherry Seethaler gives the case of the Chicago Tribune headline, "Dewey Defeats Truman" which was based in part on a telephone survey. At the time, telephones were expensive items whose owners tended to be in the elite - who favored Dewey much more than the average voter. Where should this go in the article? In the Participants section? -- Uncle Ed ( talk) 20:10, 20 May 2009 (UTC)
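The arithmetic behind that polling failure is easy to reproduce. The figures below are invented for illustration (they are not the actual 1948 numbers): if phone owners lean toward Dewey and the poll reaches only phone owners, the poll flips the true result.

```python
# Hypothetical electorate: a minority owns telephones and favors
# Dewey; the non-owner majority favors Truman. All numbers made up.
phone_owners = {"share": 0.30, "dewey": 0.65}
non_owners   = {"share": 0.70, "dewey": 0.40}

# A telephone poll samples only phone owners, so it reports their rate.
poll_dewey = phone_owners["dewey"]

# The population rate weights each group by its actual share.
true_dewey = (phone_owners["share"] * phone_owners["dewey"]
              + non_owners["share"] * non_owners["dewey"])

# Poll says 65.0% for Dewey; the population is only 47.5% for Dewey.
print(f"poll: {poll_dewey:.1%} Dewey, population: {true_dewey:.1%} Dewey")
```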
I moved the line below to here from after "Partitioning data with knowledge of the contents of the partitions, and then analyzing them with tests designed for blindly chosen partitions", because I think the examples need more description here in order to be self-explanatory in the article, without the reader having to read all those target articles to understand why they should even be read. Mikael Häggström ( talk) 15:37, 24 September 2009 (UTC)
(see stratified sampling, cluster sampling, Texas sharpshooter fallacy)
I started an article specifically on circular analysis/double dipping. In my field of fMRI this might occur where a researcher adjusts parameters in a classifier to improve its accuracy, or stops adding pre-processing steps when the result 'reaches significance'. I think it definitely needs a section or something on this page. Also a redirect from circular analysis and double dipping. Here's the initial version of the page:
Circular analysis is the selection of the parameters of an analysis using the data to be analysed. It is often referred to as double dipping, as one uses the same data twice. Circular analysis inflates the statistical strength of results and, at the most extreme, can produce a strongly significant result from pure noise.
At its simplest, it can include the decision to remove outliers after noticing that this might improve the analysis of an experiment. The effect can be more subtle. In fMRI data, for example, considerable amounts of pre-processing are often needed; these might be applied incrementally until the analysis 'works'. Similarly, the classifiers used in a multivoxel analysis of fMRI data require parameters, which could be tuned to maximise the classification accuracy.
Careful design of the analysis one plans to perform, prior to collecting the data, means the choice of analysis is not affected by the data collected. Alternatively, one might perfect the classification on one or two participants and then apply the analysis to the remaining participants' data. Regarding the selection of classification parameters, a common method is to divide the data into two sets, find the optimum parameter using one set, and then test with this parameter value on the second set. This is a standard technique used (for example) by the Princeton MVPA classification library.
Kriegeskorte, Nikolaus, et al. "Circular analysis in systems neuroscience: the dangers of double dipping." Nature neuroscience 12.5 (2009): 535-540.
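The held-out split described above can be demonstrated in a few lines. This is a toy illustration on synthetic pure-noise data, not fMRI: choosing the best of many noise "predictors" and scoring it on the same data yields above-chance accuracy, while scoring the chosen predictor on a held-out half does not.

```python
import random

random.seed(42)

n, k = 200, 50  # samples, candidate predictors
labels = [random.randint(0, 1) for _ in range(n)]
# Every candidate predictor is pure noise, unrelated to the labels.
predictors = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

half = n // 2

# Double dipping: pick the best predictor using ALL the data, then
# report its accuracy on that same data. The maximum over many noise
# predictors is systematically above chance.
best = max(predictors, key=lambda p: accuracy(p, labels))
double_dip = accuracy(best, labels)

# Held-out split: pick on the first half, evaluate on the second.
best_train = max(predictors, key=lambda p: accuracy(p[:half], labels[:half]))
held_out = accuracy(best_train[half:], labels[half:])

print(f"double-dip accuracy: {double_dip:.2f}, held-out accuracy: {held_out:.2f}")
```

The double-dipped score lands well above 0.5 despite the data being noise; the held-out score stays near chance.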
Lionfish0 ( talk) 10:30, 21 March 2013 (UTC)
Dr. Nosenzo has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:
I am not sure there is agreement across disciplines on the labels and definitions of "selection" and "sampling" biases. To my understanding, "selection bias" occurs when a rule other than simple random sampling is used to sample the underlying population that is the object of interest, resulting in a distorted representation of the true population. [source: http://www.dictionaryofeconomics.com/article?id=pde2008_S000084]. Sampling bias occurs when the selected sample is non-representative of the underlying population, which may hinder generalizability of findings. A source that discusses this distinction is BRS Behavioral Science, Fifth Edition. This contradicts what is presented here in the Wiki article. I also disagree with the classification of "types" of selection bias; some of the types listed in the article do not seem to relate, strictly speaking, to the issue of non-random sampling (e.g., the discussion of which studies are included in a meta-analysis; the discussion of data reporting and disclosure). I think these are distinct issues that have more to do with scientific malpractice than with sampling or selection bias.
We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.
We believe Dr. Nosenzo has expertise on the topic of this article, since he has published relevant scholarly research:
ExpertIdeasBot ( talk) 16:34, 2 August 2016 (UTC)
Dr. Turon has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:
Sampling bias is systematic error due to a non-random sample of a population,[2] causing some members of the population to be less likely to be included than others, resulting in a biased sample, defined as a statistical sample of a population (or non-human factors) in which all participants are not equally balanced or objectively represented.[3] It is mostly classified as a subtype of selection bias,[4] sometimes specifically termed sample selection bias,[5][6][7] but some classify it as a separate type of bias.[8]
my suggestion: Sampling bias is systematic error due to a non-random sample of a population,[2] causing some members of the population to be less likely to be included than others, resulting in a biased sample. When the sample differs systematically from the population from which it was drawn, any statistical analysis of the sample will reflect features of the population in a biased manner.
suggestion to add in Types paragraph: "Incidental Truncation. Here, we do not observe [the outcome of interest] because of the outcome of another variable. The leading example is estimating the so-called wage offer function from labor economics. Interest lies in how various factors, such as education, affect the wage an individual could earn in the labor force. For people who are in the workforce, we observe the wage offer as the current wage. But, for those currently out of the workforce, we do not observe the wage offer. Because working may be systematically correlated with unobservables that affect the wage offer, using only working people (..) might produce biased estimators of the parameters in the wage offer equation." (quoted from "Introductory Econometrics" by J. M. Wooldridge, 2nd ed., p. 585.)
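The truncation effect is easy to simulate. The sketch below uses invented numbers and is not an econometric model: if people work only when their wage offer clears a reservation wage, the mean of the observed wages overstates the mean of the underlying offers.

```python
import random
import statistics

random.seed(1)

RESERVATION = 15.0  # hypothetical reservation wage, for illustration

# Latent wage offers for everyone, workers and non-workers alike.
offers = [random.gauss(15.0, 5.0) for _ in range(10_000)]

# We observe a wage only for people whose offer beat the reservation
# wage, i.e. for those who chose to work.
observed = [w for w in offers if w > RESERVATION]

print(f"mean offer (everyone): {statistics.mean(offers):.2f}")
print(f"mean observed wage (workers only): {statistics.mean(observed):.2f}")
```

Naively averaging observed wages attributes the selection effect to the wage process itself; the same distortion carries over to regression coefficients when the selection correlates with the regression's error term.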
suggestion to add in the Mitigation paragraph:
The Heckman sample selection correction consists in modeling the selection of individuals into the sample as well as the equation of interest. A crucial and often problematic ingredient of this method is that one must have access to data on variables which affect the selection process but do not affect the outcome of interest. This is called the exclusion restriction.
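In standard notation, the two-equation setup being described is roughly the following (a sketch of the textbook formulation, not tied to any particular source in this thread):

```latex
% Outcome equation, observed only for selected individuals:
y_i = x_i \beta + u_i
% Selection equation: y_i is observed iff s_i = 1
s_i = \mathbf{1}[\, z_i \gamma + v_i > 0 \,]
% With (u_i, v_i) jointly normal, the selected-sample mean is
E[y_i \mid x_i, s_i = 1] = x_i \beta + \rho \sigma_u \, \lambda(z_i \gamma),
% where \lambda(c) = \phi(c)/\Phi(c) is the inverse Mills ratio.
% Two-step estimator: a probit of s on z gives \hat{\gamma}; then OLS
% of y on (x, \lambda(z \hat{\gamma})) over the selected sample
% estimates \beta. The exclusion restriction is that z contains
% variables that affect selection but do not appear in x.
```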
We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.
We believe Dr. Turon has expertise on the topic of this article, since he has published relevant scholarly research:
ExpertIdeasBot ( talk) 02:35, 6 September 2016 (UTC)
As others have noted above, this page confounds multiple concepts. I'm particularly concerned with the section on "observer bias" and the reference to "anthropic reasoning". I'm commenting on the talk page of Anthropic Principle too. I haven't yet read the Tegmark or Bostrom books cited there, but I don't think such WP:FRINGE metaphysical arguments should be presented without caveat on this page. DolyaIskrina ( talk) 16:06, 2 July 2019 (UTC)
Vaccines causing autism might be a better example of Types/susceptibility bias. — Preceding unsigned comment added by 2606:A000:825A:500:1D9:45B5:6018:DA31 ( talk) 04:12, 10 November 2019 (UTC)
Both are the same subject. Crashed greek ( talk) 06:04, 16 November 2022 (UTC)
"Cherry picking, which actually is not selection bias, but confirmation bias...". Selection bias is a function of flaws in the methodological approach to data sampling and handling. Confirmation bias is the conscious or unconscious manipulation of already extant data sets. Iskandar323 ( talk) 05:55, 24 November 2022 (UTC)