Mathematics desk | ||
---|---|---|
< July 31 | << Jul | August | Sep >> | Current desk > |
Welcome to the Wikipedia Mathematics Reference Desk Archives |
---|
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages. |
Hello. I'm a grad student who's been asked to look at some data and come up with some statistics for possible publication, but the data set is small and limited to "present/not present", so I'm not sure how to approach it. To give a little more detail, I have been provided two collections of items, each of which has been attributed with specific features, as follows:
Feature | Collection A | Collection B |
---|---|---|
Feature 1 | 0/30 | 49/50 |
Feature 2 | 5/30 | 45/50 |
Feature 3 | 15/30 | 40/50 |
Feature 4 | 0/30 | 10/50 |
Feature 5 | 20/30 | 0/50 |
My experience with statistics is very limited, and I'm having trouble working with the various statistics articles, so the questions I have are as follows:
Honestly, I'm not even sure where to start, so any help you can provide would be extremely helpful. Many thanks. - But I Played One On TV ( talk) 15:18, 1 August 2008 (UTC)
Hello. Assuming that the collections, A and B, are samples from a big population, you want to know about the population based on knowing the sample. The probability of each feature can be estimated with some uncertainty from the sample data. A collection contains n items of which i have some feature. If you knew the probability, x, that a random item of the population has the feature, then you could compute the probability $p_i(x)$ that the collection contains i = 0, 1, 2, ..., n items having that feature. This probability is given by the binomial distribution
$$p_i(x) = \binom{n}{i} x^i (1-x)^{n-i}$$
This distribution is summarized by its mean value ± standard deviation:
$$i \approx nx \pm \sqrt{nx(1-x)}$$
This means that if you know the population frequency, x, of some feature, then you can estimate the sample frequency of the feature, which is
$$\frac{i}{n} \approx x \pm \sqrt{\frac{x(1-x)}{n}}$$
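As a quick numerical check of the binomial summary, the sketch below builds the distribution term by term and compares its mean and standard deviation with the closed forms n·x and √(n·x·(1−x)). The values n = 30 and x = 0.5 are illustrative assumptions, not taken from the question.

```python
from math import comb, sqrt

def binomial_pmf(i, n, x):
    """Probability that exactly i of n sampled items have the feature."""
    return comb(n, i) * x**i * (1 - x)**(n - i)

# Hypothetical inputs chosen only for illustration.
n, x = 30, 0.5

pmf = [binomial_pmf(i, n, x) for i in range(n + 1)]
mean = sum(i * p for i, p in enumerate(pmf))
sd = sqrt(sum((i - mean) ** 2 * p for i, p in enumerate(pmf)))

print(f"summed over the distribution: i = {mean:.2f} ± {sd:.2f}")
print(f"closed form:                  i = {n * x:.2f} ± {sqrt(n * x * (1 - x)):.2f}")
```

Both lines should agree, which is all the "mean ± standard deviation" shorthand claims.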
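This formula for estimating i knowing x and n is however not what you want. You want a formula for estimating x knowing i and n. The distribution function for x, knowing i and n, is still
$$\binom{n}{i} x^i (1-x)^{n-i}$$
apart from an unimportant normalization factor. This distribution function is known as the beta distribution. The mean value of the beta distribution is not $\frac{i}{n}$, but rather $\frac{i+1}{n+2}$, and the standard deviation of the beta distribution is not $\sqrt{\frac{1}{n}\cdot\frac{i}{n}\left(1-\frac{i}{n}\right)}$ but rather $\sqrt{\frac{1}{n+3}\cdot\frac{i+1}{n+2}\left(1-\frac{i+1}{n+2}\right)}$. So the formula you want is
$$x \approx \frac{i+1}{n+2} \pm \sqrt{\frac{1}{n+3}\cdot\frac{i+1}{n+2}\left(1-\frac{i+1}{n+2}\right)}$$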
Substituting your data into this formula gives the following estimates for the population frequencies:
Feature | Collection A | Collection B |
---|---|---|
Feature 1 | 0.03±0.03 | 0.96±0.03 |
Feature 2 | 0.19±0.07 | 0.88±0.04 |
Feature 3 | 0.50±0.09 | 0.79±0.06 |
Feature 4 | 0.03±0.03 | 0.21±0.06 |
Feature 5 | 0.66±0.08 | 0.02±0.02 |
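The table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the estimate is the mean and standard deviation of a beta distribution with parameters i+1 and n−i+1, that is, mean (i+1)/(n+2) and variance mean·(1−mean)/(n+3). The function name `beta_estimate` is made up for this example.

```python
from math import sqrt

def beta_estimate(i, n):
    """Mean and standard deviation of Beta(i + 1, n - i + 1)."""
    mean = (i + 1) / (n + 2)
    sd = sqrt(mean * (1 - mean) / (n + 3))
    return mean, sd

# (i, n) pairs for Collection A and Collection B, copied from the question.
counts = {
    "Feature 1": ((0, 30), (49, 50)),
    "Feature 2": ((5, 30), (45, 50)),
    "Feature 3": ((15, 30), (40, 50)),
    "Feature 4": ((0, 30), (10, 50)),
    "Feature 5": ((20, 30), (0, 50)),
}

for name, (a, b) in counts.items():
    (ma, sa), (mb, sb) = beta_estimate(*a), beta_estimate(*b)
    print(f"{name} | {ma:.2f}±{sa:.2f} | {mb:.2f}±{sb:.2f} |")
```

Note that a feature seen in 0 of 30 items is estimated at about 0.03 rather than 0, which reflects that a small sample cannot rule the feature out.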
Now you want to know if the two collections can be believed to come from the same population. Is a number from a distribution 0.03±0.03 likely to be equal to a number from a distribution 0.96±0.03? Compute the difference, combining the two standard deviations in quadrature:
$$(0.96 \pm 0.03) - (0.03 \pm 0.03) = 0.93 \pm \sqrt{0.03^2 + 0.03^2} \approx 0.93 \pm 0.04$$
Is this difference likely to be zero? Zero is 0.93/0.04 ≈ 23 standard deviations away from the mean value of the distribution. This difference is highly significant.
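The same comparison can be sketched for any feature. The code below assumes the two samples are independent, so the standard deviations combine in quadrature, and it works from unrounded estimates, so the printed ratios may differ somewhat from a hand calculation with rounded figures. The function names are illustrative only.

```python
from math import sqrt

def beta_estimate(i, n):
    """Mean and standard deviation of Beta(i + 1, n - i + 1)."""
    mean = (i + 1) / (n + 2)
    return mean, sqrt(mean * (1 - mean) / (n + 3))

def separation(i_a, n_a, i_b, n_b):
    """Difference of the two estimates, its sd, and the ratio |diff| / sd."""
    m_a, s_a = beta_estimate(i_a, n_a)
    m_b, s_b = beta_estimate(i_b, n_b)
    diff = m_b - m_a
    sd = sqrt(s_a**2 + s_b**2)  # assumes independent samples
    return diff, sd, abs(diff) / sd

# Feature 1 (0/30 vs 49/50) and Feature 3 (15/30 vs 40/50) from the question.
for label, args in [("Feature 1", (0, 30, 49, 50)), ("Feature 3", (15, 30, 40, 50))]:
    diff, sd, ratio = separation(*args)
    print(f"{label}: difference {diff:.2f} ± {sd:.2f}, {ratio:.1f} standard deviations from zero")
```

Feature 1 separates the collections far more clearly than Feature 3, whose difference is only a few standard deviations from zero.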
Be warned that different statisticians use different approximations and thus may reach different results. The above approach may not be considered standard by your teacher. Have fun! Bo Jacoby ( talk) 23:16, 1 August 2008 (UTC).
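This question has nothing to do with the populations but only with the samples. So the probabilities above are not relevant. You need to go back to the original data material and count how many elements in each of the collections A and B have the new Feature 6, which is "Feature 1 and Feature 2 and not Feature 5". Then you get four numbers:
Collection | Has Feature 6 | Lacks Feature 6 |
---|---|---|
Collection A | A11 | A12 |
Collection B | A21 | A22 |
and the sums
A10 = A11 + A12, A20 = A21 + A22, A01 = A11 + A21, A02 = A12 + A22, A00 = A10 + A20.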
The probability you ask for is A21/A01. Bo Jacoby ( talk) 05:01, 12 September 2008 (UTC).
Thanks for the nice words. No, there is no error of calculation. In this case the probability is known. What does it mean? Out of the elements having feature 6 you pick one element at random. The probability that this element is in collection B is exactly A21/A01. Bo Jacoby ( talk) 18:48, 16 September 2008 (UTC)
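To make the counting concrete, here is a small sketch using the index convention of this answer (first digit: collection, 1 = A and 2 = B; second digit: Feature 6 present = 1, absent = 2; a zero means a sum over that position). The published table forces A11 = 0, because no item in Collection A has Feature 1. For Collection B the table only bounds the count: 49 items have Feature 1, 45 have Feature 2 and none has Feature 5, so between 44 and 45 items have Feature 6. The value 44 below is an assumption; the real number has to be counted from the raw data.

```python
# Counts of items with / without Feature 6
# ("Feature 1 and Feature 2 and not Feature 5").
A11, A12 = 0, 30   # Collection A: forced by Feature 1 = 0/30
A21, A22 = 44, 6   # Collection B: ASSUMED, the table only implies 44 or 45

# Sums (a zero in an index position means summation over that position).
A10, A20 = A11 + A12, A21 + A22   # collection sizes
A01, A02 = A11 + A21, A12 + A22   # items with / without Feature 6
A00 = A10 + A20                   # all items in the sample

# Probability that a randomly picked item with Feature 6 is in Collection B.
p = A21 / A01
print(f"A21/A01 = {A21}/{A01} = {p}")
```

With either admissible count for Collection B the ratio is exactly 1, since no item of Collection A can have Feature 6.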
Perhaps you want to consider a random element from the population? Then rename the variables. You must distinguish between the number of elements in the population, A000, inside the sample, A100, and outside the sample, A200, and between the corresponding known sample counts, A111, A112, A121, A122, and the unknown counts outside the sample, A211, A212, A221, A222. The probability that a random element having feature 6 is in collection B is A021/A001 = (A021/A000)/(A001/A000). (A zero in a digit position in the index indicates a summation.) Now A021/A000 is the probability that a random element of the population is in collection B and has feature 6, and A001/A000 is the probability that a random element of the population has feature 6. These probabilities are not known but can (assuming A000 >> 1) be estimated by the beta distribution
$$\frac{A_{021}}{A_{000}} \approx \frac{A_{121}+1}{A_{100}+2} \pm \sqrt{\frac{1}{A_{100}+3}\cdot\frac{A_{121}+1}{A_{100}+2}\left(1-\frac{A_{121}+1}{A_{100}+2}\right)}$$
and
$$\frac{A_{001}}{A_{000}} \approx \frac{A_{101}+1}{A_{100}+2} \pm \sqrt{\frac{1}{A_{100}+3}\cdot\frac{A_{101}+1}{A_{100}+2}\left(1-\frac{A_{101}+1}{A_{100}+2}\right)}$$
Division gives, for the mean values,
$$\frac{A_{021}}{A_{001}} \approx \frac{A_{121}+1}{A_{101}+1}$$
I don't know any exact formula for the mean and standard deviation of the quotient between two random variables. Bo Jacoby ( talk) 21:23, 16 September 2008 (UTC).
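Lacking a closed form, the spread of such a quotient can be approximated by simulation. The sketch below is not the calculation given in this answer: it swaps in a Dirichlet model (an assumption introduced here) so that the dependence between numerator and denominator is respected, and it uses the fact that normalized gamma draws follow a Dirichlet distribution. The counts 44 and 0 are the illustrative Feature 6 counts for Collections B and A, not counted data.

```python
import random
from statistics import mean, stdev

def sample_quotient(count_b, count_a, draws=20000, seed=0):
    """Monte Carlo draws of P(collection B | feature 6).

    Population shares are modelled as Dirichlet(counts + 1). The quotient
    share_B / (share_A + share_B) depends only on the two gamma variates
    below, because the common normalizing sum cancels.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        g_b = rng.gammavariate(count_b + 1, 1.0)
        g_a = rng.gammavariate(count_a + 1, 1.0)
        samples.append(g_b / (g_a + g_b))
    return samples

# ASSUMED counts of items with Feature 6: 44 in Collection B, 0 in Collection A.
q = sample_quotient(44, 0)
print(f"quotient ≈ {mean(q):.3f} ± {stdev(q):.3f}")
```

Under this model the quotient is itself beta distributed, so the simulated mean can be checked against (44+1)/(44+0+2); the simulation is mainly useful when the model is changed to one without such a shortcut.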
Let ABCD be a convex quadrilateral. Consider the points E and F such that C is the midpoint of the line segment AE and D is the midpoint of the line segment BF. Evaluate, with proof, the ratio of [ABEF] to [ABCD].
Actually I am not sure what is meant by [ABCD]. - Preceding unsigned comment added by Khubab ( talk • contribs) 22:33, 1 August 2008 (UTC)