![]() | Statistics Project‑class | ||||||
|
There are several things that could be discussed:
Is it OK to discourage the use of capital letters to denote random variables? It certainly seems that statisticians don't use that convention anymore. When we have a sample x_1, …, x_n, each of these quantities is itself a random variable, an iid copy from some common distribution F.
Really? I guess it depends what area you're in, but capitalisation still seems pretty prominent. Also, they are capitalised in both the Notation in probability and statistics and random variable articles. — 3mta3 ( talk) 13:06, 16 May 2010 (UTC)
It seems to be common practice to use square brackets with the expectation, E[x]. For variance it is about half and half, either Var[x] or Var(x), and for covariance it is already mostly parentheses, Cov(x, y). So what should our recommendation be?
The symbols for expectation, variance and covariance are all traditionally uppercase: E, Var, Cov; whereas correlation is almost always seen in lowercase: corr. Should we convert it to uppercase as well, or leave it be?
The common symbol in statistics to denote transposition is the prime: x′. This contradicts the MOS:MATH recommendation, which is to use \top or x^T.
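To make the comparison concrete, here is a rough side-by-side of the two styles. The least-squares estimator is only an illustrative choice and is not taken from the guideline; whether the superscript T should be upright or italic is left open here.

```latex
% Prime notation, common in statistics texts
\hat\beta = (X'X)^{-1} X'y

% \top notation
\hat\beta = (X^\top X)^{-1} X^\top y

% Superscript-T notation
\hat\beta = (X^{T} X)^{-1} X^{T} y
```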
Some common distributions, such as the normal N(μ, σ²), t-distribution t_k, F-distribution F_{k,ℓ}, chi-squared χ²_k, and uniform U(a, b), are all traditionally written in italic. For other, not-so-common distributions, there is no tradition. Should we require them to be in italic as well (e.g. Poisson, Exponential, Binomial)? If so, should they be abbreviated where possible (e.g. Poi, Exp, B)?
I think there is a critical distinction between a parameter, such as λ of the exponential distribution, and the degrees of freedom, such as ν of the t-distribution. In practice the λ of the exponential is rarely known, so it has to be estimated. This is why it is a parameter, and it is meaningful, for example, to ask what the Hessian of the log-likelihood is with respect to it. On the other hand, the degrees of freedom “parameter” is not a true parameter, since in applications it is always known beforehand and never estimated. In particular, the Fisher information with respect to this ν does not exist (although technically it could probably be calculated). In notation, the distinction between these “estimable” and “non-estimable” parameters is that the former are given in parentheses, like N(μ, σ²), while the latter are given as a subscript: t_k. If we make this into a rule, then some of the distributions will have to be changed, for example the binomial: B_n(p).
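As a quick sketch of the Hessian remark, assume an iid sample x_1, …, x_n from the exponential distribution with rate λ. The symbols L for the likelihood and I for the Fisher information are introduced here only for illustration.

```latex
\log L(\lambda)
  = \sum_{i=1}^{n} \log\bigl(\lambda e^{-\lambda x_i}\bigr)
  = n \log\lambda - \lambda \sum_{i=1}^{n} x_i,
\qquad
\frac{\partial^2 \log L}{\partial \lambda^2} = -\frac{n}{\lambda^2},
\qquad
I(\lambda)
  = -\operatorname{E}\!\left[\frac{\partial^2 \log L}{\partial \lambda^2}\right]
  = \frac{n}{\lambda^2}.
```

Here the Hessian is a single second derivative because λ is the only parameter being estimated.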
Both n and T are viable symbols to denote the sample size. T is more frequent in time series models, whereas n is more frequent in iid settings. However, it should be forbidden to use these symbols to denote anything other than the sample size (as the Numerical methods for linear least squares article does, for example); otherwise it would cause too much confusion.
The content of the Mathematical Formulas section, while sensible advice, is not really specific to statistics (or probability). I would propose adding it to MOS:MATH. Alternatively, move it to the bottom under a General Advice section, to make the statistics-specific guidance more prominent. — 3mta3 ( talk) 12:51, 16 May 2010 (UTC)