In statistics, the Hodges–Lehmann estimator is a robust and nonparametric estimator of a population's location parameter. For populations that are symmetric about one median, such as the Gaussian or normal distribution or the Student t-distribution, the Hodges–Lehmann estimator is a consistent and median-unbiased estimate of the population median. For non-symmetric populations, the Hodges–Lehmann estimator estimates the "pseudo-median", which is closely related to the population median.
The Hodges–Lehmann estimator was originally proposed for estimating the location parameter of one-dimensional populations, but it has since been used for many other purposes. It has been used to estimate the differences between the members of two populations. It has also been generalized from univariate populations to multivariate populations, which produce samples of vectors.
It is based on the Wilcoxon signed-rank statistic. In statistical theory, it was an early example of a rank-based estimator, an important class of estimators in both nonparametric statistics and robust statistics. The Hodges–Lehmann estimator was proposed in 1963 independently by Pranab Kumar Sen and by Joseph Hodges and Erich Lehmann, and so it is also called the "Hodges–Lehmann–Sen estimator". [1]
In the simplest case, the "Hodges–Lehmann" statistic estimates the location parameter for a univariate population. [2] [3] Its computation can be described quickly. For a dataset with n measurements z_1, …, z_n, form the set of all pairs (z_i, z_j) with i ≤ j. This specifically includes the self-pairs (z_i, z_i), a detail that many secondary sources incorrectly omit, and so the set has n(n + 1)/2 elements. The mean of each pair is computed. Finally, the median of these n(n + 1)/2 averages is defined to be the Hodges–Lehmann estimator of location.
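The procedure above can be sketched in Python. This is a minimal illustration, not a reference implementation. The helper names `walsh_averages` and `hodges_lehmann` are hypothetical. The pairwise averages (commonly called Walsh averages) are built in full, which costs O(n²) time and memory.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(data):
    """Return the n(n+1)/2 pairwise averages (z_i + z_j)/2 over all i <= j.

    combinations_with_replacement yields each index pair with i <= j exactly
    once, including the self-pairs (z_i, z_i) that the estimator requires.
    """
    return [(a + b) / 2 for a, b in combinations_with_replacement(data, 2)]


def hodges_lehmann(data):
    """One-sample Hodges-Lehmann location estimate: median of the Walsh averages."""
    if not data:
        raise ValueError("need at least one observation")
    return median(walsh_averages(data))


sample = [1.0, 2.0, 3.0, 100.0]
print(len(walsh_averages(sample)))  # 10, i.e. 4 * 5 / 2
print(hodges_lehmann(sample))       # 2.75
```

For the sample `[1, 2, 3, 100]` the estimate is 2.75. That is close to the sample median of 2.5 and far from the sample mean of 26.5, which the single outlier drags upward.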
The Hodges–Lehmann statistic also estimates the difference between two populations. Take two sets of data with m and n observations. The pairs formed by taking one observation from each set make up their Cartesian product, which contains m × n pairs of points. Each such pair defines one difference of values, and the Hodges–Lehmann statistic is the median of the m × n differences. [4]
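A matching sketch of the two-sample statistic follows. The function name `hodges_lehmann_shift` is hypothetical. The sign convention, with differences taken as second sample minus first, is an assumption; reversing the arguments flips the sign of the result.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all m * n differences.

    Differences are taken as y_j - x_i (an assumed sign convention).
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    differences = [yj - xi for xi in x for yj in y]  # one difference per pair
    return median(differences)


x = [1, 2, 3]
y = [4, 5, 6, 30]
print(hodges_lehmann_shift(x, y))  # 3.5
```

With these samples, the twelve differences have median 3.5. By contrast, the difference of the sample means is 9.25 because of the outlying value 30.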
For a population that is symmetric, the Hodges–Lehmann statistic estimates the population's median. It is a robust statistic with a breakdown point of 0.29, which means that the statistic remains bounded even if nearly 30 percent of the data have been contaminated. This robustness is an important advantage over the sample mean, which has a breakdown point of zero. Because the mean depends linearly on every observation, even a single outlier can move it arbitrarily far. The sample median is even more robust, with a breakdown point of 0.50. [5] The Hodges–Lehmann estimator is also much better than the sample mean when estimating mixtures of normal distributions. [6]
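The figure of 0.29 can be motivated by a short large-sample argument. It assumes that a fraction ε of the observations is contaminated and that each contaminated observation can be moved arbitrarily far. A pairwise average stays bounded only when both of its observations are clean. The median of the averages stays bounded as long as more than half of them do. This is a heuristic sketch rather than a formal proof.

```latex
\begin{aligned}
\text{fraction of pairwise averages with both points clean} &\approx (1-\varepsilon)^2 \quad (n \to \infty),\\
\text{median of the averages stays bounded} &\iff (1-\varepsilon)^2 > \tfrac{1}{2}
  \iff \varepsilon < 1 - \tfrac{1}{\sqrt{2}} \approx 0.293.
\end{aligned}
```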
For many symmetric distributions, including the normal distribution, the Hodges–Lehmann statistic is more efficient than the sample median. For the normal distribution, the Hodges–Lehmann statistic is nearly as efficient as the sample mean. For the Cauchy distribution (Student t-distribution with one degree of freedom), the Hodges–Lehmann statistic is infinitely more efficient than the sample mean, which is not a consistent estimator of the median. [5]
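The efficiency comparison under normality can be checked with a small Monte Carlo experiment. This is a sketch under assumed settings: standard normal samples of size n = 30, 2000 replications and a fixed seed. The function names are hypothetical. For reference, the known asymptotic efficiencies relative to the mean under normality are 3/π ≈ 0.955 for the Hodges–Lehmann estimator and 2/π ≈ 0.637 for the median. Finite-sample estimates from a run like this will differ somewhat.

```python
import random
from itertools import combinations_with_replacement
from statistics import mean, median, pvariance


def hodges_lehmann(data):
    # Median of the pairwise averages (z_i + z_j)/2 over all pairs with i <= j.
    return median([(a + b) / 2 for a, b in combinations_with_replacement(data, 2)])


def sampling_variances(n=30, reps=2000, seed=1):
    """Monte Carlo variances of three location estimators on N(0, 1) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = {"mean": [], "median": [], "hodges_lehmann": []}
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates["mean"].append(mean(sample))
        estimates["median"].append(median(sample))
        estimates["hodges_lehmann"].append(hodges_lehmann(sample))
    return {name: pvariance(values) for name, values in estimates.items()}


variances = sampling_variances()
# Relative efficiency against the sample mean: Var(mean) / Var(estimator).
for name, var in variances.items():
    print(f"{name:>15}: efficiency vs mean = {variances['mean'] / var:.3f}")
```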
For non-symmetric populations, the Hodges–Lehmann statistic estimates the population's "pseudo-median", [7] a location parameter that is closely related to the median. The difference between the median and the pseudo-median is relatively small, so the distinction is neglected in elementary discussions. Like the spatial median, [8] the pseudo-median is well defined for all distributions of random variables of dimension two or greater. For one-dimensional distributions a pseudo-median always exists, although it need not be unique. Like the median, the pseudo-median is defined even for heavy-tailed distributions that lack a (finite) mean. [9]
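To make the distinction concrete, the sketch below computes the pseudo-median of a skewed distribution. It uses the definition of the one-dimensional pseudo-median as the median of (X1 + X2)/2 for independent X1, X2 drawn from the distribution. The Exponential(1) distribution and the function name `exp_pseudo_median` are choices made for illustration.

```python
import math


def exp_pseudo_median(tol=1e-12):
    """Pseudo-median of the Exponential(1) distribution.

    The pseudo-median is the median of (X1 + X2)/2 for independent X1, X2.
    For Exponential(1), S = X1 + X2 has a Gamma(2, 1) law with
    P(S <= t) = 1 - exp(-t) * (1 + t), so we bisect for P(S <= t) = 1/2.
    """
    lo, hi = 0.0, 10.0  # CDF(0) = 0 and CDF(10) > 1/2, so the root is bracketed
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - math.exp(-mid) * (1 + mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 4  # median of S, halved


print(f"median        = {math.log(2):.4f}")          # 0.6931
print(f"pseudo-median = {exp_pseudo_median():.4f}")  # 0.8392
print("mean          = 1.0000")
```

For this right-skewed distribution, the pseudo-median (about 0.839) lies between the median (ln 2 ≈ 0.693) and the mean (1).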
The one-sample Hodges–Lehmann statistic need not estimate any population mean, which for many distributions does not exist. The two-sample Hodges–Lehmann estimator need not estimate the difference of two means or the difference of two (pseudo-)medians. Rather, it estimates the median of the distribution of the difference Y − X, where X and Y are independent random variables drawn respectively from the two populations. [4]
The Hodges–Lehmann univariate statistics have several generalizations in multivariate statistics: [10]