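| Parameters | $N \in \{1, 2, 3, \ldots\}$ (integer), $q \in [0, \infty)$ (real), $s > 0$ (real) |
|---|---|
| Support | $k \in \{1, 2, \ldots, N\}$ |
| PMF | $\frac{1/(k+q)^{s}}{H_{N,q,s}}$ |
| CDF | $\frac{H_{k,q,s}}{H_{N,q,s}}$ |
| Mean | $\frac{H_{N,q,s-1}}{H_{N,q,s}} - q$ |
| Mode | $1$ |
| Entropy | $\frac{s}{H_{N,q,s}} \sum_{k=1}^{N} \frac{\ln(k+q)}{(k+q)^{s}} + \ln(H_{N,q,s})$ |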
In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.
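The probability mass function is given by

$$f(k; N, q, s) = \frac{1/(k+q)^{s}}{H_{N,q,s}},$$

where $H_{N,q,s}$ is given by

$$H_{N,q,s} = \sum_{i=1}^{N} \frac{1}{(i+q)^{s}},$$

which may be thought of as a generalization of a harmonic number. In the formula, $k$ is the rank of the data, and $q$ and $s$ are parameters of the distribution. In the limit as $N$ approaches infinity (with $s > 1$), this becomes the Hurwitz zeta function $\zeta(s, q + 1)$, using the convention $\zeta(s, a) = \sum_{n=0}^{\infty} (n + a)^{-s}$. For finite $N$ and $q = 0$ the Zipf–Mandelbrot law becomes Zipf's law. For infinite $N$ and $q = 0$ it becomes a zeta distribution.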
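As a minimal sketch of the definition, the probability of rank $k$ is $(k+q)^{-s}$ divided by the normalizing sum $H_{N,q,s} = \sum_{i=1}^{N} (i+q)^{-s}$. The function names `generalized_harmonic` and `zipf_mandelbrot_pmf`, and the parameter values in the example, are illustrative assumptions rather than part of any standard library.

```python
def generalized_harmonic(n, q, s):
    """Return H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)^s."""
    return sum((i + q) ** -s for i in range(1, n + 1))


def zipf_mandelbrot_pmf(k, n, q, s):
    """Probability of rank k under the Zipf-Mandelbrot law with parameters n, q, s."""
    if not 1 <= k <= n:
        return 0.0  # outside the support {1, ..., n}
    return (k + q) ** -s / generalized_harmonic(n, q, s)


# Illustrative (assumed) parameters: 10 ranks, shift q = 2.7, exponent s = 1.2.
example_pmf = [zipf_mandelbrot_pmf(k, 10, 2.7, 1.2) for k in range(1, 11)]

# Setting q = 0 recovers Zipf's law on a finite vocabulary.
zipf_pmf = [zipf_mandelbrot_pmf(k, 3, 0.0, 1.0) for k in range(1, 4)]
```

The probabilities sum to one by construction and decrease with rank, so rank 1 is always the mode. Recomputing the normalizer on every call keeps the sketch short; code that evaluates many ranks would compute it once and reuse it.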
The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.
If one plots the frequency rank of words in a moderately sized text corpus against their number of occurrences (actual frequencies), one obtains a power-law distribution with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size: the harmonic series (s = 1) does not converge, so it cannot be normalized over an unbounded vocabulary, whereas the Zipf–Mandelbrot generalization with s > 1 does converge. Furthermore, there is evidence that the closed class of function words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of content words that vary by topic, field and register. [1]
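The convergence point can be checked numerically. The sketch below compares partial sums of $\sum_i (i+q)^{-s}$ for s = 1 and for s = 2; the helper name `partial_sum` and the cut-offs 10^4 and 10^5 are assumptions chosen only for illustration.

```python
import math


def partial_sum(n, q, s):
    """Sum of 1 / (i + q)^s for i = 1..n (the normalizer for n ranks)."""
    return sum((i + q) ** -s for i in range(1, n + 1))


# s = 1, q = 0: the harmonic series. Each tenfold increase in n adds
# roughly ln(10) to the sum, so it grows without bound.
harmonic_growth = partial_sum(10**5, 0.0, 1.0) - partial_sum(10**4, 0.0, 1.0)

# s = 2, q = 0: the partial sums settle near pi^2 / 6, so the same
# tenfold increase in n adds almost nothing.
convergent_growth = partial_sum(10**5, 0.0, 2.0) - partial_sum(10**4, 0.0, 2.0)
convergent_limit = partial_sum(10**5, 0.0, 2.0)

print(harmonic_growth, math.log(10))
print(convergent_growth, convergent_limit, math.pi**2 / 6)
```

With s = 1 the normalizer depends on where the vocabulary is cut off, while with s > 1 it approaches a finite limit and the distribution remains well defined as the vocabulary grows.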
In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law. [2]
Within music, many metrics used to measure "pleasing" music conform to Zipf–Mandelbrot distributions. [3]