This article is rated Start-class on Wikipedia's content assessment scale.
This article may be too technical for most readers to understand. (September 2010)
(Particularly the section about the risk function.)
In the Example section, the text claimed to use a variance of 2.25, when in fact it refers to the standard deviation (I also suspect it should have been 2.5 instead of 2.25, since 2.25 produces a barely noticeable deviation in the graph, but only the figure's author can confirm). I've fixed the term (see Image for justification); however, in accordance with the formula in the Definition section, it should also be an average, as stated below.
— Preceding unsigned comment added by Rafael Siqueira Telles Vieira ( talk • contribs) 16:37, 25 October 2019 (UTC)
Note that the figure shows rather than as the caption says. --anon
In my experience calling the technique Parzen windowing is limited specifically to time-series analysis, and mainly in engineering fields. In general statistics (and in statistical machine learning), the term kernel density estimation is much more common. Therefore I'd propose it be moved there. As an aside, the attribution to Parzen is also historically problematic, since Rosenblatt introduced the technique into the statistics literature in 1956, and it had been used in several more obscure papers as early as the 1870s, and again in the early 1950s. -- Delirium 22:59, 26 August 2006 (UTC)
What is x in the equation? --11:06, 5 October 2006 (UTC)
The technique called Parzen window here is called kernel density estimation in nonparametric statistics. It seems to me to be a much more general term and much clearer for people searching for it. The comment above states the same problem. I also agree that the article should refer to the Parzen-Rosenblatt notion of a kernel, and not just to Parzen's. The definition of a Parzen-Rosenblatt kernel should later be added to the kernel (statistics) page. —The preceding unsigned comment was added by Gpeilon ( talk • contribs).
Hi, I just noticed that the optimal global bandwidth in Rosenblatt, M. The Annals of Mathematical Statistics, Vol. 42, No. 6. (Dec., 1971), pp. 1815-1842. has an additional factor of . Just an oversight, or is there a reason for the difference that I'm missing? Best, Yeteez 18:34, 24 May 2007 (UTC)
In addition, what is the lowercase n in the optimal bandwidth? It is undefined. CnlPepper ( talk) 17:18, 13 December 2007 (UTC)
Shouldn't the in the formula for K(x) be dropped, on the grounds that it is already there in the form of h in the formula for ?
-- Santaclaus 15:45, 7 June 2007 (UTC)
Though I am not sure whether it violates the guidelines on what Wikipedia is, I like the example section. But I would like to see the commands in some non-proprietary language, e.g. R. -- Ben T/ C 14:41, 2 July 2007 (UTC)
Can somebody please add a paragraph on what the practical use of Kernel density estimation is? Provide an example from statistics or econometrics? Thanks!
Isn't a Gaussian with variance of 1 totally arbitrary? On the other hand, using the PDF of your measurement tool as a kernel seems quite meaningful. For example, if you are measuring people's heights and know you can measure to a std. dev of 1/4", then convolving the set of measured heights by a Gaussian with std. dev of 1/4" seems like it captures everything you know about the data set. For example, in the limit of one sample, the estimation would reflect our best guess of the distribution for that one person. 155.212.242.34 22:07, 6 November 2007 (UTC)
--> I agree with the above poster that a standard gaussian is arbitrary. True, gaussians are often used as the kernel, but the variance of the gaussian is usually selected based on the "coarseness" of the desired result, and therefore not necessarily 1. —Preceding unsigned comment added by Zarellam ( talk • contribs) 07:08, 17 April 2009 (UTC)
The variance then is the parameter h and can still be chosen as desired. I fixed this on the page. 170.223.0.55 ( talk) 14:57, 27 April 2009 (UTC)
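To make the point in this thread concrete, here is a minimal Python sketch (not taken from the article) of a Gaussian kernel density estimate in which the bandwidth h rescales the standard normal kernel. The names gaussian_kernel, kde and normal_pdf are illustrative assumptions. With a single sample, the estimate coincides with a normal density centred on that sample whose standard deviation is h (so its variance is h squared), which matches the measurement-error reading suggested above.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: the kernel K with unit variance."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """Kernel density estimate at x: (1 / (n h)) * sum of K((x - x_i) / h)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
```

For instance, kde(70.1, [70.0], 0.25) and normal_pdf(70.1, 70.0, 0.25) agree up to floating-point rounding, so choosing h is the same as choosing the spread of each Gaussian bump.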
It appears that the section Properties tells us how to select . However, I found several things confusing here, and would like to see these described more clearly.
First, if I'm interpreting correctly, and would be constants for the standard normal kernel that was earlier stated to be the common choice, e.g. and . The fact that these constants for the standard normal were not given confused me and left me thinking that maybe there was a notational inconsistency or something, or that I wasn't interpreting something right. So please, mention what these constants are for the standard kernel choice.
Next and more serious, I'm still confused about . It appears that we're going to find in order to find But apparently must be estimated as a function of . I mean if is the underlying true distribution, which we don't know, then we don't know , so the implication is that we'd need to use , which is defined in terms of . So it seems like has a circular definition. —Preceding unsigned comment added by 98.207.54.162 ( talk) 19:12, 7 February 2009 (UTC)
The description of a histogram as KDE with a boxcar kernel is not entirely accurate. In a histogram the bin centers are fixed, whereas in KDE the kernel is centered on each data point. See this page for more explanation. -- Nubicles ( talk) 04:19, 20 February 2009 (UTC)
Hence I would suggest deleting the sentence about the histogram---I would say it is significantly misleading. -- Spireguy ( talk) 02:25, 19 May 2010 (UTC)
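The difference described in this thread can be sketched in a few lines of Python. This is only an illustration under assumed names (boxcar_kde, histogram_density) and made-up data: the histogram counts points falling in a fixed bin, while the boxcar kernel estimate counts points within h of the evaluation point.

```python
import math


def boxcar_kde(x, data, h):
    """KDE with the boxcar kernel K(u) = 1/2 for |u| <= 1, centred on each data point."""
    n = len(data)
    total = sum(0.5 for xi in data if abs((x - xi) / h) <= 1.0)
    return total / (n * h)


def histogram_density(x, data, origin, width):
    """Density histogram with fixed bins [origin + k*width, origin + (k+1)*width)."""
    k = math.floor((x - origin) / width)
    count = sum(1 for xi in data if math.floor((xi - origin) / width) == k)
    return count / (len(data) * width)


# Two made-up points that straddle the bin edge at 1.0.
data = [0.9, 1.1]
kde_value = boxcar_kde(1.0, data, 0.5)               # window [0.5, 1.5] holds both points
hist_value = histogram_density(1.0, data, 0.0, 1.0)  # bin [1, 2) holds only one point
```

Both estimators use a total window of width 1 here, yet they disagree at x = 1.0 because the histogram's bin edge separates the two points while the kernel window does not.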
I suggest that the article Multivariate kernel density estimation be merged into this one. They cover essentially the same topic, with all formulas being essentially the same. The multivariate case is arguably more complicated, but it fits better as a subsection of this article than as a standalone topic. Note that currently this page already mentions multivariate estimation, at least in the examples section. // stpasha » 17:30, 24 September 2010 (UTC)
One consideration in the merge/not-merge question is that of article length, and there may be some guidelines on this. I think that articles with worthwhile content as separate articles would be too long to merge. In addition to the topics mentioned in the discussion above, there are some others that are yet to be mentioned: density estimation with known bounds on the range of values, and density estimation for circular data. In any case, it seems that the next move should be to remove the "multivariate" material from the present article and merge it into the multivariate article so that the latter has a logical structure. Things could be left there, but this preliminary step would presumably make it easier to do a full merge if that were decided on. Melcombe ( talk) 08:55, 27 September 2010 (UTC)
Several changes have been made recently about technical mathematical details in the definition which are not entirely correct.
The example using the old faithful data is flawed and has to be removed. The faithful data is not continuous: out of the 272 observations there are only 51 unique data values. This implies that the data could not have come from a continuous density. The function kde assumes that the data is continuous, not discrete. The theory presented in the article does not apply to such discrete data. Someone please add an appropriate example. —Preceding unsigned comment added by 64.235.198.242 ( talk) 01:24, 12 October 2010 (UTC)
I understand that all measurements are discretizations of continuous data; this is precisely the problem! If I have $U$ uniform on $[0,1]$ and discretize it to the binary $B=I_{U<0.5}$, is $B$ still a continuous random variable? It is not, regardless of $U$. Thus your comment does not change the fact that the data fed to the estimator is discrete, but the estimator smooths it as if it were continuous. The data fed to the kernel estimator should have been continuous (up to floating-point accuracy) in the first place. This example shows that this particular kernel estimator cannot distinguish between discrete and continuous data. A valid estimator will smooth continuous data and NOT smooth discrete data. In other words, a properly working kernel estimator will give bandwidth = 0 when it is given discrete data. The true distribution of a discrete set of data, say, (2,2,2,3,3,3,3,4,5,6) should not be smoothed. In conclusion, the example is logically flawed: two mistakes cancel each other to give the appearance of a correct example. The mistakes are: discrete data is fed to the estimator instead of continuous data (mistake 1), the effect of which is then canceled out by an improperly working kernel estimator (mistake 2). Two different errors have by chance canceled each other out, so that the correct answer was obtained by happenstance. —Preceding unsigned comment added by 132.204.251.179 ( talk) 15:45, 15 October 2010 (UTC)
It is our judgement whether we want to assume that the underlying distribution is actually continuous or discrete (or maybe a mixture?).
In the case of the Old Faithful eruptions it is natural to suppose that the eruption interval is a continuous r.v., since more or less every physical variable is continuous (except for those that by construction take values in a discrete set -- such as the number of particles in a certain process, or certain quantum numbers, etc). In the dataset that we have, the eruption intervals were rounded up to the nearest integer, which really cannot stop us from using the kernel estimator to recover the density. // stpasha » 02:47, 19 October 2010 (UTC)
kde. You can use a discrete one if you want to put probability mass on other integer values, but you have to be clear from the beginning whether the data is assumed to be discrete or continuous. I am afraid professional statisticians get confused on this issue. Perhaps it is because they know little about smoothing of discrete data. The fact that many people like you think the example is OK does not make it OK. Popular opinion does not justify incorrect mathematics. Unfortunately, I do not have time to argue with the prevailing view, no matter how wrong it is, so I will let this one go and hope people will see reason at some future point in time.
This is all a moot point if the bandwidth is >> than the discretization error. In the other limit, where the bandwidth is << than the discretization error, the flaw in applying KDE will become very apparent. ---( 192.161.76.82 ( talk)) —Preceding undated comment added 23:48, 18 October 2017 (UTC)
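The two regimes mentioned in the last comment can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than anything from the article: it uses a Gaussian kernel, the integer-valued sample quoted earlier in this thread, and two arbitrarily chosen bandwidths. The helper names gaussian_kde and peak_to_gap_ratio are hypothetical.

```python
import math


def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


# Integer-valued sample quoted earlier in this thread (grid spacing 1).
data = [2, 2, 2, 3, 3, 3, 3, 4, 5, 6]


def peak_to_gap_ratio(h):
    """Estimate on a grid value (3) divided by the estimate halfway to the next one (3.5)."""
    return gaussian_kde(3.0, data, h) / gaussian_kde(3.5, data, h)


wide = peak_to_gap_ratio(1.0)     # bandwidth comparable to the grid spacing
narrow = peak_to_gap_ratio(0.05)  # bandwidth far below the grid spacing
```

With the wider bandwidth the ratio stays close to 1, so the rounding is hardly visible in the estimate; with the narrow bandwidth the ratio becomes enormous, because the estimate collapses into spikes on the integer grid.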
a) Discussion of boundary effects. These effects are important in many practical applications. Statements in the article about optimal AMISE rates are incorrect in the presence of boundary bias.
b) High-order kernels. If you are willing to assume that the kernel is not necessarily positive, these should be mentioned. Again, statements in the article about optimal AMISE rates are incorrect if high-order kernels are allowed (these kernels lead to better asymptotic convergence rates). — Preceding unsigned comment added by 95.32.237.124 ( talk) 22:23, 19 August 2011 (UTC)
Could someone check the example code? In h <- hpi(x=waiting), the function hpi is not accessible (maybe removed). In plot(fhat, drawpoints=TRUE), drawpoints is no longer a parameter. — Preceding unsigned comment added by 143.169.52.102 ( talk) 09:45, 14 August 2012 (UTC)
This article mentions: “after [Emanuel Parzen] and [Murray Rosenblatt], who are usually credited with independently creating it in its current form”. This is not correct. Parzen refers to Rosenblatt (so clearly NOT independent, and btw both papers are in the same journal), and the paper of Parzen does not provide the simple equations that we find in this paper.
If there is no objection, I'd like to correct this. Phdb ( talk) 22:08, 25 September 2011 (UTC)
The first sentence says "In statistics, kernel density estimation (KDE) is a **non-parametric** way to estimate the probability density function of a random variable." Then further down is: "The bandwidth of the kernel is a **free parameter** which exhibits a strong influence on the resulting estimate." So which is it? — Preceding unsigned comment added by 115.187.246.104 ( talk) 10:05, 3 June 2014 (UTC)
It is sort of parametric, though, in that it relies on an assumption about the smoothness of the distribution... which is equivalent to parameterizing the distribution by the amplitudes of the kernels. Maybe a clarification would be helpful? 2601:4A:857F:4880:F8EC:DAAF:2844:296F ( talk) 23:57, 12 December 2020 (UTC)
"K(•) is the kernel — a symmetric but not necessarily positive function that integrates to one"
Shouldn't it be the exact opposite: "K(•) is the kernel — a positive but not necessarily symmetric function that integrates to one"?
-- Lucas Gallindo ( talk) 21:40, 18 October 2014 (UTC)
Yes, I agree. I'll change it. DWeissman ( talk) 21:20, 25 November 2014 (UTC)
Dr. Guillen has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:
I would insert a note at the end of the Bandwidth selection section.
Bandwidth selection for kernel density estimation in heavy-tailed distributions is difficult.
I would insert this reference Bolance, C., Buch-Larsen, T., Guillén, M. and Nielsen, J.P. (2005) “Kernel density estimation for heavy-tailed distributions using the Champernowne transformation” Statistics, 39, 6, 503-518.
We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.
Dr. Guillen has published scholarly research which seems to be relevant to this Wikipedia article:
ExpertIdeasBot ( talk) 06:47, 8 July 2015 (UTC)
Done! - User:Harish victory 09:17, 15 November 2015 (UTC)
This section starts off with an incorrect claim:
> Kernel density estimation may have certain limitations because it is based on Gaussian distribution statistics
There are KDE methods that use the Gaussian distribution as a starting point (pilot estimator). There are KDE methods that use Gaussian kernels. These are not necessary or exhaustive choices. There are also KDE methods for heavy-tailed distributions, such as variable bandwidth KDE and data transformations.
Then there is the unsupported claim:
> In this connection, head/tail breaks[27] and its deduced TIN-based density estimation[28] can better characterize the heavy-tailed data.
Looking at the two references on the arXiv, how can you say that the TIN method better characterizes a certain kind of data when (1) you never compared the TIN method to KDE (neither paper mentions KDE) and (2) you have not worked out the statistical properties of the TIN method (not published in a stats journal, btw)?
---( 192.161.76.82 ( talk)) —Preceding undated comment added 23:06, 18 October 2017 (UTC)
I guess it must be "Intuitively one wants to choose h as large as the data will allow" in the Definition section, because you can let h go to zero, achieving a perfect fit between data and probability density (a sum of delta functions). Am I right or wrong? Syspedia ( talk) 10:12, 16 November 2017 (UTC)
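The limit described in this question can be checked numerically. The following Python sketch is an illustration under stated assumptions: a Gaussian kernel, six made-up data values, and hypothetical helper names in_sample_loglik and leave_one_out_loglik. It compares the log-likelihood of the sample under the estimate built from the same sample with a leave-one-out version that excludes each point from its own estimate.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def in_sample_loglik(data, h):
    """Sum of log f_h(x_i), where f_h is built from all points, including x_i."""
    n = len(data)
    total = 0.0
    for xi in data:
        dens = sum(gaussian_kernel((xi - xj) / h) for xj in data) / (n * h)
        total += math.log(dens)
    return total


def leave_one_out_loglik(data, h):
    """Sum of log f_{h,-i}(x_i), where x_i is left out of its own estimate."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        dens = sum(gaussian_kernel((xi - xj) / h)
                   for j, xj in enumerate(data) if j != i) / ((n - 1) * h)
        if dens <= 0.0:  # floating-point underflow for very small h
            return float("-inf")
        total += math.log(dens)
    return total


# Made-up sample for illustration only.
sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
```

On this sample the in-sample log-likelihood keeps growing as h shrinks (compare h = 1.0, 0.2 and 0.05), which reflects the perfect-fit limit of spikes on the data, whereas the leave-one-out value is lower at h = 0.2 than at h = 1.0. This suggests that in-sample fit alone cannot be used to pick h, and that the bandwidth has to be chosen by a criterion that penalises the spiky limit.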
Histogram bin edges need to be defined: -4, -2, 0, ..., 8.
Also, if there are 6 data points, why are the bars 1/12 high? — Preceding unsigned comment added by 2600:1700:EB40:86D0:707F:873F:613E:BAB8 ( talk) 00:01, 18 December 2021 (UTC)
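On the bar height: in a density histogram each point contributes 1 / (n × bin width) to the bar of the bin it falls in, so with n = 6 points and bins of width 2 a bin holding a single point has height 1/12. A short Python sketch of that arithmetic follows; the six data values and the function name density_histogram are assumptions made for illustration, while the bin edges are the ones listed above.

```python
from fractions import Fraction


def density_histogram(data, edges):
    """Bar heights count / (n * width) for the bins [edges[k], edges[k + 1])."""
    n = len(data)
    heights = []
    for left, right in zip(edges[:-1], edges[1:]):
        width = right - left
        count = sum(1 for x in data if left <= x < right)
        heights.append(Fraction(count, n * width))
    return heights


edges = list(range(-4, 10, 2))              # -4, -2, 0, ..., 8
data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]    # six made-up points
heights = density_histogram(data, edges)
total_area = sum(h * 2 for h in heights)    # every bin has width 2
```

With this normalisation the bars enclose a total area of 1, which is what makes the histogram comparable to a density estimate; a bin containing two points has height 2/12 = 1/6.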
Tree-structured Parzen estimators link here, but shouldn't they have their own article? There is more to say, see here: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f Biggerj1 ( talk) 21:04, 12 October 2022 (UTC)