In statistics and machine learning, double descent is the phenomenon in which models with a small number of parameters and models with an extremely large number of parameters both achieve small test error, while models whose number of parameters is roughly equal to the number of training data points have large test error. [2]
Early observations of double descent in specific models date back to 1989, [3] [4] while double descent as a broader phenomenon shared by many models gained popularity around 2019. [5] [6] [7] The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters cause significant generalization error (an extrapolation of the bias-variance tradeoff) [8] and empirical observations in the 2010s that some modern machine learning models tend to perform better the larger they are. [6] [9]
Double descent has been shown to occur in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. [10]
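A minimal sketch of this setting, not taken from [10] and with all constants chosen only for illustration, is the following: fit minimum-norm least squares on Gaussian data while sweeping the number of features. The test error typically peaks near the interpolation threshold (number of parameters ≈ number of training points) and descends again in the overparameterized regime.

```python
# Illustrative sketch (assumed setup, not the cited paper's code): double
# descent for minimum-norm least squares with isotropic Gaussian covariates
# and isotropic Gaussian label noise.
import numpy as np

rng = np.random.default_rng(0)
n, p_max, n_test = 40, 200, 1000                 # train size, max features, test size
beta = rng.normal(size=p_max) / np.sqrt(p_max)   # ground-truth coefficients

X = rng.normal(size=(n, p_max))                  # isotropic Gaussian covariates
y = X @ beta + 0.5 * rng.normal(size=n)          # isotropic Gaussian noise
X_test = rng.normal(size=(n_test, p_max))
y_test = X_test @ beta                           # noiseless targets for risk estimate

for p in [5, 20, 35, 40, 45, 80, 200]:           # sweep the number of features used
    # Pseudoinverse gives the minimum-norm solution; it interpolates once p >= n.
    beta_hat = np.linalg.pinv(X[:, :p]) @ y
    err = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p = {p:3d}  test MSE = {err:.3f}")   # error peaks near p = n = 40
```

Running the sweep shows the classical U-shape for p < n, a spike near p = n where the design matrix becomes nearly singular and noise is amplified, and a second descent for p > n where the minimum-norm constraint regularizes the interpolating solution.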
A model of double descent in the thermodynamic limit has been analyzed using the replica method, and the result has been confirmed numerically. [11]
The scaling behavior of double descent has been found to follow a broken neural scaling law functional form. [12]
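For reference, the broken neural scaling law of [12] takes, up to notational variation, the functional form

```latex
% Broken neural scaling law (reproduced from [12], up to notation):
% y is the error, x the scaled quantity (e.g., number of parameters),
% and each break point d_i separates two power-law scaling regimes.
y = a + \bigl(b\,x^{-c_0}\bigr)\prod_{i=1}^{n}\left(1+\left(\frac{x}{d_i}\right)^{1/f_i}\right)^{-c_i f_i}
```

where double descent corresponds, roughly, to a break at which the local scaling exponent changes sign, so the error first rises and then falls again as $x$ grows past the break point.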