This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Shouldn't the article define what the problem is? -- Doradus (talk) 02:29, 23 January 2015 (UTC)
Perhaps it would be useful to cite other solutions, such as:
What do you think? Miniquark (talk) —Preceding undated comment added 12:15, 9 August 2016 (UTC)
How many nodes in an unfolded RNN are viable without LSTM? I.e., where is the practical cutoff point beyond which the gradient has vanished? There must be some rule of thumb: if your patterns in time occur within fewer than N samples you can use a plain RNN, and if they span more than M samples you are better off with an LSTM? robertbowerman (talk) 04:30, 9 February 2017 (UTC)
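A rough way to see why there is no universal cutoff (a minimal sketch with illustrative constants, not from any cited source): the backpropagated gradient through T unrolled steps shrinks multiplicatively, so the viable depth depends on the weights and activation derivatives rather than on a fixed N.
<syntaxhighlight lang="python">
# Sketch (illustrative constants): in a vanilla RNN, the gradient with
# respect to an input T steps back contains a product of T Jacobians.
# For a scalar recurrence h_t = tanh(w * h_{t-1}) that product behaves
# like (w * s)^T, where s is a typical value of tanh'(z), at most 1.

def gradient_magnitude(w, s, T):
    """Rough magnitude of d h_T / d h_0 for a scalar RNN unrolled T steps."""
    return abs(w * s) ** T

for T in (5, 10, 20, 50, 100):
    print(T, gradient_magnitude(w=1.0, s=0.65, T=T))

# With |w*s| = 0.65 the gradient is ~1e-2 at T=10 and ~2e-19 at T=100,
# so the cutoff depends on the weights and activation, not on a fixed N;
# LSTMs sidestep the product through their additive cell-state path.
</syntaxhighlight>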
I really don't see the point of having both vanishing gradient and exploding gradient pages. We can just have two inbound redirects and bold both inbound terms in the lead. Should be fine IMO. — MaxEnt 00:10, 21 May 2017 (UTC)
It is a well-known problem in ML, and pretty much everyone calls it the vanishing gradient problem. Sometimes they'll say the vanishing/exploding gradient problem, but even that is rare. I've never heard it called the extreme gradient problem. Themumblingprophet (talk) 02:21, 15 April 2020 (UTC)
Fundamentally, the problem is about attractors in the parameter space of the error function; the problematic regions are stabilisers when you consider derivatives of this space parallel to various axes. This perspective is probably more abstract than the level at which most programmers operate, whereas "vanishing gradient" is reasonably concrete. 80.230.156.224 (talk) 08:55, 7 May 2024 (UTC)
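To connect that abstract picture to the concrete name (a sketch with assumed notation, not taken from the article): for hidden states <math>h_t = \sigma(W h_{t-1})</math> with pre-activations <math>z_t = W h_{t-1}</math>, the chain rule across <math>T</math> steps gives a product of Jacobians whose norm is bounded multiplicatively,
<math display="block">
\frac{\partial h_T}{\partial h_0} = J_T J_{T-1} \cdots J_1,
\qquad
J_t = \operatorname{diag}\!\bigl(\sigma'(z_t)\bigr)\,W,
\qquad
\left\lVert \frac{\partial h_T}{\partial h_0} \right\rVert
\le \Bigl(\lVert W\rVert \,\max_{t}\max_{i} \lvert \sigma'(z_{t,i})\rvert\Bigr)^{T},
</math>
so a factor below 1 drives the derivative along that direction to zero exponentially in <math>T</math> (the attractor behaviour described above), while a factor above 1 makes it explode.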