This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
During backpropagation learning for the normal path
and for the skipper paths (note that they are nearly identical)
In both cases we have
If the skippers have fixed weights, then they will not be updated. If they can be updated, then the rule will be an ordinary backprop update rule.
In the general case there can be skipper weight matrices, thus
As the learning rules are similar, the weight matrices can be merged and learned in the same step. — Preceding unsigned comment added by Petkond (talk • contribs) 23:58, 19 August 2018 (UTC)
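The equations of the original comment were lost in extraction, but the claim itself (that the update rule for a learnable skip-path weight matrix has the same outer-product form as the normal path, since both paths share the same output error signal) can be checked numerically. The sketch below is my own illustration, not from the comment; the layer sizes and the name `Ws` for the skipper weight matrix are arbitrary.

```python
import numpy as np

# Hypothetical minimal residual layer: y = W2 @ relu(W1 @ x) + Ws @ x,
# where Ws is a learnable "skipper" weight matrix (names are illustrative).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1))
t = rng.standard_normal((3, 1))   # target
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((3, 5))
Ws = rng.standard_normal((3, 4))  # skip-path weights

def forward(W1, W2, Ws, x):
    h = np.maximum(W1 @ x, 0.0)   # normal path
    return W2 @ h + Ws @ x        # skip path added in

# Backprop: the error signal delta at the output is shared by both paths,
# so both weight updates have the same outer-product form.
h = np.maximum(W1 @ x, 0.0)
y = forward(W1, W2, Ws, x)
delta = y - t                     # dE/dy for E = 0.5 * ||y - t||^2
grad_W2 = delta @ h.T             # normal-path rule
grad_Ws = delta @ x.T             # skipper rule: identical form, input x instead of h

# Numerical check on one entry of Ws confirms the analytic gradient.
E = lambda W: 0.5 * np.sum((forward(W1, W2, W, x) - t) ** 2)
eps = 1e-6
Wp = Ws.copy()
Wp[0, 0] += eps
num = (E(Wp) - E(Ws)) / eps
assert abs(num - grad_Ws[0, 0]) < 1e-4
```

With fixed skipper weights one would simply never apply `grad_Ws`; when they are learnable, the two gradients can indeed be computed and applied in the same backprop step, as the comment says.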
I wrote During later learning it will stay closer to the manifold and thus learn faster.
but now it is Towards the end of training, when all layers are expanded, it stays closer to the manifold and thus learns faster.
I would say the rephrasing is wrong. Initial learning with skipped layers will bring the solution somewhat close to the manifold. When skipping is progressively dropped, with further learning in progress, the network will stay close to the manifold during that learning. Staying close to the manifold is not something that happens only during final training.
Jeblad (talk) 20:27, 6 March 2019 (UTC)
I wrote The intuition on why this work is that the neural network collapses into fewer layers in the initial phase, which makes it easier to learn, and then gradually expands as it learns more of the feature space.
which is now Skipping effectively compresses the network into fewer layers in the initial training stages, which speeds learning.
I believe it is wrong to say this is a compression of layers, as there is no learned network to be compressed at this point. It would be more correct to say that the initial simplified network, which is easier to train due to less severe vanishing gradients, is gradually expanded into a more complex network.
Jeblad (talk) 20:33, 6 March 2019 (UTC)
Agree that "simplified" makes more sense than "compressed". I think the idea of the network being (effectively) expanded as training progresses is conveyed by the rest of the paragraph, no? AliShug (talk) 01:07, 9 March 2019 (UTC)
I have no idea why DenseNets are linked to Sparse network. DenseNet is a moniker used for a specific way to implement residual neural networks. If the link text had been "dense networks" it could have made sense to link to an opposite. Jeblad (talk) 20:51, 6 March 2019 (UTC)
The biological analog section seems to say that cortical layer VI neurons receive significant input from layer I; I haven't been able to find any references for this. The notion that 'skip' synapses exist in biology does seem to be supported, but I haven't been able to find any existing sources that explicitly compare residual ANNs with biological systems - if this section is speculation, it should be removed. Any source (even a blog post) would be fine. AliShug ( talk) 22:03, 11 March 2019 (UTC)