From Wikipedia, the free encyclopedia

To let a linear model represent non-linear functions of x, we can apply the model not to x itself but to a non-linearly transformed input φ(x).

Options

1. Choose a very generic φ, such as the infinite-dimensional feature mapping implicitly used by kernel machines based on the RBF kernel. (Such generic mappings tend to generalize poorly on advanced problems.)

2. Manually engineer φ.

3. Use deep learning to learn φ: y = f(x; θ, w) = φ(x; θ)^T w, where θ parameterizes the learned feature mapping and w maps φ(x) to the output (sketched below).
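As a rough sketch of option 3 (the layer sizes, random initialization, and function names here are illustrative assumptions, not taken from the text above), a network parameterizes φ with learnable weights θ and keeps a linear read-out w on top; training would then adjust θ and w together by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 2 inputs, 4 learned features, 1 output.
theta_W = rng.normal(size=(2, 4))   # parameters θ of the feature map φ
theta_c = np.zeros(4)
w = rng.normal(size=4)              # linear weights applied to φ(x; θ)

def phi(x, W, c):
    # Learned non-linear feature map φ(x; θ); here a single ReLU layer.
    return np.maximum(0.0, x @ W + c)

def f(x, W, c, w):
    # y = f(x; θ, w) = φ(x; θ)^T w
    return phi(x, W, c) @ w

print(f(np.array([0.0, 1.0]), theta_W, theta_c, w))
```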

Example: Learning (approximating) the XOR function

Goal Function

The target is the XOR function y = f*(x), evaluated on the four points X = {[0, 0], [0, 1], [1, 0], [1, 1]}; the model f(x; θ) should match f* on these points, which can be posed as minimizing the mean squared error over X.
If we choose a linear model with parameters w and b, the model is defined as f(x; w, b) = x^T w + b. Minimizing the mean squared error over the four points yields w = 0 and b = 1/2, so the model outputs 1/2 everywhere. A function that outputs a constant value for all of the inputs is not very useful; see the worked check below.
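To check the constant-output claim concretely, here is a small least-squares fit of f(x; w, b) = x^T w + b to the four XOR points (a sketch using NumPy; the variable names are our own). The best-fit parameters come out as w ≈ 0 and b = 1/2, and every prediction is 0.5:

```python
import numpy as np

# The four XOR input points and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias b is fitted alongside w.
A = np.hstack([X, np.ones((4, 1))])

# Minimize mean squared error: the best linear fit in closed form.
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:2], params[2]

print(w, b)        # w ≈ [0, 0], b = 0.5
print(A @ params)  # the model outputs 0.5 on every input
```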

If we add a hidden layer h, such that h = f^(1)(x; W, c) and y = f^(2)(h; w, b), the complete model is f(x; W, c, w, b) = f^(2)(f^(1)(x)).

A non-linear function is needed to turn this composition of linear maps into a non-linear model; otherwise the whole network would still be linear in x. This is usually done with an affine transformation followed by a fixed non-linear activation function:

h = g(W^T x + c), where W provides the weights of the linear transformation, c the biases, and g is the activation function applied element-wise.

The default recommended activation function is the rectified linear unit, or ReLU, defined as g(z) = max{0, z}.

The complete network can now be specified as f(x; W, c, w, b) = w^T max{0, W^T x + c} + b.
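Putting the pieces together, here is a minimal NumPy sketch of that complete network. The specific weight values are one well-known exact solution for XOR with this architecture (they are not given in the text above), and the forward pass reproduces the XOR outputs:

```python
import numpy as np

def relu(z):
    # Rectified linear unit g(z) = max{0, z}, applied element-wise.
    return np.maximum(0.0, z)

def network(x, W, c, w, b):
    # f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
    h = relu(x @ W + c)   # hidden layer h = g(W^T x + c)
    return h @ w + b

# One exact solution: hidden weights W, hidden biases c, output weights w, output bias b.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(network(X, W, c, w, b))   # -> [0. 1. 1. 0.]
```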
