AFAIK, it's using a Deep Neural Network; which means, the inputs are, basically,...

boomzilla · on July 29, 2014

Yep, they try to learn an image's high-level features by training an autoencoder (that is, a transform that takes an image and tries to reproduce the same image) with an hourglass-shaped multilayer network. Here is a very readable paper by Hinton himself that describes the approach:

http://www.cs.toronto.edu/~hinton/science.pdf
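
To make the autoencoder idea concrete, here is a rough sketch in plain Python. It is not the method from the paper (which, as far as I remember, pretrains a stack of RBMs before fine-tuning); it is just a single narrow hidden layer trained with ordinary backprop on made-up one-hot data. The class name, layer sizes, and learning rate are my own assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_matrix(rows, cols, rng):
    # Small random weights; the range is an arbitrary choice for this toy.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyAutoencoder:
    """Hourglass-shaped net: n_in -> n_hidden -> n_in, with n_hidden < n_in."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_enc = init_matrix(n_hidden, n_in, rng)  # encoder weights
        self.b_enc = [0.0] * n_hidden
        self.w_dec = init_matrix(n_in, n_hidden, rng)  # decoder weights
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        # Sigmoid hidden layer: the narrow "waist" of the hourglass.
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        # Linear output layer that tries to reproduce the input.
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, data):
        # Mean squared reconstruction error over the data set.
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train_step(self, x, lr=0.1):
        h = self.encode(x)
        y = self.decode(h)
        # Gradient of 0.5 * sum((y - x)^2) with respect to the output.
        d_y = [yi - xi for yi, xi in zip(y, x)]
        # Backpropagate through the decoder and the sigmoid (before updating).
        d_h = [sum(self.w_dec[i][j] * d_y[i] for i in range(len(y)))
               * h[j] * (1.0 - h[j])
               for j in range(len(h))]
        for i in range(len(y)):
            for j in range(len(h)):
                self.w_dec[i][j] -= lr * d_y[i] * h[j]
            self.b_dec[i] -= lr * d_y[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.w_enc[j][k] -= lr * d_h[j] * x[k]
            self.b_enc[j] -= lr * d_h[j]


# Toy "images": four one-hot vectors squeezed through a 2-unit bottleneck.
data = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]

model = TinyAutoencoder(n_in=4, n_hidden=2)
loss_before = model.loss(data)
for _ in range(500):
    for x in data:
        model.train_step(x)
loss_after = model.loss(data)
```

The target is the input itself, so the only thing the network can learn is a compressed code in the hidden layer; comparing `loss_before` and `loss_after` shows whether the reconstruction improved.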

tiger10guy · on July 29, 2014

I'm pretty sure there's no autoencoder involved; it just looks like a vanilla conv net.

This is the implementation: http://torontodeeplearning.github.io/convnet/

3rd3 · on July 29, 2014

Could it be worthwhile to augment the data with simple image features? For example, the human visual system is believed to rely on high-level (top-down) as well as local (bottom-up) features, although that might simply be because things have to be compressed for the low nerve count in the optic nerve.

agibsonccc · on July 29, 2014

A deep net (to be specific: a deep belief network, which is a series of stacked RBMs, not stacked denoising autoencoders; there is a difference) can usually benefit from a moving-window approach (slicing an image up into chunks) to simulate a convolutional net. This can help a deep net generalize better.
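
A rough sketch of the moving-window idea in plain Python, in case it helps. The function name, the window size, and the stride are made up for illustration; a real pipeline would pick them to suit the data and would probably use an array library.

```python
def sliding_windows(image, size, stride):
    """Yield (row, col, patch) for each size x size window of a 2D list."""
    height = len(image)
    width = len(image[0])
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            # Copy the size x size block whose top-left corner is (r, c).
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r, c, patch


# A toy 4x4 "image" whose pixel values are 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Non-overlapping 2x2 windows; a stride smaller than size gives overlap.
patches = list(sliding_windows(image, size=2, stride=2))
```

Each patch can then be flattened and fed to the net as its own training example, so the net sees local chunks rather than the whole image at once.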

That being said, even deep learning requires some sort of feature engineering at times (even if it's pretty good with either Hessian-free training or pretraining).

The main thing with images is making sure you scale them.
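
As a sketch of one way to do that, assuming "scaling" here means normalizing pixel intensities rather than resizing: standardize each image to zero mean and unit variance. The function name and the `eps` guard are my own choices, and other schemes (such as dividing by 255) are just as common.

```python
import math


def standardize(pixels, eps=1e-8):
    """Shift and scale a flat list of pixel values to zero mean, unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    # eps avoids division by zero for a completely flat image.
    return [(p - mean) / (std + eps) for p in pixels]


raw = [0, 64, 128, 255]    # made-up 8-bit intensities
scaled = standardize(raw)  # same pixels, now centered around zero
```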

The trick with deep belief networks in particular is to make sure the RBMs have the right visible and hidden units (Hinton recommends Gaussian Visible, Rectified Linear Hidden).

Happy to answer other questions as well!

im3w1l · on July 29, 2014

I think it is a convolutional network trained only with gradient descent, since clicking "source code" links to the convnet project.