I attended the Neural Information Processing Systems (NIPS) 2015 conference this week in Montreal. It was an incredible experience, like drinking from a firehose of information. Special thanks to my employer Dropbox for sending me to the show (we're hiring!)
Here are some of the trends I noticed this week. Note that they are biased towards deep learning and reinforcement learning, as those are the tracks I attended at the conference:
Most state-of-the-art neural network architectures, whether for perception, language translation, or other tasks, are moving beyond simple feed-forward or convolutional architectures. In particular, they mix and match different neural network techniques such as LSTMs, convolutions, custom objective functions, multiple cortical columns, etc.
Most state-of-the-art systems incorporate LSTMs to give the network a sense of memory that can capture repeating patterns.
Some, but not all, systems are beginning to bake in a notion of "attention", allowing the neural network to learn where to place its "focus" as it works on a task. Attention isn't a standard part of the neural network pipeline yet, but it is showing up here and there.
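To make the idea concrete, here is a minimal sketch (my own illustration, not any particular paper's model) of soft attention: the network scores each input position, turns those scores into weights, and focuses on a weighted combination of the inputs rather than treating them all equally.

```python
import numpy as np

def soft_attention(query, memory):
    # memory: (timesteps, dim) of states to attend over; query: (dim,) current state
    scores = memory @ query                      # one relevance score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax: where to place "focus"
    return weights @ memory                      # attention-weighted summary

memory = np.random.randn(10, 8)                  # e.g. encoder hidden states
query = np.random.randn(8)                       # e.g. current decoder state
context = soft_attention(query, memory)
```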
The work on Neural Turing Machines, i.e., training a neural network in a differentiable, end-to-end way so it can learn algorithms, remains interesting but is not yet being harnessed in real applications. These models are still complex and have only been able to tackle toy problems so far.
Convolutional neural networks first showed up in computer vision but are now used in some NLP systems, while LSTMs, and recurrent neural networks more generally, first made their mark in NLP tasks like sequence-to-sequence translation but are now cropping up in computer vision tasks.
In addition, the intersection of computer vision and NLP remains fertile, with common embeddings being used for tasks like image captioning.
As neural network architectures and their objective functions become more sophisticated and customized, deriving their gradients for backpropagation by hand becomes increasingly difficult and error prone. The latest toolkits, like Google's TensorFlow, bake in automatic symbolic differentiation: you build up your architecture and objective function, and the toolkit automatically figures out the correct differentiation across the pieces so that error gradients can backpropagate during training.
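As a rough sketch of what this looks like, here is a tiny example using TensorFlow's graph-based API as it existed around this time (exact function names have shifted across later releases): you define a model and a loss, and ask the library for the gradients rather than deriving them by hand.

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])   # input features
y = tf.placeholder(tf.float32, shape=[None, 1])   # targets
w = tf.Variable(tf.zeros([3, 1]))                  # weights to learn
b = tf.Variable(tf.zeros([1]))

pred = tf.matmul(x, w) + b                         # simple linear model
loss = tf.reduce_mean(tf.square(pred - y))         # custom objective function

# No hand-derived gradients: TensorFlow differentiates the graph for us.
grads = tf.gradients(loss, [w, b])

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())        # 2015-era initializer name
    gw, gb = sess.run(grads, feed_dict={
        x: np.random.randn(8, 3).astype(np.float32),
        y: np.random.randn(8, 1).astype(np.float32),
    })
```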
Multiple teams showed different ways to drastically compress the number of weights needed for a trained model: binarization, fixed-point (reduced precision) representations, iterative pruning and fine-tuning steps, and more; a rough sketch of two of these ideas follows below.
These techniques open up important possibilities for applications. It might become possible to fit very sophisticated trained models entirely on mobile devices, so that tasks like speech recognition no longer pay the latency cost of a round trip to the cloud. In addition, because both the memory footprint and the computational cost of a compressed model are much lower, we might be able to query it at high frame rates, say 30 FPS, opening up new kinds of near real-time computer vision tasks on mobile devices using sophisticated trained neural network models.
These compression techniques were on display at NIPS, but I didn't see anyone leveraging them in applications yet. I suspect we might see that in 2016.
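Here is a minimal sketch (my own illustration, not any team's published code) of two of the compression ideas mentioned above, applied to a dummy weight matrix.

```python
import numpy as np

w = np.random.randn(256, 256).astype(np.float32)   # pretrained layer weights

# Magnitude-based pruning: zero out the smallest weights; in practice the
# surviving sparse weights are fine-tuned and the process is repeated.
threshold = np.percentile(np.abs(w), 90)            # keep only the largest 10%
pruned = np.where(np.abs(w) >= threshold, w, 0.0)

# Binarization: replace each weight with +/- a single scale factor, so the
# layer can be stored with one bit per weight plus one float.
scale = np.abs(w).mean()
binarized = scale * np.sign(w)

print("fraction nonzero after pruning:", np.count_nonzero(pruned) / w.size)
```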
While no major reinforcement learning results were shown at NIPS this year, the Deep Reinforcement Learning workshop was standing room only and showed the excitement around combining deep neural networks with reinforcement learning's ability to plan.
Exciting work is happening in domains such as end-to-end robotics, using deep and reinforcement learning together to go directly from raw sensory data to actual motor actuators. We are moving beyond just classification to trying to figure out how to put planning and action into the equation. Much more work remains but the early work is exciting.
Batch normalization is now considered a standard part of the neural network toolkit and was referenced throughout work at the conference.
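For readers who haven't run into it, here is a minimal sketch of what batch normalization does to one mini-batch of activations (training-time behavior only; real implementations also track running statistics for use at inference time).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize each feature
    return gamma * x_hat + beta              # learned scale and shift

x = np.random.randn(32, 64).astype(np.float32)   # batch of 32, 64 features
out = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
```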
You need to let researchers innovate on new neural network approaches, and then be able to scale those approaches into real production applications quickly. Google's TensorFlow is one of the few libraries that allows this: researchers can quickly create new network topologies as graphs, and these can then be run in different configurations across single machines, multiple devices, or mobile devices, using mainstream programming languages like Python or C++.
However, note that it is still early days for TensorFlow; Caffe is here to stay for now. TensorFlow's single-device performance is not yet as strong as other frameworks'; Google has announced that a distributed version using Kubernetes and gRPC will be released soon, but distributed training does not work yet; and TensorFlow does not yet run on Amazon's AWS. The future is exciting for TensorFlow, though.
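To illustrate the "same graph, different configurations" idea, here is a small sketch using TensorFlow's graph-mode API from this era, pinning parts of a computation to specific devices (the device strings are just examples; whether a GPU is available depends on your machine).

```python
import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

with tf.device('/gpu:0'):              # the heavy op goes on the GPU if present
    c = tf.matmul(a, b)

# allow_soft_placement lets TensorFlow fall back to the CPU if no GPU exists.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(c))
```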
Please note that this is my personal blog — the views expressed on these pages are mine alone and not those of my employer.