Joining Init.ai to build deep learning for conversational apps.
As I write, DeepMind’s algorithm is surpassing the world’s best Go players. AlphaGo plays Lee Se-dol, the world’s second-best Go player, in a set of five matches this week. Soon, the same deep learning technologies will approach human-level natural language ability. That means we, as software users, will be able to converse with our computers in a natural way. Our utterances toward our machines will have fewer constraints, yet still be correctly interpreted.
We at init.ai are bringing deep learning technology to conversational apps. We follow the research in the field, which is openly published every day, and we have seen the steady performance gains on vision and language tasks over the past several years. It is time to start building upon these proven, now-available technologies.
Who are you?
- You are developers. You build user-centric applications with backend services providing functionality through web APIs.
- You wish to simplify your product with natural language instead of a user interface.
- You do not want to invest in building deep learning or natural language processing expertise yourself.
- You want to scale your user base beyond walled gardens to vendor-neutral text media.
- You want to ensure your users experience quality natural language interactions that continuously improve.
Why is this now possible? Deep learning describes a set of trainable components that stack together in many layers. These components, neural networks, are fully differentiable: their gradient, or derivative, can be calculated throughout the network given an error signal. The error signal quantifies how wrong the network’s behavior is, and the gradient tells each component how to change to reduce that error.
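To make the idea concrete, here is a minimal, hypothetical sketch (not Init.ai’s code): a one-parameter differentiable model `y = w * x`, a squared-error signal, and a hand-derived gradient that tells the parameter how to move to shrink the error.

```python
# A toy differentiable "network" with one parameter: y = w * x.
# The error signal is squared error; its gradient with respect
# to w tells us how to change w to reduce the error.

def error(w, x, target):
    return 0.5 * (w * x - target) ** 2

def grad(w, x, target):
    # d(error)/dw, derived by hand for this one-parameter model
    return (w * x - target) * x

w, x, target, lr = 0.0, 2.0, 4.0, 0.1
before = error(w, x, target)
for _ in range(50):
    w -= lr * grad(w, x, target)   # follow the negative gradient
after = error(w, x, target)
print(before, after)  # the error shrinks toward zero
```

Stacking many such components in layers and propagating the gradient through all of them is exactly what makes deep networks trainable.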
For many years, it appeared that these components were too difficult to optimize. They lack a mathematical property that simpler machine learning algorithms have: convexity, which guarantees that an algorithm always makes progress during training toward the single best solution. But recent research shows that training large non-convex models is nearly as tractable as training convex ones.
Along any given axis, a critical point of the loss function can be a minimum, a maximum, or a saddle point. In high-dimensional models, critical points are increasingly unlikely to be local minima, so an optimization algorithm almost always has an escape route toward an accurate score. Deep learning research formulates training schemes that bypass saddle points via these routes.
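A small illustrative example of this escape behavior (my own toy function, not from any paper): `f(x, y) = x² + (y² − 1)²` has a saddle point at the origin and true minima at (0, ±1). Plain gradient descent started near the saddle slides off it along the y axis and reaches a minimum.

```python
import numpy as np

# f has a saddle at (0, 0): a minimum along x, a maximum along y.
def f(p):
    x, y = p
    return x**2 + (y**2 - 1)**2

def grad(p):
    x, y = p
    return np.array([2 * x, 4 * y * (y**2 - 1)])

p = np.array([0.5, 1e-3])   # near the saddle, barely off the y axis
for _ in range(2000):
    p -= 0.05 * grad(p)     # plain gradient descent

print(p, f(p))  # y has escaped toward 1.0; f is near 0
```

The tiny perturbation along y is enough: the saddle repels the iterate along its downhill direction, so descent does not get stuck there.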
Now that training any given model is likely, the important choice is to shape your model architecture to solve your problem. This is the differentiable programming paradigm: specify a model shape but learn the logic from data.
Many model shapes have arisen in the research, and several apply to language understanding. The word2vec algorithm represents a word with a dense vector of learned parameters. The vectors are derived from the word’s contexts within a relevant dataset. The contextual knowledge word vectors provide is like the visual context AlphaGo exploits to reduce its game search tree.
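The following toy sketch shows the intuition behind context-derived word vectors, though not the word2vec algorithm itself: here each word is represented by raw co-occurrence counts with its neighbors, whereas word2vec learns dense vectors by gradient descent. The corpus and window size are arbitrary choices for illustration.

```python
import numpy as np

# Represent each word by its co-occurrence counts with immediate
# neighbors, then compare words by cosine similarity. Words used
# in similar contexts ("cat" and "dog") end up with similar vectors.

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):               # context window of one word
        if 0 <= j < len(corpus):
            counts[index[w], index[corpus[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

cat, dog, mat = (counts[index[w]] for w in ("cat", "dog", "mat"))
print(cosine(cat, dog), cosine(cat, mat))  # cat is closer to dog
```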
Given a word vector model, the next advancement is the sequence model. The recurrent neural network is the simplest, and the long short-term memory model and the gated recurrent unit model build upon it. These models can process sequences of inputs and perform a task on each token in the sequence, or on the entire sequence.
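A minimal sketch of the simplest of these, a vanilla recurrent cell: at each token it mixes the current input with the previous hidden state. The weights below are random stand-ins; in practice they would be learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8
W_x = rng.normal(0, 0.1, (d_hidden, d_in))      # input weights
W_h = rng.normal(0, 0.1, (d_hidden, d_hidden))  # recurrent weights
b = np.zeros(d_hidden)

def rnn_step(h, x):
    # one recurrence: new state from previous state and current input
    return np.tanh(W_x @ x + W_h @ h + b)

h = np.zeros(d_hidden)
sequence = rng.normal(size=(5, d_in))  # five input tokens
per_token = []
for x in sequence:
    h = rnn_step(h, x)      # a task can read h at every token...
    per_token.append(h)
# ...or only the final h, as a representation of the whole sequence
print(h.shape)  # (8,)
```

LSTMs and GRUs replace `rnn_step` with gated updates that preserve information over longer spans, but the loop over tokens is the same.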
The sequence-to-sequence model goes further. It digests an input sequence into a summary, which it expands into an output sequence. Sequence-to-sequence can perform language translation.
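Schematically, that digest-then-expand structure looks like the following sketch: an encoder RNN compresses the input into one summary vector, and a decoder RNN unrolls that summary into output steps. All weights are random placeholders here, so the outputs are meaningless until trained.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
We_x, We_h = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
Wd_h, Wd_o = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))

def encode(tokens):
    h = np.zeros(d)
    for x in tokens:                  # digest the input sequence...
        h = np.tanh(We_x @ x + We_h @ h)
    return h                          # ...into one summary vector

def decode(summary, steps):
    h, outputs = summary, []
    for _ in range(steps):            # expand the summary...
        h = np.tanh(Wd_h @ h)
        outputs.append(Wd_o @ h)      # ...into one output per step
    return outputs

summary = encode(rng.normal(size=(6, d)))  # 6 input tokens
outputs = decode(summary, steps=4)         # 4 output tokens
print(len(outputs), summary.shape)
```

Note that the input and output lengths differ, which is what lets the same shape handle translation between languages with different word counts.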
Further developments have two prominent branches: attention-based models and automata-augmented models. Each performs both sequence and sequence-to-sequence tasks.
Attention-based models review inputs and cached intermediate states to weight their relative importance. These models are able to perform text and image question answering tasks.
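The weighting step at the heart of attention can be sketched in a few lines: score each cached state against a query, normalize the scores into a probability distribution with a softmax, and take the weighted sum. The vectors here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
states = rng.normal(size=(5, 8))   # cached intermediate states
query = rng.normal(size=8)         # what the model is looking for

scores = states @ query                  # relevance of each state
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax: weights sum to 1
context = weights @ states               # importance-weighted summary

print(weights.sum(), context.shape)
```

Because every step is differentiable, the model can learn from data which states deserve attention for a given question.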
Automata-augmented models give recurrent neural networks a differentiable memory unit. These units can be a stack, queue, deque, or random-access memory array, as in the Neural Turing Machine. Such models are algorithmically capable: they can learn sequences of steps that implement simple algorithms.
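Here is a loose sketch of what makes such memory differentiable, inspired by (but much simpler than) the Neural Turing Machine’s addressing: instead of indexing a single slot, the controller reads and writes every slot, weighted by a soft attention distribution, so the whole operation has a gradient.

```python
import numpy as np

rng = np.random.default_rng(3)
memory = np.zeros((4, 6))          # 4 slots, 6 dimensions each

def soft_weights(scores):
    w = np.exp(scores - scores.max())
    return w / w.sum()             # a distribution over slots

# Soft write: blend new content into every slot, mostly slot 0.
w_write = soft_weights(np.array([3.0, 0.0, 0.0, 0.0]))
content = rng.normal(size=6)
memory += np.outer(w_write, content)

# Soft read: a weighted average over all slots.
w_read = soft_weights(np.array([3.0, 0.0, 0.0, 0.0]))
read = w_read @ memory

cos = read @ content / (np.linalg.norm(read) * np.linalg.norm(content))
print(round(cos, 3))  # 1.0: the read recovers the stored direction
```

In a real model the write and read scores are produced by the network itself, so it can learn, by gradient descent, where to store and fetch data.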
The Neural GPU may be one of the most impressive. It improves upon the Neural Turing Machine by parallelizing its computations through convolution operations. Essentially, it learns cellular automata to implement algorithms.
At init.ai, we will employ these and other relevant models to power truly conversational apps.
Robert is a software developer working on backend services and deep learning infrastructure.
If you’re interested in more about conversational interfaces, follow him and Init.ai on Medium and Twitter. And if you’re looking to create a conversational interface for your app, service, or company, check out Init.ai.