The Transform Technology Summits start October 13th with Low-Code/No Code: Enabling Enterprise Agility. Register now!
Will deep learning really live up to its promise? We dont actually know. But if its going to, it will have to assimilate how classical computer science algorithms work. This is what DeepMind is working on, and its success is important to the eventual uptake of neural networks in wider commercial applications.
Founded in 2010 with the goal of creating AGI artificial general intelligence, a general purpose AI that truly mimics human intelligence DeepMind is on the forefront of AI research. The company is also backed by industry heavyweights like Elon Musk and Peter Thiel.
Acquired by Google in 2014, DeepMind has made headlines for projects such as AlphaGo, a program that beat the world champion at the game of Go in a five-game match, and AlphaFold, which found a solution to a 50-year-old grand challenge in biology.
Now DeepMind has set its sights on another grand challenge: bridging the worlds of deep learning and classical computer science to enable deep learning to do everything. If successful, this approach could revolutionize AI and software as we know them.
Petar Velikovi is a senior research scientist at DeepMind. His entry into computer science came through algorithmic reasoning and algorithmic thinking using classical algorithms. Since he started doing deep learning research, he has wanted to reconcile deep learning with the classical algorithms that initially got him excited about computer science.
Meanwhile, Charles Blundell is a research lead at DeepMind who is interested in getting neural networks to make much better use of the huge quantities of data theyre exposed to. Examples include getting a network to tell us what it doesnt know, to learn much more quickly, or to exceed expectations.
When Velikovi met Blundell at DeepMind, something new was born: a line of research that goes by the name of Neural Algorithmic Reasoning (NAR), after a position paper the duo recently published.
NAR traces the roots of the fields it touches upon and branches out to collaborations with other researchers. And unlike much pie-in-the-sky research, NAR has some early results and applications to show for itself.
Velikovi was in many ways the person who kickstarted the algorithmic reasoning direction in DeepMind. With his background in both classical algorithms and deep learning, he realized that there is a strong complementarity between the two of them. What one of these methods tends to do really well, the other one doesnt do that well, and vice versa.
Usually when you see these kinds of patterns, its a good indicator that if you can do anything to bring them a little bit closer together, then you could end up with an awesome way to fuse the best of both worlds, and make some really strong advances, Velikovi said.
When Velikovi joined DeepMind, Blundell said, their early conversations were a lot of fun because they have very similar backgrounds. They both share a background in theoretical computer science. Today, they both work a lot with machine learning, in which a fundamental question for a long time has been how to generalize how do you work beyond the data examples youve seen?
Algorithms are a really good example of something we all use every day, Blundell noted. In fact, he added, there arent many algorithms out there. If you look at standard computer science textbooks, theres maybe 50 or 60 algorithms that you learn as an undergraduate. And everything people use to connect over the internet, for example, is using just a subset of those.
Theres this very nice basis for very rich computation that we already know about, but its completely different from the things were learning. So when Petar and I started talking about this, we saw clearly theres a nice fusion that we can make here between these two fields that has actually been unexplored so far, Blundell said.
The key thesis of NAR research is that algorithms possess fundamentally different qualities to deep learning methods. And this suggests that if deep learning methods were better able to mimic algorithms, then generalization of the sort seen with algorithms would become possible with deep learning.
To approach the topic for this article, we asked Blundell and Velikovi to lay out the defining properties of classical computer science algorithms compared to deep learning models. Figuring out the ways in which algorithms and deep learning models are different is a good start if the goal is to reconcile them.
For starters, Blundell said, algorithms in most cases dont change. Algorithms are comprised of a fixed set of rules that are executed on some input, and usually good algorithms have well-known properties. For any kind of input the algorithm gets, it gives a sensible output, in a reasonable amount of time. You can usually change the size of the input and the algorithm keeps working.
The other thing you can do with algorithms is you can plug them together. The reason algorithms can be strung together is because of this guarantee they have: Given some kind of input, they only produce a certain kind of output. And that means that we can connect algorithms, feeding their output into other algorithms input and building a whole stack.
People have been looking at running algorithms in deep learning for a while, and its always been quite difficult, Blundell said. As trying out simple tasks is a good way to debug things, Blundell referred to a trivial example: the input copy task. An algorithm whose task is to copy, where its output is just a copy of its input.
It turns out that this is harder than expected for deep learning. You can learn to do this up to a certain length, but if you increase the length of the input past that point, things start breaking down. If you train a network on the numbers 1-10 and test it on the numbers 1-1,000, many networks will not generalize.
Blundell explained, They wont have learned the core idea, which is you just need to copy the input to the output. And as you make the process more complicated, as you can imagine, it gets worse. So if you think about sorting through various graph algorithms, actually the generalization is far worse if you just train a network to simulate an algorithm in a very naive fashion.
Fortunately, its not all bad news.
[T]heres something very nice about algorithms, which is that theyre basically simulations. You can generate a lot of data, and that makes them very amenable to being learned by deep neural networks, he said. But it requires us to think from the deep learning side. What changes do we need to make there so that these algorithms can be well represented and actually learned in a robust fashion?
Of course, answering that question is far from simple.
When using deep learning, usually there isnt a very strong guarantee on what the output is going to be. So you might say that the output is a number between zero and one, and you can guarantee that, but you couldnt guarantee something more structural, Blundell explained. For example, you cant guarantee that if you show a neural network a picture of a cat and then you take a different picture of a cat, it will definitely be classified as a cat.
With algorithms, you could develop guarantees that this wouldnt happen. This is partly because the kind of problems algorithms are applied to are more amenable to these kinds of guarantees. So if a problem is amenable to these guarantees, then maybe we can bring across into the deep neural networks classical algorithmic tasks that allow these kinds of guarantees for the neural networks.
Those guarantees usually concern generalizations: the size of the inputs, the kinds of inputs you have, and their outcomes that generalize over types. For example, if you have a sorting algorithm, you can sort a list of numbers, but you could also sort anything you can define an ordering for, such as letters and words. However, thats not the kind of thing we see at the moment with deep neural networks.
Another difference, which Velikovi noted, is that algorithmic computation can usually be expressed as pseudocode that explains how you go from your inputs to your outputs. This makes algorithms trivially interpretable. And because they operate over these abstractified inputs that conform to some preconditions and post-conditions, its much easier to reason theoretically about them.
That also makes it much easier to find connections between different problems that you might not see otherwise, Velikovi added. He cited the example of MaxFlow and MinCut as two problems that are seemingly quite different, but where the solution of one is necessarily the solution to the other. Thats not obvious unless you study it from a very abstract lens.
Theres a lot of benefits to this kind of elegance and constraints, but its also the potential shortcoming of algorithms, Velikovi said. Thats because if you want to make your inputs conform to these stringent preconditions, what this means is that if data that comes from the real world is even a tiny bit perturbed and doesnt conform to the preconditions, Im going to lose a lot of information before I can massage it into the algorithm.
He said that obviously makes the classical algorithm method suboptimal, because even if the algorithm gives you a perfect solution, it might give you a perfect solution in an environment that doesnt make sense. Therefore, the solutions are not going to be something you can use. On the other hand, he explained, deep learning is designed to rapidly ingest lots of raw data at scale and pick up interesting rules in the raw data, without any real strong constraints.
This makes it remarkably powerful in noisy scenarios: You can perturb your inputs and your neural network will still be reasonably applicable. For classical algorithms, that may not be the case. And thats also another reason why we might want to find this awesome middle ground where we might be able to guarantee something about our data, but not require that data to be constrained to, say, tiny scalars when the complexity of the real world might be much larger, Velikovi said.
Another point to consider is where algorithms come from. Usually what happens is you find very clever theoretical scientists, you explain your problem, and they think really hard about it, Blundell said. Then the experts go away and map the problem onto a more abstract version that drives an algorithm.The experts then present their algorithm for this class of problems, which they promise will execute in a specified amount of time and provide the right answer. However, because the mapping from the real-world problem to the abstract space on which the algorithm is derived isnt always exact, Blundell said, it requires a bit of an inductive leap.
With machine learning, its the opposite, as ML just looks at the data. It doesnt really map onto some abstract space, but it does solve the problem based on what you tell it.
What Blundell and Velikovi are trying to do is get somewhere in between those two extremes, where you have something thats a bit more structured but still fits the data, and doesnt necessarily require a human in the loop. That way you dont need to think so hard as a computer scientist. This approach is valuable because often real-world problems are not exactly mapped onto the problems that we have algorithms for and even for the things we do have algorithms for, we have to abstract problems. Another challenge is how to come up with new algorithms that significantly outperform existing algorithms that have the same sort of guarantees.
When humans sit down to write a program, its very easy to get something thats really slow for example, that has exponential execution time, Blundell noted. Neural networks are the opposite. As he put it, theyre extremely lazy, which is a very desirable property for coming up with new algorithms.
There are people who have looked at networks that can adapt their demands and computation time. In deep learning, how one designs the network architecture has a huge impact on how well it works. Theres a strong connection between how much processing you do and how much computation time is spent and what kind of architecture you come up with theyre intimately linked, Blundell said.
Velikovi noted that one thing people sometimes do when solving natural problems with algorithms is try to push them into a framework theyve come up with that is nice and abstract. As a result, they may make the problem more complex than it needs to be.
The traveling [salesperson], for example, is an NP complete problem, and we dont know of any polynomial time algorithm for it. However, there exists a prediction thats 100% correct for the traveling [salesperson], for all the towns in Sweden, all the towns in Germany, all the towns in the USA. And thats because geographically occurring data actually has nicer properties than any possible graph you could feed into traveling [salesperson], Velikovi said.
Before delving into NAR specifics, we felt a naive question was in order: Why deep learning? Why go for a generalization framework specifically applied to deep learning algorithms and not just any machine learning algorithm?
The DeepMind duo wants to design solutions that operate over the true raw complexity of the real world. So far, the best solution for processing large amounts of naturally occurring data at scale is deep neural networks, Velikovi emphasized.
Blundell noted that neural networks have much richer representations of the data than classical algorithms do. Even inside a large model class thats very rich and complicated, we find that we need to push the boundaries even further than that to be able to execute algorithms reliably. Its a sort of empirical science that were looking at. And I just dont think that as you get richer and richer decision trees, they can start to do some of this process, he said.
Blundell then elaborated on the limits of decision trees.
We know that decision trees are basically a trick: If this, then that. Whats missing from that is recursion, or iteration, the ability to loop over things multiple times. In neural networks, for a long time people have understood that theres a relationship between iteration, recursion, and the current neural networks. In graph neural networks, the same sort of processing arises again; the message passing you see there is again something very natural, he said.
Ultimately, Blundell is excited about the potential to go further.
If you think about object-oriented programming, where you send messages between classes of objects, you can see its exactly analogous, and you can build very complicated interaction diagrams and those can then be mapped into graph neural networks. So its from the internal structure that you get a richness that seems might be powerful enough to learn algorithms you wouldnt necessarily get with more traditional machine learning methods, Blundell explained.
Go here to read the rest:
DeepMind aims to marry deep learning and classic algorithms - VentureBeat