Is Deep Learning the future of AI?
Deep learning is perhaps the most significant advance in artificial intelligence (AI) of the last decade. It is, without a doubt, a revolution that has redirected the research and specialization of a vast scientific and technical community all over the world.
Before going any further, it is worth approaching the subject with some caution, because words can be misleading. The common use of striking terms such as “deep learning”, “machine learning”, or even “artificial intelligence”, added to the expectations created by some scientists and indiscriminately inflated by the press, tends to give society the feeling that we are at a more advanced technological stage than we actually are. Take the very familiar term “artificial intelligence”: do we really know what it means? The popular understanding of AI corresponds rather to what experts call “artificial self-awareness”, “artificial consciousness”, or “general artificial intelligence”. Artificial self-awareness, or artificial consciousness, refers to a machine’s ability to be aware of its own existence and thought processes, while general artificial intelligence refers to machines capable of understanding and learning any intellectual task that a human being can perform. The quotation marks here are not arbitrary, as these concepts also require precise definitions within the technological world, definitions that may not fully coincide with the meaning given to them in less technical fields. In any case, what do experts mean when they talk simply about AI?
The truth is that even among experts the term has become a hodgepodge. AI generally refers to technology that displays capabilities resembling cognitive processes, basically learning and solving problems of a certain complexity. One could argue here that a pocket calculator should then legitimately be considered AI, yet few people would subscribe to such an opinion. The fact is that what we consider “complex” evolves year after year. At some point, the mix of human expectations and scientific-technical developments created a somewhat paradoxical phenomenon known as the AI effect, namely: “as soon as AI successfully solves a problem, the problem is no longer part of AI”. Douglas Hofstadter put it smartly by quoting Larry Tesler’s Theorem: “AI is whatever hasn’t been done yet”, which furthermore reveals, in my opinion, a certain shadow of philosophical doubt behind the purposes underlying AI.
From a more everyday perspective, when an expert says that she works on AI she usually means one of two things: either she is talking to a non-specialist audience that might be confused by more technical terms, or she wants to express that her approach to solving technical problems is open to methods not usually counted among machine learning techniques, typically complex multi-layered frameworks or systems designed with ad-hoc architectures.
And what is “machine learning”? Machine learning occupies the heart of AI and is primarily concerned with families of algorithms that either extract models from data or quickly retrieve already processed instances to solve a problem by approximation. Given that machine learning has traditionally absorbed most of the attention paid to AI, we can say that AI has a more practical side, while machine learning is more theoretical in nature. In other words, machine learning aims to generalize problem-solving methodologies, while AI (which commonly incorporates machine learning) aims to be the technical solution to a real-life challenge. Figure 1 illustrates how the concepts discussed in this article are framed within each other in a broad sense.
Deep learning or neural networks?
Within the world of machine learning, deep learning has been attracting almost absolute attention for some time now, although it is important to mention that machine learning includes many more techniques that are not deep learning. Such alternatives have been, and continue to be, used in fundamental research as well as in industrial and commercial applications. In fact, some of them present peculiarities that, in certain cases, overcome the limitations of deep learning, and can even tackle problems that are beyond the scope of the most common deep learning architectures. On the other hand, some of the weaknesses and problems discussed below as typical of deep learning are also common in other forms of machine learning.
Leaving aside the other options, it is practically impossible to talk about deep learning without mentioning the equally well-known neural networks, as the two terms are used interchangeably even by experts. However, it should be noted that neural networks have existed since the late 1950s, while deep learning emerged around 2010. Are they then the same or not? A neural network is nothing more than a type of computational architecture that emulates the biological brain and tries to make the most of its parallel, highly interconnected structure. Deep learning is built on neural networks with many internal layers, which give the structure its “depth” and make it capable of creating maps in which abstractions are formed and stored to achieve the desired learning. This skill is called “feature extraction”. To give a simple example of its implications, a deep learning algorithm trained to identify people in images should be able to abstract a feature corresponding to the concept “face”. In other words, deep learning is somehow about capturing internally what makes a thing itself (something like the Platonic essence). But note (and here is a first limitation of deep learning) that this is an ideal aspiration: the ability of deep learning to learn essential characteristics is strongly linked to the richness and variability of the data used during training. The abstractions obtained may well contain elements that are not necessary but contingent. A classic example used with students is a deep learning algorithm that was taught to identify animals in pictures but then, during evaluation, identified as “wolf” any image containing intense blue skies and snowy backgrounds. This happened because, in the collection of training images, the most frequent and discriminating traits of those labeled “wolf” were precisely blue skies and snowy backgrounds.
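The wolf anecdote can be boiled down to a tiny numerical sketch. The code below is a hypothetical toy, not a real network: a nearest-centroid classifier trained on two made-up features, where the “snowy background” feature correlates almost perfectly with the “wolf” label in the biased training set, so the background, not the animal, ends up driving the decision.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Toy features per image: [snowy_background, animal_shape].
# In this biased training set, "wolf" images almost always have snowy
# backgrounds, while the shape feature barely separates the classes.
wolves = np.column_stack([rng.normal(1.0, 0.1, n),   # snowy background
                          rng.normal(0.6, 0.3, n)])  # noisy shape cue
dogs   = np.column_stack([rng.normal(0.0, 0.1, n),   # indoor background
                          rng.normal(0.5, 0.3, n)])

# Nearest-centroid "model": a crude stand-in for a trained network.
c_wolf, c_dog = wolves.mean(axis=0), dogs.mean(axis=0)

def predict(x):
    x = np.asarray(x, float)
    d_wolf = np.linalg.norm(x - c_wolf)
    d_dog = np.linalg.norm(x - c_dog)
    return "wolf" if d_wolf < d_dog else "dog"

# A dog photographed against snow is misclassified: the background
# feature dominates the learned decision.
print(predict([1.0, 0.5]))
```

The correlation the model latches onto is real in the training data; it is simply contingent rather than essential, which is exactly the failure described above.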
Again, to clarify terms: traditional neural networks do not have the aforementioned ability to extract features (or at least not so effectively and flexibly), which is, on the other hand, an intrinsic component of deep learning. From a different perspective, we can say that, while neural networks are a type of computational structure, deep learning refers to a capacity attributable to certain algorithms. What made it possible to move from neural networks to deep learning were some specific technical discoveries about the former, but above all the spectacular and continuous increase in computing power and data generation over the last two decades. After all, deep learning is a method of learning by brute force.
Limitations of deep learning
But what is the potential of deep learning? What is it able to do? Actually, many things. To put it briefly, it can solve any technological challenge in which the capacity to extract traits and patterns at different levels plays an important role in generalizing knowledge: for example, voice recognition, language translation, self-driving vehicles, identification of objects, animals, or people in images, or learning to play certain games better than top-level experts. In short, deep learning enables machines to abstract and synthesize a given context, and therefore to make precise descriptions, predictions, prescriptions, and recommendations within that context. The most elementary working scheme of deep learning (and, in general, of supervised machine learning) is an application in which the machine is trained with a huge set of labeled examples; later on, the machine is able to label by itself objects or situations it has never seen before.
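That elementary supervised scheme (train on labeled examples, then label unseen cases) can be illustrated with a minimal sketch. The snippet below uses a k-nearest-neighbors vote on synthetic 2-D points; the cluster positions and the “cat”/“dog” labels are purely illustrative, and a real deep network would of course replace this with many layers of learned features.

```python
import numpy as np

# Labeled training set: two clusters of points on a plane, tagged with
# illustrative labels. This is the "huge set of labeled examples".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (50, 2)),
               rng.normal([+2.0, 0.0], 0.5, (50, 2))])
y = np.array(["cat"] * 50 + ["dog"] * 50)

def classify(point, k=5):
    """Label an unseen point by majority vote among its k nearest examples."""
    d = np.linalg.norm(X - np.asarray(point, float), axis=1)
    nearest = y[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# A point never seen during training is labeled by generalization.
print(classify([-1.8, 0.2]))
```

The design choice is deliberate: nearest-neighbor methods make the “learning by stored examples” half of the machine learning definition above concrete without any neural machinery.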
Perhaps a good way of assessing the potential of deep learning is to look at what it cannot do, that is, to study its limitations. Not surprisingly, the limitations of deep learning also outline the limitations of AI, and thus some of the challenges that AI researchers currently face or must face in the future. I have already mentioned that deep learning is brute force; as such, it carries out a painful, bombarding kind of learning in which the machine processes tons of data (instances, examples, scenarios). These data try to cover everything to be learned in its maximum variability and representation. There is no subtlety in deep learning. To take an example, in order to learn to identify a cat in an image, deep learning requires training with thousands of images containing cats. This brutal way of understanding learning reveals why deep learning works so well for developing competitive AIs in most games (board games and computer games alike). In games, contexts are closed and controlled; the universe of possibilities, although it may be huge, is limited and expressible with a textual or numerical representation that covers it completely. In a game, practically all the information we need in order to choose a winning strategy is always available, and any move that leads to new situations is potentially estimable and predictable by applying the rules of the game itself.
In any case, why does deep learning need so much data? Why this bombardment? Why this waste of computing power? Why so many pictures? The answer points to one of the most immediate shortcomings of deep learning: it does not understand its own abstractions. It is deep only in terms of architecture, not in the internalization of the discovered patterns and models. Therefore, it does not conceptualize, nor is it able to carry out inferences or build hierarchical structures of knowledge as a human being or even an animal does. If we want to imagine deep learning as a thinking being, it is much closer to an idiot savant than to a philosopher, a theorist, or an engineer (in the purest sense of the term). In fact, deep learning does not even really think; it is simply a very efficient method of detecting correlations. Its patterns are not concepts or rules of logical or causal relationship, but overlaps of coinciding factors in flat contexts. Therefore, the abstractions it generates are opaque not only to deep learning itself, but also to us. To give a simple example: even if deep learning is capable of grasping a numerical function and correctly guessing the results for untrained values, we will hardly be able to obtain an equation from the learned model. There are leading lines of research that try to improve the explainability of deep learning, not because we expect to directly find rules that allow us to understand the phenomena, or revolutionary ways of expressing knowledge, but because we want clues about where to focus our attention in order to better understand the problem under analysis. In other words, understanding the models learned by deep learning is as frustrating as trying to understand what people are thinking by observing the activation of their neurons.
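The point about grasping a numerical function without yielding a usable equation, which also anticipates the interpolation-versus-extrapolation problem discussed below, can be sketched in a few lines. Here a degree-9 polynomial fit stands in for a flexible black-box model (an assumption for illustration only): it mimics sin(x) almost perfectly inside the training range, its coefficients reveal no recognizable “sine equation”, and outside the training range it fails badly.

```python
import numpy as np

# "Train" a flexible model on sin(x) sampled over a bounded range.
# The degree-9 polynomial is a stand-in for a learned black box.
x_train = np.linspace(-3.0, 3.0, 200)
model = np.poly1d(np.polyfit(x_train, np.sin(x_train), deg=9))

# Inside the training range the fit is excellent (interpolation)...
err_inside = abs(model(1.5) - np.sin(1.5))

# ...but outside it the learned model diverges wildly (extrapolation),
# and nothing in its raw coefficients tells us "this is a sine".
err_outside = abs(model(8.0) - np.sin(8.0))

print(err_inside, err_outside)
```

The learned coefficients do encode sin(x) over the training interval, but only as an opaque numerical artifact tied to that interval, which is precisely the opacity described above.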
In deep learning, the learned knowledge is strongly linked to the structure of the neural network and is difficult to transport: the information cannot be separated from its container, or barely expressed beyond the underlying structure. In short, deep learning cannot formalize or explain its knowledge, not even to other deep learning structures. This same inability to formalize ideas also makes it incapable of understanding them. Deep learning only feeds on data, on uninterpreted facts (beyond the interpretation implied by the representation format itself). Due to the obscurity of its representations, combining what it learns with other forms of knowledge is very difficult, and normally forced and superficial when attempted. For example, if deep learning is to play chess well, it is obliged to discover for itself an idea such as “castling is usually convenient”. This idea cannot be injected directly, as we do not know how to translate the concept of castling into the algorithm’s internal knowledge map. It must be learned the hard way, that is, by introducing thousands of games in which castling has proven to be a good strategic move. Even then, the abstraction that deep learning achieves of “castling” does not come close to the depth the concept has for us. In other words, since deep learning lacks that last step of formalization, it cannot be supplied with ideas out of context, nor can the machine be expected to learn such ideas from a couple of solitary examples. The line of research that seeks to improve this aspect is known as transfer learning.
Although deep learning is very flexible when it comes to learning, the acquired knowledge is rigid and difficult to transport. In part, formalizing means decoupling the essence from the specific cases that inspired it, something that never really happens with deep learning. Therefore, any deviation from the training context severely degrades the quality of the learned information. In other words, the adaptability of deep learning is low. Coming back to the chess example, if a deep learning expert in classical chess plays the variant known as Chess960, in which the pieces on the first row are placed randomly, the level of the machine will be much worse than expected, probably making losing moves that a novice player would not make. This happens because it does not possess the flexibility that we have to take advantage of acquired knowledge despite changes in the frame of context. Deep learning is not robust or self-controlled; we cannot rely on it for delicate or high-risk tasks without first adding some kind of external control. Note that if we leave in the hands of deep learning a task whose complexity exceeds human capacity, and where we have no way of assessing the validity of the machine’s response, the consequences of a failure might be dramatic. The worst thing is that we might not realize it, or realize it too late. It is frightening to think of a network of machines with the same type of AI (and therefore algorithmically destined to make the same type of errors) simultaneously converging on the same wrong decision. The paradox of deep learning is that its models are trained to work in environments beyond their training, but their responses are only reliable within the limits set by that training.
Even within the training scope, deep learning is severely limited. As we have already seen, deep learning seems to be a good interpolator but a bad extrapolator; however, in the context of machine learning, where spaces are twisted and we use tricks and projections to avoid the limitations of linear methods, the difference between interpolation and extrapolation becomes blurred. In other words, deep learning does not have a good imagination (or its imagination is far too inconsistent). Since it lacks a strong, meaningful conceptualization, and because it is likely to capture arbitrary artifacts as essential components, the resulting knowledge is weak and susceptible to deception. Well-known image identification experiments consist of manipulating pictures in such a way that changes imperceptible to the human eye make deep learning fail dramatically (for instance, making it identify a “pig” as an “airplane”). This field of research is known as adversarial machine learning. Beyond the importance of this weakness in applications such as fraud detection or cybersecurity, it highlights the inherent lack of robustness and the unpredictability of deep learning. Note that it does not necessarily follow a law of proportional response: a small drift in the environment can project it to the antipodes of its impenetrable internal mapping.
Finally, without going into aspects that might be too technical for the scope of this article, it is worth mentioning a subtle disadvantage that has to do with the effect of deep learning on experts’ mentality and methodologies. We mentioned before the ability to extract features as one of the main virtues of deep learning. It is becoming common to use deep learning to avoid the cumbersome task of finding out the variables that best solve a problem, which is actually the real challenge of classic machine learning and is known as “feature engineering”. Traditionally, the art of machine learning and data experts lay in the selection of a proper representation rather than in the analysis technique. This human skill depends more on consolidated, extensive experience in the application domain (perhaps also on some divergent thinking) than on expertise in machine learning and data analysis. Note that, in a way, brute-force methods also make their users brute: deep learning makes experts lazy and enervates their inquisitive talent, as they tend to feed algorithms with raw data and rely on their ability to extract “features” by themselves. It is important here to recall a few observations that should be almost canonical: that, in many cases, raw data may be unusable by the algorithm; that deep learning does not replace feature engineering; that the more effort is put into representation and preprocessing, the better the capacity of the algorithm is exploited; and that many of the problems tackled with machine learning can often be solved more cleanly, transparently, and elegantly with simple mathematics.
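A classic textbook illustration of what feature engineering buys us is the concentric-class problem. In the toy below (synthetic data, illustrative thresholds), the classes are points inside versus outside a circle: no threshold on a raw coordinate separates them well, while a single engineered feature, the squared radius, separates them perfectly, turning a hard representation into a trivial one.

```python
import numpy as np

# Synthetic problem: points inside the unit circle are one class,
# points outside are the other.
rng = np.random.default_rng(3)
pts = rng.uniform(-2.0, 2.0, size=(400, 2))
labels = np.linalg.norm(pts, axis=1) < 1.0

# Raw representation: even a reasonable rule on the x coordinate
# alone leaves many points misclassified.
acc_raw = ((np.abs(pts[:, 0]) < 1.0) == labels).mean()

# Engineered feature: r^2 = x^2 + y^2 makes the problem separable
# with one threshold, by construction of the labels.
r2 = (pts ** 2).sum(axis=1)
acc_engineered = ((r2 < 1.0) == labels).mean()

print(acc_raw, acc_engineered)
```

The example is circular on purpose (the engineered feature mirrors how the labels were generated): in real problems, finding that feature is exactly the domain-experience-driven art the paragraph above describes.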
I am not trying to convey a pessimistic impression of deep learning, because its potential is frankly amazing. However, I wanted to give a more realistic view of the current state of AI and to insist that there is still a long way to go to achieve anything even close to general artificial intelligence. Deep learning alone is not the solution, but it is probably a fundamental part of it. My opinion is that the future of AI does not lie in monolithic solutions, but in complex structures that adequately integrate deep learning with other approaches (also very interesting, but perhaps with less marketing behind them), such as unsupervised learning, symbolic structures, and so on; and, undoubtedly, in methods to formalize, transport, manipulate, and hierarchically organize the knowledge abstracted by the algorithms. To conclude by answering the initial question: deep learning is the present of AI, and most probably also a core component of its future, but not on its own, and bearing in mind that at least one revolutionary step is still missing.
This article may contain inaccuracies, perhaps errors, and even some misconceptions. I am aware that recent research is successfully addressing some of the shortcomings of deep learning mentioned here. The astute reader, here as elsewhere, will read my opinion with healthy skepticism. Although I often use deep learning in my research, I am not 100% up to date with the latest developments (if anyone can be), and there are far better experts in the field. Moreover, I am quite often wrong, to my regret. On such occasions, my first reaction is always to feel a little embarrassed, but I must admit that this is often followed by some joy and even relief. In the end, I usually remember something I read a long time ago, I think by E. R. Dodds, something like “today’s truth is tomorrow’s mistakes”.
Senior scientist at TU Wien