The AI Revolution That Was And Wasn’t In 2018
21 December, 2018 / Articles
Looking back on 2018, this has been a year in which AI has continued its meteoric rise over the digital landscape, infusing its magical powers into almost every corner of every industry and revolutionizing how society uses data. Or so one might be forgiven for thinking, as companies big and small have rushed to demonstrate how they are harnessing deep learning to upend their business processes. The reality is that while AI has truly transformed areas like audiovisual recognition, given us powerful new tools for understanding language and offered a first glimpse at algorithms with glimmers of intuition, the overwhelming majority of commercial AI applications to date have frequently offered little improvement over the traditional approaches they replaced, had those systems been built properly to begin with.
We speak today of deep learning in reverent tones and ascribe to it an almost mythical aura of superhuman capability. Companies rush to sprinkle the magical AI dust on every project. Even normally austere and risk-averse industries have been plunging headfirst into the AI world with reckless abandon, throwing deep learning models at every problem. The same funding agencies that once required the phrase “social media” in every successful proposal now require “deep learning” somewhere in the abstract to even consider funding a project, whether or not AI has the slightest applicability to the problem at hand.
In the public consciousness, and increasingly in the C-suite, AI is imagined as a set of human-like algorithms, childlike versions of ourselves that improve by the day and whose accuracy limitations can be instantly fixed by simply handing them a bit more training data.
The reality, of course, is that today’s deep learning algorithms are more art than science. Accuracy gains come not from blindly throwing more training data at an algorithm, but from careful hand selection of training data, intricate tuning, experimentation and, often, dumb luck. Successful algorithms are enigmas that even their own creators don’t fully understand and can’t automatically replicate in other domains. Even the most accurate models are frequently so brittle that the slightest change or malicious intervention can send them wildly off course.
Far from primitive silicon humans with childlike minds, today’s AI systems are nothing more than basic statistical encapsulations, more powerful and capable than past approaches, but little different from what we’ve been doing since the dawn of computing.
In some areas like audiovisual analysis, deep learning approaches have been genuinely transformative, allowing machines to achieve accuracy levels at understanding and generating images, speech and video that weren’t even imaginable several years ago. Neural vision systems can recognize a specific make and model of vehicle, even when it is driving through the desert covered with armor, weapons, flags and soldiers. They can understand the difference between a gun sitting on a table, a gun pointed in the air and a gun pointed at a person. They can estimate the geographic location where a photo was taken, even if it looks dramatically different from the training images they saw. They can also create new imagery or speech that is eerily humanlike.
This is where the true applied AI revolution has occurred, in opening new modalities to machine understanding.
At the same time, using AI for more mundane textual and numeric analyses has not always shown quite the same level of transformative improvement. Much like the statistical machine translation (SMT) it replaces, neural machine translation (NMT) can achieve human-like levels of fluency in good cases but fails just as miserably and comedically in others. While NMT systems can indeed achieve higher BLEU scores in academic competitions, when applied to routine day-to-day real-world content the gains are not always noticeable, as they become lost in the gibberish errors that confound fluent understanding.
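To see how a higher BLEU score can coexist with comprehension-destroying gibberish, consider a toy, unsmoothed BLEU (single reference, no brevity penalty; illustrative only, not a production scorer such as sacreBLEU, and the sentences and the nonsense token are invented examples): replacing the single most important noun with gibberish still leaves a score around 0.6.

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Toy BLEU: geometric mean of modified n-gram precisions for n=1..4,
    single reference, no brevity penalty, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the minister said the talks will resume next week"
perfect = "the minister said the talks will resume next week"
one_gibberish = "the minister said the blorft will resume next week"
print(bleu(perfect, ref))        # 1.0
print(bleu(one_gibberish, ref))  # ≈ 0.60, yet the key noun is unreadable
```

A single nonsense token in place of the subject of the sentence costs only about 0.4 BLEU while rendering the translation useless, which is exactly how leaderboard gains can fail to translate into fluent real-world output.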
The problem is that NMT is still at the end of the day merely blindly applying the statistical patterns it has learned from seeing huge volumes of training data, just like its SMT predecessor. An NMT system can only apply learned patterns to transform one set of tokens into another, like a child mimicking an artist by putting colors and shapes in the same general positions without understanding what they are trying to draw. Unlike a human translator, neural models of today do not actually understand the deeper meaning of the concepts and ideas they are reading, they merely recognize patterns of tokens much like SMT approaches. NMT systems are considerably superior in their ability to recognize far more complex patterns, perform much more sophisticated reorderings and operate across a much greater window of text, but even NMT systems still primarily operate at the level of a sentence or small block of text in isolation. We are still a long way from having production NMT systems that can read an entire passage of text, distill it down to the abstract ideas and perspectives it discusses and then render it into another language entirely from that abstract idea-based representation, bringing contextual and world knowledge to bear in disambiguation, contextualization and framing.
Moreover, the lack of training data for most languages means that even the most cutting edge NMT systems still fail just as comedically as SMT systems for many languages or suffer from the same issues of fluent passages being interrupted at regular intervals by gibberish that renders their key arguments undecipherable.
Neural text processing as a whole suffers from a fixation on process over outcome. Companies believe deep learning solutions will outperform any alternative and so focus on finding a deep learning solution at all costs, rather than recognizing that not every problem is well suited to current neural approaches.
I’ve seen far too many companies build deep learning solutions for the most basic of tasks like recognizing mentions of a specific person’s or company’s full name. When asked whether the massive and expensive deep learning model outperformed a simple keyword search for the name and a few variants, the answer is all too often that they never actually tried, they just assumed neural was the way to go. Eventual benchmarking, if it is performed at all, often shows that the neural approach was actually less accurate in that it was far too sensitive to typos and grammatical errors in the text and lacked sufficient training data to pick up most edge cases.
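As a concrete illustration of the kind of baseline worth benchmarking first, here is a minimal keyword-and-variants matcher, a few lines of regex that a massive neural mention extractor would need to clearly beat to justify its cost (the company name and variants below are hypothetical examples, not from the original text):

```python
import re

def mention_matcher(name, variants=()):
    """Build a case-insensitive regex matching a full name or any
    hand-picked variant, bounded so substrings of longer words don't match."""
    alternatives = sorted([name, *variants], key=len, reverse=True)  # longest first
    pattern = "|".join(re.escape(v) for v in alternatives)
    return re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)

matcher = mention_matcher("International Business Machines", ["IBM"])
text = "Shares of IBM rose after International Business Machines reported earnings."
print(matcher.findall(text))  # ['IBM', 'International Business Machines']
```

Against a baseline this cheap, a deep learning model that is more sensitive to typos and starved of edge-case training data can easily come out behind, which is why the benchmarking step matters.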
Neural entity recognition, classification, geocoding and sentiment analysis are all areas where even the most cutting-edge algorithms frequently struggle to outperform well-written classical approaches. The key is that few commercial deployments are well written.
Most are hastily thrown-together assortments of hand-crafted rules or data-starved Bayesian models. Indeed, it is the rare classical algorithm that has been built from the domain down rather than from the code up. Sentiment algorithms in particular have fixated on naïve, simple-to-code approaches built by programmers, rather than stepping back, working with psychologists and linguists to understand how humans communicate emotion, and building tools to capture those real-world complexities and nuances.
In such cases neural approaches can help standardize model creation and coerce teams into more robust data practices, but the benefits come primarily from the change in creation workflow rather than from the power of the neural approach itself. Indeed, for many companies I’ve spoken with, the greatest benefits of deep learning have come not from the capabilities of neural networks, but from the standardized, data-centric creation process enforced by current model construction workflows.
In my own experience of the deep learning revolution of the past half-decade, applying nearly every imaginable machine understanding task to textual and audiovisual news content in more than 100 languages, I’ve alpha tested an incredible diversity of approaches, from neural to classical machine learning to hand-crafted expert rules to every combination thereof. I’ve tested everything from production commercial applications to bleeding-edge research experiments, with the results always the same: neural approaches offer massive accuracy and capability leaps for audiovisual content and select understanding and creation tasks, but their application to routine textual understanding can frequently be replicated or exceeded by well-designed non-neural solutions using far less training data and with far greater robustness.
The issue is that while truly capable deep learning experts are an extremely rare commodity, the total pool of data scientists able to step back and build robust systems that reflect the data and contexts they are used in is even smaller. In short, neural approaches bring considerable benefit to many companies not because of the use of deep learning, but rather because their classical data science workflows were so poor, focused on algorithms over outcomes.
Perhaps the biggest challenge today is the enormous gulf between the pioneering work of AI research groups like Alphabet’s DeepMind, which are building tools that can learn to play video games and are showing the first glimmers of intuition, and the rote deep learning systems being built in the commercial sector. Enabling machines to reason about the world, communicate with and understand the outside world, learn new tasks rapidly, abstract from examples to higher-order representations and even create on their own are all incredible capabilities that deep learning approaches are uniquely suited for. At the same time, these are a far cry from the rote categorization filters and entity extractors that form much of the commercial sector’s deep learning utilization.
Putting this all together, the mythology of AI today is more myth and marketing hype than reality. Companies rush to deploy AI anywhere and everywhere to lay claim to having an “AI-powered business,” but these neural deployments aren’t always any more accurate than the classical systems they replace. In many cases they are actually worse. Neural approaches have truly transformed audiovisual understanding, but when it comes to textual understanding, they do not always represent a major leap forward. This may change as the pioneering applications of deep learning eventually graduate from the research labs of places like DeepMind into the production commercial world. For now, companies would do well to stop and ask whether deep learning is really the answer to a given problem, to conduct extensive benchmarking to test that conclusion and, most importantly, to rethink how they create software systems in the first place, asking what happens when the creativity and rigor put into neural approaches is brought to bear on more traditional data science workflows.