Artificial-intelligence researchers are trying to fix the flaws of neural networks.

A self-driving car approaches a stop sign, but instead of slowing down, it accelerates into the busy intersection. An accident report later reveals that four small rectangles had been stuck to the face of the sign. These fooled the car’s onboard artificial intelligence (AI) into misreading the word “stop” as “speed limit 45”.

Such an event hasn’t actually happened, but the potential for sabotaging AI is very real. Researchers have already demonstrated how to fool an AI system into misreading a stop sign, by carefully positioning stickers on it. They have deceived facial-recognition systems by sticking a printed pattern on glasses or hats. And they have tricked speech-recognition systems into hearing phantom phrases by inserting patterns of white noise in the audio.

These are just some examples of how easy it is to break the leading pattern-recognition technology in AI, known as deep neural networks (DNNs). These have proved incredibly successful at correctly classifying all kinds of input, including images, speech, and data on consumer preferences. They are part of daily life, running everything from automated telephone systems to user recommendations on the streaming service Netflix. Yet tiny alterations to the input, typically imperceptible to humans, can flummox the best neural networks around.

These problems are more concerning than idiosyncratic quirks in a not-quite-perfect technology, says Dan Hendrycks, a Ph.D. student in computer science at the University of California, Berkeley. Like many scientists, he has come to see them as the most striking illustration that DNNs are fundamentally brittle: brilliant at what they do until, taken into unfamiliar territory, they break in unpredictable ways.

That could lead to substantial problems. Deep-learning systems are increasingly moving out of the lab into the real world, from piloting self-driving cars to mapping crime and diagnosing disease. But pixels maliciously added to medical scans could fool a DNN into wrongly detecting cancer, one study reported this year. Another suggested that a hacker could use these weaknesses to hijack an online AI-based system so that it runs the invader’s own algorithms.

In their efforts to work out what’s going wrong, researchers have discovered a lot about why DNNs fail. “There are no fixes for the fundamental brittleness of deep neural networks,” argues François Chollet, an AI engineer at Google in Mountain View, California. To move beyond the flaws, he and others say, researchers need to augment pattern-matching DNNs with extra abilities: for instance, making AIs that can explore the world for themselves, write their own code and retain memories. These kinds of system will, some experts think, form the story of the coming decade in AI research.

Reality check

In 2011, Google revealed a system that could recognize cats in YouTube videos, and soon after came a wave of DNN-based classification systems. “Everybody was saying, ‘Wow, this is amazing, computers are finally able to understand the world,’” says Jeff Clune at the University of Wyoming in Laramie, who is also a senior research manager at Uber AI Labs in San Francisco, California.

But AI researchers knew that DNNs do not actually understand the world. Loosely modeled on the architecture of the brain, they are software structures made up of large numbers of digital neurons arranged in many layers. Each neuron is connected to others in layers above and below it.

The idea is that features of the raw input coming into the bottom layers, such as pixels in an image, trigger some of those neurons, which then pass on a signal to neurons in the layer above according to simple mathematical rules. Training a DNN involves exposing it to a massive collection of examples, each time tweaking the way in which the neurons are connected so that, eventually, the top layer gives the desired answer, such as always interpreting a picture of a lion as a lion, even if the DNN hasn’t seen that picture before.
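
In code, that loop is only a few lines. Below is a minimal sketch, in PyTorch, of a layered classifier being trained in this way; the layer sizes, data and settings are illustrative stand-ins, not taken from any system mentioned in this article.

```python
# A minimal sketch of the layered classifier and training loop described above.
# Architecture, data and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

# Stand-in data: 256 random "images" (flattened to 784 values) with 10 class labels.
inputs = torch.randn(256, 784)
labels = torch.randint(0, 10, (256,))

# Digital "neurons" arranged in layers; each layer feeds the one above it.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),   # bottom layer reacts to raw pixel features
    nn.Linear(128, 64), nn.ReLU(),    # middle layer combines simpler features
    nn.Linear(64, 10),                # top layer scores the ten possible answers
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training: show the examples repeatedly, each time tweaking the connection
# weights so that the top layer's answer moves towards the desired label.
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()      # work out how each connection contributed to the error
    optimizer.step()     # nudge the connections to reduce that error
```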

A first big reality check came in 2013 when Google researcher Christian Szegedy and his colleagues posted a preprint called “Intriguing properties of neural networks”. The team showed that it was possible to take an image of a lion, for example, that a DNN could identify and, by altering a few pixels, convince the machine that it was looking at something different, such as a library. The team called the doctored images “adversarial examples”.
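
Szegedy’s team found its doctored images with a fairly involved optimization. A later and simpler recipe, the fast gradient sign method, captures the core idea and is sketched below: nudge every pixel a tiny amount in whichever direction increases the network’s error. The toy model, image and label here are placeholders, not the team’s actual setup.

```python
# A hedged sketch of crafting an adversarial example with the fast gradient sign
# method. The untrained toy model below only illustrates the mechanics; against a
# well-trained classifier, perturbations this small routinely flip the prediction.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))  # toy classifier
loss_fn = nn.CrossEntropyLoss()

image = torch.rand(1, 784)        # stand-in for the pixels of a lion photo
true_label = torch.tensor([3])    # stand-in index for the class "lion"

image.requires_grad_(True)
loss = loss_fn(model(image), true_label)
loss.backward()                   # how the error changes with each pixel

epsilon = 0.02                    # tiny compared with the 0..1 pixel range
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("largest pixel change:", (adversarial - image).abs().max().item())
print("label before:", model(image).argmax(dim=1).item())
print("label after: ", model(adversarial).argmax(dim=1).item())
```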

A year later, Clune and his then-Ph.D. student Anh Nguyen, together with Jason Yosinski at Cornell University in Ithaca, New York, showed that it was possible to make DNNs see things that were not there, such as a penguin in a pattern of wavy lines. “Anybody who has played with machine learning knows these systems make stupid mistakes once in a while,” says Yoshua Bengio at the University of Montreal in Canada, who is a pioneer of deep learning. “What was a surprise was the type of mistake,” he says. “That was pretty striking. It’s a type of mistake we would not have imagined would happen.”

New types of mistakes have come thick and fast. Last year, Nguyen, who is now at Auburn University in Alabama, showed that simply rotating objects in an image was sufficient to throw off some of the best image classifiers around. This year, Hendrycks and his colleagues reported that even unadulterated, natural images can still trick state-of-the-art classifiers into making unpredictable gaffes, such as identifying a mushroom as a pretzel or a dragonfly as a manhole cover.

The issue goes beyond object recognition: any AI that uses DNNs to classify inputs such as speech can be fooled. AIs that play games can be sabotaged: in 2017, computer scientist Sandy Huang, a Ph.D. student at the University of California, Berkeley, and her colleagues focused on DNNs that had been trained to beat Atari video games through a process called reinforcement learning. In this approach, an AI is given a goal and, in response to a range of inputs, learns through trial and error what to do to reach that goal. It is the technology behind superhuman game-playing AIs such as AlphaZero and the poker bot Pluribus. Even so, Huang’s team was able to make their AIs lose games by adding one or two random pixels to the screen.
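
Stripped of the deep network, the trial-and-error loop at the heart of reinforcement learning fits in a few lines. In the toy sketch below, an agent learns from reward alone to walk to the right-hand end of a five-cell track; Atari agents and AlphaZero replace this small table of values with a DNN, but the learn-from-reward loop is the same. Everything here is invented for illustration.

```python
# Toy tabular Q-learning: the agent is rewarded only for reaching the rightmost
# cell and discovers, by trial and error, that moving right is the way to get there.
import random

N_STATES, ACTIONS = 5, (-1, +1)         # cells 0..4; move left or right
q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != N_STATES - 1:        # an episode ends at the goal cell
        # Explore sometimes (or when undecided); otherwise take the best-looking action.
        if random.random() < epsilon or q_table[(state, -1)] == q_table[(state, +1)]:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Update the estimate of how good that action was, given what followed.
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])
        state = next_state

# The learned policy: the best action in each cell short of the goal should be +1 (move right).
print({s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)})
```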

Earlier this year, AI Ph.D. student Adam Gleave at the University of California, Berkeley, and his colleagues demonstrated that it is possible to introduce an agent to an AI’s environment that acts out an “adversarial policy” designed to confuse the AI’s responses. For example, an AI footballer trained to kick a ball past an AI goalkeeper in a simulated environment loses its ability to score when the goalkeeper starts to behave in unexpected ways, such as collapsing on the ground.

An AI footballer in a simulated penalty-shootout is confused when the AI goalkeeper enacts an “adversarial policy”: falling to the floor (right). Credit: Adam Gleave.

Knowing where a DNN’s weak spots are could even let a hacker take over a powerful AI. One example of that came last year, when a team from Google showed that it was possible to use adversarial examples not only to force a DNN to make specific mistakes, but also to reprogram it entirely, effectively repurposing an AI trained on one task to do another.

Many neural networks, such as those that learn to understand language, can, in principle, be used to encode any other computer program. “In theory, you can turn a chatbot into whatever program you want,” says Clune. “This is where the mind starts to boggle.” He imagines a situation in the near future in which hackers could hijack neural nets in the cloud to run their own spambot-dodging algorithms.

For computer scientist Dawn Song at the University of California, Berkeley, DNNs are like sitting ducks. “There are so many different ways that you can attack a system,” she says. “And the defense is very, very difficult.”

With great power comes great fragility

DNNs are powerful because their many layers mean they can pick up on patterns in many different features of an input when attempting to classify it. An AI trained to recognize aircraft might find that features such as patches of color, texture or background are just as strong predictors as the things that we would consider salient, such as wings. But this also means that a very small change in the input can tip the AI over into seeing something apparently quite different.

One answer is simply to throw more data at the AI; in particular, to repeatedly expose the AI to problematic cases and correct its errors. In this form of “adversarial training”, as one network learns to identify objects, a second tries to change the first network’s inputs so that it makes mistakes. In this way, adversarial examples become part of the DNN’s training data.
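
Schematically, adversarial training alternates an attack step with an ordinary training step. In the sketch below, the role of the second network is played by a gradient-based attack on the classifier itself, a common simplification; the model, data and settings are again illustrative stand-ins.

```python
# A schematic of adversarial training: craft perturbed inputs that the current
# model gets wrong, then train on clean and perturbed examples together.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.rand(256, 784)              # stand-in training images
labels = torch.randint(0, 10, (256,))

def make_adversarial(x, y, eps=0.03):
    """Shift each input slightly in the direction that increases the loss."""
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

for epoch in range(10):
    adv_inputs = make_adversarial(inputs, labels)   # the "attack" step
    # The doctored examples join the training data alongside the clean ones.
    batch = torch.cat([inputs, adv_inputs])
    targets = torch.cat([labels, labels])
    optimizer.zero_grad()
    loss_fn(model(batch), targets).backward()
    optimizer.step()
```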

Hendrycks and his colleagues have suggested quantifying a DNN’s robustness against making errors by testing how it performs against a large range of adversarial examples. However, training a network to withstand one kind of attack could weaken it against others, they say. And researchers led by Pushmeet Kohli at Google DeepMind in London are trying to inoculate DNNs against making mistakes. Many adversarial attacks work by making tiny tweaks to the component parts of an input, such as subtly altering the color of pixels in an image, until this tips a DNN over into a misclassification. Kohli’s team has suggested that a robust DNN should not change its output as a result of small changes in its input, and that this property might be mathematically incorporated into the network, constraining how it learns.
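
One way to make that property concrete is interval bound propagation: instead of a single image, a whole interval of possible inputs (every image within a small distance of the original) is pushed through the network, giving guaranteed lower and upper bounds on its outputs. If the correct class beats every other class even at the worst ends of those bounds, no perturbation inside the interval can flip the decision. The sketch below, with invented numbers, applies the idea to a single toy layer; it is a simplified illustration of this line of work rather than DeepMind’s exact method.

```python
# Simplified interval bound propagation through one linear layer y = Wx + b.
import torch

def linear_interval_bounds(weight, bias, lower, upper):
    """Given elementwise input bounds [lower, upper], return output bounds."""
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    out_center = weight @ center + bias
    out_radius = weight.abs() @ radius      # the worst case grows with |W|
    return out_center - out_radius, out_center + out_radius

weight = torch.randn(10, 784)               # toy layer: 784 pixel features -> 10 classes
bias = torch.zeros(10)
image = torch.rand(784)
eps = 0.01                                  # allowed change per pixel

low, high = linear_interval_bounds(weight, bias, image - eps, image + eps)
predicted = int((weight @ image + bias).argmax())

# Certified if the predicted class's lower bound beats every rival's upper bound.
rivals = torch.cat([high[:predicted], high[predicted + 1:]])
print("certified robust to this perturbation:", bool(low[predicted] > rivals.max()))
```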

For the moment, however, no one has a fix on the overall problem of brittle AIs. The root of the issue, says Bengio, is that DNNs don’t have a good model of how to pick out what matters. When an AI sees a doctored image of a lion as a library, a person still sees a lion, because they have a mental model of the animal that rests on a set of high-level features (ears, a tail, a mane and so on) that lets them abstract away from low-level arbitrary or incidental details. “We know from prior experience which features are the salient ones,” says Bengio. “And that comes from a deep understanding of the structure of the world.”

One attempt to address this is to combine DNNs with symbolic AI, which was the dominant paradigm in AI before machine learning. With symbolic AI, machines reasoned using hard-coded rules about how the world worked, such as that it contains discrete objects and that they are related to one another in various ways. Some researchers, such as psychologist Gary Marcus at New York University, say hybrid AI models are the way forward. “Deep learning is so useful in the short term that people have lost sight of the long term,” says Marcus, who is a long-time critic of the current deep-learning approach. In May, he co-founded a start-up called Robust AI in Palo Alto, California, which aims to mix deep learning with rule-based AI techniques to develop robots that can operate safely alongside people. Exactly what the company is working on remains under wraps.
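
For a flavor of the symbolic style, the toy sketch below describes a scene as discrete objects and explicit relations, and uses one hard-coded rule to deduce a fact that was never observed directly. The objects and the rule are invented purely for illustration.

```python
# Symbolic-style reasoning: discrete objects, explicit relations, hard-coded rules.
facts = {("cup", "on", "table"), ("table", "on", "floor")}

def apply_rules(facts):
    """Rule: 'on' and 'above' chain together, so if A is on B and B is on C, A is above C."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(derived):
            for (b2, r2, c) in list(derived):
                if b == b2 and r1 in ("on", "above") and r2 in ("on", "above"):
                    if (a, "above", c) not in derived:
                        derived.add((a, "above", c))
                        changed = True
    return derived

# Deduced, not observed: the cup must be above the floor.
print(("cup", "above", "floor") in apply_rules(facts))   # True
```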

Even if rules can be embedded into DNNs, they are still only as good as the data they learn from. Bengio says that AI agents need to learn in richer environments that they can explore. For example, most computer-vision systems fail to recognize that a can of beer is cylindrical because they were trained on data sets of 2D images. That is why Nguyen and colleagues found it so easy to fool DNNs by presenting familiar objects from different perspectives. Learning in a 3D environment, real or simulated, will help.

But the way AIs do their learning also needs to change. “Learning about causality needs to be done by agents that do things in the world, that can experiment and explore,” says Bengio. Another deep-learning pioneer, Jürgen Schmidhuber at the Dalle Molle Institute for Artificial Intelligence Research in Manno, Switzerland, thinks along similar lines. Pattern recognition is extremely powerful, he says, good enough to have made companies such as Alibaba, Tencent, Amazon, Facebook, and Google the most valuable in the world. “But there’s a much bigger wave coming,” he says. “And this will be about machines that manipulate the world and create their own data through their own actions.”

In a sense, AIs that use reinforcement learning to beat computer games are doing this already in artificial environments: by trial and error, they manipulate pixels on the screen in allowed ways until they reach a goal. But real environments are much richer than the simulated or curated data sets on which most DNNs train today.

(To be continued)

Source: Nature
