Why is this a concern?
In many parts of computer science, there is a belief in "ultimate correctness": a perception that if we formalize our goals and our algorithms, and then carry out a proof that our code accomplishes those goals, we will have created a correct artifact. If the goals covered a set of safety objectives, then the artifact should be safe to use, up to the quality of our specification. Highly professional development practices and extensive testing strengthen our confidence; model checking or other forms of machine-checked proof that run against the real software carry this confidence to the point at which, if you trust the specification, little more can be said.
Yet we also know how unrealistic such a perspective can be. Typical computing systems depend upon tens or hundreds of millions of lines of complex code, spread over multiple modules, perhaps including firmware programs downloaded into the hardware. All of this might then be deployed onto the cloud, and hence dependent on the Internet, and on the cloud data-center's thousands of computers. The normal behavior of such programs resides somewhere in the cross-product of the individual states of the component elements: an impossibly large space of possible configurations.
While we can certainly debug systems through thorough testing, software is never even close to flawless: by some estimates, a bug might lurk in every few tens or hundreds of lines of logic. The compiler and the operating system and various helper programs are probably buggy too. Testing helps us gain confidence that these latent bugs are rarely exercised under production conditions, not that the code is genuinely correct.
Beyond what is testable we enter the realm of Heisen-behaviors: irreproducible oddities that can be impossible to explain, perhaps caused by unexpected scheduling effects, by cosmic rays that flip bits, or by tiny hardware glitches. The hardware itself is often better modelled as probabilistic than deterministic: 1 + 1 certainly should equal 2, but perhaps sometimes the answer comes out as 0, or as 3. Obviously, frequent problems of that sort would make a machine sufficiently unreliable that we would repair or replace it. But an event happening perhaps once a week? Such problems often pass unnoticed.
Thus, ultimate correctness is elusive; we know this, and we have accommodated ourselves to the limitations of computing systems by doing our best to specify systems fully, holding semi-adversarial code reviews aimed at finding design bugs, employing clean-room development practices, and then layering on red-team testing, acceptance testing, and integration testing. The process works surprisingly well, although patches and other upgrades commonly introduce new problems, and adapting a stable existing system to new hardware or a new setting can reveal surprising issues that had gone unnoticed because the prior pattern of use concealed or even compensated for them, sometimes through mechanisms that evolved specifically for that purpose.
The puzzle with deep learning and other forms of unsupervised or semi-supervised training is that we create systems that lack a true specification. Instead, they have a self-defined behavioral goal: reinforcement learning trains a system to respond to situations resembling ones it has seen before by repeating whatever response dominated in the training data. In effect: "when the traffic light turns yellow, drive very fast."
Thus we have a kind of autonomously-learned specification, and because the specification is extracted automatically by training against a data set, the learned model is inherently shaped by the content of the data set.
Train such a system on a language sample in which plurals always end in "s", and it won't realize that "cattle" and "calamari" are plural. Train it on images in which all the terrorists have dark hair and complexions, and the system will learn that anyone with dark hair or skin is a potential threat. Teach it to drive in California, where every intersection either has stop signs on one or both streets, or has a traffic signal, and it won't understand how to drive in Europe, where many regions use a "priority to the right" model, whereby incoming traffic (even from a small street) has priority over any traffic from the left (even if from a major road).
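To make the failure mode concrete, here is a minimal sketch in Python of a hypothetical toy learner (not any real system) that extracts the "plurals end in s" rule from a biased training sample and then applies it with complete confidence to words the sample never covered:

```python
from collections import Counter

# Toy training sample: every plural in it happens to end in "s".
training_data = [
    ("dog", False), ("dogs", True),
    ("car", False), ("cars", True),
    ("tree", False), ("trees", True),
]

def extract_feature(word):
    # The only feature this toy learner sees: does the word end in "s"?
    return word.endswith("s")

# "Training": count how often each feature value co-occurs with each label.
counts = Counter((extract_feature(word), label) for word, label in training_data)

def predict_plural(word):
    feature = extract_feature(word)
    # Repeat whatever label dominated the training data for this feature value.
    return counts[(feature, True)] > counts[(feature, False)]

print(predict_plural("cats"))      # True  -- fits the learned pattern
print(predict_plural("cattle"))    # False -- systematically wrong, every time
print(predict_plural("calamari"))  # False -- same blind spot, no randomness involved
```

The point of the sketch is not the trivial learner itself, but that nothing in the training data could ever alert it to the existence of irregular plurals: the gap is invisible from inside the data set.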
Machine learning systems trained in this way conflate correlation with causation. In contrast, human learning teases out causal explanations from examples. The resulting knowledge is qualitatively different from the models that today's machine learning technologies extract through training, no matter how impressive their pattern matching may be.
Human knowledge also understands time, and understands that behavior must evolve over time. Stephen Jay Gould often wrote about being diagnosed as a young adult with a cancer that was then considered fatal. Medical statistics of the period gave him a life expectancy of no more than a few months, perhaps a year at best. But as it happened, a new treatment proved to be a true magic bullet: he was cured. The large-population statistics were based on prior treatments and hence not predictive of the outcomes for those who received the new one. The story resonated in Gould's case because in his academic life he studied "punctuated equilibria": situations in which a population that has been relatively static suddenly evolves in dramatic ways, often because of some significant change in the environment. Which is precisely the point.
Those who fail to learn from the past are doomed to repeat it. But those who fail to appreciate that the past may not predict the future are also doomed. Genuine wisdom comes not from raw knowledge alone, but also from the ability to reason about novel situations in robust ways.
Machine learning systems tend to learn a single set of models at a time. They squeeze everything into a limited collection of models, which blurs information if the system lacks a needed category: "drives on the left", or "uses social networking apps". Humans create models, revise models, and are constantly on the lookout for exceptions. "Is that really a pile of leaves, or has the cheetah realized it can hide in a pile of leaves? It never did that before. Clever cheetah!" Such insights once were of life-or-death importance.
Today, a new element enters the mix: systematic error, in which a system is programmed to learn a pattern but overgeneralizes, and consequently behaves incorrectly every time a situation arises that exercises the erroneous generalization. Systematic error is counterintuitive, and perhaps this explains our seeming inability to recognize the risk: viewing artificially intelligent systems as mirrors of ourselves, we are blind to the idea that they can exhibit bizarre and decidedly non-random misbehavior. Indeed, it is in the nature of this form of machine learning to misbehave in unintuitive ways!
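The contrast between random faults and systematic ones can be illustrated with a small, entirely synthetic sketch (hypothetical input names and failure rates, not drawn from any real system): a conventionally flaky component fails rarely and unpredictably, whereas an overgeneralized model fails on the same class of inputs every single time that class appears.

```python
import random

random.seed(42)

# 90 "regular" inputs plus 10 "edge case" inputs the training data never covered.
inputs = [("regular", i) for i in range(90)] + [("edge_case", i) for i in range(10)]

def flaky_component(kind, i):
    # Classical unreliability: fails about 1% of the time, independent of the input.
    return "fault" if random.random() < 0.01 else "ok"

def overgeneralized_model(kind, i):
    # Systematic error: a learned rule that covers "regular" inputs
    # and silently mishandles every edge case, deterministically.
    return "ok" if kind == "regular" else "fault"

random_faults = [x for x in inputs if flaky_component(*x) == "fault"]
systematic_faults = [x for x in inputs if overgeneralized_model(*x) == "fault"]

print(f"random faults:     {len(random_faults)} scattered unpredictably")
print(f"systematic faults: {len(systematic_faults)} -- all 10 edge cases, on every run")
```

Redundancy and retry logic are reasonable defenses against the first kind of failure; they do nothing at all against the second, because every replica misbehaves in exactly the same way.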
My concern is this: while we've learned to create robust solutions from somewhat unreliable components, little of what we know about reliability extends to this new world of machine-learning components that can embody systematic error, model inadequacies, or an inability to adapt and learn as conditions evolve. This exposes us to a wide range of failure modes never seen before, ones that the industry and the computer science community will be challenged to overcome. We lack systematic ways to recognize and respond to these new kinds of systematic flaws.
Systematic error also creates new and worrying attack surfaces that hackers and others might exploit. Knowing how a machine learning system is trained, a terrorist might circulate photoshopped images of himself or herself with very pale makeup and light brown or blond hair, so as to bind the biometrics that are harder to change (fingerprints, iris patterns) to an interpretation of "not a threat". Knowing how a self-driving car makes decisions, a hacker might trick it into driving into a pylon.
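As a hedged illustration of the mechanism, here is a toy sketch of training-set poisoning (entirely synthetic data and hypothetical labels, not a model of any deployed biometric system): if an adversary can inject enough examples carrying the desired label, the association the learner extracts shifts with them.

```python
from collections import Counter

def train(examples):
    # Stand-in for training: remember which label dominates for each feature value.
    counts = Counter(examples)
    def classify(feature):
        if counts[(feature, "threat")] > counts[(feature, "benign")]:
            return "threat"
        return "benign"
    return classify

# The honest data links this (hypothetical) biometric signature to "threat".
honest_data = [("biometric_X", "threat")] * 3

# The adversary floods the pipeline with examples carrying the same signature,
# staged so that they will be labelled "benign" during training.
poisoned_data = honest_data + [("biometric_X", "benign")] * 10

print(train(honest_data)("biometric_X"))    # "threat"
print(train(poisoned_data)("biometric_X"))  # "benign" -- the injected examples dominate
```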
Welcome to the new world of complex systems with inadequate artificial intelligences. The public places far too much confidence in these systems, and our research community has been far too complacent. We need to open our eyes to the risks, and to teach the public about them, too.