Vendors are building smart cars much
as they build web browsers and cloud applications. But when a browser crashes, nothing bad
happens. If a car malfunctions, people can
be killed. Product liability could
reshape the industry, but I don’t think this needs to be a setback (if
anything, it may be empowering, for reasons I’ll explain).
How one builds safety-critical systems
I’ve had several opportunities to design
distributed computing systems for settings where mistakes matter.
These projects have to be taken very
seriously. If an air traffic system
(human-in-the-loop as in Europe, or partly automated as in the US) authorizes a flight plan and the pilot trusts that authorization, lives
are on the line. In fact, such systems
have built-in redundancy: airplanes, for example, have a collision avoidance
system on board (TCAS). It operates
independently, warning the pilot of any possible violation of plane separation
rules and telling the pilot what to do.
The ATC system is also backed up: if the French ATC were to go down,
control would shift to neighboring countries: Italy, Switzerland, Germany, England, and so forth. There are layers upon layers of redundancy. Of course, nobody wants to reach that point: the last
thing anyone would want is an air traffic solution that frequently triggers the
last-resort collision avoidance system.
So we use every imaginable technique to harden these kinds of
systems. But when the technology fails
even so, there is a fall-back.
Having humans make the ultimate decisions is especially important in systems that include large software components. In an ATC system, we use Linux, commercial databases, and publish-subscribe tools. Yet we also know that Linux is buggy, as are the database subsystem, the publish-subscribe solution, and even the C++ compiler. We know this. So the system is built to be internally redundant and distrustful of itself: it shuts components down, quite aggressively, if they seem to act strangely. We assume that things mostly work as intended, but we do not make life-or-death decisions that depend on trusting that this is so.
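This distrust-by-default design can be sketched in a few lines. The following is a minimal illustration (the component name, heartbeat mechanism, and timeout values are hypothetical, not drawn from any real ATC implementation) of a watchdog that stops trusting a component the moment it goes silent, so control can fail over to a redundant replica:

```python
import time

class ComponentMonitor:
    """Fail-fast watchdog: distrust a component the moment it acts strangely.

    All names and timeouts here are hypothetical, for illustration only; a
    real ATC system would use hardened, redundant implementations.
    """

    def __init__(self, name, heartbeat_timeout=2.0):
        self.name = name
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = time.monotonic()
        self.healthy = True

    def heartbeat(self):
        # Called by the monitored component to report "I am alive".
        self.last_heartbeat = time.monotonic()

    def check(self):
        # Distrust-by-default: a single missed heartbeat marks the
        # component unhealthy, triggering shutdown and failover rather
        # than letting a strangely-behaving component limp along.
        if time.monotonic() - self.last_heartbeat > self.heartbeat_timeout:
            self.healthy = False
        return self.healthy

monitor = ComponentMonitor("publish-subscribe", heartbeat_timeout=0.1)
monitor.heartbeat()
assert monitor.check()       # fresh heartbeat: component still trusted
time.sleep(0.2)
assert not monitor.check()   # silence: shut it down, fail over
```

The design choice worth noticing is that the monitor never tries to diagnose *why* the component went quiet; any deviation from expected behavior is grounds for shutdown, which is exactly the aggressive self-distrust described above.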
Further, even with all this paranoia, an ATC system requires close supervision. Ultimately, each recommended action is scrutinized by human controllers, and for major actions, human approval is still required before the recommendation is adopted. This matters more and more over the lifetime of such a project: by now this system has been in use nearly 30 years. Think of all the patches and upgrades that have occurred. Even if we had mathematical proofs and astonishing levels of testing before we rolled out ATC version 1.0, could we still feel equally confident in version 17.5.1871? If you don't apply patches, you run with known bugs. But patches introduce new bugs, too! The same is true of major new releases of important components.
These same concerns all apply to self-driving cars. Worse, cars are easier for hackers to attack, and far more likely to be targeted. We took many measures to protect the French ATC. But protecting one system is comparatively easy. Once there are a million cars on the road, all being serviced by random garages twice a year, will there really be any justification for trusting that the system is working as originally intended?
“It ain’t what you don’t know that gets you in trouble. It’s what you think you know that just ain’t so.” (Mark Twain)
I love this Mark Twain quote. People are always so sure of themselves,
especially when they are totally wrong!
No matter how heroic the measures we take,
any technical solution to a problem ultimately depends on hardware, on
assumptions, on models, and on problem statements. The people in charge of defining the
computational task can get any or all of those wrong. When we build a solution, we might be
building the wrong thing without realizing it, or overlooking important cases
that weren’t properly identified.
I’ve often been given incorrect hardware
specifications (and it isn’t unusual to only notice that the specification was
wrong when it causes some really bizarre bug).
I’ve discovered at least ten serious compiler bugs over my career, and
innumerable operating system bugs.
Language specifications are notorious for leaving corner cases
undefined, so even if the
compiler isn’t buggy, the language design itself may be ambiguous in some
respects.
Or suppose we get it right, deploy our
system, and it works like a charm. Success
breeds ambition, so before you know it, the way that the system is deployed and
used can evolve. But this evolution over time could easily
invalidate some assumption without anyone really noticing. Thinking about Mark Twain’s point: sometimes,
the things you were sure about when you started the project cease to be true
down the road.
I could share some hair-raising stories
about how such events caused problems in systems I’ve helped build. But you know what? There hasn’t been a single time when at the
end of the day, those layers of fail-safe protections didn’t ultimately kick in
and prevent the system from actually doing any harm. So my actual experience has been that if you
take proper care, and approach mission-critical computing with adequate humility,
you can create safe solutions even using fairly complex technologies.
Self-driving cars are taking a different approach, and it worries me.
This brings us to my key concern: today,
some important safety-critical technologies are being deployed in ways that
seem to demand the same standards used in air traffic control, and yet
aren’t being held to those standards.
Specifically, I’m concerned about self-driving cars. The core of my worry is this: I don’t
actually think that it is possible to build a genuinely safe, fully autonomous
driving solution. Even human drivers
misjudge one another’s intent and accidents ensue (to say nothing of situations
where a driver is impaired in some way).
Self-driving cars operate in a world populated by those same erratic
human drivers, but complicated by their extensive dependency on incredibly
complex sensor technologies. And yet
they lack a good fail-safe option, because they are often operating in domains
where slamming on the brakes or swerving suddenly is dangerous too.
We never allow airplanes to get into that
sort of situation. If we did, they would
bang into each other in mid-air. Perhaps
you watched Breaking Bad, where a deeply depressed air traffic controller
actually steered two planes into a collision.
In practice, that couldn’t happen: alarms would go off in the air
traffic control center, and separately in the cockpits, and either some other
controller would step in, or TCAS would steer the planes away from
one another. The air traffic control
system is designed to keep planes from getting dangerously close to
one another.
In contrast, not long ago a self-driving
car operated by software developed at Tesla crashed into a truck, killing the
driver. Tesla accepted some blame: a
white truck, on a foggy morning with a bright white background, a street with
mostly white surrounding buildings, etc.
The car couldn’t find the contours of the truck, and thought it was
driving into an empty intersection.
Tesla also put lots of blame on the driver, suggesting that he had been
warned about the limits of its vehicles, and shouldn’t have trusted the
self-driving system under those conditions.
Because only the driver was killed, and the driver did make a decision
to activate the self-driving mode, Tesla has some chance of prevailing if this
question ends up decided in court.
Nobody seems to be suggesting that the company bears any major
liability.
This is common in computing: it is rare for
anyone to accept responsibility for anything.
But I think it is wrong-headed, and in this story, I see vivid evidence
that today’s self-driving cars are operating without those layers of
redundancy, and without air traffic control.
It doesn’t take insider knowledge to guess
that Tesla’s accident foreshadows further such events, from Tesla
and other vendors alike. We are all
Teslas, to paraphrase John F. Kennedy.
Sooner or later, a self-driving car will cause an accident in which the
owner of the car is definitely not at fault, and in which completely innocent
people are killed or severely injured.
Maybe the problem will be the n’th
repeat of Tesla’s white truck/white sky scenario, or maybe it will be some
other vendor, some other scenario, and will be the very first time that the
particular model of car, from the particular vendor, was ever exposed to the
situation that caused the crash.
Whatever the cause, lawyers will sue (because this is what lawyers do),
and those lawsuits will reveal a huge gap between software liability laws
(where the vendor is generally not responsible for harm), and the much tougher
liability rules used in life and safety-critical systems, like airplane
autopilots and cardiac defibrillators.
If you build part of an air traffic control
system, protection comes from using best standard practice, but also from
adopting a layered approach to safety: there needs to be a fail-safe backup
option. Self-driving cars take huge
risks, and they lack a safe backup.
So my suggestion is that we need to view
self-driving cars in this other way: we should subject self-driving car
technologies to the sort of extreme oversight used for air traffic control
systems, or airplane guidance systems.
And we should hold the vendor responsible for using best standard
practice, looking not just at other self-driving car vendors (that could be a
race to the bottom), but at other industries too, and specifically air traffic
control.
In suggesting this, I need to acknowledge
something: the situation actually isn’t completely clear-cut: self-driving cars
are expected to have better safety records than human drivers. Computers don’t drive after a few drinks, or
text while driving, or doze off. Very
likely, accident rates will drop as self-driving cars enter into wide use. Some are already arguing that this better
average behavior should excuse the occasional horrific accident, even if a
software bug causes that accident, and even if the car would make the same
mistake, again and again, in the identical (but incredibly rare, unbelievably
unlikely) conditions. But I don’t agree.
Notice the apples-to-oranges comparison
occurring here. On the one hand, we have
an averaged out statistical argument that self-driving cars are safer. And on the other, we have bugs and design flaws,
and the absolute certainty that someday, a self-driving car will make an actual
mistake that, without question, will be the direct cause of a lethal
accident. Can one legitimately use a
statistical observation to counter a rigidly factual one? Does it matter whether or not we can trace
the accident to a specific bug, or to a specific failure in a sensor (that is,
are Heisenbugs somehow excusable, but Bohrbugs intolerable)? Does it matter whether a human driver would
have found the road conditions tough? Or
are the only really problematic bugs ones that cause an accident under perfect
driving conditions, when the vendor definitely hasn’t warned that the
technology would be unsafe to use? And
while on that topic, shouldn’t the vendor’s software disable itself under
conditions where it can’t safely be used?
Why should the human driver have to make a guess about what the vehicle
can, and cannot, handle safely?
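The self-disabling idea can be made concrete. Below is a hedged sketch (the sensor names, visibility threshold, and map check are all invented for illustration; no real vendor's criteria are implied) of an operating-conditions gate that refuses to engage autonomy when conditions fall outside what the system was validated for:

```python
from dataclasses import dataclass

@dataclass
class Conditions:
    visibility_m: float   # estimated visibility in meters (illustrative)
    camera_ok: bool       # self-test result for the vision system
    radar_ok: bool        # self-test result for the radar
    mapped_road: bool     # is this road covered by validated map data?

def may_engage_autonomy(c: Conditions) -> bool:
    """Return True only if every validated operating condition holds.

    Thresholds are invented for illustration; the principle is that the
    software, not the human driver, decides when it cannot be trusted.
    """
    return (c.visibility_m >= 200.0
            and c.camera_ok
            and c.radar_ok
            and c.mapped_road)

# A washed-out, low-visibility morning: the system refuses to engage,
# rather than asking the driver to guess what it can handle.
fog = Conditions(visibility_m=80.0, camera_ok=True,
                 radar_ok=True, mapped_road=True)
assert not may_engage_autonomy(fog)

clear = Conditions(visibility_m=5000.0, camera_ok=True,
                   radar_ok=True, mapped_road=True)
assert may_engage_autonomy(clear)
```

The point of the sketch is the direction of responsibility: the vehicle proves to itself that it is inside its validated envelope before accepting control, instead of leaving that judgment to the human.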
Where computers are concerned, mistakes are
inevitable: not only does complex software have bugs, but self-driving cars
will additionally depend on specialized sensing equipment: video cameras, GPS
systems, miniature radar devices, wireless connections to the “cloud” to get
updates about road conditions…. All of
these technologies can fail: the video cameras and radars can become dirty, or
malfunction during brutal winter weather, or get knocked out of alignment.
Indeed, a self-driving car might crash for
some reason that isn’t directly tied to a software bug. Any driver knows that a tire can suddenly
explode, or something might crash onto the roadway without warning. But how can software systems be designed to
deal with every imaginable hazard? Some
vendors are proposing to just tell the human driver to take over if such a
situation arises, but a sudden handoff could easily catch the driver by
surprise. It follows that accidents will
happen.
When humans make mistakes, there is always
an element of uncertainty: the defense team works to convince the judge and
jury that negligence wasn’t a factor, and if the jury isn’t certain, the
defendant is found not guilty. But when
faced with identical conditions, software bugs often can be reproduced, again
and again. This sure sounds like
negligence by the vendor of the smart car.
Making matters worse, smart cars aren’t
designed and tested the way we design and test airplane autopilots or
smart-grid control technologies or cardiac pacemakers. In fact, smart car vendors use the same
techniques used to create the software on our mobile phones and desktop
computers. They don’t have much choice:
autopilots and cardiac pacemakers are incredibly simple when compared to a
self-driving car’s guidance system: not much really happens at 35,000 feet, and
the pilot of the plane can always take over if the autopilot senses possible
problems.
In contrast, a self-driving car operates
under complex, rapidly changing conditions.
For many possible scenarios, it isn’t even clear what the correct action
should be. With such a complex, vague,
problem specification, it is impossible to validate a smart car using the
safety standards imposed on an airplane autopilot. That sort of open-ended safety validation may
never be possible.
Could smart highways be the answer?
But there is an alternative. We don’t need to tolerate an anything-goes
mindset. Imagine that future cities were
linked by networks of smart
highways. Such a highway would have
a built-in and continuously active traffic control system, fully automated,
like an air traffic control system but designed to control cars rather than
airplanes. In this approach, we wouldn’t
allow self-driving cars to make their own decisions, so the prospect of cars
driving themselves down residential neighborhood streets where children and
pets are playing would be eliminated.
Instead, self-driving systems would only be engaged while on a smart
highway, and even then, would be under control of that highway at all
times. The human driver would take over
when exiting the smart highway, and if anything at all were to go wrong, the
car would pull over and stop, or the entire highway would simply come to a
halt, quickly and safely.
And this gets to the crux of the
matter. A smart highway can be designed
with a safety-first mentality: if safety can’t be assured, then the cars stop. In contrast, today’s self-driving cars can
never be 100% certain that they are driving safely, or even that they are 100%
bug free. They don’t even try. We’re
creating a new world that will operate like a sky full of airplanes on
autopilot but without guidance from air traffic controllers, and this is a
clear recipe for disaster.
Moreover, a smart highway control system
could be built much as we build air traffic control software: with a primary
focus on safety and fail-safe behavior. This would shift the onus from
vehicles making totally independent decisions toward a model in which the
independence of the on-board control system is constrained by an overarching
safety policy. We would then know not simply that the Tesla control system was
engaged, but also that the path planning it carried out was bounded by a
safety envelope that is the same as (or at least consistent with) the one that
the truck was using. We would have a
safety model on which the solution depends; if a smart car crashed into a smart
truck on a smart highway, it would be crystal clear where the model broke down:
perhaps, by not allowing enough stopping distance between vehicles, or perhaps
the truck lost control and just at this instant, the car brakes failed. But we could break the problem down, assign
clear responsibility, and if needed, assign liability.
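The stopping-distance element of such a safety envelope is simple physics, which makes responsibility easy to apportion after the fact. Here is a minimal sketch (the reaction-time and deceleration values are illustrative defaults, not regulatory figures) of the gap a highway controller might enforce between vehicles:

```python
def safe_following_distance_m(speed_mps: float,
                              reaction_s: float = 1.0,
                              decel_mps2: float = 6.0) -> float:
    """Minimum gap so a following vehicle can stop even if the leader
    halts in the worst case. Parameter values are illustrative only.
    """
    # Distance covered while the system reacts, plus braking distance
    # from the standard kinematics formula v^2 / (2a).
    reaction_distance = speed_mps * reaction_s
    braking_distance = speed_mps ** 2 / (2 * decel_mps2)
    return reaction_distance + braking_distance

# At roughly highway speed (30 m/s ≈ 108 km/h):
# 30*1.0 + 30^2/(2*6) = 30 + 75 = 105 meters.
assert safe_following_distance_m(30.0) == 105.0
```

With a rule this explicit, a post-accident analysis could check directly whether the controller allowed enough stopping distance, or whether the failure lay elsewhere (a braking fault, say), which is exactly the clean assignment of responsibility argued for above.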
A smart highway also introduces a form of
independent witness. Think of the
VW/Audi emission reporting scandal. What
if a smart car were to err, but then was programmed to lie about the inputs on
which it based its decisions, or the conditions that led to the crash? With a smart highway filming the event and
perhaps programmed to automatically persist that data in the event of an
accident, we would have concrete evidence from an independent witness that
could be used to detect this sort of preprogrammed fibbing.
Tesla’s accident makes it clear that blind
trust in machine intelligence is a lethal mistake. The good news is that this particular mistake
is one we can still correct, by building smart highways to control the smart
cars and mandating that the cars use those solutions. As technology advances, we’ll see more and
more autonomous computing in settings like hospitals, and there is a real risk
that if we blink now, weak standards will spill over into other areas that
currently demand stringent quality controls.
We really need to get this one right.
Integrating Artificial Intelligence into cars/driving/highways has excited me as well. I agree with your Mark Twain quote, it really does show how humans rely way too much on technology to rule their lives. I'm currently applying to Cornell, and I am extremely interested in your research in cloud computing. I have been thinking of a project to integrate cloud computing into public parking, where paying, reserving, and parking is all facilitated online. What do you think? Will this be something that we head into the near future?
I never saw this... sorry to notice it 8 months late! But I do think that parking is in this overall category of opportunities. There have been a number of smart parking projects in Europe.