Wednesday, 7 December 2016

Smart Cars will need Smart Highways

Vendors are building smart cars much as they build web browsers and cloud applications.  But when a browser crashes, nothing bad happens.  If a car malfunctions, people can be killed.  Product liability could reshape the industry, but I don’t think this needs to be a setback (if anything, it may be empowering, for reasons I’ll explain).

How one builds safety-critical systems

I’ve had several opportunities to design distributed computing systems for settings where mistakes matter.
These projects have to be taken very seriously.  If an air traffic system (human-in-the-loop, as in Europe, or partly automated, as here in the US) authorizes a flight plan and the pilot trusts that authorization, lives are on the line.  In fact, such systems have built-in redundancy: airplanes, for example, carry an on-board collision avoidance system (TCAS).  It operates independently, warning the pilot of any possible violation of plane-separation rules and telling the pilot what to do.  The ATC system is also backed up: if the French ATC were to go down, control would shift to neighboring countries: Italy, Switzerland, Germany, England, etc.  There are layers and layers of redundancy.  Of course we don’t want to go there: the last thing anyone would want is an air traffic solution that frequently triggers the last-resort collision avoidance system.  So we use every imaginable technique to harden these kinds of systems.  But when the technology fails even so, there is a fall-back.

Having humans make the ultimate decisions is especially important in systems that include large software components.  In an ATC system, we use Linux and commercial databases and publish-subscribe tools.  Yet we also know that Linux is buggy, and so is the database subsystem, and the publish-subscribe solution, and even the C++ compiler.  We know this.  So the system is built to be internally redundant and distrustful of itself: it shuts things down, quite aggressively, if they seem to act strangely.  We assume that things mostly work as intended, but we do not make life-or-death decisions that depend on trusting that this is so.
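This "distrustful of itself" stance can be sketched as a fail-stop watchdog: every subsystem is wrapped in a guard that runs periodic self-checks, and any exception, failed check, or slow response shuts the component down rather than letting it limp along. This is a minimal illustration of the pattern, not the actual ATC architecture; the class and check names are hypothetical.

```python
import time

class GuardedSubsystem:
    """Fail-stop wrapper: shut a subsystem down at the first sign of trouble."""

    def __init__(self, name, self_check, max_latency_s=0.5):
        self.name = name
        self.self_check = self_check      # callable returning True if healthy
        self.max_latency_s = max_latency_s
        self.alive = True

    def heartbeat(self):
        """Run one self-check; any exception, failure, or slow reply is fatal."""
        if not self.alive:
            return False
        start = time.monotonic()
        try:
            healthy = bool(self.self_check())
        except Exception:
            healthy = False               # a crash during a self-check is itself a failure
        if not healthy or (time.monotonic() - start) > self.max_latency_s:
            self.alive = False            # fail-stop: never keep trusting a sick component
        return self.alive

# A healthy component keeps running; a misbehaving one is shut down for good.
db = GuardedSubsystem("database", lambda: True)
pubsub = GuardedSubsystem("pubsub", lambda: 1 / 0)   # simulated internal fault
```

Note that once `alive` goes false it never comes back: the conservative choice is to restart and re-verify the component from scratch, never to resume it in place.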

Further, even with all this paranoia, an ATC system requires close supervision.  Ultimately, each recommended action is scrutinized by human controllers, and for major actions, human approval is still required before the recommendation is adopted.  This approach matters more and more over the lifetime of such a project: by now this system has been in use nearly 30 years.  Think of all the patches and upgrades that have occurred.  Even if we had mathematical proofs and astonishing levels of testing before we rolled out ATC version 1.0, could we still feel equally confident in version 17.5.1871?  If you don't apply patches, you run with known bugs.  But patches introduce new bugs, too!  The same is true of major new releases of important components.

These same concerns all apply to self-driving cars.  Worse, cars are easier for hackers to attack, and far more likely to be targeted.  We took many measures to protect the French ATC system.  But protecting one system is comparatively easy.  Once there are a million cars on the road, all being serviced by random garages twice a year, will there really be any justification for trusting that the system is working as originally intended?

“It ain’t what you don’t know that gets you in trouble.  It’s what you think you know that just ain’t so.”  (Mark Twain)

I love this Mark Twain quote.  People are always so sure of themselves, especially when they are totally wrong!
No matter how heroic the measures we take, any technical solution to a problem ultimately depends on hardware, on assumptions, on models, and on problem statements.  The people in charge of defining the computational task can get any or all of those wrong.  When we build a solution, we might be building the wrong thing without realizing it, or overlooking important cases that weren’t properly identified. 
I’ve often been given incorrect hardware specifications (and it isn’t unusual to notice that the specification was wrong only when it causes some really bizarre bug).  I’ve discovered at least ten serious compiler bugs over my career, and innumerable operating system bugs.  Programming languages are notorious for overlooking cases when specifying the correct semantics for the compiler to employ, so even if the compiler isn’t buggy, the language design itself may be ambiguous in some respects. 
Or suppose we get it right, deploy our system, and it works like a charm.  Success breeds ambition, so before you know it, the way that the system is deployed and used can evolve.  But this evolution over time could easily invalidate some assumption without anyone really noticing.  Thinking about Mark Twain’s point: sometimes, the things you were sure about when you started the project cease to be true down the road.
I could share some hair-raising stories about how such events caused problems in systems I’ve helped build.  But you know what?  There hasn’t been a single time when at the end of the day, those layers of fail-safe protections didn’t ultimately kick in and prevent the system from actually doing any harm.  So my actual experience has been that if you take proper care, and approach mission-critical computing with adequate humility, you can create safe solutions even using fairly complex technologies.

Self-driving cars are taking a different approach, and it worries me.

This brings us to my key concern: Today, some important safety-critical technology areas are deploying systems that seem to need the same standards used in air traffic control, and yet aren’t using those standards.  Specifically, I’m concerned about self-driving cars.  The core of my worry is this: I don’t actually think that it is possible to build a genuinely safe, fully autonomous driving solution.  Even human drivers misjudge one another’s intent and accidents ensue (to say nothing of situations where a driver is impaired in some way).  Self-driving cars operate in a world populated by those same erratic human drivers, but complicated by their extensive dependency on incredibly complex sensor technologies.  And yet they lack a good fail-safe option, because they are often operating in domains where slamming on the brakes or swerving suddenly is dangerous too.
We never allow airplanes to get into that sort of situation.  If we did, they would bang into each other in mid-air.  Perhaps you watched Breaking Bad, where a grief-stricken, distracted air traffic controller steered two planes into a mid-air collision.  In practice, that couldn’t happen: alarms would go off in the air traffic control center, and separately in the cockpits, and either some other controller would have stepped in, or TCAS would have steered the planes away from one another.  The air traffic control system is designed to keep planes from ever getting dangerously close to one another.
In contrast, not long ago a self-driving car operated by software developed at Tesla crashed into a truck, killing the driver.  Tesla accepted some blame: a white truck on a foggy morning, against a bright white background, on a street with mostly white surrounding buildings; the car couldn’t find the contours of the truck and thought it was driving into an empty intersection.  Tesla also put much of the blame on the driver, suggesting that he had been warned about the limits of its self-driving technology and shouldn’t have trusted it under those conditions.  Because only the driver was killed, and the driver did make a decision to activate the self-driving mode, Tesla has some chance of prevailing if this question ends up decided in court.  Nobody seems to be suggesting that the company bears any major liability.
This is common in computing: it is rare for anyone to accept responsibility for anything.  But I think it is wrong-headed, and in this story, I see vivid evidence that today’s self-driving cars are operating without those layers of redundancy, and without air traffic control.
It doesn’t take insider knowledge to guess that Tesla’s accident foreshadows further such events in the future, by Tesla and other vendors as well.  We are all Teslas, to paraphrase John F. Kennedy.  Sooner or later, a self-driving car will cause an accident in which the owner of the car is definitely not at fault, and in which completely innocent people are killed or severely injured.  Maybe the problem will be the n’th repeat of Tesla’s white-truck/white-sky scenario, or maybe it will be some other vendor, some other scenario, and will be the very first time that the particular model of car, from the particular vendor, was ever exposed to the situation that caused the crash.  Whatever the cause, lawyers will sue (because this is what lawyers do), and those lawsuits will reveal a huge gap between software liability laws (where the vendor is generally not responsible for harm), and the much tougher liability rules used in life- and safety-critical systems, like airplane autopilots and cardiac defibrillators. 
If you build part of an air traffic control system, protection comes from using best standard practice, but also from adopting a layered approach to safety: there needs to be a fail-safe backup option.  Self-driving cars take huge risks, and they lack a safe backup.
So my suggestion is that we need to view self-driving cars in this other way: we should subject self-driving car technologies to the sort of extreme oversight used for air traffic control systems, or airplane guidance systems.  And we should hold the vendor responsible for using best standard practice, looking not just at other self-driving car vendors (that could be a race to the bottom), but at other industries too, and specifically air traffic control.
In suggesting this, I need to acknowledge something: the situation actually isn’t completely clear-cut.  Self-driving cars are expected to have better safety records than human drivers.  Computers don’t drive after a few drinks, or text while driving, or doze off.  Very likely, accident rates will drop as self-driving cars enter into wide use.  Some are already arguing that this better average behavior should excuse the occasional horrific accident, even if a software bug causes that accident, and even if the car would make the same mistake, again and again, under the identical (but incredibly rare, unbelievably unlikely) conditions.  But I don’t agree.
Notice the apples-to-oranges comparison occurring here.  On the one hand, we have an averaged-out statistical argument that self-driving cars are safer.  And on the other, we have bugs and design flaws, and the absolute certainty that someday, a self-driving car will make an actual mistake that, without question, will be the direct cause of a lethal accident.  Can one legitimately use a statistical observation to counter a rigidly factual one?  Does it matter whether or not we can trace the accident to a specific bug, or to a specific failure in a sensor (that is, are Heisenbugs somehow excusable, but Bohrbugs intolerable)?  Does it matter whether a human driver would have found the road conditions tough?  Or are the only really problematic bugs ones that cause an accident under perfect driving conditions, when the vendor definitely hasn’t warned that the technology would be unsafe to use?  And while on that topic, shouldn’t the vendor’s software disable itself under conditions where it can’t safely be used?  Why should the human driver have to guess about what the vehicle can, and cannot, handle safely?
Where computers are concerned, mistakes are inevitable: not only does complex software have bugs, but self-driving cars will additionally depend on specialized sensing equipment: video cameras, GPS systems, miniature radar devices, wireless connections to the “cloud” to get updates about road conditions….  All of these technologies can fail: the video cameras and radars can become dirty, or malfunction during brutal winter weather, or get knocked out of alignment. 
Indeed, a self-driving car might crash for some reason that isn’t directly tied to a software bug.  Any driver knows that a tire can suddenly explode, or something might crash onto the roadway without warning.  But how can software systems be designed to deal with every imaginable hazard?  Some vendors are proposing to just tell the human driver to take over if such a situation arises, but a sudden handoff could easily catch the driver by surprise.  It follows that accidents will happen.  
When humans make mistakes, there is always an element of uncertainty: the defense team works to convince the judge and jury that negligence wasn’t a factor, and if the jury isn’t certain, the defendant is found not guilty.  But when faced with identical conditions, software bugs often can be reproduced, again and again.  That sure sounds like negligence by the vendor of the smart car.
Making matters worse, smart cars aren’t designed and tested the way we design and test airplane autopilots or smart-grid control technologies or cardiac pacemakers.  In fact, smart car vendors use the same techniques used to create the software on our mobile phones and desktop computers.  They don’t have much choice: autopilots and cardiac pacemakers are incredibly simple when compared to a self-driving car’s guidance system: not much really happens at 35,000 feet, and the pilot of the plane can always take over if the autopilot senses possible problems.
In contrast, a self-driving car operates under complex, rapidly changing conditions.  For many possible scenarios, it isn’t even clear what the correct action should be.  With such a complex, vague, problem specification, it is impossible to validate a smart car using the safety standards imposed on an airplane autopilot.  That sort of open-ended safety validation may never be possible. 
Could smart highways be the answer?
But there is an alternative.  We don’t need to tolerate an anything-goes mindset.  Imagine that future cities were linked by networks of smart highways.  Such a highway would have a built-in and continuously active traffic control system, fully automated, like an air traffic control system but designed to control cars rather than airplanes.  In this approach, we wouldn’t allow self-driving cars to make their own decisions, so the prospect of cars driving themselves down residential neighborhood streets where children and pets are playing would be eliminated.  Instead, self-driving systems would only be engaged while on a smart highway, and even then, would be under the control of that highway at all times.  The human driver would take over when exiting the smart highway, and if anything at all were to go wrong, the car would pull over and stop, or the entire highway would simply come to a halt, quickly and safely. 
And this gets to the crux of the matter.  A smart highway can be designed with a safety-first mentality: if safety can’t be assured, then the cars stop.  In contrast, today’s self-driving cars can never be 100% certain that they are driving safely, or even that they are 100% bug free.  They don’t even try. We’re creating a new world that will operate like a sky full of airplanes on autopilot but without guidance from air traffic controllers, and this is a clear recipe for disaster. 
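The "if safety can’t be assured, the cars stop" rule can be made concrete with a toy sketch of a highway segment controller: cars entering the segment hand over control, and any unresolved fault anywhere halts every vehicle on that segment. The class, command names, and single-segment scope are all hypothetical simplifications for illustration.

```python
class SegmentController:
    """One stretch of smart highway: guide cars, halt everything on any fault."""

    def __init__(self):
        self.commands = {}            # car_id -> current command
        self.halted = False

    def admit(self, car_id):
        """A car entering the segment hands control over to the highway."""
        cmd = "stop" if self.halted else "follow-guidance"
        self.commands[car_id] = cmd
        return cmd

    def report_fault(self, car_id, reason):
        """Safety-first: an unresolved fault anywhere halts the whole segment."""
        self.halted = True
        for cid in self.commands:
            self.commands[cid] = "stop"   # bring every car to a safe stop

highway = SegmentController()
highway.admit("car-1")
highway.admit("car-2")
highway.report_fault("car-2", "sensor disagreement")
```

The key design choice mirrors air traffic control: the default outcome of any anomaly is a system-wide safe state (everything stops), not a per-vehicle guess about whether it is still safe to proceed.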
Moreover, a smart highway control system could be built much as we build air traffic control software: with a primary focus on safety and fail-safe behavior.  This would shift the onus from vehicles making totally independent decisions toward a model in which the independence of the on-board control system is constrained by an overarching safety policy.  We would know not simply that the Tesla control system was engaged, but also that the path planning it performed was bounded by a safety envelope that is the same as (or at least consistent with) the one the truck was using.  We would have a safety model on which the solution depends; if a smart car crashed into a smart truck on a smart highway, it would be crystal clear where the model broke down: perhaps it didn’t allow enough stopping distance between vehicles, or perhaps the truck lost control and at just that instant the car’s brakes failed.  Either way, we could break the problem down, assign clear responsibility, and if needed, assign liability.
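The stopping-distance element of such a safety envelope is simple physics, which is part of what makes it auditable after an accident. A minimal sketch (the 1-second reaction allowance, 6 m/s² braking deceleration, and 5 m margin are assumed, illustrative values, not figures from any real standard):

```python
def stopping_distance(speed_mps, reaction_s=1.0, decel_mps2=6.0):
    """Distance to come to rest: distance covered during the reaction
    allowance plus the kinematic braking distance v**2 / (2*a)."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * decel_mps2)

def required_gap(lead_speed_mps, follow_speed_mps, margin_m=5.0):
    """Gap a follower must keep so it can stop behind a hard-braking leader:
    the follower's stopping distance minus the leader's, plus a fixed margin."""
    shortfall = stopping_distance(follow_speed_mps) - stopping_distance(lead_speed_mps)
    return max(0.0, shortfall) + margin_m
```

At 30 m/s (about 108 km/h), `stopping_distance` comes to 105 m under these assumptions.  A highway controller that enforces `required_gap` between every pair of adjacent vehicles turns "enough stopping distance" from a per-vehicle judgment call into an explicit, checkable rule, which is exactly what makes post-accident responsibility assignable.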
A smart highway also introduces a form of independent witness.  Think of the VW/Audi emission reporting scandal.  What if a smart car were to err, but then was programmed to lie about the inputs on which it based its decisions, or the conditions that led to the crash?  With a smart highway filming the event and perhaps programmed to automatically persist that data in the event of an accident, we would have concrete evidence from an independent witness that could be used to detect this sort of preprogrammed fibbing.
Tesla’s accident makes it clear that blind trust in machine intelligence is a lethal mistake.  The good news is that this particular mistake is one we can still correct, by building smart highways to control the smart cars and mandating that the cars use those solutions.  As technology advances, we’ll see more and more autonomous computing in settings like hospitals, and there is a real risk that if we blink now, weak standards will spill over into other areas that currently demand stringent quality controls.  We really need to get this one right. 
