Wednesday, 3 April 2019

The intractable complexity of machine-learned control systems for safety-critical settings.

As I read the reporting on the dual Boeing 737 Max air disasters, what I find worrying is that the plane seems to have depended on a very complicated set of mechanisms that interacted with each other, with the pilot, with the airplane flaps and wings, and with the environment in what one might think of as a kind of exponentially large cross-product of potential situations and causal-sequence chains.  I hope that eventually we'll understand the technical failure that brought these planes down, but for me, the deeper story is already evident, and it concerns the limits on our ability to fully specify extremely complex cyber-physical systems, to fully characterize the environments in which they need to operate, to anticipate every plausible failure mode and the resultant behavior, and to certify that the resulting system won't trigger a calamity.  Complexity is the real enemy of assurance, and the failure to learn that lesson can result in a huge loss of life.

One would think that each event of this kind would be sobering and lead to a broad pushback against "over-automation" of safety-critical systems.  But there is a popular term that seemingly shuts down rational thinking: machine learning.

The emerging wave of self-driving cars will be immensely complex -- in many ways, even more so than the Boeing aircraft, and also far more dependent on high-quality information coming from systems external to the cars (from the cloud).  But whereas the public seems to perceive the Boeing flight control system as a "machine" that malfunctioned, and has been quick to affix blame, accidents involving self-driving cars don't seem to elicit a similar reaction: there have been several very worrying accidents by now, and several deaths, yet the press, the investment community and even the public appear to be enthralled.

This is amplified by ambiguity about how to regulate the area.  Although any car on the road is subject to safety reviews both by a federal organization called the NHTSA and by state regulators, the whole area of self-driving vehicles is very new.  As a result, these cars don't undergo anything like the government "red team" certification analysis required for planes before they are licensed to fly.  My sense is that because these cars are perceived as intelligent, they are somehow being treated differently from what we perceive as a more mechanical style of system when we think about critical systems on aircraft, quite possibly because machine intelligence brings such an extreme form of complexity that there actually isn't any meaningful way to fully model or verify their potential behavior.  Movie treatments of AI focus on themes like "transcendence" or "exponential self-evolution" and in doing so, they both highlight the fundamental issue here (namely, that we have created a technology we can't truly characterize or comprehend), while at the same time elevating it to human-like or even superhuman status.

Take a neural network: with even a few nodes, today's neural network models become mathematically intractable in the sense that although we do have mathematical theories that can describe their behavior, the actual instances would be too complex to model.  Thus one can definitely build such a network and experiment upon it, but it becomes impossible to make rigorous mathematical statements.  In effect, these systems are simply too complex for us to predict their behavior, other than by just running them and watching how they perform.  On the one hand, I suppose you can say this about human pilots and drivers too.  But on the other hand, this reinforces the point I just made above: the analogy is false, because a neural network is very different from an intelligent human mind, and when we draw that comparison we conflate two completely distinct control models.

With safety-critical systems, the idea of adversarial certification, in which the developer proves the system safe while the certification authority poses harder and harder challenges, is well established.  Depending on the nature of the question, the developer may be expected to use mathematics, testing, simulation, forms of root-cause analysis, or other methods.  But once we begin to talk about systems that have unquantifiable behavior, and yet that may confront stimuli that could never have been dreamed up during testing, we enter a domain that can't really be certified for safe operation in the way that an aircraft control system normally would be -- or at least, would have been in the past, since the Boeing disasters suggest that even aircraft control systems may have finally become overwhelmingly complex.

When we build systems that have neural networks at their core, for tasks like robotic vision or robotic driving, we enter an inherently uncertifiable world in which it just ceases to be possible to undertake a rigorous, adversarial analysis of risks.

To make matters even worse, today's self-driving cars are designed in a highly secretive manner, tested by the vendors themselves, and are really being tested and trained in a single process that occurs out on the road, surrounded by normal human drivers, people walking their dogs or bicycling to work, children playing in parks and walking to school.  All this plays out even as the systems undergo the usual rhythm of bug identification and correction, frequent software patches and upgrades: a process in which the safety critical elements are continuously evolved even as the system is developed and tested.

The government regulators aren't being asked to certify instances of well-understood control technologies, as with planes, but are rather being asked to certify black boxes that in fact are nearly as opaque to their creators as to the government watchdog.  No matter what the question, the developer's response is invariably the same: "in our testing, we have never seen that problem."  Boeing reminds us to add the qualifier: "Yet."

The area is new for the NHTSA and the various state-level regulators, and I'm sure that they rationalize this inherent opacity by saying to themselves that over time, we will gradually develop safety-certification rules -- the idea that by their nature, these technologies may not permit a systematic certification seems not to have occurred to the government, or the public.  And yet a self-driving car is a 2-ton robot that can accelerate from 0 to 80 in 10 seconds.

You may be thinking that well, in both cases, the ultimate responsibility is with the human operator.  But in fact there are many reasons to doubt that a human can plausibly intervene in the event of a sudden problem: people simply aren't good at reacting in a fraction of a second.  In the case of the Boeing 737 Max, the pilots of the two doomed planes certainly weren't able to regain control, despite the fact that one of them apparently did disable the problematic system seconds into the flight.  Part of the problem relates to unintended consequences: apparently, Boeing recommends disabling the system by turning off an entire set of subsystems, and some of those are needed during takeoff, so the pilot was forced to reengage them, and with them, the anti-stall system reactivated.  A second issue is just the lack of adequate time to achieve "affirmative control": people need time to formulate a plan when confronted with a complex crisis outside of their experience, and if that crisis is playing out very rapidly, they may be so overwhelmed that even if a viable recovery plan is available they might fail to discover it.

I know the feeling.  Here in the frosty US north, it can sometimes happen that you find that your car is starting to skid on icy, snowy roads.  Over the years I've learned to deal with skids, but it takes practice.  The first few times, all your instincts are wrong: in fact for an inexperienced driver faced with a skid, the safest reaction is to freeze.  The actual required sequence is to start by figuring out which direction the car is sliding in (and you have to do this while your car is rotating).  Then you should steer towards that direction, no matter what it happens to be.  Your car should straighten out, at which point you can gently pump the brakes.  But all this takes time, and if you are skidding quickly, you'll be in a snowbank or a ditch before you manage to regain control.  In fact the best bet of all is to not skid in the first place, and after decades of experience, I never do.  But it takes training to reach this point.  How do we train the self-driving machine learning systems on these rare situations?  And keep in mind, every skid has its very own trigger.  The nature of the surface, the surroundings of the road, the weather, the slope or curve, other traffic -- all factor in.

Can machine learning systems powered by neural networks and other clever AI tools somehow magically solve such problems?

When I make this case to my colleagues who work in the area, they invariably respond that the statistics are great... and yes, as of today, anyone would have to acknowledge this point.  Google's Waymo has already driven a million miles without any accidents, and perhaps far more because that number has been out there for a while.
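To put a rough number on what a clean record does and doesn't show, here is a minimal sketch, not anything Waymo has published.  It assumes, unrealistically, that every mile is an independent trial with the same accident probability, and asks how high that probability could still be after zero accidents in a million miles.  The function name is made up for illustration.

```python
import math


def zero_event_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on a per-trial event probability after
    observing zero events in `trials` independent, identical trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p) ** trials == 1 - confidence for p, using log1p/expm1
    # so that the tiny result is not lost to floating-point rounding.
    return -math.expm1(math.log1p(-confidence) / trials)


miles = 1_000_000                     # the accident-free mileage quoted above
bound = zero_event_upper_bound(miles)
rule_of_three = 3.0 / miles           # common shortcut for the 95% bound

print(f"95% upper bound: {bound:.3e} accidents per mile")
print(f"rule of three:   {rule_of_three:.3e} accidents per mile")
```

Under those assumptions the 95% bound comes out close to 3 accidents per million miles (the familiar "rule of three").  A spotless million miles rules out frequent failures, but it says little about rare ones, and nothing about miles that look different from the ones actually driven.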

But then (sort of like with studies of new, expensive medications) you run into the qualifiers.  It turns out that Google tests Waymo in places like Arizona, where roads are wide, temperatures can be high, and the number of people and pets and bicyclists would often be rather low (120F heat doesn't really make bicycling to work all that appealing).  They also carefully clean and tune their cars between each test drive, so the vehicles are in flawless shape.  They also benefit simply because so many human drivers shouldn't be behind the wheel in the first place: Waymo is never intoxicated, isn't likely to be distracted by music, phone calls, texting, arguments between the kids in the back.  It has some clear advantages even before it steers itself from the parking lot.

Yet there are easy retorts:
  • Let's face it: conditions in places like Phoenix or rural Florida are about as benign as can be imagined.  In any actual nationwide deployment, cars would need to cope with mud and road salt, misaligned components, power supply issues (did you know that chipmunks absolutely love to gnaw on battery wires?).  Moreover, these vehicles had professional drivers in the emergency backup role, focused attentively on the road and the dashboard while being monitored by a second level of professionals with the specific role of reminding them to pay attention.  In a real deployment, the human operator might be reading the evening sports results while knocking back a beer or two and listening to the radio or texting a friend.
  • Then we run into issues of roadwork that invalidates maps and lane markings.  GPS signals are well known to bounce off buildings, resulting in echoes that can confuse a location sensor (if you have ever used Google Maps in a big city, you know what I mean).  Weather conditions can result in vehicle challenges never seen in Phoenix: blizzard conditions, flooded or icy road surfaces, counties that ran low on salt and money for plowing and left their little section of I-87 unplowed in the blizzard, potholes hiding under puddles or in deep shadow, tires with uneven tread wear or that have gone out of balance, and the list is really endless.  On the major highways near New York, I've seen cars abandoned right in the middle lane, trashcans upended to warn drivers of missing manhole covers, and had all sorts of objects fly off flatbed trucks right in front of me: huge metal boxes, chunks of loose concrete or metal, a mattress, a refrigerator door...  This is the "real world" of driving, and self-driving cars will experience all of these things and more from the moment we turn them loose in the wild.
  • Regional driving styles vary widely too, and sometimes in ways that don't easily translate from place to place and that might never arise in Phoenix.  For example, teenagers who act out in New Jersey and New York are fond of weaving through traffic.  At very high speeds.  In Paris, this has become an entirely new concept in which motorcyclists get to use the lanes "between" the lanes of cars as narrow, high-speed driving lanes (and they weave too).  But New Jersey has its version of a weird rule of the road too: on the main roads near Princeton, for some reason there is a kind of sport of not letting cars enter, and even elderly drivers won't leave you more than a fraction of a second and a few inches to spare as you dive into the endless stream.  I'm a New Yorker, and can drive like a taxi driver there... well, any New York taxi driver worth his or her salary can confirm that this is a unique experience, a bit like a kind of high-speed automotive ballet (or if you prefer, like being a single fish in a school of fish).  The taxis flow down the NYC avenues at 50mph, trying to stay with the green lights, and flowing around obstacles in strangely coordinated ways.  But New York isn't special.  Over in Tel Aviv, drivers will switch lanes without a further thought after glancing no more than 45 degrees to either side, and will casually pull in front of you leaving centimeters to spare.  Back in France, at the Arc de Triomphe and Place Victor Hugo, the roundabouts give incoming traffic priority over outgoing traffic... but only those two use this rule; in all the rest of Europe, the priority favors outgoing traffic (this makes a great example for teaching about deadlocks!)  And in Belgium, there are a remarkable number of unmarked intersections.  On those, the priority always allows the person entering from the right to cut in front of the person on his/her left even if the person from the right is crossing the street or turning, and even if the person on the left was on what seemed like the main road.  In Provence, the roads are too narrow: everyone blasts down them at 70mph but also is quick to actually go off the road, with one tire on the grass, if someone approaches in the other direction.  If you don't follow that rule... bang!  In New Delhi and Chennai, anything at all is accepted -- anything.  In rural Mexico, at least the last time I was there, the local drivers enjoyed terrifying the non-local ones (and I can just imagine how they would have treated robotic vehicles).

And those are just environmental worries.  For me, the stranger part of the story is the complacency of the very same technology writers who are rushing to assign blame in the recent plane crashes.  This gets back to my use of the term "enthralled."  Somehow, the mere fact that self-driving cars are "artificially intelligent" seems to blind them to the evident reality: namely, that there are tasks that are far too difficult for today's machine learning solutions, and that these solutions simply aren't up to the task of driving cars -- not even close!

What, precisely, is the state of the art?  Well, we happen to be wrapping up an exciting season of  faculty hiring focused on exactly these areas of machine learning.  In the past few weeks I've seen talks on vision systems that try to make sense of clutter or to anticipate what might be going on around a corner or behind some visual obstacle.  No surprise: the state of the art is rather primitive.  We've also heard about research on robotic motion just to solve basic tasks like finding a path from point A to point B in a complex environment, or ways to maneuver that won't startle humans in the vicinity.

Let me pause to just point out that if these basic tasks are considered to be cutting-edge research, shouldn't it be obvious that the task of finding a safe path, in real-time (cars don't stop on a dime, you know), is not a solved one either?  If we can't do it in a warehouse, how in the world have we talked ourselves into doing it on Phoenix city streets?

Self-driving cars center on deep neural networks for vision, and yet nobody quite understands how to relate the problem these devices solve to the real safety issue that cars confront.  Quite the opposite: neural networks for vision are known to act bizarrely for seemingly trivial reasons.  A neural network that is the world's best for interpreting photos can be completely thrown off by simply placing a toy elephant somewhere in the room.  A different neural network, that one a champ at making sense of roadway scenes, stops recognizing anything if you inject just a bit of random noise.  Just last night I read a report that Tesla cars can be tricked into veering into oncoming traffic if you put a few spots of white paint on the lane they are driving down, or if the lane marking to one side is fuzzy.  Tesla, of course, denies that this could ever occur in a real-world setting, and points out that they have never observed such an issue, not even once.
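One commonly offered intuition for this kind of brittleness is that a model with many inputs can be pushed a long way by changes that are individually tiny, because the small nudges add up.  The sketch below illustrates that arithmetic on a toy linear scorer.  It is not a neural network and not a model of any of the systems mentioned above, and every number in it is hypothetical, chosen only to make the sums easy to follow.

```python
# Toy linear "classifier": a positive score means "lane is clear".
# All constants are hypothetical and exist only for illustration.
DIM = 1_000      # number of input features ("pixels")
EPS = 0.01       # largest change allowed to any single feature
BIAS = 1.0       # gives the clean input a positive margin

# Alternating +1 / -1 weights, so a uniform gray input scores exactly BIAS.
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(DIM)]
clean = [0.5] * DIM


def score(x):
    """Weighted sum of the features plus the bias."""
    return BIAS + sum(w * xi for w, xi in zip(weights, x))


# Nudge every feature by EPS in whichever direction lowers the score.
perturbed = [xi - EPS * w for w, xi in zip(weights, clean)]

largest_change = max(abs(a - b) for a, b in zip(clean, perturbed))
print("clean score:               ", score(clean))
print("perturbed score:           ", score(perturbed))
print("largest per-feature change:", largest_change)
```

No single feature moves by more than about 0.01, yet the score drops from 1 to roughly -9, because a thousand small shifts all push in the same direction.  Real vision networks are far more complicated than this, but they also have vastly more inputs over which such effects can accumulate.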

People will often tell you that even if the self-driving car concept never matures, at least it will spin some amazing technologies out.  I'll grant them this: the point is valid.

For example, Intel's Mobileye is a genuinely amazing little device that warns you if the cars up ahead of you suddenly brake.  I had it in a rental car recently and it definitely avoided a possible rear-ender for me near Newark airport.  I was driving on the highway when the wind blew a lot of garbage from a passing truck.  Everyone (me included) glanced in that direction, but someone up ahead must have also slammed on the brakes.  A pileup was a real risk, but Mobileye made this weird squawk (as if I was having a sudden close encounter with an angry duck), and vibrated the steering wheel, and it worked: I slowed down in time.

On the other hand, Mobileye also gets confused.  A few times it thought I was drifting from my lane when actually, the lane markers themselves were just messy (some sort of old roadwork had left traces of temporary lane markings).  And at one point I noticed that it was watching speed limit signs, but was confused by the speed limits for exit-only lanes to my right, thinking that they also applied to the thru traffic lanes I was in.

Now think about this: if Mobileye gets confused, why should you assume that Waymo and Tesla and Uber self-driving cars never get confused?  All four use neural network vision systems.  This is a very fair question.

Another of my favorite spinouts is Hari Balakrishnan's startup in Boston.  His company is planning to monitor the quality of drivers: the person driving your car and perhaps those around your car too.  What a great idea!

My only worry is that if this were really to work well, could our society deal with the consequences?  Suppose that your head-up display somehow drew a red box around every dangerous car anywhere near you on the road.  On the positive side, now you would know which cars were being driven by hormonal teenagers, which have drivers who are distracted by texting, which are piloted by drunk or stoned drivers, which have drivers with severe cataracts who can't actually see much of anything...

But on the negative side, I honestly don't know how we will react.  The fact is that we're surrounded by non-roadworthy cars, trucks carrying poorly secured loads of garbage,  and drivers who probably should be arrested!

Then it runs the other way, too.  If you are driving on a very poor road surface, you might be swerving to avoid the potholes or debris.  A hands-free phone conversation is perfectly legal, as is the use of Google Maps to find the address of that new dentist's office.  We wouldn't want to be "red boxed" and perhaps stopped by a state trooper for reasons like that.

So I do hope that Hari can make a real difference in road safety.  But I suspect that he and his company will need quite a bit of that $500M they just raised to pull it off.

So where am I going with all of this?  It comes down to an ethical question.  Right this second, the world is in strong agreement that Boeing's 737 Max is unsafe under some not-yet-fully-described condition.  Hundreds of innocent people have died because of that.  And don't assume that Airbus is somehow different -- John Rushby could tell you some pretty hair-raising stories about Airbus technology issues (their planes have been "fly by wire" for decades now, so they are not new to the kind of puzzle we've been discussing).  Perhaps you are thinking that well, at least Airbus hasn't killed anyone.  But is that really a coherent way to think about safety?

Self-driving cars may soon be carrying far more passengers under far more complex conditions than either of these brands of aircraft.  And in fact, driving is a much harder job than flying a plane.  Our colleagues are creating these self-driving cars, and in my view, their solutions just aren't safe to launch onto our roads yet.  This generation of machine learning may simply not be up to the task.  Our entire approach to safety certification isn't yet ready to cope with the needed certification tasks.

When we agree to allow these things out on the road, that will include roads that you and your family will be driving on, too.  Should Hari's company put a red warning box around them, to help you stay far from them?  And they may be driving on your local city street too.  Your dog will be chasing sticks next to that street, your cats will be out there doing whatever cats do, and your children will be learning to bicycle, all on those same shared roads.

There have already been too many  deaths.  Shouldn't our community be calling for this to stop, before far more people get hurt?