Thursday, 10 October 2019

If the world were a (temporal) database...

I thought I might share a problem we've been discussing in my Cornell PhD-level class on programming at the IoT edge.

Imagine that at some point in the future, a company buys into the idea (my idea!) that we'll really need smart highway systems to ever safely use self-driving cars. A bit like air traffic control, but scaled up.

In fact a smart highway might even offer guidance and special services to "good drivers", such as permission to drive at 85mph in a special lane for self-guided cars and superior drivers... for a fee.  Between the smart cars and the good drivers, there would be a lot of ways to earn revenue here.  And we could even make some kind of dent in the endless gridlock that one sees in places like Silicon Valley from around 6:45am until around 7pm.

So this company, call it Smart Highways Inc, sets out to create the Linux of smart highways: a new form of operating system that the operator of the highway could license to control their infrastructure.

What would it take to make a highway intelligent?  It seems to me that we would basically need to deploy a great many sensors, presumably in a pattern intended to give us some degree of redundancy for fault-tolerance, covering such things as roadway conditions, weather, vehicles on the road, and so forth.

For each vehicle we would want to know various things about it: when it entered the system (for eventual tolls, which will be the way all of this pays for itself), its current trajectory (a path through space-time annotated with speeds and changes in speed or direction), information about the vehicle itself (is it smart?  is the driver subscribed to highway guidance or driving autonomously?), and so forth.

Now we could describe a representative "app": perhaps, for the stretch of CA 101 from San Francisco to San Jose, a decision has been made to document the "worst drivers" over a one month period.  (Another revenue opportunity: this data could definitely be sold to insurance companies!)   How might we do this?  And in particular, how might we really implement our solution?

What I like about this question is that it casts light on exactly the form of Edge IoT I've been excited about.  On the one hand, there is an AI/ML aspect: automated guidance to the vehicles by the highway, and in this example, an automated judgement about the quality of driving.  One would imagine that we train an expert system to take trajectories as input and output a quality metric: a driver swerving between other cars at high speed, accelerating and turning abruptly, braking abruptly, etc: all the hallmarks of poor driving!

But if you think more about this you'll quickly realize that to judge quality of driving you need a bit more information.  A driver who swerves in front of another car with inches to spare, passes when there are oncoming vehicles, causes others to brake or swerve to avoid collisions -- that driver is far more of a hazard than a driver who swerves suddenly to avoid a pothole or some other form of debris, or one who accelerates only while passing, and passes only when nobody is anywhere nearby in the passing lane.  A driver who stays to the right "when possible" is generally considered to be a better driver than one who lingers in the left lane, if the highway isn't overly crowded.

A judgment is needed: was this abrupt action valid, or inappropriate?  Was it good driving that evaded a problem, or reckless driving that nearly caused an accident?

So in this you can see that our expert system will need expert context information.  We would want to compute the set of cars near each vehicle's trajectory, and would want to be able to query the trajectories of those cars to see if they were forced to take any kind of evasive action.  We need to synthesize metrics of "roadway state" such as crowded or light traffic, perhaps identify bunches of cars (even on a lightly driven road we might see a grouping of cars that effectively blocks all the lanes), etc.  Road surface and visibility clearly are relevant, and roadway debris.  We would need some form of composite model covering all of these considerations.
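
To make the "nearby vehicles" part of that context concrete, here is a minimal Python sketch of a spatio-temporal proximity query over trajectories.  The record layout, thresholds and function names are mine, purely for illustration; a real system would use a spatio-temporal index rather than nested loops.

    from dataclasses import dataclass

    @dataclass
    class Fix:
        """One GPS/radar fix on a vehicle's trajectory."""
        t: float        # seconds since some epoch
        x: float        # meters along the roadway
        y: float        # meters across the roadway (lane offset)
        speed: float    # m/s

    def nearby_vehicles(target, others, dt=1.0, dist=15.0):
        """Vehicles whose trajectories pass within `dist` meters of the
        target trajectory at roughly the same time (within `dt` seconds).
        `target` is a list of Fix; `others` maps vehicle id -> list of Fix."""
        def close(a, b):
            return abs(a.t - b.t) <= dt and \
                   ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5 <= dist
        return {vid for vid, traj in others.items()
                if any(close(a, b) for a in target for b in traj)}

Given the set of nearby vehicles, one would then pull their trajectories and ask whether any of them were forced into evasive action.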

I could elaborate but I'm hoping you can already see that we are looking at a very complicated real-time database question.  What makes it interesting to me is that on the one hand, it clearly does have a great deal of relatively standard structure (like any database): a schema listing information we can collect for each vehicle (of course some "fields" may be populated for only some cars...), one for each segment of the highway, perhaps one for each driver.  When we collect a set of documentation on bad drivers of the month, we end up with a database with bad-driver records in it: one linked to vehicle and driver (after all, a few people might share one vehicle and perhaps only some of the drivers are reckless), and then a series of videos or trajectory records demonstrating some of the "all time worst" behavior by that particular driver over the past month.

But on the other hand, notice that our queries also have an interesting form of locality: they are most naturally expressed as predicates over a series of temporal events that make up a particular trajectory, or a particular set of trajectories: on this date at this time, vehicle such-and-such swerved to pass an 18-wheeler truck on the right, then dove three lanes across to the left (narrowly missing the bumper of a car as it did so), accelerated abruptly, braked just as suddenly and dove to the right...  Here, I'm describing some really bad behavior, but the behavior is best seen as a time-linked (and driver/vehicle-linked) series of events that are easily judged as "bad" when viewed as a group, and yet that actually would be fairly difficult to extract from a completely traditional database in which our data is separated into tables by category.  

Standard databases (and even temporal ones) don't offer particularly good ways to abstract these kinds of time-related event sequences if the events themselves are from a very diverse set.  The tools are quite a bit better for very regular structures, and for time series data with identical events -- and that problem arises here, too.  For example, when computing a vehicle trajectory from identical GPS records, we are looking at a rather clean temporal database question, and some very good work has been done on this sort of thing (check out Timescale DB, created by students of my friend and colleague, Mike Freedman!).  But the full-blown problem clearly is the very diverse version, and it is much harder.  I'm sure you are thinking about one level of indirection and so forth, and yes, this is how I might approach such a question -- but it would be hard even so.

In fact, is it a good idea to model a temporal trajectory as a database relation?  I suspect that it could be, and that representing the trajectory that way would be useful, but this particular kind of relation just lists events and their sequencing.  Think also about this issue of event type mentioned above: here we have events linked by the fact that they involve some single driver (directly or perhaps indirectly -- maybe quite indirectly).  Each individual event might well be of a different type: the data documenting "caused some other vehicle to take evasive action" might depend on the vehicle, and the action, and would be totally different from the data documenting "swerved across three lanes" or "passed a truck on its blind side."
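
Here is a minimal sketch of what such an "event relation" might look like, and of a predicate over a time-linked sequence of heterogeneous events.  The event kinds, field names and the ten-second window are invented for illustration only.

    from dataclasses import dataclass, field
    from typing import Any, Dict

    @dataclass
    class DriverEvent:
        """One row of the per-driver event 'relation': events of very
        different kinds share only a timestamp, a driver link, a kind tag,
        and a free-form payload describing what actually happened."""
        t: float
        driver_id: str
        kind: str                       # e.g. "swerve", "hard_brake", "forced_evasion"
        payload: Dict[str, Any] = field(default_factory=dict)

    def reckless_episode(events, window=10.0):
        """True if, within `window` seconds, the same driver swerves, then
        brakes hard, and some other vehicle is forced into evasive action --
        the kind of time-linked pattern described in the text."""
        evs = sorted(events, key=lambda e: e.t)
        for i, e in enumerate(evs):
            if e.kind != "swerve":
                continue
            later = [f for f in evs[i + 1:] if f.t - e.t <= window]
            if any(f.kind == "hard_brake" for f in later) and \
               any(f.kind == "forced_evasion" for f in later):
                return True
        return False

Notice that the payload is deliberately schemaless: the "relation" really only standardizes time, identity and a kind tag, which is exactly what makes this awkward for a conventional table-per-category design.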

Even explaining the relationships and causality can be tricky: Well, car A swerved in front of car B, which braked, causing C to brake, causing D to swerve and impact E.  A is at fault, but D might be blamed by E!

In fact, as we move through the world -- you or me as drivers of our respective vehicles, or for that matter even as pedestrians trying to cross the road -- this business of building self-centered, domain-specific temporal databases seems to be something we do very commonly, and yet don't model particularly well in today's computing infrastructures.  Moreover, you and I are quite comfortable with highways that mix cars and trucks, motorcycles, double-length trucks, police enforcement vehicles, ambulances, construction vehicles... extremely distinct "entities" that are all capable of turning up on a highway, yet our standard ways of building databases seem a bit overly structured for dealing with this kind of information.

Think next about IoT scaling.  If we had just one camera, aimed at one spot on our highway, we still could do some useful tasks with it: we could for example equip it with a radar-speed detector that would trigger photos and use that to automatically issue speeding tickets, as they do throughout Europe.  But the task I described above fuses information from what may be tens of thousands of devices deployed over a highway more than 100 miles long at the location I specified, and that highway could have a quarter-million vehicles on it at peak commute hours.

As a product opportunity, Smart Highways Inc is looking at a heck of a good market -- but only if they can pull off this incredible scaling challenge.  They won't simply be applying their AI/ML "driving quality" evaluation to individual drivers, using data from within the car (that easier task is the one Hari Balakrishnan's Cambridge Mobile Telematics has tackled, and even this problem has his company valued in the billions as of round A).  Smart Highways Inc is looking at the cross-product version of that problem: combining data across a huge number of sensor inputs, fusing the knowledge gained, and eventually making statements that involve observations taken at multiple locations by distinct devices.  Moreover, we would be doing this at highway scale, concurrently, for all of the highway all of the time.

In my lecture today, we'll be talking about MapReduce, or more properly, the Spark/Databricks version of Hadoop, which combines an open source version of MapReduce with extensions to maximize the quality of in-memory caching and introduces a big-data analytic ecosystem.  The aspect of interest to me is the caching mechanism:  Spark centers on a kind of cacheable query object they call a Resilient Distributed Dataset, or RDD.  An RDD describes a scalable computation designed to be applicable across a sharded dataset, which enables a form of SIMD computing at the granularity of files or tensors being processed on huge numbers of compute nodes in a datacenter.
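
For readers who haven't seen one, here is a tiny PySpark sketch of the RDD style applied to our running example.  The in-memory toy dataset and the stand-in scoring function are invented for illustration; only the RDD operations themselves are real Spark API.

    from pyspark import SparkContext

    sc = SparkContext(appName="driving-quality-sketch")

    # (vehicle_id, trajectory) pairs; a real deployment would read these from
    # sharded storage rather than a toy in-memory list.
    trajectories = sc.parallelize([
        ("car-17", [(0.0, 120.0), (1.0, 160.0)]),   # (time, position) fixes
        ("car-42", [(0.0, 80.0), (1.0, 81.0)]),
    ])

    def quality_score(traj):
        # stand-in for the learned model mapping a trajectory to a score in [0,1]
        return 0.0

    worst = (trajectories
             .mapValues(quality_score)              # -> (vehicle_id, score)
             .filter(lambda kv: kv[1] < 0.2)        # keep only "bad" drivers
             .cache()                               # the RDD can be cached and recomputed
             .takeOrdered(10, key=lambda kv: kv[1]))
    print(worst)

The point is that every step is a read-only transformation over a sharded dataset, and the lineage of transformations is itself the cacheable object.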

The puzzle for my students, which we'll explore this afternoon, is whether the RDD model could be carried over from the batched, non-real-time settings in which it normally runs (settings that are also purely functional, in the sense that Spark treats every computation as a series of read-only data transformation steps, from a static batched input set through a series of MapReduce stages to a sharded, distributed result) into an event-driven, real-time setting like ours.  So our challenge is: could a graph of RDDs and an iterative compute model express tasks like the Smart Highway ones?

RDDs are really a linkage between database models and a purely functional Lisp-style Map and Reduce computing model.  I've always liked them, although my friends who do database research tend to view them dimly, at best.  They often feel that all of Spark is doing something a pure database could have done far better (and perhaps more easily).  Still, people vote with their feet and for whatever reason, this RDD + computing style of coding is popular.

So... could we move RDDs to the edge?  Spark itself, clearly, wouldn't be the proper runtime: it works in a batched way, and our setting is event-driven, with intense real-time needs.  It might also entail taking actions in real-time (even pointing a camera or telling it to take a photo, or to upload one, is an action).  So Spark per se isn't quite right here.  Yet Spark's RDD model feels appropriate.  TensorFlow uses a similar model, by the way, so I'm being unfair when I treat this as somehow Spark-specific.  I just have more direct experience with Spark, and additionally, see Spark RDDs as a pretty clear match to the basic question of how one might start to express database queries over huge IoT sensor systems with streaming data flows.  TensorFlow has many uses, but I've seen far more work on using it within a single machine, to integrate a local computation with some form of GPU or TPU accelerator attached to that same host.  And again, I realize that this may be unfair to TensorFlow.  (And beyond that I don't know anything at all about Julia, yet I hear that system name quite often lately...)

Anyhow, back to RDDs.  If I'm correct, maybe someone could design an IoT Edge version of Spark, one that would actually be suitable for connecting to hundreds of thousands of sensors, and that could really perform tasks like the one outlined earlier in real-time.  Could this solve our problem?  It does need to happen in real-time: a smart highway generates far too much data per second to keep much of it, so a quick decision is needed that we should document the lousy driving of vehicle A when driver so-and-so is behind the wheel, because this person has caused a whole series of near accidents and actual ones -- sometimes, quite indirectly, yet always through his or her recklessness.  We might need to make that determination within seconds -- otherwise the documentation (the raw video and images and radar speed data) may have been discarded.

If I were new to the field, this is the problem I personally might tackle.  I've always loved problems in systems, and in my early career, systems meant databases and operating systems.  Here we have a problem of that flavor.

Today, however, scale and data rates and sheer size of data objects are transforming the game.  The kind of system needed would span entire datacenters, and we will need to use accelerators on the data path to have any chance at all of keeping up.  So we have a mix of old and new... just the kind of problem I would love to study, if I were hungry for a hugely ambitious undertaking.  And who knows... if the right student knocks on my door, I might even tackle it.

Wednesday, 24 July 2019

In theory, asymptotic complexity matters. In practice...

Derecho matches Keidar and Shraer’s lower bounds for dynamically uniform agreement:  No Paxos protocol can  safely deliver messages with fewer "information exchange" steps.  But does this matter?

Derecho targets a variety of potential deployments and use cases.  A common use would be to replicate state within some kind of "sharded" service -- a big pool of servers but broken into smaller replicated subservices that use state machine replication in subsets of perhaps 2, 3 or 5.  A different use case would be for massive replication -- tasks like sharing a VM image, a container, or a machine-learned model over huge numbers of nodes.  In those cases the number of nodes might be large enough for asymptotic protocol complexity bounds to start to matter -- Derecho's optimality could be a winning argument.  But would an infrastructure management service really stream high rates of VM images, containers, and machine-learned models? I suspect that this could arise in future AI Systems... it wouldn't today.

All of which adds up to an interesting question: if theoretical optimality is kind of a "meh" thing, what efficiency bounds really matter for a system like Derecho?  And how close to ideal efficiency can a system like this really come?

To answer this question, let me start by arguing that 99% of Derecho can be ignored.  Derecho actually consists of a collection of subsystems: you link your C++ code to one library, but internally, that library has several distinct sets of "moving parts".  A first subsystem is concerned with moving bytes: our data plane.  The second worries about data persistency and versioning.  A third is where we implement the Paxos semantics: Derecho's control plane.  In fact it handles more than just Paxos -- Derecho's control plane is a single thread that loops through a set of predicates, testing them one by one and then taking triggered actions for any predicate that turns out to be enabled.  A fourth subsystem handles requests that query the distributed state: it runs purely on data that has become stable and is totally lock-free and asynchronous -- the other three subsystems can ignore this one entirely.  In fact the other three subsystems are as lock-free and asynchronous as we could manage, too -- this is the whole game when working with high speed hardware, because the hardware is often far faster than the software that manages it.  We like to think of the RDMA layer and the NVM storage as two additional concurrent systems, and our way of visualizing Derecho is a bit like imagining a machine with five separate moving parts that interact in a few spots, but are as independent as we could manage.

For steady state performance -- bandwidth and latency -- we can actually ignore everything except the update path and the query path.  And as it happens, Derecho's query path is just like any query-intensive read-only subsystem: it uses a ton of hashed indices to approximate one-hop access to objects it needs, and it uses RDMA if that one hop involves somehow fetching data from a remote node, or sending a computational task to that remote node.  This leads to fascinating questions, in fact: you want those paths to be lock-free, zero-copy, ideally efficient, etc.  But we can set those questions to the side for our purposes here -- results like the one by Keidar and Shraer really are about update rates.  And for this, as noted a second ago, almost nothing matters except the data-movement path used by the one subsystem concerned with that role.  Let's have a closer look.

For large transfers Derecho uses a tree-based data movement protocol that we call a binomial pipeline.  In simple terms, we build a binary tree, and over it, create a flow pattern of point-to-point block transfers that obtains a high level of internal concurrency, like a two-directional bucket brigade (we call this reliable multicast over RDMA, or "RDMC").  Just like in an actual bucket brigade, every node settles into a steady behavior, receiving one bucket of data (a "chunk" of bytes) as it sends some other bucket, more or less simultaneously.  The idea is to max-out the RDMA network bandwidth (the hardware simply can't move data more efficiently).  The actual data structure creates a hypercube "overlay" (a conceptual routing diagram that lives on our actual network, which allows any-to-any communication) of dimension d, and then wraps d binomial trees over it, and you can read about it in our DSN paper, or in the main Derecho paper.
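
The full binomial-pipeline block schedule is in those papers; the sketch below shows only the hypercube overlay itself, purely as an aid to intuition.  With 2^d nodes, each node's overlay neighbors are the ids that differ from it in exactly one bit.

    def hypercube_neighbors(node, d):
        """Overlay neighbors of `node` in a d-dimensional hypercube of 2**d
        nodes: flip one bit of the node id per dimension.  RDMC wraps its d
        binomial trees over exactly this overlay (the schedule isn't shown)."""
        return [node ^ (1 << k) for k in range(d)]

    # Example: in a 3-dimensional hypercube (8 nodes), node 5 talks to 4, 7 and 1.
    print(hypercube_neighbors(5, 3))   # -> [4, 7, 1]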

A binary tree is the best you can hope for when using point-to-point transfers to replicate large, chunked, objects.  And indeed, when we measure RDMC, it seems to do as well as one can possibly do on RDMA, given that RDMA lacks a reliable one-to-many chunk transfer protocol.   So here we actually do have an ideal mapping of data movement to RDMA primitives.

Unfortunately, RDMC isn't very helpful for data too small to "chunk".  If we don't have enough data a binomial tree won't settle into its steady-state bucket brigade mode and we would just see a series of point-to-point copying actions.  This is still "optimal" at large-scale, but recall that often we will be replicating in a shard of size two, three or perhaps five.  We decided that Derecho needed a second protocol for small multicasts, and Sagar Jha implemented what he calls the SMC protocol.

SMC is very simple.  The sender, call it process P, has a window, and a counter.  To send a message, P places the data in a free slot in its window (each sender has a different window, so we mean "P's window"), and increments the counter (again, P's counter).  When every receiver (call them P, Q and R: this protocol actually loops data back, so P sends to itself as well as to the other shard members) has received the message, the slot is freed and P can reuse it, round-robin.  In a shard of size three where all the members send, there would be one instance of this per member: three windows, three counters, three sets of receive counters (one per sender).
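
Here is a toy, shared-memory rendering of that bookkeeping for a single sender, with no RDMA, no failures and in-order delivery assumed; the class and method names are mine, and this is a model of the semantics, not Sagar's implementation.

    class ToySMC:
        """Toy model of SMC's bookkeeping for one sender P in a shard of size
        n_members: a ring of `window` slots, a count of messages sent, and one
        receive counter per member.  A slot becomes reusable once every member
        (including P itself, since sends loop back) has consumed it."""
        def __init__(self, window, n_members):
            self.window = window
            self.slots = [None] * window
            self.sent = 0                          # P's counter
            self.received = [0] * n_members        # one receive counter per member

        def can_send(self):
            # the oldest outstanding slot is free once everyone has consumed it
            return self.sent - min(self.received) < self.window

        def send(self, msg):
            assert self.can_send(), "window full: sender must wait"
            self.slots[self.sent % self.window] = msg
            self.sent += 1                         # counter update pushed to members

        def deliver(self, member):
            # `member` consumes its next message, then reports its counter
            idx = self.received[member]
            assert idx < self.sent, "nothing new to deliver"
            msg = self.slots[idx % self.window]
            self.received[member] += 1
            return msg

In the real protocol the slots, the sender's counter and the receive counters live in RDMA-accessible memory, so "pushing a counter" is itself an RDMA write, which is where the operation counting below comes from.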

SMC is quite efficient with small shards.  RDMA has a direct-remote-write feature that we can leverage (RDMC uses a TCP-like feature where the receiver needs to post a buffer before the sender transmits, but this direct write is different: here the receiver declares a region of memory into which the sender can do direct writes, without locking).

Or is it?  Here we run into a curious philosophical debate that centers on the proper semantics of Derecho's ordered_send: should an ordered_send be immediate, or delayed for purposes of buffering, like a file I/O stream?  Sagar, when he designed this layer, opted for urgency.  His reasoning was that if a developer can afford to batch messages and send big RDMC messages that carry thousands of smaller ones, this is exactly what he or she would do.  So a developer opting for SMC must be someone who prioritizes immediate sends, and wants the lowest possible latency.

So, assume that ordered_send is required to be "urgent".  Let's count the RDMA operations that will be needed to send one small object from P to itself (ordered_send loops back), Q and R.  First we need to copy the data from P to Q and R: two RDMA operations, because  reliable one-sided RDMA is a one-to-one action.  Next P increments its full-slots counter and pushes it too -- the updated counter can't be sent in the same operation that sends the data because RDMA has a memory consistency model under which a single operation that spans different cache-lines only guarantees sequential consistency on a per-cache-line basis, and we wouldn't want P or Q to see the full-slots counter increment without certainty that the data would be visible to them.  You need two distinct RDMA operations to be sure of that (each is said to be "memory fenced.")  So, two more RDMA operations are required.  In our three-member shard, we are up to four RDMA operations per SMC multicast.

But now we need acknowledgements.  P can't overwrite the slot until P, Q and R have received the data and looked at it, and to report when this has been completed, the three update their receive counters.  These counters need to be mirrored to one-another (for fault-tolerance reasons), so P must send its updated receive counter to Q and R, Q to P and R, and R to P and Q: six more RDMA operations, giving a total of ten.  In general with a shard of size N, we will see 2*(N-1) RDMA operations to send the data and count, and N*(N-1) for these receive counter reports, a total of N^2+N-2.  Asymptotically, RDMC will dominate because of the N^2 term, but N would need to be much larger than five for this to kick in.  At a scale of two to five members, we can think of N as more or less a constant, and so this entire term is like a constant.
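
Checking that arithmetic with a two-line sketch (illustration only):

    def urgent_smc_ops(n):
        """RDMA operations for one urgent SMC multicast in a shard of size n:
        data + counter pushes to the n-1 other members, then every member
        mirrors its receive counter to the n-1 others."""
        return 2 * (n - 1) + n * (n - 1)          # = n**2 + n - 2

    for n in (2, 3, 5):
        print(n, urgent_smc_ops(n))               # 2 -> 4, 3 -> 10, 5 -> 28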

So by this argument, sending M messages using SMC with an urgent-send semantic "must" cost us M*(N^2+N-2) RDMA operations.  Is this optimal?

Here we run into a hardware issue.  If you check the specification for the Mellanox ConnectX-4 device used in my group's experiments, you'll find that it can transmit 75M RDMA messages per second, and also that it has peak performance of 100Gbps (12.5GB/s) in each link direction.   But if your 75M messages are used to report updates to tiny little 4-byte counters, you haven't used much of the available bandwidth: 75M times 4 bytes is only 300MB/s, and as noted above, the device is bidirectional.  Since we are talking about bytes, the bidirectional speed could be as high as 25GB/s with an ideal pattern of transfers.  Oops: we're too slow by a factor of 75x!

In our TOCS paper SMC peaks at around 7.5M small messages per second, which bears out this observation.  We seem to be leaving a lot of capacity unused.  If you think about it, everything centers on the assumption that ordered_send should be as urgent as possible.  This is actually limiting performance, and for applications that average out at 7.5M SMC messages per second or less but have bursts that might be much higher, it even inflates latency (a higher-rate burst will just fill the window and the sender will have to wait for a slot).

Suppose our sender wants fast SMC streaming and low latency, and simply wasn't able to do application-level batching (maybe the application has a few independent subsystems of its own that send SMC messages).  Well, everyone is familiar with file I/O streaming and buffering.  Why not use the same idea here?

Clearly we could have aggregated a bunch of SMC messages, and then done one RDMA transfer for the entire set of full window slots (it happens that RDMA has a scatter-gather put feature, and we can use that to transfer precisely the newly full slots even if they wrap around the window).  Now one counter update covers the full set.  Moreover, the receivers can do "batched" receives, and one counter update would then cover the full batch of receives.

An SMC window might have 1000 sender slots in it, with the cutoff for "small" messages being perhaps 100B.  Suppose we run with batches of size 250.  We'll have cut the overhead factors dramatically: for 1000 SMC messages in the urgent approach, the existing system would send 1000*10 RDMA messages for the 3-member shard: 10,000 in total.  Modified to batch 250 messages at a time, only 40 RDMA operations are needed: a clean 250x improvement.  In theory, our 7.5M SMC messages per second performance could then leap to 1.9B/second.  But here, predictions break down: With 100 byte payloads, that rate would actually be substantially over the limit we calculated earlier, 25GB/s, which limits us to 250M SMC messages per second.  Still, 250M is quite a bit faster than 7.5M and worth trying to achieve.
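
The arithmetic, as a sketch (the batch size and the per-batch cost assumption are exactly the ones in the paragraph above):

    def rdma_ops(messages, shard=3, batch=1):
        """RDMA operations to send `messages` SMC messages, assuming each batch
        of `batch` messages costs the same shard**2 + shard - 2 operations as a
        single urgent multicast (one scatter-gather push plus counter traffic)."""
        per_batch = shard ** 2 + shard - 2
        return (messages // batch) * per_batch

    print(rdma_ops(1000, batch=1))     # urgent: 10,000 RDMA ops
    print(rdma_ops(1000, batch=250))   # batched: 40 RDMA ops

    # Bandwidth ceiling: 25 GB/s bidirectional over 100-byte payloads
    print(int(25e9 / 100))             # ~250 million SMC messages/second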

It might not be trivial to get from here to there, even with batching.  Optimizations at these insane data rates often aren't nearly as simple as a pencil-and-paper calculation might suggest.  And there are also those urgency semantics issues to think about:  A bursty sender might have some gaps in its sending stream.  Were one to occur in the middle of a 250 message batch, we shouldn't leave those SMC messages dangling: some form of automatic flush has to kick in.  We should also have an API operation so that a user could explicitly force a flush.

Interestingly, once you start to think about this, you'll realize that in this latency sense, Sagar's original SMC is probably "more optimal" than any batched solution can be.  If you have just one very urgent notification to send, not a batch, SMC is already a very low-latency protocol; arguably, given his argument that the API itself dictates that SMC should be an urgent protocol, his solution actually is "ideally efficient."  What we see above is that if you question that assumption, you can identify an inefficiency -- not that the protocol as given is inefficient under the assumptions it reflects.

Moral of the story?  The good news is that right this second, there should be a way to improve Derecho performance for small messages, if the user is a tiny bit less worried about urgency and would like to enable a batching mode (we can make it a configurable feature).  But more broadly, you can see that although Derecho lives in a world governed in part by theory, in the extreme performance range we target and with the various hardware constraints and properties we need to keep in mind, tiny decisions can sometimes shape performance to a far greater degree.

I happen to be a performance nut (and nobody stays in my group unless they share that quirk).  Now that we are aware of this SMC performance issue, which was actually called to our attention by Joe Israelevitz when he compared his Acuerdo protocol over RDMA with our Derecho one for 100B objects and beat us hands-down,  we'll certainly tackle it.  I've outlined one example of an optimization, but it will certainly turn out that there are others too, and I bet we'll end up with a nice paper on performance, and a substantial speedup, and maybe even some deep insights.  But they probably won't be insights about protocol complexity.  At the end of the day, Derecho may be quite a bit faster for some cases, and certainly this SMC one will be such a case.  Yet the asymptotic optimality of the protocol will not really have been impacted: the system is optimal in that sense today!  It just isn't as fast as it probably should be, at least for SMC messages sent in high-rate streams!

Wednesday, 26 June 2019

Whiteboard analysis: IoT Edge reactive path

One of my favorite papers is the one Jim Gray wrote with Pat Helland, Patrick O'Neil and Dennis Shasha, on the costs of replicating a database over a large set of servers, which they showed to be prohibitive if you don't fragment (shard) the database into smaller and independently accessed portions: mini-databases.  In some sense, this paper gave us the modern cloud, because you can view Brewer's CAP conjecture and the eBay/Amazon BASE methodologies as both flowing from Gray's original insight.

Fundamentally, what Jim and his colleagues did was to undertake a whiteboard analysis of the scalability of concurrency control in an uncontrolled situation, where transactions are simply submitted to some big pool of servers, and then compete for locks in accordance with a two-phase locking model (one in which a transaction acquires all its locks before releasing any), and then terminate using a two-phase or three-phase commit.  They show that without some mechanism to prevent lock conflicts, there is a predictable and steadily increasing rate of lock conflicts leading to delay and even deadlock/rollback/retry.  The phenomenon causes overheads to rise as a polynomial in the number of servers over which you replicate the data, and quite sharply: I believe it was N^3 in the number of servers, and T^5 in the rate of transactions.  So your single replicated database will suffer a performance collapse.  With shards, using state machine replication (implemented using Derecho!) this isn't an issue, but of course we don't get the full SQL model at that point -- we end up with a form of NoSQL on the sharded database, similar to what MongoDB or Amazon's DynamoDB offers.
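
Just to make that blow-up tangible, here is a back-of-the-envelope sketch using the exponents I quoted (and hedged) above; this is illustrative only, not the paper's actual formula.

    def relative_conflict_cost(n_servers, tps, n0=1, t0=1):
        """Relative growth of replication overhead, taking it as cubic in the
        number of servers and fifth power in the transaction rate (the
        exponents quoted in the text), normalized to a baseline (n0, t0)."""
        return (n_servers / n0) ** 3 * (tps / t0) ** 5

    print(relative_conflict_cost(10, 2))   # 10x the servers, 2x the load -> 32,000x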

Of course the "dangers" paper is iconic, but the techniques it uses are of broad value. And this was central to the way Jim approached problems: he was a huge fan in working out the critical paths and measuring costs along them.  In his cloud database setup, a bit of fancy mathematics let the group he was working with turn that sort of thinking into a scalability analysis that led to a foundational insight.  But even if you don't have an identical chance to change the world, it makes sense to try and follow a similar path.

This has had me thinking about paper-and-pencil analysis of the critical paths and potential consistency conflict points for large edge IoT deployments of the kind I described last week.  Right now, those paths are pretty messy, if you approach it this way.  Without an edge service, we would see something like this:

   IoT                             IoT         Function           Micro
Sensor  --------------->  Hub  ---> Server   ------> Service

In this example I am acting as if the function server "is" the function itself, and hiding the step in which the function server looks up the class of function that should handle this event, launches it (or perhaps had one waiting, warm-started), and then hands off the event data to the function for handling on one of its servers.  Had I included this handoff the image would be more like this:


   IoT                             IoT         Function        Function       Micro
Sensor  --------------->  Hub  ---> Server   ------>    F  -----> Service

F is "your function", coded in a language like C#, F# or C++ or Python, and then encapsulated into a container of some form.  You'll want to keep these programs very small and lightweight for speed.  In particular, a function is not the place to do any serious computing, or to try and store anything.  Real work occurs in the micro service, the one you built using Derecho.  Even so, this particular step looks costly to me: without warm-starting it, launching F could take a substantial fraction of a section.  And if F was warm-started, the context switch still involves some form of message passing, plus waking F up, and could still be many tens or even hundreds of milliseconds: an eternity at cloud speeds!

Even more concerning, many sensors can't connect directly to the cloud, and we end up cloning the architecture and running it twice: once within an IoT Edge system (think of that as an operating system for a small NUMA machine or a cluster, running close to the sensors), and again in the main cloud, to which the edge relays any events it can't handle out near the sensor device.

   IoT                            Edge      Edge Fcn                        IoT         Function              Micro
Sensor  --------------->  Hub  ---> Server -> F======>  Hub  ---> Server -> CF -> Service

Notice that now we have two user-supplied functions on the path.  The first one will have decided that the event can't be handled out at the edge, and forwarded the request to the cloud, probably via a message queuing layer that I haven't actually shown, but represented using a double-arrow: ===>.  This could have chosen to store the request and send it later, but with luck the link was up and it was passed to the cloud instantly, didn't need to sit in an arrival queue, and was instantly given to the cloud's IoT Hub, which in turn finally passed it to the cloud function server, the cloud function (CF) and the Micro Service.

The Micro Service may actually be a whole graph of mutually supporting Micro Services, each running on a pool of nodes, and each interacting with some of the others.  The cloud's "App Server" probably hosts these and provides elasticity if a backlog forms for one of them.

We also have the difficulty that many sensors capture images and videos.  These are initially stored on the device itself, which has substantial capacity but limited compute power.  The big issue is that the first link, from sensor to the edge hub, would often be bandwidth limited.  So we can't upload everything.  Very likely what travels from sensor to hub is just a thumbnail and other meta-data.  Then the edge function concludes that a download is needed (hopefully without too much delay), sends back a download request to the imaging device, and then the device moves the image to the cloud.

Moreover, there are industry standards for uploading photos and videos to a cloud, and those put the uploaded objects into the edge version of the blob store (short for "binary large objects"), which in turn is edge-aware and will mirror them to the main cloud blob store.  Thus we have a whole pathway from IoT sensor to the edge blob server, which will eventually generate another event later to tell us that the data is ready.  And as noted, for data that needs to reach the actual cloud and can't be processed at the edge, we replicate this path too, moving that image via the queuing service to the cloud.

So how long will all of this take?  Latencies are high and bandwidth low for the first hop, because sensors rarely have great connectivity, and almost never have the higher levels of power required for really fast data transfers (even with 5G).  So perhaps we will see a 10ms delay at that stop, plus more if the data is large.  Inside the edge we should have a NUMA machine or perhaps a small cluster, and can safely assume 10G connections with latencies of 10us or less, although of course software like TCP will often impose its own delays.  The big delay will probably be the handoff to the user-defined function, F.

My guess is that for an event that requires downloading a small photo, the very best performance will be something like 50ms before F sees the event (maybe even 100ms), then another 50-100 for F to request a download, then perhaps 200ms for the camera to upload the image to the blob server, and then a small delay (25ms?) for the blob server to trigger another event, F', saying "your image is ready!".  We're up near 350ms and haven't done any work at all yet!

Because the function server is limited to lightweight computing, it hands off to our micro-service (a quick handoff because the service is already running; the main delay will be the binding action by which the function connects to it, and perhaps this can be done off the critical path).  Call this 10ms?  And then the micro service can decide what to do with this image.

Add another 75ms or so if we have to forward the request to the cloud.  So the cloud might not be able to react to a photo in less than about 500ms, today.
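
Summing those guesses (a sketch only, using the midpoints of the ranges I quoted above, not measurements):

    # Rough budget for the reactive path sketched above, in milliseconds.
    budget = {
        "sensor -> edge hub (thumbnail event)":        75,   # "50ms ... maybe even 100ms"
        "edge function requests full image":           75,   # "another 50-100"
        "camera uploads image to blob store":         200,
        "blob store fires 'image ready' event":        25,
        "handoff from function to micro-service":      10,
        "forwarding the request to the cloud":         75,
    }
    print(sum(budget.values()), "ms")   # ~460ms: the same ballpark as the 500ms above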

None of this involved a Jim Gray kind of analysis of contention and backoff and retry.  If you took my advice and used Derecho for any data replication, the 500ms might be the end of the story.  But if you were to use a database solution like MongoDB (CosmosDB on Azure), it seems to me that you might easily see a further 250ms right there.

What should one do about these snowballing costs?  One answer is that many of the early IoT applications just won't care: if the goal is to just journal that "Ken entered Gates Hall at 10am on Tuesday", a 1s delay isn't a big deal.  But if the goal is to be reactive, we need to do a lot better.

I'm thinking that this is a great setting for various forms of shortcut datapaths, that could be set up after the first interaction and offer direct bypass options to move IoT events or data from the source directly to the real target.  Then with RDMA in the cloud, and Derecho used to build your micro service, the 500ms could drop to perhaps 25 or 30ms, depending on the image size, and even less if the photo can be fully handled on the IoT Edge server itself.

On the other hand, if you don't use Derecho but you do need consistency, you'll get into trouble quickly: with scale (lots of these pipelines all running concurrently), and contention, it is easy to see how you could trigger Jim's "naive replication" concerns.  So designers of smart highways had better beware: if they don't heed Jim's advice (and mine), by the time that smart highway warns that a car should "watch out for that reckless motorcycle approaching on your left!" it will already have zoomed past...   

These are exciting times to work in computer systems.  Of course a bit more funding wouldn't hurt, but we certainly will have our work cut out for us!

Saturday, 22 June 2019

Data everywhere but only a drop to drink...

One peculiarity of the IoT revolution is that it may explode the concept of big data.

The physical world is a domain of literally infinite data -- no matter how much we might hope to capture, at the very most we see only a tiny set of samples from an ocean of inaccessible information because we had no sensor in the proper place, or we didn't sample at the proper instant, or didn't have it pointing in the right direction or focused or ready to snap the photo, or we lacked bandwidth for the upload, or had no place to store the data and had to discard it, or misclassified it as "uninteresting" because the filters used to make those decisions weren't parameterized to sense the event the photo was showing.

Meanwhile, our data-hungry machine learning algorithms currently don't deal with the real world: they operate on snapshots, often ones collected ages ago.  The puzzle will be to find a way to somehow compute on this incredible ocean of currently-inaccessible data while the data is still valuable: a real-time constraint.  Time matters because in so many settings, conditions change extremely quickly (think of a smart highway, offering services to cars that are whizzing along at 85mph).

By computing at the back-end, AI/ML researchers have baked in very unrealistic assumptions, so that today's machine learning systems have become heavily skewed: they are very good at dealing with data acquired months ago and painstakingly tagged by an army of workers, and fairly good at using the resulting models to make decisions within a few tens of milliseconds, but in a sense consider the action of acquiring data and processing it in real-time to be part of the (offline) learning side of the game.  In fact many existing systems wouldn't even work if they couldn't iterate for minutes (or longer) on data sets, and many need that data to be preprocessed in various ways, perhaps cleaned up, perhaps preloaded and cached in memory, so that a hardware accelerator can rip through the needed operations.  If a smart highway were capturing data now that we would want to use to relearn vehicle trajectories so that we can react to changing conditions within fractions of a second, many aspects of this standard style of computing would have to change.

To me this points to a real problem for those intent on using machine learning everywhere and as soon as possible, but also a great research opportunity.  Database and machine learning researchers need to begin to explore a new kind of system in which the data available to us is understood to be a "skim" (I learned this term when I used to work with high performance computing teams in scientific computing settings where data was getting big decades ago.  For example, the CERN particle accelerators capture far too much data to move it all off the sensors, so even uploading "raw" data involves deciding which portions to keep, which to sample randomly, and which to completely ignore).

Beyond this issue of deciding what to include in the skim, there is the whole puzzle of supporting a dialog between the machine-learning infrastructure and the devices.  I mentioned examples in which one needs to predict that a photo of such and such a thing would be valuable, anticipate the timing, point the camera in the proper direction, pre-focus it (perhaps on an expected object that isn't yet in the field of view, so that the auto-focus wouldn't be useful because the thing we want to image hasn't yet arrived), plan the timing, capture the image, and then process it -- all under real-time pressure.

I've always been fascinated by the emergence of new computing areas.  To me this looks like one ripe for exploration.  It wouldn't surprise me at all to see an ACM Symposium on this topic, or an ACM Transactions journal.  Even at a glance one can see all the elements: a really interesting open problem that would lend itself to a theoretical formalization, but also one that will require substantial evolution of our platforms and computing systems.  The area is clearly of high real-world importance and offers a real opportunity for impact, and a chance to build products.  And it emerges at a juncture between systems and machine learning: a trending topic even now, so that this direction would play into gradually building momentum at the main funding agencies, which rarely can pivot on a dime, but are often good at following opportunities in a more incremental, thoughtful way.

The theoretical question would run roughly as follows.  Suppose that I have a machine-learning system that lacks knowledge required to perform some task (this could be a decision or classification, or might involve some other goal, such as finding a path from A to B).  The system has access to sensors, but there is a cost associated with using them (energy, repositioning, etc).  Finally, we have some metric for data value: a hypothesis concerning the data we are missing that tells us how useful a particular sensor input would be.  Then we can talk about the data to capture next that minimizes cost while maximizing value.  Given a solution to the one-shot problem, we would then want to explore the continuous version, where the new data changes these model elements, fixed-points for problems that are static, and quality of tracking for cases where the underlying data is evolving.
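
To make the one-shot version concrete, here is a minimal greedy sketch; the candidate actions, their value estimates, the costs and the budget are all invented for illustration, and of course the hard part -- the value model itself -- is simply assumed to be given.

    def next_capture(candidates, budget):
        """Greedy sketch of the one-shot question: each candidate sensor action
        has an estimated value (how much it would reduce our uncertainty) and a
        cost (energy, repositioning, bandwidth).  Pick the best value-per-cost
        actions that fit within the budget."""
        chosen, spent = [], 0.0
        for action in sorted(candidates, key=lambda a: a["value"] / a["cost"],
                             reverse=True):
            if spent + action["cost"] <= budget:
                chosen.append(action["id"])
                spent += action["cost"]
        return chosen

    print(next_capture([{"id": "cam-3 refocus", "value": 0.9, "cost": 2.0},
                        {"id": "cam-7 pivot",   "value": 0.4, "cost": 0.5},
                        {"id": "lidar sweep",   "value": 0.6, "cost": 3.0}],
                       budget=3.0))

The continuous version would rerun this as each new capture changes the value estimates, which is where the fixed-point and tracking questions come in.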

The practical systems-infrastructure and O/S questions center on the capabilities of the hardware and the limitations of today's Linux-based operating system infrastructure, particularly in combination with existing offloaded compute accelerators (FPGA, TPU, GPU, even RDMA).  Today's sensors run a gamut from really dumb fixed devices that don't even have storage to relatively smart sensors that can do various tasks on the device itself, have storage and some degree of intelligence about how to report data, etc.  Future sensors might go further, with the ability to download logic and machine-learned models for making such decisions: I think it is very likely that we could program a device to point the camera at such and such a lane on the freeway, wait for a white vehicle moving at high speed that should arrive in the period [T0,T1], obtain a well-focused photo showing the license plate and current driver, and then report the image capture accompanied by a thumbnail.  It might even be reasonable to talk about prefocusing, adjust the spectral parameters of the imaging system, selecting from a set of available lenses, etc.
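
One could imagine the downloaded logic looking something like the following "watch task" -- every field name here is hypothetical, invented purely to mirror the white-vehicle example above.

    # Hypothetical watch task that the cloud might push to a smart camera.
    watch_task = {
        "target_lane": "northbound-2",
        "expected_window": ["T0", "T1"],           # arrival window predicted by the cloud
        "trigger": {"vehicle_color": "white", "min_speed_mph": 80},
        "on_match": {
            "prefocus": "license_plate_zone",
            "capture": ["plate", "driver"],
            "report": "thumbnail_plus_metadata",   # full image uploaded only on request
        },
    }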

Exploiting all of this will demand a new ecosystem that combines elements of machine learning on the cloud with elements of controlled logic on the sensing devices.  If one thinks about the way that we refactor software, here we seem to be looking at a larger-scale refactoring in which the machine learning platform on the cloud, with "infinite storage and compute" resources, has the role of running the compute-heavy portions of the task, but where the sensors and the other elements of the solution (things like camera motion control, dynamic focus, etc) would need to participate in a cooperative way.  Moreover, since we are dealing with entire IoT ecosystems, one has to visualize doing this at huge scale, with lots of sensors, lots of machine-learned models, and a shared infrastructure that imposes limits on communication bandwidth and latency, computing at the sensors, battery power, storage and so forth.

It would probably be wise to keep as much of the existing infrastructure as feasible.  So perhaps that smart highway will need to compute "typical patterns" of traffic flow over a long time period with today's methodologies (no time pressure there), current vehicle trajectories over mid-term time periods using methods that work within a few seconds, and then can deal with instantaneous context (a car suddenly swerves to avoid a rock that just fell from a dumptruck onto the lane) as an ultra-urgent real-time learning task that splits into the instantaneous part ("watch out!") and the longer-term parts ("warning: obstacle in the road 0.5miles ahead, left lane") or even longer ("at mile 22, northbound, left lane, anticipate roadway debris").  This kind of hierarchy of temporality is missing in today's machine learning systems, as far as I can tell, and the more urgent forms of learning and reaction will require new tools. Yet we can preserve a lot of existing technology as we tackle these new tasks.

Data is everywhere... and that isn't going to change.  It is about time that we tackle the challenge of building systems that can learn to discover context, and use current context to decide what to "look more closely" at, and with adequate time to carry out that task.  This is a broad puzzle with room for everyone -- in fact you can't even consider tackling it without teams that include systems people like me as well as machine learning and vision researchers.  What a great puzzle for the next generation of researchers!

Sunday, 12 May 2019

Redefining the IoT Edge

Edge computing has a dismal reputation.  Although continuing miniaturization of computing elements has made it possible to put small ARM processors pretty much anywhere, general purpose tasks don’t make much sense in the edge.  The most obvious reason is that no matter how powerful the processor could be, a mix of power, bandwidth and cost constraints argue against that model.

Beyond this, the interesting forms of machine learning and decision making can't possibly occur in an autonomous way.  An edge sensor will have the data it captures directly and any configuration we might have pushed to it last night, but very little real-time context:  if every sensor were trying to share its data with every other sensor that might be interested in that data, the resulting n^2 pattern would overwhelm even the beefiest ARM configuration.  Yet exchanging smaller data summaries implies that each device will run with different mixes of detail.

This creates a computing model constrained by hard theoretical bounds.  In papers written in the 1980's, Stony Brook economics professor Pradeep Dubey studied the efficiency of game-theoretic multiparty optimization.  His early results inspired follow-on research by Berkeley's Elias Koutsoupias and Christos Papadimitriou, and by my colleagues here at Cornell, Tim Roughgarden and Eva Tardos.  The bottom line is unequivocal: there is a huge "price of anarchy."  In an optimization system where parties independently work towards an optimal state using non-identical data, even when they can find a Nash optimal configuration, that state can be far from the global optimal.

As a distributed protocols person who builds systems, one obvious idea would be to explore more efficient data exchange protocols for the edge: systems in which the sensors iteratively exchange subsets of data in a smarter way, using consensus to agree on the data so that they are all computing against the same inputs.  There has been plenty of work on this, including some of mine.  But little of it has been adopted or even deployed experimentally.

The core problem is that communication constraints make direct sensor to sensor data exchange difficult and slow.  If a backlink to the cloud is available, it is almost always best to just use it.  But if you do, you end up with an IoT cloud model, where data first is uploaded to the cloud, then some computed result is pushed back to the devices.  The devices are no longer autonomously intelligent: they are basically peripherals of the cloud.

Optimization is at the heart of machine learning and artificial intelligence, and so all of these observations lead us towards a cloud-hosted model of IoT intelligence.  Other options, for example ones in which brilliant sensors are deployed to implement a decentralized intelligent system, might yield collective behavior, but that behavior will be suboptimal, and perhaps even unstable (or chaotic).   I was once quite interested in swarm computing (it seemed like a natural outgrowth of gossip protocols, on which I was working at the time).   Today, I've come to doubt that robot swarms or self-organizing convoys of smart cars can work, and if they can, that the quality of their decision-making could compete against cloud-hosted solutions.

In fact the cloud has all sorts of magical superpowers that enable it to perform operations inaccessible to the IoT sensors.  Consider data fusion: with multiple overlapping cameras operated from different perspectives, we can reconstruct 3D scenes -- in effect, using the images to generate a 3D model and then painting the model with the captured data.  But to do this we need lots of parallel computing and heavy processing on GPU devices.  Even a swarm of brilliant sensors could never create such a fused scene given today’s communication and hardware options.

And yet, even though I believe in the remarkable power of the cloud, I'm also skeptical about an IoT model that presumes the sensors are dumb devices.   Devices like cameras actually possess remarkable powers too, ones that no central system can mimic.  For example, if preconfigured with some form of interest model, a smart sensor can classify images: data to  upload, data to retain but report only as a thumbnail with associated metadata, and data to discard outright.  A camera may be able to pivot so as to point the lens at an interesting location, or to focus in anticipation of some expected event, or to configure a multispectral image sensor.  It can decide when to snap the photo, and which of several candidate images to retain (many of today's cameras take multiple images and some even do so with different depths of field or different focal points).  Cameras can also do a wide range of on-device image preprocessing and compression.  If we overlook these specialized capabilities, we end up with a very dumb IoT edge and a cloud unable to compensate for its limitations.

The future, then, actually will demand a form of edge computing -- but one that will center on a partnership between the cloud (or perhaps a cloud edge running on a platform near the sensor, as with Azure IoT Edge) and the attached sensors: the cloud works in close concert with the sensors to dynamically configure them, perhaps reconfigure them as conditions change, and even to pass them knowledge models computed on the cloud that they can use on-camera (or on-radar, on-lidar, on-microphone) to improve the quality of information captured.  Each element has its unique capabilities and roles.

Even the IoT network is heading towards a more and more dynamic and reconfigurable model.  If one sensor captures a huge and extremely interesting object, while others have nothing notable to report, it may make sense to reconfigure the WiFi network to dedicate a maximum of resources to that one WiFi link.  Moments later, having pulled the video to the cloud edge, we might shift those same resources to a set of motion sensors that are watching an interesting pattern of activity, or to some other camera.

Perhaps we need a new term for this kind of edge computing, but my own instinct is to just coopt the existing term -- the bottom line is that the classic idea of edge computing hasn't really gone very far, and reviled or not, is best "known" to people who aren't even active in the field today.  The next generation of edge computing will be done by a new generation of researchers and product developers, and they might as well benefit from the name recognition -- I think they can brush off the negative associations fairly easily, given that edge computing never actually took off and then collapsed, or had any kind of extensive coverage in the commercial press.

The resulting research agenda is an exciting one.  We will need to develop models for computing that single globally optimal knowledge state, yet for also "compiling" elements of it to be executed remotely.  We'll need to understand how to treat physical-world actions like pivoting and focusing as elements of an otherwise von Neumann computational framework, and to include the possibility of capturing new data side by side with the possibility of iterating a stochastic gradient descent one more time.  There are questions of long term knowledge (which we can compute on the back-end cloud using today's existing batched solutions), but also contextual knowledge that must be acquired on the fly, and then physical world "knowledge" such as a motion detection that might be used to trigger a camera to acquire an image.  The problem poses open questions at every level: the machine learning infrastructure, the systems infrastructure on which it runs, and the devices themselves -- not brilliant and autonomous, but not dumb either.  As the area matures and we gain some degree of standardization around platforms and approaches, the potential seems enormous!

So next time you teach a class on IoT and mention exciting ideas like smart highways that might sell access to high speed lanes or other services to drivers or semi-autonomous cars, pause to point out that this kind of setting is a perfect example of a future computing capability that will soon supplant past ideas of edge computing.  Teach your students to think of robotic actions like pivoting a camera, or focusing it, or even configuring it to select interesting images, as one facet of a rich and complex notion of edge computing that can take us into settings inaccessible to the classical cloud, and yet equally inaccessible even to the most brilliant of autonomous sensors.   Tell them about those theoretical insights: it is very hard to engineer around an impossibility proof, and if this implies that swarm computing simply won't be the winner, let them think about the implications.  You'll be helping them prepare to be leaders in tomorrow's big new thing!

Wednesday, 3 April 2019

The intractable complexity of machine-learned control systems for safety-critical settings.

As I read the reporting on the dual Boeing 737 Max air disasters, what I find worrying is that the plane seems to have depended on a very complicated set of mechanisms that interacted with each other, with the pilot, with the airplane flaps and wings, and with the environment in what one might think of as a kind of exponentially large cross-product of potential situations and causal-sequence chains.  I hope that eventually we'll understand the technical failure that brought these planes down, but for me, the deeper story is already evident, and it concerns the limits on our ability to fully specify extremely complex cyber-physical systems, to fully characterize the environments in which they need to operate, to anticipate every plausible failure mode and the resultant behavior, and to certify that the resulting system won't trigger a calamity.   Complexity is the real enemy of assurance, and the failure to learn that lesson can result in huge loss of lives.

One would think that each event of this kind would be sobering and lead to a broad pushback against "over-automation" of safety-critical systems.  But there is a popular term that seemingly shuts down rational thinking: machine learning.

The emerging wave of self-driving cars will be immensely complex -- in many ways even more so than the Boeing aircraft, and also far more dependent upon high-quality external information coming from systems outside the cars (from the cloud).  But whereas the public seems to perceive the Boeing flight control system as a "machine" that malfunctioned, and has been quick to affix blame, accidents involving self-driving cars don't seem to elicit a similar reaction: there have been several very worrying accidents by now, and several deaths, yet the press, the investment community and even the public appear to be enthralled.

This is amplified by ambiguity about how to regulate the area.  Although any car on the road is subject to safety reviews both by a federal agency, the National Highway Traffic Safety Administration (NHTSA), and by state regulators, the whole area of self-driving vehicles is very new.  As a result, these cars don't undergo anything like the government "red team" certification analysis required for planes before they are licensed to fly.  My sense is that because these cars are perceived as intelligent, they are being treated differently from the more mechanical style of system we picture when we think about critical systems on aircraft -- quite possibly because machine intelligence brings such an extreme form of complexity that there actually isn't any meaningful way to fully model or verify its potential behavior.  Movie treatments of AI focus on themes like "transcendence" or "exponential self-evolution," and in doing so they highlight the fundamental issue here (namely, that we have created a technology we can't truly characterize or comprehend), while at the same time elevating it to human-like or even superhuman status.

Take a neural network: with even a modest number of nodes, today's neural network models become mathematically intractable, in the sense that although we do have theories that describe their behavior in principle, actual instances are far too complex to analyze exhaustively.  One can certainly build such a network and experiment on it, but it becomes impossible to make rigorous mathematical statements about how it will behave, other than by just running it and watching how it performs.  On the one hand, I suppose you can say this about human pilots and drivers too.  But on the other hand, this reinforces the point I just made above: the analogy is false, because a neural network is very different from an intelligent human mind, and when we draw that comparison we conflate two completely distinct control models.
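A back-of-the-envelope illustration (not a proof) of why exhaustive, case-by-case analysis blows up: each ReLU unit in a network is either "on" or "off" for a given input, so a single hidden layer with n units already has up to 2^n activation patterns that a verifier would, in the worst case, need to reason about separately.

    # Toy arithmetic: worst-case count of activation patterns vs. hidden-layer width.
    import math

    for n in (10, 20, 50, 100):
        patterns = 2 ** n
        print(f"{n:4d} hidden ReLU units -> up to {patterns:,} "
              f"activation patterns (~10^{n * math.log10(2):.0f})")

Real networks have millions of units spread across many layers, so the worst-case numbers are vastly larger still.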

With safety critical systems, the idea of adversarial certification, in which the developer proves the system safe while the certification authority poses harder and harder challenges, is well established.  Depending on the nature of the question, the developer may be expected to use mathematics, testing, simulation, forms of root-cause analysis, or other methods.  But once we begin to talk about systems that have unquantifiable behavior, and yet that may confront stimuli that could never have been dreamed up during testing,  we enter a domain that can't really be certified for safe operation in the way that an aircraft control system normally would be -- or at least, would have been in the past, since the Boeing disasters suggest that even aircraft control systems may have finally become overwhelmingly complex.

When we build systems that have neural networks at their core, for tasks like robotic vision or robotic driving, we enter an inherently uncertifiable world in which it simply ceases to be possible to undertake a rigorous, adversarial analysis of risks.

To make matters even worse, today's self-driving cars are designed in a highly secretive manner and tested by the vendors themselves -- really, tested and trained in a single process that occurs out on the road, surrounded by normal human drivers, people walking their dogs or bicycling to work, children playing in parks and walking to school.  All this plays out amid the usual rhythm of bug identification and correction, frequent software patches and upgrades: a process in which the safety-critical elements are continuously evolving even as the system is developed and tested.

The government regulators aren't being asked to certify instances of well-understood control technologies, as with planes, but are rather being asked to certify black boxes that in fact are nearly as opaque to their creators as to the government watchdog.  No matter what the question, the developer's response is invariably the same: "in our testing, we have never seen that problem."  Boeing reminds us to add the qualifier: "Yet."

The area is new for the NHTSA and the various state-level regulators, and I'm sure that they rationalize this inherent opacity by telling themselves that over time, we will gradually develop safety-certification rules -- the idea that, by their nature, these technologies may not permit systematic certification seems not to have occurred to the government, or to the public.  And yet a self-driving car is a 2-ton robot that can accelerate from 0 to 80 in 10 seconds.

You may be thinking that, well, in both cases the ultimate responsibility rests with the human operator.  But in fact there are many reasons to doubt that a human can plausibly intervene in the event of a sudden problem: people simply aren't good at reacting in a fraction of a second.  In the case of the Boeing 737 Max, the pilots of the two doomed planes certainly weren't able to regain control, despite the fact that one of them apparently did disable the problematic system seconds into the flight.  Part of the problem relates to unintended consequences: apparently, Boeing's procedure disables the system by turning off an entire set of subsystems, some of which are needed during takeoff, so the pilot was forced to reengage them -- and with them, the anti-stall system reactivated.  A second issue is simply the lack of adequate time to achieve "affirmative control": people need time to formulate a plan when confronted with a complex crisis outside their experience, and if that crisis is playing out very rapidly, they may be so overwhelmed that even a viable recovery plan goes undiscovered.

I know the feeling.  Here in the frosty US north, it can sometimes happen that you find your car starting to skid on icy, snowy roads.  Over the years I've learned to deal with skids, but it takes practice.  The first few times, all your instincts are wrong: in fact, for an inexperienced driver faced with a skid, the safest reaction is to freeze.  The actual required sequence is to start by figuring out which direction the car is sliding in (and you have to do this while your car is rotating).  Then you steer toward that direction, no matter what it happens to be.  Your car should straighten out, at which point you can gently pump the brakes.  But all this takes time, and if you are skidding quickly, you'll be in a snowbank or a ditch before you manage to regain control.  In fact the best bet of all is to not skid in the first place, and after decades of experience, I never do.  But it takes training to reach this point.  How do we train the self-driving machine learning systems on these rare situations?  And keep in mind, every skid has its very own trigger.  The nature of the surface, the surroundings of the road, the weather, the slope or curve, other traffic -- all factor in.
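Just to underscore how much hidden structure there is in that "simple" recovery rule, here is a toy sketch of the sequence as a little state machine.  The sensor inputs, thresholds and commands are all hypothetical -- this is nowhere near a real stability controller -- but even the toy version is multi-step and state-dependent, which is exactly what makes rare situations hard to learn from data.

    # A toy state machine for the skid-recovery sequence described above (illustrative only).
    from enum import Enum, auto

    class Phase(Enum):
        DETECT_SLIDE = auto()
        STEER_INTO_SLIDE = auto()
        STRAIGHTEN = auto()
        GENTLE_BRAKING = auto()

    def skid_recovery_step(phase, slide_direction, yaw_rate):
        """slide_direction: signed estimate of which way the car is sliding;
        yaw_rate: how fast the car is rotating.  Both assumed to come from sensors."""
        if phase is Phase.DETECT_SLIDE:
            # Can't act until we know which way we're sliding -- and the car keeps rotating
            # while we estimate it.
            return Phase.STEER_INTO_SLIDE, {"steer": 0.0, "brake": 0.0}
        if phase is Phase.STEER_INTO_SLIDE:
            # Steer toward the slide, whatever direction that happens to be.
            return Phase.STRAIGHTEN, {"steer": slide_direction, "brake": 0.0}
        if phase is Phase.STRAIGHTEN:
            # Only once the rotation has mostly stopped is it safe to brake at all.
            if abs(yaw_rate) < 0.05:
                return Phase.GENTLE_BRAKING, {"steer": 0.0, "brake": 0.1}
            return Phase.STRAIGHTEN, {"steer": slide_direction * 0.5, "brake": 0.0}
        return Phase.GENTLE_BRAKING, {"steer": 0.0, "brake": 0.1}

    phase = Phase.DETECT_SLIDE
    for _ in range(4):
        phase, command = skid_recovery_step(phase, slide_direction=-1.0, yaw_rate=0.02)
        print(phase.name, command)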

Can machine learning systems powered by neural networks and other clever AI tools somehow magically solve such problems?

When I make this case to my colleagues who work in the area, they invariably respond that the statistics are great... and yes, as of today, anyone would have to acknowledge that point.  Google's Waymo has already driven a million miles without any serious accidents -- and perhaps far more by now, since that number has been out there for a while.

But then (a bit like with studies of new, expensive medications) you run into the qualifiers.  It turns out that Google tests Waymo in places like Arizona, where roads are wide, temperatures can be high, and the number of pedestrians, pets and bicyclists is often rather low (120F heat doesn't really make bicycling to work all that appealing).  They also carefully clean and tune their cars between test drives, so the vehicles are in flawless shape.  They benefit, too, simply because so many human drivers shouldn't be behind the wheel in the first place: Waymo is never intoxicated, and isn't likely to be distracted by music, phone calls, texting, or arguments between the kids in the back.  It has some clear advantages even before it steers itself out of the parking lot.

Yet there are easy retorts:
  • Let's face it: conditions in places like Phoenix or rural Florida are about as benign as can be imagined.  In any actual nationwide deployment, cars would need to cope with mud and road salt, misaligned components, and power supply issues (did you know that chipmunks absolutely love to gnaw on battery wires?).  Moreover, the test vehicles had professional drivers in the emergency backup role, who focused attentively on the road and the dashboard while being monitored by a second level of professionals whose specific job was to remind them to pay attention.  In a real deployment, the human operator might be reading the evening sports results while knocking back a beer or two and listening to the radio or texting a friend.
  • Then we run into roadwork that invalidates maps and lane markings.  GPS signals are well known to bounce off buildings, producing echoes that can confuse a location sensor (if you have ever used Google Maps in a big city, you know what I mean).  Weather conditions can pose vehicle challenges never seen in Phoenix: blizzard conditions, flooded or icy road surfaces, counties that ran low on salt and money for plowing and left their little section of I-87 unplowed in the blizzard, potholes hiding under puddles or in deep shadow, tires with uneven tread wear or that have gone out of balance -- the list is really endless.  On the major highways near New York, I've seen cars abandoned right in the middle lane, trashcans upended to warn drivers of missing manhole covers, and all sorts of objects flying off flatbed trucks right in front of me: huge metal boxes, chunks of loose concrete or metal, a mattress, a refrigerator door...  This is the "real world" of driving, and self-driving cars will experience all of these things and more from the moment we turn them loose in the wild.
  • Regional driving styles vary widely too, sometimes in ways that don't easily translate from place to place and that might never arise in Phoenix.  For example, teenagers who act out in New Jersey and New York are fond of weaving through traffic.  At very high speeds.  In Paris, this has become an entirely new concept in which motorcyclists treat the gaps "between" the lanes of cars as narrow, high-speed driving lanes (and they weave too).  New Jersey has its own weird rule of the road: on the main roads near Princeton, for some reason, not letting other cars merge in has become a kind of sport, and even elderly drivers won't leave you more than a fraction of a second and a few inches to spare as you dive into the endless stream.  I'm a New Yorker, and can drive like a taxi driver there... and any New York taxi driver worth his or her salary can confirm that this is a unique experience, a bit like a high-speed automotive ballet (or, if you prefer, like being a single fish in a school of fish).  The taxis flow down the NYC avenues at 50mph, trying to stay with the green lights and flowing around obstacles in strangely coordinated ways.  But New York isn't special.  Over in Tel Aviv, drivers will switch lanes without a second thought after glancing no more than 45 degrees to either side, and will casually pull in front of you leaving centimeters to spare.  Back in France, at the Arc de Triomphe and Place Victor Hugo, the roundabouts give priority to incoming traffic over the cars already circling... but only those two use this rule; essentially everywhere else in Europe, priority goes to the traffic already on the roundabout (this makes a great example for teaching about deadlocks!).  And in Belgium, there are a remarkable number of unmarked intersections.  On those, priority always allows the person entering from the right to cut in front of the person on his or her left, even if the person from the right is crossing the street or turning, and even if the person on the left was on what seemed like the main road.  In Provence, the roads are too narrow: everyone blasts down them at 70mph but is also quick to put a tire on the grass off the edge of the road if someone approaches from the other direction.  If you didn't follow that rule... bang!  In New Delhi and Chennai, anything at all is accepted -- anything.  In rural Mexico, at least the last time I was there, the local drivers enjoyed terrifying the non-local ones (and I can just imagine how they would treat robotic vehicles).
And those are just environmental worries.  For me, the stranger part of the story is the complacency of the very same technology writers who are rushing to assign blame in the recent plane crashes.  This gets back to my use of the term "enthralled."  Somehow, for them, the mere fact that self-driving cars are "artificially intelligent" seems to blind technology reviewers to the evident reality: namely, that there are tasks that are far too difficult for today's machine learning solutions, and that they simply aren't up to the task of driving cars -- not even close!

What, precisely, is the state of the art?  Well, we happen to be wrapping up an exciting season of faculty hiring focused on exactly these areas of machine learning.  In the past few weeks I've seen talks on vision systems that try to make sense of clutter, or to anticipate what might be going on around a corner or behind some visual obstacle.  No surprise: the state of the art is rather primitive.  We've also heard about research on robotic motion aimed at basic tasks like finding a path from point A to point B in a complex environment, or maneuvering in ways that won't startle humans in the vicinity.

Let me pause to point out that if these basic tasks are considered cutting-edge research, then finding a safe path in real time (cars don't stop on a dime, you know) clearly isn't a solved problem either.  If we can't do it in a warehouse, how in the world have we talked ourselves into doing it on Phoenix city streets?
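For readers who haven't seen it, here is roughly what the textbook version of that "basic task" looks like: grid-based A* search from A to B, with a crude time budget bolted on to hint at the real-time constraint.  The grid and budget values are made up, and real planners must work in continuous space, under vehicle dynamics, around moving obstacles -- which is exactly where the research difficulty lies.

    # Toy A* path planner on a grid, with an illustrative time budget.
    import heapq, time

    def astar(grid, start, goal, budget_s=0.05):
        """grid: 2D list, 0 = free, 1 = obstacle; start/goal: (row, col) tuples."""
        deadline = time.monotonic() + budget_s
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
        frontier = [(h(start), 0, start, [start])]
        seen = set()
        while frontier:
            if time.monotonic() > deadline:
                return None                      # out of time -- no plan found
            _, cost, node, path = heapq.heappop(frontier)
            if node == goal:
                return path
            if node in seen:
                continue
            seen.add(node)
            r, c = node
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                    heapq.heappush(frontier, (cost + 1 + h((nr, nc)), cost + 1,
                                              (nr, nc), path + [(nr, nc)]))
        return None

    grid = [[0, 0, 0, 0],
            [1, 1, 0, 1],
            [0, 0, 0, 0],
            [0, 1, 1, 0]]
    print(astar(grid, (0, 0), (3, 3)))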

Self-driving cars center on deep neural networks for vision, and yet nobody quite understands how to relate the problem these networks solve to the real safety issues that cars confront.  Quite the opposite: neural networks for vision are known to act bizarrely for seemingly trivial reasons.  A neural network that is the world's best at interpreting photos can be completely thrown off simply by placing a toy elephant somewhere in the room.  A different neural network, that one a champ at making sense of roadway scenes, stops recognizing anything if you inject just a bit of random noise.  Just last night I read a report that Tesla cars can be tricked into veering toward oncoming traffic if you put a few spots of white paint on the lane they are driving down, or if the lane marking to one side is fuzzy.  Tesla, of course, denies that this could ever occur in a real-world setting, and points out that they have never observed such an issue, not even once.
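If you want a feel for this fragility, the experiment is easy to sketch.  The snippet below is illustrative only: the choice of model, the image file name, and plain Gaussian noise are my own stand-ins (real adversarial perturbations are crafted far more carefully than random noise), but the pattern -- perturb the input slightly, see whether the prediction flips -- is the one the research literature uses.

    # Sketch: does a small random perturbation change a pretrained classifier's answer?
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    to_tensor = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
    normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    def predict(img01):
        # img01: image tensor with values in [0, 1]
        with torch.no_grad():
            return model(normalize(img01).unsqueeze(0)).argmax(dim=1).item()

    img = to_tensor(Image.open("road_scene.jpg").convert("RGB"))  # any RGB photo will do
    clean_label = predict(img)
    noisy_label = predict((img + 0.1 * torch.randn_like(img)).clamp(0, 1))
    print("clean:", clean_label, "noisy:", noisy_label,
          "-- changed!" if clean_label != noisy_label else "-- unchanged")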

People will often tell you that even if the self-driving car concept never matures, at least it will spin out some amazing technologies.  I'll grant them this: the point is valid.

For example, Intel's Mobileye is a genuinely amazing little device that warns you if the cars up ahead of you suddenly brake.  I had it in a rental car recently and it definitely saved me from a possible rear-ender near Newark airport.  I was driving on the highway when the wind blew a load of garbage off a passing truck.  Everyone (me included) glanced in that direction, but someone up ahead must also have slammed on the brakes.  A pileup was a real risk, but Mobileye made this weird squawk (as if I were having a sudden close encounter with an angry duck) and vibrated the steering wheel, and it worked: I slowed down in time.

On the other hand, Mobileye also gets confused.  A few times it thought I was drifting from my lane when actually the lane markers themselves were just messy (some old roadwork had left traces of temporary lane markings).  And at one point I noticed that it was watching speed limit signs, but was confused by the limits posted for the exit-only lanes to my right, thinking they also applied to the through lanes I was in.

Now think about this: if Mobileye gets confused, why should you assume that Waymo and Tesla and Uber self-driving cars never get confused?  All four use neural network vision systems.  This is a very fair question.

Another of my favorite spinouts is Hari Balakrishnan's startup in Boston, Cambridge Mobile Telematics.  His company is planning to monitor the quality of drivers: the person driving your car, and perhaps those around your car too.  What a great idea!

My only worry is that if this were really to work well, could our society deal with the consequences?  Suppose that your head-up display somehow drew a red box around every dangerous car anywhere near you on the road.  On the positive side, now you would know which ones were being driven by hormonal teenagers, which had drivers distracted by texting, which were piloted by drunk or stoned drivers, and which had drivers with severe cataracts who can't actually see much of anything...

But on the negative side, I honestly don't know how we will react.  The fact is that we're surrounded by non-roadworthy cars, trucks carrying poorly secured loads of garbage,  and drivers who probably should be arrested!

But it cuts the other way, too.  If you are driving on a very poor road surface, you might be swerving to avoid the potholes or debris.  A hands-free phone conversation is perfectly legal, as is the use of Google Maps to find the address of that new dentist's office.  We wouldn't want to be "red boxed" and perhaps stopped by a state trooper for reasons like that.

So I do hope that Hari can put a dent in the road-safety problem.  But I suspect that he and his company will need quite a bit of that $500M they just raised to pull it off.

So where am I going with all of this?  It comes down to an ethical question.  Right this second, the world is in strong agreement that Boeing's 737 Max is unsafe under some not-yet-fully-described condition.  Hundreds of innocent people have died because of that.  And don't assume that Airbus is somehow different -- John Rushby could tell you some pretty hair-raising stories about Airbus technology issues (their planes have been "fly by wire" for decades now, so they are not new to the kind of puzzle we've been discussing).  Perhaps you are thinking that well, at least Airbus hasn't killed anyone.  But is that really a coherent way to think about safety?

Self-driving cars may soon be carrying far more passengers, under far more complex conditions, than either of these brands of aircraft.  And in fact, driving is a much harder job than flying a plane.  Our colleagues are creating these self-driving cars, and in my view, their solutions just aren't safe to launch onto our roads yet.  This generation of machine learning may simply not be up to the task, and our entire approach to safety certification isn't ready to cope with the certification that would be needed.

When we agree to allow these things out on the road, that will include roads that you and your family will be driving on, too.  Should Hari's company put a red warning box around them, to help you stay far from them?  And they may be driving on your local city street too.  Your dog will be chasing sticks next to that street, your cats will be out there doing whatever cats do, and your children will be learning to bicycle, all on those same shared roads.

There have already been too many deaths.  Shouldn't our community be calling for this to stop, before far more people get hurt?