Wednesday, 20 May 2020

Contact Tracing Apps

The tension between privacy and the public interest is an old one, so it is no surprise to see the question surface with respect to covid-19 contact-tracing apps.  

Proponents start by postulating that almost everyone has a mobile phone with Bluetooth capability.  In fact, not everyone has a mobile phone that can run apps (such devices are expensive).  Personal values bear on this too: even if an app had magical covid-prevention superpowers, not everyone would install it.  Indeed, not everyone would even be open to dialog.  

But let's set that thought to the side and just assume that in fact everyone has a suitable device and agrees to run the app.  Given this assumption, one can configure the phone to emit Bluetooth chirps within a 2m radius (achieved by limiting the signal power).  Chirps are just random numbers, not encrypted identifiers.  Each phone maintains a secure on-board record of chirps it generated, and those that it heard.  On this we can build a primitive contact-tracing structure.
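
To make the chirp mechanism concrete, here is a minimal sketch in Python -- my own illustration, not any specific protocol such as the Apple/Google exposure-notification scheme.  Chirps are just random numbers, rotated periodically, and the phone keeps purely local logs of what it sent and what it heard.

```python
# Hypothetical sketch of the chirp scheme described above; all names are
# illustrative assumptions, not part of any real contact-tracing protocol.
import secrets, time

SENT_LOG, HEARD_LOG = [], []          # both logs stay on the phone

def new_chirp() -> str:
    return secrets.token_hex(16)      # 128 random bits, with no identity embedded

def broadcast_round():
    chirp = new_chirp()
    SENT_LOG.append((time.time(), chirp))   # remember what we emitted
    # ... transmit `chirp` at low Bluetooth power so only ~2m neighbors hear it ...
    return chirp

def on_chirp_heard(chirp: str, rssi: float):
    HEARD_LOG.append((time.time(), chirp, rssi))   # what we heard, and how strongly
```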

Suppose that someone becomes ill.  The infected user would upload the chirps the phone sent during the past 2 weeks into an anonymous database hosted by the health authority.  This step requires a permission code provided by the health authority, intended to block malicious users from flooding the database with false reports (a form of denial-of-service exploit).  In some proposals, the phone would also upload chirps it heard:  Bluetooth isn't perfect, hence "I heard you" could be useful as a form of redundancy.   The explicit permission step could be an issue: a person with a 104.6 fever who feels like she was hit by a cement truck might not be in shape to do much of anything.  But let's just move on.  

The next task is to inform people that they may have been exposed.  For this, we introduce a query mechanism.  At some frequency, each phone sends the database a filtering query encoding the chirps it heard (for example, it could compute a Bloom filter and pass it to the database as an argument to its query).  The database uses the filter to select chirps that might match ones the phone actually heard.  The filter doesn't need to be overly precise: we do want the response sent to the phone to include all the matching infected chirps, but it is actually desirable for it to also include some chirps the phone never asked about.  Then, as a last step, the phone checks to see whether it actually did hear (or emit) any of these.  

Why did we want the database to always send a non-empty list?  Well, if every response includes a set of chirps, the mere fact of a non-empty response reveals nothing.  Indeed, we might even pad the response to some constant size!
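
Here is a rough sketch of that query exchange, again purely my own illustration: the phone summarizes the chirps it heard as a (deliberately toy) Bloom filter, the database returns every infected chirp matching the filter -- false positives included, which provides exactly the padding effect just described -- and the phone then checks locally which of those it really heard.

```python
# Toy Bloom filter and query flow; not a production-quality filter.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all(self.bits & (1 << pos) for pos in self._positions(item))

# Phone side: summarize heard chirps; only the filter is sent to the server.
heard = {"c1", "c7", "c9"}
query = BloomFilter()
for chirp in heard:
    query.add(chirp)

# Server side: return infected chirps that match the filter (extras are welcome).
infected_db = {"c7", "c42", "c99"}
response = [c for c in infected_db if query.might_contain(c)]

# Phone side: the true matches are only those it really heard.
print(set(response) & heard)    # {'c7'}
```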

Next, assume that your phone discovers some actual matches.  It takes time in close proximity to become infected.  Thus, we would want to know whether there was just a brief exposure, as opposed to an extended period of contact.  A problematic contact might be something like this: "On Friday afternoon your phone detected a close exposure over a ten minute period", meaning that it received positive chirps at a strong Bluetooth power level.  The time and signal-strength thresholds are parameters, set using epidemiological models that balance risk of infection against risk of false positives.
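
A minimal sketch of that thresholding step, assuming made-up cutoffs (ten minutes, -65 dBm) purely for illustration; real parameters would come from epidemiological modeling.

```python
# Hypothetical thresholds; not values from any real app.
from dataclasses import dataclass

@dataclass
class ContactEvent:
    chirp: str          # the matched (infected) chirp
    minutes: float      # how long it was heard
    rssi: float         # received Bluetooth signal strength, in dBm

def is_problematic(event: ContactEvent,
                   min_minutes: float = 10.0,
                   min_rssi: float = -65.0) -> bool:
    """Flag only sustained, close-range exposures; brief or weak contacts are ignored."""
    return event.minutes >= min_minutes and event.rssi >= min_rssi

print(is_problematic(ContactEvent("a3f9...", minutes=12, rssi=-60)))   # True
print(is_problematic(ContactEvent("77c2...", minutes=2,  rssi=-60)))   # False
```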

Finally, given a problematic contact your device would walk you through a process to decide if you need to get tested and self-quarantine.    This dialog is private: no public agency knows who you are, where you have been, or what chirps your device emitted or heard.  

Covid contact-tracing technology can easily result in false positives.  For example, perhaps a covid-positive person walked past your office a few times, but you kept the door closed...  the dialog might trigger and yet that isn't, by itself, determinative.   Moreover, things can go wrong in precisely the opposite way too.  Suppose that you were briefly at some sort of crowded event -- maybe the line waiting to enter the local grocery store.  Later you learn that in fact someone tested positive at this location... but the good news is that your app didn't pick anything up!  If you were certain that everyone was using a compatible app, that might genuinely tell you something.   But we already noted that the rate of use of an app like this might not be very high, and moreover, some people might sometimes disable it, or their phones might be in a place that blocks Bluetooth signals.  The absence of a notification conveys very little information.  Thus, the technology can also yield false negatives.

The kind of Covid contact tracing app I've described above is respectful of privacy.  Nobody can force you to use the app, and for all the reasons mentioned, it might not be active at a particular moment in time.  Some of the apps won't even tell you when or where you were exposed or for how long, although at that extreme of protectiveness, you have to question whether the data is even useful.  And the government health authority can't compel you to get tested, or to upload your chirps even if you do test positive.

But there are other apps that adopt more nuanced stances.  Suppose that your phone were to also track chirp signal power, GPS locations, and time (CovidSafe, created at the University of Washington, has most of this information).  Now you might be told that you had a low-risk (low signal power) period of exposure on the bus from B lot to Day Hall, but also had a short but close-proximity exposure when purchasing an espresso at the coffee bar.  The app would presumably help you decide if either of these crosses the risk threshold at which self-quarantine and testing is recommended.  On the other hand, to provide that type of nuanced advice, much more data is being collected.  Even if held in an encrypted form on the phone, there are reasons to ask at what point too much information is being captured.  After all, we have all seen endless reporting on situations in which highly sensitive data leaked or was even deliberately shared in ways contrary to stated policy and without permission.

Another issue now arises.  GPS isn't terribly accurate, which matters because Covid is far more likely to spread with prolonged close exposure to an infectious person: a few meters makes a big difference.  This is an especially big deal in cities, where reflections off surfaces can make GPS even less accurate -- a shame, because a city is precisely the sort of place where you could have frequent, brief, but relatively distant encounters with Covid-positive individuals.  You would ideally want to know more.  And cities raise another big issue: GPS doesn't work inside buildings.  Would the entire 50-story building be treated as a single "place"?   If so, with chirps bouncing around in corridors and stairwells and atria, the rate of false positives would soar!

On campus we can do something to push back on this limitation.  One idea would be to try to improve indoor localization.  For example, imagine that we were to set up a proxy phone within spaces that the campus wants to track, like the Gimme! Coffee café in Gates Hall.  Then when so-and-so tests positive, the café itself learns that "it was exposed".  That notification could be useful for scheduling a deep cleaning, and it would also enable the system to relay the risk notification, by listing the chirps that the café proxy phone emitted during the period after the exposure occurred (on the theory that if you spend an hour at a table that was used by a covid-positive person who left the café twenty minutes ago, that presumably creates a risk).   In effect, we would treat the space as an extension of the covid-positive person who was in it, if they were there for long enough to contaminate it.

Similarly, a phone could be configured to listen for nearby WiFi signals.  With that information, the phone could "name" locations in terms of the MAC addresses it heard and their power levels.  Phone A could then report that during the window when A's user was presumed infectious, there was a 90-minute period with 4 bars of WiFi X and 2 bars of WiFi Y, with WiFi Z flickering at a very low level.  One might hope that this defines a somewhat smaller space.    We could then create a concept of a WiFi signal-strength distance metric, at which point phone B could discover problematic proximity to A.  This could work if the WiFi signals are reasonably steady and the triangulation is of high quality.  But WiFi devices vary their power levels depending on the number of users and the choice of channels, and some settings, like elevators, rapidly zip through a range of WiFi connectivity options...  Presumably there are research papers on such topics... 
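
As a toy illustration of what such a WiFi-fingerprint distance metric might look like (entirely my own sketch, with hypothetical access-point names):

```python
# A toy fingerprint "distance": each location is named by the access points heard
# and their signal strengths, and two fingerprints are compared by average difference.
def fingerprint_distance(fp_a: dict, fp_b: dict) -> float:
    """fp maps access-point MAC address -> signal strength in 'bars' (0-4)."""
    macs = set(fp_a) | set(fp_b)
    # an AP missing from one fingerprint counts as strength 0
    return sum(abs(fp_a.get(m, 0) - fp_b.get(m, 0)) for m in macs) / max(len(macs), 1)

phone_A = {"wifi-X": 4, "wifi-Y": 2, "wifi-Z": 1}
phone_B = {"wifi-X": 3, "wifi-Y": 2}
print(fingerprint_distance(phone_A, phone_B))   # small value: plausibly the same spot
```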

Another idea I heard about recently was suggested by an avid Fitbit user (the little fitness tracker that encourages you to do a bit more walking each day).  Perhaps one could have a "social distancing score" for each user (indeed, if Fitbit devices can hear one another, maybe Fitbit itself could compute such a score).  The score would indicate your degree of isolation, and your goal would be to have as normal a day as possible while driving that number down.  Notice that the score wouldn't be limited to contacts with Covid-positive people.  Rather, it would simply measure the degree to which you are exposed to dense environments where spread is more likely to occur rapidly.  To do this, though, you really want to use more than just random numbers as your "chirp", because otherwise, a day spent at home with your family might look like a lot of contacts, and yet you all live together.  So the app would really want to count the number of distinct individuals with whom you have prolonged contacts.  A way to do this is for each device to stick to the same random number for a whole day, or at least for a few hours.  Yet such a step would also reduce anonymity... a problematic choice.
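
A toy sketch of such a score, assuming (hypothetically) that each nearby device keeps a stable identifier for the day -- which is exactly the anonymity trade-off just noted:

```python
# Count prolonged encounters with distinct device IDs; lower is "better" distancing.
from collections import defaultdict

def distancing_score(observations, min_minutes=10):
    """observations: list of (device_id, minutes_heard) tuples for one day."""
    minutes_per_device = defaultdict(int)
    for device_id, minutes in observations:
        minutes_per_device[device_id] += minutes
    prolonged = [d for d, m in minutes_per_device.items() if m >= min_minutes]
    return len(prolonged)

day = [("id-42", 8), ("id-42", 7), ("id-77", 3), ("id-90", 45)]
print(distancing_score(day))   # 2 prolonged contacts: id-42 (15 min) and id-90 (45 min)
```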

As you may be aware, Google has moved to acquire Fitbit, and of course companies like Facebook already know all about us.  A company with that kind of social graph would be particularly qualified to correlate location and contact data with your social network, enabling it to build models of how the virus might spread if someone in your social group is ever exposed.  Doing so would enable various forms of proactive response.  For example, if a person is egregiously ignoring social distancing guidance, the public authorities could step in and urge that he or she change their evil ways.  If the social network were to have an exposure, we might be able to warn its members to "Stay clear of Sharon; she was exposed to Sally, and now she is at risk."  But these ideas, while cute, clearly have sharp edges that could easily become a genuine threat.  In particular, under the European GDPR (a legal framework for privacy protection), it might not even be legal to do research on such ideas, at least within the European Union.  Here in the US, Facebook could certainly explore the options, but it would probably think twice before introducing products.

Indeed, once you begin to think about what an intrusive government or employer could do, you realize that there are already far too many options for tracking us, if a sufficiently large entity were so inclined.  It would be easy to combine contact tracing from apps with other forms of contact data.  Most buildings these days use card-swipes to unlock doors and elevators, so that offers one source of rather precise location information.  It might be possible to track purchases at food kiosks that accept cards, and in settings where there are security cameras, it would even be possible to do image recognition...   There are people who already live their days in fear that this sort of big-brother scenario is a real thing, and in constant use.  Could covid contact tracing put substance behind their (at present, mostly unwarranted) worries?

Meanwhile, as it turns out, there is considerable debate within the medical community concerning exactly how Covid spreads.  Above, I commented that just knowing you were exposed is probably not enough.  Clearly, virus particles need to get from the infected person to the exposed one.   The problem is that while everyone agrees that direct interactions with a person actively shedding virus are highly risky, there is much less certainty about indirect interactions, like using the same table or taking the same bus.  If you follow the news, you'll know of documented cases in which covid spread fairly long distances through the air, from a person coughing at one table in a restaurant all the way around the room to people fairly far away, and you'll learn that covid can survive for long periods on some surfaces.   But nobody knows how frequent such cases really are, or how often they give rise to new infections.    Thus if we ratchet up our behavioral tracing technology, we potentially intrude on privacy without necessarily gaining greater prevention.

When I've raised this point with people, a person I'm chatting with will often remark that "well, I don't have anything to hide, and I would be happy to take any protection this offers at all, even if the coverage isn't perfect."  This tendency to personalize the question is striking to me, and I tend to classify it along with the tendency to assume that everyone has equal technology capabilities, or similar politics and civic inclinations.  One sees this sort of mistaken generalization quite often, which is a surprise given the degree to which the public sphere has become polarized and political.  

Indeed, my own reaction is to worry that even if I myself don't see a risk to being traced in some way, other people might have legitimate reasons to keep some sort of activity private.  And I don't necessarily mean illicit activities.  A person might simply want privacy to deal with a health issue or to avoid the risk of some kind of discrimination.  A person may need privacy to help a friend or family member deal with a crisis: nothing nefarious, simply something that isn't suitable for a public space.  So yes, perhaps a few people do have nasty things to hide, but my own presumption tends to be that all of us sometimes have a need for privacy, and hence that all of us should respect one another's needs without prying into the reasons.  We shouldn't impose a tracking regime on everyone unless the value is so huge that the harm the tracking system itself imposes is clearly small in comparison.

In Singapore, these contact-tracing apps were aggressively pushed by the government -- a government that at times has been notorious for repressing dissidents.  Apparently, this overly assertive rollout triggered a significant public rejection:  people were worried by the government's seeming bias in favor of monitoring and its seeming dismissal of the privacy risks, concluded that whatever other people might do, they themselves didn't want to be traced, and many rejected the app.  Others installed it (why rock the boat?), but then took the obvious, minor steps needed to defeat it.  Such a sequence renders the technology pointless: a nuisance at best, an intrusion at worst, but ineffective as a legitimate covid-prevention tool.  In fact just last week (mid May) the UK had a debate about whether or not to include location tracking in their national app.  Even the debate itself seems to have reduced the public appetite for the app, and this seems to be true even though the UK ultimately leaned towards recommending a version that has no location tracing at all (and hence is especially weak, as such tools go).

I find this curious because, as you may know, the UK deployed a great many public video cameras back in the 1980's (a period when there was a lot of worry about street crime together with frequent, high-visibility terrorist threats).  Those cameras live on, and yet seem to have had limited value.  

When I spent a few months in Cambridge in 2016, I wasn't very conscious of them, but now and then something would remind me to actually look for the things, and they still seem to be ubiquitous.  Meanwhile, during that same visit, there was a rash of bicycle thefts and a small surge in drug-related street violence.  The cameras apparently had no real value in stopping such events, even though the mode of the bicycle thefts was highly visible: thieves were showing up with metal saws or acetylene torches, cutting through the 2-inch thick steel bike stand supports that the city installed during the last rash of thefts, and then reassembling the stands using metal rods and duct-tape, so that at a glance, they seemed to be intact.  Later a truck could pull up, they could simply pull the stand off its supports, load the bikes, and reassemble the stand.  

Considering quite how "visible" such things should be to a camera, one might expect that a CCTV system would be able to prevent such flagrant crimes.  Yet they failed to do so during my visit.  This underscores the broader British worry that monitoring often fails in its stated purpose, yet leaves a lingering loss of privacy.  After all: the devices may not be foiling thefts, yet someone might still be using them for cyberstalking.  We all know about web sites that aggregate open webcams, whether the people imaged know it or not.  Some of those sites even use security exploits to break into cameras that were nominally disabled.

There is no question that a genuinely comprehensive, successful, privacy-preserving Covid tracing solution could be valuable.  A recent report in the MIT Technology Review shows that if one could trace 90% of the contacts for each Covid-positive individual, the infection could be stopped in its tracks.  Clearly this is worthwhile if it can be done.  On the other hand, we've seen how many technical obstacles this statement raises.

And these are just technical dimensions.  The report I cited wasn't even focused on technology!  That study focused on human factors at scale, which already limit the odds of reaching the 90% level of coverage.  The reasons were mundane, but also seem hard to overcome.  Many people (myself included) don't answer the phone if a call seems like possible spam.  For quite a few, calls from the local health department probably have that look.  Some people wouldn't trust a random caller who claims to be a contact tracer.  Some people speak languages other than English and could have difficulty understanding the questions being posed, or the recommendations.  Some distrust the government.  The list is long, and it isn't one on which "more technology" jumps out as the answer.  

Suppose that we set contact tracing per-se to the side.  Might there be other options worth exploring?  A different use of "interaction" information could be to just understand where transmission exposures are occurring, with the goal of dedensifying those spots, or perhaps using other forms of policy to reduce exposure events.  An analyst searching for those locations would need ways to carry out the stated task, yet we would also want to block him or her from learning irrelevant private information.  After all, if the goal is to show that a lot of exposure occurs at the Sunflower Dining Hall, it isn't necessary to also know that John and Mary have been meeting there daily for weeks.

This question centers on data mining with a sensitive database, and the task would probably need to occur on a big-data analytic platform (a cloud system).  As a specialist in cloud computing, I can point to many technical options for such a task.  For example, we could upload our oversight data into a platform running within an Intel SGX security enclave, with hardware-supported protection.  A person who legitimately can log into such a system (via HTTPS connections to it, for example) would be allowed to use the database for tasks like contact tracing, or to discover hot-spots on campus where a lot of risk occurs -- so this solution doesn't protect against a nosy researcher.  The good news is that unauthorized observers would learn nothing, because all the data moved over the network is encrypted at all times -- provided, of course, that you trust the software (and should we trust the software?).  

There are lots of other options.  You could also upload the data in an encrypted form, and perhaps query it without decrypting it, or perhaps even carry out the analysis using a fully homomorphic encryption scheme.  You could also create a database that injects noise into query results, concealing individual contributions (this is the differential-privacy query model).  
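
For the differential-privacy option, here is a minimal sketch of the usual Laplace-noise recipe for a counting query; the epsilon value and the example data are arbitrary placeholders of my own, not recommendations.

```python
# Answer counting queries only after adding Laplace noise scaled to 1/epsilon
# (a counting query has sensitivity 1).
import random

def laplace_noise(scale: float) -> float:
    # the difference of two independent exponentials is Laplace(0, scale)
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon=0.5):
    """Release a count satisfying epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

visits = [{"place": "Sunflower Dining Hall"}] * 40 + [{"place": "Gates Hall"}] * 7
print(noisy_count(visits, lambda r: r["place"] == "Sunflower Dining Hall"))
```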

On the other hand, the most secure solutions are actually the least widely used.  Fully homomorphic computing and Intel SGX, for example, are viewed as too costly.  Few cloud systems deploy SGX tools; there are a variety of reasons, but the main one is just that SGX requires a whole specialized "ecosystem" and we lack this.  More common is to simply trust the cloud (and maybe even the people who built and operate it), and then use encryption to form a virtually private enclave within which the work would be done using standard tools: the very same spreadsheets and databases and machine-learning tools any of us use when trying to make sense of large data sets.

But this all leads back to the same core question.  If we go down this path, and explore a series of increasingly aggressive steps to collect data and analyze it, to what degree is all of that activity measurably improving public safety?  I mentioned the MIT study because at least it has a numerical goal: for contact tracing, a 90% level of coverage is effective; below 90% we rapidly lose impact.  But we've touched upon a great many other ideas... so many that it wouldn't be plausible to do a comprehensive study of the most effective place to live on the resulting spectrum of options.

The ultimate choice is one that pits an unquantifiable form of covid-safety tracing against the specter of intrusive oversight that potentially violates individual privacy rights without necessarily bringing meaningful value.   On the positive side, even a placebo might reassure a public nearly panicked over this virus, by sending the message that "we are doing everything humanly possible, and we regret any inconvenience."  Oddly, I'm told, the inconvenience is somehow a plus in such situations.  The mix of reassurance with some form of individual "impact" can be valuable: it provides an outlet and focus for anger, and this reduces the threat that some unbalanced individual might lash out in a harmful way.  Still, even when deploying a placebo, there needs to be some form of cost-benefit analysis!

Where, then, is the magic balancing point for Covid contact tracing?  I can't speak for my employer, but I'll share my own personal opinion.  I have no issue with installing CovidSafe on my phone, and I would probably be a good citizen and leave it running if doing so doesn't kill my battery.  Moreover, I would actually want to know if someone who later tested positive spent an hour at the same table where I sat down not long afterwards.  But I'm under no illusion that covid contact tracing is really going to be solved with technology.  The MIT study has it right: this is simply a very hard and very human task, and we delude ourselves to imagine that a phone app could somehow magically whisk it away.

Sunday, 12 April 2020

Does the universe use irrational numbers?

I've been increasingly drawn to an area that I always thought of as being pretty much as esoteric as they come: Brouwer's theory of "intuitionist" mathematics.  Although this has very little to do with distributed computing, it seems so interesting that I thought I might share it here.

If you've ever read the wonderful illustrated historical novel Logicomix, by Christos Papadimitriou and Apostolos Doxiadis, then you probably know that logic has struggled to deal with the kind of mathematics we learn in high school and that also underlies classical physics.  Logicomix is presented a bit like a comic book, but you'll find it serious in tone -- hence more or less a European "bande dessinée", meaning "illustrated story".    I highly recommend it.

Christos describes the history leading up to the introduction of computer science.  The early years of this history centered on logic, algebra, calculus and insights into what Wigner much more recently characterized as the "unreasonable effectiveness of mathematics in the natural sciences".  Christos centers his tale on the giants of mathematical logic and their struggle to deal with ideas like completeness, undecidability, and higher-order statements, and notions like empty sets or infinite ones.  Logic appealed to its early practitioners for its elegance and clarity, yet turned out to be exceptionally hard to extend so that it could apply to real-world mathematical and physical questions.

The early period of logic coincided with a tremendous revolution in physics.  Thus even as generation after generation of logicians lost their minds, abandoned mathematics to join monasteries, and generally flaked out, physics proved the existence of molecules, developed theories explaining the behavior of light and electromagnetic fields, uncovered the structure of the atom, and discovered radiation and explained the phenomenon.  Quantum mechanics and general relativity were developed.  This led to a situation in which physicists (and many mathematicians) became dismissive of logic, which seemed mired in the past, stuck on obscure absurdities and increasingly incapable of expressing the questions that matter.

Yet the rapid advance of physics relative to logic has left a worrying legacy.   For example, have you ever seen Ramanujan's elegant little derivation showing that the infinite sum 1 + 2 + 3 + 4 + ... is equal to -1/12?  He offered it when trying to show that there was an inconsistency in the definitions he was using to study Euler's zeta function (an important tool in physics).  In effect, Ramanujan was concerned that when working algebraically, you run a risk of serious errors if you use algebraic variables to represent infinite quantities or even divergent infinite series.  Yet this occurs all the time: physical theories make constant use of infinity, pi, e, α and so forth.

In the 1930s, the Bourbaki group of mathematicians tried to save the day: they launched a grand project to axiomatize all of mathematics (but without treating logic in any special way).  Many decades and many volumes later, their axiomatic work is still far from completion.

Brouwer, an early 20th-century mathematician and logician, proposed a different approach: he offered a mix of theorems and conjectures that added up to an area he called "intuitionist mathematics", in which the goal was to rederive all of algebra and calculus from the ground up on a logically sound foundation: a fully self-consistent description that we could understand as a higher-order logic over the rational numbers.  Brouwer made huge strides, but his project, like the Bourbaki effort, was unfinished when he passed away.  The big unfinished question centered on how to treat irrational and transcendental values.  But this open question is now seemingly being answered!

My colleague Bob Constable leads a group called the NuPRL project.  Bob is a close friend and we have done some work together: many years ago, he and Robbert van Renesse met with Nancy Lynch and, at her urging, decided to see if we could formalize the virtual synchrony model of distributed computing using NuPRL.  I won't discuss that work here, except to say that it was remarkably successful, and ended up also including proofs of versions of Paxos and even a constructive version of the famous Fischer, Lynch and Paterson impossibility result for fault-tolerant consensus.  But I do want to tell you a little about NuPRL itself and how it differs from other logic tools you may already be familiar with, like the ones used for program verification -- NuPRL has a grander goal.

The system implements a "program refinement logic", meaning that it establishes a formal correspondence between logical statements and executable programs.  In NuPRL, when you prove that for every integer x there exists an integer y such that x < y, you can also extract a function that carries out your proof, generating this larger value y from a given x (for example, by adding 1, or whatever technique you used in your proof).

More broadly, in NuPRL every proof is automatically checked for correctness, and the semantics of any provable statement map to a piece of code (in OCaml, which can then cross-compile to C or C++) that will perform the computation the proof describes.
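
To illustrate the "proofs as programs" idea in ordinary code (this is plain Python, not NuPRL or its extraction machinery): the constructive proof of "for every integer x there is a larger y" names a witness, and that witness is exactly the program one would extract.

```python
# The proof "take y = x + 1" has this computational content.
def larger_witness(x: int) -> int:
    return x + 1    # the proof's construction, read off as code

assert larger_witness(41) > 41
```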

As a fancier example, back when the 4-coloring theorem for planar graphs was first proved, the proof itself didn't tell us how to generate a coloring.  It simply showed that if a certain sequence of "embedded subgraphs" are 4-colorable, then any planar graph can be 4-colored.  But had that proof been carried out in NuPRL, it would have come with an executable artifact: a program that 4-colors any planar graph.

Central to NuPRL are two aspects.  First, the system never permits a proof by contradiction, which is important because the logic it uses is strong enough to encode undecidable statements.  Thus if we have some formula F and prove that F cannot be false, we cannot safely conclude that F is true: F might also be undecidable.  Bob has sometimes pointed out that there are proof tools out there today that overlook this issue, and hence are fully capable of verifying incorrect proofs.

Second, NuPRL is a higher-order logic.  Thus if we were to write down the famous paradoxical statement defining S to be "the set of all sets that are not members of themselves", we would find that in NuPRL the paradox simply cannot arise.  In a first-order logic, we wouldn't be able to describe a set of sets at all.  In a higher-order logic, we still couldn't write this statement, because the type of S (the new higher-order set) is distinct from the type of its elements (its lower-order members).    This prevents the NuPRL user from expressing an ill-formed assertion.

The exciting new development is recent: over the past few years, the NuPRL project has tackled Brouwer's theory with tremendous results.  Bob and his main coworker, a remarkable "constructive logician" named Mark Bickford, recently proved a series of open conjectures that Brouwer had left on the table at the time of his death, and even more recently gave a talk at a major European logic and PL conference on the rapid progress of their broader effort, which by now seeks to complete Brouwer's agenda and fully axiomatize algebra and calculus.  Theorems like the chain theorem and the central limit theorem turn out to have elegant, much shorter proofs in NuPRL, because by defining them over a logically sound foundation, Bob and Mark don't need to state all sorts of assumptions that one finds in any standard calculus textbook: the definitions used for continuous functions and algebraic objects turn out to carry all the needed baggage, and don't need to be separately restated.  They are in the middle of writing a book on the work, but some papers can already be found on the NuPRL website at Cornell.

The fascinating aspect of Brouwer's approach centers on the elimination of irrational numbers and transcendental constants as primitive objects.  In effect, Brouwer argues that rather than think of real numbers as infinite sequences of digits, we would do better to focus on algebraic definitions.  For algebraic irrational numbers, this entails representing the value as the root of some polynomial equation.   Pi and other transcendental numbers cannot be so expressed, but they can be captured using a formal notion of convergence that defines them as limits of series expansions or other similar recursive approximations.   Brouwer's agenda was to show that this approach can accomplish everything we do in normal calculus, including such puzzles as continuity (which defeated the logicians of the 18th and 19th centuries).  Moreover, we avoid the inconsistencies that the classical treatment can introduce -- the same ones that puzzled and frustrated Ramanujan!  The NuPRL twist is that an infinite sum, such as the one Ramanujan looked at, is a higher-order object, like our set S was earlier.  One can define it, and use it in equations, but the type-checking rules don't allow such an object to be "misused" as if it were a first-order construction.   In Ramanujan's original letter observing the inconsistency of Euler's definitions, that sort of abuse of notation was at the root of the issue.
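
Here is a small sketch of the constructive-real flavor of this idea, written in Python rather than NuPRL and entirely my own illustration: the "number" sqrt(2) becomes a program that delivers a rational approximation to any requested precision, rather than an infinite decimal expansion.

```python
from fractions import Fraction

def sqrt2_approx(eps: Fraction) -> Fraction:
    """Return a rational q with q <= sqrt(2) < q + eps, by interval bisection.

    The 'real number' here is really this procedure: ask for any precision,
    get back a rational witness, and never manipulate an infinite object."""
    lo, hi = Fraction(1), Fraction(2)          # sqrt(2) lies in [1, 2]
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

q = sqrt2_approx(Fraction(1, 10**6))           # a rational within 10^-6 of sqrt(2)
print(q, float(q))
```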

So, why am I sharing all of this?  Well, in addition to being a fan of NuPRL, I've always been fascinated by work on the foundations of quantum mechanics (I was a physics undergraduate, before abandoning the area to switch into computer science), and I recently found myself reading an article about a new movement to redefine physics using Brouwer's intuitionism.  So here we have a movement within the physics community to fix the inconsistencies in the mathematical formulation of physical laws, side by side with a huge advance within the intuitionistic mathematics community, offering just the tools the physicists will need!  (Not to mention a pretty nifty theorem prover they can work with.)

Why are the physicists suddenly worried about inconsistencies?  One reason is that with the advent of quantum computing, physicists have become fascinated with the computable, and with an "information"-based understanding of the most basic levels of the universe itself.  A natural question this raises is whether it makes sense to presume that the universal laws that define reality really entail doing mathematics on irrational numbers, computing with transcendental constants and so forth.

Part of the agenda is to try and fix some of the issues seen when quantum mechanics and general relativity are merged into a single theory.  This new intuitionistic community is arguing that the inconsistencies seen in previous efforts to carry out such a merger are very similar to the issue Ramanujan noted in Euler's work.  Indeed, there are even physical theories in which the "fact" that the infinite sum mentioned earlier is equal to -1/12 seems to play a role.  They aren't claiming such work is wrong, but simply that it is strong evidence that sloppy mathematics may be limiting progress in physics.

And the cool thing is that there is even experimental support for their view.  For example, one recent study pointed out that the equations of motion for 3 non-rotating black holes are chaotic to such a degree that no rational approximation of their positions and trajectories can fully predict their future paths (or backtrack to their past).  That particular piece of research was focused on a possible explanation for the arrow of time: it suggests that this observation can help explain why time flows from past to future.

In an intuitionist model of physics, expressed over a granular space-time in which distances smaller than the Planck length have no meaning, all sorts of puzzles vanish.  For example, in quantum mechanics, one sees fields that seem to be arbitrarily small.  We hear about the idea that a photon might be here in Ithaca... but with some low probability, might actually be somewhere out near Alpha Centauri, light years away.  In an intuitionistic formulation, particularly a quantized one, those probabilities shrink to zero -- not to a small value, but to zero.  Locations become pixelated, just like in any graphics system.  And because we no longer need to worry about the "effect" of photons over in Alpha Centauri on events right here and now in Ithaca, the universal computation (the one the universe itself does, to compute its own next state from its current state) requires only a bounded calculation, not an infinite one.

Hopefully, I've interested you in this enough to start a small reading project on the topic... I find it fascinating.  Wouldn't it be amazing if 2020 turns out to be the year when computer scientists at Cornell -- logicians, in fact -- helped the physics community put this most basic of sciences on a sound footing that could eliminate all those pesky inconsistencies and absurdities?  What a great step that would be, in a story that drove generations of mathematicians and logicians out of their minds...

Friday, 20 March 2020

A new kind of IoT cloud for distributed AI

Like much of the world, Ithaca has pivoted sharply towards self-isolation while the coronavirus situation plays out.  But work continues, and I thought I might share some thoughts on a topic I've been exploring with my good friend Ashutosh Saxena at Caspar.ai (full disclosure: when Ashutosh launched the company, I backed him, so I'm definitely not unbiased; even so, this isn't intended as advertising for him).

The question we've been talking about centers on the proper way to create a privacy-preserving IoT infrastructure.  Ashutosh has begun to argue that the problem forces you to think in terms of hierarchical scopes: data that lives within scopes, and is only exported in aggregated forms that preserve anonymity.  He also favors distributed AI computations, again scoped in a privacy-preserving manner.  The Caspar platform, which employs this approach, is really an IoT edge operating system, and the entire structure is hierarchical, which is something I haven't seen previously (perhaps I shouldn't be surprised, because David Cheriton, a leading operating systems researcher, has been very actively involved in the design).

Ashutosh reached this perspective after years of work on robotics.    The connection to hierarchy arises because a robot often has distinct subsystems:  when designing a robotic algorithm one trains machine-learned models to solve subtasks such as motion planning, gripping objects, or reorienting a camera to get a better perspective on a scene.  This makes it natural for Ashutosh to view smart homes, multi-building developments and cities as scaled up instances of that same robotic model.

Interestingly, this hierarchical perspective is a significant departure from today's norm in smart home technologies.  Perhaps because the cloud itself hasn't favored edge computing, especially for ML, there has been a tendency to think of smart homes and similar structures as a single big infrastructure with lots of sensors, lots of data flowing in, and then some form of scalable big-data analytic platform like Spark/Databricks on which you train your models and run inference tasks, perhaps in huge batches.  Without question, this is how most AI solutions work today: Google Maps, Facebook's TAO social networking infrastructure, and so on.

The relevant point is that Google does this computation using a scalable system that runs on very large data repositories in an offline warehouse environment.  This warehouse creates both temptation and reward: you created the huge data warehouse to solve your primary problem, but now it becomes almost irresistible to train ad-placement models on the data.  If you make your money on ads, you might even convince yourself that you haven't violated privacy, because (after all) a model is just a parameterized equation that lumps everyone together.   This rewards you, because training advertising models on larger data sets is known to improve advertising revenues.  On the other hand, data mining can potentially violate privacy or user intent directly, and even a machine-learned model could potentially reveal unauthorized information.

Ashutosh believes that any data-warehousing solution is problematic if privacy is a concern.  But he also believes that data warehousing and centralized cloud computation miss a larger opportunity: the quality of the local action can be washed out by "noise" coming from the massive size of the data set, and overcoming this noise requires an amount of computation that rises to an unacceptable level.  Hence, he argues, you eventually end up with privacy violations, an insurmountable computational barrier, and a poorly trained local model.  On the other hand, you've gained higher ad revenues, and perhaps for this reason might be inclined to shrug off the loss of accuracy and "contextualization quality", by which I mean the ability to give the correct individualized response to a query "in the local context" of the resident who issued it.

We shouldn't blindly accept such a claim.  What would be the best counter-argument?  I think the most obvious pushback is this: when we mine a massive data warehouse in the cloud, we don't often treat the whole data set as part of some single model (sometimes we do, but that isn't somehow a baked-in obligation).  More often we view our big warehouse as co-hosting millions of separate problem instances, sharded over one big data store but still "independent".  Then we run a batched computation: millions of somewhat independent sub-computations.  We gain huge efficiencies by running these in a single batched run, but the actual subtasks are separate things that execute in parallel.

What I've outlined isn't the only option: one actually could create increasingly aggregated models, and this occurs all the time: we can extract phonemes from a million different voice snippets, then repeatedly group and process them, ultimately arriving at a single voice-understanding model that covers all the different regional accents and unique pronunciations.  That style of computation yields one speech model at the end, rather than a million distinct ones, each trained for a different accent (what I called "local" models, above).

Ashutosh is well aware of this, and offers two responses.  First, he points to the need to take actions, or even to learn dynamically, in real time.  The problem is that to assemble a giant batch of a million sub-computations from tasks that trickle in, you would often need to delay some tasks for quite a long time.   But if the task is to understand a voice command, that delay would be intolerable.  And if you try to classify the request using a model you built yesterday, when conditions differed, you might not properly contextualize the command.

From a perspective focused primarily on computational efficiency, one also needs to note that doing things one by one is costly: a big batch computation amortizes its overheads over a huge number of parallel sub-tasks.  But in the smart home, we have computing capability close to the end user in any case, if we are willing to use it.  So this argues that we should put the computation closer to the source of the data for real-time reasons, and in the process we gain localization in a natural way.  Contextualized queries fall right out.  Then, because we never put all our most sensitive and private data in one big warehouse, we have simultaneously saved ourselves from a huge and irresistible temptation that no ad-revenue-driven company is likely to resist for very long.

The distributed AI (D-AI) community, with which Ashutosh identifies himself, adopts the view that a smart home is best understood as a community of expert systems.  You might have an AI trained to operate a smart lightswitch... it learns the gestures you use for various lighting tasks.  Some other AI is an expert on water consumption in your home and will warn if you seem to have forgotten that the shower is running.  Yet another is an expert specific to your stove and will know if dinner starts burning...

For Ashutosh, with his background in robotics, this perspective leads to the view that we need a way to compose experts into cooperative assemblies: groups of varying sizes that come together to solve tasks.  Caspar does so by forming a graph of AI components, which share information but also can hold and "firewall" information.  Within this graph, components can exchange aggregated information, but only in accordance with a sharing policy.  We end up with a hierarchy in which very sensitive data is held as close as possible to the IoT device where it was captured, with only increasingly aggregated and less sensitive summaries rising through the hierarchy.  Thus at the layer where one might do smart power management for a small community, controlling solar panels and wall batteries and even coordinating HVAC and hot water heaters to ramp power consumption up or ease it off, the AI element responsible for those tasks has no direct way to tap into the details of your home power use, which can reveal all sorts of sensitive and private information.
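
As a rough sketch of the scoping idea (my own illustration, not the Caspar API): each node keeps its raw readings local and exports only the aggregate its policy permits, so the community layer never sees per-home detail.

```python
# Hypothetical hierarchy of scopes: raw data never leaves the node that captured it.
from statistics import mean

class Node:
    def __init__(self, name, policy="mean"):
        self.name = name
        self.policy = policy          # what this node is allowed to export upward
        self.readings = []            # raw data, held locally
        self.children = []

    def record(self, value):
        self.readings.append(value)

    def export_summary(self):
        """Combine local data with the children's summaries -- never their raw data."""
        values = self.readings + [c.export_summary() for c in self.children]
        if self.policy == "mean":
            return mean(values) if values else 0.0
        raise ValueError(f"unknown policy {self.policy}")

# Each home exports only its average power draw; the community sees home-level averages.
home_a, home_b, community = Node("home-A"), Node("home-B"), Node("community")
community.children = [home_a, home_b]
for watts in (550, 1200, 800):
    home_a.record(watts)
for watts in (300, 310, 295):
    home_b.record(watts)
print(community.export_summary())    # a single community-level aggregate
```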

I don't want to leave the impression that privacy comes for free in a D-AI approach.  First, any given level has less information, and this could mean that it has less inference power in some situations.  For example, if some single home is a huge water user during a drought, the aggregated picture of water consumption in that home's community could easily mask the abusive behavior.  A D-AI system that aggregates must miss the issue; one that builds a data warehouse would easily flag that home as a "top ten abuser" and could dispatch the authorities.

Moreover, D-AI is more of a conceptual tool than a fully fleshed out implementation option.  Even in Caspar's hierarchical operating system, it is best to view the system as a partner, working with a D-AI component that desires protection for certain data even as it explicitly shares other data: we don't yet know how to specify data flow policies and how to tag aggregates in such a way that we could automatically enforce the desired rules.  On the other hand, we definitely can "assist" a D-AI system that has an honest need for sharing and simply wants help to protect against accidental leakage, and this is how the Caspar platform actually works.

Ashutosh argues that D-AI makes sense for a great many reasons.  One is rather mathematical: he shows that if you look at the time and power complexity of training a D-AI system (which comes down to separately training its AI elements), the costs scale well as the deployment grows.  For a single big AI, those same training costs soar as the use case gets larger and larger.  So if you want a fine-grained form of AI knowledge, a D-AI model is appealing.

The Caspar IoT cloud, as a result, isn't a centralized cloud like a standard data warehouse might use.  In fact it has a hierarchical and distributed form too: it can launch a D-AI compute element, or even an "app" created by Caspar.ai's team or by a third party, in the proper context for the task it performs, blocking it from accessing data it isn't authorized to see.  Processing nodes can then be placed close to the devices (improving real-time responsiveness), and we can associate different data flow policies with each level of the hierarchy, so that higher-level systems have increasingly less detailed knowledge, while the more remote and more sensitive IoT manager systems might know far more, but only for a specific purpose, such as better understanding voice commands in a particular part of the home: a "contextualized" but more sensitive task.

Then one can carry all of this even further.  We can have systems that are permitted to break the rules, but only in emergencies: if a fire is detected in the complex, or perhaps a wildfire is active in the area, we can switch to a mode in which a secondary, side-by-side hierarchy activates and is authorized to report to the first responders: " there are two people in the A section of the complex, one in unit A-3 and one in unit A-9.  In unit A-3 the resident's name is Sally Adams and she is in the northeast bedroom..."  All of this is information a standard smart home system would have sent to the cloud, so this isn't a capability unique to Caspar.  But the idea of having an architecture that localizes this kind of data unless it is actually needed for an emergency is appealing: it removes the huge incentive that cloud providers currently confront, in which by mining your most private data they can gain monetizable insights.

In the full D-AI perspective, Caspar has many of these side-by-side hierarchies.  As one instantiates such a system over a great many homes, then communities, then cities, and specializes different hierarchies for different roles, we arrive at a completely new form of IoT cloud.  For me, as an OS researcher, I find this whole idea fascinating, and I've been urging Ashutosh and Dave to write papers about the underlying technical problems and solutions (after all, before both became full time entrepreneurs, both were full time researchers!)

We tend to think of the cloud in a centralized way, even though we know that any big cloud operator has many datacenters and treats the global deployment as a kind of hierarchy: availability zones with three data centers each, interconnected into a global graph, with some datacenters having special roles: IoT edge systems, Facebook point-of-presence systems (used heavily for photo resizing and caching), bigger ones that do the heavy lifting.  So here, with D-AI, we suddenly see new forms of hierarchy.  The appeal is that whereas the traditional model simply streams all the data upwards towards the core, this D-AI approach aggregates at what we could call the leaves, and sends only summaries (perhaps even noised to achieve differential privacy, if needed) towards the larger data warehouse platforms.

So how does this play out in practice?  It seems a bit early to say that Caspar has cracked the privacy puzzle, which is the event that could make smart homes far more palatable for most of us.   On the other hand, as the distributed IoT cloud's protection barriers grow more sophisticated over time, one could believe that it might ultimately become quite robust even if some apps are maliciously trying to subvert the rules (if we look at Apple's iPhone and iPad apps, or the ones on Google's Android, this is definitely the trend).  Meanwhile, even if privacy is just one of our goals, the D-AI concept definitely offers contextualization and localization that enables real-time responsiveness of an exciting kind.  The Caspar platform is actually up and running, used in various kinds of real-estate developments worldwide.  Their strongest uptake has been in residential communities for the elderly: having a system that can help with small tasks (even watching the pets!) seems to be popular in any case, but especially popular in groups of people who need a little help now and then, yet want to preserve their autonomy.

Friday, 21 February 2020

Quantum crypto: Caveat emptor...

There is a great deal of buzz around the idea that with quantum cryptographic network links, we can shift to eavesdropping-proof communication that would be secure against every known form of attack.  The catch?  There is a serious risk of being tricked into believing that a totally insecure network link is a quantum cryptographic one.  In fact, it may be much easier and cheaper to build and market a fake quantum link than to create a real one!  Worse, the user probably wouldn't even be able to tell the difference.  You could eavesdrop on a naive user quite easily if you built one of these fakes and managed to sell it.  What's not to love, if you are hoping to steal secrets?

So, first things first.  How does quantum cryptography actually work, and why is it secure?  A good place to start is to think about a one-time pad: a source of random bits created in a paired, read-once form.  You and your friend each have one copy of the identical pad.  For each message, you tear off one "sheet" of random bits, and use those random bits as the basis of a coding scheme.

For example, the current sheet could be used as a key for a fast stream cryptographic protocol.  You would use it for a little while (perhaps even for just one message), then switch to the next sheet, which serves as the next key.  Even if an attacker somehow was able to figure out what key was used for one message, that information wouldn't help for the next message.
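
A minimal sketch of that sheet-per-message discipline; the pad here is simulated locally with os.urandom purely for illustration, whereas in the quantum setting both endpoints would hold identical pads generated by the entangled-photon mechanism described below.

```python
# Successive "sheets" of a shared random pad used as per-message XOR keys.
import os

def make_pad(num_sheets: int, sheet_len: int) -> list:
    return [os.urandom(sheet_len) for _ in range(num_sheets)]

def xor_with_sheet(message: bytes, sheet: bytes) -> bytes:
    assert len(message) <= len(sheet), "never reuse or overrun a sheet"
    return bytes(m ^ k for m, k in zip(message, sheet))

pad = make_pad(num_sheets=3, sheet_len=64)   # both endpoints hold identical copies
msg = b"meet me at the cafe"
ct  = xor_with_sheet(msg, pad[0])            # sender uses sheet 0, then discards it
pt  = xor_with_sheet(ct, pad[0])             # receiver uses its own copy of sheet 0
assert pt == msg
```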

This is basically how quantum cryptography works, too.  We have some source of entangled photons, and a device that can measure polarization, or "spin".  Say that up is 1 and down is 0.  In principle, you'll see a completely random sequence of 0/1 bits, just like one sheet of a random pad.

Because the photons are entangled, even though the property itself is random, if we measure this same property for both of the entangled photons, we obtain the same bit sequence.

Thus if we generate entangled photons, sending one member of the pair to one endpoint and the other photon to the other endpoint, we've created a quantum one-time pad.  Notice that no information is actually being communicated.  In some sense, the photons do not carry information per se, and can't be forced to do so.  The actual bits will be random, but because the photons are entangled, we are able to leverage the correlation to read exactly two copies out, one copy at each endpoint.  Then we can use this to obscure our messages (a classical method is used to authenticate the parties at each end, such as RSA-based public and private keys).

Quantum cryptography of this form is suddenly being discussed very widely in the media, and there are more and more companies willing to sell you these cables, together with the hardware to generate entangled photons and to read out the binary bit strings using measurements on the entangled photon pairs. So why shouldn't everyone leap up this very moment and rush down to Home Depot to buy one?

To see the issue, think back to the VW emissions scandal from 2015.  It turned out that from 2011 to 2015, the company was selling high-emission engines that had a way to sense when they were being tested.  In those periods, they would switch to a less economical (but very clean) mode of operation.  This would fool the department of motor vehicles, after which the car could revert to its evil, dirty ways.

Suppose the same mindset was adopted by a quantum cable vendor.  For the non-tested case, instead of entangling photons the company could generate a pseudo-random sequence of perfectly correlated unentangled ones.  For example, it could just generate lots of photons and filter out the ones with an unwanted polarization.  The two endpoint receivers measure polarization and see the same bits.  This leads them to think they share a secret one-time pad... but in fact the vendor of the cable not only knows the bit sequence but selected it!

To understand why this would be viable, it helps to realize that today's optical communication hardware already encodes data using properties like the polarization or spin of photons.  So the hardware actually exists, and it even runs at high data rates!  Yet  the quantum cable vendor will know exactly what a user will measure at the endpoints.

How does this compare to the quantum version?  In a true quantum cryptographic network link, the vendor hardware generates entangled photon pairs in a superposition state.  Now, this is actually tricky to achieve (superpositions are hard to maintain).  As a result, the vendor can predict that both endpoints will see correlated data, but because some photons will decorrelate in transmission, there will also be some quantum noise.  (A careful fake could mimic this too, simply by computing the statistical properties of the hardware and then deliberately transmitting slightly different data in each direction now and then.)

So as a consumer, how would you test a device to unmask this sort of nefarious behavior?

The only way that a skeptic can test a quantum communication device is by running what is called a Bell's Inequality experiment.  With Bell's, the skeptic runs the vendor's cable, but then makes a random measurement choice at the endpoints.  For example, rather than always measuring polarization at some preagreed angle, it could be measured at a randomly selected multiple of 10 degrees.   The idea is to pick an entangled superposition property and then to measure it in a way that simply cannot be predicted ahead of time.

Our fraudulent vendor can't know, when generating the original photons, what you will decide to measure, and hence can't spoof an entanglement behavior.  In effect, because you are making random measurements, you'll measure random values.  But if the cable is legitimate and the photons are genuinely entangled, now and then the two experiments will happen to measure the same property in the identical way -- for example, you will measure polarization at the identical angle at both endpoints.  Now entanglement kicks in: both will see the same result.  How often would this occur?  Well, if you and I make random selections in a range of values (say, the value that a dice throw will yield), sometimes we'll bet on the same thing.  The odds can be predicted very easily.

When we bet on the same thing, we almost always read the same value (as mentioned earlier, quantum noise prevents it from being a perfect match).  This elevated correlation implies that you've purchased a genuine quantum cryptography device.
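
For the statistically minded, here is a rough simulation sketch of the CHSH form of Bell's test -- my own illustration, not a description of any vendor's test gear.  The "quantum" source reproduces the ideal entangled-photon correlations, while the "fake" source uses a pre-shared hidden polarization, the analogue of a vendor shipping pre-correlated but unentangled photons; only the genuine source pushes the CHSH statistic above the classical bound of 2.

```python
# Idealized CHSH statistics: quantum correlations vs. a local hidden-variable fake.
import math, random

ANGLES_A = [0.0, 45.0]       # one endpoint's two possible polarizer angles (degrees)
ANGLES_B = [22.5, 67.5]      # the other endpoint's two choices

def quantum_pair(a, b):
    """Ideal entangled photons: outcomes agree with probability cos^2(a - b)."""
    same = random.random() < math.cos(math.radians(a - b)) ** 2
    x = random.choice([+1, -1])
    return x, (x if same else -x)

def fake_pair(a, b):
    """Unentangled fake: both outcomes are fixed by a shared hidden polarization."""
    lam = random.uniform(0.0, 180.0)
    def outcome(angle):
        return +1 if math.cos(math.radians(angle - lam)) ** 2 > 0.5 else -1
    return outcome(a), outcome(b)

def chsh(source, trials=200_000):
    stats = {(i, j): [0, 0] for i in range(2) for j in range(2)}   # [sum, count]
    for _ in range(trials):
        i, j = random.randrange(2), random.randrange(2)            # random settings
        x, y = source(ANGLES_A[i], ANGLES_B[j])
        stats[(i, j)][0] += x * y
        stats[(i, j)][1] += 1
    E = {k: s / n for k, (s, n) in stats.items()}
    return E[(0, 0)] - E[(0, 1)] + E[(1, 0)] + E[(1, 1)]

print("quantum S ~", round(chsh(quantum_pair), 2))   # about 2.8: violates the bound of 2
print("fake    S ~", round(chsh(fake_pair), 2))      # stays at or below 2
```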

But now think back to VW again.  The company didn't run with low emissions all the time -- they had a way to sense that the engine was being tested, and selected between emission modes based on the likelihood that someone might be watching.  Our fraudulent vendor could try the same trick.  When the cable is connected to the normal communication infrastructure (which the vendor supplies, and hence can probably detect quite easily), the cable uses fake entanglement and the fraudulent vendor can decode every message with ease.  When the cable is disconnected from the normal endpoint hardware, again easy to detect, the vendor sends entangled photons, and a Bell's test would pass!

Clearly, a quantum communications device will only be trustworthy if the user can verify the entire device.  But how plausible is this?  A device of this kind is extremely complex.

My worry is that naïve operators of systems that really need very good security, like hospitals, could easily be fooled.  The appeal of a quantum secure link could lure them to spend quite a lot of money, and yet most such devices would be black boxes, much like any other hardware we purchase.  Even if a device somehow could be deconstructed, who would have the ability to validate the design and implementation?  Even a skilled, skeptical buyer might have no practical way to do so.

So, will quantum security of this form ever be a reality?  It already is, in lab experiments where the full system is implemented from the ground up.  But one cannot just purchase components and cobble such a solution together: the CIO of a hospital complex who wants a secure network would need to purchase an off-the-shelf solution.  I can easily see how one might spend money and end up with a system that looks as if it is doing something.  But I simply don't see a practical option for convincing a skeptical auditor that the solution actually works!

Saturday, 15 February 2020

The delicate art of the pivot

Success in computing often centers on having the flexibility to know when to pivot an idea, and yet simultaneously, the steadiness to not wander off on hopeless yet seductive tangents.  Pivoting is a surprisingly subtle skill.

A bit of context.  We say that a project has done a pivot if it sets out in one direction but later shifts to focus on some other way of using some of the same ideas.  A pivot that tosses out the technology isn't really what I mean... I'm more interested in the kind of mid-course correction that doesn't require a ground-up rethinking, yet might have a profound impact on the marketing of a product or technology.

Pivots are a universal puzzle for entrepreneurs and researchers alike.  I'm pretty good at finding the right research questions, but not nearly as capable of listening to the market.  I remember an episode at Reliable Network Solutions (RNS). This was a company founded by Werner Vogels, Robbert van Renesse and me, to do management solutions for what we would call cloud computing data centers today.  Back then the cloud term wasn't yet common, so those were really very early days.

We launched RNS to seize an initial opportunity that was irresistible: call it the ultimate business opportunity in this nascent cloud space.  In our minds, we would solve the problem for one of the earliest major players and instantly be established as the go-to company for a solution.  Now, as it happened, our initial customer had special requirements, so the solution we created for them was something of a one-off (they ultimately used it, and I think they still do), but our agreement left us the freedom to create a more general product that we could sell to everyone else.  So, the RNS business plan centered on this follow-on product.  Our vision was that we would create a new and more scalable, general-purpose, robust, self-repairing, fast, trustworthy management infrastructure solution.  We were sure it would find a quick market uptake: after all, customer zero had been almost overwhelmingly hungry for such a solution, albeit in a slightly more specialized setting unique to their datacenter setups.

RNS ended up deciding to base this general product on Astrolabe, a scalable gossip-based management information database that we came up with as a research prototype, inspired by that first product for customer zero.  We had the prototype running fairly quickly, and even got a really nice  ACM TOCS paper out of the work.   Astrolabe had every one of our desired properties, and it was even self-organizing -- a radical departure from the usual ways of monitoring and managing datacenter systems.

Astrolabe was a lovely, elegant idea.  Nonetheless, when we made this decision to bet the bank on it (as opposed to doing something much narrower, along the lines of what we did for that first customer), we let our fondness for the concept get way out ahead of what the market really wanted.

I remember one particular road trip to San Francisco.  We met with the technology leaders of a large company based there, and they set up a kind of brainstorming session with us.  The intent was to find great fits for Astrolabe in their largest datacenter applications.  But guess what?  They turned out to really need a lot less than Astrolabe... and they were nervous that Astrolabe was quite exotic and sophisticated... it was just too big a step for them.

In fact, they wanted something like what Derecho offers today: an ultra-fast tool for replicating files and other data.  They would have wanted Derecho to be useable as a stand-alone command-line solution (we lack this for Derecho: someone should build one!).  But they might have gone for it as a C++ library.  In effect, 15 years later, I finally have what those folks were basically looking for at the time.

At any rate, we had a potential next big customer and a fantastic opportunity for which our team was almost ideally suited -- my whole career has focused on data replication.  Even so, RNS just couldn't take this work on.  We had early customers for our Astrolabe concept, plus customer zero continued to give us some work, and our external investors had insisted on a set of Astrolabe milestones.  We were simply spread too thin, and so even though our friends in San Francisco spoke the truth, it was just impossible for us to take their advice.

But guess what?  Reliable Network Solutions never did manage to monetize Astrolabe in a big way, although we did make a number of sales, and the technology ultimately found a happy home.  We just sort of bumped along for a few years, making barely enough money to pay all our bills, and never hitting that home run that can transform a small company into a big one.  And then, in 2001, the 9-11 terrorist attack triggered a deep tech downturn.  By late 2002 our sales pipeline had frozen up (many companies pause acquisitions during tech downturns), and we had no choice but to close the doors.  35 people lost their jobs that day.

The deep lesson I learned was that a tech company always needs to be awake to the possibility that it has the strategy wrong, that its customers may be able to see the obvious even though company leaders are blind because of their enthusiasm for the technology they've been betting on, and that a pivot might actually save the day.  The path forwards sometimes leads to a dead end.

This is a hard lesson to teach.  I've sometimes helped the Runway "post-doc" incubator program, which is part of the Jacobs Institute at NYC Tech, the technology campus Cornell runs in New York jointly with Technion.  My roles have been pretty technical/engineering ones: I've helped entrepreneurs figure out how to leverage cloud computing to reduce costs.

But teaching entrepreneurs to be introspective about their vision and to pivot wisely?  This has been an elusive skill, for me.  The best person I've ever seen at this is our Runway director, Fernando Gomez-Baquero.  Fernando favors an approach that actually starts by accepting the ideas of the entrepreneurial team at face value.  But then he asks them to validate their concept.   He isn't unreasonable about this, and actually helps them come up with a plan.  Just the same, he never just accepts assumptions as given.  Everything needs to be tested.

This is often hard: one can stand in the hallway at a big tech show-and-tell conference and interview passing CTOs (who are often quite happy to hold forth on their visions for the future), but even if a CTO shows interest, one has to remember my experience with RNS: product one, even a successfully delivered product that gets deployed, might not be a scalable story that can be resold widely and take a company down the path to wild profits.

Fernando is also very good at the non-technology side of companies: understanding what it takes to make a product look sexy and feel right to its actual end-users.  A lot of engineering teams neglect the whole look and feel aspect because they become so driven by the technical ideas and research advances they started with that they just lose track of the bigger picture.  Yet you can take a fantastic technology and turn it into a horrible, unusable product.  Fernando's students never make that kind of mistake: for him, technology may be the "secret sauce", but you have to make the dish itself irresistible, and convince yourself that you've gotten that part right.

To this end, Fernando asks the entrepreneurs to list 20 or so assumptions about their product and their market.  The first few are always easy.  Take RNS: we would have said we were betting that datacenters would become really immense (they did), that managing a million machines at a time is very hard (it is), and that the management framework needs to be resilient, self-repairing, and stable under stress (all of these are true).  Fernando, by the way, wouldn't necessarily count these as three assumptions, and would never split that last item into subitems.  Instead, he might call this just one or two assumptions, and if you have additional technology details to add, he would probably be inclined to lump them in as well.  This gets back to my point above:  for a startup company, technology is only a tiny part of the story.  All these "superior properties" of the RNS solution?  Fernando would just accept that.  Ok, "you have technical advantages."  But he would insist on seeing the next 20 items on the list.

Pressed to keep going on this seemingly strange task, you find it natural to list human factors: that CTOs have a significant cost exposure on datacenter management, that they know this to be true, and that they would spend money to reduce that exposure.  That they might have an appetite for a novel approach because they view existing ones as poorly scalable.  You might start to toss in cost estimates, or pricing guesses.  And if 20 turns out to be too easy a target, Fernando will read your list, then ask for a few more.

Fernando once told me an anecdote that I remember roughly this way: Suppose that a set of entrepreneurs wanted to bring Korean ice cream to the K-12 school cafeterias in America.  They might outline this vision, extoll the virtues of ice cream in tiny little dots (Korean ice cream is made by dripping the ice cream mix into liquid nitrogen "drop by drop"), and explain that for the launch, they would be going with chocolate.   They would have charts showing the explosive sales growth of Korean ice cream in K-12 schools in Seoul.   Profits will be immense! Done deal, right?

Well, you can test some of these hypotheses.  It isn't so hard to visit K-12 cafeterias, and in many of them the head chef would probably be very happy to be interviewed.  But guess what you might learn?  Perhaps FDA rules preclude schools from serving ice cream in a format that could possibly be inhaled.  Perhaps chocolate is just not a popular flavor these days: strawberry is the new chocolate, or mint, or Oreo crunch.  Korea happens to be a fairly warm country: the demand for ice cream is quite high during the school year there.  Maybe less so here.  And anyhow, ice cream isn't considered healthy here, and not every cafeteria serves sweets.

Korean dot-style ice cream?  Awesome idea!  But this kind of 20-questions approach can rule out ill-fated marketing plans long before the company tries to launch its product.  Our ice-cream venture could easily have been headed straight for a costly (yet delicious) face-plant.  Perhaps the company would have failed.  Launch the exact same concept on the nation's 1000 or so small downtown pedestrian malls, and it could have a chance.  But here we have a small pivot: how would that shift in plan change the expense models?  Would we rent space, or kiosks, or sell from machines?  Would there be new supply challenges, simply to get the ice cream to the points of sale?   Schools are at least easy to find: you can download a list.  How would you find those 1000 small malls, and who would be selling the product at each location?

Had we done this 20-questions challenge with Astrolabe, we might have encountered the friendly folks in California far sooner, gotten their good advice early enough to act on it, and realized that Astrolabe was just too exotic for our target market to accept as a product.  Technically superior?  Absolutely, in some dimensions -- the ones academic researchers evaluate.  But not every product needs technical superiority to win in the market, and conversely, not every technical attribute is decisive in the eyes of CTO buyers.

Fernando is a wizard at guiding companies to pivot before they make that costly error and find themselves so committed to the dead end that there is simply no path except to execute the plan, wise or foolish.

Today, I look at some of my friends who lead companies, and I'm seeing examples everywhere of how smartly timed pivots can be game-changing.  I was blind to that sort of insight back in 2000 when I was actually running RNS.

The art of the pivot isn't necessarily natural for everyone.  By now, I understand the concept and how it can be applied, but even so, I find that I'm prone to fall in love with technology in ways that can cloud my ability to assess a market opportunity in an unbiased way.  It really isn't easy.

So here's my advice: Anyone reading this blog is a technology-centric datacenter person.  Perhaps you teach and do research, as I do (most of the time, anyway).  Perhaps you work for a big company... perhaps you are even toying with trying to launch one.  But I would argue that every one of us has something to learn from this question of technology pivots.  Our students, or our colleagues, or our direct reports: they may be holding fast to some sort of incorrect belief, betting more and more heavily on it, and yet that belief may be either outright wrong, or perhaps just in need of a tweak.

You yourself may be more locked-in on some aspect of your concept than you realize.  That rigidity could be your undoing!

Fernando's 20-questions challenge is awesome because it doesn't approach the issue confrontationally.  After all, the entrepreneurs themselves make the list of assumptions -- it isn't as if Fernando imposes them.  They turn each assumption into a question, too, Jeopardy style.  Then he just urges them to validate the resulting questions, easiest first.  Some pull this off... others come back eager to pivot after a few days, and open to brainstorming based on what they learned.

What an amazing idea.  I wish that back then, we could have launched RNS in the Runway program!  Today, I might be running all the world's cloud computing systems.

Friday, 20 December 2019

A few 10-Year Challenges for Distributed Systems and IoT


A newspaper column on "next decade" predictions got me thinking about crystal ball prognoses.   Tempting as it is to toss in my views on climate change, surveillance in China and self-driving cars, I'll focus this particular blog on computer systems topics in my area.

1. AI Sys and RT ML.  These terms relate to computer systems created to support AI/ML applications, and that often involve addressing real-time constraints.  There is a second meaning that centers on using AI tools in networks and operating systems and database platforms. I'm open-minded but, so far, haven't seen convincing demonstrations that this will yield big advances.  I'll focus on the first meaning here.

Although AI Sys terminology is trendy, the fact is that we are still at the very earliest stages of incorporating sensing devices into applications that leverage cloud-scale data and machine learning.  As this style of system deployment accelerates in coming years, we'll start to see genuinely smart power grids (existing grids often proclaim themselves to be "smart" but honestly, not much use is being made of ML as of yet), smart homes and offices, smart cities and highways, smart farms....  The long-term potential is enormous, but to really embrace it we need to rethink the cloud, and especially, the cloud edge where much of the reactive logic needs to run.  This is why the first of my predictions centers on the IoT edge: we'll see a new and trustworthy edge IoT architecture emerge and mature, in support of systems that combine sensors, cloud intelligence and big data.

Getting there will require more than just redesigning today's cloud edge components, but the good news is that an area begging for disruptive change can be an ideal setting for a researcher to tackle.  To give just one example: in today's IoT hub services, we use a database model to track the code revision level and parameter settings for sensors, whether they are currently connected and accessible or not.  The IoT hub manages secure connectivity to the sensors, pushes updates to them, and filters incoming event notifications, handing off to the function service for lightweight processing.  I really like the hub concept, and I think it represents a huge advance relative to the free-for-all currently seen when sensors are connected to the cloud.  Moreover, companies like Microsoft are offering strong quality of service guarantees (delay, bandwidth, and even VPN security) for connectivity to the edge.  They implement the software, then contract with ISPs and telcos to obtain the needed properties.  From the customer's perspective what matters is that the sensors are managed in a trustworthy, robust and secure manner.

The puzzle relates to the reactive path, which is very far from satisfactory right now.  When a sensor sends some form of event to the cloud, the IoT hub operates like a windowing environment handling mouse movements or clicks: it functions as the main loop for a set of handlers that can be customized and even standardized (thus, a Canon camera could eventually have standard cloud connectivity with standard events such as "focused", "low power mode", or "image acquired").  As with a GUI, incoming events that need user-defined processing are passed to user-defined functions, which can customize what will happen next.
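
To illustrate the analogy (and only the analogy: these names are invented for this sketch and are not any real IoT-hub API), here is the "GUI main loop" model in miniature: users register handlers per event type, and the hub routes each incoming sensor event to the matching user-defined function.

    class TinyIoTHub:
        def __init__(self):
            self.handlers = {}                    # event type -> list of callbacks

        def on(self, event_type, handler):
            # A user-defined function registers interest in one event type.
            self.handlers.setdefault(event_type, []).append(handler)

        def dispatch(self, event_type, payload):
            # The hub's "main loop" step: route the event to every handler.
            for handler in self.handlers.get(event_type, []):
                handler(payload)

    hub = TinyIoTHub()
    hub.on("image acquired", lambda p: print(f"archive frame from {p['camera']}"))
    hub.on("low power mode", lambda p: print(f"schedule maintenance for {p['camera']}"))
    hub.dispatch("image acquired", {"camera": "canon-07"})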

The core problem is with the implementation.  First, we split the path: sensor-to-cloud uploads of large objects like photos or videos follow one path, and end up with the data automatically stored into a binary large object (BLOB) store, replicated for fault-tolerance.

Meanwhile, other kinds of events, like the ones just mentioned, are handled by small fragments of logic: cloud functions.  But these aren't just lambdas written in C++ or Scala -- they are typically full programs built for Linux and then handed to the cloud as containers, perhaps with configuration files and even their own virtualized network mapping.  As a result, the IoT hub can't just perform the requested action -- it needs to launch the container and pass the event into it.

The IoT hub accomplishes this using the "function service", which manages a pool of machines and picks one on which to launch the container for this particular Canon photo-acquisition event; the program then loads that event's metadata and decides what to do.  In effect, we launch a Linux command.

Normally, launching a Linux command has an overhead of a few milliseconds.  Doing so through the IoT hub is much slower: today, this process can take as much as two seconds.   The issues are several.  First, because the IoT hub is built on a database like SQL Server or Oracle, we have overheads associated with the way databases talk to services like the function service.  Next, the function service itself turns out to do a mediocre job of warm-starting functions -- the delays center on caching, on binding the function to any microservices it will need to talk to (work that could occur off the critical path), and on any synchronization the function may require.

I can't conceive of a sensible real-time use case where we can tolerate two seconds of delay -- even web page interactions are down in the 10-50ms range today, well below the 100ms level at which A/B tests show that click-through drops.  So I would anticipate a complete redesign of the IoT hub and function layer to warm-start commonly needed functions, allow them to pre-bind to any helper microservices they will interact with (binding is a potentially slow step but can occur outside the critical path), and otherwise maintain a shorter critical path from sensor to user-mediated action.  I think we could reasonably target sub-1ms delays... and need to do so!
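
Here is a minimal sketch of the warm-start idea, with every name hypothetical (this is emphatically not how Azure's function service is built): keep a pool of pre-initialized handlers whose slow steps -- loading code, binding and authenticating to helper microservices -- happen at pool-fill time, so that the per-event critical path is just a queue pop.

    import queue
    import time

    class WarmHandler:
        # All the expensive work happens here, once, off the critical path:
        # stand-ins for pulling the container image and binding to the
        # helper microservices the function will later call.
        def __init__(self, name):
            self.name = name
            self.blob_conn = self._connect("blob-store")
            self.scoring_conn = self._connect("ml-scoring")

        def _connect(self, service):
            time.sleep(0.05)                      # simulated slow binding step
            return f"connection-to-{service}"

        def handle(self, event):
            # The critical path: only the per-event logic runs here.
            return f"{self.name} processed '{event}' via {self.scoring_conn}"

    class WarmFunctionPool:
        # Keeps a few handlers warm; dispatching an event is just a queue pop.
        def __init__(self, size=4):
            self.ready = queue.Queue()
            for i in range(size):
                self.ready.put(WarmHandler(f"handler-{i}"))

        def dispatch(self, event):
            h = self.ready.get()                  # warm handler, no cold start
            try:
                return h.handle(event)
            finally:
                self.ready.put(h)                 # recycle for the next event

    pool = WarmFunctionPool()                     # slow: fills the pool up front
    t0 = time.perf_counter()
    print(pool.dispatch("camera-42: image acquired"))
    print(f"dispatch latency: {(time.perf_counter() - t0) * 1e3:.3f} ms")

The dispatch path is nothing more than a queue pop and a method call; everything slow has been moved to construction time, which is the essence of the redesign I'm arguing for.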

There are many other unnecessarily long delays in today's IoT infrastructures, impacting everything from photo and video upload to ML computation on incoming objects.  But none of this is inevitable, and from a commercial perspective, the value of reengineering it (in a mostly or fully compatible way) would be huge.

2. Cost-efficient sharable hardware accelerators for the IoT Edge.  In prior blog postings, I've written about the puzzle of hardware for the IoT Edge (many people take that to mean "outside" the cloud, but I also mean "in the outermost tier of a data center supporting the cloud," like Azure IoT).  Here, the central question involves costs: modern ML, and especially model training, is cost-effective only because we can leverage hardware accelerators like GPUs, TPUs and custom FPGAs to offload the computationally parallel steps into ultra-efficient hardware.  To this, add RDMA and NVM.

The current generation of hardware components evolved in batch-oriented back-end systems, and it is no surprise that they are heavily optimized for batched, offline computing.  And this leads to the key puzzle: today's ML accelerators are expensive devices that are cost-effective only when they can be kept busy.  The big batches of work seen in the back end let today's accelerators run very long tasks, which keeps them busy and makes them cost-effective.  If the same devices were mostly idle, this style of accelerated ML would become extremely expensive.

In some sense, today's ML accelerators would have been right at home in the batch computing systems of the 1970s.  As we migrate toward a more event-driven IoT edge, we will also need to migrate machine learning (model training) and inference into real-time contexts, and this means that we'll be using hardware accelerators in settings that lack the batched pipelining that dominates in the big-data, HPC-style settings where those accelerators currently reside.  To be cost-effective we will either need completely new hardware (sharable between events or between users), or novel ways to repurpose our existing hardware for use in edge settings.

It isn't obvious how to get to that point, making it a fascinating research puzzle.  As noted, edge systems are event-dominated, although we do see streams of image and video data (image-processing tasks on photo or video streams can be handled fairly well with existing GPU hardware, so that particular case can be solved cost-effectively now).  The much harder case involves singleton events: "classify this speech utterance," or "decide whether or not to retain a copy of that photo."  So the problem is to do snap analysis of an event.  And while my examples involve photos and videos, any event could require an intelligent response.  We may only have milliseconds to react, and part of that reaction may entail retraining or incrementally adjusting the ML models -- dynamic learning.
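
One partial mitigation (a well-known serving trick, sketched here with toy stand-ins rather than real GPU calls) is micro-batching: hold each singleton event for at most a millisecond or two, then run one batched inference over whatever has accumulated, so the accelerator sees reasonably sized batches even under event-driven load.

    import queue
    import threading
    import time

    class MicroBatcher:
        # Accumulates singleton events for at most max_wait_ms, then hands the
        # whole batch to run_batch (a stand-in for one accelerator kernel call).
        def __init__(self, run_batch, max_batch=32, max_wait_ms=2.0):
            self.run_batch = run_batch
            self.max_batch = max_batch
            self.max_wait = max_wait_ms / 1000.0
            self.inbox = queue.Queue()
            threading.Thread(target=self._loop, daemon=True).start()

        def submit(self, event):
            done = threading.Event()
            slot = {"event": event, "done": done, "result": None}
            self.inbox.put(slot)
            done.wait()                           # caller blocks a millisecond or two
            return slot["result"]

        def _loop(self):
            while True:
                batch = [self.inbox.get()]        # block until the first event
                deadline = time.perf_counter() + self.max_wait
                while len(batch) < self.max_batch:
                    remaining = deadline - time.perf_counter()
                    if remaining <= 0:
                        break
                    try:
                        batch.append(self.inbox.get(timeout=remaining))
                    except queue.Empty:
                        break
                results = self.run_batch([s["event"] for s in batch])
                for slot, result in zip(batch, results):
                    slot["result"] = result
                    slot["done"].set()

    # The lambda stands in for a batched classification kernel on a GPU or TPU.
    batcher = MicroBatcher(lambda events: [f"label-for-{e}" for e in events])
    print(batcher.submit("speech-utterance-17"))

Micro-batching trades a bounded extra delay for accelerator utilization; it does nothing for the privacy-domain problem discussed next, which is why the hardware question remains open.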

The hardware available today isn't easily sharable across scaled-out event-driven systems where the events may originate in very different privacy domains, or from different users.  We lack ways to protect data inside accelerators (Intel's new SIMD instruction set offers the standard protections, but a GPU or TPU or FPGA is typically operated as a single security context: it is wide open, so if a task runs on behalf of me immediately after one that ran on behalf of you, the kernel I've invoked could simply reach over and extract any data left behind after your task finished).

So why not use Intel's SIMD solutions?  For classification tasks, this may be the best option, but for training, which is substantially more expensive from a computational point of view, the Intel SIMD options are currently far slower than GPU or TPU (FPGA is the cheapest of all the options, but would typically be somewhere in between the SIMD instructions and a GPU on the performance scale).

It will be interesting to watch this one play out, because we can see the end goal easily, and the market pressure is already there.  How will the hardware vendors respond?  And how will those responses force us to reshape the IoT edge software environment?

3. Solutions for the problem blockchain was supposed to solve.  I'm pretty negative about cryptocurrencies but for me, blockchain is a puzzle.  Inside the data center we've had append-only logs for ages, and the idea of securing them against tampering using entangled cryptographic signatures wasn't particularly novel back when the blockchain for Bitcoin was first proposed.  So why is a tamper-proof append-only log like Microsoft's Corfu system not a blockchain?
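
For readers who haven't seen one, here is the entire "entangled signatures" idea in toy form (a real system like Corfu adds replication, a sequencer, and digital signatures on top of this): every record carries the hash of its predecessor, so tampering with any old record invalidates every later hash.

    import hashlib
    import json

    class HashChainedLog:
        # Append-only log in which each record stores the hash of its
        # predecessor; rewriting any old record breaks every later hash.
        def __init__(self):
            self.records = []

        @staticmethod
        def _digest(payload, prev_hash):
            body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
            return hashlib.sha256(body.encode()).hexdigest()

        def append(self, payload):
            prev = self.records[-1]["hash"] if self.records else "genesis"
            self.records.append({"payload": payload, "prev": prev,
                                 "hash": self._digest(payload, prev)})

        def verify(self):
            prev = "genesis"
            for i, rec in enumerate(self.records):
                if rec["prev"] != prev or rec["hash"] != self._digest(rec["payload"], prev):
                    return f"tampering detected at record {i}"
                prev = rec["hash"]
            return "chain intact"

    log = HashChainedLog()
    log.append({"cow": "bessie", "liters": 23.4, "temp_C": 3.8})
    log.append({"batch": "B-17", "pasteurized": True})
    print(log.verify())                           # chain intact
    log.records[0]["payload"]["temp_C"] = 9.9     # a retroactive edit...
    print(log.verify())                           # ...is detected at record 0

Notice that nothing in this sketch required mining, anonymity, or a cryptocurrency.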

There are several aspects in which blockchain departs from that familiar, well-supported option.  I'll touch on them in an unusual order: practical uses first, then the more esoteric (almost "religious") considerations, which I'll tackle last.  Then I want to argue that the use cases do point to a game-changing opportunity, but that the whole story is confused by the religious zealotry around some of these secondary and, frankly, much less important aspects.

First among the novel stories is the concept of a smart contract, which treats the blockchain as a database and permits the developer to place executable objects into blockchain records, with the potential of representing complex transactions like the mortgage-backed securities that triggered the 2008 meltdown.  The story goes that if we can capture the full description of the security (or whatever the contract describes), including the underlying data that should be used to value it, we end up with a tamperproof and self-validating way to price such things, and our transactions will be far more transparent.

I see the value in the concept of a smart contract, but worry that the technology has gotten ahead of the semantics: as of the end of 2019 you can find a dozen tools for implementing smart contracts (Ethereum is the leader, but Hyperledger is popular too).  Less clear is the question of precisely how these are supposed to operate.  Today's options are a bit like the early C or Java programming languages: both omitted specifications for all sorts of things that actually turned out to matter, leaving it to the compiler-writer to make a choice.  We ended up with ambiguities that gave us today's security problems with C programs.

With blockchain and smart contracts you have even nastier risks because some blockchain implementations are prone to rollback (abort), and yet smart contracts create dependency graphs in which record A can depend on a future record B.  A smart contract won't seem so smart if this kind of ambiguity is allowed to persist... I predict that 2020 will start a decade when smart contracts with strong semantics will emerge.  But I'll go out on a limb and also predict that by the time we have such an option, there will be utter chaos in the whole domain because of these early but inadequate stories.  Smart contracts, the real kind that will be robust with strong semantics?  I bet we won't have them for another fifteen years -- and when we do get them, it will be because a company like Oracle or Microsoft steps in with a grown-up product that was thought through from bottom to top.  We saw that dynamic with Java and CORBA giving way to C# and LINQ and .NET, which in turn fed back into languages like C++.  And we will see it again, but it will take just as long!

But if you talk to people enamored with blockchain, it turns out that in fact, smart contracts are often seen as a cool curiosity.  I might have a narrow understanding of the field, but among people I'm in touch with, there is little interest in cryptocurrency and even less interest in smart contracts.  More common, as far as I can tell, is a focus on the auditability of a tamperproof ledger.

I'll offer one example that I run into frequently here at Cornell, in the context of smart farming.  You see variants of it in medical centers (especially ones with partner institutions that run their own electronic health systems), human resource management, supply chains, airports that need to track airplane maintenance, and the list goes on.  At any rate, consider farm-to-table cold-chain shipment of produce or agricultural products like cheese or processed meats.  A cup of yoghurt starts with the cow being milked, and even at that stage we might wish to track which cow we milked, how much milk she produced, the fat content; to document that she was properly washed before the milking machine kicked in, that we tested for milk safety and checked her health, and that the milk was promptly chilled and then stored at the proper temperature.  Later the milk is aggregated into a big batch, transported, tested again, pasteurized, homogenized, graded by fat content, cultured (and that whole list kicks in again: in properly sterile conditions, at the right temperature...).

So here's the challenge: could we use a blockchain to capture records of these kinds in a secure and tamperproof manner, and then be in a position to audit that blockchain for various tasks, such as confirming that the required safety steps were followed or looking for optimization opportunities?  Could we run today's ML tools on it, treating the records as an ordered collection and mapping that collection into a form that TensorFlow or Spark/Databricks could ingest and analyze?  I see this as a fantastic challenge problem for the coming decade.
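
Once the chain has been verified, the audit step can treat it as an ordered event set.  As a toy illustration (field names and thresholds are invented for this sketch, not any real food-safety rule), here is a check of one cold-chain invariant: milk never sits above 4 C for more than 30 minutes.

    def audit_cold_chain(records, max_temp_c=4.0, max_minutes=30):
        # records are assumed to be in chain (temporal) order and already
        # verified for integrity; field names are hypothetical.
        violation_start = None
        for rec in records:
            t, temp = rec["minutes_since_milking"], rec["temp_C"]
            if temp > max_temp_c:
                if violation_start is None:
                    violation_start = t
                if t - violation_start > max_minutes:
                    return f"violation: above {max_temp_c} C since minute {violation_start}"
            else:
                violation_start = None
        return "cold chain respected"

    events = [{"minutes_since_milking": m, "temp_C": c}
              for m, c in [(0, 3.5), (15, 4.8), (50, 4.9), (70, 3.6)]]
    print(audit_cold_chain(events))               # violation: above 4.0 C since minute 15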

The task is fascinating and hard, for a lot of reasons.  One is that the domain is partly disconnected (my colleagues have created a system, Vegvisir, focused on this aspect).  A second question you can ask concerns integrity of our data capture infrastructure: can I trust that this temperature record is from the proper thermometer, correctly calibrated, etc?  Do I have fault-tolerant redundancy?  How can we abstract from the chain of records to a trustworthy database, and what do trust-preserving queries look like?  How does one do machine learning on a trusted blockchain, and what trust properties would the model then carry?  Can a model be self-certifying, too?  What would the trust certificate look like (at a minimum, it would need to say that "if you trust X and Y and Z, you can trust me for purpose A under assumption B...").  I'm reminded of the question of self-certifying code... perhaps those ideas could be applied in this domain.

I commented that this is the problem blockchain really should be addressing.  I say this because as far as I can tell, the whole area is bogged down in debates that have more to do with religion than with rigorous technical arguments.  To me this is at least in part because of the flawed belief that anonymity and permissionless mining are key properties that every blockchain should offer.  The former is of obvious value if you plan to do money laundering, but I'm pretty sure we wouldn't even want this property in an auditing setting.  As for the permissionless mining model, the intent was to spread the blockchain mining revenue fairly, but this has never really been true in any of the main blockchain systems: they are all quite unfair, and all the revenue goes to shadowy organizations that operate huge block-mining systems.  As such, the insistence on permissionless mining with anonymity really incarnates a kind of political opinion, much like the "copyleft" clause built into GNU licenses, which incarnated a view that software shouldn't be monetized.  Permissionless blockchain incarnates the view that blockchains are for cybercurrency, that cybercurrency transactions shouldn't be taxed or regulated, and that management of this infrastructure is a communal opportunity, but also a communal revenue source.

Turning to permissionless blockchain as it exists today, we have aspects of this dreamed-of technology, but the solutions aren't fair, and in fact demand a profoundly harmful mining model that squanders energy in the form of hugely expensive proof-of-work certifications.  My colleague, Robbert van Renesse, has become active in the area and has been doing a survey recently to also look at some of the other ideas people have floated: proof of stake (a model in which the rich get richer, but the compute load is much reduced, so they spend less to earn their profits...), proof of elapsed time (a lovely acronym, PoET, but in fact a problematic model because the model can be subverted using today's Intel SGX hardware), and all sorts of one-way functions that are slow to compute and easy to verify (the parallelizable ones can be used for proof-of-work but the sequential ones  simply reward whoever has the fastest computer, which causes them to fail on a different aspect of the permissionless blockchain mantra: they are "undemocratic", meaning that they fail to distribute the income for mining blocks in a fair manner).  The bottom line, according to Robbert, is that for now, permissionless blockchain demands computational cycles and those cycles make this pretty much the least-green technology on earth. There is some irony here, because those who promote this model generally seem to have rather green politics in other ways.  I suppose this says something about the corrupting influence of potentially vast wealth.

Meanwhile, more or less betting on the buzz, we have a whole ecosystem of companies convinced that what people really want are blockchain curation products for existing blockchain models.  These might include tools that build the blockchain for you using the more-or-less standard protocols, that back it up, clean up any garbage, index it for quick access, integrate it with databases and AI/ML.  We also have companies promoting some exceptionally complex protocols, many of which seem to have the force of standards simply because people are becoming familiar with their names.  It will take many years to even understand whether or not some of these are correct -- I have serious doubts about a few of the most famous ones!

But here's my bet for the coming decade: in 2029, we'll be seeing this market morph into a new generation of WAN database consumers, purchasing products from today's database companies.  Those customers won't really be particularly focused on whether they use blockchain or some other technology (and certainly won't insist on permissionless models with pervasive anonymity and proof of work).   They will be more interested in tamperproof audits and ML on the temporally-ordered event set.

Proof of work per se will have long since died from resource exhaustion: the world simply doesn't have enough electrical power and cooling to support that dreadful model much longer (don't blame the inventors: the blame here falls squarely on the zealots in the cybercoin community, who took a perfectly good idea and twisted it into something harmful as part of their quest to become billionaires off the back of a pie-in-the-sky economic model).

The future WAN databases that emerge from the rubble will have sophisticated protection against tampering and the concept of trust in a record will have been elevated to a notion of a trustworthy query result, that can be checked efficiently by the skeptical end-user.  And this, I predict, will be a huge market opportunity for the first players to pull it off.  It would surprise me if those players don't turn out to include today's big database companies.

4. Leave-nothing-sensitive behind privacy.  The role of the cloud in smart settings -- the ones listed above, or others you may be thinking about -- is deeply enshrined by now:  very few smart application systems can avoid a cloud-centric computing model in which the big data and the machine intelligence is at least partly cloud-hosted.  However, for IoT uses, we also encounter privacy and security considerations that the cloud isn't terribly good at right now, with some better examples (Azure, on the whole, is excellent) and some particularly poor ones (I won't point a finger but I will comment that companies incented to place a lot of advertising often find it hard to avoid viewing every single user interaction as an invaluable asset that must be captured in perpetuity and then mined endlessly for every possible nugget of insight).

The upshot of this is that the cloud is split today between smart systems that are trying their best to spy on us, and smart systems that are just doing smart stuff to benefit us.  But I suspect that the spying will eventually need to end, at least if we hope to preserve our Western democracies.  How then can we build privacy-preserving IoT clouds?

I've written about this in the past, but in a nutshell, I favor a partnership: a style of IoT application that tries to "leave no trace behind" coupled to a cloud vendor infrastructure that promises not to deliberately spy on the end-user.  Thus for example when a voice command is given to my smart apartment, it may well need to be resolved up on the cloud, but shouldn't somehow be used to update databases about me, my private life, my friends...

I like the mental imagery of camping in a wilderness where there are some bears roaming around.  The cloud needs a model under which it can transiently step in to assist in making sense of my accent and choice of expressions, perhaps even contextualized by knowledge of me and my apartment, and yet when the task finishes, there shouldn't be anything left behind that can leak to third party apps that will rush into my empty campsite, hungry to gobble up any private data for advertising purposes (or worse, in countries like China, where the use of the Internet to spy on the population is a serious threat to personal liberties).  We need to learn to enjoy the benefits of a smart IoT edge without risk.

Can this be done?  I think so, if the cloud partner itself is cooperative.  Conversely, the problem is almost certainly not solvable if the cloud partner will see its revenue model break without all that intrusive information, and hence is hugely incented to cheat.  We should tackle the technical aspects now, and once we've enabled such a model, I might even favor asking legislative bodies to mandate privacy-preservation as a legally binding obligation on cloud vendor models.  I think this could be done in Europe, but the key is to first create the technology so that we don't end up with an unfunded and infeasible mandate.  Let's strike a blow against all those companies that want to spy on us!  Here's a chance to do that by publishing papers in top-rated venues... a win-win for researchers!

5. Applications that prioritize real-time.  Many IoT systems confront deadlines, and really have no choice except to take actions at the scheduled time.  Yet if we also want to offer guarantees, this poses a puzzle: how do we implement solutions that are always sure to provide the desired timing properties, yet are also "as consistent" as possible, or perhaps "as accurate" as possible, given those constraints?

To me this is quite an appealing question because it is easy to rattle off a number of ways one might tackle such questions.  For example, consider an ML algorithm that iterates until it converges, which typically involves minimizing some sort of error estimate.  Could we replace the fixed estimate by adopting a model that permits somewhat more error if the deadline is approaching?
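
As a toy rendition of that idea (my own sketch, not any particular ML framework): an iterative solver whose stopping tolerance relaxes as the deadline approaches, so it always returns something on time, just with a looser error bound when pressed.

    import time

    def deadline_aware_solve(grad, x0, deadline_s, lr=0.1,
                             tight_tol=1e-6, loose_tol=1e-2):
        # Iterative minimization whose stopping tolerance slides from
        # tight_tol toward loose_tol as the deadline approaches.
        start = time.perf_counter()
        x = x0
        while True:
            frac = min((time.perf_counter() - start) / deadline_s, 1.0)
            tol = tight_tol + frac * (loose_tol - tight_tol)
            g = grad(x)
            if abs(g) < tol or frac >= 1.0:
                return x, abs(g), frac >= 1.0     # answer, residual, hit deadline?
            x -= lr * g

    # Toy objective f(x) = (x - 3)^2, so grad(x) = 2 (x - 3).
    x, residual, timed_out = deadline_aware_solve(lambda x: 2 * (x - 3),
                                                  x0=50.0, deadline_s=0.005)
    print(f"x = {x:.4f}, residual gradient = {residual:.2e}, hit deadline: {timed_out}")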

Or here's an idea: what about simply skipping some actions because it is clear we can't meet the deadline for them?  I'm reminded of work Bart Selman, a colleague of mine, did fifteen years ago.  Bart was looking at situations in which an AI system confronted an NP-complete question, but in a streaming context where variations on that question would be encountered every few seconds (he was thinking about robot motion planning, but similar issues arise in many AI tasks).  What he noticed was that heuristics for solving these constrained optimization problems sometimes converge rapidly but in other situations diverge and compute endlessly.  So his idea, very clever, was to take the quick answers but simply pull the plug on computations that take too long.  In effect, Bart argued that if the robot is faced with a motion-planning task it won't be able to solve before its next step occurs, it should take the previously planned step and then try again.  Sooner or later the computation will converge quickly, and the overall path will be both high quality and fast.
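
Here is the flavor of that approach in a dozen lines (again a toy of my own, not Bart's actual system): each control tick gives the planner a strict budget, and if the heuristic doesn't finish in time, the controller simply reuses the last good plan.

    import random
    import time

    def try_plan(budget_s):
        # Stand-in for a heuristic solver: sometimes it converges quickly,
        # sometimes it would run far past the budget, in which case it gives up.
        start = time.perf_counter()
        work_needed = random.expovariate(1.0 / 0.02)      # seconds, random per tick
        while time.perf_counter() - start < work_needed:
            if time.perf_counter() - start > budget_s:
                return None                               # plug pulled
        return f"fresh plan ({work_needed * 1e3:.1f} ms of work)"

    last_good_plan = "initial safe plan"
    for tick in range(5):                                 # e.g. a 20 Hz control loop
        plan = try_plan(budget_s=0.01)                    # 10 ms budget per tick
        if plan is None:
            print(f"tick {tick}: timed out, reusing: {last_good_plan}")
        else:
            last_good_plan = plan
            print(f"tick {tick}: {plan}")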

We could do similar things in many IoT edge settings, like the smart-things cases enumerated earlier.  You might do better to have a smart grid that finds an optimized configuration setting once every few seconds, but then coasts along using old settings, than to pause to solve a very hard configuration problem 20 times per second if in doing so, you'll miss the deadline for actually using the solution.  The same is true for management of traffic flow on a highway or in a dense city.

For safety purposes, we will sometimes still want to maintain some form of risk envelope.  If I'm controlling a smart car in a decision loop that runs 20 times per second, I might not run a big risk if I toss up my hands even 4 or 5 times in a row.  But we would not want to abandon active control entirely for 30 seconds, so there has to be a safety mechanism too, one that kicks in long before the car could cause an accident (or miss the next turn), forcing it into a safe mode.  I don't see any reason we couldn't do this: a self-driving car (or a self-managed smart highway) would need some form of safety monitor in any case, to deal with all sorts of possible mishaps, so having it play the role of making sure the vehicle has fresh guidance data seems like a fairly basic capability.  Then in the event of a problem, we would somehow put that car into a safe shutdown mode (it might use secondary logic to pull itself into a safety lane and halt, for example).

I could probably go on indefinitely, but every crystal ball eventually fogs over, so perhaps we'll call it quits here.  Have a great holiday and see you in the next decade!