Friday, 20 March 2020

A new kind of IoT cloud for distributed AI

Like much of the world, Ithaca has pivoted sharply towards self-isolation while the coronavirus situation plays out.  But work continues, and I thought I might share some thoughts on a topic I've been exploring with my good friend Ashutosh Saxena at Caspar.ai (full disclosure: when Ashutosh launched the company, I backed him, so I'm definitely not unbiased; even so, this isn't intended as advertising for him).

The question we've been talking about centers on the proper way to create a privacy-preserving IoT infrastructure.  Ashutosh has begun to argue that the problem forces you to think in terms of hierarchical scopes: data that lives within scopes, and is only exported in aggregated forms that preserve anonymity.  He also favors distributed AI computations, again scoped in a privacy-preserving manner.  The Caspar platform, which employs this approach, is really an IoT edge operating system, and the entire structure is hierarchical, which is something I haven't seen previously (perhaps I shouldn't be surprised, because David Cheriton, a leading operating systems researcher, has been very actively involved in the design).
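
To make the idea a bit more concrete, here is a minimal Python sketch of what a privacy scope might look like: raw readings stay inside the scope object, and only an anonymized aggregate ever crosses the boundary to the parent.  The class names and the mean-only export rule are my own illustration, not Caspar's actual API.

```python
# Minimal sketch of hierarchical privacy scopes (illustrative only; the
# class names and the "export aggregates only" rule are my own invention,
# not the Caspar API).

from statistics import mean


class Scope:
    """Holds raw readings locally; exposes only aggregates to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.readings = []          # raw data never leaves this object

    def record(self, value):
        self.readings.append(value)

    def export(self):
        """Return an anonymized summary; the parent never sees raw values."""
        if not self.readings:
            return None
        return {"scope": self.name, "count": len(self.readings),
                "mean": mean(self.readings)}


# A home-level scope nested inside a community-level scope.
community = Scope("community-A")
home = Scope("unit-A3", parent=community)
home.record(2.4)                    # e.g., kWh readings captured in the home
home.record(3.1)
print(home.export())                # only the aggregate crosses the boundary
```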

Ashutosh reached this perspective after years of work on robotics.  The connection to hierarchy arises because a robot often has distinct subsystems:  when designing a robotic algorithm, one trains machine-learned models to solve subtasks such as motion planning, gripping objects, or reorienting a camera to get a better perspective on a scene.  This makes it natural for Ashutosh to view smart homes, multi-building developments and cities as scaled-up instances of that same robotic model.

Interestingly, this hierarchical perspective is a significant departure from today's norm in smart home technologies.  Perhaps because the cloud itself hasn't favored edge computing, especially for ML, there has been a tendency to think of smart homes and similar structures as a single big infrastructure with lots of sensors, lots of data flowing in, and then some form of scalable big-data analytic platform like Spark/Databricks on which you train your models and run inference tasks, perhaps in huge batches.  Without question, this is how most AI solutions work today: Google Maps, Facebook's TAO social networking infrastructure, and so on.

The relevant point is that Google does this kind of computation using a scalable system that runs on very large data repositories in an offline warehouse environment.  This warehouse creates both temptation and reward: you created the huge data warehouse to solve your primary problem, but now it becomes almost irresistible to train ad placement models on the data.  If you make your money on ads, you might even convince yourself that you haven't violated privacy, because (after all) a model is just a parameterized equation that lumps everyone together.  This rewards you, because training advertising models on larger data sets is known to improve advertising revenues.  On the other hand, data mining can directly violate privacy or user intent, and even a machine-learned model can reveal information it was never authorized to expose.

Ashutosh believes that any data-warehousing solution is problematic if privacy is a concern.  But he also believes that data warehousing and centralized cloud computation miss a larger opportunity: the quality of the local action can be washed out by "noise" coming from the massive size of the data set, and overcoming this noise requires an amount of computation that rises to an unacceptable level.  Hence, he argues, you eventually end up with privacy violations, an insurmountable computational barrier, and a poorly trained local model.  On the other hand, you've gained higher ad revenues, and perhaps for this reason might be inclined to shrug off the loss of accuracy and "contextualization quality", by which I mean the ability to give the correct individualized response to a query "in the local context" of the resident who issued the query.

We shouldn't blindly accept such a claim.  What would be the best counter-argument?  I think the most obvious pushback is this: when we mine a massive data warehouse in the cloud, we don't often treat the whole data set as part of some single model (sometimes we do, but that isn't somehow a baked-in obligation).  More often we view our big warehouse as co-hosting millions of separate problem instances, sharded over one big data store but still "independent".  Then we run a batched computation: millions of somewhat independent sub-computations.  We gain huge efficiencies by running these in a single batched run, but the actual subtasks are separate things that execute in parallel.
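
Here is a toy sketch of that batched pattern: each shard is its own little problem, and we simply run them all in parallel.  A real deployment would use Spark or a similar engine over millions of shards; the process pool below merely stands in for that machinery, and the "training" step is a placeholder.

```python
# Toy sketch of the "one batch, many independent subtasks" pattern.  A real
# deployment would use Spark or similar over millions of shards; a process
# pool stands in for it here, and the per-shard "model" is just a mean.

from concurrent.futures import ProcessPoolExecutor


def train_one_model(shard):
    """Each shard is an independent problem instance: fit its own tiny model."""
    home_id, samples = shard
    model = sum(samples) / len(samples)     # placeholder for real training
    return home_id, model


if __name__ == "__main__":
    shards = [(f"home-{i}", [i, i + 1, i + 2]) for i in range(1000)]
    with ProcessPoolExecutor() as pool:
        models = dict(pool.map(train_one_model, shards))
    print(models["home-42"])                # each home gets its own local model
```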

What I've outlined isn't the only option: one actually could create increasingly aggregated models, and this occurs all the time: we can extract phonemes from a million different voice snippets, then repeatedly group them and process them, ultimately arriving at a single voice-understanding model that covers all the different regional accents and unique pronunciations.  That style of computation yields one speech model at the end, rather than a million distinct ones, each trained for its own accent (what I called "local" models, above).
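
A bare-bones sketch of this second style, in the spirit of federated averaging (my own simplification, not how a production speech model is actually built): partial models get merged level by level until a single global model remains.

```python
# Contrast with the style above: repeatedly merge partial models until a
# single global model remains.  Federated-averaging-style simplification,
# not how a production speech model is actually trained.

def merge(models):
    """Average the parameters of a group of models (each a list of floats)."""
    return [sum(params) / len(params) for params in zip(*models)]


# Imagine one tiny "accent model" per region, each just a parameter vector.
regional_models = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.6, 0.4]]

# Aggregate pairwise, level by level, until one model covers every accent.
level = regional_models
while len(level) > 1:
    level = [merge(level[i:i + 2]) for i in range(0, len(level), 2)]
global_model = level[0]
print(global_model)
```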

Ashutosh is well aware of this, and offers two responses.  First, he points to the issue of needing to take actions or even dynamically learn in real time.  The problem is that to assemble a giant batch of a million sub-computations from tasks that trickle in, you would often need to delay some tasks for quite a long time.  But if the task is to understand a voice command, that delay would be intolerable.  And if you try to classify the request using a model you built yesterday, when conditions differed, you might not properly contextualize the command.

From a perspective focused primarily on computational efficiency, one also needs to note that doing things one by one is costly: a big batch computation amortizes its overheads over a huge number of parallel sub-tasks.  But in the smart home, we have computing capability close to the end user in any case, if we are willing to use it.  So this argues that we should put the computation closer to the source of the data for real-time reasons, and in the process we gain localization in a natural way.  Contextualized queries fall right out.  Then, because we never put all our most sensitive and private data in one big warehouse, we simultaneously have spared ourselves a huge temptation that no ad-revenue driven company is likely to resist for very long.

The distributed AI (D-AI) community, with which Ashutosh identifies himself, adopts the view that a smart home is best understood as a community of expert systems.  You might have an AI trained to operate a smart lightswitch... it learns the gestures you use for various lighting tasks.  Some other AI is an expert on water consumption in your home and will warn if you seem to have forgotten that the shower is running.  Yet another is an expert specific to your stove and will know if dinner starts burning...

For Ashutosh, with his background in robotics, this perspective leads to the view that we need a way to compose experts into cooperative assemblies: groups of varying sizes that come together to solve tasks.  Caspar does so by forming a graph of AI components, which share information but also can hold and "firewall" information.  Within this graph, components can exchange aggregated information, but only in accordance with a sharing policy.  We end up with a hierarchy in which very sensitive data is held as close as possible to the IoT device where it was captured, with only increasingly aggregated and less sensitive summaries rising through the hierarchy.  Thus at the layer where one might do smart power management for a small community, controlling solar panels and wall batteries and even coordinating HVAC and hot water heaters to ramp power consumption up or ease it off, the AI element responsible for those tasks has no direct way to tap into the details of your home power use, which can reveal all sorts of sensitive and private information.
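
To illustrate the power-management example, here is a hedged sketch in which each home-level expert reports only a whole-home total, and the community-level element plans using those totals alone, never seeing per-device detail.  The classes and the sharing rule are hypothetical, not Caspar's actual component graph.

```python
# Hedged sketch of the power-management example: home-level experts hold
# per-device detail, while the community-level element only ever receives
# whole-home totals.  The classes and policy are my own illustration.


class HomeExpert:
    def __init__(self, unit, device_watts):
        self.unit = unit
        self.device_watts = device_watts        # sensitive per-device detail

    def share_upward(self):
        # Sharing policy: only the whole-home total may leave the home.
        return {"unit": self.unit,
                "total_watts": sum(self.device_watts.values())}


class CommunityPowerManager:
    def __init__(self, homes):
        self.homes = homes

    def plan(self):
        community_load = sum(h.share_upward()["total_watts"] for h in self.homes)
        # Decide whether to discharge the community battery, with no access
        # to any per-device data inside the homes.
        return "discharge_battery" if community_load > 10_000 else "idle"


homes = [HomeExpert("A-3", {"hvac": 3500, "water_heater": 4500}),
         HomeExpert("A-9", {"hvac": 2800, "oven": 2000})]
print(CommunityPowerManager(homes).plan())
```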

I don't want to leave the impression that privacy comes for free in a D-AI approach.  First, any given level has less information, and this could mean that it has less inference power in some situations.  For example, if some single home is a huge water user during a drought, the aggregated picture of water consumption in that home's community could easily mask the abusive behavior.  A D-AI system that aggregates must miss the issue; one that builds a data warehouse would easily flag that home as a "top ten abuser" and could dispatch the authorities.
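
A tiny worked example makes the masking effect vivid (the numbers are invented):

```python
# Tiny worked example of the masking effect: one home uses ten times the
# typical amount of water, yet the community average barely moves.

typical = [150] * 99                 # 99 homes at 150 gallons/day
outlier = [1500]                     # one home at 1500 gallons/day

community_avg = sum(typical + outlier) / 100
print(community_avg)                 # 163.5 -- the aggregate looks unremarkable,
                                     # while per-home data would flag the outlier
```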

Moreover, D-AI is more of a conceptual tool than a fully fleshed out implementation option.  Even in Caspar's hierarchical operating system, it is best to view the system as a partner, working with a D-AI component that desires protection for certain data even as it explicitly shares other data: we don't yet know how to specify data flow policies and how to tag aggregates in such a way that we could automatically enforce the desired rules.  On the other hand, we definitely can "assist" a D-AI system that has an honest need for sharing and simply wants help to protect against accidental leakage, and this is how the Caspar platform actually works.

Ashutosh argues that D-AI makes sense for a great many reasons.  One is rather mathematical: he shows that if you look at the time and power complexity of training a D-AI system (which comes down to separately training its AI elements), the costs scale gracefully as the deployment grows.  For a single big AI, those same training costs soar as the use case gets larger and larger.  So if you want a fine-grained form of AI knowledge, a D-AI model is appealing.
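
One way to see the intuition, offered as my own back-of-envelope calculation rather than Ashutosh's actual analysis: if training cost grows superlinearly in the number of samples, then k small models trained on N/k samples each cost far less than one model trained on all N.

```python
# Back-of-envelope illustration of the scaling argument (my own, not
# Ashutosh's actual analysis).  Assume training cost grows superlinearly
# with data size, modeled here as N ** 1.5.

def training_cost(n_samples):
    return n_samples ** 1.5          # assumed superlinear cost model

N, k = 1_000_000, 1_000              # total samples, number of D-AI elements

one_big_model = training_cost(N)
many_small_models = k * training_cost(N // k)

print(f"single model:   {one_big_model:.2e}")       # 1.00e+09
print(f"{k} local ones: {many_small_models:.2e}")   # 3.16e+07 -- ~30x cheaper
```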

The Caspar IoT cloud, as a result, isn't a centralized cloud like the one a standard data warehouse might use.  In fact it has a hierarchical and distributed form too: it can launch a D-AI compute element, or even an "app" created by Caspar.ai's team or by a third party, in the proper context for the task it performs, blocking it from accessing data it isn't authorized to see.  Processing nodes can then be placed close to the devices (improving real-time responsiveness), and we can associate different data flow policies with each level of the hierarchy, so that higher-level systems see progressively less detail, while the remote, more sensitive IoT manager systems may know far more, but only for a specific purpose, such as understanding voice commands in a particular part of the home: a "contextualized" but more sensitive task.
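
Here is a sketch of what "launching an app in the proper context" might look like: the app receives only a handle whose reads are checked against the level it was launched at.  The names and the visibility table are my own illustration, not Caspar's API.

```python
# Sketch of launching an app "in the proper context": the app only receives
# a handle whose reads are checked against the level it was launched at.
# The names and the policy table are illustrative, not Caspar's API.

DATA_VISIBLE_AT = {
    "home":      {"raw_sensor", "voice_audio", "aggregate"},
    "community": {"aggregate"},                 # no raw detail at this level
}


class Context:
    def __init__(self, level):
        self.level = level

    def read(self, kind, fetch):
        if kind not in DATA_VISIBLE_AT[self.level]:
            raise PermissionError(f"{kind} not visible at level '{self.level}'")
        return fetch()


def power_report_app(ctx):
    return ctx.read("aggregate", lambda: {"total_watts": 12800})


print(power_report_app(Context("community")))   # allowed at this level
# The same app would get a PermissionError if it asked for "raw_sensor".
```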

Then one can carry all of this even further.  We can have systems that are permitted to break the rules, but only in emergencies: if a fire is detected in the complex, or perhaps a wildfire is active in the area, we can switch to a mode in which a secondary, side-by-side hierarchy activates and is authorized to report to the first responders: "there are two people in the A section of the complex, one in unit A-3 and one in unit A-9.  In unit A-3 the resident's name is Sally Adams and she is in the northeast bedroom..."  All of this is information a standard smart home system would have sent to the cloud, so this isn't a capability unique to Caspar.  But the idea of having an architecture that localizes this kind of data unless it is actually needed for an emergency is appealing: it removes the huge incentive that cloud providers currently confront, in which by mining your most private data they can gain monetizable insights.
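
One could picture the emergency override as a simple gate on what a node may disclose; the sketch below is a hypothetical illustration of the idea, not Caspar's actual mechanism.

```python
# Hypothetical sketch of the emergency override: detailed occupant data is
# disclosed only while an emergency mode is active, and stays local otherwise.

class EmergencyGate:
    def __init__(self):
        self.emergency = False

    def declare_emergency(self):
        self.emergency = True

    def report_to_responders(self, occupant_details):
        if not self.emergency:
            return {"status": "normal", "details": "withheld"}
        return {"status": "emergency", "details": occupant_details}


gate = EmergencyGate()
details = {"unit": "A-3", "occupants": 1, "location": "northeast bedroom"}
print(gate.report_to_responders(details))     # withheld in normal operation
gate.declare_emergency()
print(gate.report_to_responders(details))     # released to first responders
```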

In the full D-AI perspective, Caspar has many of these side-by-side hierarchies.  As one instantiates such a system over a great many homes, then communities, then cities, and specializes different hierarchies for different roles, we arrive at a completely new form of IoT cloud.  As an OS researcher, I find this whole idea fascinating, and I've been urging Ashutosh and Dave to write papers about the underlying technical problems and solutions (after all, before both became full-time entrepreneurs, both were full-time researchers!)

We tend to think of the cloud in a centralized way, even though we know that any big cloud operator has many datacenters and treats the global deployment as a kind of hierarchy: availability zones with three data centers each, interconnected into a global graph, with some datacenters having special roles: IoT edge systems, Facebook point-of-presence systems (used heavily for photo resizing and caching), bigger ones that do the heavy lifting.  So here, with D-AI, we suddenly see new forms of hierarchy.  The appeal is that whereas the traditional model simply streams all the data upwards towards the core, this D-AI approach aggregates at what we could call the leaves, and sends only summaries (perhaps even noised to achieve differential privacy, if needed) towards the larger data warehouse platforms.
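
The "noised summaries" idea can be illustrated with the standard Laplace mechanism from differential privacy; the sensitivity and epsilon values below are placeholders chosen for the example, not tuned recommendations.

```python
# Illustration of a "noised" summary using the Laplace mechanism from
# differential privacy.  The sensitivity and epsilon values are placeholders
# chosen for the example, not tuned recommendations.

import numpy as np


def noised_mean(values, sensitivity=1.0, epsilon=0.5):
    """Release the mean of a leaf's readings plus calibrated Laplace noise."""
    scale = sensitivity / epsilon               # noise scale = sensitivity / epsilon
    return float(np.mean(values) + np.random.laplace(0.0, scale))


per_home_kwh = [22.0, 18.5, 30.2, 25.1]         # raw detail stays at the leaf
print(noised_mean(per_home_kwh))                # only this noised summary moves up
```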

So how does this play out in practice?  It seems a bit early to say that Caspar has cracked the privacy puzzle, the breakthrough that could make smart homes far more palatable for most of us.  On the other hand, as the distributed IoT cloud's protection barriers grow more sophisticated over time, one could believe that it might ultimately become quite robust even if some apps maliciously try to subvert the rules (if we look at Apple's iPhone and iPad apps, or the ones on Google's Android, this is definitely the trend).  Meanwhile, even if privacy is just one of our goals, the D-AI concept definitely offers contextualization and localization that enable real-time responsiveness of an exciting kind.  The Caspar platform is actually up and running, used in various kinds of real-estate developments worldwide.  Its strongest uptake has been in residential communities for the elderly: having a system that can help with small tasks (even watching the pets!) seems to be popular in any case, but especially popular among people who need a little help now and then, yet want to preserve their autonomy.