A Few Thoughts on Distributed Computing: Leave no trace behind: A practical model for IoT privacy?

IoT confronts us with a seeming paradox.

There is overwhelming evidence that machine learning requires big data, specialized hardware accelerators and substantial amounts of computing resources, hence must occur on the cloud.

The eyes and ears of IoT, in contrast, are lightweight power-limited sensors that would generally have primitive computing capabilities, mostly “dedicated” storage capacity (for storing acquired images or other captured data), and limited programmability. These devices have bandwidth adequate to upload metadata, such as thumbnails, and then can upload selected larger data objects, but they can’t transmit everything. And the actuators of the IoT world are equally limited: controllers for TVs and stereo receivers, curtains that offer robot controls, and similar simple, narrowly targeted robotic functionality.

It follows that IoT necessarily will be a cloud “play.” Certainly, we will see some form of nearby point-of-presence in the home or office, handling local tasks with good real-time guarantees and shielding the cloud from mundane workloads. But complex tasks will occur on the cloud, because no other model makes sense.

And here is the puzzle: notwithstanding these realities, IoT systems will collect data of incredible sensitivity! In aggregate, they will watch us every second of every day. There can be no privacy in a smart world equipped with pervasive sensing capabilities. How then can we avoid creating a dystopian future, a kind of technological Big Brother that watches continuously, knows every secret, and can impose any draconian policy that might be in the interests of the owners and operators of the infrastructure?

Indeed, the issue goes further: won’t society reject this degree of intrusiveness? In China, we already can see how dangerous IoT is becoming. Conversely, in Europe, privacy constraints are already very strong, and some countries, like Israel, even include a right to privacy in their constitutional law. If we want IoT to boom, we had better not focus on an IoT model that would be illegal in those markets, and would play into China’s most repressive instincts!

IoT is the most promising candidate for the next wave of technology disruption. But for this disruption to occur, and for it to enable the next wave of innovation and commerce, we need to protect the nascent concept against the risk posed by this seemingly inherent need to overshare with the cloud.

But there may be an answer. Think about the rule for camping: pack it in, then pack it out, leaving no trace behind. Could we extend the cloud to support a leave-no-trace computing model?

What I have in mind is this. Our device, perhaps a smart microphone like Alexa, Siri, or Cortana, hears a command but needs cloud help to understand it. Perhaps the command is uttered in a heavy accent, or makes reference to the speaker’s past history, or has a big data dimension. These are typical of cases where big data, hardware accelerators, and the rest of the cloud technology stack make a huge difference.

So we ship the information up to Google, Microsoft, or Amazon. And here is the crux of my small proposal: suppose that this provider made a binding contractual commitment to retain no trace and to use every available technical trick to prevent attackers from sneaking in and stealing the sensitive data.

Today, many cloud operators do the opposite. But I’m proposing that the cloud operator forgo all that information, give up on the data-sales opportunities, and commit to perform the requested task in a secured zone (a secure “enclave” in security terminology).

Could this be done, technically? To me it seems obvious that the problem isn’t even very hard!

The home device can use firewalls, securely register and bind to its sensors, and send data over a secured protocol like HTTPS. Perfect? No. But HTTPS really is very secure.
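To make the "secured protocol" step concrete, here is a minimal sketch of how a home device might configure its side of an HTTPS connection using Python's standard `ssl` module. The function name `make_device_tls_context` and the choice of TLS 1.2 as the floor are assumptions for illustration, not anything a particular vendor prescribes.

```python
import ssl


def make_device_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context for a hypothetical home device."""
    # create_default_context() turns on certificate verification and
    # hostname checking against the system trust store.
    ctx = ssl.create_default_context()
    # Assumption: refuse any protocol version older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A context like this could be handed to `http.client.HTTPSConnection(host, context=ctx)` when the device uploads its data.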

In the cloud, the vendor would need to avoid cohosting the computation on nodes that could possibly also host adversarial code, which avoids the issue of leakage, such as with “Meltdown.” It would have to monitor for intrusions, and for insider “spies” trying to corrupt the platform. It would need to scrub the execution environment before and after the task, making a serious effort to not leave traces of your question.
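As a rough illustration of the "scrub before and after" idea, the sketch below runs a task inside a freshly created scratch directory and then overwrites and deletes whatever the task left there. The names `scratch_workspace` and `answer_query` are hypothetical, and the inference step is a placeholder. This covers only files on disk: overwriting does not guarantee erasure on SSDs or journaling file systems, and a real platform would also have to deal with memory, caches, and logs.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_workspace():
    """Yield a fresh private directory, then overwrite and delete it."""
    # A newly created directory starts out empty: the "before" half.
    path = tempfile.mkdtemp(prefix="no-trace-")
    try:
        yield path
    finally:
        # The "after" half: best-effort overwrite of each file, then removal.
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                size = os.path.getsize(full)
                with open(full, "r+b") as fh:
                    fh.write(b"\0" * size)
                    fh.flush()
                    os.fsync(fh.fileno())
        shutil.rmtree(path)


def answer_query(audio: bytes) -> str:
    """Hypothetical task: stage the input, compute, return only the answer."""
    with scratch_workspace() as ws:
        staged = os.path.join(ws, "input.bin")
        with open(staged, "wb") as fh:
            fh.write(audio)
        # Placeholder for the real inference step.
        return f"received {len(audio)} bytes"
```

Only the returned answer leaves the function; the staged input is gone by the time the caller sees the result.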

The vendor would have to carry this even further, since a machine learning tool that can answer a question like “does this rash look like it needs a doctor to evaluate it?” might need to consult with a number of specialized microservices. Those could be written by third parties hoping to sell data to insurance companies. We wouldn’t want any of them retaining data or leaking it. Same for apps that might run in the home.

But there is a popular “stateless” model for cloud computing that can solve this problem. We want those microservices walled off, and this can be done by locking them into a stateless model (think of a firewall that blocks attempts to send data out) and allowing them to talk only to other stateless microservices. A serious attempt to monitor behavior would be needed too: those third-party apps will cheat if they can.
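The "only talk to other stateless microservices" rule can be pictured as an allowlist check applied to every outbound call. The sketch below is a toy model of that policy: the registry, the service names, and the `may_call` function are all invented for illustration, and a real platform would enforce the rule in the network layer rather than trust application code to ask first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    stateless: bool  # True if the service is certified to keep no state


# Hypothetical registry maintained by the cloud operator.
REGISTRY = {
    "speech-to-text": Service("speech-to-text", stateless=True),
    "rash-classifier": Service("rash-classifier", stateless=True),
    "ad-profiler": Service("ad-profiler", stateless=False),
}


def may_call(caller: str, callee: str) -> bool:
    """Allow a call only if both ends are registered and stateless."""
    src = REGISTRY.get(caller)
    dst = REGISTRY.get(callee)
    if src is None or dst is None:
        # Unknown endpoints, such as an arbitrary internet host, are blocked.
        return False
    return src.stateless and dst.stateless
```

Under this policy, `may_call("speech-to-text", "rash-classifier")` is allowed, while a call to the stateful `ad-profiler` or to an unregistered external host is refused.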

Today, many cloud companies are dependent on capturing private data and selling it. But I don’t see why other companies, not addicted to being evil, couldn’t offer this model. Microsoft has made very public commitments to being a trusted, privacy-preserving cloud partner. What I’ve described would be right up their alley! And if Azure jumped in with such a model, how long would it be before everyone else rushed to catch up?

To me this is the key: IoT needs privacy, yet by its nature, a smart world will be an interconnected, cloud-style environment, with many tasks occurring in massive data centers. The cloud, up to now, has evolved to capture every wisp of personal information it can, and doing so made some people very wealthy, and enabled China to take steps breathtaking in their intrusiveness. But there is no reason that the future IoT cloud needs to operate that way. A “leave no trace” model, even if supported only by one big provider like Microsoft, could be the catalyst we’ve all been waiting for. And just think how hard it will be for companies (or countries) locked into spying and reporting everything to compete with that new model.

Let’s learn to pack it in... and then to pack up the leftovers and clear them out. The time is ripe for this, the technology is feasible, and the competition will be left reeling!

Monday, 7 January 2019