Wednesday, 13 March 2019

Intelligent IoT Services: Generic, or Bespoke?

I've been fascinated by a puzzle that will probably play out over several years.  It involves a deep transformation of the cloud computing marketplace, centered on a choice.  In one case, IoT infrastructures will be built the way we currently build web services that do things like intelligent recommendations or ad placements. In the other, edge IoT will require a "new" way of developing solutions that centers on creating new and specialized services... ones that embody real-time logic for making decisions or even learning in real-time.

I'm going to make a case for bespoke, hand-built services: the second scenario.  But if I'm right, there is hard work to be done, and whoever starts first will gain a major advantage.

So to set the stage, let me outline the way IoT applications work today in the cloud.  We have devices deployed in some enterprise setting, perhaps a factory, or an apartment complex, or an office building.  These might be quite dumb, but they are still network enabled: they could be things like temperature and humidity sensors, motion detectors, microphones or cameras, etc.  Because many are dumb, even the smart ones (like cameras with built-in autofocus, deblurring and depth perception) are treated in a somewhat rigid manner: the basic model is of a device with a limited API that can be configured, and perhaps patched if the firmware has issues, but that otherwise just generates simple events with meta-data describing what happened.

In a posting a few weeks ago, I noted that unmanaged IoT deployments are terrifying for system administrators, so the world is rapidly shifting towards migrating IoT device management into systems like Azure's infrastructure for Office 365.  Basically, if my company already uses Office for other workplace tasks, it makes sense to also manage these useful (but potentially dangerous) devices through the same system.  

Azure's IoT Hub handles that managerial role: secure connectivity to the sensors, patches guaranteed to be pushed as soon as feasible... and in the limit, maybe nothing else. But why stop there? My point a few weeks back was simply that even just managing enterprise IoT will leave Azure in a position of managing immense numbers of devices -- and hence, in a position to leverage the devices by bringing new value to the table.

Next observation: this will be an "app" market, not a "platform" market.  In this blog I don't often draw on marketing studies and the like, but for the particular case, it makes sense to point to market studies that explain my thinking (look at Lecture 28 in my CS5412 cloud computing class to see charts from the studies I drew on).  

Cloud computing, perhaps far more than most areas of systems, is shaped by the way cloud customers actually want to use the infrastructure.  In contrast, an area like databases or big data is shaped by how people want to use the data, which determines access patterns.  But those users aren't trying to explicitly route their data through FPGA devices that will transform it in some way, or running computations that can't keep up unless they execute on GPU clusters.  So, because my kind of cloud customers migrate to the clouds that make it easiest to build their applications, they will favor the cloud with the best support for IoT apps.

A platform story basically offers minimal functionality, like bare metal running Linux, and leaves the developers to do the rest.  They are welcome to connect to services but not required to do so.  Sometimes this is called the hybrid cloud.

Now, what's an app?  As I'm using the term, you should visualize the iPhone or Android app store: small programs that share many common infrastructure components (the GUI framework, the storage framework, the motion and touch sensors, etc.), and that connect to their bigger cloud-hosted servers over a web-services layer which maps nicely onto the old Apache-dominated cloud for highly concurrent construction of web pages.  So this is the intuition.

For IoT, though, an app model wouldn't work in the same way -- in fact, it can't work in the same way.  First, IoT devices that want help from intelligent machine-learning will often need support from something that learns in real-time.  In contrast, today's web architecture is all about learning yesterday and then serving up read-only data at ultra-fast rates from scalable caching layers that could easily be stale if the data was actually changing rapidly.  So suddenly we will need to do machine learning, decision making and classification, and a host of other performance-intensive tasks at the edge, under time pressure, and with data changing quite rapidly.  Just think of a service that guides a drone surveying a farming area that wants to optimize its search strategy to "sail on the wind" and you'll be thinking about the right issues. 

Will the market want platforms, or apps?  I think the market data strongly suggests that apps are winning.  Their relatively turnkey development advantages outweigh the limitations of programming in a somewhat constrained way.  If you do look at the slides from my course, you can see how this trend is playing out.  The big money is in apps.

And now we get to my real puzzle.  If I'm going to be creating intelligent infrastructure for these rather limited IoT devices (limited by power, and by compute cycles, and by bandwidth), where should the intelligence live?  Not on the devices: we just bolted them down to a point where they probably wouldn't have the capacity.  Anyhow, they lack the big picture: if 10 drones are flying around, the cloud can build a wind map for the whole farm.  But any single drone wouldn't have enough context to create that situational picture, or to optimize the flight plan properly.  There is even a famous theoretical result on the "price of anarchy," showing that you don't get the global optimum if you have a lot of autonomous agents each making individually optimal choices.  No, you want the intelligence to reside in the cloud.  But where?
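To make the price-of-anarchy point concrete, here is Pigou's classic routing example in a few lines of Python.  (This is the standard textbook example, not anything from a real drone deployment.)  Selfish agents each pick the individually optimal route, and the result is 4/3 worse than what coordination achieves:

```python
# Pigou's example: one unit of traffic travels from A to B over two roads.
# Road 1 has fixed latency 1; road 2 has latency x, where x is the
# fraction of the traffic using it.

def total_cost(x):
    """Average latency when fraction x takes road 2 and 1-x takes road 1."""
    return (1 - x) * 1.0 + x * x

# Selfish equilibrium: road 2 never costs more than road 1 (x <= 1), so
# every agent individually prefers it -- all traffic ends up on road 2.
selfish = total_cost(1.0)   # average latency 1.0

# Coordinated optimum: minimize (1-x) + x^2, achieved at x = 1/2.
optimal = total_cost(0.5)   # average latency 0.75

print(selfish / optimal)    # price of anarchy = 4/3
```

The same logic is why a cloud service with the global wind map can beat ten drones each optimizing its own flight plan.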

Today, machine intelligence lives at the back, but the delays are too large.  We can’t control today’s drones with yesterday’s wind patterns.  We need intelligence right at the edge!

Azure and AWS both access their IoT devices through a function layer ("lambdas" in the case of AWS).  This is an elastic service that hosts containers, launching as many instances of your program as needed on the basis of events.  Functions of this kind are genuine programs and can do anything they need to do, but they run in what is called "stateless" mode, meaning that they flash into existence (or are even warm-started ahead of time, so that when the event arrives, the delay is minimal).  Then they handle the event, but they can't save any permanent data locally, even though the container does have a small file system that works perfectly well: as soon as the event handling ends, the container will garbage-collect itself and that local file system will evaporate.
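As a sketch of what "stateless" means in practice, here is an illustrative Python handler (the names and event shape are invented, not any vendor's actual function API): scratch storage exists only for the lifetime of one invocation.

```python
# A minimal model of a "stateless" cloud function: each event gets a fresh
# invocation, and anything written to local scratch space vanishes when
# the invocation ends.
import json
import tempfile

def handle_event(event):
    # Scratch storage exists only for the lifetime of this invocation.
    with tempfile.TemporaryDirectory() as scratch:
        work_file = f"{scratch}/reading.json"
        with open(work_file, "w") as f:
            json.dump(event, f)
        # ... process the event, call back-end services, etc. ...
        result = {"device": event["device"], "status": "handled"}
    # The scratch directory (and everything in it) is gone at this point:
    # no state survives to the next event.
    return result

print(handle_event({"device": "sensor-17", "temp_c": 21.5}))
```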

So, the intelligence and knowledge and learning has to occur in a bank of servers.  One scenario, call it the PaaS mode, would be that Amazon and Microsoft pre-build a set of very general purpose AI/ML services, and we code all our solutions by parameterizing those and mapping everything into them.  So here you have AI-as-a-service.  Seems like a guaranteed $B startup concept!  But very honestly, I'm not seeing how it can work.  The machine learning you would do to learn wind patterns and direct drones to sail on the wind is just too different from what you need to recognize wheat blight, or to figure out what insect is eating the corn.

The other scenario is the "bespoke" one.  My Derecho library could be useful here.  With a bespoke service, you take some tools like Derecho and build a little cluster-hosted service of your very own, which you then tell the cloud to host on your behalf.  Then your functions or lambdas can talk to your services, so that if an IoT event requires a decision, the path from device to intelligence is just milliseconds.  With consistent data replication, we can even eliminate stale data issues: these services would learn as they go (or at least, they could), and then use their most recent models to handle each new stage of decision-making.
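Here is a toy Python model of the bespoke pattern: a long-lived, stateful service that learns as readings arrive, consulted by short-lived stateless handlers.  All of the names are hypothetical, and a real Derecho service would be replicated C++, not a single Python object; this only sketches the division of labor.

```python
# Toy model: durable, continuously updated state lives in the service;
# the per-event handler is stateless and merely consults it.
class WindModelService:
    """Long-lived service: maintains a running estimate of wind speed."""
    def __init__(self):
        self.n = 0
        self.avg = 0.0

    def report(self, wind_speed):
        # Incremental (online) mean -- the service "learns as it goes".
        self.n += 1
        self.avg += (wind_speed - self.avg) / self.n

    def current_estimate(self):
        return self.avg

service = WindModelService()   # survives across many events

def on_drone_event(reading):
    """Stateless per-event handler: all durable state is in the service."""
    service.report(reading)
    return service.current_estimate()

for r in [3.0, 5.0, 4.0]:
    estimate = on_drone_event(r)
print(round(estimate, 2))      # running average after three readings: 4.0
```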

But without far better tools, it will be quite annoying to create these bespoke services, and this, I think, is the big risk to the current IoT edge opportunity: do Microsoft and Amazon actually understand this need, and will they enlarge the coverage of VSCode or Visual Studio or in Amazon's case, Cloud9, to "automate" as many aspects of service creation as possible, while still leaving flexibility for the machine learning developer to introduce the wide range of customizations that her service might require?

What are these automation opportunities?  Some are pretty basic (but that doesn't mean they are easy to do by hand)!  To actually launch a service on a cloud, there needs to be a control file created, typically in a JSON format, with various fields taking on the requisite values.  Often, these include magically generated 60-hexadecimal-digit keys or other kinds of unintuitive content.  When you use these tools to create other kinds of cloud solutions, they automate those steps.  By hand, I promise that you'll spend an afternoon and feel pretty annoyed by the waste of your time.  A good hour will be lost on those stupid registry keys alone.
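To illustrate the kind of control file involved, here is a hedged sketch in Python.  The field names are invented for illustration; real Azure or AWS deployment descriptors differ in detail, but the flavor (opaque generated keys and all) is the same:

```python
# Generate a deployment descriptor of the sort described above: JSON with
# a few structured fields plus an opaque generated key.
import json
import secrets

descriptor = {
    "serviceName": "wind-model-service",      # hypothetical service name
    "instanceCount": 3,
    # The sort of "magically generated" hex key the post complains about:
    "deploymentKey": secrets.token_hex(30),   # 30 bytes -> 60 hex digits
}

text = json.dumps(descriptor, indent=2)
print(text)
```

The point of tooling support is exactly that nobody should ever type such a file by hand.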

Interface definitions are a need too.  If we want functions and lambdas talking to our new bespoke micro-services ("micro" to underscore that these aren't the big vendor-supplied ones, like CosmosDB), the new micro-service needs to export an interface that the lambda or function can call at runtime.  Again, help needed!
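A sketch of what such an exported interface might look like, in Python for brevity.  The service and method names are invented; real tooling would generate the client stub from an IDL or service schema rather than have anyone write it by hand:

```python
# The interface contract a bespoke micro-service would export so that
# functions/lambdas can call it at runtime.
from abc import ABC, abstractmethod

class WindMapService(ABC):
    """Interface the micro-service exports to the function layer."""

    @abstractmethod
    def report_reading(self, drone_id: str, wind_speed: float) -> None: ...

    @abstractmethod
    def plan_leg(self, drone_id: str) -> dict: ...

# A lambda/function would be handed a generated client stub implementing
# this interface; here, a trivial in-process stand-in:
class LocalStub(WindMapService):
    def report_reading(self, drone_id, wind_speed):
        pass  # a real stub would marshal this into an RPC

    def plan_leg(self, drone_id):
        return {"drone": drone_id, "heading_deg": 270}

stub = LocalStub()
print(stub.plan_leg("drone-7"))
```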

In fact the list is surprisingly long, even though the items on it are (objectively) trivial.  The real point isn’t that these are hard to do, but rather that they are arcane and require looking for the proper documentation, following some sort of magic incantation, figuring out where to install the script or file, testing your edited version of the example they give, etc.   Here are a few examples:
  • Launch the service
  • Authenticate, if needed
  • Register the micro-service to accept RPCs
  • Create functions able to call the service, using those RPC APIs
  • Provide an efficient upload path for image objects
  • Provide tools for garbage collection (and tools to track space use)
  • … and tools for managing the collection of configuration parameter files and settings for an entire application
  • … and lifecycle tools, for pushing patches and configuration changes in a clean way.
Then there are some more substantial needs:
  • Code debugging support for issues missed in development and then arising at runtime
  • Performance monitoring, hotspot visualization and performance optimization (or even, performance debugging) tools
  • Ways to enable a trusted micro-service to make use of hardware accelerators like RDMA or FPGA even if the end user might not be trusted to safely do so (many accelerators save money and improve performance, but are just not suitable for direct access by hordes of developers with limited skill sets; some could destabilize the data center or crash nodes, and some might have security vulnerabilities).
This makes for a long list, but in my view, a strong development team at Amazon or Microsoft, perhaps allied with a strong research group to tackle the open ended tasks, could certainly succeed.  Success would open the door to mature intelligent edge IoT.  Lacking such tools, though, it is hard not to see edge IoT as being pretty immature today: huge promise, but more substance is needed.
My bet?  Well, companies like Microsoft need periodic challenges to set in front of their research teams.  I remember that when I visited MSR Cambridge back in 2016, everyone was asking what they should be doing as researchers to enable the next steps for the product teams... the capacity is there.  And those market slides I mentioned make it clear: The edge is a huge potential market.  So I think the pieces are in place, and that we should jump on the IoT edge bandwagon (in some cases, “yet again”).  This time, it may really happen!

Saturday, 23 February 2019

Managed sensor deployments

In my cloud computing course at Cornell, students have asked why we are focused on Azure IoT during the current spring offering.  One answer is that I like to orient our MEng students towards technology sectors that are experiencing really dramatic growth, because I want them to have strong job prospects and the skills to secure jobs that will pay above average -- and historically, this has been a good strategy for all of us teaching at this level.

But implicit in my thinking is the assumption that Azure IoT (or the Amazon AWS counterpart) really is headed towards mainstream adoption.  So the question then has to be posed: why should we believe this?  After all, I've been around long enough to remember those crazy videos of people at Xerox PARC or the MIT Media Lab who set out to capture a digital record of their entire day, and ended up dressed like deep-sea divers, with cameras and microphones all over their bodies and aimed in every possible direction.

That can't possibly be the future of IoT.

What then will be the drivers for this particular IoT surge, and why should we bet that this time, IoT has "crossed the chasm" (a reference to a wonderful 1991 book by Geoffrey Moore)?

I would argue that the first reason centers on a form of risk that creates a powerful pent-up demand.  The risk is simply that in homes, offices and public spaces we are increasingly surrounded by wifi-enabled devices that are capable of tracking us through audio, imaging, motion detectors, swipe card and RFID sensing... the list is long.  So where are these things?  Many are simply in our hands or pockets: any smart phone fits the picture.  But then there are room environmental controls that use small devices to track room or space occupancy -- and because cameras and microphones are so cheap now, naturally are based on those technologies.

I find it kind of ironic that ubiquity would have driven the cost of spying on us down to pennies per device, but there you have it: the vast quantities of inexpensive audio and video chips that have flooded the market due to their use in phones are ending up in just about everything else.  The same goes for simple Linux-based computing platforms: with ARM, there are a tremendous number of devices that have the compute capabilities of an old-style Linux box, and indeed, run Linux or one of the stripped-down real-time capable alternatives.

So we have this unusual picture in which volume has driven costs to the floor, creating a situation in which, if you want your room air conditioning "sector controller" to be smart, the easiest thing may be to just include a fully capable small PC that can watch and listen to the space to see if anyone is in it.

But this then becomes a handy option for intrusion, or for compromise by the folks who like to create giant bot deployments -- why bother to compromise my PC if they can just target my router, my smart TV and home entertainment system, my microwave and fridge, my thermostat?  And this doesn't even get at the intentional cases: Alexa, Siri, Cortana and their friends, always waiting to hear their name mentioned, always listening.

It doesn't matter much which kind of space you focus on: whether at home, in the office, walking in the park, all of us are continuously within range of something or other.  And that device is at least in theory capable of hosting intelligent spyware.

To me this single insight is already enough to justify a major bet on Azure IoT and its cousins.  But focusing just on Microsoft, we have already seen the huge success of Office 365: Microsoft dithered but finally figured out that everyone likes their tools, and found a way to integrate them into a complete IT solution for modern enterprises, with all sorts of intelligent (social-networking) features to let corporate customers leverage the knowledge inherent to their organization.  There are some obvious glitches (the one that drives me crazy is that Office 365's versions of Skype (Skype for Business) and Slack (Teams) refuse to talk to my desktop telephone, even though my phone uses a standard VoIP technology -- what an annoying oversight).

Anyhow, suddenly we see all of the world's medium to large corporations adopting Office 365 as a complete IT story for internal workplace collaboration, and beyond those annoyances, there really are big wins.  The technology is making Microsoft a winner again.

But every one of those Office 365 customers needs to worry about competitors spying on them, and some also worry about ransomware invasions or other forms of disruptive intrusions.  Where would you focus that worry, right now?  I think your attention would be on the smart thermostat, the routers, and the myriad other intelligent devices that pervade the enterprise, and yet are basically insecure.

This is where I see a real opportunity for Azure IoT: the chance to be "Office 365 IoT" by using the security functions of the Azure IoT Hub to wire down all of those devices.  And this is at least the stated plan -- check out Microsoft's official stories.  First, they are trying hard to convince vendors to use a small hardware component called Azure Sphere to secure the device itself -- sensors with a trusted hardware security component (and if you don't love Sphere, Berkeley's David Culler has a bunch of research papers and ideas on sensor security that you could explore.  He used to even have a sensor technology company, although with the pace of turnover in the Bay Area, I'm not sure what became of it.)

A brilliant PhD student of mine, Z Teo, has a company in this space too (free advertising for him!): IronStack.  Z's focus is on securing the corporate network by gaining better control over the routing elements, especially the programmable SDN components.

Then the plan would be to connect every single sensor in the corporate campus to the Azure IoT Hub -- every controllable router, every smart microwave oven and self-flushing toilet -- and by doing so, to gain a minimal level of control over all these things.  The Azure IoT Hub is basically a massive active database: it has a secure link to the devices, and it wires them down: these sensors cease to be accessible over normal networks, so that once they are connected to the hub, they aren't connected to intruders and spies.  Next, the hub makes sure that the firmware is always properly patched.  Is your fancy printer running the proper software revisions?  With Azure IoT Hub, the answer should be "yes, if it is available on the network," because if the answer were "no", the other part of the answer would be "but we've taken it offline and dispatched someone to fix it."
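The enforcement rule itself is simple enough to model in a few lines.  This is illustrative Python; the device records, device kinds, and version numbers are all invented:

```python
# Toy model of the compliance rule: devices connected to the hub either
# run the required firmware, or get taken offline for repair.
REQUIRED = {"printer": "2.4.1", "thermostat": "1.8.0"}

devices = [
    {"id": "prn-01", "kind": "printer",    "firmware": "2.4.1"},
    {"id": "prn-02", "kind": "printer",    "firmware": "2.3.9"},
    {"id": "thm-07", "kind": "thermostat", "firmware": "1.8.0"},
]

def enforce(devices):
    online, quarantined = [], []
    for d in devices:
        if d["firmware"] == REQUIRED[d["kind"]]:
            online.append(d["id"])
        else:
            # "taken offline, and someone dispatched to fix it"
            quarantined.append(d["id"])
    return online, quarantined

online, quarantined = enforce(devices)
print(online)        # compliant devices stay reachable
print(quarantined)   # non-compliant devices are fenced off
```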

I honestly think that taking control over all of these devices is, by itself, the first killer app.  We would be in a much better place if all the smart devices in our environment (and now I mean all: not just the ones in the office, but the CATV things in the street and the smart phones -- the whole game) were actively controlled by security-management infrastructures that can just keep the software patched and avoid random drive-by takeovers.

But once you have this active control over billions (maybe trillions, someday) of devices, it becomes very appealing to make the home and office smart -- and this is the second reason I would bet on Azure IoT right now.  That huge opportunity to innovate is going to be too sexy to pass up, and in the current white-hot technology market, some things are too obvious to fail.  So this won't fail.  If anything, I think we'll head towards some form of auto-configuration where just bringing a smart espresso machine into the building triggers the protocol to securely register it (and control it) from the building management system.

For truth in advertising: I actually have investments in this space -- I own plenty of shares of Microsoft stock, and I'm also an investor in, and advisor to, a smart-homes venture.  That's the one that got me thinking about the "leave no sensitive data behind" model for cloud computing, discussed earlier in January on this blog.  And while I don't happen to have any money in IronStack, I'm very loyal to my past students.  So I've got some biases here!

And yet honestly, I don't think this is a biased blog.  My bets are on this stuff because I truly believe that it has huge positive potential, even if it also represents a short-term risk for organizations that haven't bothered to think it through and are operating with insecure environments.  Worried that someone might be spying on your corporate meetings?  Well, have you thought about what devices might be in the room?  Maybe it is about time to secure them, with something like the Azure IoT Hub.

What of the huge potential?  Well, I have elderly relatives and I like the idea of a friendly little home ghost that can keep an eye on things, making sure they haven't fallen, that the stove was turned off when they left to go shopping, and that the windows are closed and latched at night.  I work in a smart building, and I like the feeling that we're being energy-smart and that the water won't somehow be left running in some unattended sink without the custodian eventually being notified.  These are good ideas.

A smart world with smart homes, smart offices, smart highways (if you've followed my blog, you would know that I have a bit of a thing about smart cars... a worry that smart highways can address), smart cities, smart grid.  These are the technologies of the future.  And the Azure IoT Hub strikes me as the ideal way to start.  Which is why, in the spring 2019 offering of cloud computing, we've spent quite so much time on this model.  My slides are online, by the way, if you want to see what all this translates to in practice.

Sunday, 20 January 2019

Derecho status update

As we swing into 2019 mode, I wanted to share a quick Derecho status report and point to some goals for the coming months.

First, our ACM Transactions on Computer Systems paper will be appearing sometime soon, which should give the work nice visibility, and also the validation that comes from a tough peer-review process.  The paper has improved hugely through the pushback our reviewers provided, so it was a challenge but, I think, worth it.  The system itself works really well!

Next, we are starting to focus on a stronger integration with Azure IoT, where Derecho could be used either as a tool for creating new micro-services with strong fault tolerance and consistency guarantees, or as an ultra fast RDMA-capable object store.  Microsoft has been supportive of this and Derecho should be available from their third party portal, still as a free and open source technology.

But that portal isn't available yet.  So right now, use Derecho via the v0.9 release of the system, which will be available by February 1 (we are saving v1.0 for later in the year, after we have a reasonable amount of end-user experience).  As of today, we still have one or two bugs we want to fix before doing that release.

Some key points:

  • We are urging people to use the Ubuntu Linux version, because this interoperates between normal Linux environments and Azure (the Microsoft cloud).  On our release site (download here), you can find the source code, but also prebuilt images (a container and a true VM) with the library preinstalled.  But in fact Derecho should work on any Linux-compatible system.
  • Right now, Derecho has only been tested within a single cluster or cloud (Azure, including Azure IoT, AWS, etc.).  We have some limited experience with virtualization, and with RoCE as opposed to pure InfiniBand.
  • The easiest path to using Derecho is via the new key-value store.  In this store, both keys and values can be any serializable object type you like, and we offer a wide range of features: Put and Get, but also a conditional put, which checks that the version you are writing was based on the most current version of the underlying object (useful for atomic replace, as in Zookeeper), plus a watch operation that works just like a topic-based pub-sub or DDS.  Objects can be stateless, stateful but not versioned, or versioned and persistent with strong consistency and extremely accurate temporal indexing.  On this we will eventually support pub-sub (think of Kafka or OpenSplice), file systems (HDFS, Ceph), and maybe even a genuine Zookeeper look-alike.  The only caution is that the watch feature isn't designed to support huge numbers of watched topics.  So if you would have more than 50 or 100 active topics, consider using Dr. Multicast to squeeze that set down.
  • The full system can only be used directly from our templated C++ library API, but you can easily build a wired-down library with no templated methods and then load it from Java or Python or whatever.
  • Derecho runs on RDMA, Omni-Path, and even on normal TCP with no special hardware help at all.  You just configure it via a configuration file, to tell the system how to set itself up.  We use libfabric for the mapping to the underlying hardware.
  • Right now, all the Derecho group members need to be on hardware with identical endian and byte alignment policies, but clients not in the group can use RESTful RPC, the OMG DDS stack, WCF or JNI to issue RPCs to Derecho group members, which can then relay the request as appropriate.  
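The key-value semantics described above -- versioned put/get, conditional put, and watch -- can be modeled in a few lines of Python.  This sketches the behavior only: Derecho's real API is templated C++, and the real store is replicated and persistent rather than a single in-memory dictionary.

```python
# Behavioral model of a versioned KV store with conditional put and watch.
class VersionedStore:
    def __init__(self):
        self.data = {}      # key -> (version, value)
        self.watchers = {}  # key -> [callback]

    def put(self, key, value):
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)
        for cb in self.watchers.get(key, []):   # watch acts like pub-sub
            cb(key, version, value)
        return version

    def get(self, key):
        return self.data.get(key, (0, None))

    def conditional_put(self, key, value, expected_version):
        # Atomic replace: succeeds only against the most current version,
        # in the style of Zookeeper's versioned writes.
        if self.data.get(key, (0, None))[0] != expected_version:
            return None   # someone else wrote a newer version first
        return self.put(key, value)

    def watch(self, key, callback):
        self.watchers.setdefault(key, []).append(callback)

store = VersionedStore()
store.watch("wind", lambda k, v, val: print("update:", k, v, val))
v1 = store.put("wind", "NW 12kt")
assert store.conditional_put("wind", "NW 14kt", expected_version=v1) == 2
assert store.conditional_put("wind", "stale write", expected_version=v1) is None
```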
Later this year we will extend the system in various ways.  The API should stay stable, but the new features would include:
  • Hierarchically structured WAN layer that does read-only content mirroring for the object store.
  • A form of ultra fast and scalable LAN and WAN BlockChain support.
  • Machine checked correctness proofs, and a reference-version of the core Derecho protocols both in a high level form, and as proved and then re-extracted from those proofs in C or C++.
  • External client access to our API via RDMA, supporting point-to-point send and query.
  • Integration with Matt Milano’s mobile code language, MixT, allowing a client to send code to data residing in the object store.

Monday, 7 January 2019

Leave no trace behind: A practical model for IoT privacy?

IoT confronts us with a seeming paradox.

There is overwhelming evidence that machine learning requires big data, specialized hardware accelerators and substantial amounts of computing resources, hence must occur on the cloud.

The eyes and ears of IoT, in contrast, are lightweight power-limited sensors that would generally have primitive computing capabilities, mostly “dedicated” storage capacity (for storing acquired images or other captured data), and limited programmability. These devices have bandwidth adequate to upload metadata, such as thumbnails, and then can upload selected larger data objects, but they can’t transmit everything.  And the actuators of the IoT world are equally limited: controllers for TVs and stereo receivers, curtains that offer robot controls, and similar simple, narrowly targeted robotic functionality.

It follows that IoT necessarily will be a cloud “play.”  Certainly, we will see some form of nearby point-of-presence in the home or office, handling local tasks with good real-time guarantees and shielding the cloud from mundane workloads.  But complex tasks will occur on the cloud, because no other model makes sense.

And here is the puzzle: notwithstanding these realities,  IoT systems will collect data of incredible sensitivity!  In aggregate, they will watch us every second of every day.  There can be no privacy in a smart world equipped with pervasive sensing capabilities.  How then can we avoid creating a dystopian future, a kind of technological Big Brother that watches continuously, knows every secret, and can impose any draconian policy that might be in the interests of the owners and operators of the infrastructure?

Indeed, the issue goes further: won’t society reject this degree of intrusiveness?  In China, we already can see how dangerous IoT is becoming.  Conversely, in Europe, privacy constraints are already very strong, and some countries, like Israel, even include a right to privacy in their constitutions.  If we want IoT to boom, we had better not focus on an IoT model that would be illegal in those markets, and would play into China’s most repressive instincts!

IoT is the most promising candidate for the next wave of technology disruption.  But for this disruption to occur, and for it to enable the next wave of innovation and commerce, we need to protect the nascent concept against the risk posed by this seemingly inherent need to overshare with the cloud.

But there may be an answer.  Think about the rule for camping: pack it in, pack it out, leaving no trace behind.  Could we extend the cloud to support a no-trace-left-behind computing model?

What I have in mind is this.  Our device, perhaps a smart microphone like Alexa, Siri, or Cortana hears a command but needs cloud help to understand it.  Perhaps the command is uttered in a heavy accent, or makes reference to the speaker’s past history, or has a big data dimension.  These are typical of cases where big data and hardware accelerators and all that cloud technology make a huge difference.

So we ship the information up to Google, Microsoft, Amazon.  And here is the crux of my small proposal: suppose that this provider made a binding contractual commitment to retain no trace and to use every available technical trick to prevent attackers from sneaking in and stealing the sensitive data.  

Today, many cloud operators do the opposite.  But I’m proposing that the cloud operator forgo all that information, give up on the data-sales opportunities, and commit to perform the requested task in a secured zone (a secure “enclave” in security terminology).  

Could this be done, technically?  To me it seems obvious that the problem isn’t even very hard!

The home device can use firewalls, securely register and bind to its sensors, and send data over a secured protocol like https.   Perfect?  No.  But https really is very secure.  

In the cloud, the vendor would need to avoid co-hosting the computation on nodes that could possibly also host adversarial code, which avoids the issue of leakage, such as with "Meltdown."  It would have to monitor for intrusions, and for insider "spies" trying to corrupt the platform.  It would need to scrub the execution environment before and after the task, making a serious effort to not leave traces of your question.

The vendor would have to carry this even further, since a machine learning tool that  can answer a question like “does this rash look like it needs a doctor to evaluate it?” might need to consult with a number of specialized microservices.  Those could be written by third parties hoping to sell data to insurance companies.  We wouldn’t want any of them retaining data or leaking it.  Same for apps that might run in the home.  

But there is a popular "stateless" model for cloud computing that can solve this problem.  We want those microservices walled off: by locking them into a stateless model (think of a firewall that blocks attempts to send data out), and only allowing them to talk to other stateless microservices, it can be done.  A serious attempt to monitor behavior would be needed too: those third-party apps will cheat if they can.
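Here is a toy sketch of the "walled off, stateless" rule in Python.  It is purely illustrative: real enforcement would need OS, sandbox, or enclave support, not a library convention, and the `rash_classifier` handler is an invented stand-in for a third-party microservice.

```python
# Toy model: run a third-party handler against a fresh scratch area, then
# destroy the scratch area -- the handler cannot keep state across calls.
import shutil
import tempfile

def run_stateless(handler, request):
    scratch = tempfile.mkdtemp()
    try:
        return handler(request, scratch)   # handler may use scratch freely
    finally:
        shutil.rmtree(scratch)             # "leave no trace behind"

def rash_classifier(request, scratch):
    # ... consult a model, writing temporaries under `scratch` only ...
    # (trivial invented rule, standing in for real inference)
    return {"see_doctor": len(request["image"]) > 100}

print(run_stateless(rash_classifier, {"image": "x" * 500}))
```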

Today, many cloud companies are dependent on capturing private data and selling it.  But I don’t see why other companies, not addicted to being evil, couldn’t offer this model.  Microsoft has made very public commitments to being a trusted, privacy-preserving cloud partner.  What I’ve described would be right up their alley!  And if Azure jumped in with such a model, how long would it be before everyone else rushes to catch up?

To me this is the key:  IoT needs privacy, yet by its nature, a smart world will be an interconnected, cloud-style environment, with many tasks occurring in massive data centers.  The cloud, up to now, has evolved to capture every wisp of personal information it can; doing so made some people very wealthy, and enabled China to take steps breathtaking in their intrusiveness.  But there is no reason that the future IoT cloud needs to operate that way.  A “leave no trace” model, even if supported only by one big provider like Microsoft, could be the catalyst we’ve all been waiting for.  And just think how hard it will be for companies (or countries) locked into spying and reporting everything to compete with that new model.

Let’s learn to pack it in...  and then to pack up the leftovers and clear them out.  The time is ripe for this, the technology is feasible, and the competition will be left reeling!

Wednesday, 12 December 2018

The debate about causal emergence helps explain a tension between distributed systems theory and practice.

There is an old joke that goes like this:  A tourist gets lost, and then sees a farmer, so he stops to ask directions.  The farmer hems and haws and finally says, "Son, I'm sorry, but you just can't get there from here.  You may just have to go somewhere else and then try again."

It turns out that there is a situation where this kind of advice might actually make a great deal of sense.  A little while back, I had an opportunity to learn about “causal emergence” during a few hours spent with Erik Hoel, a Tufts University professor who is a leading proponent of the concept, at an undergraduate-organized "research and society" symposium at Princeton (quite a nice event).

Suppose that you were provided with a great model describing the quantum behavior of oxygen and hydrogen atoms.  In a triumph of scientific computing, you use the model to predict that they combine to form molecules of H2O and even to make new discoveries about how those molecules behave, and how to relate their behavior to their underlying quantum nature.

But can you extrapolate to predict the behavior of a cup of tea, or the steam rising from it?  A cup of tea is a very complex thing: a simulation would need to deal with all the interactions between molecules (to say nothing of your half-dissolved teaspoon of sugar and cloud of milk).   There is no way you could do it: the emergent structure can't easily be deduced even with an understanding of the underlying system.

Erik and his colleagues are actually focusing on human consciousness, and developing a hypothesis that we won't be able to understand human thought purely in terms of the underlying neural wiring of the brain, or the chemical and electrical signals it uses.  They treat the problem as a type of coding question, and argue that the fine-grained details are like noise that can drown out the signal of interest to us, so that no matter how much we learn about the brain, we might still be unable to understand thought.

This got the audience very engaged at the Princeton event: they seemed to really like the idea that human intellect might somehow be inaccessible to science, or at least to "reductionist" science.  Erik, though, mentioned that he doesn't always get a positive reception: there is a scientific community that absolutely hates this work!  As he explains it, first, they tend to point to the Greek philosophers and note that Plato and Aristotle came up with this a long time ago.  Next, they point out that in computing we have all sorts of impossibility and undecidability results, and that even a basic complexity analysis can lead to similar conclusions.  Beyond this, there is a question of whether the concept of layering is even well posed: it is easy to say "I know a cup of tea when I see one," but what, precisely, constitutes a cup?  Philosophers adore questions such as this.  But... let's not go there!

Is causal emergence just much fuss about nothing?  Not necessarily: there is an aspect of this causal emergence debate that fascinates me.  As most people who read this blog would know, distributed systems tend to be built using one of three core concepts -- everything else just puts these together as building blocks:
  1. We use fault-tolerant consensus to implement consistency (the use cases are very broad and include transactions, state machine replication, leader election, primary-backup coordination, locking, system configuration, barrier synchronization, Zookeeper...).  Even our more complex models, such as Byzantine Agreement and BlockChain, really come down to consensus with a particularly severe fault model.  
  2. We divide to conquer, mostly using key-value sharding.  A consensus mechanism can be used to track the configuration of the sharded layer, so the shards themselves are freed to use simpler, cheaper mechanisms:  In effect they depend on the consensus layer, but don't need to implement it themselves. 
  3. We turn to convergent stochastic mechanisms in situations where a state-machine style of step-by-step behavior isn't applicable (like for the TCP sliding window, or a gossip protocol for tracking membership or loads, or a multi-tier caching policy).
So if you accept this very simplified taxonomy, what jumps out is that in effect, variations on these three basic kinds of building blocks can be used as "generators" for most of modern distributed computing.  But are there behaviors that these three building blocks can't enable? What building blocks would be needed to cover "everything"?   I think the causal emergence model sheds some light by suggesting that in fact, there may be a new kind of impossibility argument that would lead us to conclude that this question might not have an answer!
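To make the interplay between building blocks 1 and 2 concrete, here is a tiny sketch (node names invented) of a consistent-hash shard map of the sort a consensus service would agree upon, after which each shard serves its key range with cheaper local mechanisms:

```python
import bisect
import hashlib

class ShardMap:
    """A consistent-hash ring of the kind a consensus layer would track.

    The consensus layer (building block 1) agrees on this configuration;
    the shards themselves (building block 2) then depend on it without
    having to implement consensus internally.
    """
    def __init__(self, nodes, replicas=64):
        # Place several virtual points per node so load spreads evenly.
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(replicas)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def owner(self, key):
        """Walk clockwise around the ring to the first point at or past key."""
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

shards = ShardMap(["node-a", "node-b", "node-c"])
```

The virtue of the split is exactly the one noted above: when a node fails, only the consensus layer needs to run an expensive agreement protocol to install the new ring; lookups stay cheap.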

But we've always suspected that.  For example, one category of behaviors we often worry about in distributed settings is instability.  I've often written about broadcast storms and data-center-wide oscillatory phenomena: these arise when a system somehow manages to have a self-reinforcing load that surges like a wave until it overwhelms various components, triggering episodes of data loss, waves of error-recovery messages, and eventually a total meltdown.  We obviously don't want to see those kinds of things, so designers try to bullet-proof their systems using mechanisms that dampen transients.  Could stability be in this class of properties that are "hidden" in the low-level details, like Erik's causal emergence scenario?
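One classic damper of the sort designers reach for is randomized exponential backoff, sketched here in Python (the parameters are illustrative): each failed attempt widens the retry window, and jitter keeps thousands of recovering nodes from re-synchronizing into the very wave that melted the system down.

```python
import random

def backoff_delays(attempts, base=0.05, cap=5.0, rng=random.random):
    """Randomized exponential backoff: a standard damper for retry storms.

    Each retry doubles the window (up to a cap), and "full jitter" picks a
    uniform delay within it, spreading the retries out in time.
    """
    delays = []
    for attempt in range(attempts):
        window = min(cap, base * (2 ** attempt))
        delays.append(rng() * window)   # uniform in [0, window)
    return delays
```

A real system would of course sleep for these delays; listing them makes the doubling-with-a-cap shape easy to see.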

Then there is a second and more subtle concern.  Think about a wave in the ocean that gradually builds up energy in a region experiencing a major storm, but then propagates for thousands of miles under fair skies: some of the immense energy of the storm was transferred to the nascent wave, which then transports that energy over vast distances.  Here we have an emergent structure that literally moves, in the sense that the underlying components of the water that it perturbs change as time elapses. The fascination here is that the emergent structure is actually a wave of energy.  So we observe the physical wave, and yet we aren't really seeing the energy wave -- we are seeing a phenomenon caused by the energy wave, yet somewhat indirect from it.  Similarly, when a data center becomes destabilized, we are often confronted with massive numbers of error messages and component failures, and yet might not have direct visibility into the true "root cause" that underlies them.  Causal emergence might suggest that this is inevitable, and that sometimes, the nature of an instability might not be explicable even with complete low-level traces.

This idea that some questions might not lend themselves to formal answers can frustrate people who are overly fond of reductionist styles of science, in which we reduce each thing to a more basic thing.  That energy wave can't be directly observed, and in fact if you look closely at the water, it just bobs up and down.  The water isn't moving sideways, no matter how the wave might look to an observer.

This same puzzle arises when we teach students about the behavior of electric power grids: we are all familiar with outlets that deliver A/C power, and even children can draw the corresponding sine wave.  Yet many people don't realize that the power signal has an imaginary aspect too, called the reactive component of power.  This reactive dimension actually emerges from a phenomenon analogous to that water bobbing up and down, and to fully describe it, we model the state of a power line as a signal that "spirals" around the time axis, with a real part and an imaginary part.  The familiar A/C signal is just the projection of that complex signal onto the real axis, but the reactive part is just as real -- or just as unreal, since this is simply a descriptive model.  The physical system is the conductive wire, the electrons within it (they move back and forth, but just a tiny amount), and the power signal, which is a lot like that wave in the water, although moving a lot faster.

In effect, electricity is an emergent property of electric systems.  Electricity itself doesn't have an imaginary dimension, but it is very convenient to model an A/C electric circuit as if it does.
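For readers who like to see the arithmetic, here is a small sketch (all amplitudes and the 50 Hz frequency are illustrative) verifying that the complex-phasor "spiral" model and the raw sampled waveforms agree about real and reactive power:

```python
import cmath
import math

def average_powers(v_amp, i_amp, phase, samples=10000):
    """Recover real power numerically, then compare with the phasor model.

    v(t) = V cos(wt), i(t) = I cos(wt - phase). Averaging v*i over one
    cycle gives the real power; the complex model carries the reactive
    component in its imaginary part.
    """
    w = 2 * math.pi * 50.0                  # 50 Hz line frequency
    dt = (1 / 50.0) / samples               # sample one full cycle
    p_real = sum(
        v_amp * math.cos(w * k * dt) * i_amp * math.cos(w * k * dt - phase)
        for k in range(samples)
    ) / samples
    # The phasor model gives the same answer in closed form: S = (VI/2)e^{j*phase}
    s = (v_amp * i_amp / 2) * cmath.exp(1j * phase)
    return p_real, s.real, s.imag

# Current lagging voltage by 45 degrees: half the apparent power is reactive.
p, p_phasor, q_phasor = average_powers(1.0, 1.0, math.pi / 4)
```

The sampled average and the real part of the phasor match; the imaginary part is the "bobbing up and down" that the outlet's sine wave alone never shows you.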

Viewed this way, causal emergence shouldn't elicit much debate at all: it is just a pretext for pointing out that whereas the physical world is shaped by physical phenomena, we often perceive it through higher-level, simplified models.  Viewed at the proper resolution, and over the proper time scale, these models can be incredibly effective: think of Newtonian mechanics, or hydraulics, or the electric power equations.

And yet as a person who builds large distributed systems, I find that people often forget these basic insights.  For me, and for any distributed systems builder, it can be frustrating to talk with colleagues who have a deep understanding of the theories covering small distributed services, but never actually implement software.  There is a kind of unwarranted hubris that theoreticians sometimes slip into: a belief that their theories are somehow more valid and complete than the real system.

In fact, any builder will tell you that real systems are often far more complex than any theory can model.  Those old farmers would understand.  Causal emergence potentially offers a rigorous way to back up such a claim.

The usual theoretical riposte is to say "show me the detailed model, and express your goal as an abstract problem, and I will solve it optimally."  But not everything that occurs at large scale can be expressed or explained using lower-level models.  And this is a deep truth that our community really needs to internalize.  If we did, it would (I think) lead to a greater appreciation for the inherent value of high-quality engineering, and very detailed experiments.  Sometimes, only engineering experience and careful study of real systems under real loads suffices.

Tuesday, 27 November 2018

So what's the story about "disaggregated" cloud computing?

The hardware community has suddenly gone wild over a new idea they call "disaggregated" cloud computing infrastructures.  I needed to talk about this in my graduate class, so I've been reading some of the papers and asking around.

If you come at the topic from distributed/cloud computing, as I do, you may have been using disaggregation to refer to a much older trend, namely the "decoupling" of control software from the hardware that performs various data and compute-intensive tasks that would otherwise burden a general purpose processor.  And this is definitely a legitimate use of the term, but as it turns out, the architecture folks are focused on a different scenario: for them, disaggregation relates to a new way of building your data center compute nodes in which you group elements with similar functionalities, using some form of micro-controller with minimal capabilities to manage components that handle storage, memory, FPGA devices, GPU, TPU, etc.  Recent work on quantum computing uses a disaggregated model too: the quantum device is like an experimental apparatus that a control program carefully configures into a specific state, then "runs", and then collects output from.

The argument is that disaggregation reduces wasted power by supporting a cleaner form of multitenancy.  For example, consider DRAM.  The “d” stands for dynamic, meaning that power is expended to keep every bit continuously refreshed.  This is a costly, heat-intensive activity, and yet your DRAM may not even be fully used (studies suggest that 40-50% would be pretty good).  If all the DRAM was in some shared rack, bin-packing could easily give you >95% utilization on the active modules, plus you could power down any unused modules.
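A toy version of that bin-packing argument, with invented demand sizes and a 64GB module size: pack tenant DRAM demands tightly onto shared modules, then power down whatever stays empty.

```python
def first_fit_decreasing(demands_gb, module_gb=64):
    """First-fit-decreasing packing of tenant DRAM demands onto modules.

    Returns how many modules must stay powered, and their utilization.
    FFD is a simple heuristic, not optimal, but it makes the point: tight
    packing plus powering down idle modules beats per-server stranding.
    """
    modules = []                              # free space left per powered module
    for d in sorted(demands_gb, reverse=True):
        for i, free in enumerate(modules):
            if free >= d:
                modules[i] -= d               # fits in an existing module
                break
        else:
            modules.append(module_gb - d)     # power up a fresh module
    used = len(modules)
    utilization = sum(module_gb - free for free in modules) / (used * module_gb)
    return used, utilization
```

With demands of 60, 40, 30, 20, 10 and 4 GB, three modules suffice at roughly 85% utilization, versus six half-idle DRAM banks if each tenant's server carried its own.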

So disaggregation as a term really can refer to pretty much any situation in which some resources are controlled from programs running "elsewhere", and where you end up with a distributed application that conceptually is a single thing, but has multiple distinct moving parts, with different subsystems specialized for different roles and often using specialized hardware as part of those roles.

At this point I'm seeing the idea as a spectrum, so perhaps it would make sense to start with the older style of control-plane/data-plane separation, say a few words about the value of such a model, and then we can look more closely at this extreme form and ask how it fits into the picture.

Control/data disaggregation is really a pretty common idea, and it dates back quite far: even the earliest IBM mainframes had dedicated I/O coprocessors to operate disks and tapes, and talked about "I/O channels".   I could easily list five or ten other examples that have popped up during the past decade or two: very few devices lack some form of general purpose on-board computer, and we can view any such system as a form of disaggregated hardware.  But the one that interests me most is the connection to Mach, at the stage when the project was publishing very actively at SOSP and was run from CMU, with various people heading it: Rick Rashid (who ultimately left to launch Microsoft Research), then Brian Bershad, and then Tom Anderson.

Mach was an amazing system, but what I find fascinating is that it happens to also have anticipated many aspects of this new push into disaggregation.  As you perhaps will recall (from all those Mach papers you've read...) the basic idea was to have a single micro-kernel that would host various O/S presentation frameworks: a few versions of Linux, perhaps versions of legacy IBM operating systems, etc.  Internally, one of those frameworks consisted of a set of processes sharing DLLs, some acting as servers and others as applications.  Memory was organized into segments with read/write/execute permissions, and all the underlying interactions centered on an ultra-fast form of RPC in which a single CPU core could run an application thread in segment A, then briefly suspend that thread and activate a thread context in segment B (perhaps A was an application and B is the file server, for example), run in B for a short period, then return to A.  The puzzle was to handle permissions (the file server can see the raw disk, but A shouldn't be allowed to, and conversely, the file server can DMA from a specific set of memory areas in A, but shouldn't be able to scan A's memory for passwords).  Then because all of this yielded a really huge address space, Mach ran into page table fragmentation issues that were ultimately solved with a novel software inverted page table.  And capabilities were used for the low-level message passing primitives.  Really cool stuff!

As you might expect, all of these elements had costs: the messaging layer, the segmentation model, context switching, etc.  Mach innovated by optimizing those steps, but keep these kinds of costs in mind, because disaggregation will turn out to run into similar issues.

Which brings us to today!  The modern version of the story was triggered by the emergence of RDMA as an option in general purpose data centers.  As you'll know if you've followed these postings, I'm an RDMA nut, but it isn't just me.  RDMA is one of those rare cases where a technology became very mature in a different setting (namely high-performance computing, where it serves as the main communications backbone for packages like the MPI library, and runs on InfiniBand networks), and yet somehow went mostly unnoticed by the general computing community.... until recently, when people suddenly "discovered" it and went bonkers.

As I've discussed previously, one reason that RDMA matured on InfiniBand and yet wasn't used on Ethernet centered on RDMA's seeming need for a special style of credit-based I/O model, in which a sender NIC would only transfer data if the receiver had space waiting to receive the data.   It is only recently that the emergence of RoCE allowed RDMA to jump to more standard datacenter environments.  Today, we do have RDMA in a relatively stable form within a set of racks sharing a TOR switch, assuming that the TOR switch has full bisection bandwidth.  But RDMA isn't yet equally mature across routers, particularly in a Clos/spine model where the core network would often be oversubscribed and hence at risk of congestion.  There are several competing schemes for deploying RDMA over RoCE v2 with PFC at data-center-wide scale (DCQCN and TIMELY), but work is still underway to understand how those can be combined with DiffServ or enterprise VLAN routing to avoid unwanted interactions between standard TCP/IP and the RDMA layer.  It may be a few years before we see "mature" datacenter-scale deployments in which RDMA runs side-by-side with TCP/IP in a non-disruptive way over RoCE v2 (in fact, by then we may be talking about RoCE v5)!  Until then, RDMA remains a bit delicate, and must be used with great care.
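Since the credit-based discipline is the crux of why RDMA never drops data, here is a toy sketch of the idea (nothing like the real verbs API; just the accounting): the sender may transmit only while it holds credits, and the receiver grants one credit per free buffer slot.

```python
from collections import deque

class CreditLink:
    """Toy credit-based flow control between a sender and a receiver NIC.

    The sender never transmits into a receiver with nowhere to put the
    data, which is why the scheme avoids drops without retransmission.
    """
    def __init__(self, receiver_slots):
        self.credits = receiver_slots     # initial grant: all slots free
        self.rx = deque()                 # receiver's buffer

    def send(self, msg):
        if self.credits == 0:
            return False                  # must wait for a credit
        self.credits -= 1
        self.rx.append(msg)               # buffer space is guaranteed
        return True

    def receive(self):
        msg = self.rx.popleft()           # consumer frees a slot...
        self.credits += 1                 # ...and the credit flows back
        return msg
```

With two slots, a third send blocks until the receiver drains a message; that back-pressure is exactly what standard Ethernet lacked until PFC-style mechanisms arrived.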

So why is RDMA even relevant to disaggregation?  There is a positive aspect to the answer, but it comes with a cautionary sidebar. The big win is that RDMA has a simple API visible directly to the end-user program (a model called "qpairs" that focuses on "I/O verbs").  These are lock-free, and with the most current hardware, permit reliable DMA data transfers from machine to machine at speeds of up to 200Gbps, and latencies as low as .75us.  The technology comes in several forms, all rather similar: Mellanox leads on the pure RDMA model (the company has since been acquired by NVIDIA, by the way), while Intel is pushing a variant they call OMNI-Path, and other companies have other innovations.   These latencies are 10x to 100x better than what you see with TCP, and the data rates are easily 4x better than the fastest existing datacenter TCP solutions.  As RDMA pushes up the performance curve to 400Gbps and beyond, the gaps will get even larger.  So RDMA is the coming thing, even if it hasn't quite become commonplace today.

RDMA is also reliable in its point-to-point configuration, enabling it to replace TCP in many applications, and also offers a way to support relatively low latency remote memory access with cache-line atomicity.  Now, it should probably be underscored that with latencies of .75us at the low end to numbers more like 3us for more common devices, RDMA offers the most NUMA of NUMA memory models, but even so, the technology is very fast compared to anything we had previously.

The cautions are the obvious ones.  First, RDMA on RoCE is really guaranteed to offer those numbers only if the endpoints live under a TOR switch with full bisection bandwidth.  Beyond that, we need to traverse a router that might experience congestion, and might even drop packets.  In these cases, RDMA is on slightly weaker ground.

A second caution is simply that applications can't really handle long memory-access delays, so this NUMA thing becomes a conceptual barrier to casual styles of programming.   You absolutely can't view RDMA data-center memory as a gigantic form of local memory.  On the contrary, the developer needs to be acutely aware of potential delays, and hence will need to have full control over memory layout. Today, we lack really easily-used tools for that form of control, so massive RDMA applications are the domain of specialists who have a great deal of arcane hardware insight.

So how does this story relate to disaggregation?  A first comment is that RDMA is just one of several new hardware options that force us to take control logic out of the main data path.  There are a few other examples: with the next generation of NVRAM storage (like Intel's Optane), you can DMA directly from a video camera into a persistent storage card, and with RDMA, you'll be able to do so even if the video isn't on the same machine as the place you plan to store the data.  With FPGA accelerators, we often pursue bump-in-the-wire placements: data streams into a memory unit associated with the FPGA at wire rates, it does something, and the main high-volume stream might never need to touch general-purpose code at all.  So the RDMA story is pointing to a broader lack of abstractions covering a variety of data-path scenarios.  Moreover, our second caution expands to a broader statement: these different RDMA-like technologies may each bring different constraints and different best-case use scenarios.

Meanwhile, the disaggregation folks are taking the story even further: they start by arguing that we can reduce the power footprint of rack-scale computing systems if we build processors with minimal per-core memory and treat that memory more like a cache than as a full-scale memory.  Because memory runs quite hot, and heat dissipation scales with memory size, this can slash the power need and heat generation.  Then if the processor didn't really need much memory, we are golden, and if it did, we can start to think about ways to intelligently manage the on-board cache, paging data in and out from remote memories hosted in racks dedicated to memory resources and shared by the processor racks (yes, this raises all sorts of issues of protection, but that's the idea).  We end up with racks of processors, racks of memory units, racks of storage, racks of FPGA, etc.  Each rack will achieve a much higher density of the corresponding devices, which brings some efficiencies too, and we also potentially have multi-tenancy benefits if some applications are memory-intensive but others need relatively little DRAM, some use FPGA but some don't, etc.

The main issue -- the obvious one -- is latency.  To appreciate the point, consider standard NUMA machines.  The bottom line, as you'll instantly discover should you ever run into it, is that on a NUMA machine, DRAM is split into a few modules, each supporting a couple of local cores, but interconnected by a bus.  The issue is that although a NUMA machine is capable of offering transparent memory sharing across DRAM modules, the costs are wildly different when a thread accesses the closest DRAM than when it reaches across the bus to access a "remote" DRAM module (in quotes because this is a DRAM module on the same motherboard).  At the best, you'll see a 2x delay: your local DRAM might run with 65ns memory fetch speeds, but a "remote" access could cost 125ns.  These delays will balloon if there are also cache-coherency or locking delays: in such cases the NUMA hardware enforces the consistency or locking model, but it runs backplane protocols to do so, and those are costly.  NUMA latencies cause havoc.
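A back-of-the-envelope model using the numbers above (both figures purely illustrative) shows why placement dominates: a thread doing mostly-remote fetches pays nearly 2x per access.

```python
# Latency figures echo the text: ~65ns to the local DRAM module,
# ~125ns across the on-board interconnect to a "remote" module.
LOCAL_NS, REMOTE_NS = 65, 125

def avg_access_ns(accesses):
    """Average fetch cost given a profile of (count, is_local) access runs.

    A tiny model of NUMA placement: the same workload, moved so that its
    hot data sits in the wrong DRAM module, nearly doubles its memory cost.
    """
    total = sum(n for n, _ in accesses)
    cost = sum(n * (LOCAL_NS if local else REMOTE_NS) for n, local in accesses)
    return cost / total

# The same thread, touching 90% local vs. 90% remote memory:
mostly_local = avg_access_ns([(90, True), (10, False)])    # 71ns average
mostly_remote = avg_access_ns([(10, True), (90, False)])   # 119ns average
```

And this model is charitable: it ignores the cache-coherency and locking traffic that, as noted above, can balloon the gap much further.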

Not long ago, researchers at MIT wrote a whole series of papers on optimizing Linux to run at a reasonable speed on NUMA platforms.

The cross-product of RDMA with NUMA-ness makes life even stranger.  Even when we consider a single server, an RDMA transfer in or out will run in slow motion if the RDMA NIC happens to be far from the DRAM module the transfer touches. You may even get timeouts, and I wonder if there aren't even cases where it would be faster to just copy the data to the local DRAM first, and then use RDMA purely from and to the memory unit closest to the NIC!

This is why those of us who teach courses related to modern NUMA and OS architectures are forced to stress that writing shared-memory multicore code can be a terrible mistake from a performance point of view.  At best, you need to take full control: you'll have to pin your threads to a set of cores close to a single DRAM unit.  This helps: you'll end up with faster code.  But it will be highly architecture-specific, and on some other NUMA system it might perform poorly.

In fact NUMA is probably best viewed as a hardware feature ideal for virtualized systems with standard single-threaded programs (and this is true both for genuine virtualization and for containers). Non-specialists should just avoid the model!

So here we have this new disaggregation trend, pushing for the most extreme form of NUMA-ness imaginable.  With a standard Intel server, local DRAM will be accessible with a latency of about 50-75ns.  Non-local jumps to 125ns, but RDMA could easily exceed 750ns, and 1500ns might not be unusual even for a nearby node, accessible with just one TOR switch hop.  (Optical networking could help, driving this down to perhaps 100ns, but that is still far in the future and involves many assumptions.)

There may be an answer, but it isn't obvious people will like it: we will need a new programming style that systematically separates data flows from control, so that this new style of out of band coding would be easier to pull off.  Why a new programming style?   Well, today's prevailing style of code centers on a program that "owns" the data and holds it in locally malloc-ed memory: I might write code to read frames of video, then ask my GPU to segment the data -- and the data transits through my DRAM (in fact the GPU would often use my DRAM as its memory source).  Similarly for FPGA.  In a disaggregated environment, we would give the GPU or the FPGA memory of its own, then wire it directly to the data source, and control all of this from an out of band control program running on a completely different processor.  The needed abstractions for that style of computing are simply lacking in most of today's programming languages and systems.  You do see elements of the story -- for example, Tensor Flow can be told to run portions of its graph on a designated TPU cluster -- but I would argue that we haven't yet found the right OS and programming language coding styles to make this easy at data-center scale.
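Here is a tiny sketch, in plain Python threads, of what I mean by separating control from the data path (everything here is illustrative: real systems would wire devices together, and the "GPU kernel" is just squaring). The control program only constructs the pipeline; once started, items flow source to stage to sink and never transit the controller.

```python
import queue
import threading

def build_pipeline(source, stage, sink):
    """Out-of-band control: wire up a data path, then step aside.

    The controller holds only thread handles; the data itself moves
    directly from the source, through the processing stage, to the sink.
    """
    q = queue.Queue()

    def data_path():
        for item in source:
            q.put(stage(item))        # the data path, not the control path
        q.put(None)                   # end-of-stream marker

    def drain():
        while (item := q.get()) is not None:
            sink(item)

    producer = threading.Thread(target=data_path)
    consumer = threading.Thread(target=drain)
    producer.start()
    consumer.start()
    return producer, consumer         # controller keeps handles, not data

out = []
t1, t2 = build_pipeline(range(5), lambda f: f * f, out.append)
t1.join(); t2.join()                  # control waits for completion, out of band
```

The missing abstractions are precisely what would make this pattern natural when the source is a camera, the stage is an FPGA with its own memory, and the sink is an NVRAM card on another machine.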

Worse, we need to focus on applications that would be inherently latency-tolerant: latency barriers will translate to a slowdown for the end user.  To avoid that, we need a way of coding that promotes asynchronous flows and doesn't require a lot of hopping around, RPCs or locking.  And I'll simply say that we lack that style of coding now -- we haven't really needed it, and the work on this model hasn't really taken off.  So this is a solvable problem, for sure, but it isn't there yet.

Beyond all this, we have the issues that Mach ran into.  If we disaggregate, all of those same questions will arise: RPC on the critical path, context switch delays, mismatch of the hardware MMU models with the new virtualized disaggregated process model, permissions, you name it.  Today's hardware is nothing like the hardware when Mach was created, and I bet there are all sorts of cool new options.  But it won't be trivial to figure them all out.

I have colleagues who work in this area, and I think the early signs are very exciting.  It looks entirely plausible that the answers are going to be there, perhaps in five years and certainly in ten.  The only issue is that the industry seems to think disaggregation is already happening, and yet those answers simply aren't in hand today.

It seems to me that we also risk an outcome where researchers solve the problem, and yet the intended end-users just don't like the solution.  The other side of the Tensor Flow story seems to center on the popularity of just using languages everyone already knows, like Python or Java, as the basis for whatever it is we plan to do.  But Python has no real concept of resource locations: Tensor Flow takes Python and bolts on the concept of execution in some specific place, but here we are talking about flows, execution contexts, memory management, you name it.  So you run some risk that smart young researchers will demonstrate amazing potential, but that the resulting model won't work for legacy applications (unless a miracle occurs and someone finds a way to start with a popular platform like Tensor Flow and "compile" it to the needed abstractions).  Can such a thing be done?  Perhaps so!  But when?  That strikes me as a puzzle.

So I'll stop here, but I will say that disaggregation seems like a cool and really happening opportunity: definitely one of the next big deals.   Researchers are wise to jump in.  But people headed into industry, who are likely to be technology users rather than inventors, need to appreciate that it is still too early to speculate about what might be successful, and when -- and definitely too early for them to get excited.  That was the bottom-line message in the lecture I just gave on this topic!

Friday, 16 November 2018

Forgotten lessons and their relevance to the cloud

While giving a lecture in my graduate course on the modern cloud, and the introduction of sensors and hardware accelerators into machine-learning platforms, I had a sudden sense of deja-vu.

In today's cloud computing systems, there is a tremendous arms race underway to deploy hardware as a way to compute more cost-effectively, to process more data faster, or simply to offload very repetitious tasks into specialized subsystems that are highly optimized for those tasks.  My course covers quite a bit of this work: we look at RDMA, new memory options, FPGA, GPU and TPU clusters, challenges of dealing with NUMA architectures and their costly memory coherence models, and similar topics.  The focus is nominally on the software gluing all of this together ("the future cloud operating system") but honestly, since we don't really know what that will be, the class is more of a survey of the current landscape with a broad agenda of applying it to emerging IoT uses that bring new demands into the cloud edge.

So why would this give me a sense of deja-vu?  Well, grant me a moment for a second tangent and then I'll link my two lines of thought into a single point.  Perhaps you recall the early days of client-server computing, or the early web.  Both technologies took off explosively, only to suddenly sag a few years later as a broader wave of adoption revealed deficiencies.

If you take client-server as your main example, we had an early period of disruptive change that you can literally track to a specific set of papers: Hank Levy's first papers on the Vax Cluster architecture when he was still working with DEC.  (It probably didn't hurt that Hank was the main author: in systems, very few people are as good as Hank at writing papers on topics of that kind.)  And in a few tens of pages, Hank upended the mainframe mindset and introduced us to this other vision: clustered computing systems in which lots of components somehow collaborate to perform scalable tasks, and it was a revelation.  Meanwhile, mechanisms like RPC were suddenly becoming common (CORBA was in its early stages), so all of this was accessible.  For people accustomed to file transfer models and batch computing, it was a glimpse of the promised land.

But the pathway to the promised land turned out to be kind of challenging.  DEC, the early beneficiary of this excitement, got overwhelmed and sort of bogged down: rather than being a hardware company selling infinite numbers of VAX clusters (which would have made them the first global titan of the industry), they somehow got dragged further and further into a morass of unworkable software that needed a total rethinking.  Hank's papers were crystal clear and brilliant, but a true client-server infrastructure needed 1000x more software components, and not everyone can function at the level Hank's paper more or less set as the bar.  So, much of the DEC infrastructure was incomplete and buggy, and for developers, this translated to a frustrating experience: a fast on-ramp followed by a very bumpy, erratic experience.  The ultimate customers felt burned and many abandoned DEC for Sun Microsystems, where Bill Joy managed to put together a client-server "V2" that was somewhat more coherent and complete.  Finally, Microsoft swept in and did a really professional job, but by then DEC had vanished entirely, and Sun was struggling with its own issues of overreach.

I could repeat this story using examples from the web, but you can see where I'm going: early technologies, especially disruptive, revolutionary ones, often take on a frenetic life of their own that can get far ahead of the real technical needs.  The vendor then becomes completely overwhelmed and, unless it can somehow paper over the issues, collapses.

Back in that period, a wonderful little book came out on this: Crossing the Chasm, by Geoffrey Moore.  Moore describes a bumpy adoption curve over time.  The first bump is associated with the early adopters (the kind of people who live to be the first to use a new technology, back before it even becomes stable).  But conservative organizations prefer to be "first to be last," as David Bakken says.  They hold back, waiting for the technology to mature, hoping to avoid the pain without missing the actual surge of mainstream adoption.  Meanwhile, the pool of early adopters dries up and some of them wander off to the next even newer thing, so the adoption curve sags, perhaps for years.  Wired writes articles about the "failure of client-server" (well, back then it would have been ComputerWorld).

Finally, for the lucky few, the really sustainable successes, you see a second surge in adoption, and this one typically plays out over a much longer period, without sagging in the same way, or at least not for many years.  So we see a kind of S-curve, but with a bump in the early period.

All of which leads me back to today's cloud and this craze for new accelerators.  When you consider any one of them, you quickly discover that they are extremely hard devices to program.  FPGA pools in Microsoft's setting, for example, are clearly going to be expert-only technologies (I'm thinking of the series of papers associated with Catapult).  It is easy to see why a specialized cloud micro-service might benefit, particularly because the FPGA performance-to-power-cost ratio is quite attractive.  Just the same, though, creating an FPGA design is really an experts-only undertaking, and a broken FPGA could be quite disruptive to the data center.  So we may see these pools used by subsystems doing things like feature ranking for Bing search, crypto for the Microsoft Azure VPC, or data compression and similar tasks in Cosmos.  But I don't imagine that my students here at Cornell will be creating new services with new FPGA accelerators anytime soon.

GPU continues to be a domain where CUDA programming dominates every other option.  This is awesome for the world's CUDA specialists, and because they are good at packaging their solutions in libraries we can call from the attached general-purpose machine, we end up with great specialized accelerators for graphics, vision, and similar tasks.  In my class we actually do read about a generalized tool for leveraging GPUs: a language invented at MSR called Dandelion.  The programming language itself was an easy choice: C# with LINQ, about as popular a technology as you could name.  The compiler then mapped LINQ queries to the GPU, when one was available.  I loved that idea... but the Dandelion work stalled several years ago without really taking off in a big way.
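To make the Dandelion idea a bit more concrete, here is a rough Python analogue: the programmer writes an ordinary declarative pipeline, and a small planner decides where to run it.  Dandelion itself compiled C# LINQ queries to GPU code; every name below is illustrative, not Dandelion's real API.

```python
# Toy analogue of the Dandelion model: a declarative pipeline plus a
# "planner" that picks a backend. (Illustrative only -- Dandelion's real
# front-end was C#/LINQ, and its compiler emitted GPU kernels.)

def gpu_available():
    # Placeholder probe: Dandelion checked for a real GPU; here we
    # simply pretend there is none, so the CPU path runs.
    return False

def run_pipeline(data, transforms):
    """Apply a chain of per-element transforms, choosing a backend."""
    backend = "gpu" if gpu_available() else "cpu"
    # On the CPU path we just apply the functions in order; the GPU path
    # would instead generate kernel code from the same description.
    result = list(data)
    for fn in transforms:
        result = [fn(x) for x in result]
    return backend, result

backend, out = run_pipeline(range(5), [lambda x: x * x, lambda x: x + 1])
print(backend, out)  # cpu [1, 2, 5, 10, 17]
```

The point is the separation: the pipeline describes *what* to compute, and the planner (in Dandelion's case, a real compiler) decides *where*, which is exactly what made the C#/LINQ front-end so appealing.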

TPU is perhaps easier to use: with Google's TensorFlow, the compiler does the hard work (as with Dandelion), but the language is just Python.  To identify the objects a TPU can compute on, the whole model focuses on creating functions that take and return vectors, matrices, or higher-dimensional tensors.  This really works well and is very popular, particularly on a NUMA machine with an attached TPU accelerator, and particularly for Google's heavy lifting in their ML subsystems.  But it is hard to see TensorFlow as a general-purpose language, or even as a general-purpose technology.
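Here is a minimal sketch of that "functions over tensors" style, written with NumPy as a stand-in so it runs anywhere; in real TensorFlow, code of this same shape is traced into a dataflow graph and compiled for the TPU.

```python
# Sketch of the tensor-function programming style TensorFlow encourages.
# NumPy stands in for TensorFlow here so the example is self-contained.
import numpy as np

def dense_layer(x, w, b):
    # Whole-tensor operations only: one matmul, one broadcast add, one
    # relu. Nothing here loops over individual elements -- that is what
    # makes the function compilable for an accelerator.
    return np.maximum(x @ w + b, 0.0)

x = np.ones((2, 3))        # a batch of two 3-vectors
w = np.full((3, 4), 0.5)   # weight matrix
b = np.zeros(4)            # bias vector
y = dense_layer(x, w, b)
print(y.shape)  # (2, 4)
```

Because every operation consumes and produces whole tensors, a compiler can see the entire computation as a graph of large operations, which is what the TPU is built to execute, and also why the model feels restrictive for anything that isn't tensor-shaped.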

And the same goes for research in my own area.  When I look at Derecho, or Microsoft's FaRM, or other RDMA technologies, I find it hard not to recognize that we are creating specialist solutions: they use RDMA in sophisticated ways and support extensible models that are probably best viewed as forms of PaaS infrastructure, even if you tend to treat them as libraries.  They are sensational tools for what they do.  But they aren't "general purpose."  (For distributed computing, general purpose might lead you to an RPC package like the OMG's IDL-based solutions, or to REST, or perhaps to Microsoft's WCF.)

So where does this leave us?  Many people who look at the modern cloud are predicting that the cloud operating system will need to change in dramatic ways.  But if you believe that difficulty of use, fragility, and a lack of tools make the accelerators "off limits" except for a handful of specialists, and that the pre-built PaaS services will ultimately dominate, then what's wrong with today's micro-service models?  As I see it, not much: they are well supported, they scale nicely (although some of the function-server solutions really need to work on their startup delays!), and there are more and more recipes to guide new users from problem statement to a workable, scalable, high-performance solution.  These recipes often talk to pre-built microservices, and sure, those use hardware accelerators, but the real user is shielded from their complexity.  And this is a good thing, because otherwise we would be facing a new instance of that same client-server issue.

Looking at this as a research area, we can reach some conclusions about how one should approach research on the modern cloud infrastructure.

A first observation is that the cloud has evolved into a world of specialized elastic micro-services, and that the older style of "rent a pile of Linux machines and customize them" is slowly fading into the background.  This makes a lot of sense, because it isn't easy to end up with a robust, elastic solution.  Using a pre-designed and highly optimized microservice benefits everyone: the cloud vendor gets better performance from the data center and better multi-tenancy behavior, and the user doesn't have to reinvent these very subtle mechanisms.

A second is that specialized acceleration solutions will probably live mostly within the specialized microservices they were created to support.  Sure, Azure will support pools of FPGAs.  But those will exist mostly to speed up things like Cosmos or Bing, simply because using them is extremely complex, and misusing them can disrupt the entire cloud fabric.  Keeping accelerators inside vendor-managed services also compensates for the dreadful state of the supporting tools for most, if not all, cloud-scale elastic mechanisms.  As in early client-server computing, home-brew use of technologies like DHTs, FPGA and GPU and TPU accelerators, RDMA, or Optane memory doesn't make a lot of sense right now.  You could perhaps pull it off, but more likely, the larger market will reject such things... except when they result in ultra-fast, ultra-cheap micro-services that can be treated as black boxes.

A third observation is that as researchers, if we hope to be impactful, we shouldn't fight this wave.  Take my own work on Derecho.  Understanding that Derecho will be used mostly to build new microservices helps me shape its APIs to look natural to the people likely to use it.  Understanding that those microservices might be called mostly from Azure's function server or Amazon's AWS Lambda tells me what a typical critical path would look like.  That focuses me on ensuring this particular path is very well supported, leverages RDMA at every hop where RDMA is available, and lets me add auto-configuration logic to Derecho based on the environment it finds at runtime.

We should also be looking at the next generation of applications, and by doing so, try to understand and abstract their needs and likely data-access and computational patterns.  On this, I'll point to work like the new Ray paper from OSDI: a specialized microservice for a style of computing common in gradient-descent model training.  Or TensorFlow: ultimately, a specialized microservice for leveraging TPUs.  Or Spark: a specialized microservice to improve the scheduling and caching of Hadoop jobs.  Each technology is exquisitely matched to its context, and none can simply be yanked out and used elsewhere.  For example, you would be unwise to try to build a new Paxos service using TensorFlow: it might work, but it wouldn't make a ton of sense.  You might manage to publish a paper, but it is hard to imagine such a thing having broad impact.  And Spark is just not an edge caching solution: it really makes sense only in the big-data repositories where the Databricks product line lives.  And so forth.
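The pattern Ray targets can be sketched in a few lines: many small stateless tasks compute partial gradients, and a driver loop combines them.  This is only an illustration; Ray's real API uses @ray.remote and ray.get(), and Python's concurrent.futures stands in here so the sketch runs without Ray installed.

```python
# Ray-style pattern, roughly: stateless parallel tasks plus a driver
# loop. (concurrent.futures stands in for Ray's task API.)
from concurrent.futures import ThreadPoolExecutor

def partial_grad(w, shard):
    # Gradient of the loss 0.5*(w - x)^2, summed over one data shard.
    return sum(w - x for x in shard)

def train(shards, w=0.0, lr=0.1, steps=50):
    n = sum(len(s) for s in shards)
    with ThreadPoolExecutor() as pool:
        for _ in range(steps):
            # Fan out one task per shard, then combine the results.
            grads = pool.map(lambda s: partial_grad(w, s), shards)
            w -= lr * sum(grads) / n
    return w

shards = [[1.0, 2.0], [3.0, 4.0]]
w = train(shards)
print(round(w, 2))  # converges toward the data mean, 2.5
```

The driver's fan-out/fan-in loop is exactly the access pattern Ray optimizes for, with fast task scheduling and a shared object store, which is also why Ray, like Spark and TensorFlow, is so hard to repurpose for anything with a different pattern.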