I've been fascinated by
a puzzle that will probably play out over several years. It involves a
deep transformation of the cloud computing marketplace, centered on a choice.
In one case, IoT infrastructures will be built the way we currently build
web services that do things like intelligent recommendations or ad placements.
In the other, edge IoT will require a "new" way of developing
solutions that centers on creating new and specialized services... ones that
embody real-time logic for making decisions or even learning in real-time.
I'm going to make a case
for bespoke, handbuilt services: the second scenario. But if I’m right,
there is hard work to be done and whoever starts first will gain a major
advantage.
So to set the stage, let
me outline the way IoT applications work today in the cloud. We have
devices deployed in some enterprise setting, perhaps a factory, or an apartment
complex, or an office building. These might be quite dumb, but they are
still network enabled: they could be things like temperature and humidity
sensors, motion detectors, microphones or cameras, etc. Because many are
dumb, even the smart ones (like still and video cameras with built-in autofocus,
deblurring, depth perception) are treated in a sort of rigid manner: the basic
model is of a device with a limited API that can be configured, and perhaps can
be patched if the firmware has issues, but then generates simple events with
meta-data that describes what happens.
In a posting a few weeks
ago, I noted that unmanaged IoT deployments are terrifying for system
administrators, so the world is rapidly shifting towards migrating IoT device
management into systems like Azure's infrastructure for Office 365.
Basically, if my company already uses Office for other workplace tasks, it
makes sense to also manage these useful (but potentially dangerous) devices
through the same system.
Azure's IoT Hub handles
that managerial role: secure connectivity to the sensors, patches guaranteed to
be pushed as soon as feasible... and in the limit, maybe nothing else. But why
stop there? My point a few weeks back was simply that even just managing
enterprise IoT will leave Azure in a position of managing immense numbers of
devices -- and hence, in a position to leverage the devices by bringing new
value to the table.
Next observation: this
will be an "app" market, not a "platform" market. In
this blog I don't often draw on marketing studies and the like, but for this
particular case, it makes sense to point to market studies that explain my
thinking (look at Lecture 28 in my CS5412 cloud computing class to
see charts from the studies I drew on).
Cloud computing, perhaps
far more than most areas of systems, is shaped by the way cloud customers
actually want to use the infrastructure. In contrast, an area like
databases or big data is about how people want to use the data, which shapes
access patterns. But they aren't trying to explicitly route their data
through FPGA devices that will transform it in some way, or doing computations
that can't keep up unless they run in GPU clusters. So, because my kind
of cloud customers migrate to the clouds that make it easier to build their
applications, they will favor the cloud that has the best support for IoT apps.
A platform story
basically offers minimal functionality, like bare metal running Linux, and
leaves the developers to do the rest. They are welcome to connect to
services but not required to do so. Sometimes this is called the hybrid
cloud.
Now, what's an
app? As I'm using the term, you would want to visualize the iPhone or
Android app store: small programs that share many common infrastructure components
(the GUI framework, the storage framework, the motion sensor and touch sensors,
etc), and that then connect to their bigger cloud-hosted servers over a Web
Services layer that tends to match nicely with the old Apache-dominated cloud
for doing highly concurrent construction of web pages. So this is the
intuition.
For IoT, though, an app
model wouldn't work in the same way -- in fact, it can't work in the
same way. First, IoT devices that want help from intelligent
machine-learning will often need support from something that learns in
real-time. In contrast, today's web architecture is all about learning
yesterday and then serving up read-only data at ultra-fast rates from scalable
caching layers that could easily be stale if the data was actually changing
rapidly. So suddenly we will need to do machine learning, decision making
and classification, and a host of other performance-intensive tasks at the
edge, under time pressure, and with data changing quite rapidly. Just
think of a service that guides a drone surveying a farming area that wants to
optimize its search strategy to "sail on the wind" and you'll be
thinking about the right issues.
Will the market want
platforms, or apps? I think the market data strongly suggests that apps
are winning. Their relatively turnkey development advantages outweigh the
limitations of programming in a somewhat constrained way. If you do look
at the slides from my course, you can see how this trend is playing out.
The big money is in apps.
And now we get to my
real puzzle. If I'm going to be creating intelligent infrastructure for
these rather limited IoT devices (limited by power, and by compute cycles, and
by bandwidth), where should the intelligence live? Not on the
devices: we just bolted them down to a point where they probably wouldn't have
the capacity. Anyhow, they lack the big picture: if 10 drones are flying
around, the cloud can build a wind map for the whole farm. But any single
drone wouldn't have enough context to create that situational picture, or to
optimize the flight plan properly. There is even a famous theoretical
result on the "price of anarchy", showing that you don't get the
global optimum if you have a lot of autonomous agents making individually
optimal choices. No, you want the intelligence to reside in the cloud.
But where?
Today, machine
intelligence lives at the back, but the delays are too large. We can’t
control today’s drones with yesterday’s wind patterns. We need
intelligence right at the edge!
Azure and AWS both
access their IoT devices through a function layer ("lambdas" in the
case of AWS). This is an elastic service that hosts containers, launching
as many instances of your program as needed on the basis of events.
Functions of this kind are genuine programs and can do anything they need to
do, but they run in what is called a "stateless" mode, meaning that they
flash into existence (or are even warm-started ahead of time, so that when the
event arrives, the delay is minimal). Then they handle the event, but they
can't save any permanent data locally, even though the container does have a
small file system that works perfectly well: as soon as the event handling
ends, the container will garbage collect itself and that local file system will
evaporate.
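To make the "stateless" point concrete, here is a minimal sketch in plain Python. It does not use the real AWS or Azure function APIs: `invoke` and `count_events` are hypothetical names, and a temporary directory stands in for the container's short-lived file system. The handler tries to keep a running count in a local file, and every invocation starts over.

```python
import tempfile
from pathlib import Path


def invoke(handler, event):
    """Run one invocation in a fresh scratch directory that is deleted afterwards.

    This is a stand-in (an assumption, not a vendor API) for a function
    container whose local file system vanishes when event handling ends.
    """
    with tempfile.TemporaryDirectory() as scratch:
        return handler(event, Path(scratch))


def count_events(event, scratch):
    """Try to keep a running count of events in a local file."""
    counter = scratch / "count.txt"
    seen = int(counter.read_text()) if counter.exists() else 0
    seen += 1
    counter.write_text(str(seen))
    return seen


# Three events arrive; each one is handled by a fresh "container".
results = [invoke(count_events, {"sensor": "cam-1"}) for _ in range(3)]
print(results)  # the count never gets past 1
```

The local write succeeds every time, yet nothing carries over, which is why any knowledge that must survive between events has to live somewhere other than the function itself.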
So, the intelligence and
knowledge and learning have to occur in a bank of servers. One scenario,
call it the PaaS mode, would be that Amazon and Microsoft pre-build a set of
very general purpose AI/ML services, and we code all our solutions by
parameterizing those and mapping everything into them. So here you have
AI-as-a-service. Seems like a guaranteed $B startup concept! But
very honestly, I'm not seeing how it can work. The machine learning you
would do to learn wind patterns and direct drones to sail on the wind is just
too different from what you need to recognize wheat blight, or to figure out
what insect is eating the corn.
The other scenario is
the "bespoke" one. My Derecho library could be useful
here. With a bespoke service, you take some tools like Derecho and build
a little cluster-hosted service of your very own, which you then tell the cloud
to host on your behalf. Then your functions or lambdas can talk to your
services, so that if an IoT event requires a decision, the path from device to
intelligence is just milliseconds. With consistent data replication, we
can even eliminate stale data issues: these services would learn as they go (or
at least, they could), and then use their most recent models to handle each new
stage of decision-making.
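As a rough illustration of that pattern (and only the pattern), the sketch below uses a single in-memory Python object as a stand-in for the bespoke service. It is not Derecho, which is a C++ library, and it does no replication. The class `WindModelService`, the function `on_drone_event`, the event fields, and the "calmest cell" rule are all hypothetical placeholders for real learning and decision logic.

```python
from dataclasses import dataclass, field


@dataclass
class WindModelService:
    """Hypothetical long-running service: keeps a per-cell wind estimate in memory."""
    alpha: float = 0.5  # weight given to the newest reading
    estimates: dict = field(default_factory=dict)

    def report(self, cell, wind_speed):
        """Fold a new reading into the running estimate for one grid cell."""
        old = self.estimates.get(cell)
        if old is None:
            new = wind_speed
        else:
            new = (1 - self.alpha) * old + self.alpha * wind_speed
        self.estimates[cell] = new
        return new

    def calmest_cell(self):
        """Return the cell with the lowest current estimate, or None if empty."""
        if not self.estimates:
            return None
        return min(self.estimates, key=self.estimates.get)


def on_drone_event(service, event):
    """Stateless function: forward the reading, then ask the service for advice."""
    service.report(event["cell"], event["wind_speed"])
    return service.calmest_cell()


service = WindModelService()
on_drone_event(service, {"cell": "A1", "wind_speed": 8.0})
on_drone_event(service, {"cell": "B2", "wind_speed": 3.0})
advice = on_drone_event(service, {"cell": "A1", "wind_speed": 2.0})
print(advice, service.estimates["A1"])
```

The function holds no state of its own. The service updates its model on every event and answers from the latest version, which is the "learn as you go" behavior described above.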
But without far better
tools, it will be quite annoying to create these bespoke services, and this, I
think, is the big risk to the current IoT edge opportunity: do Microsoft and
Amazon actually understand this need, and will they enlarge the coverage of
VSCode or Visual Studio or, in Amazon's case, Cloud9, to "automate" as
many aspects of service creation as possible, while still leaving flexibility
for the machine learning developer to introduce the wide range of
customizations that her service might require?
What are these
automation opportunities? Some are pretty basic (but that doesn't mean
they are easy to do by hand)! To actually launch a service on a cloud,
there needs to be a control file created, typically in a JSON format, with
various fields taking on the requisite values. Often, these include
magically generated 60-hexadecimal-digit keys or other kinds of unintuitive
content. When you use these tools to create other kinds of cloud
solutions, they automate those steps. By hand, I promise that you’ll
spend an afternoon and feel pretty annoyed by the waste of your time. A
good hour will be lost on those stupid registry keys alone.
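For a sense of what a tool would be automating, here is a minimal sketch that emits such a control file. The field names (`serviceName`, `instanceCount`, `accessKey`) and the helper `make_control_file` are made up for illustration and do not match any particular vendor's schema. On a real cloud the key would normally be issued by the platform rather than generated locally.

```python
import json
import secrets


def make_control_file(service_name, instances):
    """Build a hypothetical launch descriptor; field names are illustrative only."""
    descriptor = {
        "serviceName": service_name,
        "instanceCount": instances,
        # 30 random bytes render as 60 hexadecimal digits
        "accessKey": secrets.token_hex(30),
    }
    return json.dumps(descriptor, indent=2)


control_json = make_control_file("wind-model", 3)
print(control_json)
```

None of this is hard. The annoyance is in finding out which fields are required and what each one must contain.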
Interface definitions
are a need too. If we want functions and lambdas talking to our new
bespoke micro-services ("micro" to underscore that these aren't the
big vendor-supplied ones, like CosmosDB), the new micro-service needs to export
an interface that the lambda or function can call at runtime. Again, help
needed!
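One way to picture the missing piece is an explicit interface plus a small dispatch step. The sketch below is an assumption-laden toy: the `WindAdvisor` interface, the `EXPORTED_METHODS` set, the JSON request shape, and the `dispatch` helper are all hypothetical, and a real system would use whatever RPC mechanism the cloud or the service library provides.

```python
import json
from typing import Protocol


class WindAdvisor(Protocol):
    """Hypothetical interface the micro-service exports to functions."""

    def advise(self, drone_id: str, cell: str) -> str: ...


EXPORTED_METHODS = {"advise"}  # only these names may be called remotely


class FixedAdvisor:
    """Trivial implementation used to exercise the dispatch path."""

    def advise(self, drone_id: str, cell: str) -> str:
        return f"{drone_id}: hold course in {cell}"


def dispatch(service: WindAdvisor, request_json: str) -> str:
    """Decode a request, check it against the exported interface, and call it."""
    request = json.loads(request_json)
    method = request["method"]
    if method not in EXPORTED_METHODS:
        return json.dumps({"error": f"unknown method {method}"})
    result = getattr(service, method)(**request["args"])
    return json.dumps({"result": result})


reply = dispatch(FixedAdvisor(), json.dumps(
    {"method": "advise", "args": {"drone_id": "d7", "cell": "A1"}}))
rejected = dispatch(FixedAdvisor(), json.dumps({"method": "shutdown", "args": {}}))
print(reply)
print(rejected)
```

Ideally the development tools would generate both sides of this from a single interface definition, so the function author only ever sees a typed call.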
In fact the list is
surprisingly long, even though the items on it are (objectively) trivial.
The real point isn’t that these are hard to do, but rather that they are arcane
and require looking for the proper documentation, following some sort of magic
incantation, figuring out where to install the script or file, testing your
edited version of the example they give, etc. Here are a few examples:
- Launch service
- Authenticate if needed
- Register micro-service to accept RPCs
- There should be an easy way to create functions able to call the service, using those RPC APIs
- We need an efficient upload path for image objects
- There will need to be tools for garbage collection (and tools to track space use)
- … and tools for managing the collection of configuration parameter files and settings for an entire application
- … and lifecycle tools, for pushing patches and configuration changes in a clean way.
Then there are some more
substantial needs:
- Code debugging support for issues missed in development and then arising at runtime
- Performance monitoring, hotspot visualization and performance optimization (or even, performance debugging) tools
- Ways to enable a trusted micro-service to make use of hardware accelerators like RDMA or FPGA even if the end user might not be trusted to do so safely (many accelerators save money and improve performance but are just not suitable for direct access by hordes of developers with limited skill sets; some could destabilize the data center or crash nodes, and some might have security vulnerabilities).
This makes for a long
list, but in my view, a strong development team at Amazon or Microsoft, perhaps
allied with a strong research group to tackle the open ended tasks, could
certainly succeed. Success would open the door to mature intelligent edge
IoT. Lacking such tools, though, it is hard not to see edge IoT as being
pretty immature today: huge promise, but more substance is needed.
My bet? Well,
companies like Microsoft need periodic challenges to set in front of their
research teams. I remember that when I visited MSR Cambridge back in
2016, everyone was asking what they should be doing as researchers to enable
the next steps for the product teams... the capacity is there. And those
market slides I mentioned make it clear: The edge is a huge potential
market. So I think the pieces are in place, and that we should jump on
the IoT edge bandwagon (in some cases, “yet again”). This time, it may
really happen!
Super ideas, Professor. While I was reading the first sentence of the 13th paragraph, I was stuck on the question of why we cannot use tools such as Derecho. Then you mentioned exactly that in the paragraphs that followed. Super exciting.