Saturday 15 July 2017

How far could a secure Internet get us?

There is a standard refrain among those who teach and do research on security: it holds that the fundamental flaw in security originates with an early decision by the community that created the Internet.  In that early design, Internet routing occurs by a form of peering that can often occur anonymously.  Endpoint systems are not authenticated, and packets generally aren't encrypted or authenticated either.  As a consequence, packets are routed permissively even if they contain what are almost certainly false origin addresses, or have content that seems dubious in obvious ways (like a TCP connection packet that makes no sense on this particular link).

This story then holds that the original sin of the Internet design community has given us layer upon layer of infrastructure that cannot be secured because it resides on a flawed and easily compromised foundation.

As it turns out, there is little doubt that we actually can build an Internet that corrects that deep flaw, and that can be relatively compatible with the existing Internet.  Enterprises like military systems and security-conscious companies do it all the time.   Let me sketch out how this is done (it isn't rocket science, and you won't find many surprises).  But then let's ask if it would really be a game-changer for the security of higher level applications.

To appreciate the core challenge, it helps to start by understanding that everything you think you know about the network is really just an abstraction layered over something more basic.  For example, even my advanced students tend to think of Internet addresses as identifiers, as if my computer's true name were 128.84.16.001 and your computer's true name was 167.88.194.023.  In fact, these addresses are better understood as routing data: useful for deciding what the next hop should be as a packet progresses hop-by-hop through the network, but lacking any deep connection to identity. 

In fact, a single computer might have many network addresses (one for each of its network interfaces).  It might host multiple virtual machines, and those could each have virtual network addresses of their own: with virtualization, any computer can function as a kind of Internet and host large numbers of computers that run within it, complete with virtual networks, virtual name spaces, services of various kinds, you name it.  A single computer might move about, so that over short periods of times it takes on a sequence of addresses: the old addresses can cease to work, and traffic needs to be redirected to the new ones.  Addresses can be mapped by network address translators.

Once we free ourselves from this false equivalence of network address to identity, you need to ask what the real root of identity should be, in a modern system.  Here, the proper focus is on hardware security tokens combined with human users who authenticate themselves in the usual ways.  The hardware could be a built-in component of the computer, or it could be some form of plug-in.  But the key point is that when we associate authentication with these unforgeable hardware elements, used in tandem, we arrive at a much stronger notion of endpoint identity.  The deep roots of that form of identity reside in key pairs: the identity defines some form of private key, with which information can be authenticated by public components that only have access to the corresponding public key.

This then is our bootstrap opportunity: we live in a vast world of devices: computers, routers, IoT components like video cameras and smart televisions and smart cars, and each element can be understood as either being anonymous (if it lacks the ability to authenticate itself), or capable of proving that it has "authorized access" to some private key.  With that proof, we can consult a vendor-published registry and from that registry, can learn about this endpoint device.  A device can also be registered in secondary registries: when I bring a new router into my smart home, I could register my router as "Ken's router, at his summer cottage on Cayuga Lake".  And now there would be a trustworthy chain of reasoning that would let you convince yourself that certain messages were indeed sent by, or countersigned by, my router.

Sounds familiar?  It should, if you've ever brought your laptop from home to work.  Some companies won't allow you to connect the laptop at all (fearing viruses that your machine might carry), but those that do usually require precisely this sort of authentication and registration.

Given strong authentication, a second opportunity arises whenever we focus on an island of infrastructure that has a coherent owner.  Here on Cayuga Lake, my network provider is part of a broader system owned and controlled by a company that controls a fairly large regional network.  This ISP is paid for its services, and at least in theory, has complete control of every device that can connect directly to it, every router and switch it operates internally, and every network link used to glue them all together.  One can understand the ISP as a kind of military hierarchy: at the core we find a "master and commander" who has absolute control and is the sole source of permissions.  Below the master and commander are a collection of subordinates, to whom restrictive roles have been granted, and way down at the bottom are the common seaman who might permit packets originating in my router to sail the seas of this regional network -- or could block them.

As it happens, today's ISPs are somewhat relaxed about authentication, so one opportunity to secure the network would start by simply enforcing authentication when devices are connected to an ISP.  Today when I buy a new iPhone, I can simply start to use it in my home.  If I was trying to build a much more secure network, at a minimum my router might pop up a screen requiring me to fill in details: who owns this iPhone (and it might want proof: type in your password, your RSA code number, and hold your thumb on the fingerprint reader...)  Perhaps, that regional ISP would do the same and require a second level of device registration.  Military and other nationally critical infrastructure networks work precisely in this way: if you were to take a laptop into a major power plant and connect it to the Ethernet jack in the waiting room while waiting for your appointment with human resources, either it won't connect, or it will give you some form of very limited guest connectivity. 

Think back to the last time you took your laptop to work.  I bet that something along the lines just described happened then, too.  Any organization that takes security at all seriously is careful to track the devices connected to it, and to limit their networking "power".  A military system won't allow you to connect your own machine, at all.  But if you work at Cornell, like I do, you might be able to get permission -- except that your machine will end up registered for use in a specific context, such as from my office in the Computer Science building.  If I were to carry it across the street and connect it to a wall jack in the ECE department, I would be back to square zero.

With enclaves that use authentication, one can take further steps.  Without too much added cost, packets can be cryptographically signed or fully encrypted at the endpoint, yielding various forms of virtual private networks: subnetworks within which communication is safe, but with strong protection against traffic from the outside leaking in, or against intruders managing to read or tamper with data.  Such systems can also filter or block traffic that might try to escape the secure perimeter.

I worked at Microsoft for a few months in 2016, and they adopted this approach.  I could only connect to their system via a secured VPN, and the security perimeter it enforced when I worked from home was very carefully controlled and monitored.  I could continue projects from work while at home, but I could never have wandered the virtual Microsoft network with impunity from outside the office.  In my office, I had somewhat more relaxed access to internal software tools and projects.

This, then, gives some sense of what a secure Internet would look like.  But how secure can such a system really be?

As we push from the lowest layers of hardware and software up to higher levels of abstraction, the numbers of elements of a modern system increase exponentially.  There are dozens of operating systems and each has hundreds of variants and patch levels.  So the very first layer already has a level of diversity measureable in the thousands.  Then there are hundreds of programming tools and languages and services that can run on them, configurable in hundreds of ways, to say nothing of all the management options.  Layer by layer, we bring in a surreal degree of diversity simply by virtue of the choices made by each system designer and vendor.

In settings like military systems, or power grids, a major effort is invested to keep control over the forms of diversity that are actually present in the deployed system.   Individual users aren't permitted to install their own applications, patches are applied in a coordinated way, and monitoring tools are used to notice unexpected behavior that could point to an intrusion.  In contrast, networks used in other settings need to deal with enormous levels of diversity and individual customization.  Like it or not, the network provider simply won't know what the network is being used to do.

It seems to me that this diversity accounts for the insecurity of modern systems, to a far greater degree than the "original sin" of endpoint anonymity and unauthenticated peering.  While the insecurity of the network certainly makes it easier for attackers to mount denial of service attacks or to route spam emails with falsified origin data, those are just the a small aspect of the larger problem.  And that larger problem centers on the exceptionally large attack surface that modern systems offer: hundreds of millions if not billions of lines of code, riddled with bugs, and configured in every imaginable way (just yesterday I was horrified to discover that my home router has an administrative login and password, both set to the defaults.  Now I'm wondering about my television, and my internet box...). 

Fixing the original sin of the Internet genuinely would help in some ways, especially as we move to deploy an Internet of Things with more and more devices playing a diversity of roles in homes, cars, highways and other settings.  We should take that step.  But it is an illusion to imagine that true software security can be achieved by hardening the network itself, because the extreme diversity of uses would overwhelm any systematic attempt to impose security standards and ensure that they are respected.  Attempting to do so would probably create some very frustrated users, and yet would at best raise the bar slightly for one facet of a complex, multi-faceted problem.

More interesting, I think, is to think about diversity as a constructive tool for protecting large, complex systems.  Fred Schneider and I once coauthored an editorial on this topic, and a short paper, and I continue to believe that this was one of our better efforts.  Synthetic diversity, in particular, is a remarkable tool for combatting the wily attacker, who often has a surprisingly limited arsenal of off the shelf exploits and might be blocked by even small "surprises".

The basic idea is simple: just as a compiler takes source code and then can create a variety of executables (depending on the target instruction set, operating system, optimization level selected, etc), we can also "compile" programs to expose various forms of artificial diversity.  We can force the memory layout to look somewhat random (simply by adding random padding to objects allocated dynamically).  We can vary the stack layout and the order in which inputs are delivered if a program receives data from multiple sources.  We can potentially compile one program several ways, and pick the version that will be running today at random within the resulting set.  We can randomize the numbering used for operating systems calls.

Such steps diversity and obfuscate the attackable surface.  The attacker who was using an exploit that overruns an input buffer in a way that triggers a system call to download and then execute a bot will now run into an unpredictable memory location, a buffer that might not overflow, and in any case the system call to read from the network socket might have been remapped to some normally-unassigned code.  These are steps that really can pay off.

VPN security offers opportunities too.  Obviously, surfing the web requires that there be no barriers.  But for an application like a medical system that manages patient records or interacts with a lab, there is no real reason to also be able to surface the web, and there is no reason that random web programs should be able to communicate to that application, either.  VPNs can offer this sort of protection, and if we could deploy them more easily, they could wrap sensitive code in layers of armor.

So count me on the side of those who believe that Internet security can and should be a priority, particularly in critical infrastructure and Internet of Things scenarios.   It really is about time to make Internet authentication a readily available and widely standard function, and to deploy VPN technologies much more actively.  But doing so won't solve the whole problem.  We should also make more of an effort to synthetically diversify the systems we deploy widely, so that the attacker will encounter a bewildering variety of "versions" of any particular software.   

If you can't build a single impregnable castle, the next best thing is to populate a huge city with every imaginable variation on the castle theme.  Put police checkpoints on the roads leading to the city water pumping and power generating systems.  Post signs in a few languages, including some you made up just for this purpose.   Good luck to that hacker: he might break into one system, but doing so won't get him very far before we root him out...

No comments:

Post a Comment

This blog is inactive as of early in 2020. Comments have been disabled, and will be rejected as spam.

Note: only a member of this blog may post a comment.