Tuesday, 6 December 2016

RDMA[5]: Where does Derecho fit in this story?


“Blown away, blown away, blown away, blown away…” (Carrie Underwood)

RDMA is going to blow away conventional datacenter communications.  The speed and latency benefits are enormous. At Cornell we think the biggest wins are in situations where you want lots of copies of some sort of large data object, or where many nodes need to be informed of an event.
But it is very hard to program against the core RDMA API, which can leave you feeling like you are building a new device driver even for the smallest operations.  And the most popular tool, MPI, is deeply wedded to its HPC setting.  MPI expects to run on bare metal and can't handle virtualization.  It is gang-scheduled: all the application nodes are launched at once, and one is designated as a head node that controls the worker nodes in a step-by-step way.  Further, MPI isn't fault-tolerant and can't offer dynamic forms of consistency for replicated data.
This is the kind of problem we view as a research opportunity here at Cornell.
Accordingly, we’ve been building a new C++ software library for RDMA programming, called Derecho (we named it after a kind of horizontal tornado: a super-fast, straight-line windstorm).  Derecho is easy to code against and runs on Linux over a variety of RDMA-capable hardware (we’ve worked mostly with Mellanox, but have also done some experiments on QLogic networks).  It can also run over SoftRoCE, a software emulation of RDMA, which is currently much slower (down the road, some think it will evolve into the fastest non-hardware option).  SoftRoCE is useful because it can be used where no RDMA hardware is available.

Layered over this, Derecho brings strong consistency, fault-tolerance, and many other benefits. Unlike MPI, Derecho doesn’t require identical endpoints, doesn’t limit itself to a gang-scheduled model, supports fault-tolerance and strong consistency at scale, and is much more flexible about what an application might be doing.  Read the papers on http://www.cs.cornell.edu/ken to learn more, and download Derecho from https://github.com/Derecho-Project.
Derecho isn’t quite finished yet.  We first completed a version that supports just a single group at a time, which we called Derecho v1.  We then started work on Derecho v2, which will be our first release for a wide community.  We don’t think Derecho v1 is really ready for prime time, although we do share it with some friends.  Coding on Derecho v2 has just finished, and we are now debugging and testing it.  There is also a plan for a Derecho v3, but it is a small delta relative to v2, and in any case we want v2 to be stable before we tackle it.
Once we have a stable release of Derecho v2, we recommend just using our C++ programming API and not dealing with the hardware directly!  
