A Few Thoughts on Distributed Computing: Derecho discussion thread

Tuesday, 28 November 2017

As requested by Scott... Scott, how about reposting your questions here?
===

Scott Lewis, 28 November 2017 at 16:04

Hi Ken.

Thanks very much for the info. I wasn't familiar with LibFabrics. Since I don't have access to RDMA hardware right now, I think I will wait for Derecho over LibFabrics. How will I know when it is available?

In the meantime, a question: you mentioned in one of your blog posts that it would be difficult to use languages other than C++ with the Derecho replicated object API. Could you comment on why that would be, e.g. relative to Python or Java?

To share one thought: the OSGi (Java) concept of a service, or rather a remote service, is conceptually similar. OSGi services are plain ol' object instances that are managed by a local broker (aka service registry) and accessed through interfaces.

One thing that makes OSGi services different from Java objects is support for dynamics, i.e. OSGi services can come and go at runtime.

Which brings me to another question: is it possible for replicated object instances to be created and destroyed at runtime within a Derecho process group?

BTW, if there is a Derecho mailing list that is more appropriate than your blog for such questions, please just send me there.

Scott

25 comments:

Ken Birman, 28 November 2017 at 20:05
Java/Python/Rust/...: These languages all do memory management, so any data sent via RDMA would have to be copied to a pinned memory page first. Copying is 3x slower than 100Gb RDMA, so copying twice (sender copy, then RDMA, then receiver copy) is a path at least 7x slower than just using RDMA directly from memory, as we can do from C++. In practice it would be more like 25x because of the need to create new objects on the receiver side.
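
The 7x figure can be reproduced with simple addition. What follows is a minimal sketch, assuming costs are measured in units of one RDMA transfer and that a memory copy costs 3 such units, as stated above. The names `RDMA_COST`, `COPY_COST` and `path_cost` are illustrative only, and the 25x estimate is not modelled because the comment gives no breakdown for it.

```python
# Relative cost model; the unit is "time for one 100Gb RDMA transfer".
RDMA_COST = 1.0   # baseline
COPY_COST = 3.0   # from the comment: a memory copy is about 3x slower


def path_cost(copies: int) -> float:
    """Relative cost of a path with `copies` memory copies plus one RDMA transfer."""
    return copies * COPY_COST + RDMA_COST


direct = path_cost(copies=0)    # RDMA straight from application memory
managed = path_cost(copies=2)   # copy to pinned page, RDMA, copy out on the receiver
slowdown = managed / direct     # 3 + 1 + 3 = 7 units versus 1 unit

print(f"direct={direct}, managed={managed}, slowdown={slowdown}x")
```

The model treats the three steps as strictly sequential, which matches the "at least" wording: any overlap or extra allocation work would change the ratio.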

A further issue is that we use C++ generics (templates) for our Derecho API. The Java type system isn't the same as the C++ type system, so there is no simple way to call a C++ generic from Java. Python doesn't even have static types...
Ken Birman, 28 November 2017 at 20:06
OSGi: These client/server boundaries are expensive! LibFabrics links directly to C++, and just maps to the cheapest option available: RDMA if we have the hardware, TCP if not...
Ken Birman, 28 November 2017 at 20:07
Dynamic object creation: Yes, by changing the group "view" and having the mapping function do a new membership assignment at that point. You can't do it without a view change. But view changes are fast (150ms).
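
To make the idea concrete, here is a conceptual sketch of objects appearing and disappearing only at view changes. It is not Derecho's API: `View`, `mapping`, `install_view`, the `wanted_objects` field and the round-robin placement are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    view_id: int
    members: tuple          # node ids in the group
    wanted_objects: tuple   # hypothetical: objects that should exist in this view


def mapping(view: View, replicas: int = 2) -> dict:
    """Hypothetical mapping function: assign each object to some members, round-robin."""
    n = len(view.members)
    return {
        name: tuple(view.members[(i + k) % n] for k in range(min(replicas, n)))
        for i, name in enumerate(view.wanted_objects)
    }


def install_view(old: View, members, wanted_objects):
    """A view change: bump the view id and recompute the whole assignment."""
    new = View(old.view_id + 1, tuple(members), tuple(wanted_objects))
    return new, mapping(new)


v0 = View(0, ("n0", "n1", "n2"), ("cache",))
a0 = mapping(v0)
v1, a1 = install_view(v0, v0.members, ("cache", "log"))  # "log" is created
v2, a2 = install_view(v1, v1.members, ("log",))          # "cache" is destroyed
```

In this sketch the set of live objects is a pure function of the current view, so there is no way to add or remove one without installing a new view, which mirrors the constraint described above.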
Ken Birman, 29 November 2017 at 04:57
Availability of the LibFabrics version: Weijia Song is currently wrapping up his experiments to understand exactly how the library works, how MVAPICH (MPI) uses it and is able to configure itself semi-automatically, etc. Then the plan is to port FFFS, which Weijia wrote and which can be used standalone (it runs in our GridCloud platform, and v2 might use Derecho, but FFFSv1 is pretty stable and has an RDMA option). If that goes easily, we will port Derecho next. I'm guessing early February?

The project is open and we would welcome contributors but honestly, there are just three key developers who really know their stuff and all of them have priorities of various kinds tied to their research and career goals. So we do plan to do this soon, but it may not occur tomorrow.
Anand Shah, 1 December 2017 at 15:53
I have the hardware: the switches, the Chelsio cards and also some Mellanox cards, all of which support RDMA over Ethernet, not InfiniBand. I am willing to help (I am not a high-end programmer, but I can provide quality assurance, testing, a test lab, etc.).

How can I help, and whom can I get in touch with?

https://twitter.com/apscomp
This blog is inactive as of early in 2020. Comments have been disabled, and will be rejected as spam.
