Mesos, Omega, Borg: A Survey (umbrant.com)
80 points by r4um on June 1, 2015 | hide | past | favorite | 13 comments


One important point that the author seems to have misunderstood is that Borg was the predecessor to the other two systems, not the successor. Borg went into production (running a bunch of websearch dedicated clusters) in late 2004, long before Mesos or Omega were around. Omega is/was an experimental replacement for Borg that was started much later, although I'm not sure how much production load it actually took over.


Author here. I ordered them by publication date and evaluated them based on the contents of the papers. My summary wasn't meant to be a substitute for actually reading them, either. The Borg paper states that it started out as a centralized scheduler and has evolved over time, and that it's been in production for over a decade. Clearly, it predates Mesos and Omega.

I've heard varied things about the use of Omega in production. The Borg paper mentions that it runs 98% of machines at Google, but that number is apparently dated. One person said that Omega runs all the batch work, and is being rolled out further. However, I've also heard it's being phased out.


We did try to write a paper on Borg way back in ~2008 but it got bogged down by internal disagreements on the style and approach ...


I initially thought the same thing because the author commits this error in the opening act. But later he seems to understand that Borg absorbed the good ideas of Omega, and that Borg is the one that exists in practice.


See also Google's blog post summarizing Borg -> Kubernetes improvements.

http://blog.kubernetes.io/2015/04/borg-predecessor-to-kubern...


It's interesting to see how other industries tackle the same problem.

VFX has essentially the same problem as Google: a huge batch of tasks that all need to run at once.

However, VFX studios tend to have only one data center, so they don't need or want a clustered scheduler.

OpenRenderManagement (https://github.com/mikrosimage/openrendermanagement), Alfred and Tractor from Pixar, and Framestore's FQ (which is faster and more efficient than Borg at job dispatch) are a few good examples of task management.


I know a lot about Mesos and Mesosphere's DCOS, so can comment on those:

* There are users of these systems that get 90+% cluster utilization.

* Preemptible tasks (i.e., best-effort scheduling vs. guaranteed-SLA scheduling) will be landing in Mesos.

* Mesosphere is building advanced scheduling plug-ins that will use the new scheduling models to do oversubscription of a cluster, helping to drive utilization to the 90%+ range without the need for any special tooling. You can get an idea of some of the algorithms being employed by checking out the Kozyrakis/Delimitrou Quasar paper[1].

[1] http://csl.stanford.edu/~christos/publications/2014.quasar.a...
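As a toy illustration of what oversubscription means here (this is not Mesos code; the names and the safety margin are invented for the sketch): the scheduler estimates the slack between what guaranteed-SLA tasks have reserved and what they are actually using, and offers that slack to revocable, best-effort tasks.

```python
# Toy sketch of oversubscription, NOT real Mesos code.
from dataclasses import dataclass

@dataclass
class Node:
    cpus_reserved: float   # CPUs promised to guaranteed-SLA tasks
    cpus_used: float       # CPUs those tasks are actually using

def revocable_offer(node: Node, safety_margin: float = 0.25) -> float:
    """CPUs that can be offered to best-effort tasks right now.

    A fraction of the reservation is held back so guaranteed tasks
    can burst without immediately forcing preemption of the
    best-effort work.
    """
    slack = node.cpus_reserved - node.cpus_used
    return max(0.0, slack - safety_margin * node.cpus_reserved)

# A node with 16 reserved CPUs but only 4 in use can lend much of
# the idle capacity to revocable tasks.
print(revocable_offer(Node(cpus_reserved=16, cpus_used=4)))  # 8.0
```

The Quasar paper goes much further, classifying workloads to predict how much interference a co-located task can tolerate, but the slack-minus-margin idea above is the basic shape of it.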


Is anyone using these at scale but with a small team to support it? We have a fleet of 5-6k servers across 3 DCs, plus another 1.5k in AWS. I tried deploying Mesos with mixed results. I also experimented with CoreOS. Considering re-exploring Xen/VMware.


I'm not a sysadmin but recently started using CoreOS to deploy small web apps. Could anyone explain to me like I'm 5 what's the difference between those cluster schedulers and something like CoreOS' fleet (https://github.com/coreos/fleet)?


The fleet maintainers have taken a hard stance on keeping fleet simple with "just enough" features. This line has been worked out to be resource scheduling -- fleet will stick to unit colocation, simple fan out via conflicts, machine metadata and pinning to a specific machine id. If you need to do hardcore bin packing, using one of the other schedulers is going to help you with that goal. Fleet is a great tool for bootstrapping those schedulers. For example, in Tectonic, fleet is used to run the Kubernetes control plane and related services.
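To make the features listed above concrete (an illustrative sketch; the unit name and image are made up), fleet's scheduling hints live in an [X-Fleet] section of an otherwise ordinary systemd unit:

```ini
# web@.service -- illustrative fleet unit, names invented
[Unit]
Description=Web app instance %i

[Service]
ExecStart=/usr/bin/docker run --rm --name web-%i myorg/web

[X-Fleet]
# "simple fan out via conflicts": never co-schedule two instances
Conflicts=web@*.service
# "machine metadata": only land on machines tagged as frontends
MachineMetadata=role=frontend
```

Pinning to a specific machine uses MachineID= in the same section. Note there is no bin-packing knob at all, which is exactly the point being made here.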


I think they are trying to achieve the same thing, with differences in the API and the richness of each ecosystem.

http://www.slideshare.net/teemow1/container-orchestration

https://groups.google.com/forum/#!msg/coreos-dev/nHK8irdnmM0...


Actually, having spoken with one of the CoreOS guys recently about this, it seems that their concerns are a bit lower-level. Where these resource managers actually do concern themselves with the problem space of resource management as well as orchestration, Fleet takes the position of a "distributed systemd", without much else in terms of provided porcelain.

For those of us who are old school HPC people, it's probably more reasonable to think of Fleet as a dynamic always-running manifestation of xCAT or, perhaps, Fabric (of fabfile.py fame) or similar. For example, I've heard of people installing and running Mesos with it, which might seem like a bit of cluster scheduler self-satisfaction at first until one understands the reasoning.


Fleet is only for running services. It is effectively a distributed systemd.

Mesos can schedule resources for multiple different frameworks. It can run services with Marathon or Aurora frameworks. But it can also run cron jobs and batch pipelines with Chronos or Aurora. It can also schedule the distributed tasks for Hadoop or Spark.
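For a sense of the service-running side (a sketch; the app id and command are made up), a minimal Marathon app definition is just a JSON document POSTed to Marathon's /v2/apps endpoint; Marathon then accepts matching Mesos resource offers and keeps the requested number of instances running:

```json
{
  "id": "/sketch/web",
  "cmd": "python -m http.server $PORT0",
  "cpus": 0.5,
  "mem": 128,
  "instances": 3
}
```

Chronos job definitions look similar but carry an ISO 8601 repeating-interval schedule field instead of a long-running instance count, which is the difference between the "services" and "cron jobs" use cases mentioned above.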



