Mesos, Marathon, and Docker

How Yodel solved its deployment problems using Mesos and Marathon with Docker


Goals

Technologies used

Some of the technology used at Yodel

Service Discovery

Container

Docker

Scheduler/keepalive

Virtual layer

Mesos

  1. Master/slave topology
  2. One cluster per realm
    • PROD Mesos cluster
    • QA Mesos cluster
    • Staging Mesos cluster
  3. Includes ZooKeeper

CI

Atlassian Bamboo - CI

Monitoring

New Relic

Deploy scripts

In-house microservice called “Cerebro”, running in a Docker container deployed by Marathon on Mesos

Configuration

Dockerfile and Puppet

Logs

syslog (or another sensible logging solution) - the trade-off is no console log

Scheduling

MPI and Hadoop schedulers

Problems faced and solutions

Problem: Bad Actors

Some processes cause starvation

Solution: dynamic scaling/customizable services

Scaling

Scales from 3 to 3000 dynamically, for example through a UI

  1. A process is not allowed to consume more CPU or memory than it has been allocated
  2. No starvation

Resource constraints

  1. CPU (falls back to swap if set up) - Mesos will kill the process, and Marathon will take over and start it up again
  2. Memory constraint is soft - if the process is the only one running on the host, it can consume all the memory
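
As a rough illustration of how a per-instance CPU/memory allocation and an instance count might be declared, the sketch below builds a Marathon-style app definition and the body of a scale request. The app id, image name, and resource values are made up for illustration and are not Yodel's; the field names and the PUT /v2/apps path are assumed from Marathon's v2 REST API and should be checked against the version in use. Nothing is sent over the network.

```python
import json

def app_definition(app_id, image, cpus, mem_mb, instances):
    """Build a Marathon-style app definition for a Docker container."""
    return {
        "id": app_id,
        "cpus": cpus,            # CPU allocated to each instance
        "mem": mem_mb,           # memory (MB) allocated to each instance
        "instances": instances,  # how many copies should be running
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks for a dynamically assigned host port
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
    }

def scale_request(app_id, instances):
    """Return (method, path, body) for changing the instance count."""
    path = "/v2/apps/" + app_id.lstrip("/")
    return "PUT", path, json.dumps({"instances": instances})

# Hypothetical service: start with 3 instances, then ask for 3000.
app = app_definition("/example-service", "example/service:1.0", 0.5, 512, 3)
method, path, body = scale_request(app["id"], 3000)
print(json.dumps(app, indent=2))
print(method, path, body)
```

Scaling through a UI amounts to the same change: only the "instances" value differs, while the per-instance allocation stays fixed.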

Problem: explicit service discovery

Solution: implicit service discovery

Look up port 8080 and the request is distributed to the service instances running on the slaves:

  • service-instance-1 listening on 31003
  • service-instance-2 listening on 31090
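
One way to picture the mapping from the well-known port 8080 to the dynamically assigned ports is as a generated load-balancer section. The sketch below renders an HAProxy "listen" block from a list of backends. The host names are hypothetical, the ports are the two from the example above, and the way Yodel actually generates its proxy configuration is not described in these notes.

```python
def haproxy_listen_block(name, service_port, backends):
    """Render an HAProxy 'listen' section that balances one service port
    across the host:port pairs where instances are running."""
    lines = [
        f"listen {name}",
        f"    bind 0.0.0.0:{service_port}",
        "    mode http",
        "    balance roundrobin",
    ]
    for i, (host, port) in enumerate(backends, start=1):
        # 'check' enables HAProxy's health checking for this backend
        lines.append(f"    server {name}-instance-{i} {host}:{port} check")
    return "\n".join(lines)

# Hypothetical slave host names; ports taken from the notes.
backends = [("slave-1.example", 31003), ("slave-2.example", 31090)]
config = haproxy_listen_block("service", 8080, backends)
print(config)
```

Callers only ever need to know port 8080; the list of backends can be regenerated whenever instances move or scale.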

Problem: All-or-nothing deployment

If there are problems in production you have to roll back

Solution: Canary Deployments

Canary Isolated - Dip a toe into PROD and see if it’s working - deploy to PROD and check that the tests are passing

Canary Partial - Put a leg into PROD and see if it’s working - direct 10% of the traffic to it; if there are errors, roll back

Full Production - deploy to PROD with confidence
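
The three stages can be read as a small pipeline that stops and rolls back at the first unhealthy stage. The sketch below is only a model of that flow: the stage names and the 10% figure come from the notes, while the 0% and 100% traffic values, the callback names, and the simulated failure are assumptions made for illustration.

```python
def run_canary(stages, deploy, healthy, rollback):
    """Walk through the stages in order and roll back at the first failure.

    stages   -- list of (name, traffic_percent) tuples
    deploy   -- callable(name, traffic_percent) performing the rollout step
    healthy  -- callable(name) -> bool, e.g. tests pass or error rate is low
    rollback -- callable(name) invoked when a stage is unhealthy
    Returns (last stage reached, whether it succeeded).
    """
    for name, percent in stages:
        deploy(name, percent)
        if not healthy(name):
            rollback(name)
            return name, False
    return stages[-1][0], True

# Traffic shares for the first and last stage are assumed, not from the notes.
STAGES = [("canary-isolated", 0), ("canary-partial", 10), ("full-production", 100)]

log = []
result = run_canary(
    STAGES,
    deploy=lambda name, pct: log.append(f"deploy {name} at {pct}% traffic"),
    healthy=lambda name: name != "canary-partial",  # simulate errors at 10%
    rollback=lambda name: log.append(f"rollback from {name}"),
)
print(result)
print(log)
```

In this simulated run the partial stage fails, so the full production stage is never reached.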

Configuration management

Configuration management for infrastructure and application

Infrastructure level

Puppet manages the configuration of the Mesos masters and slaves (e.g. maintained by a system admin)

Application level

Developers configure their own Dockerfile (e.g. a Java developer)

Clean up

  1. 3 instances of the new version are brought up; once Marathon verifies that they are up and running, the old versions are taken out of rotation and Mesos tears them down
  2. Once the deployment has finished, the “Cerebro” daemon stops watching it and New Relic takes over
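
The ordering in the first step (new instances verified before old ones are removed) can be sketched as a small function. This is a simplified model, not how Marathon or Mesos implement it; the instance names are hypothetical, and the failure branch (keeping the old instances and discarding the new ones) is an assumption that the notes do not cover.

```python
def rolling_replace(old, new, is_healthy):
    """Return (in_rotation, torn_down) after trying to replace old with new.

    Old instances are removed only once every new instance is healthy.
    Otherwise (assumed behaviour) the old ones stay in rotation and the
    new ones are torn down.
    """
    if new and all(is_healthy(inst) for inst in new):
        return list(new), list(old)
    return list(old), list(new)

# Hypothetical instance names for the old and new versions.
old = ["svc-v1-a", "svc-v1-b", "svc-v1-c"]
new = ["svc-v2-a", "svc-v2-b", "svc-v2-c"]

in_rotation, torn_down = rolling_replace(old, new, is_healthy=lambda inst: True)
print(in_rotation)
print(torn_down)
```

Because the old instances stay in rotation until the check passes, capacity never drops below the original three during the switch.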

Deployed outside of mesos clusters

  1. Databases are hosted outside of Mesos
  2. Legacy services, e.g. those using Curator over ZooKeeper for the Thrift/legacy stack

Alternatives to Mesos

CoreOS Fleet

Developer environment

Q & A

  1. Why Qubit Bamboo and HAProxy?

HAProxy for canary

Alternatives to Qubit Bamboo

  1. Production topology
    • 4 Mesos slaves and 4 Mesos masters
    • 4 machines
    • 20 services 3 instances each
    • 60 docker containers
  2. Chronos
    • being evaluated for scheduling to replace cron
    • leader election is a horrible problem to solve with shell scripts
  3. Right now only long-running services
    • Short-running batch processes (e.g. Hadoop) shouldn’t conflict; the constraint is resources (CPU/memory)
  4. Reliable non-flaky deployments
    • “Cerebro” - in-house, runs as a Docker container through Mesos
    • Microservice architecture - there is no shared state, as each service has its own database
  5. Case study: a production process was falling over (memory issue)
    • Used Canary Isolated to continuously deploy and roll back

Architecture

[Figures: Architecture 1, Architecture 2, Architecture 3, Architecture 4]