ElastiCluster

ElastiCluster aims to provide a user-friendly command-line tool to create, manage, and set up computing clusters hosted on cloud infrastructures such as Amazon's Elastic Compute Cloud (EC2), Google Compute Engine, or a private OpenStack cloud. Its main goal is to get your compute cluster up and running with just a few commands.

How it works

The architecture of ElastiCluster is quite simple: a configuration file defines a set of cluster templates, together with the information needed to access a specific cloud web service.
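
The configuration is a plain INI-style file built from four kinds of sections that reference each other by name. A minimal sketch (the names `mycloud`, `mylogin`, `mysetup`, and `mytemplate` are placeholders; full working examples follow below):

[cloud/mycloud]
provider=openstack
# ...endpoint and credentials for the cloud web service...

[login/mylogin]
image_user=ubuntu
# ...SSH user and key pair used to log in to the VMs...

[setup/mysetup]
provider=ansible
# ...which Ansible groups each node class belongs to...

[cluster/mytemplate]
cloud=mycloud
login=mylogin
setup=mysetup
# ...image, flavor, and number of nodes of each class...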

Using the command line (or, very soon, a simple API), you can start a cluster. ElastiCluster will connect to the desired cloud, start the virtual machines and wait until they are accessible via SSH.

After all the virtual machines are up and running, ElastiCluster will use Ansible to configure them.
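
In practice, the whole lifecycle is driven by a handful of subcommands. A sketch of a typical session, assuming a cluster template named `mytemplate` as in the snippet above (`setup` re-runs the Ansible playbooks on an already-running cluster):

$ elasticluster start mytemplate -n mycluster   # create and configure the VMs
$ elasticluster list-nodes mycluster            # show the nodes and their IP addresses
$ elasticluster setup mycluster                 # re-run the Ansible playbooks if needed
$ elasticluster ssh mycluster                   # log in to the front-end node
$ elasticluster stop mycluster                  # destroy the cluster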

Features

ElastiCluster provides automated setup of several cluster types; the examples below cover a SLURM batch-queuing cluster, a Hadoop/Spark cluster with HDFS, and a single-node JupyterHub server.

ElastiCluster is in active development and offers the following features at the moment:

  • Simple configuration file to define cluster templates.
  • Grow and shrink a running cluster (see the sketch after this list).
  • Start and manage multiple independent clusters at the same time.

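Growing a cluster uses the `resize` subcommand (also shown in the examples below); individual nodes can be removed with `remove-node`. A sketch, assuming a running cluster named `mycluster` with a `compute` node class and a node called `compute005`:

$ elasticluster resize mycluster -a 2:compute      # add 2 more compute nodes
$ elasticluster remove-node mycluster compute005   # shrink by removing one node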

Sample Usage

The following samples show how clusters are configured with ElastiCluster and the basic commands to interact with them through the command-line interface. For a full description of the configuration and command-line interface, see the documentation. It is also possible to bundle all the examples below in a single configuration file.

(You can see more configuration examples in the examples/ directory of the source tree.)
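
By default, ElastiCluster reads its configuration from `~/.elasticluster/config` (unless a different file is given on the command line), so a quick way to try the snippets below is to paste them there and check that the templates are recognized:

$ mkdir -p ~/.elasticluster
$ $EDITOR ~/.elasticluster/config    # paste one of the configurations below
$ elasticluster list-templates       # verify that the cluster templates are found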

This example shows how to set up a SLURM batch-queuing cluster with 4 compute nodes and a single front-end node (which also acts as NFS server for the shared home directories). More details on the SLURM configuration can be found in the documentation.

[cluster/slurm]
cloud=openstack
login=ubuntu
setup_provider=slurm
security_group=default
# Ubuntu image
image_id=16618a82-92fd-4615-86e6-d354f9f66af5
flavor=4cpu-16ram-hpc
frontend_nodes=1
compute_nodes=4
ssh_to=frontend

[cloud/openstack]
provider=openstack
auth_url=http://openstack.example.org:5000/v2.0
username=****REPLACE WITH YOUR USERNAME****
password=****REPLACE WITH YOUR PASSWORD****
project_name=****REPLACE WITH YOUR PROJECT/TENANT NAME****

[login/ubuntu]
image_user=ubuntu
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/id_rsa
user_key_public=~/.ssh/id_rsa.pub

[setup/slurm]
provider=ansible
frontend_groups=slurm_master
compute_groups=slurm_worker
                      
Start a SLURM cluster with the name `mycluster`
$ elasticluster start slurm -n mycluster
List nodes
$ elasticluster list-nodes mycluster
Grow cluster by 10 compute nodes
$ elasticluster resize mycluster -a 10:compute
SSH into frontend node
$ elasticluster ssh mycluster
SFTP shell to front-end node
$ elasticluster sftp mycluster
Destroy cluster
$ elasticluster stop mycluster
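
Once logged in to the front-end node (via `elasticluster ssh mycluster`), jobs are submitted with the standard SLURM tools. A minimal sketch (partition names depend on the generated SLURM configuration):

$ sinfo                                   # show partitions and compute nodes
$ srun -N 2 hostname                      # run a command on 2 compute nodes
$ sbatch --wrap 'hostname && sleep 60'    # submit a small batch job
$ squeue                                  # check the job queue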

This example shows how to set up a Hadoop 2.x cluster, complete with HDFS. Each of the 8 "worker" nodes is both an HDFS data node and a YARN execution node. The single "master" node acts as the YARN resource manager and the HDFS name node.

Spark (together with Python and R support) is installed on top of Hadoop YARN; as soon as the cluster is up, it is possible to log in to the "master" node and start submitting Spark or Map/Reduce jobs.
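
For instance, once logged in to the master node, the stock Spark example job can be submitted to YARN along these lines (a sketch; the exact path to the Spark examples JAR depends on the Spark version and installation layout):

$ hdfs dfs -ls /                              # check that HDFS is up
$ spark-submit --master yarn --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      /usr/lib/spark/examples/jars/spark-examples*.jar 100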

More details on the Hadoop/Spark configuration can be found in the documentation.

[cluster/hadoop]
cloud=amazon-us-east-1
login=ubuntu
setup=hadoop
security_group=all_tcp_ports
image_id=ami-00000048
flavor=m1.small
master_nodes=1
worker_nodes=8
ssh_to=master

[cloud/amazon-us-east-1]
provider=ec2_boto
ec2_url=https://ec2.us-east-1.amazonaws.com
ec2_access_key=****REPLACE WITH YOUR ACCESS ID****
ec2_secret_key=****REPLACE WITH YOUR SECRET KEY****
ec2_region=us-east-1

[login/ubuntu]
image_user=ubuntu

[setup/hadoop]
provider=ansible
master_groups=hadoop_master
worker_groups=hadoop_worker
                            
Start a Hadoop+Spark cluster with the name `cluster1`
$ elasticluster start hadoop -n cluster1
Start another cluster just for the fun of it
$ elasticluster start hadoop -n cluster2
List all clusters
$ elasticluster list
List nodes
$ elasticluster list-nodes cluster1
Grow cluster by 10 worker nodes
$ elasticluster resize cluster1 -a 10:worker
SSH into master node
$ elasticluster ssh cluster1
SFTP shell to master node
$ elasticluster sftp cluster1
Destroy both clusters
$ elasticluster stop cluster1 cluster2

This example shows how to set up a single-node JupyterHub server: ElastiCluster will provision a virtual machine, install Jupyter/IPython, and configure the JupyterHub server to run as a service on the default HTTPS port (443). (By default, a self-signed TLS/SSL certificate is used for HTTPS.)

[cluster/jupyterhub]
cloud=google
login=google
setup=jupyterhub
security_group=tcp_port_443
image_id=debian-8-jessie-v20170124
flavor=n1-standard-1
server_nodes=1
ssh_to=server
image_userdata=

[cloud/google]
provider=google
gce_project_id=****REPLACE WITH YOUR PROJECT ID****
gce_client_id=****REPLACE WITH YOUR CLIENT ID****
gce_client_secret=****REPLACE WITH YOUR SECRET KEY****

[login/google]
image_user=****REPLACE WITH YOUR GOOGLE ID****
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/id_rsa
user_key_public=~/.ssh/id_rsa.pub

[setup/jupyterhub]
provider=ansible
server_groups=jupyterhub
                      
Start JupyterHub server
$ elasticluster start jupyterhub
SSH into server
$ elasticluster ssh jupyterhub
Open JupyterHub login page
Open https://server.vm/ in your browser (replace server.vm with the server's public IP address, as shown by `elasticluster list-nodes jupyterhub`) and accept the self-signed TLS/SSL certificate.
Destroy server
$ elasticluster stop jupyterhub

Interested? Find out more!

Complete documentation for ElastiCluster (including installation instructions) is available at http://elasticluster.readthedocs.io/

General discussion of ElastiCluster's usage, features, and bugs takes place on the elasticluster@googlegroups.com mailing list (only subscribers can post).

A real-time chat and support line is hosted at http://gitter.im/elasticluster/chat (with an IRC-compatible interface at http://irc.gitter.im).

University of Zurich

This project is an effort of S3IT (Services and Support for Science IT), a unit of the University of Zurich, and is licensed under the GNU General Public License version 3.
