Since Curalate began three years ago, our build and deploy pipeline has changed immensely. From a manual process run locally on our laptops to an automated system consisting of Jenkins, Packer, Chef, and Asgard, the progression has given us confidence in the system and allowed us to develop and deploy ever faster. In this post I’ll talk about how we build and deploy our code at Curalate. We’ll cover where we started three years ago, where we are now, and what the future holds.
Like most young startups, Curalate did not give its build and deploy process much attention at first. We used whatever could get the code out the door quickly and easily, and things worked moderately well initially. This makes sense: as a young company you have to focus on your product and don’t have the luxury of devoting weeks or months to fine-tuning your build process. A small team and codebase also allow this process to be fairly ad hoc. To that end, our initial process for deploying applications consisted of:
- Compiling code locally
- Uploading the build artifacts to a private S3 bucket
- Re-launching the relevant instances
Since our entire infrastructure runs on Amazon Web Services, we were able to leverage the EC2 userdata feature. The instances ran vanilla Amazon Linux Amazon Machine Images (AMIs), and our custom userdata would download the build artifacts from S3 and launch the daemons at boot time. With the help of a few simple scripts this was quick and kept us moving along for a while.
As both the number of applications and the size of the team grew, this approach started to break down. The two biggest problems were the reliance on a manual (and therefore error-prone) process of building locally and the inability to quickly roll back a bad deploy. The latter problem would exacerbate the former. Missing a build flag, using the wrong version, etc. are all mistakes we made while building and deploying locally, and all are things that a standardized, repeatable build process would solve. While it’s true that these mistakes could have been addressed with another script or a change to the EC2 userdata, we decided it was high time to invest in a proper build and deployment pipeline.
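To make the original boot-time approach concrete, here is a minimal sketch of how a userdata script of that kind could be generated. It is an illustration, not our actual script: the function name, bucket, paths, and start command are all hypothetical, and it assumes the AWS CLI (`aws s3 cp`) is available on the instance.

```python
def render_userdata(bucket: str, artifact_key: str, install_dir: str, start_command: str) -> str:
    """Return a shell script to pass as EC2 userdata.

    All arguments are illustrative placeholders. The script downloads one
    build artifact from S3 and starts a daemon in the background at boot.
    """
    # Use the last path segment of the S3 key as the local file name.
    artifact_name = artifact_key.rsplit("/", 1)[-1]
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        f"mkdir -p {install_dir}",
        # Assumes the AWS CLI is installed and the instance can read the bucket.
        f"aws s3 cp s3://{bucket}/{artifact_key} {install_dir}/{artifact_name}",
        f"cd {install_dir}",
        # Launch the daemon detached from the boot script.
        f"nohup {start_command} > {install_dir}/daemon.log 2>&1 &",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example values.
script = render_userdata(
    bucket="example-builds",
    artifact_key="api/api-1.0.jar",
    install_dir="/opt/app",
    start_command="java -jar api-1.0.jar",
)
```

Note that nothing in a script like this records which build ended up on which instance, which is part of why rolling back a bad deploy was slow.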
When designing our next generation build and deployment pipeline we had several goals:
- Centralized deployment dashboard
- Fast deployment and rollback
- Scale to many applications
- Immutable build artifacts
With those goals in mind and the desire to avoid reinventing the wheel, we started to look at existing open source tools. Having used both Capistrano and Fabric in the past, I realized that, while they are great tools, we were looking for something more full-featured. Deployinator from Etsy solves a few of our challenges but ultimately relies on updating code from source on the target machines. As we’re a Scala-based shop this didn’t make as much sense, since the deployment artifacts (JAR/WAR) are already built. Deploymacy from Yammer sounds promising but it doesn’t seem like it will be open-sourced any time soon. That left us with Asgard from Netflix, and further investigation revealed that Asgard would meet all of the goals mentioned above.
At a high level our current pipeline can be summarized in the below graphic. Concretely, this means our source is checked into GitHub, Jenkins builds JAR/WAR artifacts, and those artifacts are packaged into AMIs to be deployed.
Jenkins is the backbone of the build portion of the pipeline, and its importance and utilization have only grown since our initial rollout. We use the Amazon EC2 plugin to automatically provision and terminate build nodes on demand, which keeps costs low. Using the GitHub pull request builder plugin, pull requests from GitHub are automatically retrieved, built, and checked to ensure that tests pass and that they conform to our coding standards. For releases, Jenkins creates a tag and pushes it to GitHub, deploys the resulting artifacts to our internal Maven repository, and then kicks off a Packer run for each of the applications to be built. The Build Flow plugin offers a very nice DSL for designing complex build pipelines in code.
Packer is invoked by Jenkins, runs our Chef recipes, and bakes an AMI with the desired software and configuration. We followed Netflix’s approach to building AMIs as detailed in their blog post on Aminator. We start with a Foundation AMI: this is strictly a pristine OS installation (e.g., Ubuntu 14.04 LTS) with no extra software, no customization, etc. This mostly exists so that we do not have to rely upon a public cloud image that may disappear at any time. Next is the Base AMI, which is built from the Foundation AMI and installs common software and tools that are needed across all our instances. Think screen, a specific JVM, etc. Finally, we bake custom AMIs on top of the Base AMI for each version of our applications. Packer and Chef make this whole complex process easy and repeatable.
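The Foundation, Base, and application layering described above can be sketched as a small helper that emits Packer templates, each one using the previous layer as its `source_ami`. This is an assumption-laden illustration rather than our real configuration: the AMI IDs, AMI names, region, instance type, SSH user, cookbook path, and Chef run lists are all placeholders. The only Packer pieces relied on are the `amazon-ebs` builder and the `chef-solo` provisioner.

```python
import json


def packer_template(source_ami: str, ami_name: str, run_list: list, region: str = "us-east-1") -> dict:
    """Build a Packer template (as a dict) that bakes one AMI layer.

    source_ami is the parent layer; run_list names the Chef recipes to apply.
    Region, instance type, and SSH user are illustrative defaults.
    """
    return {
        "builders": [{
            "type": "amazon-ebs",
            "region": region,
            "source_ami": source_ami,
            "instance_type": "t2.micro",
            "ssh_username": "ubuntu",
            "ami_name": ami_name,
        }],
        "provisioners": [{
            "type": "chef-solo",
            "cookbook_paths": ["cookbooks"],
            "run_list": run_list,
        }],
    }


# Hypothetical IDs: in practice each ID comes from the previous Packer run.
FOUNDATION_AMI = "ami-foundation-placeholder"
BASE_AMI = "ami-base-placeholder"

# Base layer: common tools on top of the pristine Foundation image.
base = packer_template(FOUNDATION_AMI, "base-2015-01", ["recipe[common]"])
# Application layer: one AMI per application version on top of Base.
app = packer_template(BASE_AMI, "api-1.4.0", ["recipe[api]"])

# Packer consumes the template as JSON.
app_json = json.dumps(app, indent=2)
```

Because only the application layer changes with each release, the slower Foundation and Base bakes do not need to be repeated for every deploy.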
Finally, the “deploy” part of the build and deploy pipeline is handled by Netflix’s Asgard. As the AMI is the unit of deployment, a new version of an application is deployed by creating a Launch Configuration with the new AMI, assigning this Launch Configuration to a new Autoscaling Group (ASG), and sizing the new ASG appropriately. If the application is a web app or service, the new ASG is put into service simultaneously with the old version. Once the new ASG is scaled up properly and all instances are healthy the remaining traffic is shifted to the new ASG. At this point the old ASG is scaled down and can be safely deleted. Asgard’s Automated Deployment feature handles this workflow with ease.
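The rollout sequence above can be modeled as a small in-memory simulation. This sketch does not call Asgard or AWS at all. The `Asg` class, the `deploy` function, and the health-check callback are hypothetical names used only to show the order of operations and why rollback is cheap: the old ASG stays untouched until the new one is healthy.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Asg:
    """Toy stand-in for an Auto Scaling group; not an AWS or Asgard type."""
    name: str
    ami: str
    desired: int = 0
    in_service: bool = False


def deploy(old: Asg, new_name: str, new_ami: str,
           count_healthy: Callable[[Asg], int]) -> Tuple[Asg, Asg]:
    """Roll out new_ami next to old; return (serving_asg, drained_asg)."""
    # 1. Create the new ASG from the new AMI, sized like the old one,
    #    and put it into service alongside the old version.
    new = Asg(name=new_name, ami=new_ami, desired=old.desired, in_service=True)

    # 2. If the new instances are not all healthy, roll back by draining
    #    the new ASG. The old ASG was never touched.
    if count_healthy(new) < new.desired:
        new.in_service = False
        new.desired = 0
        return old, new

    # 3. Shift the remaining traffic to the new ASG and scale the old one
    #    down. It can now be deleted safely.
    old.in_service = False
    old.desired = 0
    return new, old


# Successful rollout: every new instance reports healthy.
serving, drained = deploy(
    Asg("api-v001", "ami-old", desired=3, in_service=True),
    "api-v002", "ami-new",
    count_healthy=lambda asg: asg.desired,
)

# Failed rollout: no new instance becomes healthy, so the old ASG keeps serving.
kept, rolled_back = deploy(
    Asg("api-v001", "ami-old", desired=3, in_service=True),
    "api-v002", "ami-bad",
    count_healthy=lambda asg: 0,
)
```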
Source code is checked into GitHub, Jenkins builds artifacts from it, and the Packer/Chef combination turns those artifacts into deployable AMIs. Asgard is then used to create an ASG with the new AMI and the code is rolled out into production. The key point here is that at every step the output is immutable: the git tag, the JAR/WAR, and the AMI never change once they are built.
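One way to picture the immutability property is as a chain of frozen records, where each artifact points back to the one it was built from. The sketch below is purely illustrative: the class names, field names, and values are hypothetical and do not describe how Jenkins, Packer, or AWS store this information.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GitTag:
    name: str
    commit: str


@dataclass(frozen=True)
class Jar:
    filename: str
    built_from: GitTag


@dataclass(frozen=True)
class Ami:
    ami_id: str
    contains: Jar


# Hypothetical release: tag -> JAR -> AMI.
tag = GitTag("v1.4.0", "abc1234")
jar = Jar("api-1.4.0.jar", tag)
ami = Ami("ami-placeholder", jar)


def try_modify(image: Ami) -> bool:
    """Attempt to change a built artifact in place; return True if it worked."""
    try:
        image.ami_id = "ami-other"
    except FrozenInstanceError:
        # Frozen dataclasses reject attribute assignment after construction.
        return False
    return True


modified = try_modify(ami)
```

With a chain like this, any running AMI can be traced back to an exact JAR and git tag, and a fix always produces a new tag, a new JAR, and a new AMI instead of altering an existing one.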
Our build and deploy process has served us well over the last year, and the properties it enforces (immutable artifacts, machine images as the unit of deployment, ASGs) are patterns we will continue to rely on heavily. It has allowed us to develop and deploy code faster, more reliably, and with more confidence while giving us the flexibility to start breaking our monolithic repository into discrete applications. That said, the build and deployment landscape is changing rapidly with the advent of containers and unikernels, and we’re eagerly evaluating what our next version will look like. On the deployment side, new tools like Terraform promise to make the “infrastructure as code” phrase a reality. Netflix has recently released Spinnaker, the successor to Asgard, which builds upon the above properties but takes a more comprehensive approach. Its ability to build Docker images as well as AMIs provides the flexibility to migrate to containers in the future, something we’ve been looking at recently. The introduction of Blue-Green deployments as a first-class citizen is also a welcome addition to the project. Needless to say, the build and deployment ecosystem is flourishing and we’re excited about what the future holds.
If working on any of the above challenges sounds interesting, we’re hiring in Philadelphia, Seattle, and New York!