Say you want to track the health of your API. Pingdom is probably your first move. But what if you want to track thousands of endpoints? Suddenly you're looking at a monthly bill of over $500! If you're willing to build an API health tracker yourself, this blog post is for you. Not only can we beat $500 per month, we can build our API health tracker for damn near free!
First, let's clear something up. We love Pingdom at Curalate. We've been using Pingdom for almost three years to track the uptime and latency of our major products, and we will continue to use it for the foreseeable future. But our product has reached a point where we can break an endpoint for just one client without that showing up in our overall API health. We'd like to check that data is flowing for each of our clients.
So let’s complement Pingdom with a lightweight solution that pings endpoints for all of our clients. And let’s call it Hotline Ping.
Choosing AWS Lambda
One major goal here is to run this health check regularly and frequently. We have a cluster of machines that runs scheduled jobs, but because the job queue length varies, there is no guaranteed start time for any given job. We could use something like cron to launch the health tracker at an exact time, but a single cron host is a single point of failure. Neither option is good enough for a production monitoring system.
We could also spin up a standalone service, but making it resilient would require multiple servers and extra engineering and maintenance. That adds up to some real costs.
AWS Lambda is built to handle exactly this type of work: regularly scheduled but low-density. Why pay for a ton of unused CPU cycles? Also, Node.js is well suited to pinging a long list of URLs, because its non-blocking I/O lets a single process keep many requests in flight at once, and Lambda supports Node natively. Lambda essentially lets you spin up a mini-instance to run your function once, then spin it down. Only need to run it once a day? Schedule it with Amazon's built-in scheduled events and pay only for the compute time that one daily run takes. Need to run the function many times at once? Lambda runs concurrent invocations in parallel, each in its own instance, up to your account's concurrency limit.
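To make the Node point concrete: because Node's I/O is non-blocking, one Lambda invocation can keep many HTTP requests in flight while it waits on slow endpoints. The following is a minimal sketch, not Hotline Ping's actual code, assuming a current Node.js runtime (18 or later, for the built-in `fetch` and `AbortSignal.timeout`) and treating any 2xx response within the timeout as healthy. The names `pingUrl` and `pingAll`, the 10-second timeout, and the default cap of 20 concurrent requests are illustrative choices.

```javascript
"use strict";

// Ping one URL. "Healthy" here (an assumption) means a 2xx status before the timeout.
async function pingUrl(url, { fetchFn = globalThis.fetch, timeoutMs = 10000 } = {}) {
  try {
    const res = await fetchFn(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { url, healthy: res.ok, status: res.status };
  } catch (err) {
    // Network errors and timeouts both count as unhealthy.
    return { url, healthy: false, status: null, error: String(err) };
  }
}

// Ping every URL with at most `concurrency` requests in flight at once.
async function pingAll(urls, { concurrency = 20, ...opts } = {}) {
  const results = new Array(urls.length);
  let next = 0;
  // Each worker claims the next unclaimed index until none are left. There is
  // no await between the check and the increment, so no index is claimed twice.
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      results[i] = await pingUrl(urls[i], opts);
    }
  }
  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, worker);
  await Promise.all(workers);
  return results; // same order as `urls`
}

module.exports = { pingUrl, pingAll };
```

Capping concurrency keeps a list of thousands of URLs from opening thousands of sockets at once, while a fixed pool of workers pulling from a shared index keeps the results in input order.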
Lastly, we were looking for a good test case for AWS Lambda. We wanted to get a feel for how to code for it, how to deploy to it, and how to monitor it.
Serverless, previously known as JAWS, is a framework built around Lambda. It's developed by a company of the same name that works full time on making this open source framework great. It automates deployment and versioning of your Lambda functions, and it helps you write clean code by keeping the Lambda event handler separate from the rest of your code. Used correctly, Serverless lets you deploy the same code you wrote for Lambda to an EC2 instance with relatively little overhead.
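The handler/logic split can be illustrated in plain Node.js, independent of any particular Serverless version. In this sketch, `runHealthCheck` holds the logic and knows nothing about Lambda, while `makeHandler` builds the only Lambda-aware piece. The dependency-injection shape (`loadUrls`, `ping`) is a hypothetical choice made so the logic is easy to test and to reuse outside Lambda; the `time` field is part of the event a scheduled rule delivers.

```javascript
"use strict";

// Core logic: a plain async function with no Lambda-specific types, so it can
// also run from a script or service on EC2. `loadUrls` and `ping` are
// hypothetical dependencies supplied by the caller.
async function runHealthCheck({ loadUrls, ping }) {
  const urls = await loadUrls();
  const results = await Promise.all(urls.map((url) => ping(url)));
  // Report only the endpoints that failed their check.
  return results.filter((r) => !r.healthy).map((r) => r.url);
}

// Thin Lambda adapter: the only code that knows about the invocation event.
// Scheduled events include an ISO-8601 `time` field for the scheduled run.
function makeHandler(deps) {
  return async function handler(event) {
    const unhealthy = await runHealthCheck(deps);
    return { scheduledFor: event && event.time, unhealthy };
  };
}

// In a real deployment: exports.handler = makeHandler({ loadUrls, ping });
module.exports = { runHealthCheck, makeHandler };
```

On EC2, a small script could call `runHealthCheck` with the same dependencies and no Lambda runtime at all, which is the portability the Serverless layout is aiming for.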
The framework is still pretty young (v0.5.5 as of this writing), but the team and contributors were incredibly responsive and helpful while we were building the first version of Hotline Ping. Their Gitter chatroom is busy, with the team, contributors, and new Serverless users all active there.
Overall, Serverless helps smooth the few rough edges in Lambda.
Building Hotline Ping
Now that we’ve picked our stack, actually building Hotline Ping is pretty straightforward.
- Set up a scheduled event to run every 5 minutes (most frequent currently supported schedule)
- Write a function that reads a list of URLs from S3, pings each, and sends metrics to DataDog
- Hook that function up to a Lambda event handler using Serverless
- Configure Serverless for your AWS environment
- Bob’s your uncle
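The S3 and DataDog parts of the second step might look roughly like this. It is a sketch under several assumptions that are not from the original project: the URL list is a single S3 object containing a JSON array of `{ "client", "url" }` entries, S3 is read with the AWS SDK for JavaScript v3 (`@aws-sdk/client-s3`), and metrics go to DataDog's v1 `series` HTTP endpoint as one gauge per client (1 for healthy, 0 for not) under a hypothetical metric name `hotline_ping.up`. Ping results are assumed to look like `{ client, healthy }`, with the pinging itself done elsewhere.

```javascript
"use strict";

// Hypothetical list format: a JSON array of { "client": "...", "url": "..." }.
function parseTargets(text) {
  const targets = JSON.parse(text);
  if (!Array.isArray(targets)) throw new Error("target list must be a JSON array");
  return targets.filter((t) => t && typeof t.url === "string");
}

// Read the list from S3 with AWS SDK v3. The package is required lazily so the
// rest of this file runs without it installed.
async function loadTargets(bucket, key) {
  const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
  const s3 = new S3Client({});
  const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  return parseTargets(await res.Body.transformToString());
}

// Build a DataDog v1 metrics payload: one gauge point per client.
function toDatadogSeries(results, nowSeconds) {
  return {
    series: results.map((r) => ({
      metric: "hotline_ping.up", // hypothetical metric name
      type: "gauge",
      points: [[nowSeconds, r.healthy ? 1 : 0]],
      tags: [`client:${r.client}`],
    })),
  };
}

// POST the payload to DataDog (US site); needs Node 18+ for the global fetch.
async function sendToDatadog(payload, apiKey, fetchFn = globalThis.fetch) {
  const res = await fetchFn("https://api.datadoghq.com/api/v1/series", {
    method: "POST",
    headers: { "Content-Type": "application/json", "DD-API-KEY": apiKey },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`DataDog rejected metrics: HTTP ${res.status}`);
}

module.exports = { parseTargets, loadTargets, toDatadogSeries, sendToDatadog };
```

Tagging by client rather than by URL keeps the per-client question ("is data flowing for this client?") a simple tag filter in DataDog. Accounts on another DataDog site, such as the EU one, use a different API host.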
We built Hotline Ping to avoid raising our monthly Pingdom bill, so how cheap is Lambda for our project? We can use Matthew Fuller's Lambda Cost Calculator to find out. Running our function every 5 minutes for 30 days is 8,640 executions per month. Our function needs nothing more than the minimum 128MB memory setting. And empirically, our function runs in at most 90 seconds, so charging every run for the full 90 seconds overestimates the cost. That comes to a whopping $1.62 per month, and that's before the free tier AWS provides!
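The arithmetic behind that figure is easy to check by hand. This sketch assumes Lambda's standard x86 list prices of $0.0000166667 per GB-second and $0.20 per million requests, and a monthly free tier of 400,000 GB-seconds and 1 million requests; confirm current rates on the AWS pricing page before relying on them.

```javascript
"use strict";

// Assumed Lambda rates and free-tier allowances; check current AWS pricing.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_REQUEST = 0.2 / 1e6;
const FREE_GB_SECONDS = 400000;
const FREE_REQUESTS = 1e6;

function monthlyLambdaCost({ runsPerMonth, secondsPerRun, memoryMb }) {
  // Compute is billed on duration multiplied by configured memory, in GB.
  const gbSeconds = runsPerMonth * secondsPerRun * (memoryMb / 1024);
  const compute = gbSeconds * PRICE_PER_GB_SECOND;
  const requests = runsPerMonth * PRICE_PER_REQUEST;
  const withinFreeTier = gbSeconds <= FREE_GB_SECONDS && runsPerMonth <= FREE_REQUESTS;
  return { gbSeconds, compute, requests, total: compute + requests, withinFreeTier };
}

// Every 5 minutes for 30 days: 12 runs/hour * 24 hours * 30 days = 8,640 runs.
const runsPerMonth = (60 / 5) * 24 * 30;
const estimate = monthlyLambdaCost({ runsPerMonth, secondsPerRun: 90, memoryMb: 128 });
console.log(runsPerMonth, estimate.gbSeconds, estimate.total.toFixed(2), estimate.withinFreeTier);
// prints: 8640 97200 1.62 true
```

At 128MB (0.125GB), 8,640 runs of 90 seconds is 97,200 GB-seconds, or about $1.62 of compute; the request charge adds well under a cent, and the whole workload fits inside the free tier.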
There are still a few things we would like to tidy up and add in future work:
- Get this code into our Jenkins workflow, deployments included. Right now we deploy from a local machine.
- Track latency to give us a more complete picture of our API health
- Upgrade to the latest and greatest version of Serverless. We started Hotline Ping with v0.3.0, and they've already added a bunch of great changes by v0.5.5:
  - Much simpler configuration
  - Better directory structure
  - Configuration for scheduled events
  - Support for multiple AWS accounts for different deployment stages
  - Plus many more; they're seriously cranking out some great new features
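For the latency item in the list above, one possible starting point is to time each request and summarize the results before sending them to DataDog. This is a hypothetical sketch, not a plan from the original project: `timedPing` and `percentile` are illustrative names, latency is measured from just before the request until the response body has been fully read, and percentiles use the simple nearest-rank method.

```javascript
"use strict";
const { performance } = require("perf_hooks");

// Time one request, from just before it is sent until the response body has
// been read in full. Failures still report how long they took to fail.
async function timedPing(url, fetchFn = globalThis.fetch) {
  const start = performance.now();
  try {
    const res = await fetchFn(url);
    await res.arrayBuffer(); // drain the body so the timing covers the whole response
    return { url, healthy: res.ok, latencyMs: performance.now() - start };
  } catch (err) {
    return { url, healthy: false, latencyMs: performance.now() - start, error: String(err) };
  }
}

// Nearest-rank percentile (p in 0..100) of a list of latencies.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

module.exports = { timedPing, percentile };
```

Reporting a median and a high percentile per run, rather than every sample, keeps the metric count flat as the number of endpoints grows.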
So, how do you feel after your first foray into Lambda? Not so bad, right? We hope this simple example shows the potential for services that use minimal resources yet can scale massively and seamlessly.