At Curalate, we need data that demonstrates the value our products deliver to our clients. One of our products, Fanreel, uses user-generated content to enhance online shopping experiences and product discovery. We record and store usage metrics from Fanreel, but we also need to connect those usage metrics to product purchases. If Fanreel analytics were a puzzle, purchase information would be the last piece. Historically, Google Analytics or Adobe Omniture filled that spot, but every ecommerce site is different, so the intricacies of those tools sometimes got in the way.

We wanted a simple, “one size fits all” solution, so we turned to a tracking pixel. Specifically, our first tracking pixel is a checkout pixel that lives on our clients’ checkout confirmation pages, collects transaction data, and serves as the last piece of our analytics puzzle.

What Is a Tracking Pixel?

A tracking pixel is a 1x1 transparent image that sends data from the webpage it lives on. When the page loads, the browser makes a GET request for the image, and the data travels along as query parameters in the image URL:

<img src="https://yourwebsite.com/trackingpixel?pixelid=12345&username=shippy&company=curalate&position=engineer">
This tracking pixel sends information from the pixel with id 12345 about a user named Shippy who works at Curalate as an engineer.
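Since the pixel URL is just an image endpoint plus URL-encoded query parameters, the `src` above can be built from a plain object. Here is a minimal sketch using the standard `URL` and `URLSearchParams` APIs; the helper name `buildPixelUrl` is hypothetical.

```javascript
// Build a tracking pixel URL from a base endpoint and a flat object of data.
// buildPixelUrl is a hypothetical helper, not part of any library.
function buildPixelUrl(endpoint, data) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(data)) {
    // URLSearchParams percent-encodes keys and values (spaces become '+')
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const src = buildPixelUrl('https://yourwebsite.com/trackingpixel', {
  pixelid: 12345,
  username: 'shippy',
  company: 'curalate',
  position: 'engineer',
});
console.log(src);
// https://yourwebsite.com/trackingpixel?pixelid=12345&username=shippy&company=curalate&position=engineer
```

Encoding every value matters: an unescaped `&` or `=` inside a value would otherwise split into extra, bogus parameters on the server.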

When the server receives the request for the image, it also receives all of the query parameters, which makes moving data between different websites straightforward. Since you can place a tracking pixel anywhere you can run Javascript, it’s a simple and flexible way of transferring data. To get a tracking pixel up and running, you’ll need three things:

  1. a Javascript snippet to collect the data you’re tracking and request the tracking pixel image
  2. a servlet to receive the tracked data and return the tracking pixel image
  3. a way to stream/store the data
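On the receiving end, the tracked data arrives as the request URL. As a minimal sketch, here is how it could be recovered in JavaScript with the standard `URL` class; `parsePixelRequest` is a hypothetical name, and the base host is only there because `URL` needs an absolute base for a bare request path.

```javascript
// Turn the path + query of an incoming pixel request into a plain object.
// parsePixelRequest is a hypothetical helper used for illustration.
function parsePixelRequest(requestPath) {
  const url = new URL(requestPath, 'https://yourwebsite.com');
  // URLSearchParams decodes percent-encoding and '+' for us
  return Object.fromEntries(url.searchParams);
}

const tracked = parsePixelRequest(
  '/trackingpixel?pixelid=12345&username=shippy&company=curalate&position=engineer'
);
console.log(tracked);
// { pixelid: '12345', username: 'shippy', company: 'curalate', position: 'engineer' }
```

Note that every value arrives as a string, so numeric fields such as the pixel id have to be parsed on the server.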

We solve step 3 with AWS Kinesis Firehose. We like Kinesis Firehose because it’s easy to configure and fits nicely into our existing data pipeline, which uses AWS Redshift extensively as a data store.

Let’s go through how you can set up a tracking pixel and AWS Kinesis Firehose to collect some information on our friend Shippy, the engineer who works at Curalate:

Step 1: Create a Javascript Library to Collect and Send Data

First, you need a small snippet that initializes a global queue of pixel functions to execute and also asynchronously loads your Javascript library of pixel functions (more on this soon!).

(function() {
    // initialize the global queue, 'q' which is attached to a 'crl8' object
    if (!('crl8' in window)) {
        window.crl8 = function() {
            window.crl8.q.push(arguments);
        };
        window.crl8.q = [];
    }

    // load your js library of pixel functions...
    var script = document.createElement('script');
    script.src = 'https://yourwebsite.com/js-min/pixel-library.min.js';

    // ...do it asynchronously...
    script.async = true;

    // ...and insert it before the first script on the page!
    var firstScript = document.getElementsByTagName('script')[0];
    firstScript.parentNode.insertBefore(script, firstScript);
})();

Minify this snippet and place it inside a <script> tag in the <head> of every page you want your tracking pixel to live on.
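To see what the stub buys you, here is a sketch that runs the same queuing logic against a plain object standing in for `window` (an assumption so it runs outside a browser). Calls made before the library has loaded are simply recorded in order; the command names match the ones the library file defines.

```javascript
// A stand-in for the browser's window object so the stub can run anywhere.
const fakeWindow = {};

// Same logic as the loader snippet: crl8 is a function that queues its arguments.
if (!('crl8' in fakeWindow)) {
  fakeWindow.crl8 = function () {
    fakeWindow.crl8.q.push(arguments);
  };
  fakeWindow.crl8.q = [];
}

// Calls a site owner might place on the checkout page. They are safe to make
// even though pixel-library.min.js has not finished loading yet.
fakeWindow.crl8('init', 12345);
fakeWindow.crl8('addName', { username: 'shippy' });
fakeWindow.crl8('addCompany', { company: 'curalate' });
fakeWindow.crl8('addPosition', { position: 'engineer' });
fakeWindow.crl8('send');

// Each queued entry is an arguments object of the form [functionName, parameters].
const queued = fakeWindow.crl8.q.map((args) => Array.from(args));
console.log(queued.length); // 5
console.log(queued[1]); // [ 'addName', { username: 'shippy' } ]
```

Because the stub only records arguments, the page never waits on the library download, and no call is lost if the library arrives late.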

As for Your Pixel JS Library File…

This file defines the functions that gather the data you’re interested in and generate the pixel request that sends that data to your server. It also pulls commands off the global queue and executes them. Exposing one function per piece of data lets whoever places your pixel on their website pass in exactly the data you want the pixel to collect.

(function() {
    var api = {}; // use this object to store all of your library functions
    var pixelId = null;
    var data = {}; // use this object to store the data you're collecting and sending

    // if your pixel will be used in multiple places, unique pixel ids will be crucial to
    // identify which piece of data came from which place
    api.init = function(pId) {
        pixelId = pId;
    };

    // copy every key/value pair of an object such as {username: 'shippy'} into the data
    // object; each key becomes a query parameter
    var addData = function(obj) {
        Object.keys(obj).forEach(function(key) {
            data[key] = obj[key];
        });
    };

    // include a function for each type of data you want to collect. To collect Shippy's
    // name, company, and position, each function takes an object of key/value pairs
    // (e.g. {username: 'shippy'}) that will form your query parameters:
    api.addName = function(n) {
        addData(n);
    };

    api.addCompany = function(c) {
        addData(c);
    };

    api.addPosition = function(p) {
        addData(p);
    };

    // include a function to turn all the data you've collected in the data object into query
    // parameters to append to the url for the pixel on your server
    api.toQueryString = function() {
        var s = [];
        Object.keys(data).forEach(function(key) {
            s.push(encodeURIComponent(key) + "=" + encodeURIComponent(data[key]));
        });
        return s.join("&");
    };

    // include a function to add the query parameters to your pixel url and to finally append
    // the resulting pixel image to your document, which triggers the GET request
    api.send = function() {
        var pixel = document.createElement("img");
        var queryParams = api.toQueryString();
        pixel.src = "https://yourwebsite.com/trackingpixel/" + pixelId + "/pixel.png?" +
                    queryParams;
        document.body.appendChild(pixel);
    };

    // run a single command of the form [functionName, parameters]
    var execute = function(command) {
        var func = command[0];
        var parameters = command[1];
        if (typeof api[func] === 'function') {
            api[func].call(window, parameters);
        } else {
            console.error("Invalid function specified: " + func);
        }
    };

    // while the global queue is not empty, remove the first command and execute it
    var queue = (window.crl8 && window.crl8.q) || [];
    while (queue.length > 0) {
        execute(queue.shift());
    }

    // replace the queuing stub so that later calls execute immediately
    window.crl8 = function() {
        execute(arguments);
    };
})();
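A subtlety in the queue handling: once the queue has been drained, later calls to `crl8(...)` should run immediately rather than pile up in a queue nobody reads again. A common pattern is to swap the stub for a dispatching function after draining. The sketch below isolates that pattern with a hypothetical `installCommandQueue` helper and a plain object standing in for `window`.

```javascript
// Drain commands queued by the loader stub, then replace the stub with a function
// that dispatches directly. installCommandQueue is a hypothetical helper.
function installCommandQueue(win, api) {
  const dispatch = (args) => {
    const [func, parameters] = args; // arguments objects are iterable
    if (typeof api[func] === 'function') {
      api[func].call(win, parameters);
    } else {
      console.error('Invalid function specified: ' + func);
    }
  };

  // Run everything that was queued before the library loaded, in order.
  const pending = (win.crl8 && win.crl8.q) || [];
  while (pending.length > 0) {
    dispatch(pending.shift());
  }

  // From now on, calls run as soon as they are made.
  win.crl8 = function () {
    dispatch(arguments);
  };
}

// Usage: a stub with one queued call, then a call made after the library loads.
const calls = [];
const api = {
  init: (id) => calls.push(['init', id]),
  send: () => calls.push(['send']),
};
const win = {};
win.crl8 = function () { win.crl8.q.push(arguments); };
win.crl8.q = [];
win.crl8('init', 12345);

installCommandQueue(win, api);
win.crl8('send');
console.log(calls); // [ [ 'init', 12345 ], [ 'send' ] ]
```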

Step 2: Set Up a Servlet

The servlet ties everything together. It needs to:

  1. receive the data sent by the tracking pixel
  2. return a 1x1 transparent image to the page that requested the pixel
  3. send the data to your Firehose delivery stream
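
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import javax.imageio.ImageIO

import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder
import com.amazonaws.services.kinesisfirehose.model.{PutRecordRequest, Record}

// ScalatraServletEx, getEx, paramGetter and JsonUtils are in-house helpers (not part of
// Scalatra); with plain Scalatra, ScalatraServlet, get and params fill the same roles
class TrackingPixelServlet extends ScalatraServletEx {
  import TrackingPixelServlet._

  // create the client once per servlet rather than once per request; defaultClient()
  // uses the default AWS credentials and region provider chains
  private val firehoseClient = AmazonKinesisFirehoseClientBuilder.defaultClient()

  getEx("/:pixelId/pixel.png") {
    // extract the tracking pixel data from the path and query parameters
    val pixelId = paramGetter.getRequiredLongParameter("pixelId")
    val username = paramGetter.getRequiredStringParameter("username")
    val company = paramGetter.getRequiredStringParameter("company")
    val position = paramGetter.getRequiredStringParameter("position")

    val userData = UserData(pixelId, username, company, position)

    // create a record holding one newline-terminated JSON object
    val jsonData = JsonUtils.toJson(userData) + ENTRY_SEPARATOR
    val record = new Record()
    record.setData(ByteBuffer.wrap(jsonData.getBytes(StandardCharsets.UTF_8)))

    // send the record to your Firehose delivery stream
    val request = new PutRecordRequest()
    request.setDeliveryStreamName(DELIVERY_STREAM)
    request.setRecord(record)
    firehoseClient.putRecord(request)

    // return a 1x1 transparent image to the page with the tracking pixel
    contentType = "image/png"
    PIXEL_IMG
  }
}

object TrackingPixelServlet {
  // this should match the name that you set for your Kinesis Firehose delivery stream
  val DELIVERY_STREAM = "tracking-pixel-delivery-stream"

  // Firehose concatenates records, so separate them to split them apart again later
  val ENTRY_SEPARATOR = "\n"

  // a 1x1 PNG whose single pixel is fully transparent (TYPE_INT_ARGB pixels start as 0)
  val PIXEL_IMG: Array[Byte] = {
    val image = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB)
    val out = new ByteArrayOutputStream()
    ImageIO.write(image, "png", out)
    out.toByteArray
  }
}

case class UserData(
  pixelId: Long,
  username: String,
  company: String,
  position: String
)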
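The servlet’s parameter handling is easy to get subtly wrong, so it can help to keep that logic in a small pure function you can unit-test. Here is a sketch in JavaScript (an illustration, not the servlet itself) that matches the pixel route, requires the same fields the servlet requires, and produces the newline-terminated JSON record sent to Firehose. The function name, the error handling, and the `/trackingpixel` mount path (taken from the URL the library requests) are assumptions.

```javascript
// Mirror of the servlet's extraction step: route match, required-parameter checks,
// and serialization to a newline-terminated JSON record. toFirehoseRecord is hypothetical.
const ROUTE = /^\/trackingpixel\/(\d+)\/pixel\.png$/;
const REQUIRED = ['username', 'company', 'position'];
const ENTRY_SEPARATOR = '\n';

function toFirehoseRecord(requestPath) {
  const url = new URL(requestPath, 'https://yourwebsite.com');
  const match = ROUTE.exec(url.pathname);
  if (!match) {
    throw new Error('Not a pixel request: ' + url.pathname);
  }

  const userData = { pixelId: Number(match[1]) };
  for (const name of REQUIRED) {
    const value = url.searchParams.get(name); // null when the parameter is absent
    if (value === null || value === '') {
      throw new Error('Missing required parameter: ' + name);
    }
    userData[name] = value;
  }

  // One JSON object per line, so records can be split apart after delivery.
  return JSON.stringify(userData) + ENTRY_SEPARATOR;
}

console.log(toFirehoseRecord(
  '/trackingpixel/12345/pixel.png?username=shippy&company=curalate&position=engineer'
));
// {"pixelId":12345,"username":"shippy","company":"curalate","position":"engineer"}
```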

Step 3: Set Up AWS Kinesis Firehose

Now that your tracking pixel is collecting all of this data, how do you store it? At Curalate, we’ve turned to AWS Kinesis Firehose. Firehose is designed to make it easy to capture streaming data and load it into AWS, and setup consists of a few clicks in the AWS console. Firehose is also flexible: data streamed to a Firehose delivery stream can land in Elasticsearch, S3, or Redshift.

Since we love Redshift and already use it heavily, that’s the destination for our checkout pixel. The AWS setup documentation is thorough, but here are a few tips we picked up while setting up our checkout pixel:

  • Adjust the buffer size and buffer interval of your Firehose delivery stream to control how often data is delivered. Firehose delivers a batch as soon as either threshold is reached, so smaller values get data into your destination sooner, at the cost of more frequent, smaller loads.
  • After you’ve set up your tracking pixel Javascript and your Firehose delivery stream, use the error logs and CloudWatch monitoring that Firehose provides to verify that your tracking pixel is sending data to your delivery stream and that the data reaches your delivery destination.
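A simplified model of the buffering rule in the first tip: a buffered batch is delivered once either the size hint or the interval hint is reached, whichever comes first. The function below illustrates that rule and is not Firehose’s implementation; its name and the example thresholds are assumptions.

```javascript
// Simplified model: flush when the buffer reaches the size hint OR when the
// interval hint has elapsed since the buffer started filling.
function shouldFlush(bufferedBytes, secondsSinceFirstRecord, hints) {
  const sizeLimitBytes = hints.sizeMiB * 1024 * 1024;
  return bufferedBytes >= sizeLimitBytes ||
    secondsSinceFirstRecord >= hints.intervalSeconds;
}

const hints = { sizeMiB: 5, intervalSeconds: 300 }; // example values only

// Low traffic: 1 KiB buffered. The interval decides when data is delivered.
console.log(shouldFlush(1024, 60, hints)); // false
console.log(shouldFlush(1024, 300, hints)); // true
// High traffic: 5 MiB within 10 seconds. The size threshold triggers first.
console.log(shouldFlush(5 * 1024 * 1024, 10, hints)); // true
```

In this model, a low-volume stream such as checkout events almost always flushes on the interval, so the interval is the setting to lower if you want data in Redshift sooner.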

And That’s It!

A tracking pixel works anywhere you can run Javascript, which makes it a simple and flexible way to collect data. Combined with AWS Kinesis Firehose, the pipeline from data collection to storage is easy to configure and to adapt to your specific needs.