Mechanical Turk Lessons Learned

At Curalate, we constantly dream up big ideas for new products and services. Big ideas that require lots of work. Lots of boring, repetitive, simple work that we honestly do not want to do ourselves. In situations like this we turn to the industry standard for getting other people to do work for you: Amazon Mechanical Turk. Amazon’s “Artificial Artificial Intelligence” service connects requesters to people from across the globe (known as “Turkers”) who complete sets of Human Intelligence Tasks, or HITs. We’ve used Mechanical Turk in the past to create labeled datasets for the machine learning models behind various deep learning projects here at Curalate (such as the Emojini 3000 and Intelligent Product Tagging). In the process we have learned a few lessons that would have saved us a lot of time had we known them beforehand, so here they are to help you get the most out of your valuable Turk time.

Turkers are surprisingly cost effective

So why use Mechanical Turk in the first place? Turkers will work for a single penny in many cases. Even with Amazon’s additional fees such as a 20% service fee and additional fees for premium qualifications, you can get a lot of work done for very little investment. Because of this low upfront cost, it’s beneficial to test your HITs on smaller data sets first to work out any issues. Maybe the instructions for your task are not clear enough for the Turkers or maybe your job has a poor conversion rate (workers finding your HIT, but not wanting to do it). More on avoiding some of these issues later.

Now you will have to tune the reward for your HITs a bit. Too much and your HIT will get done very quickly, but you will be wasting money. Too little and no one will want to do your HIT. Once again, this is where you can set up multiple versions of your HIT at different price points to find the sweet spot.
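To compare price points before launching anything, a rough cost estimate helps. The sketch below assumes the 20% service fee mentioned above is charged on top of each reward, treats any premium qualification fee as an optional extra percentage, and ignores any minimum fee or rounding Amazon may apply. The function name and the batch size are made up for illustration.

```python
from decimal import Decimal


def batch_cost(reward, hits, assignments_per_hit,
               service_fee=Decimal("0.20"), premium_fee=Decimal("0")):
    """Estimate the total cost of a batch.

    reward: payment per completed assignment, in dollars.
    service_fee / premium_fee: fees as a fraction of the reward
    (assumed values; check the current pricing page before relying on them).
    """
    per_assignment = reward * (1 + service_fee + premium_fee)
    return per_assignment * hits * assignments_per_hit


# Hypothetical test batch: 1000 HITs, each answered by 3 Turkers.
for reward in (Decimal("0.01"), Decimal("0.02"), Decimal("0.05")):
    total = batch_cost(reward, hits=1000, assignments_per_hit=3)
    print(f"reward ${reward}: about ${total:.2f} for the batch")
```

Using Decimal rather than float keeps penny-sized rewards from picking up floating-point noise when multiplied across thousands of assignments.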

Masters Turkers are worth it

To help avoid some of the possible quality issues associated with using Mechanical Turk, we use Masters Turkers on all of our HITs. These are Turkers who have a history of giving good results. There is an extra cost associated with using them, and they are harder to attract to a job, but their answers are more consistent than those of non-Masters.

Setting up a job is easy

If you are reading this engineering dev blog you likely know everything you need to set up a Mechanical Turk job. The HIT layout and questions are in standard HTML. HITs are generally unstyled (we’re talking 1995-era web styling here), so you can largely forgo any fancy CSS. You really just need to know how to make lists, tables, and standard input fields. The online editor is clunky and will not automatically save your work if you accidentally go back a page or something, so save often.

Your jobs are uploaded as a CSV file, one HIT per row. Each column holds a string value that is substituted for the matching placeholder in the HTML source of your HIT. You can use this for things like per-task addresses for hosted images, links to external pages, or custom text for the Turker. Be aware that Mechanical Turk does not support full UTF-8 and will complain if you try to upload a CSV file containing your favorite emojis 😞
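As a minimal sketch of preparing such a file, the snippet below writes one HIT per row and refuses any value containing characters outside the Basic Multilingual Plane. That check is our assumption about what "not full UTF-8" means in practice (emoji fall in that range), not documented behavior. The column names, URLs, and template are hypothetical, and Python's string.Template is used only to preview the substitution locally, assuming placeholders written as ${name}.

```python
import csv
import io
from string import Template


def unsupported_chars(text):
    """Return characters outside the Basic Multilingual Plane (4 bytes in UTF-8)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def write_hit_csv(rows, fieldnames, out):
    """Write one HIT per row, raising on any value with a flagged character."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for i, row in enumerate(rows, start=1):
        for name in fieldnames:
            bad = unsupported_chars(row[name])
            if bad:
                raise ValueError(f"row {i}, column {name!r}: unsupported {bad}")
        writer.writerow(row)


# Hypothetical input data: one dict per HIT.
rows = [
    {"image_url": "https://example.com/a.jpg", "caption": "Red sneakers"},
    {"image_url": "https://example.com/b.jpg", "caption": "Blue jacket"},
]
buf = io.StringIO()  # swap for open("hits.csv", "w", newline="") to write a file
write_hit_csv(rows, ["image_url", "caption"], buf)

# Local preview of how a row fills the placeholders in a (hypothetical) template.
template = Template('<img src="${image_url}"><p>${caption}</p>')
preview = template.substitute(rows[0])
print(preview)
```

Failing loudly before upload is cheaper than finding out from the dashboard which row contained the offending character.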

Less is more

A problem that we ran into early on was a poor conversion rate with the Turkers. They were viewing our job and then leaving, and our best guess was that they didn’t feel it was worth their time. We were either asking too many questions, even if they were very simple ones, or presenting the Turkers with a massive wall of text that they did not want to read.

Our advice

Have them answer as few questions as possible.
Try to keep them yes/no style questions if possible.
Hide your instructions for the task in a drop-down drawer, since a Turker really only needs to see them once. The example HITs provided by Amazon do this as well.
Make sure the instructions are not annoyingly long to read.
Make sure your HIT is short enough that the Turker does not have to scroll to view it all.

To help track our conversion rate of Turkers we embedded an HTML-only tracking pixel in our HIT template. There are many free tracking pixels available, but this is the one that we use. Just use the basic anchor tag tracking pixel, stick it at the bottom of your HIT template HTML, and everything should™ work fine.

Standing out from the crowd

When your HIT goes live, it is going to be placed in a pool with all of the other current HITs available. You have to stand out from the crowd and make your job easy to find for the Turkers who would want to complete it.
In our experience, short and catchy titles and keywords increase the number of new Turkers finding your HIT, which in turn increases the rate at which your overall job gets completed. Likewise, Turkers do judge a book by its cover, so try not to include words in the title that would make a Turker think that the job will take too long or that the content is boring to work on. At the same time, if your HIT contains NSFW content you do have to properly mark it as such when creating the HIT in the dashboard.

Don’t assume background knowledge

Turkers are largely from the US and India. Amazon is slowly expanding into other markets around the world, but you can expect mostly citizens of these two countries on your HIT. Therefore it’s important that your HIT makes sense to non-native English speakers and to those who may be unfamiliar with certain cultural knowledge. For example, Instagram’s active user base is largely in American and European markets, so a question like “Which Instagram filter best describes you?” would make zero sense to a lot of Turkers.

Turkers do not want to waste time trying to figure out your crazy HIT. Keep the task at an elementary difficulty level and provide simple examples over explanations when possible. The faster a Turker can complete your HIT and the more straightforward it is to understand, the more likely you’ll see completed HITs and repeat workers.

Repeat visitors are a good sign

If your HITs have a good price-to-difficulty ratio, you will see many repeat workers. This is generally a good thing, as having the same workers on your tasks leads to faster results and more consistent answers. The tracking pixel also comes in handy here for counting unique visitors.

Manual checking of your results

Even after all of this you will still get bad answers. We have had plenty of Turkers who would just pick the same option for every question. To help catch this, we have several Turkers, preferably an odd number so that yes/no votes cannot tie, answer each HIT. The results from your HITs are returned in a CSV that links each Turker’s ID to their answers, so we also wrote scripts to do some basic analysis of each worker’s answers and check them against some red-flag criteria, such as always giving the same answer or pattern of answers, or always disagreeing with the other Turkers assigned to the same HIT.
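A red-flag check along these lines could look roughly like the sketch below. It is not our actual script: the thresholds are made-up starting points, and the records are assumed to have already been read from the results CSV into (HIT ID, worker ID, answer) tuples, since the exact column names depend on your template.

```python
from collections import Counter, defaultdict


def majority_by_hit(records):
    """Map each HIT to its most common answer (the worker's own vote included)."""
    votes = defaultdict(Counter)
    for hit_id, _worker, answer in records:
        votes[hit_id][answer] += 1
    return {hit: counts.most_common(1)[0][0] for hit, counts in votes.items()}


def flag_workers(records, min_answers=5, same_answer_share=0.95, disagree_share=0.5):
    """Return {worker: [reasons]} for workers who trip a red-flag criterion."""
    majority = majority_by_hit(records)
    by_worker = defaultdict(list)
    for hit_id, worker, answer in records:
        by_worker[worker].append((hit_id, answer))

    flags = {}
    for worker, answers in by_worker.items():
        if len(answers) < min_answers:
            continue  # too little data to judge this worker
        reasons = []
        top_count = Counter(a for _, a in answers).most_common(1)[0][1]
        if top_count / len(answers) >= same_answer_share:
            reasons.append("same answer")
        disagreements = sum(1 for hit, a in answers if a != majority[hit])
        if disagreements / len(answers) >= disagree_share:
            reasons.append("disagrees with majority")
        if reasons:
            flags[worker] = reasons
    return flags


# Toy data: workers A and B answer carefully, worker C always says "yes".
truth = ["yes", "no", "yes", "no", "yes", "no"]
records = []
for i, answer in enumerate(truth):
    hit = f"hit{i}"
    records += [(hit, "A", answer), (hit, "B", answer), (hit, "C", "yes")]

flags = flag_workers(records)
print(flags)
```

A flag is only a prompt to look closer. If the correct answer really is "yes" for nearly every HIT, an honest worker will trip the same-answer check too, so the thresholds need tuning per job.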

Even then, we still felt the need to manually check the HITs that returned results on the border of our problem space (i.e. an almost even number of Turkers answering yes and no on a HIT). We would go through and manually check these answers to solidify the result.
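One way to pick out those borderline HITs is to accept an answer only when it wins by a clear margin and queue everything else for a manual pass. The sketch below assumes (HIT ID, answer) pairs already extracted from the results CSV; the function name and the margin of two votes are arbitrary choices for illustration.

```python
from collections import Counter, defaultdict


def split_by_margin(records, min_margin=2):
    """Accept an answer when it leads by at least min_margin votes,
    otherwise put the HIT on the manual review list."""
    votes = defaultdict(Counter)
    for hit_id, answer in records:
        votes[hit_id][answer] += 1

    accepted, needs_review = {}, []
    for hit_id, counts in votes.items():
        ranked = counts.most_common(2)
        top_answer, top_votes = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if top_votes - runner_up >= min_margin:
            accepted[hit_id] = top_answer
        else:
            needs_review.append(hit_id)
    return accepted, needs_review


# Toy data with five Turkers per HIT.
records = (
    [("hit1", "yes")] * 5                          # unanimous
    + [("hit2", "yes")] * 3 + [("hit2", "no")] * 2  # near-even split
    + [("hit3", "yes")] + [("hit3", "no")] * 4      # clear majority
)
accepted, needs_review = split_by_margin(records)
print(accepted)
print(needs_review)
```

With five Turkers per HIT and a margin of two, only the 3-to-2 splits end up in the review pile, which keeps the manual work proportional to how ambiguous the data actually is.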

And that’s it! At least it’s all of the general, not-too-specific things we have figured out. By no means is this everything that you need to know to successfully use Amazon Mechanical Turk, but it should make your experience as a requester a bit smoother.