Lost In Translation

Machine learning learning is only good with very narrow and clearly defined tasks. We have to translate and narrow down our original goal several times before we can solve it with machine learning. This blog post examines the translation chain and the loss that occurs between each step.

Christoph Molnar

When we train machine learning models, we measure the loss that occurs between the true outcome and the model predictions. We focus on minimizing the loss by trying different models, by tuning hyperparameters, by engineering the features, by transforming the data. During this model training phase of a machine learning project, minimizing the model performance seems like the only goal and makes one forget the bigger picture. Then we get headlines like the following, which merely focuses on the model accuracy: “Researchers found that human dermatologists accurately identified 86.6 percent of skin cancers from a range of images, compared to 95 percent for the CNN.” - engadget 2018. I chose this headline because it’s representative of the focus on the loss minimization part and less on the systems thinking part.

Why is high predictive performance not a guarantee that a model is used in practice? One of the reasons: machine learning learning is only good with very narrow and clearly defined tasks. We have to translate our original goal multiple times, narrow it down, before we can solve it with machine learning.

Let’s assume our goal is to help dermatologists detect skin cancer more reliably. We first have to decide on the tool that we want to use to solve that problem. A special camera for the doctor that marks suspicious looking skin parts? Or a smartphone application that the patient uses to self-examine over time, giving recommendations when to see a doctor? An application where the patient interactively chats with a doctor and takes pictures when necessary? The choice of the tool has a lot of impact on what data we have to collect (images with smartphones vs. images from special medical cameras), the way we formulate the prediction task (compare images over time? classify individual images?) and the cost of missclassification.

Once a tool for the goal is chosen, we have to choose a prediction task: Do we want to predict the likelihood that some mole is cancerous? Or that it will develop to cancer within the next year? Is it an image recognition task or do we consider additional input from the patient?

Once we have formulated our prediction task, we have to decide which data to use. Do we take data from one doctor’s office only? Probably not. How many different hospitals / patients (age, sex, skin type, …) / types of cancers / cameras and so on does the model need to generalize well in other settings?

A simple method: Translation chain

A simple, yet powerful method is to explicitly write down the translation chain that happened from goal to data. This helps to understand the decision, the loss or problems that might occur because of those decisions. The translation chain happens between the following steps:

Goal: What do you want to achieve?

Tool: By which means do you want to achieve the goal?

Prediction Task: Which predictions or classifications are needed to make the tool work?

Data: Which data are used to train a model to solve the prediction task?

We start with a very broad goal and end up with very specific, detailed data. Inevitably, we have to make simplifications and something gets lost in translation. There ain’t no such thing as a free lunch This does not mean that we should just shrug off the fact that we have an imperfect model for our goal. It’s crucial to think about the translation chain and how it affects the applicability of the predictions to the real world.

This blog post encourages you to make those choices explicit by providing a very rough template: Goal -> Tool -> Prediction Task -> Data.

In the remainder of this post we will go through a few examples of translations: customer relationship management, predictive policing and credit scoring.

Sending Vouchers to Customers

Imagine you manage a travel agency. You sell trips to your customers. The customers are mostly very loyal customers, sometimes you have new customers. Business is not as good as it could be, so you hire a few data scientists.