
Data Scientists vs Superforecasters

One of my primary interests as a data scientist is making great forecasts. You might not share this point of view, but I bet you’ve encountered managers or colleagues who just want a great prediction. Forecasting is, in essence, what people think data scientists do. On my quest for better forecasting I encountered a book: Superforecasting by Philip Tetlock.

This book teaches us how to become a superforecaster, without machine learning or AI. In it, Tetlock describes how, over roughly 30 years of research projects, he met people who forecast with remarkable accuracy. We learn these people’s stories, the different skill sets they bring, and how they apply those skills to answer questions like the one below.

“How likely is it that this year China and Vietnam enter a border dispute with deadly consequence?”

One of the questions posed to the superforecasters

Having read the book, I found it remarkable how these people approach forecasting. It seems to me that we as data scientists can learn a few things from them. Below I discuss my three key insights in relation to my experience working as a data scientist. Implement them at your own peril.

Quantify your question

“Hey Joe, will this marketing campaign result in any sales growth?”

Any manager

One of the striking points in Philip’s book is the way he phrases forecasting questions. To be a superforecaster you have to measure your progress. What is the prerequisite for accurate measurement? Objectifiable parameters.

Compare the border dispute with the question above. The border dispute has to break out within a year; the marketing campaign is left unspecified. It could result in sales growth two weeks from now or two months from now, and both would count as a correct answer. Superforecasters need an objectifiable question with a clear time horizon. This is true for data scientists too. It is a fresh reminder that to be accurate we first need objectifiable questions.

We need to work with the people around us to drill down on these questions before we dive into the data. The superforecasters in Philip’s book taught me this again. Once you start working on objectifiable questions you can track your forecasts, learn from them and improve.
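To make the tracking part concrete, here is a minimal sketch of what scoring your own forecasts could look like, using the Brier score that Tetlock’s forecasting tournaments rely on. The questions and probabilities below are made up for illustration.

```python
# Minimal sketch: score probabilistic yes/no forecasts with the Brier score.
# Each entry is (question, forecast probability, outcome: 1 = happened, 0 = did not).
# All questions and numbers here are illustrative.
forecasts = [
    ("Border dispute between China and Vietnam within a year", 0.10, 0),
    ("Marketing campaign lifts sales by 5% within 3 months", 0.60, 1),
    ("Customer churn drops below 2% this quarter", 0.35, 0),
]

# Brier score: mean squared difference between forecast and outcome (lower is better).
brier = sum((p - outcome) ** 2 for _, p, outcome in forecasts) / len(forecasts)
print(f"Brier score over {len(forecasts)} questions: {brier:.3f}")
```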

Break down your question

Superforecasters don’t make ‘black-box’ guesses. What are black-box guesses? They happen when people face a difficult question and attempt to answer it anyway. You get a response like, maybe 40%? This is a tip-of-the-nose estimate. The mental trap here is that the question seems so overwhelming that you forget to activate your problem-solving brain.

The key to breaking down your question is to focus on things you know and move on from there. These problems are named after the famous physicist Enrico Fermi. His most famous question is below.

“How many piano tuners are there in Chicago?”

Enrico Fermi

This question is, on the face of it, a total unknown. Fermi teaches us that we can break it down into components and then estimate those components individually based on what we know. This results in better guesses than tip-of-the-nose estimates.

Superforecasters constantly do this: they focus on what they know, estimate those components as accurately as possible and arrive at an overall better answer. As data scientists we too have to break down our questions. For example, our marketing campaign breaks down into relevant components:

  • What is our current sales volume?
  • How much did our sales grow in the past year?
  • Did we run any marketing campaign in the past year?

These questions allow us to answer components of the original question and craft a more educated response. They give us a mental and mathematical baseline for the problem at hand. Sometimes we forget to take a step back and think about our problem logically. Doing so helps us come up with overall better answers, and remembering Fermi can help us with that.
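As an illustration, here is a rough Fermi-style baseline for the marketing question. Every number in it is made up; the point is only how the components combine into an educated estimate instead of a tip-of-the-nose guess.

```python
# Fermi-style baseline for the marketing question, using made-up numbers.
current_monthly_sales = 200_000   # assumed current sales volume per month
organic_yearly_growth = 0.04      # assumed growth over the past year, without campaigns
past_campaign_lift = 0.03         # assumed extra lift seen in a comparable past campaign

# Baseline: expected sales over the next 3 months with no campaign, given organic growth.
months = 3
baseline = current_monthly_sales * months * (1 + organic_yearly_growth * months / 12)

# Educated estimate: baseline plus the lift a comparable past campaign produced.
with_campaign = baseline * (1 + past_campaign_lift)

print(f"Expected sales without campaign: {baseline:,.0f}")
print(f"Expected sales with campaign:    {with_campaign:,.0f}")
print(f"Implied extra sales:             {with_campaign - baseline:,.0f}")
```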

Work together with expert judgement

Superforecasters take in all the information that is available to them, period. They attempt to make the subjective quantifiable if it supports the end goal: a better prediction. This behavior strikes a particular point that I personally can take to heart.

As data scientists we focus on eliminating subjective judgement so that we can make decisions based on facts. What we overlook in these situations is that subjective judgement can also be the final piece of our prediction puzzle.

In my daily work I very often reach the limit of what our company’s data can predict. To improve our estimates we propose gathering more or richer datasets. Why don’t we instead propose to have the experts who need to work with our prediction adjust it as needed? Perhaps we can even program a feedback loop so we can learn from how our model outcome is subjectively adjusted.
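Such a feedback loop does not have to be complicated. A minimal sketch of the idea, with hypothetical file names, columns and values: log the model’s prediction next to the expert’s adjusted value, and once outcomes are known, analyse where the adjustments helped.

```python
import csv
from datetime import date

def log_adjustment(path, case_id, model_prediction, expert_adjusted):
    """Append one prediction/adjustment pair to a simple CSV log."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), case_id, model_prediction, expert_adjusted]
        )

# Example: the model predicts 0.72, the expert adjusts it down to 0.55.
log_adjustment("adjustments.csv", case_id=4711,
               model_prediction=0.72, expert_adjusted=0.55)
```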

In his book Philip Tetlock discusses this issue and mentions a final concept, the ‘wisdom of the crowds’. This is a statistical concept, popularized in 2004, which states that the aggregate forecast very often beats what any single member of the group could have guessed. It’s not just a concept that works for 100 humans versus 10 humans making a prediction. It also works for 1 human and 1 predictive model.
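In its simplest form that aggregate is just an average of the two forecasts. A minimal sketch, with illustrative numbers:

```python
# Combine one model forecast and one expert forecast for the same event.
model_probability = 0.72    # e.g. a churn model's probability for one customer
expert_probability = 0.40   # the account manager's judgement for the same customer

# Simple unweighted average; the weights could be tuned on past performance.
combined = 0.5 * model_probability + 0.5 * expert_probability
print(f"Combined forecast: {combined:.2f}")
```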

This concept is very powerful and in my opinion should be applied far more often to difficult-to-predict problems. As data scientists we need to learn the value of expert judgement, not just strive to remove it.

Superforecasting

Superforecasting really is the business of data scientists. Our phones can now automatically categorize pictures into cats or no cats. We may soon use real-time health data to predict cardiac arrest. Looking at these developments you could argue data science is on top of its game. What the superforecasters in Philip Tetlock’s book add to this is a refresher on the essence of making forecasts. It’s a great reminder that I can wholeheartedly recommend.
