Calibration plots
I finally read Nate Silver’s The Signal and the Noise. At the time of its release in 2012, it was a rather unique book: it discussed statistical modeling, Bayes’ theorem, and the art and science of prediction in a way the general public could follow and understand. A book ahead of its time, and one that held up nicely when I read it in 2019.
One of the things the author talks about in the book is weather prediction, and in that chapter he makes a short mention of model calibration plots. “Calibration plots? That looks useful!” I thought, and my interest was piqued enough to try them on some of my own models.
When evaluating models, we run into mentions of accuracy, F1 score, or confusion matrices. Calibration is not something I see too often, and it turns out it’s a pretty good view into how your model is performing.
In general terms, calibration is a comparison of your model’s confidence with the actual results. If a model is 90% confident in a prediction, how often is it actually correct? Does it have “blind spots” where it is consistently overconfident? A calibration plot can help you spot such trends.
The method to calculate it is pretty straightforward. Here is a snippet of code that illustrates the approach:
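A minimal sketch of that computation might look like the following; the bucket helper, the stand-in predictions, and the confidence handling are illustrative, with the CSV columns chosen to match the pandas code further down.
import csv
from collections import defaultdict

def calibration_rows(probs, labels, bucket=0.05):
    # probs: predicted probability of the positive class for each example
    # labels: 1 if the positive class actually happened, else 0
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y in zip(probs, labels):
        confidence = max(p, 1.0 - p)           # confidence in the class the model picked
        b = int(confidence / bucket) * bucket  # round down to the nearest 5% interval
        hit = (p >= 0.5) == bool(y)            # did the predicted class match the outcome?
        total[b] += 1
        correct[b] += int(hit)
    # one row per interval: the confidence the model claimed vs. the accuracy it achieved
    return [{"index": round(b, 2), "predicted": round(b, 2),
             "actual": round(correct[b] / total[b], 4)} for b in sorted(total)]

# stand-in values; in practice these come from predict_proba on a validation set
probs = [0.91, 0.55, 0.72, 0.60, 0.87, 0.52, 0.34, 0.18]
labels = [1, 0, 1, 1, 1, 1, 0, 0]
with open("calibration.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["index", "predicted", "actual"])
    writer.writeheader()
    writer.writerows(calibration_rows(probs, labels))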
What we are doing above is running through the model’s predictions. For each prediction, round the model’s confidence down to the nearest 5% interval and note whether that prediction was correct. Tally the correct vs. incorrect counts and you have the accuracy for each interval. I output this into a CSV to render later with pandas:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# load the bucketed calibration data and plot predicted confidence vs. actual accuracy
df = pd.read_csv("calibration.csv", index_col="index")
df.sort_index(inplace=True)
df.loc[:, ["predicted", "actual"]].plot(figsize=(15, 10))
plt.show()
Once you run this, you should see something like this:
This is a plot for my NBA model for 2018–2019 games. You can see how the actual accuracy stays pretty close to the predicted confidence, with the biggest drift around the 50–55% and 85–90% prediction intervals. Pretty cool!
Here is another NBA model plot for the same season. This one is based on the same features as the model above but uses a GaussianNB classifier instead of XGBoost:
You can see how the first model’s actual line stays consistently close to the predicted line, while the second model drifts off and stays off for the majority of predictions. What’s worse, in many cases it is off by 10% or more. It’s easy to see which one is better tuned and calibrated.
One other interesting tidbit about the two models above: their overall validation accuracy differed by only about 1%! The XGBoost-based model came in at 65.76%, and the GaussianNB-based one was 64.75% accurate.
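For reference, a head-to-head like this boils down to fitting both classifiers on the same feature matrix and then running each one’s validation probabilities through the same bucketing code. The sketch below uses synthetic data as a stand-in for the NBA features, which aren’t shown here.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier

# synthetic stand-in for the NBA feature matrix and game outcomes
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("xgboost", XGBClassifier()), ("gaussian_nb", GaussianNB())]:
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_val)[:, 1]    # probability of the positive class
    preds = (probs >= 0.5).astype(int)
    print(name, accuracy_score(y_val, preds))   # overall validation accuracy
    # probs and y_val can then go through the same bucketing code
    # to produce one calibration CSV per model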
I love the insight this plot gives and will keep the technique around for comparing models.