Comparing Accuracy of Predictive Models – Visualization

Predictive modeling is a prominent approach to forecasting key performance indicators (KPIs). These KPIs could be sales figures, churn rates, death rates, criminal activity, loss forecasts, etc.
We will take a scenario where you, as a data scientist, are required to build a predictive model to forecast monthly sales for a retailer in the US for the year 2015. This retailer sells products in different categories across US counties. Sales figures each year depend on several parameters, with some of the key inputs being advertising spend, ad spend across channels, new product launches, product launches across categories, promotions, seasonality, weather conditions, etc.
As a data scientist you will experiment with different modeling techniques, linear or non-linear, and come up with various models. You will take a validation dataset, say the data for the year 2014, to validate your sales forecasts. Here is what you will notice:
  • Each model's monthly sales forecast will have some error rate associated with it.
  • This error rate will vary across the different models you have built.
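The monthly error rate can be measured in several ways; a common choice that matches the percentage figures used below is the mean absolute percentage error (MAPE). Here is a minimal sketch, using hypothetical monthly sales figures for the 2014 validation year:

```python
# Mean absolute percentage error (MAPE) over the 12 validation months.
# The actuals and forecasts below are hypothetical, for illustration only.

def mape(actual, forecast):
    """Average absolute percentage error, in %."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

actual_2014   = [120, 115, 130, 140, 150, 160, 155, 165, 170, 180, 200, 240]
model_a_fcast = [118, 110, 138, 133, 148, 168, 150, 160, 178, 172, 210, 230]

print(round(mape(actual_2014, model_a_fcast), 1))  # → 4.0
```

Computing this figure month by month, rather than only on the yearly total, also exposes the dispersion of the error, which we will use later when comparing models.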
You will need to present these models to business teams and choose one based on its forecast accuracy over the validation period and the stability of its forecasts on validation and test data. Let's say you have built six such models using various techniques. Each model will have a different average error rate for its monthly sales forecasts for the year 2014. Below is the average forecast error (in %) for the six models.
 
S.No | Model_Code | Avg_Forecast_Error (%) | Error_Range (%)
-----|------------|------------------------|----------------
1    | A          | 5                      | 10
2    | B          | 8                      | 5
3    | C          | 16                     | 12
4    | D          | 18                     | 15
5    | E          | 13                     | 10
6    | F          | 10                     | 10
To visualize the performance of all these models at once, we use a bubble chart.
[Figure: Model_Compare – bubble chart of the six models' average forecast error and error range]
In the chart above, a bubble's position on the y-axis shows the model's average forecast error, while its size shows the dispersion of that error. From the chart we can easily see that models A and B perform better than the rest. The final selection between A and B comes down to the trade-off between a lower average error rate, dispersion, and business inputs.
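The original chart's tool is not stated; a minimal sketch of such a bubble chart with matplotlib, using the table above (bubble area scaled from Error_Range, with an assumed scaling factor), could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

models = ["A", "B", "C", "D", "E", "F"]
avg_error = [5, 8, 16, 18, 13, 10]     # Avg_Forecast_Error (%)
error_range = [10, 5, 12, 15, 10, 10]  # Error_Range (%), encoded as bubble size

fig, ax = plt.subplots()
# Scale the range so bubble areas are visually distinguishable (factor is arbitrary).
ax.scatter(range(len(models)), avg_error,
           s=[r * 40 for r in error_range], alpha=0.5)
for i, m in enumerate(models):
    ax.annotate(m, (i, avg_error[i]), ha="center", va="center")
ax.set_xticks(range(len(models)))
ax.set_xticklabels(models)
ax.set_xlabel("Model")
ax.set_ylabel("Average forecast error (%)")
ax.set_title("Model_Compare")
fig.savefig("model_compare.png")
```

Models A and B then appear low on the y-axis, with B's smaller bubble reflecting its tighter error range.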
If your data contains location as an input, i.e. sales by geography, you can use map-based visuals, such as heat maps, to compare forecasting models. The intent is to make the comparison as intuitive as possible for business users.
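A full county-level map needs geographic boundary data, but the heat-map idea can be sketched with a simple model-by-region error grid. The regional error figures below are hypothetical, for illustration only:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

models = ["A", "B", "C", "D", "E", "F"]
regions = ["Northeast", "Midwest", "South", "West"]
# Hypothetical average forecast error (%) per model and region.
errors = np.array([
    [4, 6, 5, 5],
    [7, 9, 8, 8],
    [15, 17, 16, 16],
    [17, 19, 18, 18],
    [12, 14, 13, 13],
    [9, 11, 10, 10],
])

fig, ax = plt.subplots()
im = ax.imshow(errors, cmap="RdYlGn_r")  # red = worse, green = better
ax.set_xticks(range(len(regions)))
ax.set_xticklabels(regions)
ax.set_yticks(range(len(models)))
ax.set_yticklabels(models)
fig.colorbar(im, label="Avg forecast error (%)")
fig.savefig("error_heatmap.png")
```

Business users can then scan a single grid (or, with boundary data, an actual map) and spot at a glance which models hold up across geographies.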