Monday, November 18, 2019

Data Science - Evaluating ML Models using Cumulative Gain Lift, K-S Statistic, ROC, AUC, Gini Index

The links below have some very good technical details. This post will not repeat those details, but instead gives a plain-language description of certain points that are often not made clear.


Machine learning algorithms are trained on one data set and then applied to new data. So the main question is: how good is the model? The following tools aim to answer that.

Confusion Matrix - The purpose is not to confuse but to determine the proportion of correct and incorrect predictions. Think of the truth table of true/false positives and negatives, Type I (false positive) and Type II (false negative) errors, etc.

Let's also focus on the case where the dependent variable has only two classes: True/False, Positive/Negative, etc.
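As a rough illustration, here is a minimal sketch of a confusion matrix for such a binary target, assuming scikit-learn is available (the labels and predictions are made-up example data):

# Confusion matrix sketch for a binary target (made-up example data).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual classes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model's predicted classes

# For labels {0, 1} the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp} (Type I)  FN={fn} (Type II)  TN={tn}")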

The Gain is usually plotted as a Cumulative Gain Chart (CGC). The chart is for only ONE of the target values - this is often not stated clearly enough. So the Y-axis on the CGC can be either for the Positive or the Negative case. The X-axis shows the proportion of the population. The graph reads this way: after a fraction of the population (ranked by model score) is considered, what fraction of the target is correctly captured, compared with using no model at all (i.e. random chance).
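To make that reading concrete, here is a hedged sketch of the cumulative gain calculation for the Positive case, using only numpy; y_score stands in for whatever probabilities the model produces:

# Cumulative gain for the Positive class (made-up example data).
import numpy as np

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.6, 0.85, 0.1, 0.3, 0.75])

order = np.argsort(-y_score)                     # rank the population by model score
sorted_true = y_true[order]
cum_gain = np.cumsum(sorted_true) / sorted_true.sum()      # fraction of positives captured
population = np.arange(1, len(y_true) + 1) / len(y_true)   # fraction of population seen
# Plotting cum_gain against population gives the CGC; the diagonal
# (population against itself) is the no-model, random-chance baseline.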

The Lift graph is obtained by dividing the cumulative gain at each decile by the proportion of the population considered at that decile (the random-chance baseline), so a lift of 1.0 means the model is no better than random.
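Continuing the sketch above (cum_gain and population are the arrays computed there), lift is just an element-wise ratio:

# Lift: cumulative gain divided by the fraction of population considered.
lift = cum_gain / population    # 1.0 = no better than random chance
print(lift)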

The AUC is the area under the ROC curve. The ROC is a plot of the True Positive Rate against the False Positive Rate as the classification threshold is varied. A model that predicts purely by random chance sits on the diagonal line, and the area under that diagonal is 0.5. An ML model with real predictive power curves above the 45-degree line, giving an area between 0.5 and 1.0. This area can be read as the probability that a randomly chosen Positive case is ranked above a randomly chosen Negative case.
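A minimal sketch using scikit-learn's roc_curve and auc helpers (the scores are again made-up example data):

# ROC curve and AUC (made-up example data).
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.3, 0.8, 0.6, 0.4, 0.55, 0.7, 0.2])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
roc_auc = auc(fpr, tpr)        # 0.5 = random chance, 1.0 = perfect ranking
print(f"AUC = {roc_auc:.3f}")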

The Gini Index is 2*AUC - 1; the point of this rescaling is to map the useful AUC range of 0.5 to 1.0 onto 0.0 to 1.0, so that a random model scores 0.0 and a perfect model scores 1.0.
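Continuing from the roc_auc value computed above, the rescaling is a one-liner:

# Gini index: maps AUC's useful range [0.5, 1.0] onto [0.0, 1.0].
gini = 2 * roc_auc - 1
print(f"Gini = {gini:.3f}")    # 0.0 = random model, 1.0 = perfect model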

The K-S (Kolmogorov-Smirnov) statistic and graph show how well the model separates the Positive and Negative target values: it is the maximum gap between the cumulative distributions of the two classes when the population is ranked by model score.
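Here is a hedged sketch of the K-S statistic computed directly from those two cumulative distributions, reusing the made-up data from the gain example:

# K-S statistic: maximum gap between the cumulative distributions of the
# Positive and Negative classes when ranked by model score.
import numpy as np

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.6, 0.85, 0.1, 0.3, 0.75])

order = np.argsort(-y_score)
sorted_true = y_true[order]
cum_pos = np.cumsum(sorted_true) / sorted_true.sum()            # positives captured
cum_neg = np.cumsum(1 - sorted_true) / (1 - sorted_true).sum()  # negatives captured
ks = np.max(np.abs(cum_pos - cum_neg))   # larger gap = better class separation
print(f"K-S = {ks:.3f}")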

2 comments:

Unknown said...

Hi, thanks for the post and links!

I have recently written two extensive articles on evaluation metrics for binary classification that could really be relevant here:

- 24 Evaluation Metrics for Binary Classification (And When to Use Them) https://neptune.ml/blog/evaluation-metrics-binary-classification

- F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?
https://neptune.ml/blog/f1-score-accuracy-roc-auc-pr-auc

Figured I'd let you know.

All the best,
Jakub

xtechnotes said...

Thanks very much Jakub. Very useful info and much appreciated.