Confusion Matrix (Precision, Recall, F1 Score)

Ron Lee
Mar 19, 2021


The confusion matrix is a useful tool for measuring the effectiveness of a model. It can be represented as a table with four different combinations of predicted and actual values.

Type of Confusion Matrix
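The four combinations (true positive, true negative, false positive, false negative) can be counted directly from the actual and predicted labels. A minimal Python sketch, using made-up illustrative labels:

```python
# Made-up binary labels for illustration: 1 = Positive, 0 = Negative.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each cell of the confusion matrix.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

print(tp, tn, fp, fn)  # 3 3 1 1
```

In practice a library helper such as scikit-learn's `confusion_matrix` does the same counting, but the loop above makes the four cells explicit.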

Precision:

Precision is calculated as the ratio of the number of Positive samples correctly classified to the total number of samples classified as Positive (either correctly or incorrectly). Precision measures the model's accuracy in classifying a sample as Positive.

Precision is low when the denominator grows relative to the numerator, i.e. there are many false positives or few true positives.

Precision is high when the denominator shrinks relative to the numerator, i.e. there are few false positives or many true positives.

The goal of precision is to classify all the Positive samples as Positive without misclassifying a Negative sample as Positive. When precision is high, we can trust the model when it predicts a sample as Positive. Thus, precision reflects how reliable the model is in classifying samples as Positive.
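The definition above, precision = TP / (TP + FP), as a quick sketch with made-up counts:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of samples predicted Positive that really are Positive."""
    return tp / (tp + fp)

# Illustrative counts: 3 true positives, 1 false positive.
print(precision(tp=3, fp=1))  # 0.75
```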

Recall:

Recall is calculated as the ratio of the number of Positive samples correctly classified as Positive to the total number of Positive samples. Recall measures the model's ability to detect Positive samples: the higher the recall, the more Positive samples are detected.

Recall cares only about how the Positive samples are classified. When the model classifies all the Positive samples as Positive, the recall will be 100% — even if every Negative sample was incorrectly classified as Positive. In other words, it ignores how Negative samples are classified. High recall means the model classifies nearly all the Positive samples correctly as Positive, so the model can be trusted in its ability to detect Positive samples.
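The definition above, recall = TP / (TP + FN), as a quick sketch with made-up counts:

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual Positive samples that the model detected."""
    return tp / (tp + fn)

# Illustrative counts: 3 true positives, 1 missed positive (false negative).
print(recall(tp=3, fn=1))  # 0.75
```

Note that false positives do not appear anywhere in the formula, which is exactly the point made above: recall ignores how Negative samples are classified.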

Summary:

  • Precision measures the model's trustworthiness in classifying Positive samples; recall measures how many Positive samples were correctly classified by the model.
  • Precision depends on how both the Positive and Negative samples were classified, but recall depends only on the Positive samples in its calculation.
  • Precision cares about whether a sample classified as Positive really is Positive, but it does not care about correctly classifying all Positive samples. Recall cares about correctly classifying all Positive samples, but it does not care if a Negative sample is classified as Positive.
  • When a model has high recall but low precision, it classifies most of the Positive samples correctly but produces many false positives. When a model has high precision but low recall, it is accurate when it classifies a sample as Positive but it detects only a few of the Positive samples.

Precision or Recall?

Whether to prioritize precision or recall depends on the type of problem we are solving. Examples:

It is more important to detect all the positive Covid-19 patients than to wrongly flag non Covid-19 people as positive, so use recall.

Wrongly classifying a customer as a loan defaulter will cause the bank to lose that customer. Banks do not want this to happen, so use precision.

F1 Scores

Accuracy is often used to measure the effectiveness of a model: the higher the accuracy, the better the performance. However, this is not always the case when one class severely outnumbers the other. Example: with 1 billion non Covid-19 people against 100 people with Covid-19, a model that predicts "non Covid-19" for everyone scores about 99.99999% accuracy. In such a case, we cannot claim that our model is effective at predicting Covid-19 cases.
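The accuracy trap above can be checked with a few lines of arithmetic, using the article's hypothetical numbers and a degenerate model that predicts "non Covid-19" for everyone:

```python
negatives = 1_000_000_000  # non Covid-19 people
positives = 100            # Covid-19 patients

# An "always negative" model gets every negative right and misses every positive.
accuracy = negatives / (negatives + positives)
recall   = 0 / positives   # no Covid-19 case is ever detected

print(f"{accuracy:.7%}")   # nearly 100% accuracy despite detecting zero cases
print(recall)              # 0.0
```

Accuracy looks excellent, yet recall is zero — which is why accuracy alone is misleading on imbalanced data.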

There is a trade-off in the metrics we choose to maximize. For example, when we increase recall, we typically decrease precision. Choosing an operating point means balancing false positives against false negatives.

Example: if we tune the model for high recall, we might detect all the patients who actually have Covid-19, but we accept low precision and might end up giving treatment to many patients who don't have it. Similarly, if we aim for high precision to avoid giving any wrong and unwanted treatment, we end up with many patients who actually have Covid-19 going untreated.

Since it is equally important to classify both Covid-19 and non Covid-19 patients correctly, we use the F1 score, the harmonic mean of precision and recall, to find an optimal blend of the two: F1 = 2 × (precision × recall) / (precision + recall).
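The harmonic mean F1 = 2 × (precision × recall) / (precision + recall) punishes imbalance between the two metrics, unlike a simple average. A small sketch with illustrative values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 0.5))  # 0.5  -- balanced precision and recall
print(f1_score(0.9, 0.1))  # 0.18 -- the arithmetic mean would be 0.5
```

The second case shows why F1 is preferred here: a model that is strong on one metric but weak on the other cannot hide behind an average.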

If we want to create a balanced classification model with the optimal balance of recall and precision, then we try to maximize the F1 score.
