Productive Toolbox

Confusion Matrix Calculator

Calculate confusion matrix metrics instantly online. Get accuracy, precision, recall, specificity, F1 score, MCC, and more for machine learning classification models.

Enter confusion matrix values (TP, FP, FN, TN) to instantly calculate all classification metrics — accuracy, precision, recall, F1, MCC, and more. Upload a CSV for automatic matrix generation. All calculations run locally in your browser.

Confusion Matrix Values

TP: Correctly predicted positive
FN: Missed positive (Type II error)
FP: Wrong positive (Type I error)
TN: Correctly predicted negative

Total samples: 200

Confusion Matrix

                  Predicted Pos        Predicted Neg
Actual Pos        TP = 90 (45.0%)      FN = 20 (10.0%)
Actual Neg        FP = 10 (5.0%)       TN = 80 (40.0%)

Key Metrics

Total samples: 200 (Positives: 110 · Negatives: 90)

Metric          Value     Description
Accuracy        85.00%    Fraction of all predictions that were correct.
Precision       90.00%    Of all predicted positives, how many were correct?
Recall          81.82%    Of all actual positives, how many were found?
Specificity     88.89%    Of all actual negatives, how many were correctly identified?
F1 Score        85.71%    Harmonic mean of precision and recall.
FPR             11.11%    False Positive Rate (1 − Specificity).
FNR             18.18%    False Negative Rate (1 − Recall).
NPV             80.00%    Negative Predictive Value: accuracy of negative predictions.
Balanced Acc.   85.35%    (Recall + Specificity) / 2. Fair for imbalanced classes.
MCC             0.7035    Matthews Correlation Coefficient (−1 to +1).

What Is a Confusion Matrix?

A confusion matrix is a table that summarises the performance of a classification model by comparing actual labels against predicted labels. For binary classification, it contains four values: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

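From these four numbers you can derive the standard threshold-based classification metrics (accuracy, precision, recall, F1 score, specificity, MCC, and more), which together show how well a model performs on both the positive and negative classes. Metrics that depend on predicted scores rather than hard labels, such as ROC AUC, cannot be recovered from a single confusion matrix.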

Confusion Matrix Layout

                  Predicted Positive    Predicted Negative
Actual Positive        TP (True+)            FN (False-)
Actual Negative        FP (False+)           TN (True-)

TP = Correctly predicted positive
TN = Correctly predicted negative
FP = Incorrectly predicted positive (Type I error)
FN = Incorrectly predicted negative (Type II error)
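
To make the four cells concrete, here is a minimal Python sketch that tallies them from two lists of labels. The function name `confusion_counts`, the `positive` parameter, and the sample labels are illustrative assumptions, not part of the calculator.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fn, fp, tn) for two equal-length label sequences.

    Hypothetical helper: `positive` names the label treated as the
    positive class; every other label counts as negative.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Key each sample by (is actually positive, is predicted positive).
    pairs = Counter((a == positive, p == positive)
                    for a, p in zip(actual, predicted))
    tp = pairs[(True, True)]    # actual positive, predicted positive
    fn = pairs[(True, False)]   # actual positive, predicted negative
    fp = pairs[(False, True)]   # actual negative, predicted positive
    tn = pairs[(False, False)]  # actual negative, predicted negative
    return tp, fn, fp, tn


# Made-up labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # 3 1 1 3
```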

All Metric Formulas

Accuracy          = (TP + TN) / (TP + FP + FN + TN)
Precision         = TP / (TP + FP)
Recall            = TP / (TP + FN)
Specificity       = TN / (TN + FP)
F1 Score          = 2 × (Precision × Recall) / (Precision + Recall)
False Positive Rate (FPR) = FP / (FP + TN)
False Negative Rate (FNR) = FN / (FN + TP)
NPV               = TN / (TN + FN)
Balanced Accuracy = (Recall + Specificity) / 2
MCC               = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
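
The formulas above translate almost line for line into code. The following Python sketch is one possible implementation, not the calculator's own source. The function name `classification_metrics` is hypothetical, and returning NaN when a denominator is zero is an assumption made here, since the page does not say how the tool handles that case. F1 is computed as 2×TP / (2×TP + FP + FN), which is algebraically the same as the precision/recall form.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics listed above from the four matrix cells."""

    def ratio(num, den):
        # Assumption: undefined ratios (zero denominator) become NaN.
        return num / den if den else float("nan")

    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
        "npv": ratio(tn, tn + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


metrics = classification_metrics(tp=90, fp=10, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name:18s} {value:.4f}")
```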

Example: TP=90, TN=80, FP=10, FN=20
  Accuracy   = (90 + 80) / 200 = 85.00%
  Precision  = 90 / 100        = 90.00%
  Recall     = 90 / 110        = 81.82%
  F1 Score   = 2×(0.90×0.8182)/(0.90+0.8182) ≈ 85.71%   (exactly 2×90/(2×90+10+20) = 180/210)
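
The worked example stops at F1. As a rough check on the remaining figures, this short Python sketch carries the same four counts through specificity, NPV, balanced accuracy, and MCC, with the intermediate arithmetic in comments. The variable names are illustrative.

```python
import math

tp, tn, fp, fn = 90, 80, 10, 20

specificity = tn / (tn + fp)                    # 80 / 90
npv = tn / (tn + fn)                            # 80 / 100
recall = tp / (tp + fn)                         # 90 / 110
balanced_accuracy = (recall + specificity) / 2

numerator = tp * tn - fp * fn                   # 7200 - 200 = 7000
denominator = math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)                                               # sqrt(100 * 110 * 90 * 100)
mcc = numerator / denominator

print(f"Specificity       = {specificity:.2%}")        # 88.89%
print(f"NPV               = {npv:.2%}")                # 80.00%
print(f"Balanced Accuracy = {balanced_accuracy:.2%}")  # 85.35%
print(f"MCC               = {mcc:.4f}")                # 0.7035
```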

Metric Reference Guide

Metric              Formula                    Best For
Accuracy            (TP+TN)/Total              Balanced datasets
Precision           TP/(TP+FP)                 When FP is costly (e.g. spam)
Recall              TP/(TP+FN)                 When FN is costly (e.g. medical)
Specificity         TN/(TN+FP)                 Negative class performance
F1 Score            2×P×R/(P+R)                Imbalanced datasets
FPR                 FP/(FP+TN)                 ROC curve analysis
FNR                 FN/(FN+TP)                 Miss rate analysis
NPV                 TN/(TN+FN)                 Negative prediction reliability
Balanced Accuracy   (Recall+Specificity)/2     Class-imbalanced evaluation
MCC                 (TP×TN−FP×FN)/√(…)         Overall quality (−1 to +1)

When to Use Each Metric

Use Accuracy when…

Your dataset is balanced (roughly equal class sizes) and all misclassification types carry equal cost.

Use Precision when…

False positives are expensive. In spam detection, flagging a legitimate email as spam (FP) damages user trust more than missing a spam email.

Use Recall when…

False negatives are dangerous. In cancer screening, missing a true positive (FN) has far greater consequences than a false alarm.

Use F1 Score when…

Your dataset is imbalanced and you need a single metric that balances precision and recall equally.

Use MCC when…

You want a single comprehensive metric that accounts for all four quadrants of the confusion matrix, especially for heavily imbalanced datasets.

Frequently Asked Questions

What does a confusion matrix tell you?

It breaks down all prediction outcomes into four categories (TP, TN, FP, FN), letting you see exactly where your model succeeds and where it fails. From these counts you can calculate the threshold-based metrics listed above without needing the raw predictions.

What is a good accuracy for a classification model?

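It depends heavily on the dataset and on the cost of each error type. On a balanced dataset, accuracy above 90% is often considered good, but it should still be compared against a simple baseline such as always predicting the majority class. On a heavily imbalanced dataset accuracy can be misleading: if 99% of the samples are negative, a model that always predicts negative scores 99% accuracy while finding no positives at all.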
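
A small numeric illustration of that pitfall, using made-up counts: 1,000 samples with 10 positives, and a model that always predicts negative. This is a sketch for intuition, not output from the calculator.

```python
# Hypothetical imbalanced dataset: 10 positives, 990 negatives.
# The "model" predicts negative for every sample.
tp, fn, fp, tn = 0, 10, 0, 990

total = tp + fn + fp + tn
accuracy = (tp + tn) / total                    # 990 / 1000
recall = tp / (tp + fn)                         # 0 / 10
specificity = tn / (tn + fp)                    # 990 / 990
balanced_accuracy = (recall + specificity) / 2

print(f"Accuracy          = {accuracy:.2%}")           # 99.00%
print(f"Recall            = {recall:.2%}")             # 0.00%
print(f"Balanced Accuracy = {balanced_accuracy:.2%}")  # 50.00%

# Precision is TP / (TP + FP) = 0 / 0 here, so it is undefined, and the
# MCC denominator is zero as well because nothing was predicted positive.
```

Accuracy looks excellent, while recall and balanced accuracy show that the model never finds a positive case.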

What is the difference between sensitivity and specificity?

Sensitivity (recall) measures how well the model finds actual positives. Specificity measures how well it correctly identifies actual negatives. A good diagnostic test aims for high values of both.

Why is MCC considered a better metric than F1?

MCC (Matthews Correlation Coefficient) uses all four values of the confusion matrix and is not inflated by class imbalance. F1 ignores true negatives entirely. For datasets where TN is large (common in fraud detection), MCC gives a more balanced assessment.
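
To see that F1 ignores true negatives while MCC does not, the sketch below holds TP, FP, and FN fixed and changes only TN. The counts and the helper names `f1_score` and `mcc_score` are illustrative assumptions.

```python
import math


def f1_score(tp, fp, fn):
    # Equivalent to 2 * P * R / (P + R); TN does not appear at all.
    return 2 * tp / (2 * tp + fp + fn)


def mcc_score(tp, fp, fn, tn):
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den


tp, fp, fn = 90, 10, 20
for tn in (80, 8000):
    print(f"TN={tn:5d}  F1={f1_score(tp, fp, fn):.4f}  "
          f"MCC={mcc_score(tp, fp, fn, tn):.4f}")
```

F1 prints the same value on both rows, while MCC moves with the number of true negatives.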

Can I upload predictions directly?

Yes. Use the CSV Upload mode and provide a two-column CSV with columns 'actual' and 'predicted'. The calculator parses binary values (1/0) or string labels (positive/negative, yes/no) and builds the confusion matrix automatically.
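
For readers who want to reproduce the upload step offline, here is a minimal Python sketch of parsing such a file. It is not the calculator's actual parser. The column names and accepted labels come from the answer above, while the function names, the case-insensitive matching, the whitespace stripping, and the error on unknown labels are assumptions.

```python
import csv
import io

# Labels taken from the description above; matching is case-insensitive
# here by assumption.
POSITIVE = {"1", "positive", "yes"}
NEGATIVE = {"0", "negative", "no"}


def to_bool(value):
    """Map a raw CSV cell to True (positive) or False (negative)."""
    token = value.strip().lower()
    if token in POSITIVE:
        return True
    if token in NEGATIVE:
        return False
    raise ValueError(f"unrecognised label: {value!r}")


def matrix_from_csv(text):
    """Tally TP, FN, FP, TN from CSV text with 'actual' and 'predicted' columns."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for row in csv.DictReader(io.StringIO(text)):
        actual = to_bool(row["actual"])
        predicted = to_bool(row["predicted"])
        if actual:
            counts["TP" if predicted else "FN"] += 1
        else:
            counts["FP" if predicted else "TN"] += 1
    return counts


# Made-up sample data mixing the accepted label styles.
sample = "actual,predicted\n1,1\n1,0\nyes,yes\nno,yes\nnegative,negative\n0,0\n"
result = matrix_from_csv(sample)
print(result)  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
```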