Productive Toolbox

F1 Score Calculator

Calculate F1 score instantly using confusion matrix or precision and recall values. Free online F1 score calculator for AI, machine learning, classification, and data science.

Calculate F1 score from confusion matrix values (TP, FP, FN) or directly from precision and recall. Instant results with step-by-step formulas. All calculations run locally in your browser.

Confusion Matrix

TP: correct positive
FN: missed positive
FP: wrong positive
TN: correct negative

Total: 1,000 (TN not needed for F1)

Visual Matrix

Cell   Count   Share of total
TP     80      8.0%
FN     10      1.0%
FP     20      2.0%
TN     890     89.0%

F1 Score Result

F1 Score:   84.21% (0.8421, Good)
Precision:  80.00%
Recall:     88.89%

Good F1 Score

Good performance. The model balances precision and recall well for most production use cases.

All Metrics

F1 Score      84.21%
Precision     80.00%
Recall        88.89%
Accuracy      97.00%
Specificity   97.80%

Extended Metrics

Accuracy      97.00%
Specificity   97.80%
NPV           98.89%
MCC           0.8270
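
The extended metrics above all follow from the four confusion-matrix cells. The page does not list their formulas, so the following is a minimal Python sketch that assumes the standard textbook definitions; the function name `extended_metrics` is hypothetical, and the zero-denominator handling for MCC is an assumption rather than the calculator's documented behaviour.

```python
import math


def extended_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, specificity, NPV and MCC from confusion-matrix counts.

    Assumes every cell sum used as a simple denominator is non-zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total       # share of all predictions that were correct
    specificity = tn / (tn + fp)       # share of actual negatives correctly rejected
    npv = tn / (tn + fn)               # share of predicted negatives that were correct
    # Matthews correlation coefficient; reported as 0.0 when undefined (assumption).
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom > 0 else 0.0
    return {"accuracy": accuracy, "specificity": specificity, "npv": npv, "mcc": mcc}


metrics = extended_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With the counts shown in the matrix (TP = 80, FP = 20, FN = 10, TN = 890), the values round to 97.00%, 97.80%, 98.89% and 0.8270, matching the panel above.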

What Is the F1 Score?

The F1 score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns: how many of your positive predictions were correct (precision) and how many of the actual positives your model found (recall).

Unlike accuracy, F1 score is not inflated by a large number of true negatives, making it especially useful for imbalanced classification problems where one class dominates.

The F1 Formula

F1 = 2 × (Precision × Recall) ÷ (Precision + Recall)

Where:
  Precision = TP ÷ (TP + FP)   (of all predicted positives, how many were correct?)
  Recall    = TP ÷ (TP + FN)   (of all actual positives, how many were found?)

Example: TP = 80, FP = 20, FN = 10
  Precision = 80 ÷ (80 + 20) = 0.80
  Recall    = 80 ÷ (80 + 10) = 0.8889
  F1        = 2 × (0.80 × 0.8889) ÷ (0.80 + 0.8889) = 0.8421  (84.21%)
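
The worked example above can be reproduced in a few lines of Python. This is a minimal sketch, not the calculator's actual source: the function name `f1_from_counts` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made here to avoid division errors.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, f1) from confusion-matrix counts."""
    # Assumption: a ratio with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


precision, recall, f1 = f1_from_counts(tp=80, fp=20, fn=10)
print(f"Precision={precision:.4f} Recall={recall:.4f} F1={f1:.4f}")
# prints: Precision=0.8000 Recall=0.8889 F1=0.8421
```

TN never appears in the function signature, which is why the calculator does not need it for F1.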

Why Use the Harmonic Mean?

The harmonic mean penalises extreme imbalances between precision and recall. If a model achieves 100% precision by making very few predictions but 0% recall by missing all positives, the arithmetic mean would be 50%, which is misleadingly high. The harmonic mean yields 0%, correctly reflecting that the model is useless.

Arithmetic mean of 1.0 and 0.0 = 0.50  ← misleading
Harmonic mean   of 1.0 and 0.0 = 0.00  ← correct
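
To see how the two means diverge as precision and recall drift apart, here is a small illustrative Python sketch. The helper names are hypothetical, the sample pairs are made up for illustration, and treating the harmonic mean as 0.0 when either input is 0 is an assumption that matches the limit of the formula.

```python
def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2


def harmonic_mean(p: float, r: float) -> float:
    # Assumption: defined as 0.0 when either value is 0 (the limit of 2pr / (p + r)).
    if p == 0 or r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Illustrative (precision, recall) pairs, from fully lopsided to balanced.
for p, r in [(1.0, 0.0), (0.9, 0.1), (0.8, 0.8)]:
    print(f"P={p:.1f} R={r:.1f} "
          f"arithmetic={arithmetic_mean(p, r):.2f} harmonic={harmonic_mean(p, r):.2f}")
```

The two means agree only when precision and recall are equal; the further apart they are, the more the harmonic mean falls below the arithmetic mean.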

F1 Score Rating Reference

F1 Score        Rating      Interpretation
≥ 0.90 (90%)    Excellent   Production-ready; strong balance of precision and recall
0.75 – 0.89     Good        Suitable for many real-world applications with minor trade-offs
0.50 – 0.74     Moderate    Acceptable for some tasks; review false positives/negatives
< 0.50          Poor        Model struggles significantly; investigate data and features

F1 Score vs Accuracy

Aspect                      F1 Score                Accuracy
Uses TN?                    No                      Yes
Best for imbalanced data?   ✓ Yes                   ✗ No (inflated by majority class)
Penalises extreme values?   ✓ Yes (harmonic mean)   ✗ No
Single metric?              ✓ Yes                   ✓ Yes
Intuitive?                  Moderate                High
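
A quick way to see the difference on imbalanced data is a model that predicts "negative" for every sample. The Python sketch below uses a hypothetical dataset of 1,000 samples with only 10 actual positives; it computes F1 in its equivalent count form, 2·TP ÷ (2·TP + FP + FN), and treating an undefined F1 as 0.0 is an assumption.

```python
# Hypothetical imbalanced dataset: 1,000 samples, 10 actual positives.
# The "model" predicts negative for every sample, so it finds no positives.
tp, fp, fn, tn = 0, 0, 10, 990

accuracy = (tp + tn) / (tp + fp + fn + tn)

# F1 in count form; algebraically equal to 2PR / (P + R). 0.0 when undefined.
f1_denom = 2 * tp + fp + fn
f1 = 2 * tp / f1_denom if f1_denom > 0 else 0.0

print(f"Accuracy={accuracy:.2%} F1={f1:.2%}")
# prints: Accuracy=99.00% F1=0.00%
```

Accuracy rewards the 990 true negatives, while F1 ignores them and exposes that no positive was ever detected.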

Frequently Asked Questions

What is a good F1 score?

It depends on the problem. For most production ML systems, an F1 score of 0.85 or higher is generally considered good. For safety-critical domains (medical diagnosis, fraud detection), you may target 0.90 or higher. For research baselines, 0.75 or higher is often acceptable.

Can F1 score be calculated without a confusion matrix?

Yes. If you already know your model's precision and recall values, you can calculate F1 directly: F1 = 2 × (Precision × Recall) ÷ (Precision + Recall). Use the Precision & Recall mode in this calculator.

What is macro vs micro F1 score?

For multi-class problems, micro F1 aggregates TP/FP/FN across all classes before computing the metric. Macro F1 computes F1 per class then averages them. Macro F1 gives equal weight to all classes; micro F1 is influenced by larger classes.
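
The difference between the two averages is easiest to see with numbers. The Python sketch below uses hypothetical per-class counts for a three-class problem (the class names and values are invented for illustration) and the count form of F1, 2·TP ÷ (2·TP + FP + FN).

```python
def f1(tp: int, fp: int, fn: int) -> float:
    # Count form of F1; reported as 0.0 when undefined (assumption).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Hypothetical per-class counts: class name -> (TP, FP, FN).
counts = {
    "large": (90, 10, 5),
    "medium": (40, 5, 10),
    "small": (1, 4, 4),
}

# Macro F1: compute F1 per class, then take the unweighted average.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro F1: sum TP, FP and FN across classes, then compute a single F1.
tp_sum = sum(c[0] for c in counts.values())
fp_sum = sum(c[1] for c in counts.values())
fn_sum = sum(c[2] for c in counts.values())
micro_f1 = f1(tp_sum, fp_sum, fn_sum)

print(f"macro F1={macro_f1:.4f} micro F1={micro_f1:.4f}")
```

In this made-up example the poorly predicted "small" class drags macro F1 well below micro F1, because macro averaging counts it as heavily as the two larger classes.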

When should I use F1 score over precision or recall alone?

Use F1 when both false positives and false negatives carry meaningful cost. If one type of error is much more costly than the other, optimise directly for precision (if FP is costly) or recall (if FN is costly) instead.

What is the F-beta score?

F-beta is a generalisation of F1 that lets you weight precision or recall more heavily. F1 uses β = 1 (equal weight). F0.5 treats precision as twice as important as recall; F2 treats recall as twice as important as precision. This calculator computes the standard F1 (β = 1).
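
For reference, the standard F-beta formula is Fβ = (1 + β²) × (Precision × Recall) ÷ (β² × Precision + Recall). The calculator itself only computes F1, so the Python sketch below is purely illustrative; the function name `f_beta` is hypothetical, and returning 0.0 for a zero denominator is an assumption.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    # Assumption: report 0.0 when the score is undefined.
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


# Precision and recall from the page's example (TP = 80, FP = 20, FN = 10).
precision, recall = 80 / 100, 80 / 90

for beta in (0.5, 1.0, 2.0):
    print(f"F{beta:g} = {f_beta(precision, recall, beta):.4f}")
```

Because recall (0.8889) is higher than precision (0.80) in this example, F2 comes out above F1 and F0.5 comes out below it.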