F1 Score Calculator

Calculate F1 score instantly using confusion matrix or precision and recall values. Free online F1 score calculator for AI, machine learning, classification, and data science.


Calculate F1 score from confusion matrix values (TP, FP, FN) or directly from precision and recall. Instant results with step-by-step formulas. All calculations run locally in your browser.

Confusion Matrix

Cell	Meaning	Count	Share of total
True Positive (TP)	Correct positive	80	8.0%
False Negative (FN)	Missed positive	10	1.0%
False Positive (FP)	Wrong positive	20	2.0%
True Negative (TN)	Correct negative	890	89.0%

Total: 1,000. TN is not needed for F1.


F1 Score Result

F1 Score: 84.21% (0.8421) · Good
Precision: 80.00%
Recall: 88.89%

✅ Good F1 Score

Good performance. The model balances precision and recall well for most production use cases.

All Metrics

Metric	Value
F1 Score	84.21%
Precision	80.00%
Recall	88.89%
Accuracy	97.00%
Specificity	97.80%

Extended Metrics

Metric	Value
NPV	98.89%
MCC	0.8270
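The page does not state the formulas behind the extended metrics. The sketch below uses the standard definitions of accuracy, specificity, negative predictive value (NPV) and the Matthews correlation coefficient (MCC), which reproduce the displayed figures for TP = 80, FP = 20, FN = 10, TN = 890. The function name `extended_metrics` and the choice to return 0.0 for an undefined ratio are assumptions made for illustration.

```python
import math


def extended_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, specificity, NPV and MCC from confusion matrix counts."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    total = tp + fp + fn + tn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, total),      # share of all predictions that were correct
        "specificity": safe_div(tn, tn + fp),      # share of actual negatives correctly rejected
        "npv": safe_div(tn, tn + fn),              # share of negative predictions that were correct
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


metrics = extended_metrics(tp=80, fp=20, fn=10, tn=890)
print({name: round(value, 4) for name, value in metrics.items()})
# {'accuracy': 0.97, 'specificity': 0.978, 'npv': 0.9889, 'mcc': 0.827}
```

Unlike F1, all four of these metrics use TN, which is why the calculator needs the fourth cell only for this part of the output.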

What Is the F1 Score?

The F1 score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns — how many of your positive predictions were correct (precision) and how many of the actual positives your model found (recall).

Unlike accuracy, F1 score is not inflated by a large number of true negatives, making it especially useful for imbalanced classification problems where one class dominates.

The F1 Formula

F1 = 2 × (Precision × Recall) ÷ (Precision + Recall)

Where:
  Precision = TP ÷ (TP + FP)   — of all predicted positives, how many were correct?
  Recall    = TP ÷ (TP + FN)   — of all actual positives, how many were found?

Example: TP = 80, FP = 20, FN = 10
  Precision = 80 ÷ (80 + 20) = 0.80
  Recall    = 80 ÷ (80 + 10) = 0.8889
  F1        = 2 × (0.80 × 0.8889) ÷ (0.80 + 0.8889) = 0.8421  (84.21%)
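The worked example can be reproduced in a few lines of Python. This is a minimal sketch: the function name `f1_from_counts` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the calculator documents.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Return precision, recall and F1 computed from confusion matrix counts."""
    # Assumption: an undefined ratio (zero denominator) is treated as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


result = f1_from_counts(tp=80, fp=20, fn=10)
print({name: round(value, 4) for name, value in result.items()})
# {'precision': 0.8, 'recall': 0.8889, 'f1': 0.8421}
```

TN does not appear anywhere in the function, which matches the note that TN is not needed for F1.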

Why Use the Harmonic Mean?

The harmonic mean penalises extreme imbalances between precision and recall. Suppose a model achieves 100% precision by making very few predictions, but only 1% recall because it misses almost all positives. The arithmetic mean would be 50.5%, which is misleadingly high. The harmonic mean is about 2%, correctly reflecting that the model is nearly useless.

Arithmetic mean of 1.00 and 0.01 = 0.505   ← misleading
Harmonic mean   of 1.00 and 0.01 ≈ 0.020   ← correct
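The contrast between the two means can be checked numerically. A minimal sketch, with function names chosen here for illustration; the harmonic mean is defined as 0.0 when either input is 0 to avoid dividing by zero.

```python
def arithmetic_mean(p: float, r: float) -> float:
    """Plain average of precision and recall."""
    return (p + r) / 2


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (this is the F1 score)."""
    # Assumption: return 0.0 when either value is 0 instead of raising an error.
    return 2 * p * r / (p + r) if p > 0 and r > 0 else 0.0


# Balanced, extremely imbalanced, and identical inputs
for p, r in [(0.80, 0.8889), (1.0, 0.01), (0.5, 0.5)]:
    print(
        f"P={p:.2f} R={r:.4f}  "
        f"arithmetic={arithmetic_mean(p, r):.4f}  "
        f"harmonic={harmonic_mean(p, r):.4f}"
    )
```

When the two inputs are equal, both means agree; the further apart they are, the more the harmonic mean is pulled toward the smaller value.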

F1 Score Rating Reference

F1 Score	Rating	Interpretation
≥ 0.90 (90%)	Excellent	Production-ready — strong balance of precision and recall
0.75 – < 0.90	Good	Suitable for many real-world applications with minor trade-offs
0.50 – < 0.75	Moderate	Acceptable for some tasks; review false positives/negatives
< 0.50	Poor	Model struggles significantly — investigate data and features
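The rating bands translate directly into a threshold lookup. A minimal sketch: the function name `rate_f1` is hypothetical, and it assumes that a score falling between two listed bands (for example 0.895) belongs to the lower one.

```python
def rate_f1(f1: float) -> str:
    """Map an F1 score in [0, 1] to the rating label from the reference table."""
    if f1 >= 0.90:
        return "Excellent"
    if f1 >= 0.75:
        return "Good"
    if f1 >= 0.50:
        return "Moderate"
    return "Poor"


print(rate_f1(0.8421))  # Good
```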

F1 Score vs Accuracy

Aspect	F1 Score	Accuracy
Uses TN?	No	Yes
Best for imbalanced data?	✓ Yes	✗ No — inflated by majority class
Penalises extreme values?	✓ Yes (harmonic mean)	✗ No
Single metric?	✓ Yes	✓ Yes
Intuitive?	Moderate	High

Frequently Asked Questions

What is a good F1 score?

It depends on the problem. As a rule of thumb, many production ML systems aim for an F1 score of 0.85 or higher. For high-stakes domains (medical diagnosis, fraud detection), you may target 0.90 or higher. For research baselines, 0.75+ is often acceptable. The rating reference above labels 0.75 and above as Good and 0.90 and above as Excellent.

Can F1 score be calculated without a confusion matrix?

Yes. If you already know your model's precision and recall values, you can calculate F1 directly: F1 = 2 × (Precision × Recall) ÷ (Precision + Recall). Use the Precision & Recall mode in this calculator.

What is macro vs micro F1 score?

For multi-class problems, micro F1 aggregates TP/FP/FN across all classes before computing the metric. Macro F1 computes F1 per class then averages them. Macro F1 gives equal weight to all classes; micro F1 is influenced by larger classes.
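The two averaging schemes can be compared on a small example. This is a sketch under stated assumptions: the per-class counts are hypothetical, each class is described by a (TP, FP, FN) tuple, and the function names are chosen here for illustration. This calculator itself handles only the binary case.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN) for one class


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; 0.0 when the class has no positives and no predictions."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per class, then take the unweighted average."""
    return sum(f1_score(*counts) for counts in per_class.values()) / len(per_class)


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Sum TP, FP and FN over all classes, then compute a single F1."""
    tp = sum(counts[0] for counts in per_class.values())
    fp = sum(counts[1] for counts in per_class.values())
    fn = sum(counts[2] for counts in per_class.values())
    return f1_score(tp, fp, fn)


# Hypothetical counts: two large classes that do well, one small class that does poorly.
counts = {"cat": (90, 10, 10), "dog": (80, 20, 20), "bird": (2, 8, 8)}

print(round(macro_f1(counts), 4))  # 0.6333
print(round(micro_f1(counts), 4))  # 0.819
```

The weak "bird" class drags macro F1 down sharply because every class counts equally, while micro F1 stays high because the two large classes contribute most of the counts.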

When should I use F1 score over precision or recall alone?

Use F1 when both false positives and false negatives carry meaningful cost. If one type of error is much more costly than the other, optimise directly for precision (if FP is costly) or recall (if FN is costly) instead.

What is the F-beta score?

F-beta is a generalisation of F1 where you can weight precision or recall more heavily. F1 uses β=1 (equal weight). F0.5 weights precision twice as much; F2 weights recall twice as much. This calculator computes the standard F1 (β=1).
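The standard definition is Fβ = (1 + β²) × (Precision × Recall) ÷ (β² × Precision + Recall), which reduces to the F1 formula when β = 1. A minimal sketch, with the function name `f_beta` chosen here for illustration and 0.0 returned when the denominator is zero:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    beta_squared = beta ** 2
    denominator = beta_squared * precision + recall
    # Assumption: return 0.0 instead of raising when both inputs are 0.
    if denominator == 0:
        return 0.0
    return (1 + beta_squared) * precision * recall / denominator


# Precision and recall from the TP = 80, FP = 20, FN = 10 example
precision, recall = 80 / 100, 80 / 90

for beta in (0.5, 1.0, 2.0):
    print(f"F{beta:g} = {f_beta(precision, recall, beta):.4f}")
```

Because recall is higher than precision in this example, F2 comes out above F1 and F0.5 below it.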
