F1 Score Calculator
Calculate F1 score instantly using confusion matrix or precision and recall values. Free online F1 score calculator for AI, machine learning, classification, and data science.
Calculate F1 score from confusion matrix values (TP, FP, FN) or directly from precision and recall. Instant results with step-by-step formulas. All calculations run locally in your browser.
What Is the F1 Score?
The F1 score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns: how many of your positive predictions were correct (precision) and how many of the actual positives your model found (recall).
Unlike accuracy, F1 score is not inflated by a large number of true negatives, making it especially useful for imbalanced classification problems where one class dominates.
The F1 Formula
F1 = 2 × (Precision × Recall) ÷ (Precision + Recall)
Where:
Precision = TP ÷ (TP + FP) (of all predicted positives, how many were correct?)
Recall = TP ÷ (TP + FN) (of all actual positives, how many were found?)
Example: TP = 80, FP = 20, FN = 10
Precision = 80 ÷ (80 + 20) = 0.80
Recall = 80 ÷ (80 + 10) = 0.8889
F1 = 2 × (0.80 × 0.8889) ÷ (0.80 + 0.8889) = 0.8421 (84.21%)
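The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own implementation; the function name `f1_from_counts` and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Return precision, recall and F1 computed from confusion-matrix counts."""
    # Assumed convention: a metric whose denominator is zero is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Worked example from the text: TP = 80, FP = 20, FN = 10
result = f1_from_counts(tp=80, fp=20, fn=10)
print({name: round(value, 4) for name, value in result.items()})
# {'precision': 0.8, 'recall': 0.8889, 'f1': 0.8421}
```

True negatives do not appear anywhere in the function, which is why the score is unaffected by how many negatives the model classifies correctly.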
Why Use the Harmonic Mean?
The harmonic mean penalises extreme imbalances between precision and recall. Take the limiting case of a model that makes very few positive predictions: its precision can approach 100% while its recall approaches 0%, because it misses almost all positives. The arithmetic mean of the two would be about 50%, which is misleadingly high. The harmonic mean is close to 0%, correctly reflecting that the model is nearly useless.
Arithmetic mean of 1.0 and 0.0 = 0.50 (misleading)
Harmonic mean of 1.0 and 0.0 = 0.00 (correct)
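A short comparison makes the difference between the two means concrete. The sketch below is illustrative only: the helper names and the sample precision/recall pairs are assumptions, and the harmonic mean is defined as 0.0 when both inputs are zero.

```python
def arithmetic_mean(p: float, r: float) -> float:
    """Plain average of precision and recall."""
    return (p + r) / 2


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    # Assumed convention: return 0.0 when both values are zero.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical (precision, recall) pairs, from extreme imbalance to balance.
pairs = [(1.0, 0.0), (0.99, 0.01), (0.8, 0.8)]
for p, r in pairs:
    print(f"P={p:.2f} R={r:.2f} "
          f"arithmetic={arithmetic_mean(p, r):.4f} "
          f"harmonic={harmonic_mean(p, r):.4f}")
```

The two means agree only when precision and recall are equal; the further apart they are, the more the harmonic mean is pulled toward the smaller value.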
F1 Score Rating Reference
| F1 Score | Rating | Interpretation |
|---|---|---|
| ≥ 0.90 (90%) | Excellent | Production-ready: strong balance of precision and recall |
| 0.75 to 0.89 | Good | Suitable for many real-world applications with minor trade-offs |
| 0.50 to 0.74 | Moderate | Acceptable for some tasks; review false positives/negatives |
| < 0.50 | Poor | Model struggles significantly; investigate data and features |
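The rating bands in the table can be expressed as a simple lookup. This is a sketch under one assumption: each band's lower bound is treated as inclusive, so a score such as 0.895 falls into "Good". The function name `rate_f1` is hypothetical.

```python
def rate_f1(f1: float) -> str:
    """Map an F1 score in [0, 1] to the rating bands from the reference table."""
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("F1 must be between 0 and 1")
    # Assumption: lower bounds are inclusive, checked from the highest band down.
    if f1 >= 0.90:
        return "Excellent"
    if f1 >= 0.75:
        return "Good"
    if f1 >= 0.50:
        return "Moderate"
    return "Poor"


for score in (0.95, 0.8421, 0.60, 0.30):
    print(score, rate_f1(score))
```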
F1 Score vs Accuracy
| Aspect | F1 Score | Accuracy |
|---|---|---|
| Uses TN? | No | Yes |
| Best for imbalanced data? | Yes | No (inflated by majority class) |
| Penalises extreme values? | Yes (harmonic mean) | No |
| Single metric? | Yes | Yes |
| Intuitive? | Moderate | High |
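The contrast in the table shows up clearly on imbalanced data. The sketch below uses hypothetical counts (10 actual positives among 1,000 samples) chosen for illustration only; the function names are likewise assumptions.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that were correct (uses TN)."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (ignores TN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom > 0 else 0.0


# Hypothetical imbalanced dataset: 10 positives, 990 negatives.
# The model finds only 1 of the 10 positives and raises 1 false alarm.
tp, fn, fp, tn = 1, 9, 1, 989

print(f"accuracy = {accuracy(tp, fp, fn, tn):.4f}")  # accuracy = 0.9900
print(f"f1       = {f1_score(tp, fp, fn):.4f}")      # f1       = 0.1667
```

Accuracy looks excellent because the 989 true negatives dominate the total, while F1 exposes that the model misses nine out of ten positives.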
Frequently Asked Questions
What is a good F1 score?
It depends on the problem. For most production ML systems, an F1 score ≥ 0.85 is often considered good. For safety-critical domains (medical diagnosis, fraud detection), you may target ≥ 0.90. For research baselines, 0.75 or above is often acceptable.
Can F1 score be calculated without a confusion matrix?
Yes. If you already know your model's precision and recall values, you can calculate F1 directly: F1 = 2 × (Precision × Recall) ÷ (Precision + Recall). Use the Precision & Recall mode in this calculator.
What is macro vs micro F1 score?
For multi-class problems, micro F1 aggregates TP/FP/FN across all classes before computing the metric. Macro F1 computes F1 per class then averages them. Macro F1 gives equal weight to all classes; micro F1 is influenced by larger classes.
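The difference between the two averages can be sketched as follows. The per-class counts are hypothetical (one large class that is predicted well, one small class that is predicted poorly), and the helper `f1_from_counts` uses the count form 2·TP ÷ (2·TP + FP + FN), which is algebraically equal to the harmonic mean of precision and recall.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 in count form: 2*TP / (2*TP + FP + FN); 0.0 if the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Hypothetical per-class counts for a two-class problem.
counts = {
    "large_class": {"tp": 90, "fp": 15, "fn": 5},
    "small_class": {"tp": 5, "fp": 5, "fn": 15},
}

# Macro F1: compute F1 per class, then take the unweighted average.
per_class = {name: f1_from_counts(**c) for name, c in counts.items()}
macro_f1 = sum(per_class.values()) / len(per_class)

# Micro F1: sum TP/FP/FN over all classes first, then compute a single F1.
total_tp = sum(c["tp"] for c in counts.values())
total_fp = sum(c["fp"] for c in counts.values())
total_fn = sum(c["fn"] for c in counts.values())
micro_f1 = f1_from_counts(total_tp, total_fp, total_fn)

print({name: round(score, 4) for name, score in per_class.items()})
print(f"macro F1 = {macro_f1:.4f}")
print(f"micro F1 = {micro_f1:.4f}")
```

With these counts the macro average is noticeably lower than the micro average, because the weak small class counts as much as the strong large class in the macro figure.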
When should I use F1 score over precision or recall alone?
Use F1 when both false positives and false negatives carry meaningful cost. If one type of error is much more costly than the other, optimise directly for precision (if FP is costly) or recall (if FN is costly) instead.
What is the F-beta score?
F-beta is a generalisation of F1 where you can weight precision or recall more heavily. F1 uses β = 1 (equal weight). F0.5 treats precision as twice as important as recall; F2 treats recall as twice as important as precision. This calculator computes the standard F1 (β = 1).
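For reference, the standard definition is F-beta = (1 + β²) × (Precision × Recall) ÷ (β² × Precision + Recall). A minimal sketch follows; the function name `fbeta` and the sample precision/recall values are assumptions for illustration, and the calculator itself only reports β = 1.

```python
def fbeta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weights recall beta times as heavily as precision."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta ** 2
    denom = b2 * precision + recall
    # Assumed convention: return 0.0 when the denominator is zero.
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


# Hypothetical model with low precision and perfect recall.
p, r = 0.5, 1.0
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {fbeta(p, r, beta):.4f}")
# beta=0.5: 0.5556
# beta=1.0: 0.6667
# beta=2.0: 0.8333
```

Because this hypothetical model is strong on recall and weak on precision, its score rises as β increases and falls as β decreases.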