Precision is a measure of how accurate a model's positive predictions are. It tells us what percentage of the instances the model predicted as positive were actually positive. The formula for precision is:
Precision = TP / (TP + FP)
Where:
TP (True Positives): The number of times the model correctly predicted something as positive.
FP (False Positives): The number of times the model incorrectly predicted something as positive.
Precision is essential when a false positive is costly or harmful. If the model says something is positive, we want it to be right most of the time. This is especially useful when mistakes lead to unnecessary actions or high costs. Imagine a cancer detection system that predicts whether a person has cancer. Precision matters because if the model says someone has cancer, we want that prediction to be accurate. A high-precision model ensures that when it predicts cancer, it is likely to be correct, which reduces unnecessary stress and extra medical tests for patients. In fraud detection, precision helps ensure that when the model flags a transaction as fraudulent, it is usually right. This way, fewer valid transactions are mistakenly flagged as fraud, saving time and frustration for the customer. Precision is essential, but it should be used alongside other metrics like recall and F1-score to understand the complete picture of a model's performance.
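To make this concrete, here is a minimal sketch (assuming scikit-learn is installed; the cancer-screening labels below are made up for illustration) that computes precision both by hand from TP and FP and with precision_score:

```python
from sklearn.metrics import precision_score

# Hypothetical labels for a small cancer-screening example (1 = cancer, 0 = healthy)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual diagnoses
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

# Precision by hand: TP / (TP + FP)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
print("Manual precision:", tp / (tp + fp))                    # 3 / (3 + 1) = 0.75
print("sklearn precision:", precision_score(y_true, y_pred))  # 0.75
```

Here, three of the four "cancer" predictions were correct, so precision is 0.75.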
Recall (also called Sensitivity or True Positive Rate) is a measure of how well a model identifies positive cases. It tells us how many of the actual positive instances (the real positives) the model successfully caught. In other words, recall shows how well the model detects all the important positive cases. The formula for recall is:
Recall = TP / (TP + FN)
Where:
TP (True Positives): The total number of correctly predicted positive cases.
FN (False Negatives): The total number of actual positive cases that the model missed (i.e., the model said they were negative, but they were actually positive).
Recall is especially important when missing a positive case is costly. High recall is crucial for identifying actual positive cases, especially in disease screening such as early-stage cancer detection. Missing a real positive case can have serious consequences, so prioritizing high recall is essential for early treatment and saving lives. In fraud detection, recall helps ensure that most fraudulent transactions are identified. Even if some legitimate transactions are flagged by mistake (false positives), catching as many fraudulent transactions as possible is the priority to prevent financial loss. The downside of focusing only on recall is that it can lead to more false positives (cases where the model incorrectly predicts something as positive). For example, in cancer detection, a model with very high recall might incorrectly label many healthy people as having cancer, causing unnecessary stress and tests. Similarly, in fraud detection, a high-recall model might flag too many legitimate transactions as fraud, frustrating customers. Because of this, precision and recall are often used together: precision measures how accurate the positive predictions are, while recall ensures that most real positives are caught. The F1 score is a combined metric that balances precision and recall, which is useful when both false positives and false negatives matter.
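For a concrete view of recall, here is a minimal sketch (assuming scikit-learn; the same made-up screening labels as in the precision example) that computes recall by hand from TP and FN and with recall_score:

```python
from sklearn.metrics import recall_score

# Hypothetical cancer-screening labels (1 = cancer, 0 = healthy)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual diagnoses
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

# Recall by hand: TP / (TP + FN)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print("Manual recall:", tp / (tp + fn))                  # 3 / (3 + 1) = 0.75
print("sklearn recall:", recall_score(y_true, y_pred))   # 0.75
```

The model caught three of the four actual cancer cases and missed one, so recall is 0.75.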
The F1 score measures how well a machine learning model performs by balancing two important factors: precision and recall. Precision tells us how many of the model's positive predictions were correct, while recall tells us how many of the actual positives the model successfully identified. The formula for the F1 score is as follows:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
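As a small sketch of how the formula combines the two metrics (assuming scikit-learn; the labels are invented for illustration):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)   # 0.75
recall = recall_score(y_true, y_pred)         # 0.75

# F1 by hand: harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)
print("Manual F1:", f1_manual)                  # 0.75
print("sklearn F1:", f1_score(y_true, y_pred))  # 0.75
```

Because F1 is a harmonic mean, it stays low if either precision or recall is low, so a model cannot score well on F1 by excelling at only one of the two.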
Accuracy works well when the number of examples from each class (or category) in the dataset is similar. It is often used when a false positive and a false negative carry roughly the same cost. A high accuracy score typically means the model is performing well overall. Imagine a spam detection system where half of the emails are spam and the other half are not. If the model correctly identifies 90% of both spam and non-spam emails, accuracy gives a clear picture of how well it is doing. Accuracy can be misleading when the dataset is imbalanced, meaning one class has far more examples than another. For example, if 95% of emails are non-spam and the model predicts "non-spam" every time, it will have 95% accuracy even though it didn't catch any spam. In such cases, it is better to use other metrics like precision, recall, or F1-score, which give a more balanced view of how the model performs, especially for the minority class. In cases like cancer screening, missing a real case is worse than a false alarm, so precision or recall may be more important depending on the situation. The F1 score helps evaluate models where both precision and recall matter, especially with imbalanced datasets.
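To illustrate the imbalance problem described above, here is a minimal sketch (assuming scikit-learn; the 95% non-spam split mirrors the example in the text) of a trivial model that always predicts "non-spam":

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced dataset: 95 non-spam (0) emails and 5 spam (1) emails
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a trivial model that predicts "non-spam" for every email

print("Accuracy:", accuracy_score(y_true, y_pred))   # 0.95, looks impressive
print("Spam recall:", recall_score(y_true, y_pred))  # 0.0, no spam caught at all
```

Despite 95% accuracy, the model is useless for the minority (spam) class, which is exactly what recall exposes.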
The ROC-AUC assesses a binary classification model's ability to differentiate between two classes. It is derived from the ROC curve, which plots the true positive rate (recall) against the false positive rate (1 − specificity) at different classification thresholds.
The AUC (Area Under the Curve) and ROC-AUC are essential metrics in machine learning and artificial intelligence. These metrics provide a single score to summarize the model's performance in binary classification tasks, especially when dealing with imbalanced datasets. The AUC score indicates how well the model distinguishes between positives and negatives, with a score of 0.5 indicating performance no better than random guessing and a score of 1 indicating perfect distinction between the two classes. This metric demonstrates the model's ability to manage the trade-off between identifying positives (sensitivity) and avoiding false positives (specificity). For example, in credit scoring, the ROC-AUC helps assess how well a model can differentiate between risky and safe borrowers.
In a credit scoring model, the ROC curve illustrates how effectively the model can identify good and bad credit risks at various thresholds. A high AUC score indicates the model's ability to predict risks accurately.
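As a sketch of how this could be computed (assuming scikit-learn; the borrower labels and model scores below are invented for illustration), roc_auc_score takes the true labels and the model's predicted scores or probabilities:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical credit-scoring data: 1 = defaulted (bad risk), 0 = repaid (good risk)
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
# Model output: predicted probability of default for each borrower
y_scores = [0.10, 0.35, 0.62, 0.80, 0.60, 0.55, 0.45, 0.90]

# An AUC close to 1.0 means risky borrowers tend to receive higher scores than safe ones
print("ROC-AUC:", roc_auc_score(y_true, y_scores))  # ~0.94
```

One useful property is that ROC-AUC is computed across all thresholds, so it does not depend on picking a single cutoff for labeling a borrower as risky.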
Mean Squared Error (MSE) is a way to check how accurate a regression model is by looking at the average squared differences between predicted and actual values. The formula for MSE is:
MSE = (1/n) × Σ (yᵢ − ŷᵢ)²

where n is the number of data points, yᵢ is the actual value, and ŷᵢ is the predicted value for the i-th observation.
The Mean Squared Error (MSE) is commonly used in regression problems, where the goal is to minimize the difference between a model's predictions and the actual values. It's a straightforward and effective way to measure a model's performance. For example, if we are developing a model to predict house prices, MSE can indicate how closely our model's predictions align with the actual sale prices. To assess the model's performance, we calculate the MSE, which is the average of the squared differences between the actual sale prices and the predicted prices. A lower MSE indicates that the model's predictions closely match the actual values. However, one drawback of MSE is its sensitivity to outliers or extreme values. Because MSE squares the errors, large differences between predicted and actual values have a much bigger impact than small ones. This means that a few large errors can disproportionately raise the MSE, potentially making the model appear less effective than it is.
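Here is a minimal sketch (assuming scikit-learn; the house prices are made up and given in thousands of dollars) of computing MSE by hand and with mean_squared_error:

```python
from sklearn.metrics import mean_squared_error

# Hypothetical house prices in thousands of dollars
actual    = [250, 300, 420, 510]
predicted = [245, 310, 400, 530]

# MSE by hand: average of the squared differences
mse_manual = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
print("Manual MSE:", mse_manual)                              # 231.25
print("sklearn MSE:", mean_squared_error(actual, predicted))  # 231.25
```

Note how the two 20-thousand-dollar misses contribute 400 each to the sum, far more than the 5-thousand-dollar miss contributes (25), which is the outlier sensitivity described above.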