I have a dataset of pixels where each pixel is classified into 1 of 5 classes, and I have 3 models that I trained to classify those pixels. I am now developing a Python script that helps me analyze the performance of each model by computing the values of some metrics. As you can see, I already compute some metrics, but I would like to include AUC-ROC in my script because I think it is the right metric to have when analyzing an imbalanced dataset in a multi-class classification problem.
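For context, this is my understanding of what scikit-learn's `roc_auc_score` expects in the multi-class case (a minimal sketch with made-up random data, not my pixel dataset): one probability column per class plus a `multi_class` strategy, rather than hard predicted labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Fake ground truth: 5 classes, like my labels
y_true = rng.integers(0, 5, size=200)

# Fake "predicted probabilities": one column per class, rows summing to 1
scores = rng.random((200, 5))
scores /= scores.sum(axis=1, keepdims=True)

# One-vs-rest AUC-ROC, macro-averaged over the 5 classes
auc_ovr = roc_auc_score(y_true, scores, multi_class="ovr", average="macro")
print(auc_ovr)  # random scores should land near 0.5
```

This is only to show the expected input shape; my actual problem is fitting such a computation into my DataFrame structure below.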
MY PROBLEM: I don't know how to implement AUC-ROC while maintaining the structure of my DataFrame.
The structure looks like this:
The code:
import pandas as pd
from sklearn.metrics import balanced_accuracy_score, precision_score, f1_score
from imblearn.metrics import sensitivity_score

data = pd.read_csv(data_path)

# Create a list of classifier names
classifiers = ['KNN', 'RF', 'XGBoost']

# Define a list of class labels
class_labels = [0, 1, 2, 3, 4]  # Modify this based on your actual class labels

# Create an empty list to store class-level metrics
class_metrics = []

# Iterate over each classifier
for classifier in classifiers:
    # Filter the data DataFrame for the specific classifier
    classifier_data = data[data['Classifier'] == classifier]

    # Iterate over each class
    for class_label in class_labels:
        # Filter data for the specific class
        class_data = classifier_data[classifier_data['GT_Class'] == class_label]

        # True ground truth values for the class
        y_true = class_data['GT_Class']
        # Predicted values by the classifier for the class
        y_pred = class_data['Pred_Class']

        # Calculate classification metrics for the class
        accuracy = balanced_accuracy_score(y_true, y_pred)
        precision = precision_score(y_true, y_pred, average="weighted")
        recall_sensitivity = sensitivity_score(y_true, y_pred, average="weighted")
        f1 = f1_score(y_true, y_pred, average="weighted")

        # Append the metrics to the list
        class_metrics.extend([
            [classifier, class_label, 'Accuracy', accuracy],
            [classifier, class_label, 'Precision', precision],
            [classifier, class_label, 'Recall', recall_sensitivity],
            [classifier, class_label, 'F1-Score', f1],
        ])

# Create a DataFrame from the class-level metrics
class_metrics_df = pd.DataFrame(class_metrics, columns=['Classifier', 'Class', 'Metric Name', 'Metric Value'])
print(class_metrics_df)
When I tried to analyze only the AUC-ROC metric, I used the following code, and I am not sure it worked.
import pandas as pd
from sklearn.metrics import roc_auc_score

data = pd.read_csv(data_path)

# Create a list of classifier names
classifiers = ['KNN', 'RF', 'XGBoost']

# Define a list of class labels
class_labels = [0, 1, 2, 3, 4]  # Modify this based on your actual class labels

# Iterate over each classifier
for classifier in classifiers:
    print(f"Classifier: {classifier}")

    # Filter the data DataFrame for the specific classifier
    classifier_data = data[data['Classifier'] == classifier]

    # Iterate over each class
    for class_label in class_labels:
        # Binarize the ground truth for the specific class (one-vs-rest)
        class_data = classifier_data.copy()  # Make a copy to avoid modifying the original DataFrame
        class_data['GT_Class'] = (class_data['GT_Class'] == class_label).astype(int)

        # True ground truth values for the class
        y_true = class_data['GT_Class']
        # Predicted values by the classifier for the class
        y_pred = class_data['Pred_Class']

        # Calculate AUC-ROC for the class
        auc_roc = roc_auc_score(y_true, y_pred)
        print(f"Class {class_label} AUC-ROC: {auc_roc:.4f}")
The reason I am not sure it worked is that for class 0 with KNN I get Class 0 AUC-ROC: 0.0151, and that seems way too low. The KNN confusion matrix looks like this:
[[9696   79   50   30   62]
 [  36 3044  466   47    7]
 [  50  427 2198  525    1]
 [  13   48  395 5942    1]
 [  19    7    0    0 1034]]
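A quick sanity check I tried from the matrix (assuming rows are true classes and columns are predicted classes): when only hard labels are available, a one-vs-rest ROC curve has a single operating point, so the area under it reduces to (TPR + TNR) / 2 for that class. Computing that for class 0 directly from the matrix gives a value near 1, nowhere near 0.0151:

```python
import numpy as np

# KNN confusion matrix from above: rows = true class, cols = predicted class
cm = np.array([
    [9696,   79,   50,   30,   62],
    [  36, 3044,  466,   47,    7],
    [  50,  427, 2198,  525,    1],
    [  13,   48,  395, 5942,    1],
    [  19,    7,    0,    0, 1034],
])

k = 0  # class 0, one-vs-rest
tp = cm[k, k]
fn = cm[k].sum() - tp            # class-0 pixels predicted as something else
fp = cm[:, k].sum() - tp         # other pixels predicted as class 0
tn = cm.sum() - tp - fn - fp

tpr = tp / (tp + fn)             # sensitivity
tnr = tn / (tn + fp)             # specificity
print((tpr + tnr) / 2)           # about 0.985 for class 0
```

So class 0 is actually separated very well by KNN, which makes me think the 0.0151 comes from how I am passing values to `roc_auc_score`, not from the model itself.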