TensorFlow data augmentation evaluation: lower performance

I’m working on an image classification problem with VGG16.
My dataset is balanced (150 images per class).
I have split the dataset into three sets: train, val, and test.

I’m testing the impact of data augmentation using the methodology from the TensorFlow tutorial linked below, but whatever I do, data augmentation lowers my model’s performance.

https://www.tensorflow.org/tutorials/images/data_augmentation#option_1_make_the_preprocessing_layers_part_of_your_model

Here is my code:

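import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import (Dense, Dropout, GlobalAveragePooling2D,
                                     RandomRotation, RandomZoom, Rescaling, Resizing)

# images_np_* and y_* are prepared earlier (not shown)
tf_train = tf.data.Dataset.from_tensor_slices((np.array(images_np_train), y_train)).batch(10)
tf_test = tf.data.Dataset.from_tensor_slices((np.array(images_np_test), y_test)).batch(10)
tf_val = tf.data.Dataset.from_tensor_slices((np.array(images_np_val), y_val)).batch(10)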

def create_model_fct2():

    IMG_SIZE = 224
    resize_and_rescale = Sequential([
        Resizing(IMG_SIZE, IMG_SIZE, input_shape=(224, 224, 3)),
    ])
    # Data augmentation (the Rescaling layer also lives in this block)
    data_augmentation = Sequential([
        # RandomFlip("horizontal_and_vertical", input_shape=(224, 224, 3)),
        RandomRotation(0.2, input_shape=(224, 224, 3)),
        RandomZoom(0.1),
        Rescaling(1./255),
    ])

    model_base = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    # Freeze the pretrained convolutional base
    for layer in model_base.layers:
        layer.trainable = False

    # Define the new model
    model = Sequential([
        resize_and_rescale,
        data_augmentation,
        model_base,
        GlobalAveragePooling2D(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(7, activation='softmax')
    ])

    # Compile the model
    model.build()
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

    model.summary()  # summary() prints by itself; wrapping it in print() also prints "None"

    return model


model4 = create_model_fct2()


model4_save_path = "./model4_best_weights.keras"
checkpoint = ModelCheckpoint(model4_save_path, monitor="val_accuracy", verbose=1, save_best_only=True, mode="max")
es = EarlyStopping(monitor="val_accuracy", mode="max", verbose=1, patience=15)
callbacks_list = [checkpoint, es]

history4 = model4.fit(tf_train,
                      validation_data=tf_val,
                      batch_size=10, epochs=100, callbacks=callbacks_list, verbose=1)

loss, accuracy = model4.evaluate(tf_test, verbose=False)
print("Test Accuracy       :  {:.4f}".format(accuracy))



Regardless of the number of epochs, patience, etc., my model trained with data augmentation performs worse when evaluated on my test set.

I have two questions:
1 – Why does data augmentation degrade performance here? I can’t explain it.
2 – Does evaluation skip the data augmentation layers? Shouldn’t it? Shouldn’t data augmentation apply only during training, and not during the test/validation phases?

loss, accuracy = model4.evaluate(tf_test, verbose=False)
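
For question 2, here is a minimal sketch of how this could be checked in isolation. It relies on the `training` argument that Keras layers accept: `fit` runs layers with `training=True`, while `evaluate` and `predict` run them with `training=False`, and the random preprocessing layers are expected to pass inputs through unchanged in that mode. The images are random dummy data (an assumption, not my real dataset), and `Rescaling` is left out so the output can be compared directly with the input.

```python
import numpy as np
import tensorflow as tf

# Dummy batch standing in for real images
# (assumption: 224x224 RGB, pixel values in [0, 255]).
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(2, 224, 224, 3)).astype("float32")

# Same random layers as in the model above, without Rescaling.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.1),
])

# Inference mode: what model.evaluate() / model.predict() use.
out_eval = np.array(augment(images, training=False))
# Training mode: what model.fit() uses on the training batches.
out_train = np.array(augment(images, training=True))

unchanged_at_eval = bool(np.allclose(out_eval, images))
changed_at_train = not bool(np.allclose(out_train, images))
print("unchanged at eval :", unchanged_at_eval)
print("changed at train  :", changed_at_train)
```

If both flags come out true, the augmentation layers are not what is altering the test images, and the drop in test accuracy would have to come from the training side.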

Thanks!
