It is important that augmentation is used only on the training data, so that the valid

How to apply data augmentation? (scikeras issue, closed, 7 comments)

clstaudt commented on September 13, 2024

How to apply data augmentation?

Comments (7)

adriangb commented on September 13, 2024

Keras does the validation split internally; sklearn is not aware of it. Would making the preprocessing layers part of the model itself work, as suggested by the first link you included (see "Option 1: Make the preprocessing layers part of your model")? Since the tutorial recommends it, I'd imagine it doesn't lead to any significant data leakage.

clstaudt commented on September 13, 2024

@adriangb Adding the augmentation as layers to the network is indeed a working solution.

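from keras import Sequential
from keras.layers import (
    Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D,
    RandomBrightness, RandomFlip, RandomRotation, RandomTranslation,
)

# Assumed example shape (height, width, channels); not given in the thread.
input_shape = (256, 256, 3)
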
cnn_augment = Sequential(
    name="cnn_augment",
    layers=[
        Input(input_shape),

        # augmentation
        RandomFlip(mode="vertical"),
        RandomRotation(factor=0.005, fill_mode="constant", fill_value=0),
        RandomBrightness(factor=0.001),
        RandomTranslation(height_factor=0.00, width_factor=0.02, fill_mode="nearest"),

        # convolution
        Conv2D(32, (3, 3), activation='relu'),  
        MaxPooling2D(2, 2),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(128, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(256, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(512, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),

        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid')  
    ]
)
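
A note on why this avoids the leakage problem: Keras random preprocessing layers are only active in training mode and pass inputs through unchanged at inference, so validation samples are not augmented. A minimal sketch to check this, assuming a toy batch `x` that is not from the thread:

```python
import numpy as np
from keras.layers import RandomFlip

# Hypothetical toy batch: 2 single-channel "images" of shape (4, 4, 1).
x = np.arange(32, dtype="float32").reshape(2, 4, 4, 1)

flip = RandomFlip(mode="vertical")

# Inference mode (as used for validation): inputs pass through unchanged.
same = np.array_equal(np.asarray(flip(x, training=False)), x)
print(same)

# Training mode: each call may or may not flip the batch, so the result is random.
maybe_flipped = np.asarray(flip(x, training=True))
print(maybe_flipped.shape)
```

The same holds for the other random layers in the model above, which is why the internal validation split sees only original images.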

adriangb commented on September 13, 2024

Thank you for coming back and sharing a solution!

adriangb commented on September 13, 2024

I'm not sure what you mean by data augmentation. Could you point me to the scikit-learn or Keras docs? Thanks

clstaudt commented on September 13, 2024

@adriangb
https://www.tensorflow.org/tutorials/images/data_augmentation

In my case, data augmentation would be used to artificially increase the number of training samples by flipping each image, for example.

The Keras ImageDataGenerator class also supports augmentation, but I assume it is not compatible with scikeras.
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/

clstaudt commented on September 13, 2024

Why not apply augmentation to X_train before passing it to fit, you may ask.

Because this leads to a form of leakage into the validation split: Suppose an image is in the training split and its flipped version is in the validation split. Then the latter is too easy to predict, making the validation performance metrics look too good.
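
The effect can be sketched with NumPy alone. The toy data, the `ids` bookkeeping, and the 20% fraction below are assumptions for illustration; the split imitates Keras's `validation_split`, which takes the last fraction of the samples before shuffling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 10 tiny "images" of shape (4, 4); ids track the source image.
images = rng.random((10, 4, 4))
ids = np.arange(10)

# Augment BEFORE splitting: append a horizontally flipped copy of every image.
aug_images = np.concatenate([images, images[:, :, ::-1]])
aug_ids = np.concatenate([ids, ids])

# Imitate validation_split=0.2: the last 20% of the samples become validation data.
n_val = int(0.2 * len(aug_images))
train_ids = set(aug_ids[:-n_val].tolist())
val_ids = set(aug_ids[-n_val:].tolist())

# Source images that appear (in some variant) on both sides of the split.
leaked = sorted(train_ids & val_ids)
print(leaked)
```

Every validation sample here is a flipped copy of an image the model also trains on, so the validation metrics would be overly optimistic.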

clstaudt commented on September 13, 2024

@adriangb Perhaps it would, I'll have to look at it.

Alternatively, should it be possible to pass the validation dataset to the KerasClassifier constructor?
