Comments (7)

adriangb commented on September 13, 2024

Keras does the validation split internally; it's not something that sklearn is aware of. Would it work to make the preprocessing layers part of the model itself, as suggested by the first link you included (see "Option 1: Make the preprocessing layers part of your model")? Since the tutorial recommends it, I'd imagine it doesn't lead to any significant data leakage.

from scikeras.

clstaudt commented on September 13, 2024

@adriangb Adding the augmentation as layers to the network is indeed a working solution.

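from keras import Input, Sequential
from keras.layers import (
    Conv2D, Dense, Dropout, Flatten, MaxPooling2D,
    RandomBrightness, RandomFlip, RandomRotation, RandomTranslation,
)

# input_shape is not defined in this comment; it is assumed to be the
# (height, width, channels) shape of the input images.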


cnn_augment = Sequential(
    name="cnn_augment",
    layers=[
        Input(input_shape),

        # augmentation
        RandomFlip(mode="vertical"),
        RandomRotation(factor=0.005, fill_mode="constant", fill_value=0),
        RandomBrightness(factor=0.001),
        RandomTranslation(height_factor=0.00, width_factor=0.02, fill_mode="nearest"),

        # convolution
        Conv2D(32, (3, 3), activation='relu'),  
        MaxPooling2D(2, 2),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(128, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(256, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(512, (3, 3), activation='relu'),
        MaxPooling2D(2, 2),

        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid')  
    ]
)


adriangb commented on September 13, 2024

Thank you for coming back and sharing a solution!


adriangb commented on September 13, 2024

I'm not sure what you mean by data augmentation. Could you point me to the scikit-learn or Keras docs? Thanks


clstaudt commented on September 13, 2024

@adriangb
https://www.tensorflow.org/tutorials/images/data_augmentation

In my case, data augmentation would be used to artificially increase the number of training samples by flipping each image, for example.

The Keras ImageDataGenerator class also supports augmentation, but I assume it is not compatible with scikeras.
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
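To make the flipping idea above concrete, here is a minimal sketch in plain Python. Images are represented as nested lists of pixel rows, and the helper names (flip_horizontal, flip_vertical, augment_with_flips) are made up for illustration; they are not part of Keras or scikeras.

```python
def flip_horizontal(image):
    """Return a copy of a 2D image (list of rows) mirrored left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Return a copy of a 2D image (list of rows) mirrored top to bottom."""
    return [row[:] for row in image[::-1]]


def augment_with_flips(images, labels):
    """Append a horizontally flipped copy of every image, keeping its label."""
    aug_images = list(images)
    aug_labels = list(labels)
    for image, label in zip(images, labels):
        aug_images.append(flip_horizontal(image))
        aug_labels.append(label)
    return aug_images, aug_labels


# Two tiny 2x2 "images" with binary labels.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
labels = [0, 1]

aug_images, aug_labels = augment_with_flips(images, labels)
print(len(aug_images))  # twice as many samples as before
```

This static form of augmentation doubles the dataset once, before training, which differs from augmentation layers that apply a fresh random transformation each time a batch passes through them during training.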


clstaudt commented on September 13, 2024

You may ask: why not apply augmentation to X_train before passing it to fit?

Because this leaks information into the validation split. Suppose an image is in the training split and its flipped version is in the validation split. The flipped version is then too easy to predict, which makes the validation metrics look better than they should.
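The leakage can be shown with a toy example in plain Python. The sketch assumes that the validation part is taken from the end of the data, which is how the Keras validation_split argument selects it (the last samples, before shuffling). The helper names and the (source id, image) representation are made up for illustration.

```python
def flip(image):
    """Return a copy of a 2D image (list of rows) mirrored left to right."""
    return [row[::-1] for row in image]


def tail_split(samples, validation_fraction):
    """Hold out the last fraction of the samples as the validation part."""
    n_val = int(len(samples) * validation_fraction)
    cut = len(samples) - n_val
    return samples[:cut], samples[cut:]


def shared_sources(train, validation):
    """Ids of source images that occur in both the train and validation part."""
    return {sid for sid, _ in train} & {sid for sid, _ in validation}


# 20 tiny "images", each tagged with the id of the original it came from.
originals = [(i, [[i, i + 1], [i + 2, i + 3]]) for i in range(20)]

# Augment first, then split: flipped copies are appended to the data, so the
# held-out tail consists of flipped versions of images that are in training.
augmented = originals + [(sid, flip(img)) for sid, img in originals]
train_a, val_a = tail_split(augmented, 0.25)
leak_augment_first = shared_sources(train_a, val_a)

# Split first, then augment only the training part: the two parts cannot
# share a source image.
train_b, val_b = tail_split(originals, 0.25)
train_b = train_b + [(sid, flip(img)) for sid, img in train_b]
leak_split_first = shared_sources(train_b, val_b)

print(len(leak_augment_first), len(leak_split_first))
```

In the first variant every validation sample has its unflipped original in the training part. In the second variant there is no overlap, which is also what augmentation layers inside the model achieve, since they only transform data during training.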


clstaudt commented on September 13, 2024

@adriangb Perhaps it would, I'll have to look at it.

Alternatively, should it be possible to pass the validation dataset to the KerasClassifier constructor?

