Comments (8)

sushreebarsa commented on April 27, 2024

@varshad18 Could you please double-check your calculation for steps_per_epoch? Please make sure it accounts for both the total number of samples in your dataset and the batch size.
To speed up troubleshooting, please provide a code snippet that reproduces the issue reported here. Thank you!

from tensorflow.

NBCBM commented on April 27, 2024

epochs = 300
BATCH_SIZE = 4
train_size = trainImageTotalData
valid_size = validationImageTotalData

# Calculate steps per epoch and validation steps
steps_per_epoch = train_size // BATCH_SIZE
validation_steps = valid_size // BATCH_SIZE

# Optionally, you can adjust steps_per_epoch and validation_steps based on whether your dataset is shuffled.
# If your dataset is shuffled during training, you might want to set steps_per_epoch to None
# and let it automatically determine the number of steps based on the dataset size and batch size.
hist = model.fit(
    x=[trainNumericData, trainImagesSBData, trainImagesCBData, trainImagesWBData, trainImagesHBData, trainImagesLLData, trainImagesLBData, trainImagesUpLeftABData, trainImagesUpRightABData, trainImagesALeftLData, trainImagesARightLData],
    y=trainAllRegressionData,
    epochs=epochs,
    validation_data=([validationNumericData, validationImagesSBData, validationImagesCBData, validationImagesWBData, validationImagesHBData, validationImagesLLData, validationImagesLBData, validationImagesUpLeftABData, validationImagesUpRightABData, validationImagesALeftLData, validationImagesARightLData], validationAllRegressionData),
    steps_per_epoch=steps_per_epoch,  # Set steps_per_epoch
    validation_steps=validation_steps,  # Set validation_steps
    callbacks=[mc, tensorboard_callback]).history

# Instead of manually setting steps_per_epoch and validation_steps, you can let it
# automatically determine them based on the dataset size and batch size.
hist = model.fit(
    x=[trainNumericData, trainImagesSBData, trainImagesCBData, trainImagesWBData, trainImagesHBData, trainImagesLLData, trainImagesLBData, trainImagesUpLeftABData, trainImagesUpRightABData, trainImagesALeftLData, trainImagesARightLData],
    y=trainAllRegressionData,
    epochs=epochs,
    validation_data=([validationNumericData, validationImagesSBData, validationImagesCBData, validationImagesWBData, validationImagesHBData, validationImagesLLData, validationImagesLBData, validationImagesUpLeftABData, validationImagesUpRightABData, validationImagesALeftLData, validationImagesARightLData], validationAllRegressionData),
    callbacks=[mc, tensorboard_callback]).history
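
For reference, the difference between counting only full batches and counting a final partial batch can be sketched in plain Python. This is a minimal illustration, not Keras code: steps_for is a hypothetical helper, and the sample count and batch size are made-up values.

```python
import math


def steps_for(num_samples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Return how many batches one pass over num_samples takes (hypothetical helper)."""
    if drop_remainder:
        # Count full batches only; leftover samples are skipped.
        return num_samples // batch_size
    # Count the final, possibly smaller, batch as well.
    return math.ceil(num_samples / batch_size)


# Made-up sizes, chosen only for illustration.
num_samples, batch_size = 10, 4
full_batches, leftover = divmod(num_samples, batch_size)

print(steps_for(num_samples, batch_size, drop_remainder=True))  # 2
print(steps_for(num_samples, batch_size))                       # 3
print(full_batches, leftover)                                   # 2 2
```

Floor division and ceiling differ by exactly one step whenever the sample count is not a multiple of the batch size, and they agree otherwise.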

sushreebarsa commented on April 27, 2024

@varshad18 Could you please share the complete code in a notebook or gist to replicate the issue reported here?
Thank you!

varshad18 commented on April 27, 2024

@sushreebarsa I double-checked my calculation for steps_per_epoch and tried using the following formula:

import math

BATCH_SIZE = 4
train_size = trainImageTotalData
valid_size = validationImageTotalData

print("train size " + str(train_size))
print("valid size " + str(valid_size))
print("batch size " + str(BATCH_SIZE))

# Exact (possibly fractional) number of batches per pass over the data
steps_per_epoch = train_size / BATCH_SIZE
validation_steps = valid_size / BATCH_SIZE
print("steps_per_epoch =" + str(steps_per_epoch))
print("validation_steps =" + str(validation_steps))

# Round up so the final, partial batch is counted as a step
steps_per_epoch = math.ceil(steps_per_epoch)
# steps_per_epoch = steps_per_epoch - 1
validation_steps = math.ceil(validation_steps)
# validation_steps = validation_steps - 1
print("steps_per_epoch =" + str(steps_per_epoch))
print("validation_steps =" + str(validation_steps))

train size 714
valid size 89
batch size 4
steps_per_epoch =178.5
validation_steps =22.25
steps_per_epoch =179
validation_steps =23

This worked for me: training runs for all 179 steps with no errors. However, the most common approach is to exclude the last incomplete batch from each epoch, and when I try to do that by using steps_per_epoch - 1, I get the following error:

KeyError: 'Failed to format this callback filepath: "/content/drive/MyDrive/FashionBody/Regression/TrainingRun/300Run2.0/checkpoint-{epoch:02d}-{val_loss:.2f}.tf". Reason: 'val_loss''

Is it okay to train 179 steps according to my train size? Or am I doing something wrong?
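
The wording of the KeyError suggests that the checkpoint file name template asked for val_loss at a point where the epoch's metrics did not contain it. The sketch below reproduces only that string-formatting step in plain Python. format_checkpoint_path and both metrics dictionaries are hypothetical stand-ins, and the sketch does not explain why val_loss was missing in this particular run.

```python
# File name template with the same placeholders as in the error message
# (the directory part is omitted here).
filepath = "checkpoint-{epoch:02d}-{val_loss:.2f}.tf"

# Hypothetical end-of-epoch metrics, with and without a validation result.
logs_with_validation = {"loss": 0.51, "val_loss": 0.47}
logs_without_validation = {"loss": 0.51}


def format_checkpoint_path(template: str, epoch: int, logs: dict) -> str:
    """Fill the template from the epoch number and the metrics dictionary."""
    return template.format(epoch=epoch, **logs)


print(format_checkpoint_path(filepath, 1, logs_with_validation))  # checkpoint-01-0.47.tf

try:
    format_checkpoint_path(filepath, 1, logs_without_validation)
except KeyError as err:
    # str.format raises KeyError for a placeholder that has no matching key.
    print("KeyError:", err)  # KeyError: 'val_loss'
```

If that is what happened here, it would be worth checking whether validation actually ran to completion in the epoch that triggered the error.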

sushreebarsa commented on April 27, 2024

@varshad18 Could you please confirm whether you are still using Keras 2? If so, please migrate to Keras 3 and follow the documentation here. Thank you!

github-actions commented on April 27, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions commented on April 27, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

google-ml-butler commented on April 27, 2024

Are you satisfied with the resolution of your issue?
Yes
No
