Comments (21)

github-actions avatar github-actions commented on May 27, 2024

👋 Hello @Zero-start0, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

from hub.

sergiuwaxmann avatar sergiuwaxmann commented on May 27, 2024

Hi @Zero-start0!
When training a model using Ultralytics HUB, we try to save a checkpoint every 15 minutes. If a checkpoint is saved, you can resume training - the resume option is shown on the model page automatically.

Zero-start0 avatar Zero-start0 commented on May 27, 2024

> Hi @Zero-start0! When training a model using Ultralytics HUB, we try to save a checkpoint every 15 minutes. If a checkpoint is saved, you can resume training - the resume option is shown on the model page automatically.

However, I can see that the checkpoint has been uploaded to HUB, but I can't find the resume option.

Zero-start0 avatar Zero-start0 commented on May 27, 2024

> Hi @Zero-start0! When training a model using Ultralytics HUB, we try to save a checkpoint every 15 minutes. If a checkpoint is saved, you can resume training - the resume option is shown on the model page automatically.

Could you give detailed instructions? This problem is driving me mad.

sergiuwaxmann avatar sergiuwaxmann commented on May 27, 2024

@Zero-start0 If your model is disconnected and a checkpoint is saved, the message on the model page should be "Resume training from epoch X". What do you see on the model page?

Zero-start0 avatar Zero-start0 commented on May 27, 2024

image
image

Zero-start0 avatar Zero-start0 commented on May 27, 2024

> @Zero-start0 If your model is disconnected and a checkpoint is saved, the message on the model page should be "Resume training from epoch X". What do you see on the model page?

What should I do now?

Zero-start0 avatar Zero-start0 commented on May 27, 2024

image

> @Zero-start0 If your model is disconnected and a checkpoint is saved, the message on the model page should be "Resume training from epoch X". What do you see on the model page?

I can see the checkpoint, but I don't have the resume option.

Zero-start0 avatar Zero-start0 commented on May 27, 2024

> @Zero-start0 If your model is disconnected and a checkpoint is saved, the message on the model page should be "Resume training from epoch X". What do you see on the model page?

If I reconnect, the model will train from epoch 1.

sergiuwaxmann avatar sergiuwaxmann commented on May 27, 2024

@Zero-start0 Based on the Ultralytics HUB screenshots you shared, there is no checkpoint saved in the Ultralytics HUB. I have attached an image of a model that has a checkpoint saved in the Ultralytics HUB.
resume

Looking at the ultralytics logs you shared, I can see that a checkpoint began uploading - perhaps the process did not succeed or something went wrong. Our team will investigate if there is an issue on our end related to the upload, and I will keep you updated.

Zero-start0 avatar Zero-start0 commented on May 27, 2024

> @Zero-start0 Based on the Ultralytics HUB screenshots you shared, there is no checkpoint saved in the Ultralytics HUB. I have attached an image of a model that has a checkpoint saved in the Ultralytics HUB.
> resume
>
> Looking at the ultralytics logs you shared, I can see that a checkpoint began uploading - perhaps the process did not succeed or something went wrong. Our team will investigate if there is an issue on our end related to the upload, and I will keep you updated.

So how can I continue my work? Is there any method I can use to continue?

Zero-start0 avatar Zero-start0 commented on May 27, 2024

For now, I can only use this method to train my model:

    from ultralytics import YOLO

    # Load a partially trained model
    model = YOLO('../ultralytics/runs/detect/train/weights/last.pt')

    # Resume training
    results = model.train(resume=True)

Zero-start0 avatar Zero-start0 commented on May 27, 2024

I have run into this situation many times. If there is a solution, please let me know.

Zero-start0 avatar Zero-start0 commented on May 27, 2024

Also, how can I upload my trained model to HUB?

sergiuwaxmann avatar sergiuwaxmann commented on May 27, 2024

> For now, I can only use this method to train my model:
>
>     from ultralytics import YOLO
>
>     # Load a partially trained model
>     model = YOLO('../ultralytics/runs/detect/train/weights/last.pt')
>
>     # Resume training
>     results = model.train(resume=True)

@Zero-start0 Yes, this is a valid temporary solution.
As mentioned above, I will keep you updated. Thank you for understanding!

LightDex9 avatar LightDex9 commented on May 27, 2024

Hello, I have the same problem: training stops after 33 epochs and I can't resume it (I'm using Colab).

pderrenger avatar pderrenger commented on May 27, 2024

Hello! 😊 If your training in Colab stops and you're unable to resume it directly, make sure you're saving checkpoints at regular intervals during training. After a stoppage, you can resume training from the last saved checkpoint by specifying its path when initializing your training command. Please make sure your code for resuming training on Colab includes the path to the checkpoint. Remember, consistent checkpoints are key for smoothly resuming training, especially in environments like Colab that have time limits on sessions.
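As a rough illustration of resuming from a checkpoint path, the sketch below searches a runs directory for the most recently modified `last.pt` and passes it to `YOLO(...).train(resume=True)`, the same call used earlier in this thread. The helper names `find_latest_checkpoint` and `resume_from` are hypothetical, and the `runs/.../weights/last.pt` layout is an assumption based on the path quoted above; it may differ when training is driven from HUB.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(runs_dir: str = "runs") -> Optional[Path]:
    """Return the most recently modified last.pt under runs_dir, or None.

    Assumes the local layout runs/<task>/<name>/weights/last.pt (hypothetical helper).
    """
    candidates = [p for p in Path(runs_dir).rglob("last.pt") if p.parent.name == "weights"]
    if not candidates:
        return None
    # Pick the checkpoint that was written last.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def resume_from(checkpoint: Path):
    """Resume an interrupted run from a local checkpoint (hypothetical helper)."""
    # Imported lazily so the search helper can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(str(checkpoint))  # load the partially trained model
    return model.train(resume=True)  # continue from the saved epoch


# Example usage (uncomment in a Colab cell):
# ckpt = find_latest_checkpoint("runs")
# if ckpt is not None:
#     resume_from(ckpt)
```

On Colab the local `runs` folder is lost when the session ends, so a checkpoint would need to be copied somewhere persistent before it can be found this way.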

LightDex9 avatar LightDex9 commented on May 27, 2024

> Hello! 😊 If your training in Colab stops and you're unable to resume it directly, make sure you're saving checkpoints at regular intervals during training. After a stoppage, you can resume training from the last saved checkpoint by specifying its path when initializing your training command. Please make sure your code for resuming training on Colab includes the path to the checkpoint. Remember, consistent checkpoints are key for smoothly resuming training, especially in environments like Colab that have time limits on sessions.

Thanks for the reply. How can I see the path where the last checkpoint is saved on Colab? During training it says "Uploading Checkpoints https://hub.ultralytics.com/models/..." every 3 epochs, but I can't see saved checkpoints on Ultralytics HUB.

Edit: I have now noticed that Colab prints "WARNING ⚠️ using HUB training arguments, ignoring local training arguments." and that the "save_period" argument is equal to -1 in the HUB training arguments.

Screenshot 2024-04-16 071439

Zero-start0 avatar Zero-start0 commented on May 27, 2024

> > Hello! 😊 If your training in Colab stops and you're unable to resume it directly, make sure you're saving checkpoints at regular intervals during training. After a stoppage, you can resume training from the last saved checkpoint by specifying its path when initializing your training command. Please make sure your code for resuming training on Colab includes the path to the checkpoint. Remember, consistent checkpoints are key for smoothly resuming training, especially in environments like Colab that have time limits on sessions.
>
> Thanks for the reply. How can I see the path where the last checkpoint is saved on Colab? During training it says "Uploading Checkpoints https://hub.ultralytics.com/models/..." every 3 epochs, but I can't see saved checkpoints on Ultralytics HUB.

Yeah, I have the same question. Although "Uploading checkpoint" was shown in the notebook, I can't find any checkpoint in Ultralytics HUB.

sergiuwaxmann avatar sergiuwaxmann commented on May 27, 2024

> Thanks for the reply. How can I see the path where the last checkpoint is saved on Colab? During training it says "Uploading Checkpoints https://hub.ultralytics.com/models/..." every 3 epochs, but I can't see saved checkpoints on Ultralytics HUB.
>
> Edit: I have now noticed that Colab prints "WARNING ⚠️ using HUB training arguments, ignoring local training arguments." and that the "save_period" argument is equal to -1 in the HUB training arguments.
>
> Screenshot 2024-04-16 071439

Hello @LightDex9!
Indeed, the local training arguments are ignored when training a model from Ultralytics HUB.

The log you see (Ultralytics HUB: Uploading checkpoint...) is printed when the upload starts. The code does not check whether the upload succeeded, and it does not retry if the job is interrupted.
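To make the gap concrete, here is a minimal sketch of what "check and retry" could look like in general. It is not the Ultralytics implementation: `upload_with_retry` and its parameters are hypothetical, and the `upload` callable stands in for whatever actually sends the checkpoint and reports whether the server confirmed it.

```python
import time
from typing import Callable


def upload_with_retry(
    upload: Callable[[], bool],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call upload() until it reports success or the attempts run out.

    upload is a hypothetical callable that returns True only when the
    server has confirmed the checkpoint was stored.
    """
    for attempt in range(1, retries + 1):
        try:
            if upload():
                print(f"Checkpoint upload confirmed on attempt {attempt}")
                return True
            print(f"Attempt {attempt}: upload not confirmed")
        except OSError as exc:  # ConnectionError and TimeoutError are subclasses
            print(f"Attempt {attempt} failed: {exc}")
        if attempt < retries:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

The point of the sketch is that success is only logged after confirmation, so a message in the notebook would correspond to a checkpoint that actually exists on the server.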

Our team will investigate if there is an issue on our end related to the upload, and I will keep you updated.

sergiuwaxmann avatar sergiuwaxmann commented on May 27, 2024

@Zero-start0 @LightDex9

I wanted to update you on the recent release from ultralytics, version 8.2.0, which addresses the issue you encountered. The checkpoints are now being uploaded correctly.

For verification, I conducted a test in my local virtual environment. I modified the "ckpt" value from 900.0 to 1.0 in the ultralytics/hub/session.py file and initiated training using my local agent. The results confirmed that the fix is effective.
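For readers wondering why that value matters: 900.0 seconds is 15 minutes, which matches the checkpoint interval mentioned at the top of this thread, so lowering it to 1.0 presumably lets a short test run upload checkpoints almost immediately. The sketch below shows a generic time-based limiter of that kind. The `RateLimiter` class is hypothetical and only illustrates the idea; it is not the code in ultralytics/hub/session.py, and starting the timers at construction is an assumption.

```python
import time
from typing import Callable, Dict


class RateLimiter:
    """Allow each named action at most once per configured interval (in seconds)."""

    def __init__(self, limits: Dict[str, float], clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock
        # Assumption: timers start at construction, so the first action
        # is allowed only after one full interval has elapsed.
        start = clock()
        self.last_run: Dict[str, float] = {name: start for name in limits}

    def ready(self, name: str) -> bool:
        """Return True and restart the timer if the interval for name has elapsed."""
        now = self.clock()
        if now - self.last_run[name] >= self.limits[name]:
            self.last_run[name] = now
            return True
        return False


# Example usage inside a training loop (upload_checkpoint is a placeholder):
# limiter = RateLimiter({"ckpt": 900.0})
# if limiter.ready("ckpt"):
#     upload_checkpoint()
```

With a 900-second limit, a run that is interrupted before the first interval elapses would have no checkpoint to resume from under this scheme.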

Please feel free to reach out if you encounter any further issues.
