
Comments (6)

lucas-ventura commented on August 23, 2024

Hi!

Regarding the images for toptee, shirt, and dress, I had all the images in the same directory (see fashioniq-base.yaml config file). If your setup has them in separate directories, you could try updating the img_dirs paths in each config/data/fashioniq-split.yaml like this:

img_dirs:
  train: ${data.dataset_dir}/images/<split>
  val: ${data.dataset_dir}/images/<split>

I'm not entirely sure if this will work when testing, but it should be fine for training.


For missing image links, it seems that FashionIQDataset expects all the images. You can handle missing images in several ways:

  1. Redownload missing images using the provided links in the README.
  2. Adapt the FashionIQDataset to skip missing files, similar to what I did in WebVidCoVRDataset.
  3. Alternatively, create a reduced dataset: cp -r annotation/fashion-iq annotation/fashion-iq_small, remove the entries for the files you don't have, and update configs/data/fashioniq-base.yaml to point to the new annotation directory.
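For option 3, a minimal sketch of how the annotation entries could be filtered is below. It assumes each annotation file is a JSON list of dicts with "candidate" and "target" image IDs and that images are stored as <id>.jpg; both are assumptions, and the helper names are hypothetical (not part of the repository), so adjust them to your files.

```python
import json
from pathlib import Path


def filter_annotations(entries, image_dir, keys=("candidate", "target"), ext=".jpg"):
    """Keep only the entries whose referenced images all exist in image_dir.

    Assumption: each entry is a dict holding image IDs under `keys`,
    and each image is stored as <id><ext> directly inside image_dir.
    """
    image_dir = Path(image_dir)
    kept = []
    for entry in entries:
        if all((image_dir / f"{entry[key]}{ext}").is_file() for key in keys):
            kept.append(entry)
    return kept


def write_reduced_annotation(src_json, dst_json, image_dir):
    """Hypothetical helper: copy src_json to dst_json without entries that miss images."""
    entries = json.loads(Path(src_json).read_text())
    kept = filter_annotations(entries, image_dir)
    dst = Path(dst_json)
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(json.dumps(kept))
    # Return both counts so you can see how many entries were dropped.
    return len(entries), len(kept)
```

Running write_reduced_annotation once per annotation file inside annotation/fashion-iq_small would leave the originals untouched.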

Best of luck!

from covr.

Agarciahunter commented on August 23, 2024

I was able to download the missing images. However, after running train.py I came across another issue. Is the download for 2M missing, or is it something else? I noticed that 2M doesn't come with validation links either.

[2024-05-09 09:49:22,736][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
Error executing job with overrides: []
Error executing job with overrides: []
Error in call to target 'src.data.webvid_covr.WebVidCoVRDataModule':
AssertionError('Embedding directory /work/user/CoVR/datasets//WebVid/2M/blip-vid-embs-large-all does not exist')

Edit: using the checkpoints you provided works, which implies that everything else is set up correctly. The only problem now is this:

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.16 GiB (GPU 1; 31.74 GiB total capacity; 19.99 GiB already allocated; 3.09 GiB free; 28.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So I'll need to either write a SLURM job file or reduce the data size.


Agarciahunter commented on August 23, 2024

Update on test.py and FashionIQ. Running test.py produces errors like the ones below, which is odd since the images are in the directory:

AssertionError('Path to candidate B00CZ7QJUG not found in /work/user/CoVR/datasets/fashion-iq/images')
full_key: test.fashioniq-shirt
(covr) [gpu033: CoVR]$ find . -name "B00CZ7QJUG*"
./datasets/fashion-iq/MissingImages/JPG_Images/B00CZ7QJUG.jpg
./datasets/fashion-iq/images/B00CZ7QJUG.jpg

AssertionError('Path to candidate B005X4PL1G not found in /work/user/CoVR/datasets/fashion-iq/images')
full_key: test.fashioniq-dress
(covr) [gpu033: CoVR]$ find . -name "B005X4PL1G*"
./datasets/fashion-iq/images/B005X4PL1G.jpg

AssertionError('Path to candidate B008CFZW76 not found in /work/user/CoVR/datasets/fashion-iq/images')
full_key: test.fashioniq-toptee
(covr) [gpu033: CoVR]$ find . -name "B008CFZW76*"
./datasets/fashion-iq/images/B008CFZW76.jpg

Any ideas?


lucas-ventura commented on August 23, 2024

Hi @Agarciahunter,

It looks like you're encountering two different issues:

  1. Embeddings Extraction: Based on your error logs, it seems that you haven't extracted the target embeddings for either WebVid or FashionIQ. You can do this with the following commands:

    # This will compute the embeddings for the WebVid-CoVR videos. 
    # Note that you can use multiple GPUs with --num_shards and --shard_id
    python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/2M/train --todo_ids annotation/webvid-covr/webvid2m-covr_train.csv 
    
    # This will compute the embeddings for the WebVid-CoVR-Test videos.
    python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/8M/train --todo_ids annotation/webvid-covr/webvid8m-covr_test.csv
    
    # This will compute the embeddings for FashionIQ images.
    python tools/embs/save_blip_embs_imgs.py --image_dir datasets/fashion-iq/images/
  2. Memory Management: To manage GPU memory more efficiently and avoid the CUDA out of memory error, you can adjust the number of devices and batch sizes. Here's how you can modify these settings:

    trainer.devices=X  # replace X with the number of GPUs
    machine.batch_size=Y  # replace Y with a suitable batch size

    I used trainer=ddp with SLURM for distributed training.
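On the --num_shards and --shard_id note in item 1: the general idea is that each GPU process handles a disjoint slice of the IDs in the --todo_ids file. The sketch below illustrates one common way to split the work (a strided slice). This thread does not show how save_blip_embs_vids.py partitions the IDs internally, so treat shard_ids as a hypothetical helper for illustration only.

```python
def shard_ids(ids, num_shards, shard_id):
    """Return the subset of ids that one shard would process.

    Assumption: a strided split, where shard k takes items k, k + num_shards, ...
    The real script may partition the IDs differently.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    return ids[shard_id::num_shards]


if __name__ == "__main__":
    video_ids = [f"video_{i}" for i in range(10)]
    for shard in range(3):
        # Each shard gets a disjoint slice; together they cover every ID once.
        print(shard, shard_ids(video_ids, num_shards=3, shard_id=shard))
```

In practice you would start one process per GPU with the same --num_shards value and a different --shard_id for each.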

Please try these suggestions and let me know if you encounter further issues.


Agarciahunter commented on August 23, 2024

Unfortunately, that didn't seem to fix the missing FashionIQ images issue. That said, I've restarted from scratch to see whether any changes I made to the files are causing the problem. I'll keep you posted if it still doesn't work.

On the plus side, I was able to get the code to run without a SLURM job by setting the batch size to 256.

Side note: what does changing num_workers affect? I haven't seen a difference going from 4 to 8.


lucas-ventura commented on August 23, 2024

Did you fix the issues?

