Comments (6)
Hi!
Regarding the images for toptee, shirt, and dress: I had all the images in the same directory (see the fashioniq-base.yaml config file). If your setup has them in separate directories, you could try updating the img_dirs paths in each config/data/fashioniq-split.yaml like this:
    img_dirs:
      train: ${data.dataset_dir}/images/<split>
      val: ${data.dataset_dir}/images/<split>
I'm not entirely sure if this will work when testing, but it should be fine for training.
For missing image links, it seems that FashionIQDataset expects all the images. You can handle missing images in several ways:
- Redownload missing images using the provided links in the README.
- Adapt the FashionIQDataset to skip missing files, similar to what I did in WebVidCoVRDataset.
- Alternatively, create a reduced dataset: run cp -r annotation/fashion-iq annotation/fashion-iq_small, remove the files you don't have, and change configs/data/fashioniq-base.yaml to point to the new annotation directory.
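The reduced-dataset option could be scripted roughly as follows. This is only a sketch under assumptions that are not confirmed in this thread: each annotation file is a JSON list of dicts with "candidate" and "target" image IDs, and images are stored flat as <id>.jpg. The helper names filter_triplets and write_reduced_annotations are hypothetical and not part of the CoVR code.

```python
import json
from pathlib import Path


def filter_triplets(triplets, image_dir, suffix=".jpg"):
    """Keep only triplets whose candidate and target images both exist on disk.

    Assumes each triplet is a dict with "candidate" and "target" image IDs
    (an assumption about the annotation format, not taken from the repo).
    """
    image_dir = Path(image_dir)
    # Index the directory once instead of checking the disk per triplet.
    available = {p.stem for p in image_dir.glob(f"*{suffix}")}
    kept = [
        t for t in triplets
        if t["candidate"] in available and t["target"] in available
    ]
    return kept, len(triplets) - len(kept)


def write_reduced_annotations(src_json, dst_json, image_dir):
    """Read one annotation file, drop incomplete triplets, write the result."""
    triplets = json.loads(Path(src_json).read_text())
    kept, dropped = filter_triplets(triplets, image_dir)
    Path(dst_json).parent.mkdir(parents=True, exist_ok=True)
    Path(dst_json).write_text(json.dumps(kept))
    return dropped
```

Calling write_reduced_annotations once per annotation file in the copied directory would return how many triplets were dropped from each, which is a quick way to see how much data is lost.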
Best of luck!
from covr.
I was able to download the missing images, but after running train.py I seem to have come across another issue. Was the download for 2M missing, or is it something else? I noticed that 2M doesn't come with validation links either.
[2024-05-09 09:49:22,736][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
Error executing job with overrides: []
Error executing job with overrides: []
Error in call to target 'src.data.webvid_covr.WebVidCoVRDataModule':
AssertionError('Embedding directory /work/user/CoVR/datasets//WebVid/2M/blip-vid-embs-large-all does not exist')
Edit: also using the checkpoints you provided does imply that everything else is working. The only problem now is this:
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.16 GiB (GPU 1; 31.74 GiB total capacity; 19.99 GiB already allocated; 3.09 GiB free; 28.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
So I'll just need to either write a SLURM file or mess with the data size.
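To make the figures in the out-of-memory message easier to read, here is a small sketch that pulls the GiB values out of the message text and compares them. The parse_oom helper is hypothetical, and the regular expressions only target the message format quoted above.

```python
import re


def parse_oom(message):
    """Extract the GiB figures from a CUDA out-of-memory message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {k: float(re.search(p, message).group(1)) for k, p in patterns.items()}


msg = (
    "CUDA out of memory. Tried to allocate 10.16 GiB (GPU 1; 31.74 GiB total "
    "capacity; 19.99 GiB already allocated; 3.09 GiB free; 28.20 GiB reserved "
    "in total by PyTorch)"
)
stats = parse_oom(msg)

# How much the single request exceeds the memory reported as free.
shortfall = stats["requested"] - stats["free"]
# Memory PyTorch holds in its cache but has not handed out to tensors.
reserved_unallocated = stats["reserved"] - stats["allocated"]

print(f"request exceeds free memory by {shortfall:.2f} GiB")        # 7.07
print(f"reserved but not allocated: {reserved_unallocated:.2f} GiB")  # 8.21
```

The single 10.16 GiB request is far larger than the 3.09 GiB reported free, and roughly 8.21 GiB is reserved but not allocated, which is the case the message's max_split_size_mb hint refers to. A smaller batch size would likely shrink the size of that request.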
from covr.
Update on test.py and FashionIQ: running the test code results in errors like the ones below, which is weird since the images are in the directory:
AssertionError('Path to candidate B00CZ7QJUG not found in /work/user/CoVR/datasets/fashion-iq/images')
full_key: test.fashioniq-shirt
(covr) [gpu033: CoVR]$ find . -name "B00CZ7QJUG*"
./datasets/fashion-iq/MissingImages/JPG_Images/B00CZ7QJUG.jpg
./datasets/fashion-iq/images/B00CZ7QJUG.jpg
AssertionError('Path to candidate B005X4PL1G not found in /work/user/CoVR/datasets/fashion-iq/images')
full_key: test.fashioniq-dress
(covr) [gpu033: CoVR]$ find . -name "B005X4PL1G*"
./datasets/fashion-iq/images/B005X4PL1G.jpg
AssertionError('Path to candidate B008CFZW76 not found in /work/user/CoVR/datasets/fashion-iq/images')
full_key: test.fashioniq-toptee
(covr) [gpu033: CoVR]$ find . -name "B008CFZW76*"
./datasets/fashion-iq/images/B008CFZW76.jpg
Any ideas?
from covr.
Hi @Agarciahunter,
It looks like you're encountering two different issues:
- Embeddings Extraction: Based on your error logs, it seems that you haven't extracted the target embeddings for WebVid or FashionIQ. You can do this with the following commands:
    # This will compute the embeddings for the WebVid-CoVR videos.
    # Note that you can use multiple GPUs with --num_shards and --shard_id
    python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/2M/train --todo_ids annotation/webvid-covr/webvid2m-covr_train.csv
    # This will compute the embeddings for the WebVid-CoVR-Test videos.
    python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/8M/train --todo_ids annotation/webvid-covr/webvid8m-covr_test.csv
    # This will compute the embeddings for FashionIQ images.
    python tools/embs/save_blip_embs_imgs.py --image_dir datasets/fashion-iq/images/
- Memory Management: To manage GPU memory more efficiently and avoid the CUDA out of memory error, you can adjust the number of devices and the batch size. Here's how you can modify these settings:
    trainer.devices=X # replace X with the number of GPUs
    machine.batch_size=Y # replace Y with a suitable batch size
  I used trainer=ddp with SLURM for distributed training.
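On the --num_shards and --shard_id options mentioned in the first item: the idea is that each process handles its own slice of the ID list. How save_blip_embs_vids.py actually splits the work is not shown in this thread, so the strided split below is an assumption for illustration only, and shard_ids is a hypothetical helper.

```python
def shard_ids(ids, num_shards, shard_id):
    """Return the subset of ids assigned to one shard (strided split).

    Assumption: shards are formed by taking every num_shards-th item,
    starting at shard_id. The real script may split differently.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return ids[shard_id::num_shards]


# Example: five video IDs split across two processes.
video_ids = ["v0", "v1", "v2", "v3", "v4"]
print(shard_ids(video_ids, num_shards=2, shard_id=0))  # ['v0', 'v2', 'v4']
print(shard_ids(video_ids, num_shards=2, shard_id=1))  # ['v1', 'v3']
```

With a split like this, every ID lands in exactly one shard, so the per-GPU runs can write their embeddings independently.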
Please try these suggestions and let me know if you encounter further issues.
from covr.
Unfortunately, that didn't seem to fix the missing fashion images issue. Granted, I restarted it to see if any changes I made to files are causing the problem. I'll keep you posted if it doesn't work.
Also on the plus side I was able to get the code to run without a slurm job by setting the batch size to 256.
Side note: what does changing num_workers affect? I haven't seen a difference going from 4 to 8.
from covr.
Did you fix the issues?
from covr.