Comments (15)
@maxin-cn How can I generate train_256_list.txt? Are there any processing scripts I can follow if I want to train with ucf101_img_train.yaml?
Please refer to the following code (first convert the videos to frames and resize them at the same time; you can refer to this):

import os
from tqdm import tqdm

ffs_image_root = '/UCF101/images_256/'
ffs_image_txt = '/UCF101/train_256_list.txt'

def get_filelist(file_path):
    # Recursively collect the full paths of all frame images.
    filelist = []
    for home, dirs, files in os.walk(file_path):
        for filename in files:
            filelist.append(os.path.join(home, filename))
    return filelist

ffs_files = get_filelist(ffs_image_root)

# Write each path relative to the image root, one per line.
# Open the file once in 'w' mode so repeated runs don't append duplicates.
with open(ffs_image_txt, 'w') as f:
    for i in tqdm(ffs_files):
        relative_path = i.split(ffs_image_root)[-1]
        f.write(relative_path + '\n')
from latte.
Dear authors,
I really appreciate your exceptional open-source work. As a newcomer to the video field, I have been exploring your repository and have a couple of questions that I hope you can help clarify.
- Training Step vs. Epoch: the training procedure specifies max_train_steps rather than max_train_epochs. My understanding is that when using multiple GPUs, the steps per epoch decrease, but the total number of epochs increases, which doesn't seem to leverage the acceleration benefits of multi-GPU training. Could you please explain the rationale behind this setting? And why does Fig. 8 use at most 150k training iterations, while ffs_train.yaml and ucf101_train.yaml adopt max_train_steps=1000k?
- UCF-101 Dataset Training Details: I would also like to ask about the training resources, time, and performance of the model on the UCF-101 dataset. According to the code, training the Latte-S/2 model on UCF-101 with two 32G NVIDIA V100 GPUs takes approximately 2.5 minutes per 100 steps. This would translate to an 18-day training period for 1 million steps, which seems quite extensive. Could you provide details on the configuration used for training? Information on the training time and Inception Score (IS) of the Latte-S/2 model on UCF-101 would also be invaluable. You mentioned in Issue #39 that 8 cards were used for all experiments, and I wonder which type of GPUs were used.

Thank you for your time and for maintaining such a fantastic resource for the community.
Best regards,
Shun Lu
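The 18-day figure above follows from quick arithmetic on the quoted measurement (a sanity check, not new data):

```python
# Measured throughput from the question: ~2.5 minutes per 100 steps on 2x V100.
minutes_per_step = 2.5 / 100
total_steps = 1_000_000

total_minutes = minutes_per_step * total_steps  # 25,000 minutes
total_days = total_minutes / 60 / 24            # ~17.4 days

print(round(total_days, 1))  # ~17.4, i.e. roughly the 18 days quoted above
```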
Hi, thanks for your interest.
- The max_train_steps parameter in ffs_train.yaml does not necessarily mean that the model needs to be trained for the full 1000k steps; you can consider it a maximum value.
- The configuration in this repository is almost the same as the one I used. I used 8 A100 GPUs (80G) to conduct all the experiments shown in the paper (except for LatteT2V).
If you have any questions, please let me know.
Thanks for your great work! I want to know how to generate /path/to/datasets/UCF101/train_256_list.txt for UCF101 training. After downloading the UCF101 videos, and given the paper's statement "We extract 16-frame video clips from these datasets", are there any processing scripts we can follow?
Hi, thanks for your interest. train_256_list.txt contains the following information (video-class-name_video-name_frame):
As for the second question, please follow this link.
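For the frame-extraction step itself, a rough sketch (not the authors' script) is to drive ffmpeg once per video, resizing the short side to 256. build_ffmpeg_cmd and the paths below are illustrative assumptions, and ffmpeg must be installed for the command to actually run:

```python
import os
import tempfile

def build_ffmpeg_cmd(video_path, out_dir, short_side=256):
    """Build (but do not run) an ffmpeg command dumping resized frames.

    The scale filter resizes the short side to `short_side` while keeping
    the aspect ratio; %06d matches the zero-padded frame names
    (e.g. 000148.jpg) shown elsewhere in this thread.
    """
    scale = (
        f"scale='if(gt(iw,ih),-2,{short_side})'"
        f":'if(gt(iw,ih),{short_side},-2)'"
    )
    return [
        "ffmpeg", "-i", video_path,
        "-vf", scale,
        os.path.join(out_dir, "%06d.jpg"),
    ]

# Illustrative paths only; in practice, loop this over every UCF101 video.
out_dir = os.path.join(tempfile.gettempdir(), "v_WritingOnBoard_g02_c04")
cmd = build_ffmpeg_cmd("/UCF101/videos/v_WritingOnBoard_g02_c04.avi", out_dir)
print(" ".join(cmd))
```

After extracting frames for every video this way, the listing script earlier in the thread can generate train_256_list.txt from the resulting directory tree.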
Hi, @maxin-cn
I processed the UCF101 dataset following your above suggestions and the data is organized as follows:
But when I try to start training using train_with_img.py, it freezes with very high CPU usage. Any idea?
[2024-03-01 12:24:03] Experiment directory created at ./results_img/000-LatteIMG-S-2-F16S3-ucf101_img
Starting rank=1, local rank=1, seed=3408, world_size=2.
[2024-03-01 12:24:05] Model Parameters: 32,624,288
[2024-03-01 12:24:07] Dataset contains 2,486,613 videos (./data/UCF-101)
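One possible culprit (a guess, not confirmed in this thread) is the cost of building the index over ~2.5 million frame entries before training starts; the log's "2,486,613 videos" presumably counts frame entries from train_256_list.txt. A stdlib-only sketch to time just the list parsing, with load_frame_list as a made-up stand-in for the dataset's indexing:

```python
import time

def load_frame_list(lines):
    """Split 'ClipDir/000148.jpg' entries into (clip, frame) pairs.

    `lines` stands in for the contents of train_256_list.txt.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        clip, frame = line.rsplit("/", 1)
        entries.append((clip, frame))
    return entries

# Synthetic stand-in for a large list: 100k entries across 1k clips.
fake_lines = [f"clip_{i % 1000}/{i:06d}.jpg" for i in range(100_000)]

t0 = time.time()
entries = load_frame_list(fake_lines)
print(len(entries), f"{time.time() - t0:.2f}s")
```

If this scales poorly toward 2.5M lines, doing the parsing once before forking (or setting the DataLoader's num_workers to 0 while debugging) may help isolate the freeze.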
@valencebond Hi, could you share any experience with @xszheng2020? Thank you very much~
Hi, @maxin-cn
The images should be placed as follows, right?
['./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000148.jpg',
'./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000105.jpg',
'./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000155.jpg',
'./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000037.jpg',
'./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000027.jpg',
'./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000126.jpg',
...]
And the train_256_list.txt contains:
WritingOnBoardv_WritingOnBoard_g02_c04/000148.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000105.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000155.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000037.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000027.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000126.jpg
...
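Given a list in that format, slicing it into the paper's 16-frame clips can be sketched as below (stdlib only; group_into_clips is an illustrative helper, not the repo's dataset code, and it takes non-overlapping windows as one simple choice):

```python
from collections import defaultdict

def group_into_clips(frame_paths, clip_len=16):
    """Group 'ClipDir/000148.jpg' entries by clip and cut 16-frame windows."""
    by_clip = defaultdict(list)
    for path in frame_paths:
        clip, frame = path.rsplit("/", 1)
        by_clip[clip].append(frame)

    windows = []
    for clip, frames in by_clip.items():
        frames.sort()  # zero-padded names sort chronologically
        for start in range(0, len(frames) - clip_len + 1, clip_len):
            windows.append([f"{clip}/{f}" for f in frames[start:start + clip_len]])
    return windows

# 40 synthetic frames for one clip -> two non-overlapping 16-frame windows.
paths = [f"WritingOnBoardv_WritingOnBoard_g02_c04/{i:06d}.jpg" for i in range(1, 41)]
windows = group_into_clips(paths)
print(len(windows), len(windows[0]))  # 2 16
```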
That's right.
Many thanks for your prompt and insightful response. I have two more questions while reproducing the results:
- Training Steps and Duration for Latte-XL/2 on UCF-101: Could you please specify the exact number of training steps and the training time when training Latte-XL/2 using eight A100 GPUs on the UCF-101 dataset?
- Availability of Latte-S/2 Model on UCF-101: Have you trained the Latte-S/2 model on the UCF-101 dataset? If so, would it be possible for you to share such a model for reference? It would be useful for verifying my experiments.
Thank you once again for your time and support.
Warm regards
- You can refer to #58. I remember training on 8 A100s for about 2 days, and the model could generate video.
- I have only trained models of different sizes on the FFS dataset, and I can share these models with you if you need them.
Sincere appreciation for the valuable information.
- Could you kindly inform me of the exact/approximate number of training steps required to reach the 68.53 Inception Score on UCF101 for the model listed in Table 1?
- Additionally, I really need these models trained on the FFS dataset, along with their exact/approximate training steps. I kindly request them at your convenience via my email: [email protected].
Many thanks for your generous support.
- It is difficult for me to tell you the exact number of training steps for the model that achieved this Inception Score, because I forget which model was used. But it must have taken a long time (several weeks or so).
- I have uploaded these models here (all models were trained for about 250k iterations).
Got it and thanks a lot.
Hi There! 👋
This issue has been marked as stale due to inactivity for 14 days.
We would like to inquire if you still have the same problem or if it has been resolved.
If you need further assistance, please feel free to respond to this comment within the next 7 days. Otherwise, the issue will be automatically closed.
We appreciate your understanding and would like to express our gratitude for your contribution to Latte. Thank you for your support. 🙏