ArtFusion from chendaryen

I am running this command:
python main.py --name test --base ./configs/kl16_content12.yaml --basedir ./checkpoints -t True --gpus 0,1,
and get the following error:
(T2I) zzj@zzj:~/disk1/zzj/ArtFusion-main$ python main.py --name test --base ./configs/kl16_content12.yaml --basedir ./checkpoints -t True --gpus 0,1,
Global seed set to 2023
Running on GPUs 0,1,
DualCondLDM: Running in eps-prediction mode
DualConditionDiffusionWrapper has 181.02 M params.
Keeping EMAs of 205.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 16, 16, 16) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Restored from checkpoints/vae/kl-f16.ckpt
Training DualCondLDM as an adaptive conditional model.
Keeping EMAs of 3.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
datasets/wiki-art/train.npy
total 79432 hybrid training data, 118287 content data.
datasets/wiki-art/train.npy
total 79432 hybrid training data, 118287 content data.
accumulate_grad_batches = 8
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Global seed set to 2023
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Global seed set to 2023
initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

Save project config
exception!!!!!
Summoning checkpoint.
Save last in ./checkpoints/2023-07-17T04-24-58_test/models/last.ckpt
Traceback (most recent call last):
File "main.py", line 608, in <module>
trainer.fit(model, data)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 173, in start_training
self.spawn(self.new_process, trainer, self.mp_queue, return_result=False)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 201, in spawn
mp.spawn(self._wrapped_function, args=(function, args, kwargs, return_queue), nprocs=self.num_processes)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 208, in _wrapped_function
result = function(*args, **kwargs)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 236, in new_process
results = trainer.run_stage()
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1306, in _run_train
self._pre_training_routine()
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1301, in _pre_training_routine
self.call_hook("on_pretrain_routine_start")
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1495, in call_hook
callback_fx(*args, **kwargs)
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 148, in on_pretrain_routine_start
callback.on_pretrain_routine_start(self, self.lightning_module)
File "/home/zzj/disk1/zzj/ArtFusion-main/main.py", line 223, in on_pretrain_routine_start
os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)))
File "/home/zzj/anaconda3/envs/T2I/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 223, in save
with io.open(os.path.abspath(f), "w", encoding="utf-8") as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zzj/disk1/zzj/ArtFusion-main/checkpoints/2023-07-17T04-24-58_test/configs/2023-07-17T04-24-58-project.yaml'

The error occurs while the second GPU is being initialized. Could you tell me how to fix it?
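The last frame of the traceback shows `OmegaConf.save` failing inside the `on_pretrain_routine_start` callback in `main.py` (line 223): the `configs/` directory under `./checkpoints/2023-07-17T04-24-58_test/` did not exist when process 0 tried to write the project config. Because this only happens with two GPUs, a plausible (unconfirmed) explanation is that the two spawned processes race on the same run directory. A minimal defensive sketch, assuming you are willing to patch that callback: write shared run files only from global rank 0, and recreate the parent directory immediately before writing. `save_on_rank_zero` is a hypothetical helper, not part of ArtFusion or PyTorch Lightning, and it cannot help if another process deletes or moves the directory between the `makedirs` call and the write.

```python
import os


def save_on_rank_zero(path, write_fn, global_rank):
    """Hypothetical helper: write a file only from global rank 0.

    The parent directory is (re)created immediately before writing, so the
    write does not rely on a directory created earlier still existing.
    Returns True if this process performed the write, False otherwise.
    """
    if global_rank != 0:
        return False  # only one process should write shared run files
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)  # no error if it already exists
    write_fn(path)
    return True


# Possible use inside the callback (sketch only). Assumes
# `from omegaconf import OmegaConf` and that the callback stores the config
# as `self.config` (a guess); `self.cfgdir` and `self.now` appear in the
# traceback.
#
# save_on_rank_zero(
#     os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)),
#     lambda p: OmegaConf.save(self.config, p),
#     trainer.global_rank,
# )
```

Passing the writer as a callable keeps the helper independent of OmegaConf, so the same guard can wrap checkpoint or log writes as well.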

About training on my personal dataset

Hello, I am very interested in your research. Could you also release the code for the first-stage VAE training? The dataset I trained on is not based on ImageNet or COCO, and the results of the subsequent model training were poor: the problem shown in the figure below occurred. Thank you for your contributions to this field. Looking forward to your reply.

I'm really looking forward to your work. Roughly when do you plan to open-source the code?

High resolution content image

In the Supplementary Material, Part E (Additional Qualitative Results), you mention that this work can be applied to higher-resolution content images while keeping the style images at 256x256.
Could you explain, or give an example of, how to modify the code to generate results with a 512x512 content image and a 256x256 style image?
Thanks.
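For reference, the training log above reports `Working with z of shape (1, 16, 16, 16)` for the `kl-f16` VAE, i.e. a 256x256 input is downsampled by a factor of 16 into 16 latent channels. Assuming the same factor applies at other input sizes, a 512x512 content image would give a 32x32 latent while a 256x256 style image stays at 16x16, so the content and style latents would differ in spatial size. The sketch below only does that arithmetic; `latent_shape` is a hypothetical helper, and the factor and channel count are inferred from the log line and checkpoint name rather than read from the ArtFusion config.

```python
def latent_shape(height, width, factor=16, channels=16):
    """Return the (channels, h, w) latent shape for a height x width image.

    Assumes a VAE that downsamples each side by `factor` and outputs
    `channels` latent channels (16 and 16 here, inferred from the kl-f16
    log line). This sketch only handles sizes divisible by `factor`.
    """
    if height % factor or width % factor:
        raise ValueError(f"image size must be divisible by {factor}")
    return (channels, height // factor, width // factor)


content = latent_shape(512, 512)  # (16, 32, 32)
style = latent_shape(256, 256)    # (16, 16, 16), matches the logged z shape
```

The style latent has 16 * 16 * 16 = 4096 elements, matching the "4096 dimensions" in the log; the content latent has four times as many.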

After training for 200 epochs, all of the sample images produced during training look like the image below. Is this normal?

test

If I just want to try training models with this project, what should I do?

chendaryen / artfusion

artfusion's Issues

The encoder and decoder models are not loaded during inference

Does your code not support running on two GPUs?

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

About training on my personal dataset

I'm really looking forward to your work. Roughly when do you plan to open-source the code?

High resolution content image

After training for 200 epochs, all of the sample images produced during training look like the image below. Is this normal?

test
