
imaginaire's Introduction


Imaginaire

Imaginaire is a PyTorch library that contains optimized implementations of several image and video synthesis methods developed at NVIDIA.

License

Imaginaire is released under the NVIDIA Software license. For commercial use, please consult NVIDIA Research Inquiries.

What's inside?


We have a tutorial for each model. Click on the model name, and your browser should take you to the tutorial page for the project.

Supervised Image-to-Image Translation

Algorithm Name Feature Publication
pix2pixHD Learn a mapping that converts a semantic image to a high-resolution photorealistic image. Wang et al. CVPR 2018
SPADE Improve pix2pixHD on handling diverse input labels and delivering better output quality. Park et al. CVPR 2019

Unsupervised Image-to-Image Translation

Algorithm Name Feature Publication
UNIT Learn a one-to-one mapping between two visual domains. Liu et al. NeurIPS 2017
MUNIT Learn a many-to-many mapping between two visual domains. Huang et al. ECCV 2018
FUNIT Learn a style-guided image translation model that can generate translations in unseen domains. Liu et al. ICCV 2019
COCO-FUNIT Improve FUNIT with a content-conditioned style encoding scheme for style code computation. Saito et al. ECCV 2020

Video-to-video Translation

Algorithm Name Feature Publication
vid2vid Learn a mapping that converts a semantic video to a photorealistic video. Wang et al. NeurIPS 2018
fs-vid2vid Learn a subject-agnostic mapping that converts a semantic video and an example image to a photorealistic video. Wang et al. NeurIPS 2019

World-to-world Translation

Algorithm Name Feature Publication
wc-vid2vid Improve vid2vid on view consistency and long-term consistency. Mallya et al. ECCV 2020
GANcraft Convert semantic block worlds to realistic-looking worlds. Hao et al. ICCV 2021

imaginaire's People

Contributors

arunmallya, mingyuliutw, tcwang0509, xunhuang1995


imaginaire's Issues

[code] Will the code of the paper(One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing) be made public in this project?

Hi,
I am reading the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing", which shows attractive performance, and I found this repository through that article, which says the source code will be released.
However, I cannot find the code for that paper, and I would like to know whether the source code will be put here.
In addition, this project is very nice. Thank you to the contributors.
@tcwang0509 @arunmallya (^_^)

Google colab notebooks, anyone?

Hello, has anyone gotten this working with Google Colab yet? I would be very excited to check it out! I'm working on it but haven't had success yet.

One major issue: Colab is running CUDA 10.1, and it looks like it may not be possible to do a local installation of CUDA 10.2, which Imaginaire needs.
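A minimal check of what the Colab runtime actually provides (assuming PyTorch is already installed; the Imaginaire docs assume CUDA 10.2):

    import torch

    print('PyTorch:', torch.__version__)
    print('CUDA toolkit PyTorch was built with:', torch.version.cuda)
    print('GPU available:', torch.cuda.is_available())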

Paired Image to Image Translation


Thank you very much for the awesome library!

I want to perform paired image-to-image translation as in the previous pix2pixHD implementation, without any label or instance maps, i.e. translating images from domain A -> domain B.

In the previous implementation, one had to set

If your input is not a label map, please just specify --label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
If you don't have instance maps or don't want to use them, please specify --no_instance.

How can this be done in the imaginaire implementation?

In the yaml file, I set



    type: imaginaire.datasets.paired_images    
    num_workers: 4
    input_types:
        - images:
            ext: png
            num_channels: 3
            interpolator: BILINEAR
            normalize: True
            #use_dont_care: True
        - seg_maps:
            ext: png
            num_channels: 3 # 35
            interpolator: BILINEAR
            normalize: True
            #use_dont_care: True
        #- instance_maps:
        #    ext: png
        #    num_channels: 1
        #    interpolator: NEAREST
        #    normalize: False
        #    use_dont_care: True
    
    input_image:
        - images

    input_labels:
        - seg_maps
        #- instance_maps

i.e. the instance maps are commented out, and instead of 'train_A' I use seg_maps as the source image directory to be translated into the 'images' directory (train_B in the previous implementation). Is this the right way to do it here?

The output images have 4 columns. What is the meaning of these columns? Column 1 seems to be the input image and column 2 a label map. Columns 3 and 4 look almost identical to column 1 (probably the synthesized images), so it seems something went wrong with my settings, as no translation has been performed.

Could SPADE be used in an identical way, or can the SPADE generator be used as a replacement for pix2pixHD?

RuntimeError: expected scalar type Half but found Float

I ran install.sh and then tried running the test script, but this is the error I am getting. I have been struggling to get this library working for more than three weeks now and have fixed multiple issues through long Stack Overflow and GitHub searches. Any assistance would be very helpful.

try-i.log

UnboundLocalError: local variable 'data' referenced before assignment

Hi, I run python train.py --single_gpu --config configs/projects/fs_vid2vid/faceForensics/ampO1.yaml on the reference data in projects/fs_vid2vid/test_data/faceForensics/driving.
The train and val datasets were converted into LMDB format with python scripts/build_lmdb.py --config configs/projects/fs_vid2vid/faceForensics/ampO1.yaml --data_root /projects/fs_vid2vid/test_data/faceForensics/driving/ --output_root datasets/faceForensics/lmdb/[train|val] --paired --overwrite.
During training, the error is:

Traceback (most recent call last):
  File "train.py", line 104, in <module>
    main()
  File "train.py", line 98, in main
    trainer.end_of_epoch(data, current_epoch, current_iteration)
UnboundLocalError: local variable 'data' referenced before assignment

I found that the cause is that the for loop "for it, data in enumerate(train_data_loader):" (train.py line 72) is never executed.
How can I solve this problem?
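One thing worth checking (a hedged guess: an empty or mis-built LMDB gives a zero-length dataset, so the loop never runs and end_of_epoch then sees an undefined data variable) is whether the LMDB actually contains entries, e.g. with the lmdb Python package:

    import lmdb

    # Adjust the path to your output_root; 'images' is assumed to be one of the
    # data-type folders written by build_lmdb.py in this setup.
    path = 'datasets/faceForensics/lmdb/train/images'
    env = lmdb.open(path, readonly=True, lock=False)
    print('LMDB entries:', env.stat()['entries'])
    env.close()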

Building LMDB with paired option

I am trying to train a pix2pixHD model for edge-to-image translation with my own dataset. I prepared my dataset as follows:

my_dataset
    └── train
        ├── images
        │   ├── 0.jpg
        │   ├── 1.jpg
        │   ├── 2.jpg
        │
        └── seg_maps
            ├── 0.jpg
            ├── 1.jpg
            ├── 2.jpg

I also have val set which I omitted above.

I modified the config file as mentioned in #10. However, when I started training, I got the following error.

b'images/.'
Traceback (most recent call last):
  File "train.py", line 90, in <module>
    main()
  File "train.py", line 69, in main
    for it, data in enumerate(train_data_loader):
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 403, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/datasets/paired_videos.py", line 302, in __getitem__
    return self._getitem(index, concat=True)
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/datasets/paired_videos.py", line 240, in _getitem
    data = self.load_from_dataset(keys, lmdbs)
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/utils/data.py", line 408, in load_from_lmdb
    key.encode(), data_type))
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/datasets/lmdb.py", line 72, in getitem_by_path
    if img.ndim == 3 and img.shape[-1] == 3:
UnboundLocalError: local variable 'img' referenced before assignment

When I debugged the code, I realized that the LMDB dataset should be built with paired=True. Am I right?

If so, I think the build_lmdb.py command in https://github.com/NVlabs/imaginaire/blob/master/projects/pix2pixhd/README.md should be updated.

Thank you.

Different results in content encoding in munit with amp O1 or amp O0

Hi there,

First of all thank you for this amazing library.
It really helps people like me to bootstrap in the amazing world of generative networks!

That said, I have noticed strange behaviour when running training on munit/afhq_dog2cat:

  • when you run with the amp O1 optimization level, the content reconstruction error diverges (it stays above 3.5)
  • however, running exactly the same settings with amp O0, the same content reconstruction error converges to values as small as 0.6 (see the attached content_recon.png)

I wonder if this is the expected behaviour.

Additional information:

  • the same behaviour happens when using torch.cuda.amp instead of apex
  • the same behaviour happens when using my own dataset, which is not about dogs and cats ;)
  • the style encoding reconstruction error does NOT seem to suffer from the same issue (see the attached style_recon.png)
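Not an imaginaire-specific fix, just a generic sketch of how a sensitive term can be kept in fp32 under torch.cuda.amp, which may help isolate whether the divergence is a half-precision artifact (c_recon and c_orig are hypothetical content-code tensors):

    import torch
    import torch.nn.functional as F

    def content_recon_loss_fp32(c_recon, c_orig):
        # Compute the content reconstruction term with autocast disabled, so
        # the L1 difference between content codes is taken in full precision.
        with torch.cuda.amp.autocast(enabled=False):
            return F.l1_loss(c_recon.float(), c_orig.float())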

Configuration:

  • Ubuntu 18.04
  • 2 V100 GPUs
  • NVIDIA driver 450.66
  • PyTorch 1.6
  • CUDA 10.2.89
  • cuDNN 8.04.30

Looking forward to reading from you,

Pierre

Perceptual Loss used for MUNIT not using Instance Normalization for content comparison only

Hi there,

If I understood the original MUNIT article/implementation correctly, the VGG loss applied an additional instance normalization to "remove" style from the input image.
In the imaginaire implementation, the PerceptualLoss referenced in the MUNIT trainer seems to be the original perceptual loss, without any instance normalization.
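For comparison, a rough sketch (an illustration only, not the imaginaire PerceptualLoss) of the MUNIT-style "domain-invariant" perceptual loss, where VGG features are instance-normalized before being compared:

    import torch
    import torch.nn.functional as F
    import torchvision

    class InstanceNormVGGLoss(torch.nn.Module):
        # VGG16 features up to relu4_3, instance-normalized to suppress style
        # before the L1 comparison, as described in the original MUNIT paper.
        def __init__(self):
            super().__init__()
            vgg = torchvision.models.vgg16(pretrained=True).features[:23]
            for p in vgg.parameters():
                p.requires_grad = False
            self.vgg = vgg.eval()

        def forward(self, x, y):
            fx = F.instance_norm(self.vgg(x))
            fy = F.instance_norm(self.vgg(y))
            return F.l1_loss(fx, fy)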

Am I missing something?
If this is correct, does it still make sense to use the VGG loss in addition to the others for MUNIT?

Thank you in advance,

Pierre

Ask a question about YoutubeDance datasets file paths in fs-vid2vid

Hello, I want to implement the pose synthesis in fs-vid2vid. I downloaded the set of YouTube dancing videos that you provided, converted them to OpenPose and DensePose formats, and finally generated the LMDB files. The file layout is shown below:

pose
└───lmdb
    └───train
        └───human_instance_maps
               └───data.mdb
               └───lock.mdb
        └───images
               └───data.mdb
               └───lock.mdb
        └───poses-openpose
               └───data.mdb
               └───lock.mdb
        └───pose_maps-densepose
               └───data.mdb
               └───lock.mdb
        └───all_filenames.json
        └───metadata.json
    └───val
        └───human_instance_maps
               └───data.mdb
               └───lock.mdb
        └───images
                     ...(similar to train file path)
└───raw
    └───train
        └───human_instance_maps
               └───000000
                     └───frame000329_INDS.png
                     └───frame000330_INDS.png
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......
        └───images
               └───000000
                     └───frame000329.jpg
                     └───frame000330.jpg
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......
        └───poses-openpose
               └───000000
                     └───frame000329_keypoints.json
                     └───frame000330_keypoints.json
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......
        └───pose_maps-densepose
               └───000000
                     └───frame000329_IUV.png
                     └───frame000330_IUV.png
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......

Now I train this dataset with python -m torch.distributed.launch --nproc_per_node=4 train.py --config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml. The roots part of ampO1.yaml is shown below:

train:
    roots:
        - ./datasets/pose/lmdb/train/
    batch_size: 6
    initial_sequence_length: 4
    max_sequence_length: 16
    augmentations:
        resize_smallest_side: 540
        horizontal_flip: False

val:
    roots:
        - ./datasets/pose/lmdb/val/
    batch_size: 1
    augmentations:
        resize_smallest_side: 540
        horizontal_flip: False

However, I get the error

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    main()
  File "train.py", line 93, in main
    trainer.end_of_epoch(data, current_epoch, current_iteration)
UnboundLocalError: local variable 'data' referenced before assignment

I found that the cause is that the for loop "for it, data in enumerate(train_data_loader):" (train.py line 77) is never executed.
When I debug the code, the train_dataset object (dataset.py line 74) looks like the attached screenshot.

Is there a problem with my datasets path, or is there something I need to improve?
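A hedged check based on the raw/ layout above: count the frames per sequence and compare against initial_sequence_length (4) and max_sequence_length (16) from the config, since sequences that are too short can leave train_data_loader empty and trigger exactly this UnboundLocalError.

    import os

    root = './datasets/pose/raw/train/images'
    for seq in sorted(os.listdir(root)):
        num_frames = len(os.listdir(os.path.join(root, seq)))
        flag = '<-- shorter than max_sequence_length' if num_frames < 16 else ''
        print(seq, num_frames, flag)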

mode collapse

Hi, when I train with coco_funit, the results are normal in the first few epochs, but mode collapse appears from the 59th epoch. Is this normal? Did it also appear during your training?
epoch_00063_iteration_000094000

COCO-Stuff edge maps

Thank you very much for the awesome library!

I want to know how to get the edge maps of COCO-Stuff for training SPADE. Currently, I only have COCO training images, COCO validation images, and seg_maps from stuffthingmaps_trainval2017.
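While waiting for an official answer: the original pix2pixHD code derives edge maps from label/instance maps by marking pixels whose label differs from a neighbor. The exact preprocessing used for the released SPADE models is an assumption here, but a rough NumPy sketch of that idea looks like this:

    import numpy as np

    def seg_to_edge(seg):
        # seg is an HxW integer label map; the result is a binary map that is
        # 1 wherever a pixel's label differs from its right or bottom neighbor.
        edge = np.zeros(seg.shape, dtype=np.uint8)
        horiz = seg[:, 1:] != seg[:, :-1]
        vert = seg[1:, :] != seg[:-1, :]
        edge[:, 1:] |= horiz
        edge[:, :-1] |= horiz
        edge[1:, :] |= vert
        edge[:-1, :] |= vert
        return edge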

fs_vid2vid pose model - youtube playlist 3

The third YouTube playlist link has more than 90% of its videos with multiple people in a single frame. Will that affect the model's performance, or does it help the model generalize?

Just wondering how the model learns from so much noise versus important data.

[wc_vid2vid] Style in the result suddenly changes

Thanks for sharing the code of this amazing work. I am currently using it to train a model on our own dataset, but I am facing some problems.

I simply created a small training set of around 50 samples and test on the training set, just to learn how to run the code and make it overfit. Training starts from the provided pre-trained checkpoint. The generated video is good overall, except for the first two frames.

stuttgart_00_000000_000000_leftImg8bit
stuttgart_00_000000_000001_leftImg8bit
stuttgart_00_000000_000002_leftImg8bit
stuttgart_00_000000_000003_leftImg8bit

These are the first 4 frames of a generated video. You can see that the 3rd and 4th frames fit the ground truth very well, but the 1st and 2nd frames look like they were generated by the original checkpoint (model).

Here are my questions:
(1) Does anyone have an idea why this happens? Actually, I modified the code a little, but I am not sure whether that is the reason. What I modified is line 61 of
https://github.com/NVlabs/imaginaire/blob/master/imaginaire/generators/wc_vid2vid.py, because Python originally raised an error that self.gen_cfg.single_image_model does not have the attribute checkpoint. From the config file https://github.com/NVlabs/imaginaire/blob/master/configs/projects/wc_vid2vid/cityscapes/seg_ampO1.yaml, we can see that single_image_model does not have an attribute called checkpoint either. Thus I simply set load_weights = False.

(2) Also, it seems that the final output does not use (copy) the original colors from the guidance image at all, as if the network generates the color of every pixel itself. Am I wrong?

(3) Are depth images required for training/testing? My dataset does not have depth images, but there is no error when running the code.

Looking forward to your reply. Thank you very much, and happy new year.

Seek for the test dataset for COCO-FUNIT

In the COCO-FUNIT paper, the model was tested on four datasets: Carnivores, Mammals, Birds, and Motorbikes. However, in this repository, only the animal faces dataset (Carnivores) is available. Could you share the other three test datasets (Mammals, Birds, and Motorbikes)? It would help us a lot. Thanks.

Best regards.

Questions about the training data

Thanks for your such a great series of works on image and video synthesis.

I'm very interested in the work "World-Consistent Video-to-Video Synthesis", which efficiently solves long-term visual consistency in video-to-video synthesis.

I hope to re-train the models from this work and become more familiar with this area. However, I don't know how to handle the data preparation, which consists of many steps (edge maps, depth maps, segmentation labels) that rely on many tools/SOTA methods. Could you please give more details of these data preparation steps? Personally, I think it would help people enter this task and contribute to the growth of this community.

Thanks.

Inference for fewshotvid2vid?

I wanted to try inference on my own inputs, so I followed the instructions in the README and modified the faceForensics config with the correct paths, but when I run it I get the following error:

Epoch length: 0
Traceback (most recent call last):
  File "inference.py", line 91, in <module>
    main()
  File "inference.py", line 87, in main
    trainer.test(test_data_loader, args.output_dir, cfg.inference_args)
  File "/content/drive/My Drive/imaginaire/imaginaire/trainers/fs_vid2vid.py", line 148, in test
    test_data_loader.dataset.set_inference_sequence_idx(
  File "/content/drive/My Drive/imaginaire/imaginaire/datasets/paired_few_shot_videos.py", line 46, in set_inference_sequence_idx
    assert index < len(self.mapping)
AssertionError

Any idea why?

Windows : Resample2d_CUDA DLL Load failed

Hello,

My config :
conda install python=3.6
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
Windows 10
RTX 2080 TI

I have successfully installed all dependencies, but when I test inference (vid2vid), I get this error:

    import resample2d
  File "D:\imaginaire\venv\lib\site-packages\resample2d_cuda-0.0.0-py3.6-win-amd64.egg\resample2d.py", line 4, in <module>
    import resample2d_cuda
ImportError: DLL load failed: The specified module could not be found.

I have successfully installed the third-party extensions with:
python setup.py install
creating d:\imaginaire\venv\lib\site-packages\resample2d_cuda-0.0.0-py3.6-win-amd64.egg
Extracting resample2d_cuda-0.0.0-py3.6-win-amd64.egg to d:\imaginaire\venv\lib\site-packages
resample2d-cuda 0.0.0 is already the active version in easy-install.pth

Installed d:\imaginaire\venv\lib\site-packages\resample2d_cuda-0.0.0-py3.6-win-amd64.egg
Processing dependencies for resample2d-cuda==0.0.0
Finished processing dependencies for resample2d-cuda==0.0.0

Thanks for your feedback on Windows install.

Thanks.

RuntimeError for linux when using test_train

Hi,
I get the following runtime error. My system is Ubuntu 18.04 and I use Anaconda 3. I installed PyTorch with CUDA toolkit 10.2 using conda install.

RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

Does this runtime error come from cuDNN? If I sign up for an account and download it from the NVIDIA website, which folder do I need to put it in? Should it go into my Anaconda env for this project or into the main /usr/local/lib?
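In case it helps, a hedged diagnosis (assuming the error means apex was installed without its compiled extensions): FusedAdam relies on the amp_C module that only exists when apex is built with the --cpp_ext and --cuda_ext options, which you can check like this:

    try:
        import amp_C  # compiled extension that apex.optimizers.FusedAdam needs
        print('apex CUDA extensions are present')
    except ImportError as err:
        print('apex CUDA extensions missing; reinstall apex with '
              '--global-option="--cpp_ext" --global-option="--cuda_ext":', err)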

Many thanks!

Kindly Regards,

Jiali Li

Incomplete animal face test dataset

I followed the README in COCO-FUNIT by running "python scripts/download_test_data.py --model_name coco_funit" to get the test dataset, but it only contains 6 images instead of 30 categories of images.

UNIT implementation

First of all, thank you for this great library.

I was checking the UNIT implementation and saw that it differs from the original paper in at least two ways:

  • A spatial AE is used instead of a VAE (an L1 penalty on the latent instead of a KL loss); see the sketch after this list.
  • There is no weight sharing in the E and G layers.
  • Cycle reconstruction is probably implemented differently, which is probably also the reason weight sharing is not needed (an assumption, as I haven't checked the original UNIT implementation).
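For reference, a toy sketch (not the imaginaire or original UNIT code) of the two latent regularizers contrasted in the first point: the original UNIT's KL term against a unit-variance Gaussian prior, which reduces to a squared penalty on the encoded mean, versus a plain L1 penalty on the latent activations.

    import torch

    def kl_latent_penalty(mu):
        # Original UNIT: the encoder output is treated as the mean of a
        # unit-variance Gaussian, and KL(N(mu, I) || N(0, I)) = 0.5 * ||mu||^2
        # (averaged here instead of summed, which only changes the scale).
        return 0.5 * torch.mean(mu ** 2)

    def l1_latent_penalty(z):
        # Spatial-AE variant described above: penalize the latent code directly.
        return torch.mean(torch.abs(z))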

Can you give us a bit more of the reasoning, and describe any other major changes that might be there which I didn't notice?

Thanks!

Google Colab Installation Problem

I'm trying to install Imaginaire on Google Colab (full installation log attached). Here's the error that I get on running the test_training.sh script:

100% 1/1 [00:00<00:00, 1736.77it/s]
100% 1/1 [00:00<00:00, 400.91it/s]
100% 1/1 [00:00<00:00, 1413.65it/s]
100% 1/1 [00:00<00:00, 1112.25it/s]
100% 1/1 [00:00<00:00, 1455.34it/s]
100% 1/1 [00:00<00:00, 945.30it/s]
python scripts/build_lmdb.py --config configs/unit_test/spade.yaml --paired --data_root dataset/unit_test/raw/spade/ --output_root dataset/unit_test/lmdb/spade --overwrite >> /tmp/unit_test.log [Success]
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    from imaginaire.utils.gpu_affinity import set_affinity
  File "/content/gdrive/My Drive/audiovisual-compression/imaginaire/imaginaire/utils/gpu_affinity.py", line 9, in <module>
    pynvml.nvmlInit()
  File "/usr/local/lib/python3.6/dist-packages/pynvml/nvml.py", line 749, in nvmlInit
    check_return(ret)
  File "/usr/local/lib/python3.6/dist-packages/pynvml/nvml.py", line 366, in check_return
    raise NVMLError(ret)
pynvml.nvml.NVMLError_DriverNotLoaded: Driver Not Loaded
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--config', 'configs/unit_test/spade.yaml']' returned non-zero exit status 1.
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/spade.yaml >> /tmp/unit_test.log [Failure]
installation log.txt
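A quick hedged check (assuming NVMLError_DriverNotLoaded means no NVIDIA driver or GPU is visible to the session, e.g. a Colab runtime started without a GPU):

    import torch
    print('torch sees CUDA:', torch.cuda.is_available())

    import pynvml
    pynvml.nvmlInit()  # raises an NVMLError if the driver is not loaded
    print('GPUs visible to NVML:', pynvml.nvmlDeviceGetCount())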

Dataset download

Thanks for your work on image enhancement! I have tried to download the datasets used in the paper, but halfway through I always run into network issues. Could you provide Baidu Yun links or other mirrors? Thanks again!

How to build docker image on centos 7?

Hello!

I cannot build from source because I don't have sudo permissions so I can't install many of the dependencies.
My first question is: do you still need the prerequisites (Anaconda3, CUDA 10.2, cuDNN) if you want to build and run in Docker?

Secondly, since the Dockerfile uses apt-get and CentOS doesn't use apt, does anyone have recommendations for how I can build? Not all of these packages are available via yum, and without sudo, building these dependencies from source is a huge pain.

The build fails because my machine doesn't have access to apt
/bin/sh: apt-get: not found

fs-vid2vid YouTube dancing pretrained weights

Hi, it is my understanding that both FaceForensics and YouTube dancing share the same pretrained weights. However when I run

python inference.py --single_gpu --num_workers 0 \
--config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml \
--output_dir projects/fs_vid2vid/output/YouTubeDancing \
--checkpoint epoch_00200_iteration_000005800_checkpoint.pt

I get a bunch of key mismatches like Missing key(s) in state_dict: "module.weight_generator.ref_img_first.layers.conv.weight_orig" and Unexpected key(s) in state_dict: "module.num_updates_tracked" to name a few.

SPADE required GPU memory

I get a RuntimeError: CUDA out of memory with the SPADE model with a batch size of 1 and image size of 256 x 256.

I know that a single RTX 2080ti might be less than the required hardware, but is it not possible to train the model with the --single_gpu argument? I get the error AssertionError: Default process group is not initialized.

Is there a recommended way to reduce the memory requirements? I was able to train a SPADE model on the same hardware with the previous implementation (before imaginaire).

vid2vid got bad results

Hi,

I am trying to use vid2vid to test my own data. The semantic segmentation maps come from DeepLabV3+ (checkpoint: xception65_cityscapes_trainfine), and the results are attached.

I want to know whether the labels expected by the vid2vid segmentation maps are consistent with the labels produced by DeepLabV3+. Can you also give some suggestions about the results I got?

Many thanks! :)

How to keep the content unchanged and only change the style?

Hi, thanks for your great work!
I have a question: how can I keep the content unchanged and only change the style? In the paper, you show summer ↔ winter results where the content is unchanged and only the style changes.
But when I train with other datasets, I can not keep the content unchanged.
For example (see the attached image):

The left is the input and the right is generated by the code, but the content changes from left to right.

So how can I keep the content unchanged?

Thank you!

Is local enhancer supported in imaginaire's implementation of pix2pixHD?

Hello, I've noticed in the pix2pixHD config for Cityscapes, based on the following lines:

local_enhancer:
    num_enhancers: 0
    num_res_blocks: 3

that you only use the global generator instead of the global generator + local enhancer. Is this intended behavior, or is it specific to the Cityscapes config? Does the current imaginaire implementation of pix2pixHD support training the global generator and local enhancer together, as in the original implementation? If so, could you recommend what a config should look like in that case?

HELP! expected scalar type Half but found Float

Hi,

I ran install.sh and then tried running the test script, but this is the error I am getting. I have been struggling to get this library working for more than three weeks now and have fixed multiple issues through long Stack Overflow and GitHub searches. Any assistance would be very helpful.

try-i.log

RuntimeError: expected scalar type Half but found Float

How to run fs_vid2vid?

Sorry, maybe it is obvious, but can you give some explanation of how to run it?
As I understand it, if I have a video with a face, I should convert it to face keypoints via dlib. But what is the next step?
Can you please write some commands showing how to run this network when I want to give my own video as input?
Thanks!

Error when running on Win10

I set up the environment and tried to run pix2pixHD inference, but with this command line:

python inference.py --single_gpu --config "configs/projects/pix2pixhd/cityscapes/ampO1.yaml" --output_dir "projects/pix2pixhd/output/cityscapes" --checkpoint "../models/cityscapes_1k.pt"

I got the following runtime error:

Traceback (most recent call last):
  File "inference.py", line 91, in <module>
    main()
  File "inference.py", line 39, in main
    set_affinity(args.local_rank)
  File "D:\Sources\Download\imaginaire-master\imaginaire-master\imaginaire\utils\gpu_affinity.py", line 57, in set_affinity
    os.sched_setaffinity(0, dev.getCpuAffinity())
AttributeError: module 'os' has no attribute 'sched_setaffinity'

Is there any way to get around this?
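One possible workaround (a sketch against the call shown in the traceback, not an official patch): os.sched_setaffinity only exists on Linux, so the call in imaginaire/utils/gpu_affinity.py can be guarded and simply skipped on Windows.

    import os

    def set_affinity_portable(dev):
        # dev stands in for the device object whose getCpuAffinity() call
        # appears in the traceback above; on Windows, where sched_setaffinity
        # does not exist, CPU affinity pinning is simply skipped.
        if hasattr(os, 'sched_setaffinity'):
            os.sched_setaffinity(0, dev.getCpuAffinity())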

wc_vid2vid inferring

Hi, I downloaded the trained weights of the wc_vid2vid model, 7.39 GB!
Inference failed due to CUDA out of memory. My GPU is a single 2080 Ti.
Is it impossible to run even just inference on such a device?

Questions about length of pytorch training dataset in fs_vid2vid.

I want to run fs_vid2vid. I prepared 5 videos for training and 1 video for validation, extracted their frames, obtained the corresponding landmarks saved in JSON files, and preprocessed everything into LMDB format.

Question 1
However, when I run the fs_vid2vid program on my tiny dataset, I find that the length of the training dataset is 2 and the length of the corresponding training dataloader is 0, which throws an error.

Could someone help me solve this problem?

Question 2
Here is the length information of my dataset (see the attached screenshot). What is the detailed meaning of Num datasets, Num sequences, Max sequence length and Epoch length?
