
imaginaire's Introduction


Imaginaire

Imaginaire is a PyTorch library that contains optimized implementations of several image and video synthesis methods developed at NVIDIA.

License

Imaginaire is released under the NVIDIA Software license. For commercial use, please consult NVIDIA Research Inquiries.

What's inside?


We have a tutorial for each model. Click on the model name, and your browser should take you to the tutorial page for the project.

Supervised Image-to-Image Translation

Algorithm Name Feature Publication
pix2pixHD Learn a mapping that converts a semantic image to a high-resolution photorealistic image. Wang et al. CVPR 2018
SPADE Improve pix2pixHD on handling diverse input labels and delivering better output quality. Park et al. CVPR 2019

Unsupervised Image-to-Image Translation

Algorithm Name Feature Publication
UNIT Learn a one-to-one mapping between two visual domains. Liu et al. NeurIPS 2017
MUNIT Learn a many-to-many mapping between two visual domains. Huang et al. ECCV 2018
FUNIT Learn a style-guided image translation model that can generate translations in unseen domains. Liu et al. ICCV 2019
COCO-FUNIT Improve FUNIT with a content-conditioned style encoding scheme for style code computation. Saito et al. ECCV 2020

Video-to-video Translation

Algorithm Name Feature Publication
vid2vid Learn a mapping that converts a semantic video to a photorealistic video. Wang et al. NeurIPS 2018
fs-vid2vid Learn a subject-agnostic mapping that converts a semantic video and an example image to a photorealistic video. Wang et al. NeurIPS 2019

World-to-world Translation

Algorithm Name Feature Publication
wc-vid2vid Improve vid2vid on view consistency and long-term consistency. Mallya et al. ECCV 2020
GANcraft Convert semantic block worlds to realistic-looking worlds. Hao et al. ICCV 2021

imaginaire's People

Contributors

arunmallya, mingyuliutw, tcwang0509, xunhuang1995


imaginaire's Issues

[code] Will the code of the paper(One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing) be made public in this project?

Hi,
I am reading the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing", which shows attractive performance, and I found this repository through that article, which says the source code will be released.
However, I cannot find the code for that paper, and I would like to know whether the source code will be put here.
In addition, this project is very nice. Thank you to the contributors.
@tcwang0509 @arunmallya (^_^)

Google colab notebooks, anyone?

Hello, has anyone gotten this working with Google Colab yet? I would be very excited to check it out! I'm working on it but haven't had success yet.

One major issue: Colab is running CUDA 10.1, and it looks like it may not be possible to do a local installation of CUDA 10.2, which Imaginaire needs.
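A minimal check of what the Colab runtime actually provides (assuming PyTorch is already installed; the Imaginaire docs assume CUDA 10.2):

    import torch

    print('PyTorch:', torch.__version__)
    print('CUDA toolkit PyTorch was built with:', torch.version.cuda)
    print('GPU available:', torch.cuda.is_available())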

Paired Image to Image Translation


Thank you very much for the awesome library!

I want to perform paired image-to-image translation as in the previous pix2pixHD implementation, without any label or instance maps, i.e. translating images from domain A -> domain B.

In the previous implementation, one had to set

If your input is not a label map, please just specify --label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
If you don't have instance maps or don't want to use them, please specify --no_instance.

How can this be done in the imaginaire implementation?

In the yaml file, I set



    type: imaginaire.datasets.paired_images    
    num_workers: 4
    input_types:
        - images:
            ext: png
            num_channels: 3
            interpolator: BILINEAR
            normalize: True
            #use_dont_care: True
        - seg_maps:
            ext: png
            num_channels: 3 # 35
            interpolator: BILINEAR
            normalize: True
            #use_dont_care: True
        #- instance_maps:
        #    ext: png
        #    num_channels: 1
        #    interpolator: NEAREST
        #    normalize: False
        #    use_dont_care: True
    
    input_image:
        - images

    input_labels:
        - seg_maps
        #- instance_maps

i.e. the instance maps are commented out, and instead of 'train_A' I use seg_maps as the source image directory to be translated into the 'images' directory (train_B in the previous implementation). Is this the right way to do it here?

The output images have 4 columns. What is the meaning of these columns? Column 1 seems to be the input image and column 2 a label map. Columns 3 and 4 look almost identical to column 1 (probably the synthesized images), so it seems something went wrong with my settings, as no translation has been performed.

Could SPADE be used in an identical way, or can the SPADE generator be used as a replacement for pix2pixHD?

RuntimeError: expected scalar type Half but found Float

I ran install.sh and then tried running the test script, but this is the error I am getting. I have been struggling to get this library working for more than three weeks now and have fixed multiple issues through long Stack Overflow and GitHub searches. Any assistance would be very helpful.

try-i.log

UnboundLocalError: local variable 'data' referenced before assignment

Hi, I run python train.py --single_gpu --config configs/projects/fs_vid2vid/faceForensics/ampO1.yaml on the reference data in projects/fs_vid2vid/test_data/faceForensics/driving.
The train and val datasets were converted into LMDB format with python scripts/build_lmdb.py --config configs/projects/fs_vid2vid/faceForensics/ampO1.yaml --data_root /projects/fs_vid2vid/test_data/faceForensics/driving/ --output_root datasets/faceForensics/lmdb/[train|val] --paired --overwrite.
During training, the error is:

Traceback (most recent call last):
  File "train.py", line 104, in <module>
    main()
  File "train.py", line 98, in main
    trainer.end_of_epoch(data, current_epoch, current_iteration)
UnboundLocalError: local variable 'data' referenced before assignment

I found that the cause is that the for loop "for it, data in enumerate(train_data_loader):" (train.py line 72) is never executed.
How can I solve this problem?
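One thing worth checking (a hedged guess: an empty or mis-built LMDB gives a zero-length dataset, so the loop never runs and end_of_epoch then sees an undefined data variable) is whether the LMDB actually contains entries, e.g. with the lmdb Python package:

    import lmdb

    # Adjust the path to your output_root; 'images' is assumed to be one of the
    # data-type folders written by build_lmdb.py in this setup.
    path = 'datasets/faceForensics/lmdb/train/images'
    env = lmdb.open(path, readonly=True, lock=False)
    print('LMDB entries:', env.stat()['entries'])
    env.close()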

Building LMDB with paired option

I am trying to train a pix2pixHD model for edge-to-image translation with my own dataset. I prepared my dataset as follows:

my_dataset
    └── train
        ├── images
        │   ├── 0.jpg
        │   ├── 1.jpg
        │   ├── 2.jpg
        │
        └── seg_maps
            ├── 0.jpg
            ├── 1.jpg
            ├── 2.jpg

I also have val set which I omitted above.

I modified the config file as mentioned in #10. However, when I started training, I got the following error.

b'images/.'
Traceback (most recent call last):
  File "train.py", line 90, in <module>
    main()
  File "train.py", line 69, in main
    for it, data in enumerate(train_data_loader):
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 403, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rapsodo/.conda/envs/transformer_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/datasets/paired_videos.py", line 302, in __getitem__
    return self._getitem(index, concat=True)
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/datasets/paired_videos.py", line 240, in _getitem
    data = self.load_from_dataset(keys, lmdbs)
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/utils/data.py", line 408, in load_from_lmdb
    key.encode(), data_type))
  File "/home/rapsodo/Desktop/587/imaginaire/imaginaire/datasets/lmdb.py", line 72, in getitem_by_path
    if img.ndim == 3 and img.shape[-1] == 3:
UnboundLocalError: local variable 'img' referenced before assignment

When I debugged the code, I realized that the LMDB dataset should be built with paired=True. Am I right?

If so, I think the build_lmdb.py command in https://github.com/NVlabs/imaginaire/blob/master/projects/pix2pixhd/README.md should be updated.

Thank you.

Different results in content encoding in munit with amp O1 or amp O0

Hi there,

First of all thank you for this amazing library.
It really helps people like me to bootstrap in the amazing world of generative networks!

That said, I have noticed strange behaviour when running training on munit/afhq_dog2cat:

  • when you run with the amp O1 optimization level, the content reconstruction error diverges (it stays above 3.5)
  • however, running exactly the same settings with amp O0, the same content reconstruction error converges to values as small as 0.6 (see the attached content_recon.png)

I wonder if this is the expected behaviour.

Additional information:

  • the same behaviour happens when using torch.cuda.amp instead of apex
  • the same behaviour happens when using my own dataset, which is not about dogs and cats ;)
  • the style encoding reconstruction error does NOT seem to suffer from the same issue (see the attached style_recon.png)
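Not an imaginaire-specific fix, just a generic sketch of how a sensitive term can be kept in fp32 under torch.cuda.amp, which may help isolate whether the divergence is a half-precision artifact (c_recon and c_orig are hypothetical content-code tensors):

    import torch
    import torch.nn.functional as F

    def content_recon_loss_fp32(c_recon, c_orig):
        # Compute the content reconstruction term with autocast disabled, so
        # the L1 difference between content codes is taken in full precision.
        with torch.cuda.amp.autocast(enabled=False):
            return F.l1_loss(c_recon.float(), c_orig.float())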

Configuration:

  • Ubuntu 18.04
  • 2 V100 GPUs
  • NVIDIA driver 450.66
  • PyTorch 1.6
  • CUDA 10.2.89
  • cuDNN 8.04.30

Looking forward to reading from you,

Pierre

Perceptual Loss used for MUNIT not using Instance Normalization for content comparison only

Hi there,

If I understood the original MUNIT article/implementation correctly, the VGG loss applied an additional instance normalization to "remove" style from the input image.
In the imaginaire implementation, the PerceptualLoss referenced in the MUNIT trainer seems to be the original perceptual loss, without any instance normalization.
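For comparison, a rough sketch (an illustration only, not the imaginaire PerceptualLoss) of the MUNIT-style "domain-invariant" perceptual loss, where VGG features are instance-normalized before being compared:

    import torch
    import torch.nn.functional as F
    import torchvision

    class InstanceNormVGGLoss(torch.nn.Module):
        # VGG16 features up to relu4_3, instance-normalized to suppress style
        # before the L1 comparison, as described in the original MUNIT paper.
        def __init__(self):
            super().__init__()
            vgg = torchvision.models.vgg16(pretrained=True).features[:23]
            for p in vgg.parameters():
                p.requires_grad = False
            self.vgg = vgg.eval()

        def forward(self, x, y):
            fx = F.instance_norm(self.vgg(x))
            fy = F.instance_norm(self.vgg(y))
            return F.l1_loss(fx, fy)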

Am I missing something?
If this is correct, does it still make sense to use the VGG loss in addition to the others for MUNIT?

Thank you in advance,

Pierre

Ask a question about YoutubeDance datasets file paths in fs-vid2vid

Hello, I want to implement the pose synthesis in fs-vid2vid. I downloaded the set of YouTube dancing videos that you provided, converted them to OpenPose and DensePose formats, and finally generated the LMDB files. The file layout is shown below:

pose
└───lmdb
    └───train
        └───human_instance_maps
               └───data.mdb
               └───lock.mdb
        └───images
               └───data.mdb
               └───lock.mdb
        └───poses-openpose
               └───data.mdb
               └───lock.mdb
        └───pose_maps-densepose
               └───data.mdb
               └───lock.mdb
        └───all_filenames.json
        └───metadata.json
    └───val
        └───human_instance_maps
               └───data.mdb
               └───lock.mdb
        └───images
                     ...(similar to train file path)
└───raw
    └───train
        └───human_instance_maps
               └───000000
                     └───frame000329_INDS.png
                     └───frame000330_INDS.png
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......
        └───images
               └───000000
                     └───frame000329.jpg
                     └───frame000330.jpg
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......
        └───poses-openpose
               └───000000
                     └───frame000329_keypoints.json
                     └───frame000330_keypoints.json
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......
        └───pose_maps-densepose
               └───000000
                     └───frame000329_IUV.png
                     └───frame000330_IUV.png
                                       ...
               └───000001
                            .......
               └───000002
                            .......
                        .......

Now I train this dataset with python -m torch.distributed.launch --nproc_per_node=4 train.py --config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml. The roots part of ampO1.yaml is shown below:

train:
    roots:
        - ./datasets/pose/lmdb/train/
    batch_size: 6
    initial_sequence_length: 4
    max_sequence_length: 16
    augmentations:
        resize_smallest_side: 540
        horizontal_flip: False

val:
    roots:
        - ./datasets/pose/lmdb/val/
    batch_size: 1
    augmentations:
        resize_smallest_side: 540
        horizontal_flip: False

However, I get the error

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    main()
  File "train.py", line 93, in main
    trainer.end_of_epoch(data, current_epoch, current_iteration)
UnboundLocalError: local variable 'data' referenced before assignment

I found that the cause is that the for loop "for it, data in enumerate(train_data_loader):" (train.py line 77) is never executed.
When I debug the code, the train_dataset object (dataset.py line 74) looks like the attached screenshot.

Is there a problem with my datasets path, or is there something I need to improve?
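A hedged check based on the raw/ layout above: count the frames per sequence and compare against initial_sequence_length (4) and max_sequence_length (16) from the config, since sequences that are too short can leave train_data_loader empty and trigger exactly this UnboundLocalError.

    import os

    root = './datasets/pose/raw/train/images'
    for seq in sorted(os.listdir(root)):
        num_frames = len(os.listdir(os.path.join(root, seq)))
        flag = '<-- shorter than max_sequence_length' if num_frames < 16 else ''
        print(seq, num_frames, flag)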

mode collapse

Hi, when I train with coco_funit, the results are normal in the first few epochs, but mode collapse appears from the 59th epoch. Is this normal? Did it also appear during your training?
epoch_00063_iteration_000094000

COCO-Stuff edge maps

Thank you very much for the awesome library!

I want to know how to get the edge maps of COCO-Stuff for training SPADE. Currently, I only have COCO training images, COCO validation images, and seg_maps from stuffthingmaps_trainval2017.
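While waiting for an official answer: the original pix2pixHD code derives edge maps from label/instance maps by marking pixels whose label differs from a neighbor. The exact preprocessing used for the released SPADE models is an assumption here, but a rough NumPy sketch of that idea looks like this:

    import numpy as np

    def seg_to_edge(seg):
        # seg is an HxW integer label map; the result is a binary map that is
        # 1 wherever a pixel's label differs from its right or bottom neighbor.
        edge = np.zeros(seg.shape, dtype=np.uint8)
        horiz = seg[:, 1:] != seg[:, :-1]
        vert = seg[1:, :] != seg[:-1, :]
        edge[:, 1:] |= horiz
        edge[:, :-1] |= horiz
        edge[1:, :] |= vert
        edge[:-1, :] |= vert
        return edge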

fs_vid2vid pose model - youtube playlist 3

The third YouTube playlist link has more than 90% of its videos with multiple people in a single frame. Will that affect the model's performance, or does it help the model generalize?

Just wondering how the model learns from so much noise versus important data.

[wc_vid2vid] Style in the result suddenly changes

Thanks for sharing the code of this amazing work. I am currently using it to train a model on our own dataset, but I am facing some problems.

I simply created a small training set of around 50 samples and test on the training set, just to learn how to run the code and make it overfit. Training starts from the provided pre-trained checkpoint. The generated video is good overall, except for the first two frames.

stuttgart_00_000000_000000_leftImg8bit
stuttgart_00_000000_000001_leftImg8bit
stuttgart_00_000000_000002_leftImg8bit
stuttgart_00_000000_000003_leftImg8bit

These are the first 4 frames of a generated video. You can see that the 3rd and 4th frames fit the ground truth very well, but the 1st and 2nd frames look like they were generated by the original checkpoint (model).

Here are my questions:
(1) Does anyone have an idea why this happens? Actually, I modified the code a little, but I am not sure whether that is the reason. What I modified is line 61 of
https://github.com/NVlabs/imaginaire/blob/master/imaginaire/generators/wc_vid2vid.py, because Python originally raised an error that self.gen_cfg.single_image_model does not have the attribute checkpoint. From the config file https://github.com/NVlabs/imaginaire/blob/master/configs/projects/wc_vid2vid/cityscapes/seg_ampO1.yaml, we can see that single_image_model does not have an attribute called checkpoint either. Thus I simply set load_weights = False.

(2) Also, it seems that the final output does not use (copy) the original colors from the guidance image at all, as if the network generates the color of every pixel itself. Am I wrong?

(3) Are depth images required for training/testing? My dataset does not have depth images, but there is no error when running the code.

Looking forward to your reply. Thank you very much, and happy new year.

Seek for the test dataset for COCO-FUNIT

In the COCO-FUNIT paper, the model was tested on four datasets: Carnivores, Mammals, Birds, and Motorbikes. However, in this repository, only the animal faces dataset (Carnivores) is available. Could you share the other three test datasets (Mammals, Birds, and Motorbikes)? It would help us a lot. Thanks.

Best regards.

Questions about the training data

Thanks for your such a great series of works on image and video synthesis.

I'm very interested in the work "World-Consistent Video-to-Video Synthesis", which efficiently solves long-term visual consistency in video-to-video synthesis.

I hope to re-train the models from this work and become more familiar with this area. However, I don't know how to handle the data preparation, which consists of many steps (edge maps, depth maps, segmentation labels) that rely on many tools/SOTA methods. Could you please give more details of these data preparation steps? Personally, I think it would help people enter this task and contribute to the growth of this community.

Thanks.

Inference for fewshotvid2vid?

I wanted to try inference on my own inputs, so I followed the instructions in the README and modified the faceForensics config with the correct paths, but when I run it I get the following error:

Epoch length: 0
Traceback (most recent call last):
  File "inference.py", line 91, in <module>
    main()
  File "inference.py", line 87, in main
    trainer.test(test_data_loader, args.output_dir, cfg.inference_args)
  File "/content/drive/My Drive/imaginaire/imaginaire/trainers/fs_vid2vid.py", line 148, in test
    test_data_loader.dataset.set_inference_sequence_idx(
  File "/content/drive/My Drive/imaginaire/imaginaire/datasets/paired_few_shot_videos.py", line 46, in set_inference_sequence_idx
    assert index < len(self.mapping)
AssertionError

Any idea why?

Windows : Resample2d_CUDA DLL Load failed

Hello,

My config :
conda install python=3.6
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
Windows 10
RTX 2080 TI

I have successfully installed all dependencies, but when I test inference (vid2vid), I get this error:

    import resample2d
  File "D:\imaginaire\venv\lib\site-packages\resample2d_cuda-0.0.0-py3.6-win-amd64.egg\resample2d.py", line 4, in <module>
    import resample2d_cuda
ImportError: DLL load failed: The specified module could not be found.

I have successfully installed the third-party extensions with:
python setup.py install
creating d:\imaginaire\venv\lib\site-packages\resample2d_cuda-0.0.0-py3.6-win-amd64.egg
Extracting resample2d_cuda-0.0.0-py3.6-win-amd64.egg to d:\imaginaire\venv\lib\site-packages
resample2d-cuda 0.0.0 is already the active version in easy-install.pth

Installed d:\imaginaire\venv\lib\site-packages\resample2d_cuda-0.0.0-py3.6-win-amd64.egg
Processing dependencies for resample2d-cuda==0.0.0
Finished processing dependencies for resample2d-cuda==0.0.0

Thanks for your feedback on Windows install.

Thanks.

RuntimeError for linux when using test_train

Hi,
I get the following runtime error. My system is Ubuntu 18.04 and I use Anaconda 3. I installed PyTorch with CUDA toolkit 10.2 using conda install.

RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

Does this runtime error come from cuDNN? If I sign up for an account and download it from the NVIDIA website, which folder do I need to put it in? Should it go into my Anaconda env for this project or into the main /usr/local/lib?
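In case it helps, a hedged diagnosis (assuming the error means apex was installed without its compiled extensions): FusedAdam relies on the amp_C module that only exists when apex is built with the --cpp_ext and --cuda_ext options, which you can check like this:

    try:
        import amp_C  # compiled extension that apex.optimizers.FusedAdam needs
        print('apex CUDA extensions are present')
    except ImportError as err:
        print('apex CUDA extensions missing; reinstall apex with '
              '--global-option="--cpp_ext" --global-option="--cuda_ext":', err)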

Many thanks!

Kindly Regards,

Jiali Li

Incomplete animal face test dataset

I followed the README in COCO-FUNIT by running "python scripts/download_test_data.py --model_name coco_funit" to get the test dataset, but it only contains 6 images instead of 30 categories of images.

UNIT implementation

First of all, thank you for this great library.

I was checking the UNIT implementation and saw that it differs from the original paper in at least two ways:

  • A spatial AE is used instead of a VAE (an L1 penalty on the latent instead of a KL loss); see the sketch after this list.
  • There is no weight sharing in the E and G layers.
  • Cycle reconstruction is probably implemented differently, which is probably also the reason weight sharing is not needed (an assumption, as I haven't checked the original UNIT implementation).
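For reference, a toy sketch (not the imaginaire or original UNIT code) of the two latent regularizers contrasted in the first point: the original UNIT's KL term against a unit-variance Gaussian prior, which reduces to a squared penalty on the encoded mean, versus a plain L1 penalty on the latent activations.

    import torch

    def kl_latent_penalty(mu):
        # Original UNIT: the encoder output is treated as the mean of a
        # unit-variance Gaussian, and KL(N(mu, I) || N(0, I)) = 0.5 * ||mu||^2
        # (averaged here instead of summed, which only changes the scale).
        return 0.5 * torch.mean(mu ** 2)

    def l1_latent_penalty(z):
        # Spatial-AE variant described above: penalize the latent code directly.
        return torch.mean(torch.abs(z))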

Can you give us a bit more of the reasoning, and describe any other major changes that might be there which I didn't notice?

Thanks!

Google Colab Installation Problem

I'm trying to install Imaginaire on Google Colab (full installation log attached). Here's the error that I get on running the test_training.sh script:

100% 1/1 [00:00<00:00, 1736.77it/s]
100% 1/1 [00:00<00:00, 400.91it/s]
100% 1/1 [00:00<00:00, 1413.65it/s]
100% 1/1 [00:00<00:00, 1112.25it/s]
100% 1/1 [00:00<00:00, 1455.34it/s]
100% 1/1 [00:00<00:00, 945.30it/s]
python scripts/build_lmdb.py --config configs/unit_test/spade.yaml --paired --data_root dataset/unit_test/raw/spade/ --output_root dataset/unit_test/lmdb/spade --overwrite >> /tmp/unit_test.log [Success]
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    from imaginaire.utils.gpu_affinity import set_affinity
  File "/content/gdrive/My Drive/audiovisual-compression/imaginaire/imaginaire/utils/gpu_affinity.py", line 9, in <module>
    pynvml.nvmlInit()
  File "/usr/local/lib/python3.6/dist-packages/pynvml/nvml.py", line 749, in nvmlInit
    check_return(ret)
  File "/usr/local/lib/python3.6/dist-packages/pynvml/nvml.py", line 366, in check_return
    raise NVMLError(ret)
pynvml.nvml.NVMLError_DriverNotLoaded: Driver Not Loaded
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--config', 'configs/unit_test/spade.yaml']' returned non-zero exit status 1.
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/spade.yaml >> /tmp/unit_test.log [Failure]
installation log.txt
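A quick hedged check (assuming NVMLError_DriverNotLoaded means no NVIDIA driver or GPU is visible to the session, e.g. a Colab runtime started without a GPU):

    import torch
    print('torch sees CUDA:', torch.cuda.is_available())

    import pynvml
    pynvml.nvmlInit()  # raises an NVMLError if the driver is not loaded
    print('GPUs visible to NVML:', pynvml.nvmlDeviceGetCount())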

Dataset download

Thanks for your work on image enhancement! I have tried to download the datasets used in the paper, but halfway through I always run into network issues. Could you provide Baidu Yun links or other mirrors? Thanks again!

How to build docker image on centos 7?

Hello!

I cannot build from source because I don't have sudo permissions so I can't install many of the dependencies.
My first question is: do you still need the prerequisites (Anaconda3, CUDA 10.2, cuDNN) if you want to build and run in Docker?

Secondly, since the Dockerfile uses apt-get and CentOS doesn't use apt, does anyone have recommendations for how I can build? Not all of these packages are available via yum, and without sudo, building these dependencies from source is a huge pain.

The build fails because my machine doesn't have access to apt
/bin/sh: apt-get: not found

fs-vid2vid YouTube dancing pretrained weights

Hi, it is my understanding that both FaceForensics and YouTube dancing share the same pretrained weights. However when I run

python inference.py --single_gpu --num_workers 0 \
--config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml \
--output_dir projects/fs_vid2vid/output/YouTubeDancing \
--checkpoint epoch_00200_iteration_000005800_checkpoint.pt

I get a bunch of key mismatches like Missing key(s) in state_dict: "module.weight_generator.ref_img_first.layers.conv.weight_orig" and Unexpected key(s) in state_dict: "module.num_updates_tracked" to name a few.

SPADE required GPU memory

I get a RuntimeError: CUDA out of memory with the SPADE model with a batch size of 1 and image size of 256 x 256.

I know that a single RTX 2080ti might be less than the required hardware, but is it not possible to train the model with the --single_gpu argument? I get the error AssertionError: Default process group is not initialized.

Is there a recommended way to reduce the memory requirements? I was able to train a SPADE model on the same hardware with the previous implementation (before imaginaire).

vid2vid got bad results

Hi,

I am trying to use vid2vid to test my own data. The semantic segmentation maps come from DeepLabV3+ (checkpoint: xception65_cityscapes_trainfine), and the results are attached.

I want to know whether the labels expected by the vid2vid segmentation maps are consistent with the labels produced by DeepLabV3+. Can you also give some suggestions about the results I got?

Many thanks! :)

How to keep the content unchanged and only change the style?

Hi, thanks for your great work!
I have a question: how can I keep the content unchanged and only change the style? In the paper, you show summer ↔ winter results where the content is unchanged and only the style changes.
But when I train with other datasets, I can not keep the content unchanged.
For example (see the attached image):

The left is the input and the right is generated by the code, but the content changes from left to right.

So how can I keep the content unchanged?

Thank you!

Is local enhancer supported in imaginaire's implementation of pix2pixHD?

Hello, I've noticed in the pix2pixHD config for Cityscapes, based on the following lines:

local_enhancer:
    num_enhancers: 0
    num_res_blocks: 3

that you only use the global generator instead of the global generator + local enhancer. Is this intended behavior, or is it specific to the Cityscapes config? Does the current imaginaire implementation of pix2pixHD support training the global generator and local enhancer together, as in the original implementation? If so, could you recommend what a config should look like in that case?

HELP! expected scalar type Half but found Float

Hi,

I ran install.sh and then tried running the test script, but this is the error I am getting. I have been struggling to get this library working for more than three weeks now and have fixed multiple issues through long Stack Overflow and GitHub searches. Any assistance would be very helpful.

try-i.log

RuntimeError: expected scalar type Half but found Float

How to run fs_vid2vid?

Sorry, maybe it is obvious, but can you give some explanation of how to run it?
As I understand it, if I have a video with a face, I should convert it to face keypoints via dlib. But what is the next step?
Can you please write some commands showing how to run this network when I want to give my own video as input?
Thanks!

Error when running on Win10

I set up the environment and tried to run pix2pixHD inference, but with this command line:

python inference.py --single_gpu --config "configs/projects/pix2pixhd/cityscapes/ampO1.yaml" --output_dir "projects/pix2pixhd/output/cityscapes" --checkpoint "../models/cityscapes_1k.pt"

I got the following runtime error:

Traceback (most recent call last):
  File "inference.py", line 91, in <module>
    main()
  File "inference.py", line 39, in main
    set_affinity(args.local_rank)
  File "D:\Sources\Download\imaginaire-master\imaginaire-master\imaginaire\utils\gpu_affinity.py", line 57, in set_affinity
    os.sched_setaffinity(0, dev.getCpuAffinity())
AttributeError: module 'os' has no attribute 'sched_setaffinity'

Is there any way to get around this?
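One possible workaround (a sketch against the call shown in the traceback, not an official patch): os.sched_setaffinity only exists on Linux, so the call in imaginaire/utils/gpu_affinity.py can be guarded and simply skipped on Windows.

    import os

    def set_affinity_portable(dev):
        # dev stands in for the device object whose getCpuAffinity() call
        # appears in the traceback above; on Windows, where sched_setaffinity
        # does not exist, CPU affinity pinning is simply skipped.
        if hasattr(os, 'sched_setaffinity'):
            os.sched_setaffinity(0, dev.getCpuAffinity())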

wc_vid2vid inferring

Hi, I downloaded the trained weights of the wc_vid2vid model, 7.39 GB!
Inference failed due to CUDA out of memory. My GPU is a single 2080 Ti.
Is it impossible to run even just inference on such a device?

Questions about length of pytorch training dataset in fs_vid2vid.

I want to run fs_vid2vid. I prepared 5 videos for training and 1 video for validation, extracted their frames, obtained the corresponding landmarks saved in JSON files, and preprocessed everything into LMDB format.

Question 1
However, when I run the fs_vid2vid program on my tiny dataset, I find that the length of the training dataset is 2 and the length of the corresponding training dataloader is 0, which throws an error.

Could someone help me solve this problem?

Question 2
Here is the length information of my dataset (see the attached screenshot). What is the detailed meaning of Num datasets, Num sequences, Max sequence length and Epoch length?
