genforce's Introduction

GenForce Lib for Generative Modeling

An efficient PyTorch library for deep generative modeling. May the Generative Force (GenForce) be with You.

Updates

  • Encoder Training: We support training encoders on top of pre-trained GANs for GAN inversion.
  • Model Converters: You can easily migrate projects you have already started to this repository. Please check here for more details.

Highlights

  • Distributed training framework.
  • Fast training speed.
  • Modular design for prototyping new models.
  • Model zoo containing a rich set of pretrained GAN models, with a live Colab demo to play with.

Installation

  1. Create a virtual environment via conda.

    conda create -n genforce python=3.7
    conda activate genforce
  2. Install cuda and cudnn. (We use CUDA 10.0 in case you would like to use TensorFlow 1.15 for model conversion.)

    conda install cudatoolkit=10.0 cudnn=7.6.5
  3. Install torch and torchvision.

    pip install torch==1.7 torchvision==0.8
  4. Install the requirements.

    pip install -r requirements.txt

Quick Demo

We provide a quick training demo, scripts/stylegan_training_demo.py, which trains StyleGAN on a toy dataset (500 anime face images at 64 x 64 resolution). Try it with

./scripts/stylegan_training_demo.sh

We also provide an inference demo, synthesize.py, which synthesizes images with pre-trained models. Generated images are saved to work_dirs/synthesis_results/. Try it with

python synthesize.py stylegan_ffhq1024

You can also try the demo on Colab.

Play with GANs

Test

Pre-trained models can be found in the model zoo.

  • On local machine:

    GPUS=8
    CONFIG=configs/stylegan_ffhq256_val.py
    WORK_DIR=work_dirs/stylegan_ffhq256_val
    CHECKPOINT=checkpoints/stylegan_ffhq256.pth
    ./scripts/dist_test.sh ${GPUS} ${CONFIG} ${WORK_DIR} ${CHECKPOINT}
  • Using slurm:

    CONFIG=configs/stylegan_ffhq256_val.py
    WORK_DIR=work_dirs/stylegan_ffhq256_val
    CHECKPOINT=checkpoints/stylegan_ffhq256.pth
    GPUS=8 ./scripts/slurm_test.sh ${PARTITION} ${JOB_NAME} \
        ${CONFIG} ${WORK_DIR} ${CHECKPOINT}

Train

All files produced during training, such as log messages, checkpoints, and synthesis snapshots, are saved to the work directory.

  • On local machine:

    GPUS=8
    CONFIG=configs/stylegan_ffhq256.py
    WORK_DIR=work_dirs/stylegan_ffhq256_train
    ./scripts/dist_train.sh ${GPUS} ${CONFIG} ${WORK_DIR} \
        [--options additional_arguments]
  • Using slurm:

    CONFIG=configs/stylegan_ffhq256.py
    WORK_DIR=work_dirs/stylegan_ffhq256_train
    GPUS=8 ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} \
        ${CONFIG} ${WORK_DIR} \
        [--options additional_arguments]

Play with Encoders for GAN Inversion

Train

  • On local machine:

    GPUS=8
    CONFIG=configs/stylegan_ffhq256_encoder_y.py
    WORK_DIR=work_dirs/stylegan_ffhq256_encoder_y
    ./scripts/dist_train.sh ${GPUS} ${CONFIG} ${WORK_DIR} \
        [--options additional_arguments]
  • Using slurm:

    CONFIG=configs/stylegan_ffhq256_encoder_y.py
    WORK_DIR=work_dirs/stylegan_ffhq256_encoder_y
    GPUS=8 ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} \
        ${CONFIG} ${WORK_DIR} \
        [--options additional_arguments]

Contributors

Member         Module
Yujun Shen     models and running controllers
Yinghao Xu     runner and loss functions
Ceyuan Yang    data loader
Jiapeng Zhu    evaluation metrics
Bolei Zhou     cheerleader

NOTE: The table above only lists the person in charge of each module. We help each other a lot and develop as a TEAM.

We welcome external contributors to join us in improving this library.

License

The project is under the MIT License.

Acknowledgement

We thank PGGAN, StyleGAN, StyleGAN2, StyleGAN2-ADA for their work on high-quality image synthesis. We thank IDInvert and GHFeat for their contribution to GAN inversion. We also thank MMCV for the inspiration on the design of controllers.

BibTex

We open-source this library to facilitate research on generative modeling. If you like our work and use the codebase or models in your research, please cite it as follows.

@misc{genforce2020,
  title =        {GenForce},
  author =       {Shen, Yujun and Xu, Yinghao and Yang, Ceyuan and Zhu, Jiapeng and Zhou, Bolei},
  howpublished = {\url{https://github.com/genforce/genforce}},
  year =         {2020}
}

genforce's People

Contributors

justimyhxu, limbo0000, njuaplusplus, shenyujun, zhoubolei, zhujiapeng

genforce's Issues

about activation_scale

Hi, I have a question regarding the activation_scale.
Why does LeakyReLU have an activation_scale of sqrt(2)?

Thanks in advance.

if activation_type == 'linear':
    self.activate = nn.Identity()
    self.activate_scale = 1.0
elif activation_type == 'lrelu':
    self.activate = nn.LeakyReLU(negative_slope=0.2, inplace=True)
    self.activate_scale = np.sqrt(2.0)
else:
    raise NotImplementedError(f'Not implemented activation function: '
                              f'`{activation_type}`!')
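The usual explanation, which the maintainers have not confirmed in this thread, is that sqrt(2) is the He-initialization gain. For a zero-mean, unit-variance Gaussian input, ReLU keeps only half of the second moment, so multiplying by sqrt(2) brings it back to about 1. The original StyleGAN code also uses a sqrt(2) gain for leaky ReLU, even though the exact variance-preserving gain for slope 0.2 is sqrt(2 / (1 + 0.2^2)), about 1.387. The NumPy sketch below checks these numbers by Monte Carlo. It is an illustration, not genforce code.

```python
import numpy as np


def leaky_relu(x, negative_slope=0.2):
    """NumPy leaky ReLU, equivalent to nn.LeakyReLU(negative_slope=0.2)."""
    return np.where(x >= 0, x, negative_slope * x)


rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # unit-variance Gaussian pre-activations

# Second moment E[f(x)^2] after the activation, without and with the sqrt(2) scale.
raw_moment = np.mean(leaky_relu(x) ** 2)                      # about (1 + 0.2**2) / 2 = 0.52
scaled_moment = np.mean((np.sqrt(2.0) * leaky_relu(x)) ** 2)  # about 1.04

# Gain that would preserve the second moment exactly for slope 0.2.
exact_gain = np.sqrt(2.0 / (1.0 + 0.2 ** 2))                  # about 1.387

print(f"{raw_moment:.3f} {scaled_moment:.3f} {exact_gain:.3f}")
```

Keeping the second moment near 1 stops activations from shrinking by roughly half at every layer of a deep stack. With slope 0.2, sqrt(2) overshoots by only about 4%.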

Memory Error

Hi,

I noticed that genforce caches all loaded images in memory, which uses more than 60 GB and causes a memory error.
I am using a dataset of 9M images, so there is no need to cache all of them in CPU memory.
Could you suggest a workaround?

Thank you,
Y
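If the memory really comes from keeping images around, one possible workaround is to read each image from the zip archive on demand. The archive can also be opened lazily inside each worker process, so no file handle is shared across DataLoader workers. This is a sketch under that assumption, not genforce's own loader, and LazyZipImageDataset is a hypothetical class. It returns raw encoded bytes, so decoding (for example with cv2 or PIL) and transforms still have to be added. It implements __len__ and __getitem__, which is what torch.utils.data.DataLoader needs from a map-style dataset.

```python
import os
import zipfile


class LazyZipImageDataset:
    """Hypothetical map-style dataset that reads one archive member per __getitem__.

    Only the member names are kept in memory; image bytes are read on demand,
    so memory use does not grow with the dataset beyond the list of names.
    """

    IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp', '.webp')

    def __init__(self, zip_path):
        self.zip_path = zip_path
        with zipfile.ZipFile(zip_path, 'r') as zf:
            self.names = sorted(
                name for name in zf.namelist()
                if name.lower().endswith(self.IMAGE_EXTENSIONS))
        self._zip = None  # opened lazily, once per process
        self._pid = None

    def _archive(self):
        # Re-open after a fork so each DataLoader worker gets its own handle.
        if self._zip is None or self._pid != os.getpid():
            self._zip = zipfile.ZipFile(self.zip_path, 'r')
            self._pid = os.getpid()
        return self._zip

    def __getstate__(self):
        # Drop the open handle when the dataset is pickled for worker processes.
        state = self.__dict__.copy()
        state['_zip'] = None
        return state

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        # Raw encoded bytes; decode and transform them in the caller.
        return self._archive().read(self.names[index])
```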

The encoder

Hi, thanks for your work.
Is there a pretrained encoder available for GAN inversion?

Summaries in tensorboard (Image/Audio)

What would be the easiest way to add non-scalar summaries to tensorboard from within genforce? I'm happy to put up a PR if you can give me a high level idea of where I could add it.

AttributeError: 'SynthesisModule' object has no attribute 'lod'

I get the following error when running StyleGAN2 on my dataset of around 200K images:

Traceback (most recent call last):
  File "./train.py", line 117, in <module>
    main()
  File "./train.py", line 113, in main
    runner.train()
  File "/home/libs/genforce/runners/base_runner.py", line 287, in train
    self.train_step(data_batch, **train_kwargs)
  File "/home/libs/genforce/runners/stylegan_runner.py", line 34, in train_step
    G.synthesis.lod.data.fill_(self.lod)
  File "/home/miniconda3/envs/genforce/lib/python3.7/site-packages/torch/nn/modules/module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'SynthesisModule' object has no attribute 'lod'

I have the following config:

# python3.7
"""Configuration for training StyleGAN on FF-HQ (1024) dataset.

All settings are particularly used for one replica (GPU), such as `batch_size`
and `num_workers`.
"""

runner_type = 'StyleGANRunner'
gan_type = 'stylegan2'
resolution = 512
batch_size = 4
val_batch_size = 16
total_img = 25000_000

# Training dataset is repeated at the beginning to avoid loading dataset
# repeatedly at the end of each epoch. This can save some I/O time.
DATA = '/media/ilan/DeepFakes/synthetic-experiment/data_backup'
data = dict(
    num_workers=4,
    repeat=500,
    train=dict(root_dir=DATA, resolution=resolution, mirror=0.5),
    val=dict(root_dir=DATA, resolution=resolution),
    #train=dict(root_dir='data/ffhq.zip', data_format='zip',
    #           resolution=resolution, mirror=0.5),
    #val=dict(root_dir='data/ffhq.zip', data_format='zip',
    #         resolution=resolution),
)

controllers = dict(
    RunningLogger=dict(every_n_iters=10),
    ProgressScheduler=dict(
        every_n_iters=1, init_res=8, minibatch_repeats=4,
        lod_training_img=600_000, lod_transition_img=600_000,
        batch_size_schedule=dict(res4=64, res8=32, res16=16, res32=8),
    ),
    Snapshoter=dict(every_n_iters=500, first_iter=True, num=200),
    FIDEvaluator=dict(every_n_iters=5000, first_iter=True, num=50000),
    Checkpointer=dict(every_n_iters=5000, first_iter=True),
)

modules = dict(
    discriminator=dict(
        model=dict(gan_type=gan_type, resolution=resolution),
        lr=dict(lr_type='FIXED'),
        opt=dict(opt_type='Adam', base_lr=1e-3, betas=(0.0, 0.99)),
        kwargs_train=dict(),
        kwargs_val=dict(),
    ),
    generator=dict(
        model=dict(gan_type=gan_type, resolution=resolution),
        lr=dict(lr_type='FIXED'),
        opt=dict(opt_type='Adam', base_lr=1e-3, betas=(0.0, 0.99)),
        kwargs_train=dict(w_moving_decay=0.995, style_mixing_prob=0.9,
                          trunc_psi=1.0, trunc_layers=0, randomize_noise=True),
        kwargs_val=dict(trunc_psi=1.0, trunc_layers=0, randomize_noise=False),
        g_smooth_img=10_000,
    )
)

loss = dict(
    type='LogisticGANLoss',
    d_loss_kwargs=dict(r1_gamma=10.0),
    g_loss_kwargs=dict(),
)

which I run with:

GPUS=4
CONFIG=stylegan2_synthetic_512.py
WORK_DIR=work_dirs/experiment
./scripts/dist_train.sh ${GPUS} ${CONFIG} ${WORK_DIR}

Training StyleGAN on Cityscapes

Hi, thanks for your awesome work.
I noticed you mentioned the Cityscapes dataset in the Model Zoo. How should I train StyleGAN on the 1024 x 512 images: resize them directly, or crop them to 512 x 512?
Thanks!

error in running train.py

I use torch 1.5.1, torchvision 0.6.0, and cudatoolkit 10.1, but this error still occurs.
Can you give me some suggestions?

[2020-10-16 16:19:08][INFO] Building models ...
[2020-10-16 16:19:09][INFO] Finish building models.
[2020-10-16 16:19:09][INFO] Building controllers ...
[2020-10-16 16:19:09][INFO] Progressive Schedule:
[2020-10-16 16:19:09][INFO] Resolution 8 (lod 3): batch size 32 * 1, learning rate scale 1.0
[2020-10-16 16:19:09][INFO] Resolution 16 (lod 2): batch size 16 * 1, learning rate scale 1.0
[2020-10-16 16:19:09][INFO] Resolution 32 (lod 1): batch size 8 * 1, learning rate scale 1.0
[2020-10-16 16:19:09][INFO] Resolution 64 (lod 0): batch size 4 * 1, learning rate scale 1.0
[2020-10-16 16:19:09][INFO] Finish building controllers.
[2020-10-16 16:19:09][INFO] Building train dataset ...
Traceback (most recent call last):
  File "./train.py", line 117, in <module>
    main()
  File "./train.py", line 113, in main
    runner.train()
  File "/home/cqr/genforce-master/runners/base_runner.py", line 269, in train
    self.build_dataset('train')
  File "/home/cqr/genforce-master/runners/base_runner.py", line 140, in build_dataset
    dataset = BaseDataset(**self.config.data[mode])
  File "/home/cqr/genforce-master/datasets/datasets.py", line 175, in __init__
    zip_file = ZipLoader.get_zipfile(self.root_dir)
  File "/home/cqr/genforce-master/datasets/datasets.py", line 38, in get_zipfile
    zip_files[file_path] = zipfile.ZipFile(file_path, 'r')
  File "/home/cqr/anaconda3/envs/genforce/lib/python3.7/zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'data/demo.zip'
Traceback (most recent call last):
  File "/home/cqr/anaconda3/envs/genforce/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cqr/anaconda3/envs/genforce/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cqr/anaconda3/envs/genforce/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/cqr/anaconda3/envs/genforce/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/cqr/anaconda3/envs/genforce/bin/python', '-u', './train.py', '--local_rank=0', 'configs/stylegan_demo.py', '--work_dir', 'work_dirs/stylegan_demo', '--launcher=pytorch']' returned non-zero exit status 1.

Sudden increase in memory usage during training

Thank you very much for your work. During the process of using your code, I encountered the following issues. I would greatly appreciate your assistance.
While training StyleGAN, I found that memory usage jumps sharply around the point where the dataset is rebuilt. In the log below, for example, memory use rises from 3.5G to 9.9G.

[2023-05-02 03:34:34][INFO] Iter  93750/587498, data time: 0.001s, iter time: 0.173s, run time:  5h47m, lr (discriminator): 1.000e-03, lr (generator): 1.000e-03, kimg:  1800.0, lod: 5.00, minibatch:   16, g_loss: 2.324, d_loss: 0.585, real_grad_penalty: 0.025, Gs_beta: 0.9989 (memory: 3.5G) (ETA: 23h43m)
[2023-05-02 03:34:34][INFO] Reset the optimizer state at iter 093753 (lod 4.999893).
[2023-05-02 03:34:34][INFO] Rebuild the dataset at iter 093753 (lod 4.999893).
[2023-05-02 03:34:46][INFO] Iter  93760/587498, data time: 0.852s, iter time: 1.231s, run time:  5h47m, lr (discriminator): 1.000e-03, lr (generator): 1.000e-03, kimg:  1800.1, lod: 5.00, minibatch:    8, g_loss: 2.243, d_loss: 0.537, real_grad_penalty: 0.030, Gs_beta: 0.9994 (memory: 9.9G) (ETA:  7d00h)

My issue looks somewhat related to #22. I tried changing pin_memory on the dataloader, but it did not help. How should I change my code to avoid this problem?

This is my config; all other parameters are the same as in stylegan_demo.py.

runner_type = 'StyleGANRunner'
gan_type = 'stylegan'
resolution = 512
batch_size = 64
val_batch_size = 16
total_img = 25000_000

Thanks a lot!

Convert class conditional models(stylegan2-ada pytorch) to genforce format?

I'm trying to convert StyleGAN2-ADA (PyTorch version) models into the genforce format. Unconditional models convert without any error, but for a conditional model I get AssertionError at assert tf_var_name in official_tf_to_pth_var_mapping. I believe this means that some variables in the model have no mapping. Are class-conditional models supported, or could something else be the problem? I'm using the CIFAR-10 model file for now, and I also tested a conditional face model that I trained myself. Both give the same error.

Randomize noise

Hello
Thank you for this implementation and all the interesting papers you have published!
In your StyleGAN (v1) implementation you use constant noise for every layer from layer 1 to the last layer, but not for layer 0.
In layer 0 the noise is random, so the corresponding NoiseApplyingLayer is trained with different noise vectors, while all other layers are trained with the same noise vector (a constant vector for each layer).
Why do you randomize only the first noise vector? Were your pretrained models trained like this?
Thank you!
cap

Training on my own data

When I train on my own data, layers 10 to 13 are missing from the StyleGAN checkpoints.
Is this issue related to my dataset?

Error when running stylegan training

In runners/stylegan_runner.py, line 34 runs G.synthesis.lod.data.fill_(self.lod). In my case self.lod is None, and running this line gives:

  File "/home/amir/.pycharm_helpers/pydev/pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<string>", line 1, in <module>
TypeError: fill_() received an invalid combination of arguments - got (NoneType), but expected one of:
 * (Tensor value)
      didn't match because some of the arguments have invalid types: (NoneType)
 * (Number value)
      didn't match because some of the arguments have invalid types: (NoneType)

I tried to figure out what lod is in G.synthesis but couldn't find it. I also tried to find where self.lod is modified during the training run. What exactly happens here? Do you have any idea what could be going wrong?
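Some background on lod: in progressive growing (PGGAN, StyleGAN v1), lod ("level of detail") counts how many resolution doublings the generator still is from its final resolution. lod = 0 means full resolution, and a fractional value means a new block is being faded in. The function below is a hypothetical sketch of the PGGAN-style schedule, not genforce's ProgressScheduler. Its defaults (init_res=8, and 600_000 images each for training and transition) follow the configs quoted in this thread. Under these assumptions it gives lod 5.0 at 1,800 kimg for a 512 model, which is consistent with the "kimg: 1800.0, lod: 5.00" entry in the memory-usage log above.

```python
import math


def pggan_style_lod(num_images, final_res=512, init_res=8,
                    training_img=600_000, transition_img=600_000):
    """Hypothetical PGGAN-style lod schedule (not genforce's implementation).

    Each phase first trains at a fixed resolution for `training_img` images,
    then fades in the next resolution over `transition_img` images.
    """
    max_lod = math.log2(final_res) - math.log2(init_res)  # 9 - 3 = 6 for 512 / 8
    phase_len = training_img + transition_img
    phase_idx = num_images // phase_len
    images_in_phase = num_images - phase_idx * phase_len
    lod = max_lod - phase_idx
    if images_in_phase > training_img:
        # Linear fade-in: lod drops by 1 over the transition.
        lod -= (images_in_phase - training_img) / transition_img
    return max(lod, 0.0)  # lod stays at 0 once the final resolution is reached


for kimg in (0, 1_800, 2_100, 9_000):
    print(kimg, pggan_style_lod(kimg * 1_000))
```

In this schedule lod only decreases as training progresses and is clamped to the range [0, max_lod]. A None or negative value therefore suggests that the scheduler never ran, or that its output was not clamped.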

stylegan2

Thanks for the great library! When can we expect a StyleGAN2 runner?

Resolution assert in progressive_resize_image

I am getting the following error in progressive_resize_image. I printed image.shape and the target size for each image (output below), and I have attached my config.json as well.

[2022-11-21 11:32:51][INFO] Building controllers ...
[2022-11-21 11:32:51][INFO] Progressive Schedule:
[2022-11-21 11:32:51][INFO] Resolution 8 (lod 5): batch size 32 * 1, learning rate scale 1.0
[2022-11-21 11:32:51][INFO] Resolution 16 (lod 4): batch size 16 * 1, learning rate scale 1.0
[2022-11-21 11:32:51][INFO] Resolution 32 (lod 3): batch size 8 * 1, learning rate scale 1.0
[2022-11-21 11:32:51][INFO] Resolution 64 (lod 2): batch size 4 * 1, learning rate scale 1.0
[2022-11-21 11:32:51][INFO] Resolution 128 (lod 1): batch size 4 * 1, learning rate scale 1.5
[2022-11-21 11:32:51][INFO] Resolution 256 (lod 0): batch size 4 * 1, learning rate scale 2.0
[2022-11-21 11:32:52][INFO] Finish building controllers.
[2022-11-21 11:32:52][INFO] Building train dataset ...
[2022-11-21 11:32:57][INFO] Finish building train dataset.
[2022-11-21 11:32:57][INFO] Building loss function ...
[2022-11-21 11:32:57][INFO] Finish building loss function.

[2022-11-21 11:32:57][INFO] Start training.
[2022-11-21 11:32:57][INFO] Reset the optimizer state at iter 000001 (lod 5.000000).
[2022-11-21 11:32:57][INFO] Rebuild the dataset at iter 000001 (lod 5.000000).
(457, 640, 3) 256
(333, 500, 3) 256
(1024, 1024, 3) 256
(610, 1024, 3) 256
(640, 427, 3) 256
(225, 338, 3) 256
(500, 333, 3) 256
(591, 392, 3) 256
(480, 640, 3) 256
(457, 640, 3) 8
(470, 640, 3) 8
(640, 640, 3) 8
(519, 500, 3) 8
(685, 860, 3) 8
(309, 500, 3) 8
(374, 500, 3) 8
(332, 500, 3) 8
Traceback (most recent call last):
  File "./train.py", line 122, in <module>
    main()
  File "./train.py", line 118, in main
    runner.train()
  File "/nfs/hpc/share/ullaham/SailON/phase3/GAN/genforce/runners/base_runner.py", line 294, in train
    data_batch = next(self.train_loader)
  File "/nfs/hpc/share/ullaham/SailON/phase3/GAN/genforce/datasets/dataloaders.py", line 91, in __next__
    data = next(self.iter_loader)
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/nfs/hpc/share/ullaham/SailON/phase3/GAN/genforce/datasets/datasets.py", line 227, in __getitem__
    image = progressive_resize_image(image, self.resolution)
  File "/nfs/hpc/share/ullaham/SailON/phase3/GAN/genforce/datasets/transforms.py", line 82, in progressive_resize_image
    assert height == width
AssertionError

Traceback (most recent call last):
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/nfs/stak/users/ullaham/hpc-share/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/nfs/stak/users/ullaham/hpc-share/anaconda3/bin/python', '-u', './train.py', '--local_rank=0', 'configs/stylegan_SVO.py', '--work_dir', 'work_dirs/stylegan_SVO_train', '--launcher=pytorch']' returned non-zero exit status 1.

{
  "runner_type": "StyleGANRunner",
  "gan_type": "stylegan",
  "resolution": 256,
  "batch_size": 4,
  "val_batch_size": 32,
  "total_img": 100000,
  "data": {
    "num_workers": 4,
    "repeat": 500,
    "train": {
      "root_dir": "data/SVO_train.zip",
      "data_format": "zip",
      "resolution": 256,
      "mirror": 0.5
    },
    "val": {
      "root_dir": "data/SVO_train.zip",
      "data_format": "zip",
      "resolution": 256
    }
  },
  "controllers": {
    "RunningLogger": {
      "every_n_iters": 10
    },
    "ProgressScheduler": {
      "every_n_iters": 1,
      "init_res": 8,
      "minibatch_repeats": 4,
      "lod_training_img": 600000,
      "lod_transition_img": 600000,
      "batch_size_schedule": {
        "res4": 64,
        "res8": 32,
        "res16": 16,
        "res32": 8
      }
    },
    "Snapshoter": {
      "every_n_iters": 500,
      "first_iter": true,
      "num": 200
    },
    "FIDEvaluator": {
      "every_n_iters": 5000,
      "first_iter": true,
      "num": 50000
    },
    "Checkpointer": {
      "every_n_iters": 5000,
      "first_iter": true
    }
  },
  "modules": {
    "discriminator": {
      "model": {
        "gan_type": "stylegan",
        "resolution": 256
      },
      "lr": {
        "lr_type": "FIXED"
      },
      "opt": {
        "opt_type": "Adam",
        "base_lr": 0.001,
        "betas": [0.0, 0.99]
      },
      "kwargs_train": {},
      "kwargs_val": {}
    },
    "generator": {
      "model": {
        "gan_type": "stylegan",
        "resolution": 256
      },
      "lr": {
        "lr_type": "FIXED"
      },
      "opt": {
        "opt_type": "Adam",
        "base_lr": 0.001,
        "betas": [0.0, 0.99]
      },
      "kwargs_train": {
        "w_moving_decay": 0.995,
        "style_mixing_prob": 0.9,
        "trunc_psi": 1.0,
        "trunc_layers": 0,
        "randomize_noise": true
      },
      "kwargs_val": {
        "trunc_psi": 1.0,
        "trunc_layers": 0,
        "randomize_noise": false
      },
      "g_smooth_img": 10000
    }
  },
  "loss": {
    "type": "LogisticGANLoss",
    "d_loss_kwargs": {
      "r1_gamma": 10.0
    },
    "g_loss_kwargs": {}
  },
  "work_dir": "work_dirs/stylegan_SVO_train",
  "resume_path": null,
  "weight_path": null,
  "seed": null,
  "launcher": "pytorch",
  "backend": "nccl",
  "cudnn_benchmark": true,
  "cudnn_deterministic": false,
  "is_distributed": true,
  "num_gpus": 1
}
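The assertion that fails is height == width in progressive_resize_image. Several of the printed shapes above, such as (457, 640, 3), are not square, so this loader apparently expects square images. One way to satisfy it is to center-crop every image to its largest square before building the zip. This is offered as a sketch, not the maintainers' recommendation. Resizing straight to a square would also pass the check, but it distorts the aspect ratio. center_crop_square is a hypothetical NumPy helper. The same idea applies to the Cityscapes question above if a 512 x 512 crop of the 1024 x 512 frames is acceptable.

```python
import numpy as np


def center_crop_square(image):
    """Crop an H x W (x C) image array to its largest centered square (hypothetical helper)."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return image[top:top + side, left:left + side]


# Example with one of the non-square shapes from the log above.
image = np.zeros((457, 640, 3), dtype=np.uint8)
print(center_crop_square(image).shape)  # (457, 457, 3)
```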

Error when running train.py in forward pass of generator

Hi there,

I'm trying to train StyleGAN by resuming from the pretrained model trained on FFHQ 256x256. I did not change anything in the configs, and I'm using the FFHQ dataset, which has 70k images. I get the following error during the forward pass:

Traceback (most recent call last):
  File "/home/amir/artifact_regularization/genforce/train.py", line 120, in <module>
    main()
  File "/home/amir/artifact_regularization/genforce/train.py", line 116, in main
    runner.train()
  File "/home/amir/artifact_regularization/genforce/runners/base_runner.py", line 288, in train
    self.train_step(data_batch, **train_kwargs)
  File "/home/amir/artifact_regularization/genforce/runners/stylegan_runner.py", line 43, in train_step
    d_loss = self.loss.d_loss(self, data)
  File "/home/amir/artifact_regularization/genforce/runners/losses/logistic_gan_loss.py", line 73, in d_loss
    fakes = G(latents, label=labels, **runner.G_kwargs_train)['image']
  File "/home/amir/.conda/envs/dl_proj_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/amir/artifact_regularization/genforce/models/stylegan_generator.py", line 187, in forward
    synthesis_results = self.synthesis(wp, lod, randomize_noise)
  File "/home/amir/.conda/envs/dl_proj_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/amir/artifact_regularization/genforce/models/stylegan_generator.py", line 510, in forward
    results['image'] = self.final_activate(image)
UnboundLocalError: local variable 'image' referenced before assignment

Process finished with exit code 1

I stepped through the code and found that the lod value passed to the forward pass plays an important role. The passed value is -1, and with that value none of the cases in lines 502-508 of models/stylegan_generator.py assigns the image. What happens there, and where could this wrong lod value come from? I found that something happens to the lod in the setup method of the progress scheduler in runners/controllers/progress_schedulers, but honestly I don't understand the idea behind it. Can you help me?

I would be very thankful!

Amir
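A sketch of PGGAN-style fade-in shows why a negative lod can leave image unassigned. This is a hypothetical illustration, not the code in models/stylegan_generator.py. The integer part of lod selects the sharper output resolution, and the fractional part sets how much of the upsampled lower-resolution output is still mixed in. Every valid lod lies in [0, max_lod]. A value such as -1 maps to no resolution at all, which is consistent with none of the branches in lines 502-508 firing. The helper and its argument names are illustrative.

```python
import math


def fade_in_blend(lod, final_res=256, init_res=8):
    """Hypothetical PGGAN-style fade-in for a given lod.

    Returns (resolution, alpha), meaning the output image would be
    alpha * rgb(resolution) + (1 - alpha) * upsample(rgb(resolution // 2)).
    """
    max_lod = math.log2(final_res) - math.log2(init_res)
    if not 0 <= lod <= max_lod:
        raise ValueError(f'lod must lie in [0, {max_lod}], got {lod}')
    sharp_lod = math.floor(lod)          # lod of the higher-resolution output
    alpha = 1.0 - (lod - sharp_lod)      # weight of that higher-resolution output
    resolution = final_res >> sharp_lod  # lod 0 -> final_res, lod 1 -> final_res // 2, ...
    return resolution, alpha


print(fade_in_blend(0))    # full resolution, no blending
print(fade_in_blend(4.5))  # halfway between 8x8 and 16x16
```

Raising an explicit error for an out-of-range lod turns a confusing UnboundLocalError into a message that points at the bad value.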

Get style code (Y-space)

Hi, I'm looking for a way of getting the style codes y, given a latent code z or w/w+. Should I set self.space_of_latent='y' and then return, somehow, style? Is this the same "Style Space" as the one defined here?

Error in running stylegan_demo

Hi,

Running python -m torch.distributed.launch --nproc_per_node=1 train.py ./configs/stylegan_demo.py --work_dir work_dirs/stylegan_demo/ --launcher="pytorch" throws an error at optimizer.step(). Below is the trace. Could you help fix this?

[2020-10-12 20:14:25][INFO] Rebuild the dataset at iter 000001 (lod 3.000000).
Traceback (most recent call last):
  File "train.py", line 117, in <module>
    main()
  File "train.py", line 113, in main
    runner.train()
  File "/home/ubuntu/efs/genforce/runners/base_runner.py", line 287, in train
    self.train_step(data_batch, **train_kwargs)
  File "/home/ubuntu/efs/genforce/runners/stylegan_runner.py", line 45, in train_step
    self.optimizers['discriminator'].step()
  File "/home/ubuntu/anaconda3/envs/pytorch_py3.6/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 51, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_py3.6/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ubuntu/efs/genforce/runners/optimizer.py", line 193, in step
    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
TypeError: zeros_like() received an invalid combination of arguments - got (Parameter, memory_format=torch.memory_format), but expected one of:
 * (Tensor input, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Tensor input, bool requires_grad)
      didn't match because some of the keywords were incorrect: memory_format

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_py3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/pytorch_py3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/pytorch_py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/pytorch_py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/pytorch_py3.6/bin/python', '-u', 'train.py', '--local_rank=0', './configs/stylegan_demo.py', '--work_dir', 'work_dirs/stylegan_demo/', '--launcher=pytorch']' returned non-zero exit status 1.

Compatibility with RTX 3060

I recently upgraded my GPU to an RTX 3060, but it looks like the CUDA version that supports RTX 30-series GPUs is not the one genforce requires.

I would like to know whether upgrading CUDA, together with the matching PyTorch and cuDNN versions, will make the model work on the RTX 3060. Are there recommended versions of these libraries for running genforce on RTX cards?

I'm trying to find working versions by hand, but my internet connection is very slow, which is the main reason I'm opening this issue.

StyleGAN3 support

Hi, thanks for your fantastic work! This project is really useful for playing with generative models. Do you have any plans to add support for StyleGAN3?

dataloader error

Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\Program\Anaconda3\envs\Python3.7\lib\site-packages\torch\utils\data\_utils\worker.py",

Do you have any idea what causes this?

Failure to convert StyleGAN2-ADA-PyTorch model

I have official model weights trained with the NVlabs code, but when I try to convert them into the genforce format, I run into the following issue:

========================================
Loading source weights from `/data/user3/fewshot/stylegan2-ada-pytorch/training-runs/00000-retina-auto2-batch32/network-snapshot-004400.pkl` ...
Successfully loaded!
--------------------
Converting source weights (G) to target ...
Traceback (most recent call last):
  File "/data/user3/fewshot/genforce/convert_model.py", line 77, in <module>
    main()
  File "/data/user3/fewshot/genforce/convert_model.py", line 66, in main
    convert_stylegan2ada_pth_weight(src_weight_path=args.source_model_path,
  File "/data/user3/fewshot/genforce/converters/stylegan2ada_pth_converter.py", line 187, in convert_stylegan2ada_pth_weight
    assert tf_var_name in official_tf_to_pth_var_mapping
AssertionError

What is going wrong, and how can I solve it?
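When an assertion like this fails, it helps to list every variable name the converter cannot map, instead of stopping at the first one. The sketch below is hypothetical and does not reproduce the converter's actual loop. report_unmapped and all names in the example, including the mapping targets, are illustrative. For class-conditional StyleGAN2-ADA models, one plausible culprit is the extra class-embedding layer (mapping.embed) that the official PyTorch mapping network creates when c_dim > 0. Whether that is the cause here is an assumption.

```python
def report_unmapped(var_names, var_mapping):
    """Return the variable names that have no entry in var_mapping (hypothetical helper).

    Unlike a bare `assert name in var_mapping`, this collects every missing
    name so they can all be inspected at once.
    """
    return sorted(name for name in var_names if name not in var_mapping)


# Illustrative names only: a conditional model carries an extra embedding layer.
mapping = {
    'mapping.fc0.weight': 'mapping.dense0.weight',
    'mapping.fc0.bias': 'mapping.dense0.bias',
}
names = ['mapping.fc0.weight', 'mapping.fc0.bias',
         'mapping.embed.weight', 'mapping.embed.bias']
missing = report_unmapped(names, mapping)
if missing:
    print('Unmapped variables:', ', '.join(missing))
```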

Force train code to run on 1 GPU

Helloo

I'm working on a server, but I don't have sudo rights. Following the instructions naively, the training code seems to require running on multiple GPUs, which as far as I can tell requires sudo rights. Is there a way to run training on only one GPU so that sudo rights are not needed? Stepping through the code, we found that the error occurs at dist.init_process_group in utils/misc.py, line 27.

Is there a way to fix this without changing too much in the pipeline?

Best regards

Slow in debugging using pdb

When I use pdb to debug dist_train.sh, it becomes very slow after stopping at a breakpoint, and I cannot print any debug information (e.g. p xxx). Can you help me?

A small error in backward propagation

Thanks for releasing this awesome repo and the related pre-trained models. I recently found a small error while using the pre-trained models.

The following line produces a non-contiguous tensor in the forward pass because it uses different slices of style_split. This can cause RuntimeError: non-contiguous input during backward propagation.

x = x * (style_split[:, 0] + 1) + style_split[:, 1]

I suggest adding x = x.contiguous(), which should help.
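To illustrate what "non-contiguous" means here, a minimal sketch with made-up shapes (the actual layout of style_split in GenForce is assumed, not taken from the source): selecting one slice along dimension 1 of a stacked tensor returns a strided view rather than a packed tensor, some operations such as view() reject it, and .contiguous() returns a packed copy with the same values.

```python
import torch

batch, channels = 2, 4
# Hypothetical stacked style tensor: index 0 along dim 1 = scale, index 1 = bias.
style_split = torch.randn(batch, 2, channels, 1, 1)

scale = style_split[:, 0]  # strided view into style_split, not a copy
print("scale contiguous:", scale.is_contiguous())

try:
    scale.view(-1)  # view() needs a memory layout compatible with the new shape
except RuntimeError as err:
    print("view failed:", err)

scale_c = scale.contiguous()  # same values, packed memory
print("copy contiguous:", scale_c.is_contiguous(), torch.equal(scale, scale_c))
```

Calling .contiguous() on the result of the modulation line, as suggested above, forces the same kind of packed copy before the tensor is passed on.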

Model size increases after conversion with fused_modulate=False

Hi,

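I tried to convert a StyleGAN2 model to Core ML. Because the conversion tool itself has a defect, it cannot use parameters as weights. After I set fused_modulate=False, the conversion succeeds, but there is a problem: the model becomes larger after conversion.
PyTorch: 121M, TorchScript: 168M, Core ML: 218M
With my current level of knowledge, I can't yet understand why the size grows after changing this setting.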

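import numpy as np
import torch
import coremltools as ct
# StyleGAN2Generator comes from the poster's codebase; its import path is not shown in the post.

# Dummy W+ latent codes: 1 sample, 18 layers, 512 dimensions.
latent_codes = np.random.randn(1, 18, 512).astype(np.float32)
dummy_input = torch.from_numpy(latent_codes)

torch_model = StyleGAN2Generator(fused_modulate=False)

# Trace only the synthesis network and save it as TorchScript.
traced_model = torch.jit.trace(torch_model.net.synthesis, dummy_input)
torch.jit.save(traced_model, './mobile.pt')

# Convert the traced model to Core ML and save it.
model = ct.convert(traced_model, inputs=[ct.TensorType(name="input_1", shape=latent_codes.shape)])
model.save('./stylegan2.mlmodel')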
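One way to start investigating the size growth is to compare the file sizes with the raw size of the tensors the model actually holds. The hypothetical helper below sums the bytes of all parameters and buffers of an nn.Module. If the TorchScript or Core ML file is much larger than this figure, the extra space comes from something other than the original weights (for example, tensors materialized as constants during tracing); that explanation is a hypothesis, not something established in this thread.

```python
import torch
from torch import nn


def module_tensor_bytes(module: nn.Module) -> int:
    """Total bytes occupied by a module's parameters and buffers.

    parameters() and buffers() yield each tensor once, so shared
    parameters are not double-counted.
    """
    total = 0
    for tensor in list(module.parameters()) + list(module.buffers()):
        total += tensor.numel() * tensor.element_size()
    return total


if __name__ == "__main__":
    layer = nn.Linear(512, 512)
    print(f"{module_tensor_bytes(layer) / 2**20:.2f} MiB")
```

For the model in the post, the call would be module_tensor_bytes(torch_model.net.synthesis), to be compared with the sizes of mobile.pt and stylegan2.mlmodel.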


Training stops silently

Hi all.

I was trying to train a StyleGAN model on the LSUN dining room dataset. I downloaded the data from the official website, extracted it from the LMDB file, and zipped it. I kept the training config the same as ffhq256 and only added the flag crop_resize_resolution = 256 to crop the images.

I have noticed several times that when training reaches the point shown below, the script stops silently, without any notification or error report.

[2021-06-27 21:58:39][INFO] Reset the optimizer state at iter 002345 (lod 4.999893).
[2021-06-27 21:58:39][INFO] Rebuild the dataset at iter 002345 (lod 4.999893).

However, if I change 'batch_size' from 4 to 12 and 'total_img' from 25000_000 to 75000_000, the problem does not happen. Has anyone observed similar behavior, or does anyone have any insights?
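One detail worth noting in this observation: changing batch_size and total_img by the same factor of 3 leaves the total number of iterations unchanged while tripling the images processed per iteration. The sketch below assumes iterations = total_img / (batch_size × number of GPUs), with batch_size counted per GPU; the post does not state the GPU count, so 8 is only an example, and training_schedule is a hypothetical helper.

```python
def training_schedule(total_img: int, batch_size: int, num_gpus: int) -> tuple:
    """Return (images per iteration, total iterations).

    Assumes batch_size is per GPU and training stops after total_img images.
    """
    images_per_iter = batch_size * num_gpus
    return images_per_iter, total_img // images_per_iter


small = training_schedule(25_000_000, 4, num_gpus=8)
large = training_schedule(75_000_000, 12, num_gpus=8)
print(small, large)
```

In progressive-growing StyleGAN the lod schedule is typically driven by the number of images seen, so the two runs reach the logged lod transition at different iterations. This is context only; it does not by itself explain the silent exit.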

My environment is PyTorch 1.8.1 + CUDA 11.1. Please let me know if any other information would be helpful.

Best,
Jianyuan

EOFError: Ran out of input

(genforce) C:\Users\A\Desktop\genforce-master>python synthesize.py stylegan_ffhq1024
Building generator for model stylegan_ffhq1024 ...
Finish building generator.
Loading checkpoint from checkpoints\stylegan_ffhq1024.pth ...
Traceback (most recent call last):
  File "synthesize.py", line 147, in <module>
    main()
  File "synthesize.py", line 108, in main
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
  File "C:\Users\A\anaconda3\envs\genforce\lib\site-packages\torch\serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\A\anaconda3\envs\genforce\lib\site-packages\torch\serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

How can I fix this?
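The traceback goes through _legacy_load, which torch.load falls back to when the file is not a zip archive, and then fails on the very first read. That pattern is consistent with an empty or truncated checkpoint, or with a file that is not a checkpoint at all (for example an HTML page saved by a failed download); this is a likely cause rather than a confirmed one. The standard-library sketch below (inspect_checkpoint is a hypothetical helper) classifies a file by its size and leading bytes before torch.load is called.

```python
import os


def inspect_checkpoint(path: str) -> str:
    """Roughly classify a checkpoint file by its size and leading bytes.

    Returns one of:
      "missing"       - no such file
      "empty"         - zero bytes (torch.load then raises EOFError)
      "zip"           - zip archive, the torch.save format since PyTorch 1.6
      "legacy-pickle" - starts with a pickle header (protocol 2 or later),
                        as the older torch.save format does
      "unknown"       - anything else, e.g. an HTML error page
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    if head.startswith(b"\x80"):
        return "legacy-pickle"
    return "unknown"
```

For the file in this report, inspect_checkpoint(r"checkpoints\stylegan_ffhq1024.pth") returning "empty" or "unknown" would suggest downloading the checkpoint again.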

Using my own dataset to retrain

I trained the generator on my own dataset, but the generated results are not clear.
What could be the reason for this?

My torch version is 1.12.1+cu113.

Pre-trained models

Hi, thanks for your work. I want to run some experiments with StyleGAN, but I can't find a TensorFlow version of the pre-trained model (trained on FFHQ_256) in the model zoo. Could you provide a TensorFlow version of your pre-trained model, or a method to convert the PyTorch version to TensorFlow? Thank you very much.
