wtliao / text2image
Text to Image Generation with Semantic-Spatial Aware GAN
Hi,
Thanks for providing the code for the paper!
I tried to reproduce the FID score on the COCO dataset by generating images for the validation set, as reported in the paper, using the generator from epoch 120 (netG_120.pth) and the text encoder from epoch 595. I used the PyTorch implementation of the FID score, and it gives an FID of around 121, as opposed to the reported 19.
I resized the original COCO images to 256x256 so that the image size is consistent, but the score is still high.
The generated images also look wrong for sample sentences.
For example:
The attached image has the caption "A close up of a boat on a field with a cloudy sky". This caption is taken from the paper, but the image generated with the final generator and text encoder is nowhere near the one presented in the paper.
Any suggestions from your side as to what has to be done?
Also, could you please explain the difference between main.py and main_finetune.py? The two scripts look almost identical.
Which function or script should be used for inference?
I want to try training with opts.py, which is mentioned in README.md, but the file seems to be missing.
Hello, could you share the code for calculating IS and FID? I want to use these metrics to evaluate my method.
Could you please tell me how to arrange the image data in a folder when I want to compute FID, LPIPS, or IS? The train and test sets are split by ".pickle" files, and I don't know how to run the evaluation metrics on them.
@wtliao
I want to train the CUB and COCO models myself instead of using the provided trained models (OneDrive repo). Would model.py be enough to train the model, or is there something more I should follow?
Traceback (most recent call last):
  File "D:\text2image-main\text2image-main\main.py", line 496, in <module>
    dataset = TextDataset(cfg.DATA_DIR, 'test',
  File "D:\text2image-main\text2image-main\datasets.py", line 124, in __init__
    self.wordtoix, self.n_words = self.load_text_data(data_dir, split)
  File "D:\text2image-main\text2image-main\datasets.py", line 243, in load_text_data
    x = pickle.load(f)
EOFError: Ran out of input
I found a suggested fix, setting "num_workers = 0", but it does not solve the problem.
please help
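`EOFError: Ran out of input` is what `pickle.load` raises when the file it reads contains no data, so one plausible cause (an assumption, not confirmed in this thread) is an empty captions pickle, for example left behind by an interrupted download or preprocessing run. A minimal sketch of a check that fails early with a clearer message; `load_pickle_checked` is a hypothetical helper, not part of the repository:

```python
import os
import pickle


def load_pickle_checked(path):
    """Load a pickle file, failing early with a clearer message if it is empty.

    Hypothetical helper for illustration; not part of the text2image code.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"pickle file not found: {path}")
    if os.path.getsize(path) == 0:
        # pickle.load raises "EOFError: Ran out of input" on a zero-byte file.
        raise ValueError(f"pickle file is empty (0 bytes): {path}")
    with open(path, "rb") as f:
        return pickle.load(f)
```

If the check reports an empty file, deleting it and regenerating or re-downloading the pickle is the usual next step.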
In the paper, it is mentioned that the text_encoder will be fixed during the image synthesis training. However, in the provided code, the text_encoder is frozen, meaning it won’t be trained or optimized. Did I misunderstand something?
Hi, thank you for releasing your excellent code!
I have a small question: how can I visualize the predicted mask maps during validation?
I attempted to use the save_image() API from torchvision.utils, but the results are not consistent with the provided mask maps from the paper.
Looking forward to your reply! Thank you!
Traceback (most recent call last):
  File "main.py", line 410, in <module>
    image_encoder = CNN_ENCODER(cfg.TEXT.EMBEDDING_DIM)
  File "/data01/hxf/text2image/DAMSM.py", line 127, in __init__
    model = models.inception_v3(pretrained=True, transform_input=False)
  File "/home/omnisky/hxf/lib/python3.6/site-packages/torchvision/models/inception.py", line 53, in inception_v3
    progress=progress)
  File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/hub.py", line 557, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
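The message `invalid load key, '<'` means the first byte handed to the unpickler was `<`. A frequent cause (an assumption here, since the thread does not confirm it) is that the cached weights file is really an HTML page, such as an error or proxy page saved in place of the checkpoint. A minimal sketch of a check; `looks_like_html` is a hypothetical helper:

```python
def looks_like_html(path, probe_bytes=64):
    """Return True if the file starts with '<', as an HTML/XML page would.

    Hypothetical helper for illustration; a real checkpoint is binary data
    and is not expected to begin with '<'.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    # Ignore leading whitespace, then look at the first meaningful byte.
    return head.lstrip().startswith(b"<")
```

If this returns True for the cached file, removing it and downloading the weights again (or fetching them manually) would be the thing to try.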
Hi.
Where is opts.py? I'd like to further train your model on my own dataset.
Thanks in advance.
Hi there,
A brilliant work! Thanks.
I would be even more grateful if you could provide an inference.py (sometimes also called predict.py) with which I can generate an image for any input sentence.
Hello, I am a beginner and I want to run this program, but I ran into an error (FileNotFoundError: [Errno 2] No such file or directory: '../data/birds/captions.pickle'). How can I solve this? Thank you.
I have calculated the scores:
For now, I am generating the images from epoch 550.
Is the best FID score the one calculated from the last epoch (550)? Or should I calculate the IS and FID scores of the checkpoints every 10 or 50 epochs (or please let me know which epoch I should use) and then choose the checkpoint with the best FID and IS scores?
Hello, author. When I start training, the program asks for the netG model from the 600th training epoch. After I downloaded the model you provided, the program was able to run, but it stopped after only a few minutes. Why does training need the 600th-epoch model in order to start, and why does the program stop after a few minutes? Thank you.
AttnGAN-master\code>python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
Using config:
{'B_VALIDATION': False,
 'CONFIG_NAME': 'DAMSM',
 'CUDA': True,
 'DATASET_NAME': 'birds',
 'DATA_DIR': '../data/birds',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 64,
         'GF_DIM': 128,
         'R_NUM': 2,
         'Z_DIM': 100},
 'GPU_ID': 0,
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 10, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 18},
 'TRAIN': {'BATCH_SIZE': 48,
           'B_NET_D': True,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.002,
           'FLAG': True,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '',
           'NET_G': '',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 4.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 50},
 'TREE': {'BASE_SIZE': 299, 'BRANCH_NUM': 1},
 'WORKERS': 1}
C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\torchvision\transforms\transforms.py:285: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Traceback (most recent call last):
  File "pretrain_DAMSM.py", line 243, in <module>
    transform=image_transform)
  File "C:\Users\bodal\Downloads\AttnGAN-master\code\datasets.py", line 110, in __init__
    self.bbox = self.load_bbox()
  File "C:\Users\bodal\Downloads\AttnGAN-master\code\datasets.py", line 126, in load_bbox
    header=None).astype(int)
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\base_parser.py", line 229, in _open_handles
    errors=kwds.get("encoding_errors", "strict"),
  File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 707, in get_handle
    newline="",
FileNotFoundError: [Errno 2] No such file or directory: '../data/birds\CUB_200_2011/bounding_boxes.txt'
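The path in this error is relative (`../data/birds`), so it is resolved against the directory the script is started from, and the CUB_200_2011 folder has to be present under it. A minimal sketch for listing which expected files are absent before launching training; `missing_files` is a hypothetical helper, and the file list below contains only the path named in the error above:

```python
import os


def missing_files(data_dir, relative_paths):
    """Return the expected files that do not exist under data_dir.

    Hypothetical helper for illustration; paths are resolved against the
    current working directory when data_dir is relative.
    """
    missing = []
    for rel in relative_paths:
        full = os.path.normpath(os.path.join(data_dir, rel))
        if not os.path.isfile(full):
            missing.append(full)
    return missing


if __name__ == "__main__":
    expected = ["CUB_200_2011/bounding_boxes.txt"]
    for path in missing_files("../data/birds", expected):
        print("missing:", os.path.abspath(path))
```

Printing the absolute path makes it easier to see where the dataset is actually being looked up.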
The test set for the CUB dataset has only 2933 images, and the official FID calculation method requires at least 10k generated images for the FID values to be valid. How many images did you generate to evaluate FID? And how is the number of corresponding ground-truth images handled?
I cannot find the bird dataset.
Hello, I tested the trained models from your OneDrive on COCO, and the FID score is 28.1929, which is far from the value in the paper (19.37). My FID code is from https://github.com/bioinf-jku/TTUR and I am using four 3090 GPUs. I am confused by this and would like to know the settings your FID code uses for COCO, such as the image size of the ground truth and whether you split the COCO data. Thanks.
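For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the real and the generated images. The score therefore depends on how both image sets are preprocessed and on how many samples are used to estimate the statistics, which is why these settings matter when comparing numbers. Writing $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ for the feature mean and covariance of the real and generated sets (notation chosen here for illustration):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```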
I wanted to calculate R-precision for comparison with a few state-of-the-art benchmarks, so I adapted main.py to compute it with the fine-tuned trained model you provide. I got a value of around 73%. In the newest (5th) version of your paper you report an R-precision of 86%. Could you please share your code for calculating R-precision so that I can figure out why our scores differ so much?
Thank you