text2image's People

Contributors: wtliao

text2image's Issues

FID not matching for COCO dataset

Hi,

Thanks for providing the code for the paper!

I tried to reproduce the FID score on the COCO dataset by generating images for the validation set, as reported in the paper, using the epoch-120 generator (netG_120.pth) and the epoch-595 text encoder. I used the PyTorch implementation of FID, and it gives a score of around 121, as opposed to the 19 reported.

I resized the original COCO images to 256x256 to keep the image sizes consistent, but the score is still high.
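For reference, here is roughly my evaluation setup, as a minimal sketch. It assumes the pytorch-fid package, and the directory names real/, real_256/, and fake/ are placeholders:

    # Sketch of the evaluation setup (assumes the pytorch-fid package;
    # the directory names real/, real_256/, and fake/ are placeholders).
    import os
    from PIL import Image

    os.makedirs('real_256', exist_ok=True)
    for name in os.listdir('real'):
        img = Image.open(os.path.join('real', name)).convert('RGB')
        img.resize((256, 256), Image.BICUBIC).save(os.path.join('real_256', name))

    # then, from the shell:
    #   python -m pytorch_fid real_256 fake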

The generated images also look odd for sample sentences.

For example, the attached image was generated for the caption "A close up of a boat on a field with a cloudy sky". The caption is taken from the paper, but the output of the final generator and text encoder models is nowhere near what is presented there.

Any suggestions on what needs to be done?
[attached: the generated image for the caption above]

Also, could you explain the difference between main.py and main_finetune.py? The two scripts look almost identical.

missing 'opts.py' file

I want to try training using opts.py, which is mentioned in the README.md, but the file seems to be missing.

A question about main.py

I am a little confused about why the code is written this way. When I run it, an error occurs unless I change the hard-coded 600 to 550 (see the screenshots below).
[screenshots: the relevant lines in main.py and the resulting error]

IS and FID

Hello, could you share the code for calculating IS and FID? I want to use these metrics to evaluate my method.
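In the meantime, this is a minimal sketch of what I am using instead, based on torchmetrics rather than your code (so it is an assumption about the setup; both metrics expect uint8 image tensors of shape [N, 3, H, W] by default):

    # Sketch using torchmetrics (assumption: not the authors' own metric code).
    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance
    from torchmetrics.image.inception import InceptionScore

    # placeholder batches; in practice load real/generated images as uint8 tensors
    real = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)
    fake = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)

    fid = FrechetInceptionDistance(feature=2048)
    fid.update(real, real=True)
    fid.update(fake, real=False)
    print('FID:', fid.compute().item())

    is_metric = InceptionScore()
    is_metric.update(fake)
    mean, std = is_metric.compute()
    print('IS:', mean.item(), '+/-', std.item())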

How to organize the data to evaluate the results?

Could you please tell me how to arrange the image data into folders when I want to compute FID, LPIPS, or IS? The train and test sets are split by ".pickle" files, and I don't know how to run the evaluation metrics.
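For context, this is what I imagine is needed, as a minimal sketch assuming the AttnGAN-style layout where test/filenames.pickle holds extension-less image basenames and the images live under <data_dir>/images/:

    # Sketch: flatten the test split into one folder for FID/IS/LPIPS tools.
    # Assumption: AttnGAN-style layout with test/filenames.pickle holding
    # extension-less image basenames under <data_dir>/images/.
    import os
    import pickle
    import shutil

    data_dir = '../data/birds'
    out_dir = 'real_images'
    os.makedirs(out_dir, exist_ok=True)

    with open(os.path.join(data_dir, 'test', 'filenames.pickle'), 'rb') as f:
        filenames = pickle.load(f)

    for name in filenames:
        src = os.path.join(data_dir, 'images', name + '.jpg')
        shutil.copy(src, os.path.join(out_dir, os.path.basename(src)))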

Help: how to solve this problem?

Traceback (most recent call last):
File "D:\text2image-main\text2image-main\main.py", line 496, in
dataset = TextDataset(cfg.DATA_DIR, 'test',
File "D:\text2image-main\text2image-main\datasets.py", line 124, in init
self.wordtoix, self.n_words = self.load_text_data(data_dir, split)
File "D:\text2image-main\text2image-main\datasets.py", line 243, in load_text_data
x = pickle.load(f)
EOFError: Ran out of input

I found suggestions to set num_workers = 0, but that does not fix the problem. Please help.
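One thing worth checking first, as a minimal sketch: "EOFError: Ran out of input" from pickle.load usually means the pickle file exists but is empty or truncated, for example when an interrupted first run leaves a zero-byte captions.pickle. The path below is a placeholder for whatever cfg.DATA_DIR points at:

    # Check whether captions.pickle is empty or truncated (a common cause of
    # "EOFError: Ran out of input"); if so, delete it so the loader rebuilds it.
    import os

    path = '../data/coco/captions.pickle'  # adjust to your cfg.DATA_DIR
    print('size (bytes):', os.path.getsize(path))
    # if the size is 0, remove the file and rerun:
    # os.remove(path)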

fix text_encoder

In the paper, it is mentioned that the text_encoder is fixed during the image-synthesis training. However, in the provided code the text_encoder does not appear to be frozen, meaning it is still trained and optimized. Did I misunderstand something?
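For reference, this is the standard way an encoder is frozen in PyTorch, as a minimal sketch (text_encoder standing in for the repo's RNN_ENCODER instance):

    # Standard way to freeze a module in PyTorch (sketch).
    import torch.nn as nn

    def freeze(module: nn.Module) -> None:
        """Freeze all parameters and switch to eval mode (disables dropout)."""
        for p in module.parameters():
            p.requires_grad = False
        module.eval()

    # usage (text_encoder is assumed to be the repo's RNN_ENCODER instance):
    # freeze(text_encoder)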

about the mask maps

Hi, thank you for releasing this excellent code!
I have a small question: how can I visualize the predicted mask maps during validation?
I tried the save_image() API from torchvision.utils, but the results are not consistent with the mask maps shown in the paper.
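This is roughly what I tried, as a minimal sketch that assumes the predicted mask is a [B, 1, H, W] tensor of logits (mask here is a random placeholder):

    # Sketch: visualize a predicted mask map (assumes `mask` is a
    # [B, 1, H, W] tensor of logits from the generator).
    import torch
    from torchvision.utils import save_image

    mask = torch.randn(4, 1, 64, 64)      # placeholder prediction
    mask = torch.sigmoid(mask)            # map logits to [0, 1]
    save_image(mask, 'mask.png', nrow=2)  # values already in [0, 1]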

Looking forward to your reply! Thank you!

Seeking help: how can I solve this problem?

Traceback (most recent call last):
File "main.py", line 410, in
image_encoder = CNN_ENCODER(cfg.TEXT.EMBEDDING_DIM)
File "/data01/hxf/text2image/DAMSM.py", line 127, in init
model = models.inception_v3(pretrained=True, transform_input=False)
File "/home/omnisky/hxf/lib/python3.6/site-packages/torchvision/models/inception.py", line 53, in inception_v3
progress=progress)
File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/hub.py", line 557, in load_state_dict_from_url
return torch.load(cached_file, map_location=map_location)
File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
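One likely cause, stated as an assumption: "invalid load key, '<'" usually means the cached checkpoint is actually an HTML error page left behind by a failed download. A minimal sketch of the fix, deleting the cached file so torchvision re-downloads it:

    # The '<' load key suggests the cached inception checkpoint is an HTML
    # error page from a failed download. Remove it and re-download.
    # (torch.hub.get_dir() is available in recent PyTorch versions.)
    import os
    import torch
    from torchvision import models

    cache_dir = os.path.join(torch.hub.get_dir(), 'checkpoints')
    for name in os.listdir(cache_dir):
        if name.startswith('inception'):
            os.remove(os.path.join(cache_dir, name))

    model = models.inception_v3(pretrained=True, transform_input=False)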

Train on custom dataset

Hi.

Where is opts.py? I'd like to further train your model on my own dataset.

Thanks in advance.

inference.py file

Hi there,
Brilliant work! Thanks.

I would be even more grateful if you could provide an inference.py (sometimes also called predict.py) with which I can generate an image from any input sentence.

captions.pickle not found

Hello, I am a freshman and want to run this program, but I hit a problem: FileNotFoundError: [Errno 2] No such file or directory: '../data/birds/captions.pickle'. How can this be solved? Thank you.
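A minimal sanity check of the data layout, assuming the AttnGAN-style convention where captions.pickle sits directly under the data directory and is either shipped with the preprocessed metadata download or rebuilt from the raw caption files:

    # Quick check of the expected data layout (assumption: AttnGAN-style).
    import os

    data_dir = '../data/birds'
    print(os.path.isfile(os.path.join(data_dir, 'captions.pickle')))
    print(os.path.isdir(os.path.join(data_dir, 'text')))  # raw caption files (assumption)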

Issues with FID and IS scores

@wtliao

I have calculated the scores:

  • FID is very high: 73.33472569962976
  • IS is quite low: Inception mean 4.732609, Inception std 0.1345223

For now, I am generating the images from the epoch-550 checkpoint.

Is the best FID calculated from the last epoch (550)? Or should I compute IS and FID for the checkpoints every 10 or 50 epochs (or whatever interval you recommend) and then choose the checkpoint with the best FID and IS?
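This is the kind of sweep I have in mind, as a minimal sketch; generate_images() and compute_fid() are hypothetical stand-ins for this repo's sampling code and an external FID implementation:

    # Sketch: sweep saved checkpoints and keep the one with the best FID.
    # generate_images() and compute_fid() are hypothetical placeholders.

    def generate_images(checkpoint: str) -> str:
        """Hypothetical: sample the validation captions with `checkpoint`,
        return the output directory."""
        raise NotImplementedError

    def compute_fid(real_dir: str, fake_dir: str) -> float:
        """Hypothetical: e.g. wrap pytorch-fid over the two directories."""
        raise NotImplementedError

    best = None
    for epoch in range(50, 551, 50):
        fake_dir = generate_images(f'netG_{epoch}.pth')
        fid = compute_fid('real_images', fake_dir)
        if best is None or fid < best[1]:
            best = (epoch, fid)
    print('best checkpoint:', best)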

Problems encountered in training

Hello, author. At the start of training, the program asks for the netG model from the 600th epoch. After I downloaded the model you provided, the program continued to run, but it stopped after only a few minutes. Why does training need the epoch-600 model to start, and why does the program stop after running for a few minutes? Thank you.

FileNotFoundError: File b'../data/coco\\CUB_200_2011/bounding_boxes.txt' does not exist

AttnGAN-master\code>python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
Using config:
{'B_VALIDATION': False,
'CONFIG_NAME': 'DAMSM',
'CUDA': True,
'DATASET_NAME': 'birds',
'DATA_DIR': '../data/birds',
'GAN': {'B_ATTENTION': True,
'B_DCGAN': False,
'CONDITION_DIM': 100,
'DF_DIM': 64,
'GF_DIM': 128,
'R_NUM': 2,
'Z_DIM': 100},
'GPU_ID': 0,
'RNN_TYPE': 'LSTM',
'TEXT': {'CAPTIONS_PER_IMAGE': 10, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 18},
'TRAIN': {'BATCH_SIZE': 48,
'B_NET_D': True,
'DISCRIMINATOR_LR': 0.0002,
'ENCODER_LR': 0.002,
'FLAG': True,
'GENERATOR_LR': 0.0002,
'MAX_EPOCH': 600,
'NET_E': '',
'NET_G': '',
'RNN_GRAD_CLIP': 0.25,
'SMOOTH': {'GAMMA1': 4.0,
'GAMMA2': 5.0,
'GAMMA3': 10.0,
'LAMBDA': 1.0},
'SNAPSHOT_INTERVAL': 50},
'TREE': {'BASE_SIZE': 299, 'BRANCH_NUM': 1},
'WORKERS': 1}
C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\torchvision\transforms\transforms.py:285: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Traceback (most recent call last):
File "pretrain_DAMSM.py", line 243, in
transform=image_transform)
File "C:\Users\bodal\Downloads\AttnGAN-master\code\datasets.py", line 110, in init
self.bbox = self.load_bbox()
File "C:\Users\bodal\Downloads\AttnGAN-master\code\datasets.py", line 126, in load_bbox
header=None).astype(int)
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 811, in init
self._engine = self._make_engine(self.engine)
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 51, in init
self._open_handles(src, kwds)
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers\base_parser.py", line 229, in _open_handles
errors=kwds.get("encoding_errors", "strict"),
File "C:\Users\bodal\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 707, in get_handle
newline="",
FileNotFoundError: [Errno 2] No such file or directory: '../data/birds\CUB_200_2011/bounding_boxes.txt'
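Note also that the issue title mentions '../data/coco' while the traceback shows '../data/birds', so the config's DATA_DIR and the actual folder layout may not line up. A minimal sanity check, assuming the standard CUB layout these repos expect:

    # Quick sanity check of the expected CUB layout (assumption: DATA_DIR must
    # contain the extracted CUB_200_2011 folder with bounding_boxes.txt inside).
    import os

    data_dir = '../data/birds'  # must match DATA_DIR in the yml config
    bbox = os.path.join(data_dir, 'CUB_200_2011', 'bounding_boxes.txt')
    print(bbox, 'exists:', os.path.isfile(bbox))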

The number of generated images for FID evaluation on CUB?

The CUB test set has only 2,933 images, and the official FID implementation requires at least 10k generated images for the FID values to be valid. How many images did you generate to evaluate FID? And how is the number of corresponding ground-truth images handled?

FID score in COCO

Hello, I tested the trained models from your OneDrive on COCO and got an FID of 28.1929, a big gap from the paper's 19.37. My FID code is from https://github.com/bioinf-jku/TTUR and my GPUs are four RTX 3090s. I'm confused by this and want to know the settings your FID code uses for COCO, such as the ground-truth image size and whether you split the COCO data. Thanks.

How to calculate R-precision?

@wtliao

I wanted to calculate R-precision for comparison with a few state-of-the-art benchmarks, so I added code to main.py to compute it using the fine-tuned model you provide, and got a value of around 73%. The newest (5th) version of your paper reports an R-precision of 86%. Could you please share your R-precision code so that I can figure out why our scores differ so much?

Thank you
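For reference, this is roughly what I implemented, as a minimal sketch of the AttnGAN-style R-precision protocol (an assumption about what the paper uses): for each generated image, rank its ground-truth caption embedding against 99 random mismatched ones by cosine similarity and count top-1 hits.

    # Sketch of AttnGAN-style R-precision (assumption about the exact protocol).
    # img_feat: [N, D] global image features from the image encoder;
    # txt_feat: [N, D] sentence embeddings, row i matching image i.
    import torch
    import torch.nn.functional as F

    def r_precision(img_feat: torch.Tensor, txt_feat: torch.Tensor,
                    n_mismatch: int = 99) -> float:
        img = F.normalize(img_feat, dim=1)
        txt = F.normalize(txt_feat, dim=1)
        n = img.size(0)
        hits = 0
        for i in range(n):
            # 1 matching caption + n_mismatch random mismatched ones
            others = torch.randperm(n - 1)[:n_mismatch]
            others = others + (others >= i).long()  # shift to skip index i
            cand = torch.cat([txt[i:i + 1], txt[others]])  # row 0 = true caption
            sims = cand @ img[i]
            hits += int(sims.argmax().item() == 0)
        return hits / n

    # usage with random placeholder features:
    print(r_precision(torch.randn(100, 256), torch.randn(100, 256)))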
