researchmm / generate-it Goto Github PK


64.0 64.0 8.0 871 KB

A collection of models for image<->text generation in ACM MM 2021.

License: MIT License

Python 100.00%

generate-it's People

Contributors

Stargazers

Watchers

Forkers

xiamenwcy zeeroocooll xl2248 zhuzhutingru123 liaozhihao2757 smallflyingpig techthiyanes peide

generate-it's Issues

"it-generator" project requirements issues

@HYPJUDY I am unable to set up the requirements for the project using "requirements.txt" on my Windows machine; the install fails with the following error:

I did find a workaround by installing earlier PyTorch versions:

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

There is also a conflict between tensorboard versions. For now I have commented out tensorboard in "requirements.txt":

I also believe the yacs and apex libraries are required: they are not listed in the requirements, and running sample_images.py produces errors without them.
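
A quick way to see which of the packages mentioned in this report are absent from an environment is to probe them before running sample_images.py. The following is a minimal sketch, not part of the repository: the module list is taken from the report above, and the helper name find_missing is made up for illustration.

```python
import importlib.util

# Module names taken from the report above. Whether each one is actually
# required by the project is the reporter's observation, not a verified fact.
MODULES = ["torch", "torchvision", "tensorboard", "yacs", "apex"]


def find_missing(modules):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

The check only tells you whether a module can be found, not whether its version is compatible, so it does not detect the tensorboard version conflict described above.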

training error: unpickling error for pretrained model Epoch20_LXRT.pth

Inference using own datasets

Hello, I'm interested in your work!

I would like to know what I should do to run inference with the model on my own dataset.

Should I build a file like dataset_coco.json and re-extract the cluster and grid features?

Thank you!
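
For readers with the same question, here is a rough sketch of how an annotation file for a custom dataset could be assembled. It assumes that dataset_coco.json follows the widely used Karpathy-split layout (a top-level "images" list whose entries carry "filename", "split" and "sentences"); the thread does not confirm that this repository expects exactly these fields, and the helper names build_entry and build_dataset are hypothetical. Feature extraction is not covered.

```python
import json


def build_entry(img_id, filename, captions, split="test"):
    """Build one image record in the assumed Karpathy-style layout."""
    sentences = [
        {
            "raw": text,
            # Naive tokenization: lowercase, drop a trailing period, split on spaces.
            "tokens": text.lower().rstrip(".").split(),
            "imgid": img_id,
        }
        for text in captions
    ]
    return {"imgid": img_id, "filename": filename, "split": split, "sentences": sentences}


def build_dataset(items, name="custom"):
    """items: iterable of (filename, [caption, ...]) pairs."""
    images = [build_entry(i, fn, caps) for i, (fn, caps) in enumerate(items)]
    return {"dataset": name, "images": images}


if __name__ == "__main__":
    data = build_dataset([("cat.jpg", ["A cat sits on a mat."])])
    print(json.dumps(data, indent=2))
```

Before relying on such a file, compare its keys against the dataset_coco.json that the project's data loader actually reads.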

Could you tell me how to train the model on my own dataset?

Hello! Thank you for open-sourcing such nice work!

Sorry, I am new to this. Could you tell me how to train the model on my own dataset? Thank you very much!

a question about image mask

In train.py (lines 103-110):

103: # for image
104: _visual_mask = torch.zeros((batch_size, visual_token_num), dtype=torch.float32, device=device)
105: # need to mask token content in selected_idx for prediction/generation
106: num_masks = random.randint(max(1, int(0.1 * visual_token_num)), visual_token_num)
107: selected_idx = random.sample(range(visual_token_num), num_masks)
108: _visual_mask[:, selected_idx] = 1
109: mask_position = (_visual_mask == 1).to(torch.long).view(-1)
110: mask_position = mask_position.nonzero().squeeze()

I think '_visual_mask = 1' means the model can see a token, and '_visual_mask = 0' means it cannot. The code above randomly samples positions, which selects the cells of the 8*8 grid that the model can see (_visual_mask = 1). The positions that actually need to be masked are those where _visual_mask equals 0. So line 109 should be changed to
mask_position = (_visual_mask == 0).to(torch.long).view(-1)
Is this right?
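
To make the two readings concrete, the sketch below re-creates lines 104-110 with plain Python lists instead of tensors (an assumption made only so it runs without PyTorch) and returns both candidate results: the flat positions where the mask is 1, as in the current line 109, and those where it is 0, as in the proposed change. It does not settle which convention the model uses. The comment on line 105 reads as though selected_idx holds the tokens to be masked, which would match the existing == 1 test, but the thread contains no maintainer reply. The function name sample_mask is hypothetical.

```python
import random


def sample_mask(batch_size, visual_token_num, rng):
    """Mirror lines 104-110 of the snippet using nested lists."""
    mask = [[0] * visual_token_num for _ in range(batch_size)]
    num_masks = rng.randint(max(1, int(0.1 * visual_token_num)), visual_token_num)
    selected_idx = rng.sample(range(visual_token_num), num_masks)
    for row in mask:
        for idx in selected_idx:
            # Same columns in every row, like _visual_mask[:, selected_idx] = 1
            row[idx] = 1
    flat = [v for row in mask for v in row]  # list equivalent of .view(-1)
    ones = [i for i, v in enumerate(flat) if v == 1]   # current line 109
    zeros = [i for i, v in enumerate(flat) if v == 0]  # proposed change
    return selected_idx, ones, zeros


if __name__ == "__main__":
    selected, ones, zeros = sample_mask(2, 8, random.Random(0))
    print("selected columns:", sorted(selected))
    print("flat positions with mask == 1:", ones)
    print("flat positions with mask == 0:", zeros)
```

Either way, the flat positions are offsets into the batch-flattened token sequence: column c in row r maps to r * visual_token_num + c, and the == 0 variant simply yields the complement of the sampled set.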
