Assistance Requested for Fine-Tuning on Visual Question Answering Task (cogvlm issue, closed)

crux82 commented on June 8, 2024

Assistance Requested for Fine-Tuning on Visual Question Answering Task

from cogvlm.

Comments (4)

zRzRzRzRzRzRzR commented on June 8, 2024

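Bounding boxes also have to follow the plain-text input and output format. If you are fine-tuning, the format is [x,y,w,h], and it can be written directly into the output text. A bbox needs no separate processing; just place it right after the regular text.

Here is an example:

"There are two cats in the image, at [[x1,y1,w1,h1],[x2,y2,w2,h2]]", in this format.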
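As a rough illustration of the format described above, the following minimal sketch builds a label string with the boxes appended as plain text. The helper names `format_boxes` and `build_label` and the sample coordinates are hypothetical and are not part of the CogVLM repository; any coordinate scaling or zero-padding your data needs is left out.

```python
from typing import Sequence


def format_boxes(boxes: Sequence[Sequence[int]]) -> str:
    """Render boxes as '[[x,y,w,h],[x,y,w,h]]' with no spaces (hypothetical helper)."""
    for box in boxes:
        if len(box) != 4:
            raise ValueError(f"expected 4 values per box, got {len(box)}")
    inner = ",".join("[" + ",".join(str(int(v)) for v in box) + "]" for box in boxes)
    return "[" + inner + "]"


def build_label(text: str, boxes: Sequence[Sequence[int]]) -> str:
    """Place the box list directly after the regular text, as plain characters."""
    return f"{text} {format_boxes(boxes)}"


# Made-up coordinates, for illustration only.
label = build_label(
    "There are two cats in the image, at",
    [(10, 20, 30, 40), (50, 60, 70, 80)],
)
print(label)
```

The point of the sketch is only that the label stays an ordinary string, so it can go through the same tokenization path as any other answer text.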


crux82 commented on June 8, 2024

@zRzRzRzRzRzRzR

OK!

First of all, thank you very much for the prompt response.

One last question: from the script example with the dataset from https://www.kaggle.com/datasets/aadhavvignesh/captcha-images, only an image is used as input, without any additional text (in my case, the question), and the output is expected to be a string.

While the output is clear to me, I am now unclear on how to pass the image/text input pair.

Can you help me?

Thank you again!


crux82 commented on June 8, 2024

Dear @zRzRzRzRzRzRzR

I solved it by finding the dataset generator: https://github.com/THUDM/CogVLM/blob/main/utils/utils/dataset.py#L20

Thank you anyway!

Bests

Danilo
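For readers with the same question, here is a minimal sketch of how an image/question/answer triple might be grouped before it is handed to a dataset class. It does not reproduce the linked `dataset.py`; the `VQAExample` class, the `iter_examples` function, and the field names `image`, `question`, and `answer` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class VQAExample:
    """One training sample: an image, the question about it, and the target text."""
    image_path: str
    question: str
    answer: str


def iter_examples(records: List[Dict[str, str]]) -> Iterator[VQAExample]:
    """Yield one VQAExample per record; the record keys are hypothetical."""
    for rec in records:
        yield VQAExample(
            image_path=rec["image"],
            question=rec["question"],
            answer=rec["answer"],
        )


# Made-up record, for illustration only.
records = [
    {"image": "img_0001.png", "question": "What text is shown?", "answer": "a7Xk2"},
]
examples = list(iter_examples(records))
print(examples[0].question)
```

In a setup like this the question would be fed as the text prompt alongside the image, and the answer would be the string the model is trained to produce.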


basteran commented on June 8, 2024

Dear @zRzRzRzRzRzRzR and other authors, I am also interested in fine-tuning your model on a Visual Question Answering task.

Looking through the repository and the scripts, I can't find any mention of how to format bounding boxes in the labels. For example, if I want the model to output something like
"The girl [[000,111,222,333]] is standing tall and the boy [[000,111,222,333]] is sitting on a chair [[000,111,222,333]]"
given a prompt and an image, how should I format these labels? Should I pass them as plain text? Or do I need to transform the bounding boxes into patches first?

Thank you in advance!

