
Comments (4)

zRzRzRzRzRzRzR commented on June 8, 2024

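The bbox also has to follow the text input and output format, written as [x,y,w,h]. If you are fine-tuning, you can output it directly in the text: the bbox needs no separate processing, just place it right after the regular text.

Here is an example:

There are two cats in the image, given as [[x1,y1,w1,h1],[x2,y2,w2,h2]], in this format.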
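To make the format concrete, here is a minimal sketch of building such a label as plain text. The helper names `format_boxes` and `build_label` are hypothetical and not part of CogVLM; the sketch only assumes that each box is four integers in [x,y,w,h] order and that the list is appended after the regular text, as described above.

```python
def format_boxes(boxes):
    """Render boxes as a nested-list string such as [[1,2,3,4],[5,6,7,8]].

    Hypothetical helper: each box is assumed to be (x, y, w, h).
    """
    inner = ",".join(
        "[" + ",".join(str(int(v)) for v in box) + "]" for box in boxes
    )
    return "[" + inner + "]"


def build_label(text, boxes):
    """Append the box list directly after the regular text."""
    return f"{text}{format_boxes(boxes)}"


label = build_label(
    "There are two cats in the image: ",
    [(10, 20, 30, 40), (50, 60, 70, 80)],
)
print(label)
# There are two cats in the image: [[10,20,30,40],[50,60,70,80]]
```

The result is an ordinary string, so it can be used as the target text of a training sample without any extra bbox-specific step.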

from cogvlm.

crux82 commented on June 8, 2024

@zRzRzRzRzRzRzR

OK!

First of all, thank you very much for the prompt response.

One last question: from the script example with the dataset from https://www.kaggle.com/datasets/aadhavvignesh/captcha-images, only an image is used as input, without any additional text (in my case, the question), and the output is expected to be a string.

While the output is clear to me, I am now unclear on how to pass the image/text input pair.

Can you help me?

Thank you again!


crux82 commented on June 8, 2024

Dear @zRzRzRzRzRzRzR

I solved it by finding the dataset generator: https://github.com/THUDM/CogVLM/blob/main/utils/utils/dataset.py#L20

Thank you anyway!

Best,

Danilo


basteran commented on June 8, 2024

Dear @zRzRzRzRzRzRzR and other authors, I am also interested in fine-tuning your model on a Visual Question Answering task.

Looking through the repository and the scripts, I can't find any mention of how to format the bounding boxes in my labels. For example, if I want the model to generate output such as
"The girl [[000,111,222,333]] is standing tall and the boy [[000,111,222,333]] is sitting on a chair [[000,111,222,333]]"
given a prompt and an image, how should I format these labels? Should I pass them as plain text? Don't I need to transform the bounding boxes into patches first?

Thank you in advance!
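For checking labels or generated answers written in the inline style quoted in this question, a small parser is enough. This is a minimal sketch: `extract_boxes` and `strip_boxes` are hypothetical helpers, not CogVLM functions, and the pattern assumes exactly four non-negative integers inside double brackets, as in the example sentence. It does not handle several boxes nested in one list.

```python
import re

# Matches one inline box such as [[000,111,222,333]] (assumed format).
BOX_PATTERN = re.compile(r"\[\[(\d+),(\d+),(\d+),(\d+)\]\]")


def extract_boxes(text):
    """Return every inline box in the text as a tuple of four ints."""
    return [tuple(int(g) for g in m.groups()) for m in BOX_PATTERN.finditer(text)]


def strip_boxes(text):
    """Remove the inline boxes (and the space before each) from the text."""
    return re.sub(r"\s*" + BOX_PATTERN.pattern, "", text)


sample = (
    "The girl [[000,111,222,333]] is standing tall and the boy "
    "[[000,111,222,333]] is sitting on a chair [[000,111,222,333]]"
)
print(extract_boxes(sample))
print(strip_boxes(sample))
```

Converting the digits with `int` drops leading zeros, so `000` becomes `0`; keep the values as strings instead if the zero padding matters for your labels.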
