Comments (4)
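The bbox also has to follow the same text input/output format. The format is [x,y,w,h] (if you are fine-tuning), and the box can be output directly as part of the text: bboxes need no separate processing, just place them right after the regular text.
Here is an example of this format:
There are two cats in the image, at [[x1,y1,w1,h1],[x2,y2,w2,h2]]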
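Because the box is just text, building such a training target comes down to string formatting. A minimal sketch, assuming boxes are given as integer (x, y, w, h) tuples; `format_boxes` is a hypothetical helper, not part of the CogVLM repository:

```python
def format_boxes(boxes):
    """Render a list of (x, y, w, h) boxes as plain text, e.g. [[1,2,3,4],[5,6,7,8]].

    Hypothetical helper for illustration; not taken from the CogVLM code base.
    """
    inner = ",".join("[" + ",".join(str(v) for v in box) + "]" for box in boxes)
    return "[" + inner + "]"

# The boxes are simply appended after the ordinary answer text.
answer = "There are two cats in the image, at " + format_boxes([(10, 20, 30, 40), (50, 60, 70, 80)])
print(answer)  # There are two cats in the image, at [[10,20,30,40],[50,60,70,80]]
```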
from cogvlm.
OK!
First of all, thank you very much for the prompt response.
One last question: in the script example, which uses the dataset from https://www.kaggle.com/datasets/aadhavvignesh/captcha-images, the input is only an image with no additional text (in my case, the question), and the expected output is a string.
The output side is clear to me, but I don't see how to pass an image/text input pair.
Can you help me?
Thank you again!
Dear @zRzRzRzRzRzRzR
I solved it by finding the dataset generator: https://github.com/THUDM/CogVLM/blob/main/utils/utils/dataset.py#L20
Thank you anyway!
Best,
Danilo
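For readers with the same question: a visual question answering sample pairs an image with a text prompt and a text target. The sketch below is purely illustrative; the `VQASample` type, its field names, and the prompt template are hypothetical assumptions, and the repository's actual loader in `utils/utils/dataset.py` may structure samples differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VQASample:
    image_path: str  # path to the input image (hypothetical field name)
    question: str    # text prompt paired with the image
    answer: str      # target text the model should generate

def to_prompt_target(sample: VQASample,
                     template: str = "Question: {q} Answer:") -> Tuple[str, str, str]:
    """Return the (image_path, prompt, target) triple for one training example.

    The template is an illustrative assumption, not CogVLM's own prompt format.
    """
    return sample.image_path, template.format(q=sample.question), sample.answer

samples: List[VQASample] = [
    VQASample("images/0001.png", "How many cats are in the image?", "Two."),
]
for s in samples:
    print(to_prompt_target(s))
```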
Dear @zRzRzRzRzRzRzR and other authors, I am also interested in fine-tuning your model on a Visual Question Answering task.
Looking through the repository and the scripts, I can't find any mention of how to format bounding boxes in the labels. For example, if I want the model to output something like
"The girl [[000,111,222,333]] is standing tall and the boy [[000,111,222,333]] is sitting on a chair [[000,111,222,333]]"
given a prompt and an image, how should I format these labels? Should I pass them as plain text? Don't I need to transform the bounding boxes into patches first?
Thank you in advance!
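Following the reply at the top of this thread, such a label can be passed as plain text, with each box placed right after its phrase and no conversion into patches. A minimal sketch of building it from (phrase, box) pairs, assuming pixel boxes in (x, y, w, h) form that are rescaled to a 0-999 integer grid and zero-padded to three digits to mirror the `[[000,111,222,333]]` example. The scale, the padding, and both helper names (`normalize_box`, `build_label`) are hypothetical and should be checked against the repository's own grounding data.

```python
def normalize_box(box, width, height, scale=999):
    """Map a pixel (x, y, w, h) box onto a 0..scale integer grid (assumed convention)."""
    x, y, w, h = box
    return (
        round(x / width * scale),
        round(y / height * scale),
        round(w / width * scale),
        round(h / height * scale),
    )

def build_label(parts, width, height):
    """Join text fragments and boxes into one plain-text target string.

    `parts` is a list of (text, box_or_None); each box is written right after its text.
    """
    pieces = []
    for text, box in parts:
        pieces.append(text)
        if box is not None:
            # Zero-pad every coordinate to three digits, e.g. 50 -> "050".
            coords = ",".join(f"{v:03d}" for v in normalize_box(box, width, height))
            pieces.append(f" [[{coords}]]")
    return "".join(pieces)

label = build_label(
    [("The girl", (100, 50, 200, 400)),
     (" is standing tall and the boy", (400, 200, 150, 250)),
     (" is sitting on a chair", (380, 300, 220, 180))],
    width=1000, height=1000,
)
print(label)
```

With a 1000x1000 image this prints `The girl [[100,050,200,400]] is standing tall and the boy [[400,200,150,250]] is sitting on a chair [[380,300,220,180]]`. Keeping the label as a single string means the tokenizer handles the coordinates like any other text, which matches the advice that bboxes need no separate processing.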
Related Issues (20)
- GPU selection / multi-GPU
- Deploy
- Chat using one image and three prompt
- Is CogVLM now an open Chinese model? Does the open-source model already support Chinese question answering and fine-tuning on Chinese data?
- [CogVLM-chat-v1.1] LM weights are different with vicuna-7b-v1.5
- Running Gradio app locally results in inappropriate error: "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE."
- Using CogVLM as an API
- Code of finetuning the cogagent on Mind2Web ?
- Deploy CogVLM using Docker
- Could we replace the vicuna-7b directly with stronger llm?
- I want three different answers to the same prompt, clearing the context each time; why are the results always identical?
- Chat with PDF documentation instead of images
- CogAgent visual pretrained model EVA2-CLIP-L
- Does the CogVLM source code support multi-turn dialogue training?
- On how the model's visual grounding works
- Running the fine-tuning script fails with missing arguments
- How do I build a fine-tuning dataset for CogAgent?
- Is it possible to fine-tune CogVLM on two 3090s?
- Loading the cogvlm-chat-hf model fails with Error while deserializing header: MetadataIncompleteBuffer
- What input format should I use to run visual grounding tasks with the model?