Comments (3)
The output integers are not absolute pixel coordinates. They are proportions along each axis, i.e., [0.097, 0.514, 0.283, 0.996].
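To illustrate the point above, here is a minimal Python sketch of converting such proportions into pixel coordinates. It rests on assumptions that the thread does not confirm: the box order is taken to be [x0, y0, x1, y1], and the raw integers are taken to be thousandths of the axis length (so 97 means 0.097). The function names `ints_to_proportions` and `proportions_to_pixels` are hypothetical helpers, not part of the cogvlm codebase.

```python
from typing import List, Sequence, Tuple


def ints_to_proportions(values: Sequence[int], scale: int = 1000) -> List[float]:
    """Turn raw integer outputs into proportions of the axis.

    Assumption: the integers are expressed in units of 1/scale.
    """
    return [v / scale for v in values]


def proportions_to_pixels(
    box: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Scale a proportional box to pixel coordinates.

    Assumption: the box is ordered [x0, y0, x1, y1], with x values
    relative to the image width and y values relative to the height.
    """
    x0, y0, x1, y1 = box
    return (
        round(x0 * width),
        round(y0 * height),
        round(x1 * width),
        round(y1 * height),
    )


if __name__ == "__main__":
    proportions = ints_to_proportions([97, 514, 283, 996])
    # Image size of 640x480 is chosen only for illustration.
    print(proportions_to_pixels(proportions, width=640, height=480))
```

If the model uses a different box order or scale, only the unpacking line and the `scale` argument would need to change.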
from cogvlm.
By the way, last time we tried the prompt "Is there any pesons?" with this image. However, the model output only one bounding box. In general, when there are multiple objects in the image, the model tends to predict only a single box.
The traditional REC (referring expression comprehension) task tends to locate only one object, but you are right that it would be better to support multiple objects, as in detection. I think training on detection data could solve this problem.