Comments (6)
@wacdev Thanks for the question. We have added the "Caption everything in a paragraph" feature.
from caption-anything.
@ttengwang Does the "Caption everything in a paragraph" feature rely on OpenAI's ChatGPT?
Can we use Bing GPT instead?
Yes, a ChatGPT-like LLM is required for paragraph generation. It is fine to replace it with another GPT model, as long as an API is available for the integration.
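To illustrate what "replace it with another GPT" could look like, here is a minimal sketch, not the actual Caption-Anything code. It assumes the paragraph step only needs a function that maps a prompt string to text; the names `TextGenerator`, `build_paragraph_prompt`, `caption_paragraph`, and `echo_backend` are hypothetical, and the assumption that the prompt is built from region captions is mine, not confirmed by the thread.

```python
from typing import Callable, List

# Hypothetical type: any backend that maps a prompt string to generated text.
TextGenerator = Callable[[str], str]


def build_paragraph_prompt(captions: List[str]) -> str:
    """Join region-level captions into one instruction for the LLM (assumed input)."""
    bullet_list = "\n".join(f"- {c}" for c in captions)
    return (
        "Combine the following region captions into one coherent paragraph "
        "describing the image:\n" + bullet_list
    )


def caption_paragraph(captions: List[str], generate: TextGenerator) -> str:
    """Produce a paragraph using whichever backend `generate` wraps."""
    return generate(build_paragraph_prompt(captions)).strip()


def echo_backend(prompt: str) -> str:
    """Stand-in backend for offline testing; a real one would call an LLM API."""
    lines = [line[2:] for line in prompt.splitlines() if line.startswith("- ")]
    return " ".join(lines)


if __name__ == "__main__":
    print(caption_paragraph(["a dog on grass", "a red ball"], echo_backend))
```

Under this design, switching providers would mean writing one more function with the `TextGenerator` shape that calls the other service's API, and passing it in place of `echo_backend`.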
@ttengwang
Another question:
Without a click to drive the prompt, what input does GPT consume?
Can you add an explanation to an existing click-driven image like this one?
https://github.com/ttengwang/Caption-Anything/blob/main/assets/demo1.png
Finally, I just want to thank you for your work. It definitely gives me confidence and a great starting point.
For the last two years I have been looking into audio description services, and I want to integrate one with an affordable wearable camera to aid visually impaired people in their daily lives.
Audio description (also referred to as "description" or "video description") is defined as "the verbal depiction of key visual elements in media and live productions." AD is meant to provide information on visual content that is considered essential to the comprehension of the program.
Thank you so much for your kind words and encouragement. It truly means a lot to our team. You can check out our technical report for more details https://arxiv.org/pdf/2305.02677.pdf
The description of "paragraph generation" is at the bottom of page 5.
Got it.
@ttengwang