Comments (6)
I would like to help write this up. Can you give section headers?
from clip.cpp.
hi @fire,
Thanks for offering a hand in this.
My considerations for this issue are as follows:
- Extending the motivation section with notes on possible use cases, including edge inference on mobile apps and serverless applications, because cold start is an issue with large frameworks such as PyTorch and TensorFlow.
- Welcoming contributions for issues, pull requests and suggestions as discussions.
- Being more expressive in the building section -- currently it only contains commands.
- A more appealing visualization, including a header with, for example, an icon for the license, etc. -- unfortunately I'm not a visual guy :D
- A new section on the usage of our new benchmarking binary described in #23
- Anything else you think can be improved.
Feel free to contribute to any of them.
from clip.cpp.
Motivation for the clip.cpp Project
CLIP helps computers understand images and text together. It's used in many areas, like when you search for an image online or when a computer needs to describe what's in an image without any help.
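The "without any help" part is zero-shot labeling: CLIP embeds the image and a set of candidate captions into the same vector space and picks the caption whose embedding is most similar to the image's. Here is a toy sketch of just that scoring step; the embedding values below are made up for illustration, and in practice clip.cpp's image and text encoders would produce them:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings standing in for CLIP encoder outputs (invented values).
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.2],
}

# The caption whose text embedding is closest to the image embedding wins.
scores = {label: cosine_sim(image_emb, emb) for label, emb in label_embs.items()}
best = max(scores, key=scores.get)
print(best)  # the cat caption scores highest for this toy image vector
```

The same ranking idea scales to any number of candidate labels, which is why no task-specific training is needed.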
What's Special About This Project?
- Size: The project is very small; it can run multi-modal models as small as 85.6 MB. This means clip.cpp can be used on devices that don't have a lot of storage space.
- Startup Time: clip.cpp starts up quickly. This is important because programs can take a long time to start, especially on servers and phones where a fast start-up is crucial.
from clip.cpp.
A more appealing visualization, including a header with, for example, an icon for the license, etc. -- unfortunately I'm not a visual guy :D
I'd like a video showing the command being typed in a terminal with a PNG, side by side with the photo, and the result being returned.
from clip.cpp.
My understanding is that this could be used with a BLIP caption model, such as `blip-base`, for zero-shot image labeling. Is that correct?
I think this project could gain a lot of traction if we can get ViT-bigG-14 and ViT-L-14/openai working. These are the clip models used for text encoding during sdxl training. (ref)
It would be amazing to get blip-base and blip2-2.7b working. I haven’t looked into the papers to find out which caption model they used.
from clip.cpp.
this could be used with a blip caption model
Yes, BLIP and other large multimodal models are a CLIP feature extractor + some bridging mechanism that projects CLIP hidden states into the language model embeddings + a large language model like OPT, Vicuna, T5, etc. This will be another project, see #31
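That bridging step can be sketched in a few lines: project each CLIP patch embedding into the language model's embedding space, then prepend the projected "visual tokens" to the prompt's token embeddings. All shapes and values below are invented for illustration; real models learn the projection:

```python
# Hypothetical shapes: CLIP hidden states (n_patches x d_clip) are projected
# into the language model's embedding space (d_lm) and prepended to the
# token embeddings of the text prompt.
def project(hidden_states, proj):
    # Plain matrix multiply: (n x d_clip) @ (d_clip x d_lm) -> (n x d_lm).
    return [[sum(h[k] * proj[k][j] for k in range(len(proj)))
             for j in range(len(proj[0]))]
            for h in hidden_states]

d_clip, d_lm = 4, 3
hidden = [[0.1] * d_clip, [0.2] * d_clip]  # two CLIP patch embeddings (toy)
# Toy projection: identity on the first d_lm dimensions, zeros elsewhere.
proj = [[1.0 if i == j else 0.0 for j in range(d_lm)] for i in range(d_clip)]

visual_tokens = project(hidden, proj)
prompt_tokens = [[0.5] * d_lm]             # pretend LM embedding of one prompt token
lm_input = visual_tokens + prompt_tokens   # visual prefix + text, fed to the LLM
print(len(lm_input))  # 3 "tokens" enter the language model
```

The point is that the language model never sees pixels, only a learned translation of CLIP's features into its own embedding space.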
we can get ViT-bigG-14 and ViT-L-14/openai working
Large OpenAI and Open CLIP variants are already working in this project. But Stable Diffusion is a long story on its own. It's also another project that I want to use clip.cpp in, but yeah, the level of traction is also important to devote time to all of it.
from clip.cpp.