Comments (6)
I would like to help write this up. Can you give section headers?
from clip.cpp.
hi @fire,
Thanks for offering a hand in this.
My considerations for this issue are as follows:
- Extending the motivation section with notes on possible use cases, including edge inference on mobile apps and serverless applications, because cold start is an issue with large frameworks such as PyTorch and TensorFlow.
- Welcoming contributions for issues, pull requests and suggestions as discussions.
- Being more expressive in the building section -- currently it only contains commands.
- A more appealing visualization, including a header with, for example, an icon for the license, etc. -- unfortunately I'm not a visual guy :D
- A new section on the usage of our new benchmarking binary described in #23
- Anything else you think can be improved.
Feel free to contribute to any of them.
from clip.cpp.
Motivation for the clip.cpp Project
CLIP helps computers understand images and text together. It's used in many areas, like when you search for an image online or when a computer needs to describe what's in an image without any help.
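The "without any help" part is zero-shot labeling: CLIP embeds the image and a set of candidate captions into the same vector space and picks the caption whose embedding is most similar to the image's. Here is a toy sketch of just that scoring step; the embedding values below are made up for illustration, and in practice clip.cpp's image and text encoders would produce them:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings standing in for CLIP encoder outputs (invented values).
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.2],
}

# The caption whose text embedding is closest to the image embedding wins.
scores = {label: cosine_sim(image_emb, emb) for label, emb in label_embs.items()}
best = max(scores, key=scores.get)
print(best)  # the cat caption scores highest for this toy image vector
```

The same ranking idea scales to any number of candidate labels, which is why no task-specific training is needed.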
What's Special About This Project?
- Size: The project is very small; it can run multi-modal models as small as 85.6 MB. This means clip.cpp can be used on devices that don't have a lot of storage space.
- Startup Time: clip.cpp starts up quickly. This is important because programs can take a long time to start, especially on servers and phones where a fast start-up is crucial.
from clip.cpp.
A more appealing visualization, including a header with, for example, an icon for the license, etc. -- unfortunately I'm not a visual guy :D
I'd like a video showing the command being typed in a terminal with a PNG, side by side with the photo, and the result being returned.
from clip.cpp.
My understanding is that this could be used with a BLIP caption model, such as `blip-base`, for zero-shot image labeling. Is that correct?
I think this project could gain a lot of traction if we can get ViT-bigG-14 and ViT-L-14/openai working. These are the clip models used for text encoding during sdxl training. (ref)
It would be amazing to get blip-base and blip2-2.7b working. I haven’t looked into the papers to find out which caption model they used.
from clip.cpp.
this could be used with a blip caption model
Yes, BLIP and other large multimodal models are a CLIP feature extractor + some bridging mechanism that projects CLIP hidden states into the language model embeddings + a large language model like OPT, Vicuna, T5, etc. This will be another project, see #31
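That bridging step can be sketched in a few lines: project each CLIP patch embedding into the language model's embedding space, then prepend the projected "visual tokens" to the prompt's token embeddings. All shapes and values below are invented for illustration; real models learn the projection:

```python
# Hypothetical shapes: CLIP hidden states (n_patches x d_clip) are projected
# into the language model's embedding space (d_lm) and prepended to the
# token embeddings of the text prompt.
def project(hidden_states, proj):
    # Plain matrix multiply: (n x d_clip) @ (d_clip x d_lm) -> (n x d_lm).
    return [[sum(h[k] * proj[k][j] for k in range(len(proj)))
             for j in range(len(proj[0]))]
            for h in hidden_states]

d_clip, d_lm = 4, 3
hidden = [[0.1] * d_clip, [0.2] * d_clip]  # two CLIP patch embeddings (toy)
# Toy projection: identity on the first d_lm dimensions, zeros elsewhere.
proj = [[1.0 if i == j else 0.0 for j in range(d_lm)] for i in range(d_clip)]

visual_tokens = project(hidden, proj)
prompt_tokens = [[0.5] * d_lm]             # pretend LM embedding of one prompt token
lm_input = visual_tokens + prompt_tokens   # visual prefix + text, fed to the LLM
print(len(lm_input))  # 3 "tokens" enter the language model
```

The point is that the language model never sees pixels, only a learned translation of CLIP's features into its own embedding space.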
we can get ViT-bigG-14 and ViT-L-14/openai working
Large OpenAI and Open CLIP variants are already working in this project. But Stable Diffusion is a long story on its own. It's also another project that I want to use clip.cpp in, but yeah, the level of traction is also important to devote time to all of it.
from clip.cpp.