Comments (6)

fire commented on May 23, 2024

I would like to help write this up. Can you give section headers?

from clip.cpp.

monatis commented on May 23, 2024

Hi @fire,

Thanks for offering a hand with this.
My considerations for this issue are as follows:

  • Extending the motivation section with notes on possible use cases, including edge inference in mobile apps and serverless applications, where cold start is an issue with large frameworks such as PyTorch and TensorFlow.
  • Welcoming contributions for issues, pull requests and suggestions as discussions.
  • Being more expressive in the building section --currently it only contains commands.
  • A more appealing visualization, including a header with, for example, icon for license etc. --unfortunately I'm not a visual guy :D
  • A new section for usage of our new benchmarking binary described in #23
  • Anything else you think can be improved.

Feel free to contribute to any of them.

fire commented on May 23, 2024

Motivation for the clip.cpp Project

CLIP helps computers understand images and text together. It's used in many areas, like when you search for an image online or when a computer needs to describe what's in an image without any help.

What's Special About This Project?

  • Size: The project is very small and can use 85.6 MB multi-modal generative models, so clip.cpp can run on devices that don't have a lot of storage space.

  • Startup Time: clip.cpp starts up quickly. This is important because programs sometimes take a long time to start, and on servers and phones a quick startup is crucial.

fire commented on May 23, 2024

A more appealing visualization, including a header with, for example, icon for license etc. --unfortunately I'm not a visual guy :D

I would like a video showing the command being typed in a terminal with a PNG as input, with the photo shown side by side and the result being returned.

Kwisss commented on May 23, 2024

My understanding is that this could be used with a blip caption model, such as ‘blip-base’, for zero-shot image labeling. Is that correct?

I think this project could gain a lot of traction if we can get ViT-bigG-14 and ViT-L-14/openai working. These are the CLIP models used for text encoding during SDXL training. (ref)

It would be amazing to get blip-base and blip2-2.7b working. I haven’t looked into the papers to find out which caption model they used.

monatis commented on May 23, 2024

this could be used with a blip caption model

Yes, BLIP and other large multimodal models consist of a CLIP feature extractor, a bridging mechanism that projects CLIP hidden states to the language model embeddings, and a large language model such as OPT, Vicuna, T5, etc. This will be another project; see #31.
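To make the three-part structure above concrete, here is a minimal sketch of the bridging step only, assuming the simplest possible bridge: a single linear projection. It is not code from clip.cpp or from any BLIP implementation; the function name `project`, the tiny dimensions, and the random values standing in for CLIP hidden states and text token embeddings are all hypothetical.

```python
import random


def project(hidden_states, weight, bias):
    """Linearly project each hidden state into the language model's embedding space.

    hidden_states: list of vectors, each of length in_dim (one per image patch).
    weight: out_dim rows, each of length in_dim (hypothetical learned parameters).
    bias: vector of length out_dim.
    """
    return [
        [sum(h * w for h, w in zip(state, row)) + b for row, b in zip(weight, bias)]
        for state in hidden_states
    ]


# Hypothetical, deliberately tiny sizes; real models use hundreds or thousands.
CLIP_DIM, LM_DIM, N_PATCHES, N_TEXT_TOKENS = 4, 6, 3, 2
rng = random.Random(0)

# Stand-in for the hidden states a CLIP vision encoder would produce.
clip_hidden = [[rng.uniform(-1, 1) for _ in range(CLIP_DIM)] for _ in range(N_PATCHES)]

# Stand-in for the bridge parameters (learned during training in a real system).
weight = [[rng.uniform(-1, 1) for _ in range(CLIP_DIM)] for _ in range(LM_DIM)]
bias = [0.0] * LM_DIM

# Projected image features now have the same width as text token embeddings,
# so they can be placed in front of the prompt fed to the language model.
image_prefix = project(clip_hidden, weight, bias)
text_embeddings = [[rng.uniform(-1, 1) for _ in range(LM_DIM)] for _ in range(N_TEXT_TOKENS)]
lm_input = image_prefix + text_embeddings

print(len(lm_input), len(lm_input[0]))
```

Real bridging mechanisms can be considerably more elaborate than one linear layer; the sketch only shows why a projection is needed, namely that the CLIP hidden size and the language model embedding size generally differ.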

we can get ViT-bigG-14 and ViT-L-14/openai working

Large OpenAI and OpenCLIP variants are already working in this project. But Stable Diffusion is a long story on its own. It's also another project that I want to use clip.cpp in, but yeah, the level of traction is also important for devoting time to all of it.
