monatis / clip.cpp
CLIP inference in plain C/C++ with no extra dependencies
License: MIT License
use any of the models here: https://huggingface.co/Green-Sky/ggml_openai_clip-vit-base-patch16
clip_model_load: ggml ctx size = 287.12 MB
.................................................clip_model_load: model size = 285.77 MB / num tensors = 397
clip_model_load: 8 MB of compute buffer allocated
clip_model_load: model loadded
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 16992432, available 16777216)
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:4341: ggml_new_tensor_impl: Assertion `false' failed.
Aborted (core dumped)
I have created a vision-only model with the convert tool. When I try to load it and use it with the Python binding as shown below, my memory just explodes until my system crashes.
model = Clip(
model_path_or_repo_id="/path/to/model",
)
# This is where it explodes
model.load_preprocess_encode_image("/path/to/image")
I have tried both with and without quantization, with the same results. When I run the same with the full model, i.e. not vision-only, it works. I didn't investigate further, and this is a decent workaround for now; just wanted to let you know :)
Iterating over a path that has images and calculating the image embeddings in Python is really expensive, so I was thinking maybe it can be offloaded into the C code and exposed in the Python binding.
As an example, from the docs notebook:
image_files = [...]  # list of image paths
# ⚠️ it takes about ~30 min to embed 5000 images of the fashion dataset
image_embeddings = [model.load_preprocess_encode_image(im) for im in tqdm(image_files)]
image_embeddings = np.array(image_embeddings, dtype=np.float16)
The preferred behavior might look like this:
image_files = [...]  # list of image paths
# accepts a list of image files;
# load_preprocess_encode_images iterates over and processes them in C, exposed to the bindings
image_embeddings = model.load_preprocess_encode_images(image_files)
image_embeddings = np.array(image_embeddings, dtype=np.float16)
https://huggingface.co/openai/clip-vit-large-patch14-336/
This model has a larger input image size (336).
clip_model_load: ggml ctx size = 819.86 MB
.........................................................................clip_model_load: model size = 817.09 MB / num tensors = 589
clip_model_load: 16 MB of compute buffer allocated
clip_model_load: model loadded
GGML_ASSERT: /home/green/workspace/clip.cpp/clip.cpp:1086: nx == image_size && ny == image_size
Aborted (core dumped)
Issue Description:
Running the image-search-build command crashes with a segmentation fault and core dump when the specified image folder contains more than one image. This occurs only when there are multiple images in the folder; it works fine with a single image.
Steps to Reproduce:
$ bin/image-search-build -m ../models/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_1.bin ../tests
clip_model_load: loading model from '../models/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_1.bin' - please wait....................................................clip_model_load: model size = 93.92 MB / num tensors = 397
clip_model_load: model loaded
main: starting base dir scan of '../tests'
main: processing 2 files in 'tests'
.Segmentation fault (core dumped)
Expected Behavior:
The program should handle folders with multiple images correctly and generate the expected output.
Actual Behavior:
The program crashes with a segmentation fault when processing a folder with multiple images, resulting in a core dump.
Your help and support in addressing this matter would be greatly appreciated. Thank you!
Hello, first of all thank you for this!
Could you please provide compiled binaries so we don't need to compile the project ourselves, like stable-diffusion.cpp? Then all we'd need to do is download from a release and use it.
Kind regards
% uname -mps
Darwin arm64 arm
git clone https://github.com/monatis/clip.cpp.git --recurse-submodules clip.cpp
cd clip.cpp
mkdir build
cd build
cmake .. -GNinja
ninja
Error
CMake Warning at ggml/src/CMakeLists.txt:48 (message):
Your arch is announced as x86_64, but it seems to actually be ARM64
See also ggerganov/whisper.cpp#66 (comment)
I'm still not 100% sure whether to call it llava.cpp or another name to indicate future support for other multimodal generation models -- maybe multimodal.cpp or lmm.cpp (large multimodal model). Open to suggestions, but let's call it llava.cpp as a code name for now.
Update CMakeLists.txt with a flag CLIP_STANDALONE to toggle standalone mode. When ON, build against the ggml submodule. When OFF, build with the ggml.h and ggml.c files directly included in llama.cpp. Add the clip.cpp and llama.cpp repos as submodules and build with CLIP_STANDALONE=OFF to build against the ggml sources included in llama.cpp.
Now that we have Python bindings implemented, it would be great if we provide a pip-installable package.
There might be some complexities in shipping the binary DLL for different platforms and SIMD instruction sets, but x86_64 binaries with AVX2 for Linux and Windows should be sufficient in the first place. Other platforms and instruction sets (e.g., AVX512) can build from source.
Also, make sure to include the correct Python version in the documentation.
It should accept one image and at least two texts, and label the image with one of the texts in a zero-shot fashion.
Together with #60, it would be awesome to support downloading pre-converted models from HF Hub.
Internally, we can have a dict of model names and their URLs.
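A minimal sketch of what that registry could look like on the Python side; the model name and URL below are placeholders for illustration, not actual release assets:

```python
import os
import urllib.request

# Hypothetical registry of pre-converted models on HF Hub; names and URLs
# here are placeholders, not real repos or files.
MODEL_URLS = {
    "clip-vit-base-patch32-q4_0": "https://huggingface.co/<repo>/resolve/main/<file>.bin",
}

def download_model(name: str, cache_dir: str = "~/.cache/clip_cpp") -> str:
    """Download a pre-converted model by name and return its local path."""
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(MODEL_URLS[name]))
    if not os.path.exists(path):
        urllib.request.urlretrieve(MODEL_URLS[name], path)
    return path
```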
pip install clip_cpp works great for x86_64 Linux distributions, and I'm hoping we can expand this to support other architectures. Specifically, I am hoping for Darwin arm64. 🙏 🙇
Since the entire public API is C-compatible, JNI or JNA might be possible now. That could be used for interesting use cases such as image search directly on Android phones etc.
Many companies require a permissive license to allow use of open-source code (typically MIT or Apache2). Some projects wish to restrict use to research or personal (often with Creative Commons BY-NC-SA). For what it's worth, ggml uses the commercial-friendly MIT license.
Would you mind adding a license file that fits with your goals for the project? That clarity would be greatly appreciated.
Line 802 in f2b5c61: loadded -> loaded
Currently larger models don't load (tested with https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K).
clip_model_load: loading model from '../models/laion_clip-vit-h-14-laion2b-s32b-b79k/ggml-model-f16.bin' - please wait...clip_model_load: n_vocab = 49408
clip_model_load: num_positions = 77
clip_model_load: t_hidden_size = 1024
clip_model_load: t_n_intermediate = 4096
clip_model_load: t_n_head = 16
clip_model_load: t_n_layer = 24
clip_model_load: image_size = 224
clip_model_load: patch_size = 14
clip_model_load: v_hidden_size = 1280
clip_model_load: v_n_intermediate = 5120
clip_model_load: v_n_head = 16
clip_model_load: v_n_layer = 32
clip_model_load: ftype = 1
clip_model_load: ggml ctx size = 1887.22 MB
.................................................................................................................clip_model_load: model size = 1882.50 MB / num tensors = 909
clip_model_load: model loadded
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 100900784, available 100663296)
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:4131: ggml_new_tensor_impl: Assertion `false' failed.
Aborted (core dumped)
After modifying line 713 in a12792d (* 100ul), it gets further, but now fails with:
clip_model_load: ggml ctx size = 1887.22 MB
.................................................................................................................clip_model_load: model size = 1882.50 MB / num tensors = 909
clip_model_load: model loadded
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:11044: ggml_compute_forward_soft_max_f32: Assertion `!isnan(sp[i])' failed.
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:11044: ggml_compute_forward_soft_max_f32: Assertion `!isnan(sp[i])' failed.
Aborted (core dumped)
Decide on a common dataset to do proper benchmarking.
Ideally, it should be able to compare both inference speed and vector quality, e.g., zero-shot labeling, image retrieval, etc.
Hi,
I have been trying out the LLaVA models in llama.cpp and they work great! I am curious about the CLIP model and clip.cpp used in llama.cpp:
I would like to use the same CLIP model to encode texts and images from the CLIP models from LLaVA, but it seems like the source code in llama.cpp does not have a way to encode text. I am curious whether it is possible to add text encoding capabilities to the clip.cpp inside llama.cpp, or whether it is possible to load LLaVA's CLIP in clip.cpp and gain text encoding capabilities.
I hope this is not confusing and is the right place to ask. Thank you for the great work!
As reported in #44, ZSL doesn't match HF's behavior:
After reviewing HF's code for ZSL, I figured out that they don't normalize vectors prior to the dot product calculation in ZSL. In clip.cpp, all encoding functions return normalized vectors, so we need to make normalization optional. This will require a signature change for those functions.
Additionally, we can write a single function that runs the zero-shot labeling task end-to-end, similar to clip_compare_text_and_image, and expose that function to the Python binding as well.
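A rough sketch of that end-to-end helper from the Python side, assuming the encoders can return raw (unnormalized) embeddings once the optional normalization lands; the function name and shapes are illustrative:

```python
import numpy as np

def zero_shot_label(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """image_emb: (d,), text_embs: (n_labels, d); returns label probabilities."""
    logits = text_embs @ image_emb        # raw dot products, no prior normalization
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()
```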
Currently, we set the memory buffer to a fixed size, which can be improved as follows: calculate the required buffer size in the clip_model_load function and use that value in the clip_*_encode functions. The second path may slow down initialization for many users, so it should be explicitly turned on even if it's chosen for implementation; I'm not sure that it's user-friendly.
It can be helpful for optimizing memory usage, especially for larger models.
This is necessary for use cases where embeddings must be accessed from Python for further processing. See this.
Hi, I am working on a side project based on stable-diffusion.cpp which really needs the functionality of your CLIP implementation. What's the best way to use clip? Also, it seems the ggml version that clip.cpp depends on is really old. Could you update it? Thanks for your efforts.
Hi. I can compile and run clip.cpp's main in the normal way and it works. It's cool, thanks.
I cannot compile and run clip.cpp's image-search functionality using cmake -DCLIP_BUILD_IMAGE_SEARCH=ON. With that flag it compiles the normal executables fine, but it fails upon reaching image-search. I attempted to go into the _deps for usearch and build them myself, but that also failed to put them in the right place, I think. I don't really know what the "error: ‘cos_gt’ is not a member of ‘unum::usearch’" error means; I've been assuming the libs just aren't being found.
I am attempting to build on Debian 11 with g++ (Debian 10.2.1-6) 10.2.1 20210110. cmake version 3.18.4.
superkuh@janus:~/app_installs/clip.cpp/build4$ cmake -DCLIP_BUILD_IMAGE_SEARCH=ON ..
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/superkuh/app_installs/clip.cpp/build4
superkuh@janus:~/app_installs/clip.cpp/build4$ l
total 84K
drwxr-xr-x 13 superkuh superkuh 4.0K Oct 8 12:25 ..
drwxr-xr-x 5 superkuh superkuh 4.0K Oct 8 12:25 _deps
-rw-r--r-- 1 superkuh superkuh 21K Oct 8 12:25 CMakeCache.txt
-rw-r--r-- 1 superkuh superkuh 12K Oct 8 12:25 Makefile
drwxr-xr-x 4 superkuh superkuh 4.0K Oct 8 12:25 ggml
-rw-r--r-- 1 superkuh superkuh 2.1K Oct 8 12:25 cmake_install.cmake
drwxr-xr-x 2 superkuh superkuh 4.0K Oct 8 12:25 bin
drwxr-xr-x 3 superkuh superkuh 4.0K Oct 8 12:25 models
drwxr-xr-x 4 superkuh superkuh 4.0K Oct 8 12:25 examples
drwxr-xr-x 3 superkuh superkuh 4.0K Oct 8 12:25 tests
-rw-r--r-- 1 superkuh superkuh 7.4K Oct 8 12:25 compile_commands.json
drwxr-xr-x 5 superkuh superkuh 4.0K Oct 8 12:25 CMakeFiles
drwxr-xr-x 9 superkuh superkuh 4.0K Oct 8 12:25 .
superkuh@janus:~/app_installs/clip.cpp/build4$ make
Scanning dependencies of target ggml
[ 4%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 8%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 13%] Linking C static library libggml.a
[ 13%] Built target ggml
Scanning dependencies of target clip
[ 17%] Building CXX object CMakeFiles/clip.dir/clip.cpp.o
[ 21%] Linking CXX static library libclip.a
[ 21%] Built target clip
Scanning dependencies of target quantize
[ 26%] Building CXX object models/CMakeFiles/quantize.dir/quantize.cpp.o
[ 30%] Linking CXX executable ../bin/quantize
[ 30%] Built target quantize
Scanning dependencies of target common-clip
[ 34%] Building CXX object examples/CMakeFiles/common-clip.dir/common-clip.cpp.o
[ 39%] Linking CXX static library libcommon-clip.a
[ 39%] Built target common-clip
Scanning dependencies of target extract
[ 43%] Building CXX object examples/CMakeFiles/extract.dir/extract.cpp.o
[ 47%] Linking CXX executable ../bin/extract
[ 47%] Built target extract
Scanning dependencies of target simple_c
[ 52%] Building C object examples/CMakeFiles/simple_c.dir/simple.c.o
[ 56%] Linking CXX executable ../bin/simple_c
[ 56%] Built target simple_c
Scanning dependencies of target zsl
[ 60%] Building CXX object examples/CMakeFiles/zsl.dir/zsl.cpp.o
[ 65%] Linking CXX executable ../bin/zsl
[ 65%] Built target zsl
Scanning dependencies of target main
[ 69%] Building CXX object examples/CMakeFiles/main.dir/main.cpp.o
[ 73%] Linking CXX executable ../bin/main
[ 73%] Built target main
Scanning dependencies of target image-search
[ 78%] Building CXX object examples/image-search/CMakeFiles/image-search.dir/search.cpp.o
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp: In function ‘int main(int, char**)’:
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:114:44: error: ‘cos_gt’ is not a member of ‘unum::usearch’
114 | unum::usearch::index_gt<unum::usearch::cos_gt<float>> embd_index;
| ^~~~~~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:114:44: error: ‘cos_gt’ is not a member of ‘unum::usearch’
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:114:56: error: template argument 1 is invalid
114 | unum::usearch::index_gt<unum::usearch::cos_gt<float>> embd_index;
| ^~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:116:16: error: request for member ‘view’ in ‘embd_index’, which is of non-class type ‘int’
116 | embd_index.view("images.usearch");
| ^~~~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:127:47: error: request for member ‘size’ in ‘embd_index’, which is of non-class type ‘int’
127 | if (image_file_index.size() != embd_index.size()) {
| ^~~~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:158:31: error: request for member ‘search’ in ‘embd_index’, which is of non-class type ‘int’
158 | auto results = embd_index.search({vec.data(), vec.size()}, params.n_results);
| ^~~~~~
make[2]: *** [examples/image-search/CMakeFiles/image-search.dir/build.make:82: examples/image-search/CMakeFiles/image-search.dir/search.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:473: examples/image-search/CMakeFiles/image-search.dir/all] Error 2
make: *** [Makefile:149: all] Error 2
I have looked in compile_commands.json for the relevant section that is failing and tried it by itself many times while altering other aspects (like where the _deps artifacts I compiled manually were).
superkuh@janus:~/app_installs/clip.cpp/build4/examples/image-search$ /usr/bin/c++ -DUSEARCH_USE_NATIVE_F16=0 -DUSEARCH_USE_OPENMP=0 -DUSEARCH_USE_SIMSIMD=0 -I/home/superkuh/app_installs/clip.cpp/. -I/home/superkuh/app_installs/clip.cpp/examples -I/home/superkuh/app_installs/clip.cpp/ggml/src/. -I/home/superkuh/app_installs/clip.cpp/ggml/src/../include -I/home/superkuh/app_installs/clip.cpp/ggml/src/../include/ggml -I/home/superkuh/app_installs/clip.cpp/build4/_deps/usearch-src/include -O3 -DNDEBUG -march=native -mf16c -mfma -mavx -mavx2 -o CMakeFiles/image-search.dir/search.cpp.o -c /home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp: In function ‘int main(int, char**)’:
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:114:44: error: ‘cos_gt’ is not a member of ‘unum::usearch’
114 | unum::usearch::index_gt<unum::usearch::cos_gt<float>> embd_index;
| ^~~~~~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:114:44: error: ‘cos_gt’ is not a member of ‘unum::usearch’
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:114:56: error: template argument 1 is invalid
114 | unum::usearch::index_gt<unum::usearch::cos_gt<float>> embd_index;
| ^~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:116:16: error: request for member ‘view’ in ‘embd_index’, which is of non-class type ‘int’
116 | embd_index.view("images.usearch");
| ^~~~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:127:47: error: request for member ‘size’ in ‘embd_index’, which is of non-class type ‘int’
127 | if (image_file_index.size() != embd_index.size()) {
| ^~~~
/home/superkuh/app_installs/clip.cpp/examples/image-search/search.cpp:158:31: error: request for member ‘search’ in ‘embd_index’, which is of non-class type ‘int’
158 | auto results = embd_index.search({vec.data(), vec.size()}, params.n_results);
| ^~~~~~
I assumed the deps weren't being built. So I tried to build them myself,
superkuh@janus:~/app_installs/clip.cpp/build4/_deps/usearch-src$ cmake -DUSEARCH_BUILD_CLIB=YES .
-- Configuring done
-- Generating done
-- Build files have been written to: /home/superkuh/app_installs/clip.cpp/build4/_deps/usearch-src
superkuh@janus:~/app_installs/clip.cpp/build4/_deps/usearch-src$ make
[ 33%] Built target bench
[ 66%] Built target test
Scanning dependencies of target usearch_c
[ 83%] Building CXX object c/CMakeFiles/usearch_c.dir/lib.cpp.o
[100%] Linking CXX shared library ../libusearch_c.so
[100%] Built target usearch_c
But this, and similar attempts to build the various other parts of the USearch _deps, did not help. I cannot seem to get the clip.cpp compile to know where the USearch libs are, if that is actually the problem.
Some LAION checkpoints, the large variant for example, use different mean and std values for image normalization. Figure out a way to, preferably, encode what values to use, or introduce another function to preprocess with these values.
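For reference, a sketch of the per-channel normalization step with the OpenAI defaults; the exact values a given LAION checkpoint expects would have to be read from its config:

```python
import numpy as np

# OpenAI CLIP defaults; LAION variants like the large checkpoint may differ.
OPENAI_MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
OPENAI_STD = np.array([0.26862954, 0.26130258, 0.27577711])

def normalize_image(pixels: np.ndarray, mean=OPENAI_MEAN, std=OPENAI_STD) -> np.ndarray:
    """pixels: HxWx3 float array scaled to [0, 1]."""
    return (pixels - mean) / std
```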
Use image only for scanning an image and finding its classes.
The GGUF library is needed to convert the model, so it should be added to requirements.txt and the description.
I just read through the open_clip readme and found this section:
NOTE: Many existing checkpoints use the QuickGELU activation from the original OpenAI models. This activation is actually less efficient than native torch.nn.GELU in recent versions of PyTorch. The model defaults are now nn.GELU, so one should use model definitions with -quickgelu postfix for the OpenCLIP pretrained weights. All OpenAI pretrained weights will always default to QuickGELU. One can also use the non -quickgelu model definitions with pretrained weights using QuickGELU but there will be an accuracy drop, for fine-tune that will likely vanish for longer runs.
Does that mean that we should not use QuickGELU, or that we should insert it as a hyperparameter?
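For reference, QuickGELU is just a sigmoid-based approximation of GELU, so the converted model could carry a flag selecting between the two; a minimal numpy sketch:

```python
import numpy as np

def quick_gelu(x: np.ndarray) -> np.ndarray:
    # QuickGELU as used by the original OpenAI checkpoints: x * sigmoid(1.702 * x)
    return x / (1.0 + np.exp(-1.702 * x))
```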
Hi, awesome work on this project!
I'm building some Swift apps using llama.cpp, and I'd love to try getting clip.cpp running on my app too.
I'm curious if you're going to support running clip.cpp on Metal like llama.cpp?
Would love it if the load_preprocess_encode_image Python binding allowed me to pass in a PIL image instead of just a file path. In workloads where you already have an image in memory, it's a huge hit to latency to have to save the image to disk just so it can be passed into a clip_cpp model. Any time we can avoid disk reads/writes is a benefit.
(Note: it doesn't have to be a PIL Image, but that's what I have.)
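In the meantime, a workaround sketch that round-trips the PIL image through a temporary file; this is exactly the disk write the requested binding would avoid:

```python
import tempfile
from clip_cpp import Clip

def encode_pil_image(model: Clip, pil_image):
    # Save the in-memory image to a temporary file, since the current
    # binding only accepts a file path.
    with tempfile.NamedTemporaryFile(suffix=".png") as f:
        pil_image.save(f, format="PNG")
        f.flush()
        return model.load_preprocess_encode_image(f.name)
```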
I'm testing the Python binding from ./examples on a Darwin arm64 (Apple Silicon M2) system. I run into an AttributeError in line 202 and again in line 206.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.11/ctypes/__init__.py", line 389, in __getattr__
func = self.__getitem__(name)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.11/ctypes/__init__.py", line 394, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: dlsym(0x8ea2fac0, make_clip_image_u8): symbol not found
I compiled using these options:
cmake -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DCLIP_NATIVE=ON -DBUILD_SHARED_LIBS=ON ..
Any help would be much appreciated 🙏 I'm new to cpp. @monatis
Been following your progress with anticipation. Here are a couple of notes:
"projection_dim"
from config
instead (both times).model | a dog |
a red apple |
---|---|---|
openai b32 | 0.228 | 0.341 |
laion b32 laion2b s34b b79k | 0.126 | 0.345 |
keep up the good work!
Hello, first of all, super thanks for your awesome work on clip.cpp!
I encountered an error while experimenting with your Python binding.
I installed the clip_cpp Python binding with:
pip install clip_cpp
While the install is successful, importing the Clip model with
from clip_cpp import Clip
results in this error, where it cannot link libggml.so:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-3-3550bf91251e> in <cell line: 1>()
----> 1 from clip_cpp import Clip
2 frames
/usr/lib/python3.10/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
372
373 if handle is None:
--> 374 self._handle = _dlopen(self._name, mode)
375 else:
376 self._handle = handle
OSError: libggml.so: cannot open shared object file: No such file or directory
I'm running the clip-vit-base-patch32_ggml model on my Intel Mac, and it looks like the lower the quantization, the slower the image encoding is. I tried the main clip-vit-base-patch32_ggml-model-f32.gguf model and the q8_0 and q4_0 variants.
These are the encode times I get for a batch of 4 images:

| model | avg batch image encode time |
|---|---|
| clip-vit-base-patch32_ggml-model-f32.gguf | 272.21 ms |
| clip-vit-base-patch32_ggml-model-f16.gguf | 665.07 ms |
| clip-vit-base-patch32_ggml-model-q8_0.gguf | 333.96 ms |
| clip-vit-base-patch32_ggml-model-q5_1.gguf | 322.71 ms |
| clip-vit-base-patch32_ggml-model-q5_0.gguf | 354.86 ms |
| clip-vit-base-patch32_ggml-model-q4_1.gguf | 330.20 ms |
| clip-vit-base-patch32_ggml-model-q4_0.gguf | 539.32 ms |
f16 looks like an outlier, taking the most time. But going f32 (272.21 ms) -> q8_0 (333.96 ms) -> q5_0 (354.86 ms) -> q4_0 (539.32 ms), time gets worse as quantization gets lower. It's better with the _1 variants, though.
Does anyone know if this is expected, or is there something wrong?
The implementation should be copied from zsl.cpp as a single function, similar to clip_compare_text_and_image, and it should implement the logic end-to-end.
Bonus: it can support a multi-label scheme like HuggingFace's ZSL pipeline, i.e., not squash all the scores into a softmax.
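A sketch of that multi-label scoring on the Python side: each label gets an independent sigmoid score instead of competing in a softmax. Function name and shapes are illustrative:

```python
import numpy as np

def multi_label_scores(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """image_emb: (d,), text_embs: (n_labels, d); returns independent scores in (0, 1)."""
    logits = text_embs @ image_emb
    return 1.0 / (1.0 + np.exp(-logits))  # per-label sigmoid, no softmax squashing
```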
Currently clip.cpp uses linear interpolation in image preprocessing, while the original implementation uses bicubic interpolation from Pillow. It needs to be ported from Pillow: https://github.com/python-pillow/Pillow/blob/main/src/libImaging/Resample.c#L46-L62
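The piece to port is the cubic convolution kernel (Pillow uses a = -0.5); a sketch of it in Python, with the surrounding resampling loop omitted:

```python
def bicubic_filter(x: float, a: float = -0.5) -> float:
    # Cubic convolution kernel as in Pillow's Resample.c.
    x = abs(x)
    if x < 1.0:
        return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0
    if x < 2.0:
        return (((x - 5.0) * x + 8.0) * x - 4.0) * a
    return 0.0
```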
As I last checked, ggml supports CONV_2D, so it should work -- or could this even be expanded to any traditional CNN like ResNet? Image encoding in particular seems doable with reasonable effort.
I set the batch dimension manually to 1; instead, it can be set to the actual number of images. Concatenation may need extra attention.
Hi there,
Thank you so much for making this library. Unfortunately, I'm running into the following error:
./main --model '/Users/lucasigel/Downloads/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_0.bin' --text "test" --image '/00000002.jpg' -v 1
clip_model_load: loading model from '/Users/lucasigel/Downloads/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_0.bin' - please wait....................................................clip_model_load: model size = 85.06 MB / num tensors = 397
clip_model_load: model loaded
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 12051936, available 8388608)
Assertion failed: (false), function ggml_new_tensor_impl, file ggml.c, line 4449.
zsh: abort ./main --model --text "test" --image -v 1
I'm running on a Mac Studio with M1 Max and 32 GB of RAM. I tried every available model binary on huggingface and still got the same memory pool error. Is this due to a memory allocation bug? I see in #17 that this got solved for some cases and I'm wondering if there are lingering issues here
The current batch inference code is only applicable to the patch32 model. When using other models such as patch16 or patch14, it produces incorrect results. Specifically, the behavior is such that the first embedding result in a batch is correct, but all subsequent results are a single incorrect fixed value.
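A quick way to check this from the Python side, assuming a batched entry point like the load_preprocess_encode_images proposed earlier exists in the binding (the name is hypothetical): compare each batch row against the single-image result.

```python
import numpy as np
from clip_cpp import Clip

def check_batch_consistency(model: Clip, image_files: list) -> bool:
    # Hypothetical batched call; per the report, rows after the first
    # would diverge for patch16/patch14 models.
    batch = np.array(model.load_preprocess_encode_images(image_files))
    singles = np.array([model.load_preprocess_encode_image(f) for f in image_files])
    return np.allclose(batch, singles, atol=1e-3)
```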
which causes the model to leak.
./bin/zsl -m ../../laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin --image ../pic.png --text "playing music" --text "playing sports"
clip_model_load: loading model from '../../laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin' - please wait....................................................clip_model_load: model size = 288.93 MB / num tensors = 397
clip_model_load: model loaded
playing music = 0.5308
playing sports = 0.4692
Expected results:
playing music = 1.000
playing sports = 0.000
https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K
Demonstrate model conversion, detail how to compile, explain the general API.
Talk about possible usage scenarios, especially the cold start issue.
It looks like clip_tokenize and clip_image_preprocess both allocate arrays:
Line 721 in 8f34872
https://github.com/monatis/clip.cpp/blob/8f348725271db67517de871dea4a4e8a159e664f/clip.cpp#675
However, there is no way to free these arrays. I'm not sure if I'm missing something here with C++, but these functions should either provide a free function or the ability to pass in your own buffer.