
concept's Introduction


Concept

Concept is a package that leverages CLIP and BERTopic-based techniques to perform Concept Modeling on images.

Since topics are part of conversations and text, they do not represent the context of images well. Therefore, these clusters of images are referred to as 'Concepts' instead of the traditional 'Topics'.

Thus, Concept Modeling takes inspiration from topic modeling techniques to cluster images, find common concepts and model them both visually using images and textually using topic representations.

Installation

Installation, with sentence-transformers, can be done through PyPI:

pip install concept

Quick Start

First, we need to download and extract 25,000 images from Unsplash used in the sentence-transformers example:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# 25k images from Unsplash
img_folder = 'photos/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    photo_filename = 'unsplash-25k-photos.zip'
    if not os.path.exists(photo_filename):  # Download dataset if it does not exist
        util.http_get('http://sbert.net/datasets/' + photo_filename, photo_filename)

    # Extract all images
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)

img_names = list(glob.glob('photos/*.jpg'))

Next, we only need to pass images to Concept:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names)

The resulting concepts can be visualized through concept_model.visualize_concepts().
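A minimal usage sketch (hedged: treating the return value as a Matplotlib figure is an assumption about the current API):

fig = concept_model.visualize_concepts()  # plots exemplar images per concept cluster
fig.savefig("concepts.png")               # saving assumes a Matplotlib figure is returned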

However, to get the full experience, we need to label the concept clusters with topics. To do this, we need to create a vocabulary. We are going to feed our model 50,000 nouns from the English vocabulary:

import random
import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

all_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names() if "_" not in word]
selected_nouns = random.sample(all_nouns, 50_000)

Then, we can pass in the resulting selected_nouns to Concept:

from concept import ConceptModel

concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names, docs=selected_nouns)

Again, the resulting concepts can be visualized. This time, however, we can also see the generated topics through concept_model.visualize_concepts().

NOTE: Use ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.

Search Concepts

We can quickly search for specific concepts by embedding a search term and finding the cluster embeddings that best represent it. As an example, let us search for the term beach and see what we can find. To do this, we simply run the following:

>>> concept_model.find_concepts("beach")
[(100, 0.277577825349102),
 (53, 0.27431058773894657),
 (95, 0.25973751319723837),
 (77, 0.2560122597417548),
 (97, 0.25361988261846297)]

Each tuple contains two values: the first is the concept cluster ID and the second is its similarity to the search term. The top 5 most similar concept clusters are returned.
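For instance, a small sketch of iterating over these results (variable names are illustrative):

search_results = concept_model.find_concepts("beach")
for cluster_id, similarity in search_results:
    print(f"Concept {cluster_id}: similarity {similarity:.3f}")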

Now, let us visualize those concepts to see how well the search function works:

concept_model.visualize_concepts(concepts=[100, 53, 95, 77, 97])

concept's People

Contributors

maartengr


concept's Issues

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

from bertopic import BERTopic

# Define seed words for topics
seed_words = [
    ['software', 'programming', 'Python', 'Java', 'machine learning', 'data visualization'],
    ['project management', 'leadership'],
    ['healthcare', 'medical research', 'patient care', 'disease prevention']
]

# Sample documents (text data)
documents = [
    "This is about software development and programming languages like Python and Java.",
    "Finance and banking are important topics in the economy.",
    "Project management and leadership skills are essential for success.",
    "Healthcare and medical research focus on patient care and disease prevention."
]

# Initialize BERTopic model with seed_topic_list
model = BERTopic(seed_topic_list=seed_words)

# Fit and transform documents to obtain topics and probabilities
topics, probabilities = model.fit_transform(documents)

# Display the assigned topics for each document
for i, (doc, topic) in enumerate(zip(documents, topics)):
    print(f"Document {i+1}: Topic {topic} - '{doc}'")

Error

/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1600: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
warnings.warn("k >= N for N * N square matrix. "
/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1600: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
warnings.warn("k >= N for N * N square matrix. "

TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in _reduce_dimensionality(self, embeddings, y, partial_fit)
3471 y = np.array(y) if y is not None else None
-> 3472 self.umap_model.fit(embeddings, y=y)
3473 except TypeError:

14 frames
TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
1603
1604 if issparse(A):
-> 1605 raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
1606 "k >= N. Use scipy.linalg.eigh(A.toarray()) or"
1607 " reduce k.")

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
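A plausible diagnosis (inferred from the traceback, not an official answer): with only four documents, UMAP's default number of output components is not smaller than the number of samples, so its spectral initialization falls back to scipy.linalg.eigh and fails. A hedged workaround is to pass BERTopic a smaller UMAP model; the values below are illustrative for a tiny corpus:

from bertopic import BERTopic
from umap import UMAP

# Shrink UMAP so that n_components < number of documents, and skip spectral init
umap_model = UMAP(n_neighbors=2, n_components=2, init="random", random_state=42)
model = BERTopic(seed_topic_list=seed_words, umap_model=umap_model)
topics, probabilities = model.fit_transform(documents)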

Multilingual support

Code for English:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(images, docs)
# Works correctly!

Guide suggests "Use Concept(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.":

from concept import Concept
# ImportError: cannot import name 'Concept' from 'concept' --> I guess you mean to import ConceptModel

Importing ConceptModel:

from concept import ConceptModel
concept_model = ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1")
concepts = concept_model.fit_transform(images, docs)
# TypeError: 'JpegImageFile' object is not subscriptable

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

Trying to run this code on Google Colab and seeing this error now. I'm simply trying to use the demo provided in this repo, but it's now throwing the following error:


AttributeError Traceback (most recent call last)
in
3 # Fit the Concept model to the images and vocabulary
4 concept_model = ConceptModel()
----> 5 concepts = concept_model.fit_transform(img_names, docs=selected_nouns)
6
7 # Get the predicted probabilities for each concept cluster for each image

1 frames
/usr/local/lib/python3.9/dist-packages/concept/_model.py in _extract_textual_representation(self, docs)
400 # Extract vocabulary from the documents
401 self.vectorizer_model.fit(docs)
--> 402 words = self.vectorizer_model.get_feature_names()
403
404 # Embed the documents and extract similarity between concept clusters and words

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'
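For context (a hedged note, not an official fix): CountVectorizer.get_feature_names was removed in scikit-learn 1.2 in favour of get_feature_names_out, so a commonly reported workaround is to pin an older scikit-learn until the package is updated:

pip install "scikit-learn<1.2"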

Questions

Hello,

Thank you for sharing your great work. I'd like to have a better understanding of the "fit_transform" function.

How do you intend the parameter "image_names" to be used? For instance, I'd like to classify Facebook posts. Does it mean that I can pass post messages along with image embeddings to improve the topic results? Can you share any example code using this parameter?

Is it possible to return the top keywords describing each topic? As far as I understand the code, 'fit_transform' returns only the list of topic predictions.

Thank you very much

discussion on differing concept results

Hi.

Thank you for this library. It is really helpful.
I am using concept modeling to cluster images and do some analysis on the results.
I modified find_concepts() (which initially was meant to find the top 5 related concepts for a search term) to find the top 5 related concepts for a given image (by simply passing the path to an image and obtaining its embedding with the embedding model).
However, I noticed that in many cases the top-1 most related cluster differs from the cluster returned by fit_transform(). Sometimes the expected concept is in second position, but in many cases it is in a position greater than 2. Any idea why this might be happening?

Thank you for your time.
Best wishes.
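For reference, the modification described above might look roughly like this sketch (hypothetical: the attribute holding the cluster embeddings and the exact similarity computation are assumptions about the package internals):

from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip_model = SentenceTransformer("clip-ViT-B-32")
image_embedding = clip_model.encode(Image.open("photos/example.jpg"))  # illustrative path
# `cluster_embeddings` below is a hypothetical attribute name
similarities = util.cos_sim(image_embedding, concept_model.cluster_embeddings)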

sentence-transformers version

In the current version, Concept is pinned to version 1.2.0 of sentence-transformers. Are there any plans to unpin this and use a more recent version of sentence-transformers?

Saving the model

Hi.

Thank you very much for creating this. It is an absolutely brilliant idea. Once we have created the model, how do we save the model and use it for any new data that comes in?
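One interim pattern (a hedged sketch, not an official persistence API): keep the fitted model in memory, cache the expensive image embeddings separately, and apply the model to new images via its transform function (discussed in a later issue; the exact signature is an assumption):

# Hedged sketch: apply an already-fitted model to images that arrive later
new_concepts = concept_model.transform(new_img_names)  # new_img_names: list of image paths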

Pandas key error during model fitting

I tried the demo code and it worked for a small sample. When I fed it more images, I got this error:
KeyError: '[-1] not found in axis'

Dependencies:
concept==0.2.1
pandas==1.4.0

/home/<username>/anaconda3/envs/rd38/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
100%|███████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:21<00:00,  1.06s/it]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [30], in <cell line: 3>()
      1 from concept import ConceptModel
      2 concept_model = ConceptModel()
----> 3 concepts = concept_model.fit_transform(img_names[3500:6000])

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/concept/_model.py:124, in ConceptModel.fit_transform(self, images, docs, image_names, image_embeddings)
    122 # Reduce dimensionality and cluster images into concepts
    123 reduced_embeddings = self._reduce_dimensionality(image_embeddings)
--> 124 predictions = self._cluster_embeddings(reduced_embeddings)
    126 # Extract representative images through exemplars
    127 representative_images = self._extract_exemplars(image_names)

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/concept/_model.py:261, in ConceptModel._cluster_embeddings(self, embeddings)
    257 self.cluster_labels = sorted(list(set(self.hdbscan_model.labels_)))
    258 predicted_clusters = list(self.hdbscan_model.labels_)
    260 self.frequency = (
--> 261     pd.DataFrame({"Cluster": predicted_clusters, "Count": predicted_clusters})
    262       .groupby("Cluster")
    263       .count()
    264       .drop(-1)
    265       .sort_values("Count", ascending=False)
    266 )
    267 return predicted_clusters

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/pandas/core/frame.py:4956, in DataFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
   4808 @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "labels"])
   4809 def drop(
   4810     self,
   (...)
   4817     errors: str = "raise",
   4818 ):
   4819     """
   4820     Drop specified labels from rows or columns.
   4821 
   (...)
   4954             weight  1.0     0.8
   4955     """
-> 4956     return super().drop(
   4957         labels=labels,
   4958         axis=axis,
   4959         index=index,
   4960         columns=columns,
   4961         level=level,
   4962         inplace=inplace,
   4963         errors=errors,
   4964     )

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/pandas/core/generic.py:4279, in NDFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
   4277 for axis, labels in axes.items():
   4278     if labels is not None:
-> 4279         obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   4281 if inplace:
   4282     self._update_inplace(obj)

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/pandas/core/generic.py:4323, in NDFrame._drop_axis(self, labels, axis, level, errors, consolidate, only_slice)
   4321         new_axis = axis.drop(labels, level=level, errors=errors)
   4322     else:
-> 4323         new_axis = axis.drop(labels, errors=errors)
   4324     indexer = axis.get_indexer(new_axis)
   4326 # Case for non-unique axis
   4327 else:

File ~/anaconda3/envs/rd38/lib/python3.8/site-packages/pandas/core/indexes/base.py:6644, in Index.drop(self, labels, errors)
   6642 if mask.any():
   6643     if errors != "ignore":
-> 6644         raise KeyError(f"{list(labels[mask])} not found in axis")
   6645     indexer = indexer[~mask]
   6646 return self.delete(indexer)

KeyError: '[-1] not found in axis'
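A plausible diagnosis (inferred from the traceback, not confirmed): for this batch HDBSCAN assigned no outlier label, so there is no -1 cluster to drop and .drop(-1) raises. A tolerant version of the failing expression in concept/_model.py would pass errors="ignore" (a hedged sketch, not the maintainer's fix):

import pandas as pd

frequency = (
    pd.DataFrame({"Cluster": predicted_clusters, "Count": predicted_clusters})
      .groupby("Cluster")
      .count()
      .drop(-1, errors="ignore")  # skip silently when no outlier cluster exists
      .sort_values("Count", ascending=False)
)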

Exemplar dict is not serializable

Hi, thanks for your awesome libraries.

Just a short question: In this line:

representative_images[cluster] = {"Indices": [int(index) for index in exemplars],

you're casting the numpy int64s to integers, presumably so they can be used as indexes?
In any case, the cluster keys remain np.int64. This means the whole dict cannot be serialized (as json doesn't know how to handle numpy data types).

My suggestion would be to int() the keys as well to make this a bit less perplexing. But I'm not sure if you rely on the indexes being np.int64 in some other place?
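In the meantime, a hedged user-side workaround is to cast the keys to plain Python ints before serializing (representative_images is the exemplar dict discussed above):

import json

serializable = {int(cluster): value for cluster, value in representative_images.items()}
with open("exemplars.json", "w") as f:
    json.dump(serializable, f)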

ValueError: operands could not be broadcast together with shapes (4,224,224) (3,)

Running a Concept example on macOS Monterey 12.3.1. The exception is raised in transformers/image_utils.py (line 143):

return (image - mean) / std

where image has shape (4, 224, 224), while mean and std both have shape (3,).

Python 3.8.13 
% pip show tensorflow_macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by: 

pip show sentence_transformers
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: sentence-transformers
Version: 2.1.0
Summary: Sentence Embeddings using BERT / RoBERTa / XLM-R
Home-page: https://github.com/UKPLab/sentence-transformers
Author: Nils Reimers
Author-email: [email protected]
License: Apache License 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: huggingface-hub, nltk, numpy, scikit-learn, scipy, sentencepiece, tokenizers, torch, torchvision, tqdm, transformers
Required-by: bertopic, concept

% pip show transformers
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: transformers
Version: 4.11.3
Summary: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
Home-page: https://github.com/huggingface/transformers
Author: Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Sylvain Gugger, Suraj Patil, Stas Bekman, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors
Author-email: [email protected]
License: Apache
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, sacremoses, tokenizers, tqdm
Required-by: sentence-transformers

Here's the code:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# 25k images from Unsplash
img_folder = 'photos/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    photo_filename = 'unsplash-25k-photos.zip'
    if not os.path.exists(photo_filename):  # Download dataset if does not exist
        util.http_get('http://sbert.net/datasets/' + photo_filename, photo_filename)

    # Extract all images
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)
img_names = list(glob.glob('photos/*.jpg'))

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names)

  0%|                                                   | 0/196 [00:00<?, ?it/s]/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)
  tensor = as_tensor(value)
  5%|█▉                                         | 9/196 [02:21<48:54, 15.69s/it]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 from concept import ConceptModel
      2 concept_model = ConceptModel()
----> 3 concepts = concept_model.fit_transform(img_names)

File ~/Concept/concept/_model.py:120, in ConceptModel.fit_transform(self, images, docs, image_names, image_embeddings)
    118 # Calculate image embeddings if not already generated
    119 if image_embeddings is None:
--> 120     image_embeddings = self._embed_images(images)
    122 # Reduce dimensionality and cluster images into concepts
    123 reduced_embeddings = self._reduce_dimensionality(image_embeddings)

File ~/Concept/concept/_model.py:224, in ConceptModel._embed_images(self, images)
    221 end_index = (i * batch_size) + batch_size
    223 images_to_embed = [Image.open(filepath) for filepath in images[start_index:end_index]]
--> 224 img_emb = self.embedding_model.encode(images_to_embed, show_progress_bar=False)
    225 embeddings.extend(img_emb.tolist())
    227 # Close images

File ~/tensorflow-metal/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py:153, in SentenceTransformer.encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
    151 for start_index in trange(0, len(sentences), batch_size, desc="Batches", disable=not show_progress_bar):
    152     sentences_batch = sentences_sorted[start_index:start_index+batch_size]
--> 153     features = self.tokenize(sentences_batch)
    154     features = batch_to_device(features, device)
    156     with torch.no_grad():

File ~/tensorflow-metal/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py:311, in SentenceTransformer.tokenize(self, texts)
    307 def tokenize(self, texts: Union[List[str], List[Dict], List[Tuple[str, str]]]):
    308     """
    309     Tokenizes the texts
    310     """
--> 311     return self._first_module().tokenize(texts)

File ~/tensorflow-metal/lib/python3.8/site-packages/sentence_transformers/models/CLIPModel.py:71, in CLIPModel.tokenize(self, texts)
     68 if len(images) == 0:
     69     images = None
---> 71 inputs = self.processor(text=texts_values, images=images, return_tensors="pt", padding=True)
     72 inputs['image_text_info'] = image_text_info
     73 return inputs

File ~/tensorflow-metal/lib/python3.8/site-packages/transformers/models/clip/processing_clip.py:148, in CLIPProcessor.__call__(self, text, images, return_tensors, **kwargs)
    145     encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
    147 if images is not None:
--> 148     image_features = self.feature_extractor(images, return_tensors=return_tensors, **kwargs)
    150 if text is not None and images is not None:
    151     encoding["pixel_values"] = image_features.pixel_values

File ~/tensorflow-metal/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:150, in CLIPFeatureExtractor.__call__(self, images, return_tensors, **kwargs)
    148     images = [self.center_crop(image, self.crop_size) for image in images]
    149 if self.do_normalize:
--> 150     images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
    152 # return as BatchFeature
    153 data = {"pixel_values": images}

File ~/tensorflow-metal/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:150, in <listcomp>(.0)
    148     images = [self.center_crop(image, self.crop_size) for image in images]
    149 if self.do_normalize:
--> 150     images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
    152 # return as BatchFeature
    153 data = {"pixel_values": images}

File ~/tensorflow-metal/lib/python3.8/site-packages/transformers/image_utils.py:143, in ImageFeatureExtractionMixin.normalize(self, image, mean, std)
    141     return (image - mean[:, None, None]) / std[:, None, None]
    142 else:
--> 143     return (image - mean) / std

ValueError: operands could not be broadcast together with shapes (4,224,224) (3,) 

The exception is raised in the normalize() function ... I believe on the 9th PIL image.
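A plausible cause (an inference from the shapes, not confirmed by the maintainer): the failing image carries an alpha channel, so it has 4 channels while CLIP's normalization mean and std have 3. Converting every image to RGB before encoding sidesteps the broadcast error:

from PIL import Image

# Hedged sketch: force 3-channel RGB so the (3,) mean/std broadcast cleanly
images_to_embed = [Image.open(filepath).convert("RGB") for filepath in img_names]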

TypeError: __init__() got an unexpected keyword argument 'cachedir'

I was reproducing the same Colab notebook from the README without any change:
https://colab.research.google.com/drive/1XHwQPT2itZXu1HayvGoj60-xAXxg9mqe?usp=sharing#scrollTo=VcgGxrLH-AU9

While importing the library (from concept import ConceptModel), this error appears:

TypeError: __init__() got an unexpected keyword argument 'cachedir'

Apparently it stems from the hdbscan module, as cachedir was removed from joblib.Memory:
https://github.com/joblib/joblib/blame/3fb7fbde772e10415f879e0cb7e5d986fede8460/joblib/memory.py#L910
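A commonly reported workaround (an assumption, not an official fix) is to upgrade hdbscan to a release that no longer passes cachedir, or to pin joblib to a version that still accepts it:

pip install --upgrade hdbscan
# or, alternatively:
pip install "joblib<1.2"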

OSError: [Errno 24] Too many open files: 'photos/icnZ2R8PcDs.jpg'

What do you recommend setting max_open_files to?

images = [Image.open("photos/"+filepath) for filepath in tqdm(img_names[:5000])]
image_names = img_names[:5000]
image_embeddings = img_embeddings[:5000]

54%|███████████████████▍                | 2693/5000 [00:00<00:00, 13545.87it/s]
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 images = [Image.open("photos/"+filepath) for filepath in tqdm(img_names[:5000])]
      2 image_names = img_names[:5000]
      3 image_embeddings = img_embeddings[:5000]

Input In [4], in <listcomp>(.0)
----> 1 images = [Image.open("photos/"+filepath) for filepath in tqdm(img_names[:5000])]
      2 image_names = img_names[:5000]
      3 image_embeddings = img_embeddings[:5000]

File ~/tensorflow-metal/lib/python3.8/site-packages/PIL/Image.py:2968, in open(fp, mode, formats)
   2965     filename = fp
   2967 if filename:
-> 2968     fp = builtins.open(filename, "rb")
   2969     exclusive_fp = True
   2971 try:

OSError: [Errno 24] Too many open files: 'photos/icnZ2R8PcDs.jpg'

% ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       11136
-n: file descriptors                8192
(base) davidlaxer@x86_64-apple-darwin13 notebooks % 
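Rather than raising the descriptor limit, a hedged alternative is to avoid holding thousands of PIL files open at once, copying each image out of its file handle as you go:

from PIL import Image
from tqdm import tqdm

images = []
for filepath in tqdm(img_names[:5000]):
    with Image.open(filepath) as img:
        images.append(img.copy())  # copy() loads the pixel data so the file can close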

Saving the model

Hi Maarten,

Thank you for this awesome package! I have an issue when wanting to save my concept model. I get the following error message:

PicklingError: Can't pickle <function _transform.<locals>.<lambda> at 0x000002837DC60550>: it's not found as sentence_transformers.models.CLIPModel._transform.<locals>.<lambda>

Any idea what might solve this issue?

Thanks for your help!
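A workaround sometimes used for lambda-related pickling errors (hedged; not an official Concept API) is serializing with the third-party dill package, which can handle lambdas that the standard pickle module rejects:

import dill

with open("concept_model.pkl", "wb") as f:
    dill.dump(concept_model, f)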

Question about the transform function

Thank you for your excellent work! :-) I have a question from reading the code of the transform function.
You say that, given the images and image_embeddings, the return is "Predictions: Concept predictions for each image".
But when I read the code of transform, the output is not the concept prediction for each image.
Can you explain it? Thank you very much!
