Comments (7)
Can you try installing BERTopic from its main branch? I believe a fix for this can be found there.
The error is still there. I have cloned the master branch.
Could you share the full code and error message after cloning and installing the branch?
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, OpenAI
from openai import AzureOpenAI

# Params and bert_topic_label_prompt are defined elsewhere in my project.


def initialize_representation_models():
    # One representation model per aspect name.
    keybert_model = KeyBERTInspired()
    openai_model = setup_openai_client()
    return {
        "KeyBERT": keybert_model,
        "OpenAI": openai_model,
    }


def setup_openai_client():
    # Azure OpenAI client wrapped in BERTopic's OpenAI representation model.
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(
        client,
        model=Params.openai_deployment_gpt3,
        chat=True,
        prompt=prompt,
        delay_in_seconds=0.3,
        diversity=0.2,
        # exponential_backoff=True,
    )


def fit_bertopic_model(sentences, embeddings, embedding_model, umap_model,
                       hdbscan_model, vectorizer_model, representation_model):
    topic_model = BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
        representation_model=representation_model,
        top_n_words=20,
        verbose=True,
    )
    # Embeddings are precomputed and passed in alongside the documents.
    topics, probs = topic_model.fit_transform(sentences, embeddings)
    topics_df = topic_model.get_topic_info()
    print(f"Number of unique topics found: {len(set(topics))}")
    return topics_df
2024-04-10 12:19:42,781 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-04-10 12:20:36,721 - BERTopic - Dimensionality - Completed ✓
2024-04-10 12:20:36,721 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-10 12:20:39,694 - BERTopic - Cluster - Completed ✓
2024-04-10 12:20:39,700 - BERTopic - Representation - Extracting topics from clusters using representation models.
77%|██████████████████████████████████████████████████████████████████████████████████████████████████▎ | 151/195 [01:16<00:22, 1.97it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
    auto_topics = final_auto_topics(project_id=project_id, n_reviews=n_reviews)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3782, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 4083, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 223, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'
(dictionary) PS C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code>
Do you have any suggestions?
I'm actually not sure what is happening here. I believe OpenAI should give back at least some value, especially when you check for it. It might be that OpenAI has some additional filters and does not accept certain input/output if it doesn't adhere to their guidelines.
One other thing that I can think of is that their API changed a while ago. Are you using the latest version of their package?
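A quick way to answer the version question is to print what is installed in the active environment. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and the package names are assumed to be the PyPI distribution names `openai` and `bertopic`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to this issue.
for name in ("openai", "bertopic"):
    print(name, installed_version(name) or "not installed")
```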
Yes. I implemented a custom solution, and the problem turns out to be a policy violation.
ERROR MESSAGE: ValueError: 'Azure has not provided the response due to a content filter being triggered'
However, in BERTopic I don't get this error message; it just returns None and execution stops.
I will use my custom implementation to catch these errors, but I think many people may run into this issue in the future.
Thank you for your reply!
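For anyone hitting the same AttributeError, here is a minimal sketch of the kind of guard that avoids calling `.strip()` on None. It is not BERTopic's code and not the custom implementation mentioned above. The names `extract_label` and `fake_response` are hypothetical, the stand-in objects only mimic the `response.choices[0].message.content` shape from the traceback, and the `finish_reason` field is assumed to be present on each choice.

```python
from types import SimpleNamespace


def extract_label(response, fallback="No label returned"):
    """Return the label text from a chat-completion-style response.

    Uses `fallback` when the content is None, which is the case that
    raised the AttributeError in the traceback.
    """
    choice = response.choices[0]
    content = choice.message.content
    if content is None:
        # Assumption: the choice exposes finish_reason, which may indicate
        # that a content filter was triggered.
        reason = getattr(choice, "finish_reason", None)
        print(f"No content returned (finish_reason={reason}); using fallback.")
        return fallback
    return content.strip().replace("topic: ", "")


def fake_response(content, finish_reason="stop"):
    """Build a stand-in object with the same attribute layout (illustration only)."""
    message = SimpleNamespace(content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


print(extract_label(fake_response("topic: battery life")))
print(extract_label(fake_response(None, "content_filter")))
```

Returning a fallback label keeps the run going for the remaining topics; raising an exception with the finish reason instead would be the stricter choice if silent fallbacks are not acceptable.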