Comments (2)
Thanks for sharing this issue! Just to be sure I understand correctly, you want to implement a way to handle most finish_reason as a way for the user to understand why the output was empty/truncated/missing, right?
Sounds great! Additional information during inference would be more than welcome considering users have struggled with missing output in the past.
Handle all four CompletionChoice.finish_reasons
Open question: How do you want to handle the length and function_call and null finish_reasons
I don't think we need to do anything with function_call at the moment, since BERTopic does not make use of it and the models generally follow instructions quite well.
I believe length can be logged similarly to content_filter, since both affect the output. Here, we can mention using the truncation options available in BERTopic to prevent these issues. With length, we might need to add "incomplete output due to..." or something similar to the label if it happens to be truncated. If it is empty, we can either leave it empty and log it, or create a label that says something like "incomplete output due to...".
It seems that null should only happen when you try to access the API while it is already running, which should generally not happen unless the user was already running some process, right? Having said that, logging it like length and content_filter seems like a possible solution.
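The label handling described above could look roughly like the following. This is only a sketch under assumptions: the helper name `label_for_choice`, the logger name, and the exact wording of the note are hypothetical and not part of BERTopic; the helper simply takes the generated text and the finish_reason string.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("finish_reason_sketch")


def label_for_choice(text, finish_reason):
    """Return a topic label, annotated when the output is incomplete.

    Hypothetical helper for illustration; not BERTopic's implementation.
    """
    text = (text or "").strip()
    if finish_reason == "stop":
        # Normal completion: use the label as returned.
        return text
    if finish_reason in ("length", "content_filter"):
        note = f"incomplete output due to {finish_reason}"
        logger.warning(
            "finish_reason=%s; the truncation options in BERTopic may help",
            finish_reason,
        )
        # Keep any partial text and append the note; if nothing was
        # produced, the note itself becomes the label.
        return f"{text} ({note})" if text else note
    if finish_reason is None:
        # null: log it, but leave the output untouched.
        logger.warning("finish_reason was null")
    # function_call (and null) fall through without changing the label.
    return text
```

Whether an empty output should stay empty or receive the note as its label is exactly the open choice mentioned above; the sketch picks the second option.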
Log repr_doc_ids which caused sensitive responses
Agreed, and since BERTopic does not pass all documents to the API I would not expect excessive logging.
All in all, sounds good. Looking forward to this!
Thanks for sharing this issue! Just to be sure I understand correctly, you want to implement a way to handle most finish_reason as a way for the user to understand why the output was empty/truncated/missing, right?
Yes, that is correct.
@MaartenGr, thanks for your feedback
My plan for finish_reason modifications will be as follows:
- replace current check for content with a condition on stop in the successful case
- log repr_doc_ids and finish reason for content_filter and length
- no change for null and function_call cases
I'll have a draft shortly to collect your feedback.
On the code design front, I am tempted to extract a function so I can add some unit tests around this handling logic. What are your thoughts on that?
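As a rough illustration of what such an extracted function and its unit tests might look like (a sketch only: the function name `output_is_complete`, its signature, and the logger name are assumptions, not the actual draft or BERTopic code):

```python
import logging
import unittest

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("finish_reason_check_sketch")


def output_is_complete(finish_reason, repr_doc_ids):
    """Return True only when generation ended with "stop".

    For content_filter and length, log the finish reason together with
    the ids of the representative documents sent to the API. The null
    and function_call cases are not logged here.
    """
    if finish_reason == "stop":
        return True
    if finish_reason in ("content_filter", "length"):
        logger.warning(
            "finish_reason=%s for repr_doc_ids=%s",
            finish_reason,
            list(repr_doc_ids),
        )
    return False


class OutputIsCompleteTest(unittest.TestCase):
    def test_stop_is_success(self):
        self.assertTrue(output_is_complete("stop", [1, 2]))

    def test_length_logs_doc_ids(self):
        with self.assertLogs(logger, level="WARNING") as captured:
            self.assertFalse(output_is_complete("length", [3, 7]))
        self.assertIn("[3, 7]", captured.output[0])

    def test_null_is_not_success(self):
        self.assertFalse(output_is_complete(None, []))
```

Keeping the check free of any API call means the tests can run with `python -m unittest` without network access or credentials.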