Comments (5)
> First of all: Editable installs are possible, I use them all the time.
Ah, you're right! It seems that the last time I tried was quite a while ago...
> What do you think? Should I submit a PR?
Yeah, sounds great! Would be a nice experience working with it in this project. It's not related to pyproject.toml specifically, but at some point I would love to have some sort of pip install bertopic[minimal] that only contains the bare minimum of dependencies (even removing HDBSCAN and UMAP). But that is a bit out of scope and would require major changes...
> From llama3's point of view:
I generally tend to steer away from LLM-based opinions unless they are backed by expert opinions. But thanks for sharing it!
from bertopic.
Thanks for the suggestion! What do you think is the main added benefit of doing so in the context of this package? Also, I remember that pyproject.toml did not support editable installs a while back. Any idea if that's still the case?
from bertopic.
First of all: Editable installs are possible, I use them all the time.
I think the added benefits would be:
- easier definition of entrypoints, e.g. CLI
- ability to add additional tooling config in the same file, so the complete project configuration can live in one place
- easier definition of (nested) dependency groups
  e.g.
    [project.optional-dependencies]
    dev = [
        "bertopic[docs,test]",  # <- no repeating of mkdocs, pytest etc.
    ]
    docs = [
        "mkdocs==1.5.3",
        "mkdocs-material==9.5.18",
        "mkdocstrings-python==1.10.0",
        "mkdocstrings==0.24.3",
    ]
    test = [
        "pytest>=5.4.3",
        "pytest-cov>=2.6.1",
    ]
- ability to use Dependabot for getting scheduled dependency updates (also possible with requirements.txt, but not setup.py)
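To illustrate the first two points in the list above, here is a minimal sketch of how an entry point and tooling configuration could sit next to the package metadata in one pyproject.toml. It assumes a setuptools build backend; the `bertopic.cli:main` target, the placeholder version, and the `tests` directory are hypothetical and not taken from the BERTopic repository.

```toml
# Sketch only: values marked "hypothetical" are illustrative assumptions.
[build-system]
requires = ["setuptools>=61.0"]        # setuptools reads [project] metadata from 61.0 on
build-backend = "setuptools.build_meta"

[project]
name = "bertopic"
version = "0.0.0"                      # placeholder, not the real version

[project.scripts]
# Hypothetical CLI entry point: installs a `bertopic` command that calls main()
bertopic = "bertopic.cli:main"

[tool.pytest.ini_options]
# Tool configuration lives in the same file instead of a separate pytest.ini
testpaths = ["tests"]                  # hypothetical test directory
```

With a layout like this, the entry point, the dependency groups shown above, and the pytest settings would all be defined in a single file.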
From llama3's point of view:
- Consistent configuration: pyproject.toml is a TOML file that can be used to configure your package, including its name, version, and dependencies. This provides a consistent way to manage package metadata.
- Improved readability: The TOML format is more readable than the Python code in setup.py. This makes it easier for developers to understand the configuration of their packages.
- Better support for complex configurations: pyproject.toml supports more complex configurations, such as conditional dependencies and custom build steps. This allows you to manage more complex package dependencies and builds.
- Integration with other tools: pyproject.toml is used by other tools, such as Poetry and Hatch, to manage Python packages. This provides a consistent way to manage packages across different development environments.
- Improved security: The use of TOML files for configuration reduces the risk of code injection attacks that can occur when using Python code in setup.py.
What do you think? Should I submit a PR?
from bertopic.
Alright, let me submit a PR then. I'll just "translate" what is given in setup.py; modifications of dependency groups should be a separate endeavor, I think.
I found some of the points from llama3 valid and just wanted to add them here for "completeness". For example, I think the "readability" point is fair.
from bertopic.
Done via #1978
from bertopic.