Comments (5)
You may want to consider these libraries: pandas-profiling, great-expectations. They already cover a lot of this and may make sense to use for exploratory data analysis.
from deepchecks.
@aspfohl Thanks!
We are familiar with these libraries, and they are indeed useful references for deciding which outputs to add.
The implementation itself should be a deepchecks check, since we want it to support the framework (run the check as part of a suite, receive the result value as JSON, export the outputs, add conditions, etc.).
Since the implementation logic is not very complex, this is a good issue for contributing a check to the package :)
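To make the requirements above concrete, here is a minimal, framework-agnostic sketch of a summary check that returns a JSON-friendly value and supports conditions. It does not use the deepchecks API: `ColumnSummaryCheck`, `SummaryResult`, `add_condition`, and `run` are hypothetical names chosen for illustration, and the input is assumed to be non-empty numeric columns held in a plain dict.

```python
import json
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Tuple

Summary = Dict[str, Dict[str, float]]


@dataclass
class SummaryResult:
    """Hypothetical result container: a summary value plus condition outcomes."""
    value: Summary
    conditions: List[Tuple[str, bool]] = field(default_factory=list)

    def to_json(self) -> str:
        # Everything stored here is made of plain dicts, lists, numbers and strings,
        # so it can be serialized directly.
        return json.dumps({"value": self.value, "conditions": self.conditions}, sort_keys=True)


class ColumnSummaryCheck:
    """Hypothetical check: per-column count, min, max and mean (not a deepchecks class)."""

    def __init__(self) -> None:
        self._conditions: List[Tuple[str, Callable[[Summary], bool]]] = []

    def add_condition(self, name: str, predicate: Callable[[Summary], bool]) -> "ColumnSummaryCheck":
        # Conditions are named predicates evaluated against the summary value.
        self._conditions.append((name, predicate))
        return self

    def run(self, columns: Dict[str, List[float]]) -> SummaryResult:
        # Assumes every column is non-empty and numeric.
        value = {
            name: {"count": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
            for name, vals in columns.items()
        }
        outcomes = [(name, bool(pred(value))) for name, pred in self._conditions]
        return SummaryResult(value, outcomes)


check = ColumnSummaryCheck().add_condition(
    "age mean below 50", lambda summary: summary["age"]["mean"] < 50
)
result = check.run({"age": [20, 30, 40], "income": [1000.0, 3000.0]})
print(result.to_json())
```

A real contribution would subclass the package's own check base class and return its own result type instead of these stand-ins; the sketch only shows the shape of "value plus conditions plus JSON export".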
@shir22
This summary of exploratory data is something we provide with our product Data Profiler. It would be possible to use the output of a profile as a check mechanism.
@JGSweets Looks cool.
What is the output format of a DataProfiler (HTML, text, JSON, ipywidgets, etc.)?
Thanks! Currently, it outputs in JSON, but that's not to say it couldn't be expanded on in the future.
You are also able to save and load profilers for re-use or iterative profiling if your dataset changes or streams.