data-describe is a Python toolkit for Exploratory Data Analysis (EDA). It aims to accelerate data exploration and analysis by providing automated and polished analysis widgets.
For more examples of data-describe in action, see the Quick Start Tutorial.
data-describe implements the following basic features:
Feature | Description |
---|---|
Data Summary | Curated data summary |
Data Heatmap | Data variation and missingness heatmap |
Correlation Matrix | Correlation heatmaps with categorical support |
Distribution Plots | Histograms, violin plots, and bar charts |
Scatterplots | Scatterplots evaluated with scatterplot diagnostics |
Cluster Analysis | Automated clustering and plotting |
Feature Ranking | Evaluate feature importance using tree models |
data-describe is always looking to elevate the standard for Exploratory Data Analysis. Beyond the basic features, here are a few of the extended features that are already implemented:
- Dimensionality Reduction Methods
- Sensitive Data (PII) Redaction
- Text Pre-processing / Topic Modeling
- Big Data Support
data-describe can be installed using pip:
pip install data-describe
Once installed, import the package and browse its built-in documentation:
import data_describe as dd
help(dd)
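If the full `help(dd)` output is more than you need, the sketch below lists only the package's public top-level names. It uses just the Python standard library and assumes nothing about the data-describe API. The helper `public_names` is a hypothetical name introduced for this illustration, and it returns an empty list when the package is not installed or fails to import.

```python
import importlib
import importlib.util


def public_names(module_name):
    """Return the sorted public attribute names of a top-level module.

    Hypothetical helper for illustration only. Returns an empty list
    if the module is not installed or cannot be imported.
    """
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(module_name) is None:
        return []
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Installed, but one of its own dependencies is missing.
        return []
    # Names starting with an underscore are private by convention.
    return sorted(name for name in dir(module) if not name.startswith("_"))


names = public_names("data_describe")
if names:
    print("Public names in data_describe:")
    for name in names:
        print(" ", name)
else:
    print("data_describe is not available; try: pip install data-describe")
```

Any name in the list can then be inspected individually, for example with `help(getattr(dd, name))`.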
See the User Guide for more information.
data-describe is currently in beta status.
data-describe welcomes contributions from the community.