
awesome-data-science-models's Introduction


data ⎰ describe

data-describe is a Python toolkit for Exploratory Data Analysis (EDA). It aims to accelerate data exploration and analysis by providing automated and polished analysis widgets.

For more examples of data-describe in action, see the Quick Start Tutorial.

Main Features

data-describe implements the following basic features:

  • Data Summary: Curated data summary
  • Data Heatmap: Data variation and missingness heatmap
  • Correlation Matrix: Correlation heatmaps with categorical support
  • Distribution Plots: Generate histograms, violin plots, bar charts
  • Scatterplots: Generate scatterplots and evaluate with scatterplot diagnostics
  • Cluster Analysis: Automated clustering and plotting
  • Feature Ranking: Evaluate feature importance using tree models

Extended Features

data-describe is always looking to elevate the standard for Exploratory Data Analysis. Here are a few of the extended features already implemented:

  • Dimensionality Reduction Methods
  • Sensitive Data (PII) Redaction
  • Text Pre-processing / Topic Modeling
  • Big Data Support

Installation

data-describe can be installed using pip:

pip install data-describe

Getting Started

import data_describe as dd
help(dd)

See the User Guide for more information.

Project Status

data-describe is currently in beta status.

Contributing

data-describe welcomes contributions from the community.

awesome-data-science-models's People

Contributors

ahycourse, arivperumal19, bipinkapri-git, bobbyjacob, brianray, dandawg, dvdjlaw, jothigovindasamy, pdavidsosanofi, ryanrusson, sachinsaxena021988, sheth108, soshel, stevenpais1, sukanyasasmal, the-data-guy, truongc2


awesome-data-science-models's Issues

AI Platform prediction hosting fails

(project name has been redacted)

Using endpoint [https://us-central1-ml.googleapis.com/]
Listed 0 items.
Using endpoint [https://ml.googleapis.com/]
ERROR: (gcloud.beta.ai-platform.models.create) Resource in projects [...] is the subject of a conflict: Field: model.name Error: A model with the same name already exists.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: A model with the same name already exists.
    field: model.name
Traceback (most recent call last):
  File "host_xgboost.py", line 46, in <module>
    run()
  File "host_xgboost.py", line 35, in run
    subprocess.check_call([shutil.which('gcloud'),'beta', 'ai-platform','models','create',MODEL_NAME,'--regions','us-central1', "--enable-logging", "--enable-console-logging"])
  File "/usr/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcloud', 'beta', 'ai-platform', 'models', 'create', 'xgboost_FT', '--regions', 'us-central1', '--enable-logging', '--enable-console-logging']' returned non-zero exit status 1
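One way to make the hosting script safe to re-run is to check whether the model already exists before calling `create`. The sketch below is a hypothetical replacement for the `models create` step in `host_xgboost.py`, not the script's actual code; `ensure_model`, `run_command`, and the injectable `run` parameter are illustrative names. It assumes that a non-zero exit from `gcloud ai-platform models describe` means the model is missing (it could also signal an auth or endpoint problem), and that `describe` queries the same endpoint as `create`. The log above shows the listing and the create call using different endpoints, so that assumption is worth confirming.

```python
import shutil
import subprocess
from typing import Callable, List

# A "runner" executes a command and returns its exit code (injectable for tests).
Runner = Callable[[List[str]], int]


def run_command(cmd: List[str]) -> int:
    """Run cmd, letting gcloud print to the console, and return the exit code."""
    return subprocess.run(cmd).returncode


def ensure_model(model_name: str, region: str = "us-central1",
                 run: Runner = run_command) -> bool:
    """Create the AI Platform model only if it does not already exist.

    Returns True if a model was created, False if one already existed.
    """
    gcloud = shutil.which("gcloud") or "gcloud"
    # Assumption: `describe` exits non-zero when the model cannot be found.
    if run([gcloud, "ai-platform", "models", "describe", model_name]) == 0:
        return False  # reuse the existing model instead of hitting the conflict
    create_cmd = [gcloud, "beta", "ai-platform", "models", "create", model_name,
                  "--regions", region, "--enable-logging", "--enable-console-logging"]
    returncode = run(create_cmd)
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, create_cmd)
    return True
```

Calling `ensure_model(MODEL_NAME)` in place of the `check_call` in `run()` would skip creation when `xgboost_FT` already exists, while still failing loudly if creation itself fails.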

move all data to the Maven Wave GCS Buckets

All the data should be pulled from the MW public GCP buckets:

But the examples don't seem to pull from these.

Note for MW folks: this is https://console.cloud.google.com/storage/browser/amazing-public-data;tab=objects?forceOnBucketsSortingFiltering=false&project=data-describe&prefix=&forceOnObjectsSortingFiltering=false
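If the examples are switched to read from the bucket above, they will need `gs://` URIs rather than Console links. A minimal sketch of that conversion, assuming the Console's `/storage/browser/<bucket>/<folder>` URL layout seen in the link above; `console_url_to_gs_uri` is a hypothetical helper, not part of the repository.

```python
from urllib.parse import parse_qs, urlparse


def console_url_to_gs_uri(console_url: str) -> str:
    """Convert a Cloud Console storage-browser URL into a gs:// URI."""
    parsed = urlparse(console_url)
    marker = "/storage/browser/"
    if not parsed.path.startswith(marker):
        raise ValueError(f"not a storage-browser URL: {console_url}")
    # urlparse moves a trailing ';tab=objects' into parsed.params, so the path
    # holds only the bucket name and an optional folder path.
    bucket, _, folder = parsed.path[len(marker):].partition("/")
    # A non-empty ?prefix=... query parameter can also name a folder.
    prefix = folder or parse_qs(parsed.query).get("prefix", [""])[0]
    return f"gs://{bucket}/{prefix}"
```

For the link above this returns `gs://amazing-public-data/`. With the `gcsfs` package installed, pandas can read `gs://` paths directly (for example `pd.read_csv(uri + "<object path>")`), so the notebooks would not need a separate download step.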

For example:

Prep for workshop

Prep the following.

Give an overview of each: create a README.md with an overview of the problem, a description of the data, and the problem statement. Where possible, reference the GCP toolset. Possibly add a subdirectory named GCP under each one for GCP-specific training.

  • Census
  • Beatles
  • Lending Club

Make sure each one has filled out the following:

  • EDA Census

  • EDA Beatles

  • EDA Lending Club

  • Exercises Census

  • Exercises Beatles

  • Exercises Lending Club

  • Answers Census

  • Answers Beatles

  • Answers Lending Club

  • MLOps Prod tutorial for Census

  • MLOps Prod tutorial for Beatles

  • MLOps Prod tutorial for Lending Club

  • Add a new exercise called AI/ML Use discovery worksheet
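To keep the three datasets consistent, the README and GCP subdirectory layout described above could be scaffolded with a short script. This is a sketch under assumptions: the directory names (`census`, `beatles`, `lending_club`), the README section titles, and the checklist format are illustrative, not the repository's actual layout, and `scaffold_workshop` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Hypothetical names: adjust to the repository's real directory layout.
DATASETS = ["census", "beatles", "lending_club"]
README_SECTIONS = ["Overview", "Data Description", "Problem Statement"]
CHECKLIST = ["EDA", "Exercises", "Answers", "MLOps Prod tutorial"]


def scaffold_workshop(root: Path) -> List[Path]:
    """Create <dataset>/README.md and <dataset>/GCP/ for each dataset.

    Existing READMEs are left untouched; returns the READMEs newly created.
    """
    created = []
    for name in DATASETS:
        dataset_dir = root / name
        # GCP-specific training material lives in its own subdirectory.
        (dataset_dir / "GCP").mkdir(parents=True, exist_ok=True)
        readme = dataset_dir / "README.md"
        if readme.exists():
            continue
        lines = [f"# {name.replace('_', ' ').title()}", ""]
        for section in README_SECTIONS:
            lines += [f"## {section}", "", "TODO", ""]
        lines += ["## Checklist", ""]
        lines += [f"- [ ] {item}" for item in CHECKLIST]
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
        created.append(readme)
    return created
```

Running `scaffold_workshop(Path("."))` a second time is safe: existing READMEs are skipped, so it creates nothing new.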

IoT Condition Monitoring - Paderborn Bearing Dataset

The aim of this issue is to add more IoT-related use cases to the awesome-data-science-models repository. The use case selected for this activity is bearing condition monitoring, using experimental bearing datasets based on vibration and motor current signals.

The goal is to move from condition monitoring to predictive maintenance. Modeling approaches to consider include predicting the Remaining Useful Life (RUL) of a bearing so that timely corrective maintenance can be carried out. This will help avoid unplanned machine downtime.

Many plants face unplanned downtime as their assets fail over time, so they need the ability to predict how much useful life remains. With the help of data science, the first step is to build an ML tool that analyzes the condition, or health, of machine assets.
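As a simple illustration of what RUL prediction means, one classic baseline fits a trend to a health indicator that rises with degradation (for example, vibration RMS) and extrapolates it to a failure threshold. The sketch below assumes a linear trend and a known threshold; both are simplifying assumptions, and `estimate_rul` is a hypothetical helper, not a model from this repository.

```python
import numpy as np


def estimate_rul(times, health_indicator, failure_threshold):
    """Toy RUL estimate: fit a straight line to a degrading health indicator
    and extrapolate to the time at which it crosses failure_threshold.

    Returns the remaining time after the last observation (0.0 if the
    threshold is already crossed, inf if no upward trend is observed).
    """
    t = np.asarray(times, dtype=float)
    h = np.asarray(health_indicator, dtype=float)
    slope, intercept = np.polyfit(t, h, deg=1)
    if slope <= 0:
        return float("inf")  # indicator is not degrading; no crossing predicted
    t_fail = (failure_threshold - intercept) / slope
    return float(max(t_fail - t[-1], 0.0))
```

Real bearing degradation is rarely linear, so in practice this would serve only as a baseline against learned RUL models.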

Our dataset of interest (the Paderborn Bearing Dataset) has the following characteristics:

  1. The dataset contains motor current and vibration signals, with additional measurements of torque, speed, load, and temperature.
  2. The data covers 26 damaged bearing states and 6 healthy bearing states.
  3. All data was collected under 4 different operating conditions.

Dataset - https://mb.uni-paderborn.de/en/kat/main-research/datacenter/bearing-datacenter/data-sets-and-download

Using data science techniques, the aim is to analyze the condition of assets, identify the features that contribute to failure, and build effective ML models for RUL prediction.
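Feature identification for vibration data often starts from time-domain statistics computed over short windows. The sketch below computes a few standard condition-monitoring features (RMS, peak, peak-to-peak, crest factor, kurtosis); the feature set and window length are assumptions for illustration, not the repository's pipeline, and loading the Paderborn files themselves is not shown. `vibration_features` and `window_features` are hypothetical helpers.

```python
import numpy as np


def vibration_features(signal):
    """Common time-domain condition-monitoring features for one window."""
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return {
        "rms": rms,
        "peak": peak,
        "peak_to_peak": x.max() - x.min(),
        "crest_factor": peak / rms,  # impulsive faults raise the crest factor
        "kurtosis": m4 / m2 ** 2,  # Pearson (non-excess) kurtosis; 3.0 for Gaussian noise
    }


def window_features(signal, window_size):
    """Split a recording into non-overlapping windows (dropping any partial
    tail) and return one feature dict per window."""
    x = np.asarray(signal, dtype=float)
    n_windows = len(x) // window_size
    return [vibration_features(x[i * window_size:(i + 1) * window_size])
            for i in range(n_windows)]
```

For a pure sine wave, RMS is about 0.707, crest factor about 1.414, and kurtosis 1.5; localized bearing defects tend to produce impulsive signals that push crest factor and kurtosis higher, which is why these features are common inputs to fault classifiers.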

Google AI Platform Divergence

Following commit 7d748cf, additional work on these demos will target Google Vertex AI rather than the traditional Google AI Platform. AI Platform is still available but is expected to be deprecated at some point in the future.

chicago-taxi

Rework the current chicago-taxi solution. The current implementation has dependency conflicts on AI Platform. The runtime, Keras version, and deployment strategy may need to be upgraded.
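A first step in untangling dependency conflicts is recording exactly which versions are installed in the runtime. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper, and the package list is an assumption about what the chicago-taxi solution depends on.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; replace with the solution's actual requirements.
    for pkg, ver in installed_versions(["tensorflow", "keras", "scikit-learn"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Comparing this output locally and on the target runtime shows which pins need to change before upgrading Keras.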
