dataminingml's Introduction

ML Analysis of Stellar Classification & GWP Datasets

This repository contains code and notebooks for machine learning analysis of two datasets. The analysis is divided into tasks, each covering one stage of the workflow: data processing, model building, evaluation, and visualization.

Datasets

The analysis in this repository uses the following Kaggle datasets:

Productivity Prediction of Garment Employees (GWP) Dataset

The Productivity Prediction of Garment Employees dataset provides information about the garment manufacturing process and the productivity of employees. The dataset includes manually collected attributes and has been validated by industry experts.

Stellar Classification (Star) Dataset

The Stellar Classification Dataset comes from the Sloan Digital Sky Survey (SDSS). It consists of 100,000 observations, each of a star, galaxy, or quasar. Each observation is described by 17 feature columns and a class column indicating the object type.

Contents

The repository consists of the following files and directories:

  • datasets/: Directory containing the datasets used in the analysis.
  • task_3_1.ipynb: Notebook for Task 3.1, which involves data preprocessing and feature selection.
  • task_3_2.ipynb: Notebook for Task 3.2, which focuses on building and training machine learning models.
  • task_3_3.ipynb: Notebook for Task 3.3, which utilizes hypothesis testing to compare model performance.
  • task_3_4.ipynb: Notebook for Task 3.4, which explores clustering techniques and dimensionality reduction.
  • bamboo/: Directory containing modules specific to the analysis pipeline.
    • analysis.py: Functions for plotting, visualizing and analyzing data.
    • clustering.py: Functions for clustering and cluster evaluation with scikit-learn.
    • model.py: ModelManager class for managing and storing machine learning models.
    • processing.py: Functions for preprocessing the different types of data in the datasets.
    • selection.py: Functions for preparing a processed dataset for training and testing.
    • gwp_pipeline.py: Contains the constants and functions used specifically for processing the GWP dataset.
    • star_pipeline.py: Contains the constants and functions used specifically for processing the Star dataset.
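
To illustrate the kind of comparison Task 3.3 describes, here is a minimal sketch of a paired t-test on cross-validation scores. It is not the notebook's actual code: the function name compare_models, the choice of a paired t-test, the two classifiers, and the synthetic data are all assumptions made for illustration.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_models(model_a, model_b, X, y, n_splits=10, seed=0):
    """Hypothetical helper: score two models on identical folds and run a paired t-test."""
    # A fixed random_state makes both models see exactly the same splits,
    # which is what justifies pairing the per-fold scores.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores_a = cross_val_score(model_a, X, y, cv=cv)
    scores_b = cross_val_score(model_b, X, y, cv=cv)
    t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
    return scores_a, scores_b, t_stat, p_value


# Synthetic stand-in data; the real notebooks use the GWP and Star datasets.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
scores_a, scores_b, t_stat, p_value = compare_models(
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
    X,
    y,
)
print(f"mean A={scores_a.mean():.3f}, mean B={scores_b.mean():.3f}, p={p_value:.4f}")
```

A small p-value suggests the difference in mean score is unlikely to be due to fold-to-fold noise alone. Fold scores from cross-validation are not fully independent, so the result is best read as indicative rather than exact.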

Modules Used

The analysis in this repository relies on the following libraries:

  • NumPy: A library for numerical computing in Python.
  • SciPy: A library for scientific computing in Python.
  • scikit-learn (sklearn): A machine learning library for Python, providing tools for data preprocessing, modeling, and evaluation.
  • Matplotlib: A plotting library for creating visualizations in Python.
  • Seaborn: A data visualization library based on Matplotlib, providing enhanced visualizations and statistical graphics.
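
As a rough sketch of how these libraries combine for the clustering and dimensionality reduction in Task 3.4, the example below clusters synthetic data with k-means, scores the result with the silhouette coefficient, and projects the points to two dimensions with PCA. The choice of k-means, the silhouette metric, and the synthetic blobs are assumptions for illustration; the repository's clustering.py may take a different approach.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 points in 6 dimensions around 3 centers.
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)

# Standardize so that no single feature dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Cluster, then measure how well separated the clusters are (range -1 to 1).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
score = silhouette_score(X_scaled, labels)

# Project to 2D, for example for a Matplotlib or Seaborn scatter plot
# colored by cluster label.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

print(f"silhouette={score:.3f}, projected shape={X_2d.shape}")
```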
