eaglewarrior / machine-learning

This repository contains everything beginners in ML and DL need. Follow and star this repo for regular updates.

License: MIT License

Python 0.44% R 0.07% Jupyter Notebook 99.48% HTML 0.01%
machine-learning algorithm nlp-machine-learning prediction-model datascience linear-regression logisitic-regression polynomial-regression random-forest-regression decision-tree-regression

machine-learning's Introduction

Machine-learning

This repository contains everything beginners in ML need. Follow and star this repo for regular updates.

This repo covers the data preprocessing steps that beginners need to know.

Python is recommended for every ML beginner, and this repo is full of ML algorithms implemented in Python.

Contributing Guidelines for Hacktoberfest2022

  1. Star it

  2. Fork the repo

  3. Clone it onto your PC.

  4. Create a folder with your GitHub username

  5. Create separate files for each issue you are solving. Always open an issue first that gives all details of the process or method you will use to perform anomaly detection, then wait until it is assigned. (This will take no more than 2-3 hours; we are passionate open source developers.)

  6. Open PRs for the issues you are solving. (You can open multiple PRs for different issues by branching).

  7. Make sure the data comes only from the given categories (no repetition of the same data):

    a. Healthcare: COVID, heart attack, cancer, etc.

    b. Finance: stocks, etc.

    c. Retail or CPG

    d. Image classification

    e. Time series

Code-only files such as .py are not accepted. Please push proper Jupyter (.ipynb) notebooks with a problem statement and solution analysis.

Python packages used: numpy, pandas, matplotlib, sklearn, statsmodels, keras, nltk. More will be added.

Regression

  1. In linear regression we use a dataset of employee salaries and years of experience. With this model we can predict an employee's salary from their years of experience.

  2. In multiple linear regression we use a dataset of startup expenditures and profits. With this model we can predict the profit of a startup. We have also built a model using the backward elimination technique.

  3. In polynomial regression we use a dataset of salaries and years of experience. This could help an HR department check whether a new joiner is giving accurate information about their salary.

  4. In SVR (support vector regression) we use a dataset of salaries and years of experience. This could help an HR department check whether a new joiner is giving accurate information about their salary.

  5. In Decision Tree regression we use a dataset of salaries and years of experience. This could help an HR department check whether a new joiner is giving accurate information about their salary.

  6. In Random Forest regression we use a dataset of salaries and years of experience. This could help an HR department check whether a new joiner is giving accurate information about their salary. On this dataset the algorithm gives the best result, better than polynomial regression.
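
As a rough illustration of the first item, the sketch below fits a straight line by ordinary least squares using only the standard library. The function name `fit_line` and the salary figures are made up for this example and are not taken from the repo's notebooks, which use sklearn.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of experience and salary (invented for illustration).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000.0, 45000.0, 50000.0, 55000.0, 60000.0]

slope, intercept = fit_line(years, salary)
predicted = slope * 6.0 + intercept  # predicted salary at 6 years of experience
```

With this toy data every extra year adds the same amount to the salary, so the fitted line passes through all points exactly; real salary data will be noisier.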

Classification

  1. In logistic regression we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product.

  2. In KNN classification we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product.

  3. In SVM classification we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product.

  4. In Random Forest classification we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product.

  5. In Decision Tree classification we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product.

  6. In kernel SVM we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product. Kernel SVM is mostly used for complicated datasets where the data is not linearly separable.

  7. Naive Bayes is one of the most important classification algorithms. Here we use a dataset of salary, age, and product purchased. With this model we can predict whether a customer of a given age and salary will buy the product. Naive Bayes is based on Bayes' theorem; before getting into the code, one has to understand how the formula decides which class a point belongs to.
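
To make the Naive Bayes formula concrete, here is a minimal Gaussian Naive Bayes sketch in plain Python. It assumes each feature is normally distributed within a class; the helper names (`fit_gaussian_nb`, `predict`) and the customer rows are hypothetical and are not the repo's notebook code.

```python
import math


def fit_gaussian_nb(rows, labels):
    """Estimate per-class prior, feature means, and feature variances."""
    model = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids division by zero
            for col, m in zip(zip(*members), means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Pick the class with the highest log posterior (up to a constant)."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density for this feature value.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical (age, salary in thousands) rows; label 1 = bought, 0 = did not buy.
rows = [(22, 25), (25, 30), (28, 35), (45, 90), (50, 100), (55, 110)]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
```

Calling `predict(model, (24, 28))` assigns the point to the class whose prior times per-feature likelihoods is largest, which is the Bayes' theorem step with the "naive" assumption that features are independent given the class.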

Clustering

  1. In k-means clustering we use a dataset of gender, age, score, etc. With this model we can determine which cluster a customer belongs to; this is an example of customer segmentation.

  2. In Hierarchical_Clustering we use mall.csv, where we cluster people according to their income and spending habits. This introduces the dendrogram, an interesting concept that helps decide how many clusters we need for segmentation.
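
The k-means idea can be sketched in a few lines of plain Python. This is a simplified version with fixed starting centroids and a fixed number of iterations. The function name `kmeans` and the customer points are invented for illustration and are not read from mall.csv.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical (annual income, spending score) pairs, invented for illustration.
customers = [(15, 80), (16, 82), (17, 78), (80, 20), (82, 18), (85, 22)]
centroids, clusters = kmeans(customers, centroids=[(15, 80), (85, 22)])
```

In practice the starting centroids are usually chosen at random or with k-means++, and the number of clusters is picked with a method such as the elbow plot.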

Association Rule Learning

  1. Market Basket Analysis is a machine learning technique for identifying buying patterns across many retail transactions, which helps the retailer increase sales. We use the Apriori algorithm, which, much like Bayes' rule, works with conditional probabilities to find relationships between the products customers buy.

  2. Market basket analysis using the Eclat algorithm, which identifies buying patterns across many retail transactions to help the retailer increase sales.
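
Association rules are usually ranked by support, confidence, and lift. The sketch below computes these three measures for a single rule. It is not a full Apriori or Eclat implementation, and the function names and baskets are made up for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return both, confidence, lift


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
s, c, l = rule_metrics(baskets, {"bread"}, {"butter"})
```

A lift above 1 suggests the two items are bought together more often than would be expected if they were independent.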

Dimensionality Reduction

In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Here we apply the following algorithms to the datasets:

  1. PCA algorithm
  2. Kernel_pca
  3. LDA
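
As a rough sketch of what PCA computes, the code below approximates the first principal component (the direction of greatest variance) with power iteration on the covariance matrix. It uses plain Python only. The function name and the data are hypothetical, and library implementations such as sklearn's typically use SVD instead.

```python
import math


def first_principal_component(rows, iterations=100):
    """Approximate the top eigenvector of the covariance matrix by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    vec = [1.0] + [0.5] * (d - 1)  # arbitrary non-zero start vector
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Hypothetical 2-D data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
component = first_principal_component(data)
```

Because the points lie near the line y = x, the returned unit vector points roughly along that diagonal; projecting the data onto it would reduce two features to one.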

Deep Learning

  1. Artificial Neural Network

Artificial neural networks, or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Here we use a dataset describing which customers leave or stay with a bank and how several factors affect customer retention and exit, and we model it with an ANN in Python.

  2. Convolutional Neural Network

Convolutional neural networks. Sounds like a weird combination of biology and math with a little CS sprinkled in, but these networks have been some of the most influential innovations in the field of computer vision. Here we build a CNN that classifies images as cat or dog.

Natural Language Processing

Natural Language Processing (or NLP) is applying Machine Learning models to text and language. Teaching machines to understand what is said in spoken and written word is the focus of Natural Language Processing. In this code I have done the following things:

Clean texts to prepare them for the Machine Learning models,
Create a Bag of Words model,
Apply Machine Learning models onto this Bag of Words model.
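
The first two steps can be sketched with the standard library alone. This simplified version only lowercases the text and keeps alphabetic tokens; it skips stop-word removal and stemming. The function names and review snippets are invented for illustration.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) where each row counts vocabulary words."""
    tokenized = [clean(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical review snippets, invented for illustration.
reviews = ["Loved the food!", "The food was bad, bad service."]
vocabulary, matrix = bag_of_words(reviews)
```

Each row of `matrix` is a fixed-length count vector, which is the form a classifier such as Naive Bayes expects as input.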

Model Selection

  1. XGBoost

XGBoost is a very fast, scalable implementation of gradient boosting. Models using XGBoost regularly win online data science competitions and are used at scale across different industries.

  2. K-fold cross-validation

Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.

The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value for k is chosen, it may be used in place of k in the reference to the model, such as k=10 becoming 10-fold cross-validation.
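
A minimal sketch of how the data sample is split into k groups, assuming no shuffling; the generator name `k_fold_indices` is hypothetical. Each fold serves once as the test set while the remaining folds form the training set.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
```

The model is then trained and scored once per fold, and the k scores are averaged to estimate how it performs on unseen data.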

  3. Grid search CV

GridSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
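
The core of a grid search is an exhaustive loop over every parameter combination. The sketch below shows that loop in plain Python; it is not sklearn's GridSearchCV. The function `grid_search`, the scoring function `toy_score`, and the parameter values are all hypothetical stand-ins.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for a cross-validated score; in practice this would
# train the estimator with `params` on each fold and average the results.
def toy_score(params):
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2


grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the list lengths, so a grid of 3 x 3 values means 9 evaluations, each of which is itself a full cross-validation run.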

Reinforcement Learning

Reinforcement Learning is a branch of Machine Learning, sometimes also described as online learning. It is used to solve interactive problems where the data observed up to time t is used to decide which action to take at time t + 1. It is also used in Artificial Intelligence when training machines to perform tasks such as walking. Desired outcomes give the AI a reward, undesired outcomes a punishment. Machines learn through trial and error.

Upper Confidence Bound (UCB)
Thompson Sampling
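
As an illustration of the Upper Confidence Bound idea, the sketch below simulates the standard UCB1 rule on a multi-armed bandit with Bernoulli rewards. The function name, the reward probabilities, and the seed are assumptions made for this example, and the exact form of the exploration bonus may differ from the one used in the notebook.

```python
import math
import random


def ucb1(reward_probs, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise its estimate
        else:
            # Mean reward plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical click-through rates for three ads, invented for illustration.
counts = ucb1([0.1, 0.5, 0.8], rounds=5000)
```

Over many rounds the arm with the highest true reward rate ends up being pulled most often, while the bonus term keeps the others from being ignored entirely.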

Recommendation Systems

I have depicted two methods of building a recommendation engine: one with traditional methods and the other with a scalable algorithm using PySpark, both on an open-source book review dataset.

machine-learning's People

Contributors

adit2005, amoghatsunil, anirbanmukherjeexd, banseedhar01, dependabot[bot], dvamsidhar2002, eaglewarrior, gauriimaheshwarii, gitesh1209, hakunamatata1997, imshivamrai282, jishu-yadav, koustubh-mane1, lashuk1729, naksh2004, neerajap2001, nexfreak07, salman-shah2022, shruti-2412, srinijadharani

machine-learning's Issues

Apriori Algorithm

I want to add a Jupyter Notebook containing an Introduction, explanation, EDA and model of the algorithm. Please assign me this issue.
@eaglewarrior

Implement different ML algorithms with open source data files

  1. Star it
  2. Fork the repo
  3. Clone it onto your PC.
  4. Create a folder with your GitHub username
  5. Create separate files for each issue you are solving. Always open an issue first that gives all details of the process or method you will use to perform anomaly detection, then wait until it is assigned. (This will take no more than 2-3 hours; we are passionate open source developers.)
  6. Open PRs for the issues you are solving. (You can open multiple PRs for different issues by branching).

Note: Don't upload a zip file. Please upload your work in a separate folder and follow the instructions given in the repository.
