
machinelearningmodels's Introduction

This is a starter repository of ML models built on basic datasets.
Added data preprocessing for the text dataset and the standard dataset.
Added the Regression folder to the repository.
From today onward it will also contain outputs with graphs.

Data Preprocessing

Data preprocessing is a core subject in Machine Learning.
It is often said that data preprocessing makes up about 80% of the work in data science.
So let me introduce two ways of turning categorical data into numbers.

  • Categorical Variable

    There are mainly 2 types
    • Ordinal
    • Nominal

Ordinal

  • These are the ones that can be compared or ordered.
    For example, we can order grade A and grade B, i.e.
    Grade A > Grade B, or the marks in grade A are > the marks in grade B.
    They are handled by simply mapping each category to its rank.
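
The rank mapping described above can be sketched in a few lines of Python. The grade labels and the rank numbers in `GRADE_RANK` are assumptions chosen for illustration; any mapping that preserves the order would work.

```python
# Hypothetical ranking: a larger number means a higher grade.
GRADE_RANK = {"C": 1, "B": 2, "A": 3}

def encode_ordinal(values, ranking):
    """Replace each category with its integer rank."""
    return [ranking[v] for v in values]

grades = ["A", "C", "B", "A"]
encoded = encode_ordinal(grades, GRADE_RANK)
print(encoded)  # [3, 1, 2, 3]
```

Because the ranks keep the order, comparisons such as Grade A > Grade B still hold on the encoded numbers.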

Nominal

  • These are the ones that cannot be compared, i.e. they are all at the same level.
    The best example is gender: we cannot rank one gender above another.
    They are one-hot encoded, e.g. Male => [1, 0] and
    Female => [0, 1], where [Male, Female] is the column order.
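
A minimal sketch of one-hot encoding in plain Python, using the [Male, Female] column order from the example above. The helper name `one_hot` is hypothetical and not taken from the repository.

```python
def one_hot(values, categories):
    """Return one row per value, with a 1 in the column of its category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows

categories = ["Male", "Female"]
encoded_gender = one_hot(["Male", "Female", "Female"], categories)
print(encoded_gender)  # [[1, 0], [0, 1], [0, 1]]
```

Each row has exactly one 1, so no category looks larger or smaller than another.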

Regression

In statistics, regression is defined as a method of
obtaining correlations, or a mapping F such that F(x) ~= Y,
i.e. an estimate for the general population.

But let's look at it from a simple human perspective,
not the statistical one. Let's say you are at a party
and a rather interesting game pops up: you need to guess
the number of balls in a jar without counting, opening or in any way
touching the jar. The closest number wins, and the winner gets a good gift.
You need that gift.

So you think of a way to arrive at a number. The way
to do this is to take note of the previous guesses and make an estimate
from the mean of the guesses by people who came back with a wrong answer.
This is basically sandwiching the answer from both sides before making a guess.
The activity you just performed makes you the winner. Why? Because of stats.
This activity is estimation, and that is what we do in regression.
As in the jar game, we take a guess from the available information,
calculate how far off we are, then look at the next person (number)
with the closest guess and guess again until we reach the lowest error.
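
The guess, measure, and guess-again loop can be sketched as gradient descent on the mean squared error of a straight line. This is one common way to fit a regression and is an assumption here, since the text does not name a specific algorithm. The data, learning rate, and step count are made up for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~= w * x + b by repeatedly reducing the mean squared error."""
    w, b = 0.0, 0.0  # the first guess
    n = len(xs)
    for _ in range(steps):
        # How far off is the current guess for each point?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Guess again, moving in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # approximately 2.0 and 1.0
```

Each pass is one round of the jar game: check the error of the current guess, then adjust.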

NLP - Text Classification

It is a field of AI in which we try to process natural language (the languages we speak, e.g. Hindi, English) and draw the necessary insights from it, just as we do ourselves.

Preprocessing

As a computer does not understand natural language, we encode it into numbers. We do this with the help of a tokenizer. A more advanced form of tokenizer is a count vectorizer, which encodes text into numbers and converts it into vectors.
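
To make the idea concrete, here is a minimal bag-of-words sketch in plain Python. It assumes whitespace tokenization and lowercase text, and the helper names are hypothetical. In practice a library class such as scikit-learn's `CountVectorizer` does this job; that this is the tool the text refers to is an assumption.

```python
def build_vocabulary(documents):
    """Map every distinct word to a column index (alphabetical order)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}

def count_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

docs = ["the cat sat", "the cat saw the dog"]
vocabulary = build_vocabulary(docs)  # cat, dog, sat, saw, the
vectors = [count_vector(d, vocabulary) for d in docs]
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```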

Preprocessing also includes removing stopwords (e.g. ['here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each']) and punctuation (e.g. '!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~'), as neither carries much importance when drawing insights from text.
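
A small sketch of this cleaning step, using only the standard library. The stopword set is just the short sample listed above, not a complete list, and `clean_text` is a hypothetical helper name.

```python
import string

# Sample stopwords from the text above; real lists are much longer.
STOPWORDS = {"here", "there", "when", "where", "why", "how", "all", "any", "both", "each"}

def clean_text(text):
    """Strip punctuation, lowercase, split on whitespace, drop stopwords."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.lower().split() if word not in STOPWORDS]

print(clean_text("Why, oh why, are all the cats here?!"))
# ['oh', 'are', 'the', 'cats']
```

`string.punctuation` contains the same characters as the punctuation example above.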

Further, if we want to know which words are important, we can use TF-IDF (short for term frequency–inverse document frequency), a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. Implementations are available in Python libraries.
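
As a rough illustration, the sketch below computes the plain textbook form of TF-IDF: term frequency within a document multiplied by the log of (number of documents / number of documents containing the word). Library implementations usually add smoothing and normalization, so their numbers will differ. The function name and sample documents are assumptions.

```python
import math

def tf_idf(documents):
    """Return one {word: score} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = {}
    for tokens in tokenized:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    scores = []
    for tokens in tokenized:
        doc_scores = {}
        for word in set(tokens):
            tf = tokens.count(word) / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            doc_scores[word] = tf * idf
        scores.append(doc_scores)
    return scores

scores = tf_idf(["the cat sat", "the dog barked"])
print(scores[0])
```

A word that appears in every document, such as "the" here, gets a score of 0, while a word unique to one document gets a positive score.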

Classifier

There are many classifiers, but the ones that are frequently used are:

1. Naive bayes classifier

Naive Bayes is a family of statistical algorithms we can make use of when doing text classification. One of the members of that family is Multinomial Naive Bayes (MNB). One of its main advantages is that you can get really good results when little data is available (~ a couple of thousand tagged samples) and computational resources are scarce.

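All you need to know is that Naive Bayes is based on Bayes’s Theorem, which relates the probability of a class given the words in a text to the probability of those words given the class.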
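
A minimal sketch of a multinomial Naive Bayes text classifier in plain Python, assuming whitespace tokenization and add-one (Laplace) smoothing. The function names and the four training sentences are made up for illustration and are not from the repository.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples is a list of (text, label); return the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def predict_nb(text, model):
    """Pick the label with the highest log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("great fun movie", "pos"),
    ("loved this great film", "pos"),
    ("boring bad movie", "neg"),
    ("bad terrible film", "neg"),
]
model = train_nb(training)
print(predict_nb("great film", model))  # pos
print(predict_nb("bad boring", model))  # neg
```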

2. SVM Classifier

Support Vector Machines (SVM) is just one out of many algorithms we can choose from when doing text classification. Like Naive Bayes, SVM doesn’t need much training data to start providing accurate results. Although it needs more computational resources than Naive Bayes, SVM can often achieve more accurate results.
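
For intuition only, here is a sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It uses toy 2D points instead of text vectors. The hyperparameters, data, and function names are assumptions, and production libraries use more sophisticated solvers.

```python
def train_linear_svm(points, labels, learning_rate=0.1, reg=0.01, epochs=100):
    """Fit weights w and bias b; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move the boundary.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Safely classified: only apply the regularization shrinkage.
                w = [wi * (1 - learning_rate * reg) for wi in w]
    return w, b

def predict_svm(x, w, b):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

points = [(2, 2), (-2, -2), (3, 3), (-3, -3), (2, 3), (-2, -3)]
labels = [1, -1, 1, -1, 1, -1]
w, b = train_linear_svm(points, labels)
print([predict_svm(p, w, b) for p in points])
```

For text classification, the same training loop would run on count or TF-IDF vectors instead of 2D points.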

3. Deep Learning

Deep learning is a set of algorithms and techniques inspired by how the human brain works. The two main deep learning architectures used in text classification are Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

  • On the one hand, deep learning algorithms require much more training data than traditional machine learning algorithms, often on the order of millions of tagged examples. On the other hand, traditional machine learning algorithms such as SVM and NB reach a threshold beyond which adding more training data doesn’t improve their accuracy, whereas deep learning classifiers continue to get better the more data you feed them.
  • Word-embedding methods such as Word2Vec or GloVe are also used to obtain better vector representations for words and to improve the accuracy of classifiers trained with traditional machine learning algorithms (transfer learning).

machinelearningmodels's People

Contributors

astha-bhushan, darknez07
