GRIP-Tasks

Graduate Rotational Internship Program @The Sparks Foundation


These tasks were completed by Marisha Bhatti as part of the July 2021 batch of the internship program. The projects fall mainly under the domain of Data Science & Business Analytics.

  • Task 1: Prediction using Supervised ML.
    • Simple Linear Regression task: predict a student's percentage score from the number of hours studied.
    • The dataset is available at http://bit.ly/w-data.
    • Mean Absolute Error: 4.183859899002982
    • R2 Score: 0.9454906892105354
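
The regression in Task 1 fits a straight line by ordinary least squares and scores it with MAE and R². Below is a minimal from-scratch sketch in plain Python (standard library only). It is not the notebook's code, and the five (hours, score) pairs are hypothetical toy values, not the bit.ly/w-data dataset. In practice scikit-learn's `LinearRegression`, `mean_absolute_error` and `r2_score` do the same job.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, NOT the http://bit.ly/w-data dataset.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [12.0, 21.0, 33.0, 39.0, 52.0]

slope, intercept = fit_simple_linear_regression(hours, scores)
predicted = [slope * h + intercept for h in hours]
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print("MAE:", mean_absolute_error(scores, predicted))
print("R2:", r2_score(scores, predicted))
```

On this toy data the fitted line is score = 9.8 * hours + 2.0, with an MAE of 1.12.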

  • Task 2: Prediction using Unsupervised ML.
    • K-Means Clustering Task. Predict the optimum number of clusters and represent it visually.
    • Dataset used is Iris.csv (can also be imported from sklearn.datasets).
    • This notebook has two parts. The first part uses a K-Nearest Neighbors (KNN) model for a simple multi-class classification task (Steps 1-6). The second part tackles the unsupervised learning problem with a K-Means Clustering model (Steps 7-9).
    • The KNN model has an accuracy of 0.9736842105263158
    • From the K-Means model we find that the optimum number of clusters is 3.
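
The "optimum number of clusters" is usually chosen with the elbow method: run K-Means for several values of k and look for the point where the within-cluster sum of squares (inertia) stops falling sharply. The sketch below implements Lloyd's K-Means in plain Python under a few assumptions. The data are three hypothetical, well-separated 2-D blobs, not Iris.csv. Centers are seeded with a deterministic farthest-point rule instead of the k-means++ seeding that scikit-learn's `KMeans` uses by default. It covers only the clustering part, not the KNN classifier.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point that is farthest from its nearest already-chosen center."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centers, inertia), where inertia is the sum of
    squared distances from each point to its nearest center."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep empty ones in place).
        new_centers = [
            [sum(axis) / len(members) for axis in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


# Hypothetical 2-D toy data with three well-separated blobs (NOT Iris.csv).
blob_centers = [(0, 0), (10, 0), (0, 10)]
offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Elbow method: inertia for k = 1..5.
inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

On this toy data, the inertia collapses between k = 2 and k = 3 and barely moves afterwards. That flattening is the elbow, at k = 3.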

  • Task 3: Exploratory Data Analysis - Retail
    • Find the weak areas where more profit can be made.
    • Dataset used is SampleSuperstore.csv (to view the dataset on GitHub, select View raw).
    • Category-wise:
      • Highest profit: Furniture
      • Lowest profit: Technology
      • Highest sales: Technology
    • Sub-Category-wise:
      • Highest profit: Copiers
      • Lowest profit: Tables
      • Top 3 high-discount sub-categories: Binders, Machines, Tables
    • State-wise:
      • Average number of deals per state: 203.9591836734694
      • Highest profit: Vermont
      • Lowest profit: Ohio
      • Highest sales: Wyoming
    • City-wise:
      • Highest profit: Jamestown
      • Lowest profit: Bethlehem
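
Most of the Task 3 findings are group-by aggregations: sum Profit (or Sales) per Category, Sub-Category, State or City, then take the largest and smallest totals. A standard-library sketch follows. The column names (`Category`, `State`, `Sales`, `Profit`) are assumed to match the header of SampleSuperstore.csv, and the rows and labels are hypothetical. Treating "average number of deals per state" as rows divided by distinct states is one plausible reading, not necessarily the notebook's definition.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file (e.g. SampleSuperstore.csv) into a list of dicts keyed by header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def total_by(rows, group_col, value_col):
    """Sum value_col within each distinct value of group_col (like GROUP BY ... SUM)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)


def highest_and_lowest(totals):
    """Return (group with the largest total, group with the smallest total)."""
    return max(totals, key=totals.get), min(totals, key=totals.get)


def average_rows_per(rows, group_col):
    """Number of rows divided by the number of distinct groups."""
    return len(rows) / len({row[group_col] for row in rows})


# Hypothetical rows with assumed column names; real use: rows = load_rows("SampleSuperstore.csv")
rows = [
    {"Category": "A", "State": "S1", "Sales": "200.0", "Profit": "-30.0"},
    {"Category": "A", "State": "S2", "Sales": "150.0", "Profit": "40.0"},
    {"Category": "B", "State": "S1", "Sales": "900.0", "Profit": "120.0"},
    {"Category": "C", "State": "S3", "Sales": "80.0", "Profit": "5.0"},
]

profit_by_category = total_by(rows, "Category", "Profit")
print("Profit by category:", profit_by_category)
print("Highest / lowest profit category:", highest_and_lowest(profit_by_category))
print("Highest / lowest sales state:", highest_and_lowest(total_by(rows, "State", "Sales")))
print("Average rows per state:", average_rows_per(rows, "State"))
```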

  • Task 4: Exploratory Data Analysis - Terrorism
    • Find the hot zones of terrorism.
    • The dataset can be downloaded from https://www.kaggle.com/START-UMD/gtd.
    • Middle East & North Africa has the most terrorist attacks, followed by South Asia.
    • Within the Middle East, Iraq has the most terrorist attacks. In South Asia, the top 3 are Pakistan, Afghanistan and India.
    • Iraq, Pakistan and Afghanistan are the top 3 countries by number of terrorist attacks.
    • Since 2001, Eastern Europe, the Middle East, South Asia, Southeast Asia and Sub-Saharan Africa have seen a sharp increase in terrorist attacks, whereas the other regions have seen a decrease.
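
Ranking regions or countries by attack count is a frequency count, and the "increase since 2001" finding is a before/after comparison. The sketch below assumes the GTD column names `iyear`, `region_txt` and `country_txt` and uses hypothetical rows, not real GTD records. It compares average attacks per year rather than raw totals, because the two periods can span different numbers of years.

```python
from collections import Counter


def top_n(rows, column, n=3):
    """The n most frequent values in a column (e.g. regions with the most attacks)."""
    return Counter(row[column] for row in rows).most_common(n)


def yearly_rate_before_after(rows, split_year, region_col="region_txt", year_col="iyear"):
    """Average attacks per year for each region before split_year and from split_year on.

    Per-year averages are used instead of raw totals because the two periods can cover
    different numbers of years; the year span is taken from the data itself.
    """
    years = [int(row[year_col]) for row in rows]
    first, last = min(years), max(years)
    n_before = max(split_year - first, 1)
    n_after = max(last - split_year + 1, 1)
    before, after = Counter(), Counter()
    for row, year in zip(rows, years):
        (before if year < split_year else after)[row[region_col]] += 1
    return {
        region: (before[region] / n_before, after[region] / n_after)
        for region in before.keys() | after.keys()
    }


# Hypothetical events (year, region, country), NOT real GTD records.
events = [
    ("1999", "Region X", "Country A"),
    ("2003", "Region X", "Country A"),
    ("2003", "Region X", "Country A"),
    ("2004", "Region X", "Country A"),
    ("1999", "Region Y", "Country B"),
    ("1999", "Region Y", "Country B"),
    ("2000", "Region Y", "Country B"),
    ("2000", "Region Y", "Country C"),
    ("2004", "Region Y", "Country D"),
]
rows = [{"iyear": y, "region_txt": r, "country_txt": c} for y, r, c in events]

print("Top countries:", top_n(rows, "country_txt"))
for region, (rate_before, rate_after) in sorted(yearly_rate_before_after(rows, 2001).items()):
    trend = "increase" if rate_after > rate_before else "decrease"
    print(f"{region}: {rate_before:.2f}/yr before 2001, {rate_after:.2f}/yr from 2001 ({trend})")
```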

  • Task 5: Exploratory Data Analysis - Sports
    • Find the most successful teams and players, and the factors that contribute to a team's win or loss.
    • Datasets used are matches.csv and deliveries.csv.
    • Mumbai Indians, Chennai Super Kings and Kolkata Knight Riders are the top three teams by number of wins.
    • Top 3 players by Player of the Match awards: Chris Gayle, AB de Villiers, MS Dhoni.
    • Top Batsmen: Virat Kohli, SK Raina, Rohit Sharma.
    • Top Bowlers: TG Southee, AD Mathews, SK Raina.
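
The Task 5 rankings are counts over matches.csv and deliveries.csv. The sketch below assumes the column names `winner`, `toss_winner`, `player_of_match`, `batsman` and `batsman_runs`, and uses hypothetical teams and players. It also adds one example of a win/loss factor check: how often the toss winner also wins the match. This is an illustration, not the notebook's analysis.

```python
from collections import Counter


def most_wins(matches, n=3):
    """Teams with the most wins; matches with no winner (empty field) are skipped."""
    return Counter(m["winner"] for m in matches if m["winner"]).most_common(n)


def most_player_of_match(matches, n=3):
    """Players with the most Player of the Match awards."""
    return Counter(m["player_of_match"] for m in matches if m["player_of_match"]).most_common(n)


def toss_win_rate(matches):
    """Share of decided matches in which the toss winner also won the match."""
    decided = [m for m in matches if m["winner"]]
    return sum(m["toss_winner"] == m["winner"] for m in decided) / len(decided)


def top_run_scorers(deliveries, n=3):
    """Batsmen ranked by total runs scored off the bat across all deliveries."""
    runs = Counter()
    for d in deliveries:
        runs[d["batsman"]] += int(d["batsman_runs"])
    return runs.most_common(n)


# Hypothetical rows; real ones could come from csv.DictReader over matches.csv / deliveries.csv.
matches = [
    {"winner": "Team A", "toss_winner": "Team A", "player_of_match": "Player 1"},
    {"winner": "Team A", "toss_winner": "Team B", "player_of_match": "Player 1"},
    {"winner": "Team B", "toss_winner": "Team B", "player_of_match": "Player 2"},
    {"winner": "", "toss_winner": "Team C", "player_of_match": ""},  # no result
]
deliveries = [
    {"batsman": "Player 1", "batsman_runs": "4"},
    {"batsman": "Player 2", "batsman_runs": "6"},
    {"batsman": "Player 1", "batsman_runs": "1"},
    {"batsman": "Player 3", "batsman_runs": "0"},
]

print("Most wins:", most_wins(matches))
print("Most Player of the Match awards:", most_player_of_match(matches))
print("Toss winner also won:", toss_win_rate(matches))
print("Top run scorers:", top_run_scorers(deliveries))
```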

  • Task 6: Prediction using Decision Tree Algorithm.
    • Decision Tree Classifier task. The trained classifier should be able to predict the correct class for any new data point.
    • Dataset used is Iris.csv (can also be imported from sklearn.datasets).
    • Mean of Cross Validation Score: 0.9466666666666667
    • Standard Deviation of Cross Validation Score: 0.04521553322083511
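
The Task 6 numbers are the mean and standard deviation of per-fold accuracies from k-fold cross-validation. Below is a small from-scratch CART-style tree (Gini impurity, binary threshold splits) with 5-fold cross-validation, written in plain Python on a hypothetical toy dataset. Two choices differ from a typical scikit-learn setup such as `cross_val_score(DecisionTreeClassifier(), X, y, cv=5)`. First, folds are formed by round-robin assignment rather than stratified splitting. Second, tree depth is capped at an assumed `max_depth=5`. The spread is reported as a population standard deviation (`pstdev`), which matches NumPy's default `std`.

```python
from collections import Counter
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every midpoint between consecutive distinct feature values and return
    (weighted_gini, feature_index, threshold) for the purest split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=5):
    """Grow the tree recursively. A leaf is a class label; an internal node is
    a tuple (feature_index, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all rows identical: nothing left to split on
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    )


def predict(tree, x):
    """Walk from the root to a leaf following the threshold tests."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def cross_val_accuracy(X, y, k=5, max_depth=5):
    """k-fold cross-validation with round-robin folds; returns per-fold accuracy."""
    n = len(y)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        train_idx = [i for i in range(n) if i % k != fold]
        tree = build_tree([X[i] for i in train_idx], [y[i] for i in train_idx], max_depth=max_depth)
        correct = sum(predict(tree, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical toy data (NOT Iris.csv): class "a" when feature 0 < 5, else "b".
X = [[i, i % 3] for i in range(10)]
y = ["a" if i < 5 else "b" for i in range(10)]

scores = cross_val_accuracy(X, y, k=5)
print("Fold accuracies:", scores)
print("Mean:", mean(scores), "Std (population):", pstdev(scores))
```

In the first fold the boundary point 5 lands on the wrong side of the learned threshold, so the fold accuracies are [0.5, 1.0, 1.0, 1.0, 1.0], with a mean of 0.9 and a standard deviation of 0.2.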

  • Task 7: Stock Market Prediction using Numerical and Textual analysis.
    • Create a hybrid model for stock price/performance prediction using numerical analysis of historical stock prices and sentiment analysis of news headlines.
    • Historical stock prices can be downloaded from finance.yahoo.com, or the SENSEX.csv file in the repository can be used for numerical analysis. Textual (news) data can be downloaded from https://bit.ly/36fFPI6 or https://www.kaggle.com/therohk/india-headlines-news-dataset?select=india-news-headlines.csv.
    • Mean Absolute Error: 0.5019762845849802
    • Mean Squared Error: 0.5019762845849802
    • Root Mean Squared Error: 0.7085028472666713
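
Only the textual half of the hybrid model is sketched here. A hypothetical toy word list scores each headline, scores are averaged per day, and the daily score becomes an up (1) / down (0) call that is evaluated with MAE, MSE and RMSE. The lexicon, dates, headlines and price directions are all made up for illustration. A real pipeline would use a proper sentiment model and combine its output with the numerical features from SENSEX.csv. One consequence of the definitions is worth noting: MAE equals MSE exactly when every absolute error is 0 or 1, as happens with a binary up/down target. That is consistent with the identical MAE and MSE reported above.

```python
import math
from statistics import mean

# Hypothetical toy lexicon; a real project would use a trained model or an established lexicon.
POSITIVE = {"gain", "gains", "rise", "rally", "record", "growth", "profit"}
NEGATIVE = {"fall", "falls", "slump", "crash", "loss", "decline", "fear"}


def headline_polarity(headline):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0 if no hits."""
    words = headline.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def daily_sentiment(headlines_by_date):
    """Average headline polarity for each date."""
    return {date: mean(headline_polarity(h) for h in hs) for date, hs in headlines_by_date.items()}


def error_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = mean(e * e for e in errors)
    return mean(abs(e) for e in errors), mse, math.sqrt(mse)


# Hypothetical headlines and direction labels (1 = index up, 0 = index down).
headlines = {
    "2021-07-01": ["Index posts record gains", "Banks rally on growth hopes"],
    "2021-07-02": ["Markets fall on inflation fear"],
    "2021-07-05": ["IT stocks slump", "Auto sales rise"],
}
actual_direction = {"2021-07-01": 1, "2021-07-02": 0, "2021-07-05": 1}

sentiment = daily_sentiment(headlines)
dates = sorted(sentiment)
predicted = [1 if sentiment[d] > 0 else 0 for d in dates]
actual = [actual_direction[d] for d in dates]
mae, mse, rmse = error_metrics(actual, predicted)
print("Daily sentiment:", sentiment)
print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}")
```

With these toy labels the errors are 0, 0 and 1, so MAE and MSE both come out as 1/3 and RMSE is its square root.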
