
Complete-Life-Cycle-of-a-Data-Science-Project

CREDITS: All corresponding resources

MOTIVATION: This repository was created to help upcoming aspirants and others in the data science field.

Business understanding

1.Data collection

Data comes in 3 kinds:

a.Structured data (tabular data, etc.)

b.Unstructured data (images, text, audio, etc.)

c.Semi-structured data (XML, JSON, etc.)

Types of variables:

a.Qualitative (nominal, ordinal, binary)

b.Quantitative (discrete, continuous)

a.Web scraping. Best articles to refer to: https://towardsdatascience.com/choose-the-best-python-web-scraping-library-for-your-application-91a68bc81c4f

https://www.analyticsvidhya.com/blog/2019/10/web-scraping-hands-on-introduction-python/?utm_source=linkedin&utm_medium=KJ|link|weekend-blogs|blogs|44087|0.875

https://www.bigdatanews.datasciencecentral.com/profiles/blogs/top-30-free-web-scraping-software

https://medium.com/analytics-vidhya/master-web-scraping-completly-from-zero-to-hero-38051423256b

1.BeautifulSoup

2.Scrapy

3.Selenium

4.Requests (to access data)

5.AUTOSCRAPER - https://github.com/alirezamika/autoscraper

6.Twitter scraping tools (twint or tweepy) - https://github.com/twintproject/twint

  https://analyticsindiamag.com/complete-tutorial-on-twint-twitter-scraping-without-twitters-api/
  
  Scraping Instagram: instaloader

7.urllib

8.pattern

9.Octoparse Easy Web Scraping  https://www.octoparse.com/ 

 ParseHub https://www.parsehub.com/  https://analyticsindiamag.com/parsehub-no-code-gui-based-web-scraping-tool/
 
 Diffbot  https://analyticsindiamag.com/diffbot/
 
 Trustpilot
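Whichever tool you pick, scraping boils down to fetching HTML and then parsing out the elements you need. The sketch below keeps things minimal and uses only the standard library's `html.parser`, swapped in for BeautifulSoup so it runs without extra installs. The HTML string stands in for a hypothetical page; in practice you would fetch it first with Requests or urllib.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real scraper would download this first,
# e.g. with urllib.request.urlopen(url).read().decode("utf-8").
page = """
<html><body>
  <a href="/datasets/iris">Iris</a>
  <p>Not a link</p>
  <a href="/datasets/mnist">MNIST <b>digits</b></a>
</body></html>
"""
print(extract_links(page))
# [('Iris', '/datasets/iris'), ('MNIST digits', '/datasets/mnist')]
```

Text inside nested tags (the `<b>` above) is still collected because the parser records all data seen between the opening and closing `<a>`.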

b.Web Crawling

c.Third-party APIs

d.Creating your own data (primary data), e.g. manual collection via Google Docs, surveys, etc.

e.Databases

Databases are of 2 kinds: SQL and NoSQL databases

SQL, SQLite, MySQL, MongoDB, Hadoop, Elasticsearch, Cassandra, Amazon S3, Hive, Google Bigtable, AWS DynamoDB, HBase, Oracle DB

f.Online resources - ultimate resource: https://datasetsearch.research.google.com/

1)kaggle-https://www.kaggle.com/datasets , pip install kaggledatasets

2)movielens-https://grouplens.org/datasets/movielens/latest/

3)data.gov-https://data.gov.in/

4)uci-https://archive.ics.uci.edu/ml/datasets.php

5)GroupLens datasets https://grouplens.org/

6)data.world https://data.world/ , World Bank

7)Google Cloud BigQuery public datasets

  Google Public Datasets-cloud.google.com/bigquery/public-data/
  
  Google Cloud Data Catalog  https://cloud.google.com/data-catalog
  
  Academic Torrents-https://academictorrents.com/check.htm?returnto=%2Fbrowse.php

8)Online hackathons

9)image data from google_images_download

https://www.visualdata.io/discovery

http://xviewdataset.org/#dataset

https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html

10)image data from Bing_Search

11)https://www.columnfivemedia.com/100-best-free-data-sources-infographic

12)Reddit:https://lnkd.in/dv5UCD4

13)https://datasets.bifrost.ai/?ref=producthunt

14)data.world:https://lnkd.in/gEK897K

15)https://data.world/datasets/open-data

   https://tinyletter.com/data-is-plural

16)FiveThirtyEight :-  https://lnkd.in/gyh-HDj , https://data.fivethirtyeight.com/

17)BuzzFeed :- https://lnkd.in/gzPWyHj

   Buzzfeed News -github.com/BuzzFeedNews
   
   Socrata - https://opendata.socrata.com/

18)Google public datasets :- https://lnkd.in/g5dH8qE

19)Quandl :- https://www.quandl.com  stock data

   Statista : https://www.statista.com/ statistics portal

20)Socrata Open Data :- https://lnkd.in/gea7JMz

21)Academic Torrents :- https://lnkd.in/g-Ur9Xy

22)Image labeling tools :- https://github.com/wkentaro/labelme  ,  https://github.com/tzutalin/labelImg 

Labelbox-https://labelbox.com/

Playment-https://playment.io/

SuperAnnotate -https://www.superannotate.com/

CVAT-https://github.com/openvinotoolkit/cvat

Lionbridge- https://lionbridge.ai/

LinkedAI: A No-code Data Annotations- https://analyticsindiamag.com/linkedai/

Dataturks

V7 Darwin The Rapid Image Annotator   https://docs.v7labs.com/docs/loading-a-dataset-in-python   https://github.com/v7labs/darwin-py#usage-as-a-python-library

https://github.com/heartexlabs/awesome-data-labeling

23)tensorflow_datasets as tfds  https://www.tensorflow.org/datasets  (import tensorflow_datasets as tfds)

24)https://datasets.bifrost.ai/?ref=producthunt

25)https://ourworldindata.org/

26)https://data.worldbank.org/

27)google open images:https://storage.googleapis.com/openimages/web/download.html

28)https://data.gov.in/

29)imagenet dataset-http://www.image-net.org/

30)https://parulpandey.com/2020/08/09/getting-datasets-for-data-analysis-tasks%e2%80%8a-%e2%80%8aadvanced-google-search/

31)https://storage.googleapis.com/openimages/web/index.html  , 

   https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F09qck
   
   https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&_ga=2.35328417.1459465882.1589693499-869920574.1589693499
   
   https://catalog.data.gov/dataset?groups=education2168#topic=education_navigation
   
   https://vincentarelbundock.github.io/Rdatasets/datasets.html
 
32)coco dataset https://cocodataset.org/#explore

33)huggingface datasets-https://github.com/huggingface/datasets  https://huggingface.co/datasets

34)Big Bad NLP Database-https://datasets.quantumstat.com/

https://github.com/niderhoff/nlp-datasets

35)https://www.edureka.co/blog/25-best-free-datasets-machine-learning/

36)BigQuery public datasets, Google Public Data Explorer

37)Built-in library datasets, e.g. the Iris dataset, MNIST, etc.

tf.data.Dataset objects via TensorFlow Datasets

38)data.gov.be ,data.egov.bg/ ,data.gov.cz/english ,portal.opendata.dk,govdata.de,opendata.riik.ee,data.gov.ie,data.gov.gr,datos.gob.es,data.gouv.fr,data.gov.hr

dati.gov.it,data.gov.cy,opendata.gov.lt,data.gov.lv,data.public.lu,data.gov.mt,data.overheid.nl,data.gv.at,danepubliczne.gov.pl,dados.gov.pt,data.gov.ro,podatki.gov.si

data.gov.sk,avoindata.fi,oppnadata.se,https://data.adb.org/ ,https://data.iadb.org/ ,https://www.weforum.org/agenda/2018/03/latin-america-smart-cities-big-data/

https://data.fivethirtyeight.com/ , https://wiki.dbpedia.org/ ,https://www.europeandataportal.eu/en ,https://data.europa.eu/ ,https://www.census.gov/,

https://www.who.int/data/gho ,https://data.unicef.org/open-data/ ,http://data.un.org/ ,https://data.oecd.org/ ,https://data.worldbank.org/  

39.Awesome Public Dataset- https://github.com/awesomedata/awesome-public-datasets

40.Datasets for Machine Learning on Graphs-https://ogb.stanford.edu/

41.Big Bad NLP Database-https://datasets.quantumstat.com/

42.30 largest tensorflow datasets-https://lionbridge.ai/datasets/tensorflow-datasets-machine-learning/

43. coco dataset-https://cocodataset.org/#home

Google Open images-https://opensource.google/projects/open-images-dataset  https://storage.googleapis.com/openimages/web/index.html

50+ Object Detection Datasets-https://medium.com/towards-artificial-intelligence/50-object-detection-datasets-from-different-industry-domains-1a53342ae13d

   70+ Image Classification Datasets from different Industry domains-https://medium.com/towards-artificial-intelligence/70-image-classification-datasets-from-different-industry-domains-part-2-cd1af6e48eda
   
tensorflow_datasets.object_detection - https://storage.googleapis.com/openimages/web/index.html

https://github.com/google-research-datasets/Objectron/  https://ai.googleblog.com/2020/11/announcing-objectron-dataset.html?m=1 

http://idd.insaan.iiit.ac.in/   http://database.mmsp-kn.de/koniq-10k-database.html


ImageNet data -http://image-net.org/

ApolloScape Dataset-http://apolloscape.auto/

44.https://github.com/fivethirtyeight/data

45.Recommender Systems Datasets-https://cseweb.ucsd.edu/~jmcauley/datasets.html

46.indiadataportal-https://indiadataportal.com/

47.US Government Open Dataset: https://www.data.gov/

https://censusreporter.org/   https://data.census.gov/cedsci/

48.AWS Public Data Sets:https://registry.opendata.aws/    https://aws.amazon.com/opendata/?wwps-cards.sort-by=item.additionalFields.sortDate&wwps-cards.sort-order=desc

49.https://the-eye.eu/public/AI/pile_preliminary_components/

  Reddit -https://www.reddit.com/r/datasets/
  
  wikipedia-https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
  
  http://opendata.cern.ch/  ,  https://www.imf.org/en/Data
  
  Global Health Observatory data repository-https://apps.who.int/gho/data/node.main
  
  CERN Open Data Portal-http://opendata.cern.ch/
  
50.openblender- https://www.openblender.io/#/welcome

51.Top 10 Datasets For Cybersecurity Projects- https://analyticsindiamag.com/top-10-datasets-for-cybersecurity-projects/

52.Datasets from Web Crawl Data (nlp)-http://data.statmt.org/cc-100/

53.https://www.springboard.com/blog/free-public-data-sets-data-science-project/

54.NASA - https://nasa.github.io/data-nasa-gov-frontpage/ace 

55.Academic Torrents,GitHub Datasets,CERN Open Data Portal,Global Health Observatory Data Repository

56.32 Data Sets to Uplift your Skills in Data Science-https://blog.datasciencedojo.com/data-sets-data-science-skills/?utm_content=144243072&utm_medium=social&utm_source=linkedin&hss_channel=lcp-3740012

57.OpenDaL-https://opendatalibrary.com/

Data Is Plural-https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0

VisualData-https://www.visualdata.io/discovery

https://medium.com/towards-artificial-intelligence/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f
 
58.Pandas Data Reader-https://pandas-datareader.readthedocs.io/en/latest/remote_data.html

59.ieee-dataport-https://ieee-dataport.org/datasets


https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/data/datasets.md#datasets-and-sources-of-raw-data

60.Faker is a Python package that generates fake data for you-https://github.com/joke2k/faker

61.Text Data Annotator Tool - Datasaur  https://datasaur.ai/

62.Google Analytics cost data import https://segmentstream.com/google-analytics?utm_source=twitter&utm_medium=cpc&utm_campaign=ga_costs_import_en&utm_content=guide

63.https://lionbridge.ai/services/crowdsourcing/    https://lionbridge.ai/     https://www.clickworker.com/  https://appen.com/  https://www.globalme.net/

64.Azure Open Datasets https://azure.microsoft.com/en-us/services/open-datasets/       https://azure.microsoft.com/en-in/services/open-datasets/catalog/
  
Yelp Open Dataset  https://www.yelp.com/dataset

https://data.world/

ODK Open Data Kit- https://getodk.org/

World Bank Open Data https://data.worldbank.org/

https://analyticsindiamag.com/10-biggest-data-breaches-that-made-headlines-in-2020/

https://data.mendeley.com/

https://github.com/iamtekson/geospatial-data-download-sites

https://eugeneyan.com/writing/data-discovery-platforms/

65.https://medium.com/towards-artificial-intelligence/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f

https://github.com/MTG/freesound-datasets

2.Feature engineering

Data cleaning-Pyjanitor-https://analyticsindiamag.com/beginners-guide-to-pyjanitor-a-python-tool-for-data-cleaning/

Remove duplicate rows from the dataset

a.Handle missing values

 Types of missing values:
 
 i.Missing completely at random (MCAR): no relationship between the missingness and the observed data, so rows can be deleted without distorting the data distribution
 
 ii.Missing at random (MAR): the missingness is related to other observed variables, so rows should not be deleted because that distorts the data distribution
 
 iii.Missing not at random (MNAR): there is a reason for the missingness, and it is directly related to the missing value itself

 1.If the amount of missing data is very small, delete it: a.row deletion b.column deletion c.pairwise deletion
 
 2.Replace with a statistic: mean (influenced by outliers), median (not influenced by outliers), mode
 
 3.Apply a classification algorithm to predict the missing value
 
 4.KNN imputer
 
 5.Apply unsupervised learning
 
 6.Random Sample Imputation
 
 7.Add a variable to capture NaN (missing indicator)
 
 8.Arbitrary Value Imputation
 
 9.Hot-deck imputation
 
 10.Regression imputation
 
 11.End of Distribution Imputation
 
 12.Arbitrary Value Imputation
 
 13.Frequent Category Imputation
 
 14.MICE Imputation
 
 https://stefvanbuuren.name/fimd/want-the-hardcopy.html
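Here is a minimal sketch of three of the simpler strategies above: mean, median and mode imputation, plus a missing-value indicator column (technique 7). It is written in plain Python, and `None` marks a missing value. In pandas the same idea is usually `df[col].fillna(df[col].median())`. scikit-learn offers `SimpleImputer` and `KNNImputer` for real pipelines.

```python
import statistics


def impute(values, strategy="median"):
    """Fill None entries with a statistic computed from the observed values.

    Returns the filled list and a 0/1 indicator list marking which
    positions were originally missing.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.mean(observed)      # sensitive to outliers
    elif strategy == "median":
        fill = statistics.median(observed)    # robust to outliers
    elif strategy == "mode":
        fill = statistics.mode(observed)      # most frequent value (works for categories too)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


ages = [22, None, 25, 27, None, 95]   # 95 is an outlier
print(impute(ages, "mean"))    # fills with 42.25, pulled up by the outlier
print(impute(ages, "median"))  # fills with 26.0
```

The gap between the two fill values shows why the median is usually preferred when outliers are present.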

b.Handle class imbalance

 1.Undersampling - usually not preferred because data is lost
 
 2.Oversampling: RandomOverSampler (new points are duplicates of existing samples), SMOTETomek (new points are synthesized from nearest neighbours, so it takes longer), Borderline-SMOTE, KMeans-SMOTE, SVM-SMOTE, ADASYN, SMOTE-NC
 
 https://towardsdatascience.com/7-over-sampling-techniques-to-handle-imbalanced-data-ec51c8db349f
 
 3.class_weight: give more importance (weight) to the minority class
 
 4.Use stratified k-fold to keep the class ratio constant across folds
 
 5.Weighted Neural Network
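Random oversampling and class weights are the easiest of these to see in code. Below is a minimal sketch in plain Python. `random_oversample` keeps duplicating randomly chosen rows from the smaller classes until every class is as large as the biggest one, which is the idea behind imbalanced-learn's `RandomOverSampler`. `balanced_class_weights` applies the formula `n_samples / (n_classes * count)`, the same weights scikit-learn computes for `class_weight="balanced"`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


X = [[1.0], [1.2], [0.9], [1.1], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))              # 4 rows of each class
print(balanced_class_weights(y))   # {0: 0.625, 1: 2.5}
```

Oversample only the training split. If you do it before splitting, duplicated rows leak into the validation or test data.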

c.Remove noisy data

d.Format data

e.Handle categorical data: ordinal, nominal, cyclic and binary categorical variables

 1.One Hot Encoding
 
 2.Count Or Frequency Encoding
 
 3.Target Guided Ordinal Encoding
 
 4.Mean Encoding
 
 5.Probability Ratio Encoding
 
 6.label encoding
 
 7.probability ratio encoding
 
 8.WoE (Weight of Evidence)
 
 9.One-hot encoding with many categories (keep only the most frequent categories)
 
 10.feature hashing 
 
 11.sparse csr matrix
 
 12.entity embeddings
 
 13.binary encoding
 
 14.Rare label encoding
 
 15.Leave-one-out(Loo) encoding
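Below is a minimal plain-Python sketch of three of the encoders above: one-hot, count/frequency, and mean (target) encoding. It works on a single hypothetical `city` column. Be aware that mean encoding computed on the full training set leaks the target. In practice it is usually fitted inside cross-validation folds or smoothed.

```python
from collections import Counter, defaultdict


def one_hot(values):
    """Return the sorted category list and one 0/1 vector per row."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def count_encode(values):
    """Replace each category with how often it occurs."""
    counts = Counter(values)
    return [counts[v] for v in values]


def mean_encode(values, target):
    """Replace each category with the mean target value for that category."""
    sums, counts = defaultdict(float), Counter(values)
    for v, t in zip(values, target):
        sums[v] += t
    means = {v: sums[v] / counts[v] for v in counts}
    return [means[v] for v in values]


city = ["Delhi", "Mumbai", "Delhi", "Pune"]
bought = [1, 0, 0, 1]
print(one_hot(city))              # (['Delhi', 'Mumbai', 'Pune'], [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
print(count_encode(city))         # [2, 1, 2, 1]
print(mean_encode(city, bought))  # [0.5, 0.0, 0.5, 1.0]
```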

f.Scaling of data

   1.Normalisation  

   2.Standardization
 
   3.Robust scaler: not influenced by outliers because it uses the median and IQR
   
   4. Min Max Scaling
   
   5.Mean normalization
   
   6.Maximum absolute scaling
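All of these scalers do the same thing: subtract one statistic and divide by another. They differ only in which statistics they use. The sketch below is minimal and in plain Python; scikit-learn's `MinMaxScaler`, `StandardScaler`, `RobustScaler` and `MaxAbsScaler` apply the same basic formulas to arrays. The input column is hypothetical and contains one outlier, which shows why robust scaling helps.

```python
import statistics


def min_max(xs):
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0           # avoid division by zero for constant columns
    return [(x - lo) / span for x in xs]


def standardize(xs):
    mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / (sigma or 1.0) for x in xs]


def robust_scale(xs):
    # Quartiles with linear interpolation; the median and IQR ignore the extremes.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0
    return [(x - med) / iqr for x in xs]


def max_abs(xs):
    m = max(abs(x) for x in xs) or 1.0
    return [x / m for x in xs]


col = [1, 2, 3, 4, 100]               # 100 is an outlier
print(min_max(col))       # the outlier squeezes the other values toward 0
print(robust_scale(col))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Like any learned transform, the statistics should be computed on the training data only and then reused on validation and test data.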

A Q-Q plot or the Shapiro-Wilk normality test is used to check whether a feature is Gaussian (normally distributed). Models such as linear regression and logistic regression can perform better when features are closer to normal; if a feature is not normally distributed, use the methods below to bring it closer to a Gaussian distribution.

       a.Gaussian transformation
    
       b.Logarithmic transformation
    
       c.Reciprocal transformation
    
       d.Square root transformation
    
       e.Exponential transformation
    
       f.Box-Cox transformation
    
       g.log(1+x) transformation
       
       h.Johnson transformation
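Several of these transformations are one-liners. The minimal plain-Python sketch below covers the log(1+x) and Box-Cox transforms. Here the Box-Cox parameter lambda (`lam`) is chosen by hand, as an assumption; `scipy.stats.boxcox` can also estimate it by maximum likelihood. Box-Cox requires strictly positive inputs, and Yeo-Johnson is the usual alternative when the data contains zeros or negatives.

```python
import math


def log1p_transform(xs):
    """log(1 + x): compresses right-skewed data and is defined at x = 0."""
    return [math.log1p(x) for x in xs]


def box_cox(xs, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam == 0."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


skewed = [1, 2, 4, 8, 16, 1000]
print(log1p_transform(skewed))
print(box_cox(skewed, 0))    # lam = 0 is the plain log transform
print(box_cox(skewed, 0.5))  # lam = 0.5 is a shifted, scaled square root
```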

g.Remove low-variance features using VarianceThreshold

h.If a feature contains only one value (constant), remove it

i.Outliers: whether to remove outliers depends on the problem being solved

  There are 2 types of outliers: global outliers and local outliers

  e.g. in fraud detection, outliers are very important
  
  Methods to find outliers: standard deviation, z-score, box plot, scatter plot, IQR, TensorFlow Data Validation
  
  Automatic outlier detection: Isolation Forest, Local Outlier Factor, Minimum Covariance Determinant, Robust Random Cut Forest, DBSCAN clustering
  
  Outlier treatment: mean/median/random imputation, dropping, discretization (binning)
  
  If outliers are present, use robust scaling
  
  https://medium.com/towards-artificial-intelligence/outlier-detection-and-treatment-a-beginners-guide-c44af0699754
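The two most common rule-based detectors listed above are shown below as a minimal plain-Python sketch. The IQR rule flags points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], which are the same fences a box plot draws. The z-score rule flags points that lie more than a chosen number of standard deviations from the mean. The 1.5 and 3 thresholds are conventional defaults, not fixed rules.

```python
import statistics


def iqr_outliers(xs, k=1.5):
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]


def zscore_outliers(xs, threshold=3.0):
    mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
    if sigma == 0:
        return []
    return [x for x in xs if abs(x - mu) / sigma > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(data))          # [95]
print(zscore_outliers(data, 2.0))  # [95]
print(zscore_outliers(data))       # []: with threshold 3 the outlier is missed
```

The last line is worth noting. On a small sample, the outlier inflates the mean and standard deviation it is judged against, so its z-score stays below 3. The IQR rule relies on quartiles and is not affected this way.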

j.Anomaly detection

 Clustering techniques can be used to find anomalies
 
 Isolation Forest (for big data)

k.Sampling techniques

 a.biased sampling
 
 b.unbiased sampling

3.Exploratory Data Analysis (EDA)

Explore the dataset using Python, Microsoft Excel, Tableau, Power BI, etc.

Data visualization (Matplotlib,Seaborn,Plotly,pyqtgraph,Bokeh,Pygal,Dash,Pydot,Geoplotlib,ggplot,visualizer,etc...)

Scatterplot,multi line plot,bubble chart,bar chart,histogram,boxplot,distplot,bubble charts,area plot,heat map,index plot,violin plot,time series plot,density plot,dot plot,strip plot,plotly,Choropleth Map,Kepler,PDF,Kernel density function,networkx,Scatter_matrix,Bootstrap_plot,functionvis,Higher-Dimensional Plots,3-D Plots,Word Clouds

๐—ž๐—ฒ๐—ฟ๐—ฎ๐˜€ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜ƒ๐—ถ๐˜€๐˜‚๐—ฎ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ด๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ผ๐—ฟ(ann-visualizer)- ๐—ฝ๐—ถ๐—ฝ๐Ÿฏ ๐—ถ๐—ป๐˜€๐˜๐—ฎ๐—น๐—น ๐—ด๐—ฟ๐—ฎ๐—ฝ๐—ต๐˜ƒ๐—ถ๐˜‡

Univariate, bivariate and multivariate analysis

Model visualization: TensorBoard, Netron, TensorFlow Playground, Plotly, TensorDash, Dash, Microscope, Lucid

Distributions (discrete, continuous)

Data distributions: normal distribution, standard normal distribution, Student's t-distribution, Bernoulli distribution, binomial distribution, Poisson distribution, uniform distribution, F distribution, covariance and correlation

Types of Statistics  

1.Descriptive

2.Inferential

Types of data

1) Categorical (nominal, ordinal)
 
2) Numerical (discrete, continuous)

Random variables (discrete random variables, continuous random variables)

Central Limit Theorem, Bayes' theorem, confidence intervals, hypothesis testing, z-test, t-test, F-test, one-tailed test, two-tailed test, chi-square test, ANOVA test, A/B testing
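A/B testing brings several of these ideas together: hypothesis testing, the z-test, and two-tailed tests. The sketch below is a minimal two-proportion z-test in plain Python, using `statistics.NormalDist` (Python 3.8+). The conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-tailed z-test for H0: conversion rate A == conversion rate B."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = two_proportion_ztest(200, 1000, 240, 1000)
print(round(z, 3), round(p, 4))  # z is about 2.16 and p about 0.03: reject H0 at the 5% level
```

The test assumes independent samples that are large enough for the normal approximation to hold. For small counts, Fisher's exact test or the chi-square test mentioned above is the usual choice.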

4.Feature selection

1.Filter methods (correlation, chi-square test, t-test, ANOVA test, hypothesis tests, information gain, etc.)

2.Wrapper methods (forward selection, backward elimination, stepwise selection, etc.)

3.Embedded methods (lasso, ridge regression, elastic net, etc.)

4.Feature Importance

   a.ExtraTreesClassifier,ExtraTreesregressor

   b.SelectKBest

   c.Logistic Regression

   d.Random_forest_importance
   
   e.decision tree
   
   f.Linear Regression
   
   g.xgboost

5.Curse of dimensionality (as the number of dimensions increases, model performance can degrade)

6.If features are highly correlated (multicollinearity), keep only one of them

7.dimension reduction

8.lasso regression to penalise unimportant features

9.threshold based method 

10.model based selection

11.Mutual Information Feature Selection

12.remove features with very low variance (quasi constant feature dropping)

13.Univariate  feature selection

14.importance of feature (random forest importance)

15.feature importance with decision trees

16.PyImpetus

17.drop constant features (variance=0)

18.variance inflation factor(vif)

19.Recursive Feature Elimination

https://www.analyticsvidhya.com/blog/2020/10/a-comprehensive-guide-to-feature-selection-using-wrapper-methods-in-python/

5.Data splitting

 The split ratio depends on the size of the available dataset

 Training data, validation data, testing data
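Below is a minimal plain-Python sketch of a shuffled train/validation/test split. The 70/15/15 ratio is only an example; as noted above, the right ratio depends on dataset size. scikit-learn's `train_test_split` handles the two-way case, and calling it twice gives three sets. For time series, skip the shuffle and split chronologically instead.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of rows and cut it into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed -> reproducible split
    n = len(rows)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```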

6.Model selection

Machine learning

A.Supervised learning (labeled data)

 1.Regression (continuous target/output)
 
   Linear regression, polynomial regression, robust regression, support vector regression, decision tree regression, random forest regression,
   
   least squares method, XGBoost, ridge (L2 regularization), lasso (L1 regularization, gives sparser solutions), CatBoost, gradient boosting, AdaBoost,
   
   elastic net, LightGBM, ordinary least squares, CART
   
   use cases:

 2.Classification (categorical target/output)
 
    Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Kernel SVM, Naive Bayes, Decision Tree Classification,
    
    Random Forest Classification, XGBoost, AdaBoost, Gradient Boosting, CatBoost, Gaussian NB, LGBMClassifier, LinearDiscriminantAnalysis, Extreme Gradient Boosting Machine, Passive Aggressive Classifier, CART, C4.5, C5.0
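Of the classifiers listed above, K-Nearest Neighbors is the simplest to write from scratch, which makes it a good way to see what "training" and "prediction" actually mean. The sketch below is minimal and in plain Python, using Euclidean distance and majority voting; the toy points are hypothetical. For real work you would use scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    # "Training" is just storing the data; all the work happens at prediction time.
    neighbours = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x),   # Euclidean distance (Python 3.8+)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, (1.5, 1.5)))  # a
print(knn_predict(X_train, y_train, (6.5, 6.5)))  # b
```

Distance-based models like this are sensitive to feature scale, which is one reason the scaling step in feature engineering matters.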

B.Unsupervised learning (no label/target data)

 1.Dimensionality reduction - PCA, SVD, LDA, SOM, t-SNE, PLSR, PCR, autoencoders, KPCA, LSA

 2.Clustering :https://scikit-learn.org/stable/modules/clustering.html

 3.Association rule learning - support, lift, confidence, Apriori, Eclat, FP-growth, FP-tree construction, association_rules

 4.Recommendation systems:
 
     a.Collaborative filtering (model-based, memory-based (item-based, user-based)) using a user-item interaction matrix
    
     b.Content-based recommendation systems
     
     Similarity-based (user-user similarity, item-item similarity)
     
     Matrix factorization
     
     c.Utility-based recommendation systems
     
     d.Knowledge-based recommendation systems
     
     e.Demographic-based recommendation systems
     
     f.Hybrid recommendation systems
     
     g.Average Weighted Recommendation
     
     h.using K Nearest Neighbor
     
     i.cosine distance recommender system
     
     j.TensorFlow Recommenders https://www.tensorflow.org/recommenders
     
     k.Surprise baseline model
     
     https://analyticsindiamag.com/top-open-source-recommender-systems-in-python-for-your-ml-project/
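Below is a minimal plain-Python sketch of a memory-based, item-item collaborative recommender that uses cosine similarity (items 4.a and 4.i above). The rating matrix is hypothetical, with 0 meaning "not rated". Each unrated item is scored by a similarity-weighted average of the user's own ratings.

```python
import math

# Hypothetical user-item ratings (0 = not rated).
ratings = {
    "ann":  {"matrix": 5, "inception": 4, "titanic": 1, "notebook": 0},
    "bob":  {"matrix": 4, "inception": 5, "titanic": 0, "notebook": 1},
    "cara": {"matrix": 1, "inception": 1, "titanic": 5, "notebook": 5},
    "dev":  {"matrix": 5, "inception": 0, "titanic": 0, "notebook": 1},
}
items = ["matrix", "inception", "titanic", "notebook"]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_vector(item):
    """One column of the user-item matrix: every user's rating of this item."""
    return [ratings[user][item] for user in ratings]


def recommend(user, top_n=1):
    """Rank the user's unrated items by a similarity-weighted average of their ratings."""
    rated = {i: r for i, r in ratings[user].items() if r > 0}
    scores = {}
    for item in items:
        if item in rated:
            continue
        sims = {j: cosine(item_vector(item), item_vector(j)) for j in rated}
        total = sum(sims.values())
        if total:
            scores[item] = sum(sims[j] * rated[j] for j in rated) / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("dev", top_n=2))  # ['inception', 'titanic']: dev rated "matrix" highly
```

To keep things simple, zeros are treated as real ratings when computing similarity. Libraries such as Surprise handle missing ratings and mean-centering more carefully.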

C.Ensemble methods

 1.Stacking models

 2.Bagging models

 3.Boosting models
 
 4.Blending
 
 5.Voting (Hard Voting,Soft Voting)

D.Reinforcement learning

  2 types: a)model-free b)model-based

  Agent, environment, policy (on-policy vs off-policy), reward function, value function, state, action, episode, actor-critic

  The agent applies actions to the environment and receives corresponding rewards, from which it learns about the environment
  
  1.Q-Learning
  
  2.Deep Q-Learning
  
  3.Deep Convolutional Q-Learning
  
  Deep Deterministic Policy Gradient
  
  4.Twin Delayed DDPG,DQN
  
  5.A3C  (Actor Critic)
  
  6.Advantage weighted actor critic (AWAC). 
  
  7.XCS
  
  8.genetic algorithm,sarsa
  
  https://simoninithomas.github.io/deep-rl-course/
  
   Environments-OpenAI Gym, DeepMind Lab, Unity ML-Agents
   
   https://analyticsindiamag.com/8-best-free-resources-to-learn-deep-reinforcement-learning-using-tensorflow/
   
   https://neptune.ai/blog/best-reinforcement-learning-tutorials-examples-projects-and-courses
   
   
   Open AI Gym - https://gym.openai.com/
   
   KerasRL https://github.com/keras-rl/keras-rl
   
   pyqlearning
   
   tensorforce https://tensorforce.readthedocs.io/en/latest/index.html  
   
   rl_coach https://github.com/IntelLabs/coach#installation        MushroomRL https://mushroomrl.readthedocs.io/en/latest/
   
   TFAgents  https://github.com/tensorflow/agents (https://www.tensorflow.org/agents)   https://deepmind.com/blog/article/trfl     
   
   OpenAI Baselines (Stable Baselines is a fork of it)  https://github.com/openai/baselines
   
   https://neptune.ai/blog/the-best-tools-for-reinforcement-learning-in-python?utm_source=twitter&utm_medium=tweet&utm_campaign=blog-the-best-tools-for-reinforcement-learning-in-python
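Tabular Q-learning is the clearest way to see the agent, environment and reward loop described above. This plain-Python sketch is minimal and runs on a hypothetical 5-state corridor. The agent starts at state 0, can move left or right, and earns a reward of 1 only when it reaches state 4. It uses the standard update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The hyperparameters are illustrative, not tuned.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                    # move left, move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Terminal transitions have no future value to bootstrap from.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q = q_learning()
policy = ["left" if qs[0] > qs[1] else "right" for qs in q[:GOAL]]
print(policy)  # ['right', 'right', 'right', 'right']
```

Q-learning is off-policy: the update bootstraps from the greedy `max` over next actions, even though the agent itself acts epsilon-greedily. SARSA would use the action that was actually taken next.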

Semi-supervised learning: combines a small amount of labeled data with a large amount of unlabeled data during training

E.Deep learning (use when you have a lot of data and the data is highly complex; state of the art for unstructured data)

Frameworks: PyTorch, TensorFlow, Keras, Caffe, Theano, MXNet, MATLAB, Microsoft Cognitive Toolkit, Opacus (train PyTorch models with differential privacy)

1.Multilayer perceptron(MLP)

 1.Regression task

 2.Classification task

2.Convolutional neural network (used for image data)

 1.Classification of images
 
   Custom models, LeNet, AlexNet, ResNet, GoogLeNet, Inception, VGG, EfficientNet, NASNet, STN, NASNet-A, SENet, AmoebaNet-C
 
 2.Localization of objects in images
 
 3.Object detection and object segmentation 
 
   R-CNN, Fast R-CNN, Faster R-CNN, TensorFlow Object Detection, YOLO v1, YOLO v2, YOLO v3, YOLO v4, Fast YOLO, YOLO Tiny, YOLO Lite, YOLO Tiny++, YOLACT++,
   
   Mask R-CNN, SSD, Detectron, Detectron2, MobileNet, RetinaNet, R-FCN, DETR (Facebook), U-Net, UNet++ 
   
   There are 3 kinds of segmentation: semantic segmentation, instance segmentation, panoptic segmentation
   
   PyTorch based low code object detection-https://github.com/alankbi/detecto
   
   https://awesomeopensource.com/project/hoya012/deep_learning_object_detection
 
 4.Object tracking (mean shift, optical flow, Kalman filter)
 
   Tracktor++, Track R-CNN, JDE, DeepSORT 
 
 5.DeepDream, neural style transfer, pose estimation 
 
 Visualizing what CNNs "see": filter visualizations, heatmaps, saliency maps
 
 imageai.Detection for object detection
 
 DEEP LEARNING METHODS FOR 2D: OpenPose, DeepPose, MultiPoseNet, AlphaPose, VIBE, DeeperCut, Mask R-CNN, DeepCut, Convolutional Pose Machines, PoseNet
 
 3D POSE ESTIMATION
 
 DEEP LEARNING METHODS FOR 3D: 3D human pose estimation = 2D pose estimation + matching, Integral Human Pose Regression, Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach, A Simple Yet Effective Baseline for 3d Human Pose Estimation

 Data augmentation is applied to increase the size of the dataset and improve model performance
 
 
 Object Detection with 10 lines of code-https://www.datasciencecentral.com/profiles/blogs/object-detection-with-10-lines-of-code

3.Recurrent neural network (used for sequential data)

 1.RNN
 
 2.GRU
 
 3.LSTM (has a memory cell, forget gate, etc.)
 
 All 3 models above also have bidirectional variants; use them depending on the problem statement

4.Generative adversarial network https://poloclub.github.io/ganlab/ https://developers.google.com/machine-learning/gan/training

 CycleGAN, DCGAN, SRGAN, InfoGAN, StarGAN, AttnGAN, StyleGAN, PixelRNN, DiscoGAN, LSGAN, Conditional GAN (Pix2Pix), Progressive GANs (produce higher-resolution images, image-to-image translation), face inpainting, super-resolution
 
 https://github.com/hindupuravinash/the-gan-zoo

5.Autoencoder

  1.sparse Autoencoder
  
  2.denoising Autoencoder
  
  3.Contractive Autoencoder
  
  4.stacked Autoencoder
  
  5.deep Autoencoder
  
  6.variational autoencoder

6.Boltzmann machines, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines

7.Self Organizing Maps (SOM)

8.Natural language processing

 Clean data (removing stopwords depending on the problem, lowercasing, tokenization, POS tagging, stemming or lemmatization depending on the problem, skip-grams, n-grams, chunking)
 
 NLTK, spaCy, Gensim, TextBlob, iNLTK, Pattern, Stanza, OpenNLP, CoreNLP, Polyglot, PyDictionary, Hugging Face, Spark NLP, AllenNLP, Rasa NLU, Megatron, texthero, Flair, textacy, finetune, GluonNLP, VnCoreNLP libraries
 
 NLU,NLG,NER,text summarization,Sentiment Analysis,Text Classifications,machine translation,chat bot,Text Generation,Speech Recognition
  
 1.Bag of words
 
 2.TF-IDF
 
 3.Word embeddings
    
    a.Using pretrained models
      
      i)word2vec (CBOW, skip-gram)
      
      ii)GloVe
      
      iii)fastText
    
    b.Training your own embeddings (use when you have a lot of data)
    
      i)word2vec library
      
      ii)Keras Embedding layer
    
 4.Document embedding-Doc2vec
  
 5.sentence embedding

   sense2vec,SENT2VEC,Universal sentence encoder
 
 6.Using RNN, LSTM, GRU
 
   All 3 models above also have bidirectional variants
 
 7.Encoder and Decoder(sequence to sequence), ProphetNet(new pretrained seq2seq model)
  
 8.attention 
 
   self attention,Global Attention,Multi-Head Attention,Local Attention (monotonic,predictive)    https://github.com/uzaymacar/attention-mechanisms
 
 9.Transformer (big breakthrough in NLP) - http://jalammar.github.io/illustrated-transformer/  
 
    Shrinking Transformers (reducing model size): quantization, distillation, pruning
  
 10.BERT, Quantized MobileBERT, ALBERT, DistilBERT, ELMo, RoBERTa, XLNet, XLM-RoBERTa, T5, GPT, GPT-2, GPT-3, PRADO, PET, BORT
 
    http://jalammar.github.io/    http://jalammar.github.io/illustrated-bert/   http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
    
 11.Speech
   
    speech to text   
    
    text to speech
    
    Acoustic model,Speaker diarisation,apis
    
 googletrans (google Translator)
    
 https://medium.com/towards-artificial-intelligence/natural-language-processing-nlp-with-python-tutorial-for-beginners-1f54e610a1a0
 
 MT5-https://venturebeat.com/2020/10/26/google-open-sources-mt5-a-multilingual-model-trained-on-over-101-languages/?utm_content=144321587&utm_medium=social&utm_source=linkedin&hss_channel=lcp-3740012
 
 VADER does not require any training data https://pypi.org/project/vaderSentiment/  https://analyticsindiamag.com/sentiment-analysis-made-easy-using-vader/
 
 APPLICATIONS OF MACHINE TRANSLATION: text-to-text, text-to-speech, speech-to-text, speech-to-speech, image (of words)-to-text
 
 Google-GNMT (TensorFlow), Facebook-fairseq (Torch), Amazon-Sockeye (MXNet), NEMATUS (Theano), THUMT (Theano), OpenNMT (PyTorch), StanfordNMT (Matlab), DyNet-lamtram (CMU), EUREKA (MangoNMT)

classification,clustering,recommender systems,topic modelling,sentiment analysis,semantic analysis,summarization,machine translation,conversational interface,named entity recognition
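Bag of words and TF-IDF are the simplest text representations listed above. The minimal plain-Python sketch below applies both to three hypothetical documents. TF is the raw term count and idf(t) = log(N / df(t)). Libraries differ in the details: scikit-learn's `TfidfVectorizer`, for example, smooths the IDF and L2-normalizes each row by default, so its numbers will not match these exactly.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]


def tokenize(text):
    return text.lower().split()


def bag_of_words(docs):
    """Vocabulary plus one term-count vector per document."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    counts = [Counter(tokenize(d)) for d in docs]
    return vocab, [[c[w] for w in vocab] for c in counts]


def tf_idf(docs):
    vocab, bow = bag_of_words(docs)
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n / d) for d in df]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in bow]


vocab, weights = tf_idf(docs)
first = dict(zip(vocab, weights[0]))
print(round(first["the"], 3), round(first["cat"], 3), round(first["mat"], 3))  # 0.811 1.099 1.099
```

"cat" and "mat" appear only in the first document, so they score higher than "the", which appears in two documents. Because this split-on-whitespace tokenizer does no stemming, "cat" and "cats" count as separate words. That is where the stemming and lemmatization step in the cleaning list comes in.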

F.Time Series

  The data split is different here: split chronologically (train, validation, test) instead of shuffling
  
  Handling missing data is also different
  
  Methods generally used to impute data in time series:
  
  1.ffill
  
  2.bfill
  
  3.Impute with the mean of the previous or next x samples
  
  4.Impute with the value from the previous season (data with seasonality)
  
  5.mean,mode,median,random sample imputation (data without trend and without seasonality)
  
  6.linear interpolation(data with trend and without seasonality)
  
  7.seasonal +interpolation(data with trend and with seasonality)
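Three of the imputation options above are sketched below in plain Python: forward fill, backward fill, and linear interpolation. The sketch is minimal, uses a hypothetical daily series, and marks missing points with `None`. In pandas these correspond to `Series.ffill()`, `Series.bfill()` and `Series.interpolate()`.

```python
def ffill(xs):
    """Carry the last observed value forward (leading gaps stay None)."""
    out, last = [], None
    for x in xs:
        last = x if x is not None else last
        out.append(last)
    return out


def bfill(xs):
    """Carry the next observed value backward (trailing gaps stay None)."""
    return ffill(xs[::-1])[::-1]


def linear_interpolate(xs):
    """Fill interior gaps on a straight line between the neighbouring observations."""
    out = list(xs)
    known = [i for i, x in enumerate(xs) if x is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = xs[a] + (xs[b] - xs[a]) * (i - a) / (b - a)
    return out


series = [10.0, None, None, 16.0, None, 20.0]
print(ffill(series))               # [10.0, 10.0, 10.0, 16.0, 16.0, 20.0]
print(bfill(series))               # [10.0, 16.0, 16.0, 16.0, 20.0, 20.0]
print(linear_interpolate(series))  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Backward fill and interpolation both use future values. That is fine for cleaning historical data, but it leaks information if the same filled values are later used to evaluate forecasts.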
  
  Model selection depends on properties of the data such as stationarity, trend, seasonality and cyclicity
  
  Augmented Dickey-Fuller test (adfuller) for stationarity
  
  Models:
  
  1.ARIMA, auto ARIMA, seasonal ARIMA
  
  2.Autoregressive 
  
  3.Moving average,Exponential Moving average,Exponential Smoothing
  
  4.Lstm(neural network)
  
  5.Autoregressive
  
  6.Naive forecasts
  
  7.Smoothing (moving average, exponential smoothing)
  
  8.Facebook Prophet (note: expects the date column to be named ds and the target column y)
  
  9.Holt-Winters, Holt's linear trend
  
  10.AutoTS-https://analyticsindiamag.com/hands-on-guide-to-autots-effective-model-selection-for-multiple-time-series/
  
  11.Temporal Convolutional Networks
  
  12.Atspy For Automating The Time-Series Forecasting-https://analyticsindiamag.com/hands-on-guide-to-atspy-for-automating-the-time-series-forecasting/
  
  13.Darts-https://analyticsindiamag.com/hands-on-guide-to-darts-a-python-tool-for-time-series-forecasting/
  
  14.Bayesian Neural Network 
  
  15.PyFlux (easy way to compare different models)-https://analyticsindiamag.com/pyflux-guide-python-library-for-time-series-analysis-and-prediction/
  
  16.Orbit , DeepAR ,NeuralProphet(https://github.com/ourownstory/neural_prophet    https://ourownstory.github.io/neural_prophet/model-overview/)
  
  Best articles: https://www.analyticsvidhya.com/blog/2018/02/time-series-forecasting-methods/
  
  https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
  
  https://www.machinelearningplus.com/time-series/time-series-analysis-python/
  
  https://github.com/Apress/hands-on-time-series-analylsis-python
  
  https://otexts.com/fpp2/simple-methods.html
      
  https://analyticsindiamag.com/top-time-series-deep-learning-methods/

G.Semi supervised learning,Self-Supervised Learning,Multi-Instance Learning

H.Active learning,Multi-Task Learning,Online Learning

I.Transfer learning (inductive transfer learning: similar domain, different task; unsupervised transfer learning: different task, different but similar-enough domain; transductive transfer learning: similar task, different domain)

https://github.com/artix41/awesome-transfer-learning

https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a

J.Deep dream,Style transfer

K.One-shot learning,Zero-shot learning

TYPES OF ACTIVATION FUNCTIONS: LINEAR ACTIVATION,RELU,LEAKY RELU,SIGMOID ACTIVATION,TANH ACTIVATION,elu,PReLU,Softmax,Swish,Softplus
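Most of these activation functions are one-line formulas. The sketch below gives scalar plain-Python versions of several of them; frameworks apply the same functions element-wise to tensors. A slope of 0.01 for Leaky ReLU and alpha = 1 for ELU are assumed here because they are common defaults.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    return x * sigmoid(x)


def softplus(x):
    return math.log1p(math.exp(x))


def softmax(xs):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), sigmoid(0.0), math.tanh(0.0))  # 0.0 0.5 0.0
print(softmax([1.0, 2.0, 3.0]))                  # probabilities that sum to 1
```

Softmax is the one non-scalar function in the list. It turns a vector of scores into a probability distribution, which is why it typically sits on the output layer of a multi-class classifier.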

Optimizers: gradient descent (batch gradient descent, stochastic gradient descent, mini-batch gradient descent), SGD with momentum, Adagrad, RMSProp, Adam, AdaBelief
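The sketch below contrasts plain gradient descent with momentum on a one-dimensional toy loss, f(w) = (w - 3)^2, whose gradient is 2(w - 3). It is a minimal example: the learning rate, momentum coefficient and step count are illustrative. Real optimizers apply the same update to every parameter, using gradients estimated on mini-batches.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3) ** 2."""
    return 2 * (w - 3)


def gradient_descent(w=0.0, lr=0.1, steps=300):
    for _ in range(steps):
        w -= lr * grad(w)                 # step against the gradient
    return w


def momentum_descent(w=0.0, lr=0.1, beta=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)            # decayed running sum of past gradients
        w -= lr * v
    return w


print(round(gradient_descent(), 4))  # 3.0
print(round(momentum_descent(), 4))  # 3.0
```

Both converge to the minimum at w = 3. On a well-conditioned toy problem like this, momentum is not faster, and it even oscillates around the minimum before settling. Its benefit typically shows up on ill-conditioned or noisy losses, where the accumulated velocity dampens zig-zagging.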

Regularization: L1, L2, dropout, early stopping, data augmentation, batch normalization, tree pruning

Hyperparameters: number of hidden layers, dropout, activation function, weight initialization, learning rate, epochs, iterations and batch size

Hyperparameter tuning

a.GridSearchCV (tries every combination of the given parameter values, so it can take a long time)

b.RandomizedSearchCV (samples a fixed number of random combinations, which cuts search time)

c.Bayesian Optimization, Hyperopt

d.Sequential Model Based Optimization(Tuning a scikit-learn estimator with skopt)

e.Optuna

f.Genetic Algorithms

g.Keras tuner

h.Scikit-Optimize

https://towardsdatascience.com/10-hyperparameter-optimization-frameworks-8bc87bc8b7e3
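The difference between grid and random search can be sketched without any library. `toy_score` below is a hypothetical stand-in for "train a model with these hyperparameters and return its validation score"; scikit-learn's `GridSearchCV` and `RandomizedSearchCV` follow the same search logic but also cross-validate every candidate.

```python
import itertools
import random

def toy_score(lr, depth):
    # hypothetical validation score that peaks at lr=0.1, depth=6
    return -((lr - 0.1) ** 2) * 100 - (depth - 6) ** 2

def grid_search(param_grid, score_fn):
    # evaluate every combination: cost is the product of the grid sizes
    names = list(param_grid)
    best = None
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(param_space, score_fn, n_iter=10, seed=0):
    # evaluate only n_iter randomly sampled combinations (fixed budget)
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

grid = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
```

`grid_search(grid, toy_score)` evaluates all 4 x 4 = 16 combinations and is guaranteed to find the best one in the grid; `random_search(grid, toy_score, n_iter=5)` evaluates only 5, which scales much better when there are many hyperparameters but may miss the optimum.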

Cross-validation techniques - https://towardsdatascience.com/understanding-8-types-of-cross-validation-80c935a4976d

 1.LOOCV (leave-one-out cross-validation)
 
 2.K-fold cross-validation
 
 3.Stratified cross-validation
 
 4.Time Series cross-validation
 
 5.Holdout cross-validation
 
 6.Repeated cross-validation
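The index-splitting logic behind several of these techniques fits in a few lines. This is an unshuffled sketch for illustration; in practice scikit-learn's `KFold`, `LeaveOneOut`, `StratifiedKFold` and `TimeSeriesSplit` are the usual choices. Stratified splitting (keeping class proportions in every fold) and repeated CV (running k-fold several times with different shuffles) are not shown.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, roughly equal folds."""
    # the first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def loocv_indices(n_samples):
    # leave-one-out is k-fold with k equal to the number of samples
    return kfold_indices(n_samples, n_samples)

def time_series_splits(n_samples, n_splits):
    """Expanding window: always train on the past, test on the next block."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))
```

Every sample lands in exactly one k-fold test set, LOOCV needs as many model fits as there are samples (expensive on large data), and the time-series splits never train on data that comes after the test block, which avoids leaking future information.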

TensorBoard, Neptune for visualizing model performance

Distributed Training with TensorFlow

6.Testing the model

Commonly used metrics

 Always check the bias-variance tradeoff to understand how the model is performing
 
 A model can be overfitting (low bias, high variance), underfitting (high bias, low variance) or a good fit (low bias, low variance)
 
1.Regression task - mean squared error, root mean squared error, mean absolute error, R², adjusted R², mean percentage error

2.Classification task - accuracy, confusion matrix, precision, recall, F1 score, binary cross-entropy, categorical cross-entropy, AUC-ROC curve, log loss, average precision, mean average precision

3.Reinforcement learning - generally the cumulative reward

4.Machine translation - BLEU score

5.Clustering - external: Adjusted Rand Index, Jaccard score, purity score; internal: silhouette score, Davies-Bouldin index, Dunn index

6.Object detection - localization loss, classification loss, focal loss, IoU, L2 loss

7.Distance metrics - Euclidean distance, Manhattan distance, Minkowski distance, Hamming distance
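Several of the metrics listed above are short formulas. The sketch below spells them out in plain Python so the definitions are explicit; it assumes binary 0/1 labels for the classification part and boxes given as (x1, y1, x2, y2) corners for IoU. For real projects, prefer tested implementations such as `sklearn.metrics` and `scipy.spatial.distance`.

```python
import math
from collections import Counter

# ---- Regression ----
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(y_true, y_pred, n_features):
    # penalizes R^2 for the number of predictors p: 1 - (1 - R^2)(n - 1)/(n - p - 1)
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)

# ---- Binary classification (positive class = 1) ----
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def classification_scores(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def log_loss(y_true, y_prob, eps=1e-15):
    # binary cross-entropy; probabilities are clipped to avoid log(0)
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# ---- Clustering (external metric) ----
def purity(labels_true, clusters):
    # each cluster is credited with its most common true label
    total = 0
    for c in set(clusters):
        members = [t for t, k in zip(labels_true, clusters) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels_true)

# ---- Object detection ----
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

# ---- Distance metrics ----
def minkowski(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def euclidean(u, v):
    return minkowski(u, v, 2)  # Minkowski with p = 2

def manhattan(u, v):
    return minkowski(u, v, 1)  # Minkowski with p = 1

def hamming(u, v):
    # number of positions where two equal-length sequences differ
    return sum(a != b for a, b in zip(u, v))
```

For example, with `y_true=[1, 0, 1, 1, 0, 1]` and `y_pred=[1, 0, 0, 1, 1, 1]` there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 0.75 while accuracy is 4/6.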

Metrics - built-in metrics, custom metric without external parameters, custom metric with external parameters, subclassing a custom metric layer

https://medium.com/swlh/custom-loss-and-custom-metrics-using-keras-sequential-model-api-d5bcd3a4ff28

Loss - built-in loss, custom loss without external parameters, custom loss with external parameters, subclassing a loss class
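A custom loss "with external parameters" is commonly written as an outer function that takes the parameter and returns an inner `loss(y_true, y_pred)`, which is then passed to `model.compile(loss=...)` in Keras. The sketch shows that closure pattern with a Huber loss in plain Python (no TensorFlow), so it is an illustration of the structure only; a real Keras loss would use tensor operations, and Keras already ships `tf.keras.losses.Huber(delta=...)`.

```python
def make_huber_loss(delta=1.0):
    """Return a loss function that closes over the external parameter `delta`."""
    def huber_loss(y_true, y_pred):
        total = 0.0
        for t, p in zip(y_true, y_pred):
            err = abs(t - p)
            if err <= delta:
                total += 0.5 * err ** 2               # quadratic for small errors, like MSE
            else:
                total += delta * (err - 0.5 * delta)  # linear for large errors, like MAE
        return total / len(y_true)
    return huber_loss

loss = make_huber_loss(delta=1.0)
```

Here `loss([0.0, 0.0], [0.5, 3.0])` averages 0.125 (small error, quadratic branch) and 2.5 (large error, linear branch) to 1.3125, so the outlier contributes far less than it would under squared error.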

Docker and Kubernetes

7.Deployment

1.Azure

2.Heroku

3.Amazon Web Services

4.Google cloud platform

MODEL DEPLOYMENT USING TF SERVING
 
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines https://www.tensorflow.org/tfx

Model visualization using TensorBoard, Netron, TensorBoard.dev

Python web frameworks for app development - Flask, Streamlit, FastAPI, Django, Web2py, Pyramid, CherryPy, Voila, Kivy and KivyMD https://analyticsindiamag.com/top-8-python-tools-for-app-development/

Web-Based GUI (Gradio)- https://analyticsindiamag.com/guide-to-gradio-create-web-based-gui-applications-for-machine-learning/

https://github.com/gradio-app/gradio

TensorFlow Lite: use TensorFlow Lite to reduce model size and run models on mobile and embedded devices https://www.tensorflow.org/lite https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android-beta/#0 https://tfhub.dev/s?deployment-format=lite https://www.tensorflow.org/lite/examples https://www.tensorflow.org/lite/microcontrollers https://www.tensorflow.org/lite/models

model optimization (architecture)

TinyML https://blog.tensorflow.org/2020/08/the-future-of-ml-tiny-and-bright.html

Post-training Quantization in TensorFlow Lite https://www.tensorflow.org/lite/performance/post_training_quantization

pruning
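Magnitude pruning, the most common variant, removes the weights closest to zero. A framework-free sketch on a flat list of weights; real tools such as the TensorFlow Model Optimization Toolkit prune gradually during training and keep a binary mask, which is not shown here.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # indices sorted from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]
```

With 40% sparsity, `[0.1, -0.5, 0.05, 0.9, -0.02]` becomes `[0.1, -0.5, 0.0, 0.9, 0.0]`; the zeros only shrink the stored model if it is saved in a sparse or compressed format.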

leveraging efficient model architectures

Quantization: use quantization to reduce model size
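Quantization stores each float weight as a small integer plus a shared scale and zero point. A sketch of the affine mapping real = scale * (q - zero_point); TensorFlow Lite's 8-bit scheme uses the same form (with int8 in its spec), while this sketch uses the unsigned 0..255 range for simplicity and is not the TFLite converter itself.

```python
def quantize_uint8(values):
    """Affine 8-bit quantization: real = scale * (q - zero_point)."""
    qmin, qmax = 0, 255
    # the range must include 0 so that 0.0 is exactly representable
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [scale * (x - zero_point) for x in q]
```

Each 32-bit float becomes one byte (roughly 4x smaller), and values inside the range come back within about one quantization step (`scale`) of the original; that rounding error is why quantized models are usually re-evaluated for accuracy.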

8.Monitoring the model

CI/CD pipelines - CircleCI, Jenkins

In real-world projects, use pipelines - https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

1.easy debugging

2.better readability
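The core idea of a pipeline is that preprocessing learned in `fit` is reused unchanged in `predict`. The classes below are hypothetical toys written to show that idea, not scikit-learn's API; with scikit-learn you would write `Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])`, which also keeps preprocessing inside each cross-validation fold to avoid data leakage.

```python
class MeanStdScaler:
    """Hypothetical transformer: learns mean/std on the training data only."""
    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        var = sum((x - self.mean) ** 2 for x in xs) / len(xs)
        self.std = var ** 0.5 or 1.0  # fall back to 1.0 for constant data
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]

class ThresholdClassifier:
    """Hypothetical model: predicts 1 when the scaled value is above 0."""
    def fit(self, xs, ys):
        return self  # nothing to learn in this toy model

    def predict(self, xs):
        return [1 if x > 0 else 0 for x in xs]

class SimplePipeline:
    """Chains transformers and a final model (the last step)."""
    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def fit(self, xs, ys):
        for _, step in self.steps[:-1]:
            xs = step.fit(xs).transform(xs)  # fit each transformer on training data
        self.steps[-1][1].fit(xs, ys)
        return self

    def predict(self, xs):
        for _, step in self.steps[:-1]:
            xs = step.transform(xs)  # reuse the statistics learned in fit()
        return self.steps[-1][1].predict(xs)

pipe = SimplePipeline([("scale", MeanStdScaler()), ("clf", ThresholdClassifier())])
```

After `pipe.fit([1.0, 2.0, 3.0, 4.0, 5.0], [0, 0, 0, 1, 1])`, new inputs are scaled with the training mean of 3.0, so debugging means inspecting one object with named steps instead of scattered preprocessing code.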

BIG DATA: Hadoop, Apache Spark

Research papers - https://arxiv.org/, https://arxiv.org/list/cs.LG/recent, https://www.kaggle.com/Cornell-University/arxiv

code for Research Papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil

Summarise Research Papers - https://www.semanticscholar.org/

Programming languages for data science: Python, R, Julia, Java, Scala, JavaScript (TensorFlow.js)

IDEs: Jupyter Notebook, Spyder, PyCharm, Visual Studio

BEST ONLINE COURSES

1.COURSERA

2.UDEMY

3.EDX

4.DATACAMP

5.Udacity

BEST YOUTUBE CHANNELS TO FOLLOW

1.Krish Naik-https://www.youtube.com/user/krishnaik06

2.Codebasics-https://www.youtube.com/channel/UCh9nVJoWXmFb7sLApWGcLPQ  

3.Abhishek Thakur-https://www.youtube.com/user/abhisheksvnit

4.AIEngineering-https://www.youtube.com/channel/UCwBs8TLOogwyGd0GxHCp-Dw

5.Ineuron-https://www.youtube.com/channel/UCb1GdqUqArXMQ3RS86lqqOw

6.Ken Jee-https://www.youtube.com/c/KenJee1/featured

7.3Blue1Brown-https://www.youtube.com/c/3blue1brown/featured

8.The AI Guy -https://www.youtube.com/channel/UCrydcKaojc44XnuXrfhlV8Q 

9.Unfold Data Science-https://www.youtube.com/channel/UCh8IuVJvRdporrHi-I9H7Vw

BEST BLOGS TO FOLLOW

1.Towards data science-https://towardsdatascience.com/

2.Analyticsvidhya-https://www.analyticsvidhya.com/blog/?utm_source=feed&utm_medium=navbar

3.Medium-https://medium.com/

4.Machinelearningmastery-https://machinelearningmastery.com/blog/

5.ML+  -https://www.machinelearningplus.com/

BEST RESOURCES

1.paperswithcode-https://paperswithcode.com/methods

2.madewithml-https://madewithml.com/topics/ https://madewithml.com/courses/applied-ml-in-production/

Weights & Biases-https://wandb.ai/gallery sotabench-https://sotabench.com/

3.Deep learning-https://course.fullstackdeeplearning.com/#course-content

4.pytorch deep learning-https://atcold.github.io/pytorch-Deep-Learning/

PyTorch Lightning-https://github.com/PyTorchLightning/pytorch-lightning

jax- https://github.com/google/jax

incubator-mxnet - https://github.com/apache/incubator-mxnet

ignite-https://github.com/pytorch/ignite

fastText - https://github.com/facebookresearch/fastText

5.deep-learning-drizzle-https://deep-learning-drizzle.github.io/ https://deep-learning-drizzle.github.io/index.html

6.Fastaibook-https://github.com/fastai/fastbook , https://course.fast.ai/

neptune.ai-https://docs.neptune.ai/index.html

7.TopDeepLearning-https://github.com/aymericdamien/TopDeepLearning

8.NLP-progress-https://github.com/sebastianruder/NLP-progress

9.EasyOCR-https://github.com/JaidedAI/EasyOCR

10.Awesome-pytorch-list-https://github.com/bharathgs/Awesome-pytorch-list https://shivanandroy.com/awesome-nlp-resources/

11.free-data-science-books-https://github.com/chaconnewu/free-data-science-books

12.arcgis-https://github.com/Esri/arcgis-python-api

13.data-science-ipython-notebooks-https://github.com/donnemartin/data-science-ipython-notebooks

14.julia-https://github.com/JuliaLang/julia , https://docs.julialang.org/en/v1/

15.google-research-https://github.com/google-research/google-research

16.reinforcement-learning-https://github.com/dennybritz/reinforcement-learning

17.keras-applications-https://github.com/keras-team/keras-applications , https://github.com/keras-team/keras

18.opencv-https://github.com/opencv/opencv

19.transformers-https://github.com/huggingface/transformers

20.code implementations for research papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil

21.Satellite imagery - GeoAI, ArcGIS

Esri ArcGIS-https://www.esri.com/en-us/arcgis/about-arcgis/overview

earthcube-https://www.earthcube.eu/

22.Monk_Object_Detection-https://github.com/Tessellate-Imaging/Monk_Object_Detection

23.NLP-progress - https://github.com/sebastianruder/NLP-progress

24.interview-question-data-science-https://github.com/iNeuronai/interview-question-data-science-

25.recommenders-https://github.com/microsoft/recommenders

26.Awesome-NLP-Resources -https://github.com/Robofied/Awesome-NLP-Resources https://shivanandroy.com/awesome-nlp-resources/ https://github.com/keon/awesome-nlp

27.Tool for visualizing attention in the Transformer model-https://github.com/jessevig/bertviz

28.TransCoder-https://github.com/facebookresearch/TransCoder

29.Tessellate-Imaging-https://github.com/Tessellate-Imaging/monk_v1

Monk_Object_Detection-https://github.com/Tessellate-Imaging/Monk_Object_Detection/tree/master/application_model_zoo

Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials- https://github.com/TarrySingh/Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials

30.Machine-Learning-with-Python-https://github.com/tirthajyoti/Machine-Learning-with-Python

31.Hugging Face - a large collection of pretrained NLP models covering most common NLP tasks

https://github.com/huggingface https://github.com/huggingface/transformers https://huggingface.co/transformers/ https://huggingface.co/transformers/master/ https://github.com/huggingface/tokenizers

32.multi-task-NLP-https://github.com/hellohaptik/multi-task-NLP

33.gpt-2 - https://github.com/openai/gpt-2

34.Powerful and efficient Computer Vision Annotation Tool (CVAT)-https://github.com/openvinotoolkit/cvat, https://github.com/abreheret/PixelAnnotationTool

https://github.com/UniversalDataTool/universal-data-tool http://www.robots.ox.ac.uk/~vgg/software/via/

35.Data augmentation for NLP-https://github.com/makcedward/nlpaug

36.awesome Data Science-https://github.com/academic/awesome-datascience

37.mlops-https://github.com/visenger/awesome-mlops

38.gym-https://github.com/openai/gym

39.Super Duper NLP Repo-https://notebooks.quantumstat.com/ https://models.quantumstat.com/ https://miro.com/app/board/o9J_kqndLls=/ https://datasets.quantumstat.com/

https://notebooks.quantumstat.com/?utm_campaign=NLP%20News&utm_medium=email&utm_source=Revue%20newsletter

40.papers summarizing the advances in the field-https://github.com/eugeneyan/ml-surveys

41.deep-translator-https://github.com/nidhaloff/deep-translator

42.detext-https://github.com/linkedin/detext

43.nlpaug-https://github.com/makcedward/nlpaug

44.ipython-sql-https://github.com/catherinedevlin/ipython-sql

45.libra-https://github.com/Palashio/libra

46.opencv-https://github.com/opencv/opencv

47.learnopencv-https://github.com/spmallick/learnopencv , https://www.learnopencv.com/

48.math is fun-https://www.mathsisfun.com/ , https://pabloinsente.github.io/intro-linear-algebra, https://hadrienj.github.io/posts/Deep-Learning-Book-Series-Introduction/

49.DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ - https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

50.Spark Release 3.0.1-https://spark.apache.org/releases/spark-release-3-0-1.html

51.More cheat sheets-https://github.com/FavioVazquez/ds-cheatsheets , https://medium.com/swlh/the-ultimate-cheat-sheet-for-data-scientists-d1e247b6a60c

52.text2emotion-https://pypi.org/project/text2emotion/

53.ExploriPy-https://analyticsindiamag.com/hands-on-tutorial-on-exploripy-effortless-target-based-eda-tool/

54.TCN-https://github.com/philipperemy/keras-tcn

55.deeplearning-models-https://github.com/rasbt/deeplearning-models

56.earthengine-py-notebooks-https://github.com/giswqs/earthengine-py-notebooks

57.NLP-progress -https://github.com/sebastianruder/NLP-progress

58.numerical-linear-algebra -https://github.com/fastai/numerical-linear-algebra

59.Super Duper NLP Repo- https://notebooks.quantumstat.com/

60.reinforcement learning by using PyTorch-https://github.com/SforAiDl/genrl

61.Chatbots - build from scratch or use Google Dialogflow, Rasa NLU, Azure LUIS (Luis.ai), ChatterBot, Amazon Lex, Wit.ai, IBM Watson, etc.

https://blog.ubisend.com/optimise-chatbots/chatbot-training-data

62.No-code machine learning / deep learning

Teachable Machine-https://teachablemachine.withgoogle.com/

Microsoft Lobe -https://lobe.ai/

WEKA - https://www.cs.waikato.ac.nz/ml/weka/

Monk_Gui-https://github.com/Tessellate-Imaging/Monk_Gui

ENNUI-https://math.mit.edu/ennui/ https://github.com/martinjm97/ENNUI https://www.youtube.com/watch?v=4VRC5k0Qs2w

Knime https://www.knime.com/

Accord.net http://accord-framework.net/

Rapid Miner https://rapidminer.com/

opennn https://www.opennn.net/

64.tensorflow development-https://blog.tensorflow.org/

TensorFlow Hub (trained ready-to-deploy machine learning models in one place) - https://tfhub.dev/

TensorBoard.dev - https://tensorboard.dev/

tutorials-https://www.tensorflow.org/tutorials https://www.tensorflow.org/guide

TensorFlow Graphics - https://www.tensorflow.org/graphics Lattice-https://www.tensorflow.org/lattice

TensorFlow Probability-https://www.tensorflow.org/probability TensorFlow Privacy- tensorflow-privacy

63.Data Science in the Cloud-Amazon SageMaker,Amazon Lex,Amazon Rekognition,Azure Machine Learning (Azure ML) Services,Azure Service Bot framework,Google Cloud AutoML

64.Platforms to build and deploy ML models - Uber has Michelangelo, Google has TFX, Databricks has MLflow, Amazon Web Services (AWS) has SageMaker

65.Time Complexity Of Machine Learning Models -https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms/

66.ML from scratch-https://dafriedman97.github.io/mlbook/content/introduction.html

67.Interactive visual training for popular ML algorithms https://github.com/lucko515/ml_tutor https://pypi.org/project/ml-tutor/

68.mlcourse.ai, a free online machine learning course - https://mlcourse.ai/

69.using pretrained model provided by tfhub- https://tfhub.dev/

70.Deep-Learning-with-PyTorch- https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf

71.MIT 6.S191 Introduction to Deep Learning-http://introtodeeplearning.com/

72.R for Data Science-https://r4ds.had.co.nz/ ,Fundamentals of Data Visualization-https://clauswilke.com/dataviz/

74.machine learning in JavaScript-https://www.tensorflow.org/js https://www.tensorflow.org/js/models https://tensorflow-js-object-detection.glitch.me/

TensorFlow.jl Julia with TensorFlow https://malmaud.github.io/tfdocs/ https://malmaud.github.io/TensorFlow.jl/latest/tutorial.html

Sonnet is a library built on top of TensorFlow 2 https://github.com/deepmind/sonnet

TensorFlow Federated (TFF) (facilitates open research and experimentation with federated learning)-https://www.tensorflow.org/federated

TFX is an end-to-end platform for deploying production ML pipelines https://www.tensorflow.org/tfx https://github.com/tensorflow/tfx

Federated Learning -https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification

Neural Structured Learning-https://www.tensorflow.org/neural_structured_learning/tutorials/graph_keras_mlp_cora

Responsible AI-https://www.tensorflow.org/resources/responsible-ai

https://www.tensorflow.org/graphics

75.free list of AI/ Machine Learning Resources/Courses-https://www.marktechpost.com/free-resources/

https://www.theinsaneapp.com/2020/11/free-machine-learning-data-science-and-python-books.html

65 Machine Learning and Data books for free- https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189

https://github.com/chaconnewu/free-data-science-books

http://introtodeeplearning.com/

76.Code for Research Papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil

77.Natural Language Processing 365- https://ryanong.co.uk/natural-language-processing-365/

78.Top Computer Vision Google Colab Notebooks- https://www.qblocks.cloud/creators/computer-vision-google-colab-notebooks

79.For practice -https://www.confetti.ai/exams

80.Yellowbrick-https://towardsdatascience.com/introduction-to-yellowbrick-a-python-library-to-explain-the-prediction-of-your-machine-learning-d63ecee10ecc

81.Mathematics of Machine Learning,deep learning-https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568

https://github.com/hrnbot/Basic-Mathematics-for-Machine-Learning

https://towardsdatascience.com/the-roadmap-of-mathematics-for-deep-learning-357b3db8569b

https://towardsai.net/p/data-science/how-much-math-do-i-need-in-data-science-d05d83f8cb19

https://www.mltut.com/how-to-learn-math-for-machine-learning-step-by-step-guide/

https://www.datasciencecentral.com/profiles/blogs/free-online-book-machine-learning-from-scratch

https://www.youtube.com/playlist?list=PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a https://github.com/jonkrohn/ML-foundations

82.Googleai-https://ai.google/education

83.ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions

PyBrain is a modular Machine Learning Library for Python

84.Best Online Courses for Machine Learning and Data Science-https://www.mltut.com/best-online-courses-for-machine-learning-and-data-science/

AI Expert Roadmap-https://i.am.ai/roadmap/#data-science-roadmap

85.FastAPI-https://fastapi.tiangolo.com/deployment/deta/

86.Yann LeCun's Deep Learning Course at CDS-https://cds.nyu.edu/deep-learning/ https://atcold.github.io/pytorch-Deep-Learning/

87.Four Important Computer Vision Annotation Tools https://heartbeat.fritz.ai/4-important-computer-vision-annotation-tools-you-need-to-know-in-2020-9f964931ed7

88.Python Data Science Handbook https://jakevdp.github.io/PythonDataScienceHandbook/

89.for low code object detection (detecto)- https://github.com/alankbi/detecto

90.1 line for hundreds of NLP models and algorithms- https://github.com/JohnSnowLabs/nlu

91.AudioFeaturizer when deal with audio data- https://pypi.org/project/AudioFeaturizer/

librosa library https://librosa.org/doc/latest/index.html

MAGENTA-https://magenta.tensorflow.org/

92.Palladium-https://palladium.readthedocs.io/en/latest/

93.KNIME-https://www.knime.com/

94.Facebook Open Sourced New Frameworks to Advance Deep Learning Research https://www.kdnuggets.com/2020/11/facebook-open-source-frameworks-advance-deep-learning-research.html

95.PYTORCH - https://pytorch.org/ https://pytorch.org/ecosystem/ https://pytorch.org/tutorials/ https://pytorch.org/docs/stable/index.html https://github.com/pytorch/pytorch

Lightning https://pytorchlightning.ai/community#projects

Opacus (training PyTorch models with differential privacy)-https://opacus.ai/

96.Atlas web-based dashboard -https://www.atlas.dessa.com/

97.pytest (for testing code) https://docs.pytest.org/en/latest/index.html

98.keras- https://keras.io/ https://keras.io/api/ https://keras.io/examples/

99.High-Performance Jupyter Notebook - BlazingSQL Notebooks https://blazingsql.com/notebooks

100.CV-pretrained-model - https://github.com/balavenkatesh3322/CV-pretrained-model

101.Kubeflow Machine Learning Toolkit for Kubernetes https://www.kubeflow.org/

102.Daily AI updates to your inbox- https://sago-ai.news/#/

103.Three Keras API styles - Sequential model, Functional API, model subclassing

104.Deep Learning Toolkit for Medical Image Analysis -https://github.com/DLTK/DLTK

106.Interpret ML models - LIME https://lime-ml.readthedocs.io/en/latest/

SHAP https://medium.com/towards-artificial-intelligence/explain-your-machine-learning-predictions-with-kernel-shap-kernel-explainer-fed56b9250b8

107.deep-learning-drizzle -https://deep-learning-drizzle.github.io/

108.Machine Learning University - https://aws.amazon.com/machine-learning/mlu/

109.mlflow https://mlflow.org/ An open source platform for the machine learning lifecycle

https://neptune.ai/

https://azure.microsoft.com/en-us/services/machine-learning/

https://github.com/VertaAI/modeldb

Follow leaders in the field to stay up to date

1.Linkedin

2.Twitter

CPU/GPU/TPU

1.Google Colab (FREE)

2.Kaggle kernel(read terms and conditions before use) (FREE)

3.Paperspace Gradient(read terms and conditions before use)

4.KNIME - https://www.knime.com/ (read terms and conditions before use)

5.RapidMiner (read terms and conditions before use)

https://github.com/zszazi/Deep-learning-in-cloud

So what next?

Participate in online competitions, build projects, apply for internships and jobs, and solve real-world problems.

Applications of data science across industries

1.E-commerce- Identifying consumers,Recommending Products,Analyzing Reviews

2.Manufacturing- Predicting potential problems,Monitoring systems,Automating manufacturing units, Maintenance Scheduling,Anomaly Detection

3.Banking- Fraud detection,Credit risk modeling,Customer lifetime value

4.Healthcare- Medical image analysis, Drug discovery,Bioinformatics,Virtual Assistants,image segmentation

5.Transport- Self-driving cars,Enhanced driving experience,Car monitoring system,Enhancing the safety of passengers

6.Finance- Customer segmentation,Strategic decision making,Algorithmic trading,Risk analytics

7.Marketing (Added from comments Credits: Jawad Ali)- LTV predictions,Predictive analytics for customer behavior,Ad targeting

and many more fields - https://www.topbots.com/enterprise-ai-companies-2020/ , https://venturebeat.com/2020/10/21/the-2020-data-and-ai-landscape/

Research blogs

1.https://ai.facebook.com/ https://ai.facebook.com/blog/

2.https://ai.googleblog.com/

3.https://deepmind.com/blog https://deepai.org/definitions

4.https://openai.com/blog/

5.https://www.malongtech.com/en/research.html

6.https://blogs.nvidia.com/blog/tag/artificial-intelligence/

7.https://blog.tensorflow.org/

8.https://pytorch.org/blog/

https://www.kdnuggets.com/2020/01/top-10-ai-ml-articles-to-know.html

RESEARCH LABS IN THE WORLD

https://ai.facebook.com/ https://ai.googleblog.com/ https://research.google/ https://ai.google/research/

1.The Alan Turing Institute:https://www.turing.ac.uk/

2.J.P. Morgan AI Research Lab:https://www.jpmorgan.com/insights/tec...

3.Oxford ML Research Group:http://www.robots.ox.ac.uk/~parg/proj...

4.Microsoft Research Lab- AI:https://www.microsoft.com/en-us/resea...

5.Berkeley AI Research:https://bair.berkeley.edu/

6.LIVIA:https://en.etsmtl.ca/Unites-de-recher...

7.MIT Computer Science and Artificial Intelligence Laboratory (CSAIL):https://www.csail.mit.edu/

online competitions:

1.Kaggle-https://www.kaggle.com/

2.hackerearth-https://www.hackerearth.com/challenges/

3.machinehack-https://www.machinehack.com/

4.analyticsvidhya-https://datahack.analyticsvidhya.com/contest/all/

5.zindi-https://zindi.africa/competitions

6.crowdai-https://www.crowdai.org/

7.driven data-https://www.drivendata.org/

8.dockship-https://dockship.io/

9.SIGNATE Competition- https://signate.jp/about?rf=competition_about

10.International Data Analysis Olympiad (IDAHO)

11.Codalab

12.Iron Viz

13.Data Science Challenges

14.Tianchi Big Data Competition

Some useful content:

  1. H2O.ai AutoML, Google Cloud AutoML, Google ML Kit (https://developers.google.com/ml-kit), Azure Cognitive Services, Azure Machine Learning (service and studio), Google Cloud Platform, Weka, Microsoft Cognitive Toolkit, DataRobot AutoML, Databricks AutoML, IBM Watson

https://neptune.ai/blog/best-machine-learning-as-a-service-platforms-mlaas?utm_source=twitter&utm_medium=tweet&utm_campaign=blog-best-machine-learning-as-a-service-platforms-mlaas

https://codegnan.com/blog/35-best-data-sciecne-tools-for-beginners-to-master/

  1. Tpot

  2. autopandas

  3. AutoGluon https://analyticsindiamag.com/how-to-automate-machine-learning-tasks-using-autogluon/

  4. autosklearn,autokeras,LightAutoML (https://github.com/sberbank-ai-lab/LightAutoML)

  5. Auto_ViML - Automatically Build Multiple ML Models with a Single Line of Code (https://github.com/AutoViML/Auto_ViML)

    GML - automate most of the data science workflow https://github.com/Muhammad4hmed/GML

    CodeLess https://pypi.org/project/codeless/ https://github.com/porky5191/codeless_demo_project

  6. AutoViz - Automatically Visualize any dataset, any size with a single line of code (https://github.com/AutoViML/AutoViz)

  7. hyperopt

  8. sweetviz (EDA purpose) - https://pypi.org/project/sweetviz/

  9. pandas-profiling (full EDA report) - https://pypi.org/project/pandas-profiling/

  10. autokeras,AutoSklearn,Neural Network Intelligence

    Featuretools - automated feature engineering.

    MLBox,Lightwood,mindsdb(machine learning models using SQL queries),mljar-supervised,Ludwig(deep learning models without the need to write code)

    AdaNet is a lightweight TensorFlow-based framework

  11. pycaret- https://pycaret.org/

mindsdb Machine Learning in 5 Lines of Code https://mindsdb.com/

12.Auto_Timeseries by auto_ts : Automatically build ARIMA, SARIMAX, VAR, FB Prophet and ML Models on Time Series data (https://github.com/AutoViML/Auto_TS)

13.AutoNLP_Sentiment_Analysis by autoviml - (https://github.com/AutoViML/Auto_ViML)

14.automl lazypredict https://github.com/shankarpandala/lazypredict

AutoFeat-https://analyticsindiamag.com/guide-to-automatic-feature-engineering-using-autofeat/

15.bamboolib or pandas-ui or pandas-summary or pandas_visual_analysis or Dtale(get code also) (python package for easy data exploration & transformation)

Automating EDA using Pandas Profiling, Sweetviz, AutoViz, DataPrep, Vaex, Datapane, PandasGUI, Datatable, Dora, Pywedge, D-Tale, Lux, Dabl, Pretty Pandas, AWS Glue DataBrew, speedML, edaviz, Altair

https://github.com/mstaniak/autoEDA-resources

ExploriPy (target-based EDA) - https://analyticsindiamag.com/hands-on-tutorial-on-exploripy-effortless-target-based-eda-tool/

Lens- Statistical Analysis of Data https://analyticsindiamag.com/hands-on-tutorial-on-lens-python-tool-for-swift-statistical-analysis/

Datacleaner-https://analyticsindiamag.com/tutorial-on-datacleaner-python-tool-to-speed-up-data-cleaning-process/

Data cleaning: Dora. Voilà - turn Jupyter notebooks into standalone web applications quickly. Plotly Dash - for more advanced, production-level dashboards

featurewiz(Select the best features from your data set fast with a single line of code) - https://github.com/AutoViML/featurewiz

Panel - web apps

16.CuPy (NumPy-compatible array processing in parallel on the GPU) https://pypi.org/project/cupy/

17.Dabl-automate the known 80% of Data Science which is data preprocessing, data cleaning, and feature engineering https://pypi.org/project/dabl/

18.Dask (parallel computation) https://docs.dask.org/en/latest/ https://medium.com/rapids-ai/reading-larger-than-memory-csvs-with-rapids-and-dask-e6e27dfa6c0f#cid=av01_so-nvsh_en-us

Modin , Vaex , Dask,cuDF

19.dataprep (Understand your data with a few lines of code in seconds)

data-preparation-tools - https://improvado.io/blog/data-preparation-tools

20.Dora library is another data analysis library designed to simplify exploratory data analysis. https://pypi.org/project/Dora/

21.FastAPI is a modern, fast (high-performance), web framework for building APIs. https://fastapi.tiangolo.com/

22.faster Hyper Parameter Tuning(sklearn-nature-inspired-algorithms) https://pypi.org/project/sklearn-nature-inspired-algorithms/

23.FlashText (keyword search and replacement, much faster than regular expressions when matching a large number of keywords) https://pypi.org/project/flashtext/

24.Guietta (tool that makes simple GUIs simple) https://pypi.org/project/guietta/

pandas-visual-analysis -https://analyticsindiamag.com/hands-on-guide-to-pandas-visual-analysis-way-to-speed-up-data-visualization/

25.hummingbird (makes code execute faster) https://pypi.org/project/Hummingbird/

CUML- increase the speed of training your machine learning model https://towardsdatascience.com/train-your-machine-learning-model-150x-faster-with-cuml-69d0768a047a

https://docs.rapids.ai/api/cuml/stable/

26.memory-profiler (tell memory consumption line by line) https://pypi.org/project/memory-profiler/

27.numexpr (speeds up evaluation of NumPy array expressions) https://github.com/pydata/numexpr

28.pandarallel (simple and efficient tool to parallelize your pandas computation on all your CPUs) https://pypi.org/project/pandarallel/

29.PDFTableExtract(by PyPDF2) https://github.com/ashima/pdf-table-extract

Camelot-https://towardsdatascience.com/extracting-tabular-data-from-pdfs-made-easy-with-camelot-80c13967cc88

30.PyImpuyte(Python package that simplifies the task of imputing missing values in big datasets) https://pypi.org/project/PyImpuyte/

31.libra(Automates the end-to-end machine learning process in just one line of code) https://pypi.org/project/libra/

32.Debug code with python -m pdb -c continue <script.py> (runs the script and drops into the debugger on an uncaught exception)

33.cURL (This is a useful tool for obtaining data from any server via a variety of protocols including HTTP.) https://stackabuse.com/using-curl-in-python-with-pycurl/

34.csvkit https://pypi.org/project/csvkit/

35.IPython - an enhanced interactive Python shell.

36.Faker (pip install faker) - generate fake data to create your own dataset https://pypi.org/project/Faker/

37.Python debugger: the %pdb magic in IPython/Jupyter

38.voila - from notebooks to standalone web applications and dashboards https://voila.readthedocs.io/en/stable/ https://github.com/voila-dashboards/voila

39.tslearn for time series data https://github.com/tslearn-team/tslearn

40.texthero - preprocess and analyze text-based datasets in a Pandas DataFrame quickly and effortlessly https://github.com/jbesomi/texthero

41.kaleido (static image export for web-based visualization libraries, with zero dependencies) https://pypi.org/project/kaleido/

42.Vaex- Reading And Processing Huge Datasets in seconds https://github.com/vaexio/vaex

43.Uber's Ludwig is an Open Source Framework for Low-Code Machine Learning https://eng.uber.com/introducing-ludwig/

44.Google's TAPAS, a BERT-Based Model for Querying Tables Using Natural Language https://github.com/google-research/tapas

45.RAPIDS open GPU Data Science https://rapids.ai/

RAPIDS cuML

46.pyforest Lazy-import of all popular Python Data Science libraries. Stop writing the same imports over and over again. https://pypi.org/project/pyforest/0.1.1/

47.Modin Get faster Pandas with Modin https://github.com/modin-project/modin

48.Text2Code for Jupyter notebook - https://github.com/deepklarity/jupyter-text2code , https://towardsdatascience.com/data-analysis-made-easy-text2code-for-jupyter-notebook-5380e89bb493

49.OpenRefine - tool for data preprocessing without code https://analyticsindiamag.com/openrefine-tutorial-a-tool-for-data-preprocessing-without-code/

50.DeepSpeed - Microsoft's deep learning optimisation library- https://github.com/microsoft/DeepSpeed

https://analyticsindiamag.com/microsoft-releases-latest-version-of-deepspeed-its-python-library-for-deep-learning-optimisation/

51.4-pandas-tricks-https://towardsdatascience.com/4-pandas-tricks-that-most-people-dont-know-86a70a007993

52.Deploy a machine learning model with tkinter-https://analyticsindiamag.com/complete-tutorial-on-tkinter-to-deploy-machine-learning-model/

53.autoplotter is a python package for GUI based exploratory data analysis-https://github.com/ersaurabhverma/autoplotter

54.3 NLP Interpretability Tools For Debugging Language Models-https://www.topbots.com/nlp-interpretability-tools/

55.New Algorithm For Training Sparse Neural Networks (RigL)-https://analyticsindiamag.com/rigl-google-algorithm-neural-networks/

56.Read Data from pdf and Word-PyPDF2,PDFMiner,PDFQuery,tabula-py,pdflib for Python,PDFTables,PyFPDF2

OpenCV to Extract Information From Table Images-https://analyticsindiamag.com/how-to-use-opencv-to-extract-information-from-table-images/

57.Text Annotation-https://towardsdatascience.com/tortus-e4002d95134b

58.GDMix, A Framework That Trains Efficient Personalisation Models - https://analyticsindiamag.com/linkedin-open-sources-gdmix-a-framework-that-trains-efficient-personalisation-models/

59.Learn Machine Learning Concepts Interactively-https://towardsdatascience.com/learn-machine-learning-concepts-interactively-6c3f64518da2

60.Folium, Python Library For Geographical Data Visualization-https://analyticsindiamag.com/hands-on-tutorial-on-folium-python-library-for-geographical-data-visualization/

61.GPU Technology Conference (GTC) Keynote Oct 2020-https://www.youtube.com/watch?v=Dw4oet5f0dI&list=PLZHnYvH1qtOYOfzAj7JZFwqtabM5XPku1

62.jiant, an NLP toolkit for multitask and transfer learning - https://github.com/nyu-mll/jiant

63.human-learn - paint (draw) your own machine learning model - https://koaning.github.io/human-learn/

64.Vector AI-https://github.com/vector-ai/vectorai

65.NVIDIA NeMo(for Conversational AI)-https://github.com/NVIDIA/NeMo

66.Deep Learning Models Without Coding(DeepCognition)-https://analyticsindiamag.com/how-to-use-deepcognition-to-build-drag-and-drop-deep-learning-models-without-coding/

67.100 Machine Learning Projects-https://medium.com/@amankharwal/100-machine-learning-projects-aff22b22dd6e

68.Question generation using Natural Language Processing-https://github.com/ramsrigouthamg/Questgen.ai

69.PixelLib (image segmentation, background blur, grayscale background, background colour change, background replacement) - https://github.com/ayoolaolafenwa/PixelLib

70.High-Resolution 3D Human Digitization-https://shunsukesaito.github.io/PIFuHD/

71.AI model that translates 100 languages without relying on English data - https://ai.facebook.com/blog/introducing-many-to-many-multilingual-machine-translation/

72.800 free textbooks - https://open.umn.edu/opentextbooks

73.TensorDash is an application that lets you remotely monitor your deep learning model's metrics and notifies you when your model training is completed or crashed.

https://github.com/CleanPegasus/TensorDash

74.Yellowbrick - visual tools to select features, tune hyperparameters, choose the best model, and understand performance metrics.

75.Freely Available Python Books-https://rajukumarmishrablog.com/freely-available-python-books/

Collection of Python Cheat Sheets- https://rajukumarmishrablog.com/collection-of-python-cheat-sheets/

76.Add External Data to Your Pandas Dataframe - https://towardsdatascience.com/add-external-data-to-your-pandas-dataframe-with-a-one-liner-f060f80daaa4

https://www.openblender.io/#/welcome
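
The linked article uses OpenBlender's service, whose API is not reproduced here. The underlying idea, enriching a DataFrame with columns from an external source, can be sketched with plain pandas as a left join, assuming the external data is already loaded as a second DataFrame that shares a key column. The function name and the `date`, `units_sold` and `temp_c` columns are hypothetical.

```python
import pandas as pd


def add_external_data(df: pd.DataFrame, external: pd.DataFrame, key: str) -> pd.DataFrame:
    """Left-join the columns of `external` onto `df` using a shared key column.

    Every row of `df` is kept (in its original order); rows with no match
    in `external` get NaN in the added columns.
    """
    # validate="many_to_one" raises if `external` has duplicate keys,
    # which would otherwise silently duplicate rows of `df`.
    return df.merge(external, on=key, how="left", validate="many_to_one")


if __name__ == "__main__":
    # Hypothetical data: daily sales enriched with an external weather table.
    sales = pd.DataFrame({
        "date": pd.to_datetime(["2020-11-01", "2020-11-02", "2020-11-03"]),
        "units_sold": [12, 7, 15],
    })
    weather = pd.DataFrame({
        "date": pd.to_datetime(["2020-11-01", "2020-11-03"]),
        "temp_c": [14.5, 9.0],
    })
    print(add_external_data(sales, weather, key="date"))
```

A left join is chosen so that the original rows are never dropped, even when the external source has gaps.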

77.PerceptiLabs - visualize the model architecture - https://github.com/PerceptiLabs/PerceptiLabs

78.Train Conversational AI in 3 lines of code with NeMo and Lightning-https://towardsdatascience.com/train-conversational-ai-in-3-lines-of-code-with-nemo-and-lightning-a6088988ae37

79.Machine Learning for Healthcare by MIT - https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-s897-machine-learning-for-healthcare-spring-2019/

80.Graph libraries: pydot (an interface to Graphviz), AutoGraph (easy control flow for graphs), Neo4j Graph Data Science Library, pyRDF2Vec (representations of entities in a knowledge graph), igraph

https://www.kdnuggets.com/2019/05/60-useful-graph-visualization-libraries.html

81.HTML tables into Google Sheets -https://towardsdatascience.com/import-html-tables-into-google-sheets-effortlessly-f471eae58ac9

82.Gradio - build a web interface that takes input from users - https://gradio.app/getting_started
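
Gradio wraps an ordinary Python function in a small web UI. A minimal sketch, assuming a recent Gradio release where `gr.Interface(fn=..., inputs=..., outputs=...)` is available; `word_stats` and `build_app` are hypothetical names. Gradio is imported inside `build_app` so the plain function can be used and tested without Gradio installed.

```python
def word_stats(text: str) -> str:
    """Hypothetical example function: count words and characters in the input."""
    words = text.split()
    return f"{len(words)} words, {len(text)} characters"


def build_app():
    """Wrap word_stats in a Gradio interface (requires `pip install gradio`)."""
    import gradio as gr  # imported lazily so word_stats works without Gradio

    # "text" is Gradio's string shorthand for a textbox component.
    return gr.Interface(fn=word_stats, inputs="text", outputs="text")
```

Calling `build_app().launch()` then serves the interface locally (by default at http://127.0.0.1:7860), with a textbox for the user's input and a textbox for the returned string.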

83.Mito, an editable spreadsheet inside your Jupyter Notebook - https://trymito.io/

84.Google Introduces Document AI (DocAI) https://www.marktechpost.com/2020/11/05/google-introduces-document-ai-docai-platform-for-automated-document-processing/

86.25 hot new data tools and what they don't do - https://towardsdatascience.com/25-hot-new-data-tools-and-what-they-dont-do-31bf23bd8e56

87.Opacus: a high-speed library for training PyTorch models with differential privacy - https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-models-with-differential-privacy

88.lazynlp, a library for scraping and cleaning web pages to build large text datasets - https://github.com/chiphuyen/lazynlp

89.yfinance, for downloading financial market data from Yahoo! Finance
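
A minimal sketch of pulling prices with yfinance and turning them into daily returns, assuming yfinance's `Ticker(...).history(period=...)` call, which returns a DataFrame with a `Close` column. The helper names are hypothetical. The download needs network access, so yfinance is imported only inside `fetch_close_prices`, and the return calculation is a separate pure-pandas function.

```python
import pandas as pd


def fetch_close_prices(ticker: str, period: str = "1mo") -> pd.Series:
    """Download daily closing prices (requires `pip install yfinance` and network access)."""
    import yfinance as yf  # imported lazily so daily_returns works offline

    history = yf.Ticker(ticker).history(period=period)
    return history["Close"]


def daily_returns(close: pd.Series) -> pd.Series:
    """Simple daily returns P_t / P_{t-1} - 1; the first (undefined) value is dropped."""
    return close.pct_change().dropna()
```

For example, `daily_returns(fetch_close_prices("AAPL"))` gives one return per trading day in the last month.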

90.Pseudo-labeling to deal with small datasets - https://towardsdatascience.com/pseudo-labeling-to-deal-with-small-datasets-what-why-how-fd6f903213af
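
Pseudo-labeling, in outline: fit a model on the small labelled set, predict the unlabelled set, keep only the high-confidence predictions as extra training labels, and refit. The NumPy sketch below shows one such round. It assumes a toy nearest-centroid classifier (with a softmax over negative distances as its "confidence") as a stand-in for whatever model you actually use. The 0.9 threshold and all function names are illustrative.

```python
import numpy as np


def fit_centroids(X: np.ndarray, y: np.ndarray):
    """Stand-in model: one mean vector (centroid) per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_proba(centroids: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Class probabilities as a softmax over negative squared distances."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def pseudo_label_round(X_lab, y_lab, X_unlab, threshold=0.9):
    """One round of pseudo-labeling.

    Returns the refit model (classes, centroids) and the number of
    unlabelled samples that were accepted as pseudo-labelled.
    """
    # 1. Fit on the small labelled set.
    classes, centroids = fit_centroids(X_lab, y_lab)
    # 2. Predict the unlabelled set.
    proba = predict_proba(centroids, X_unlab)
    # 3. Keep only confident predictions as pseudo-labels.
    confident = proba.max(axis=1) >= threshold
    pseudo_y = classes[proba.argmax(axis=1)][confident]
    # 4. Refit on labelled + pseudo-labelled data.
    X_aug = np.vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, pseudo_y])
    return fit_centroids(X_aug, y_aug), int(confident.sum())
```

The threshold is the key design choice: set too low, wrong pseudo-labels get added and then reinforced on the next fit; set too high, almost nothing is added. In practice the round can be repeated, ideally with a model whose probabilities are reasonably calibrated.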

91.Project List A (comparatively easy): Wine Quality Analysis, Boston Housing Prediction, Spam Email Classification, Survival Prediction (Titanic Disaster), Stock Market Prediction, Class of Flower Prediction, Bigmart Sales Prediction, Air Pollution Prediction, IMDB Prediction, Optimizing Product Price, Web Traffic Time Series Forecasting, Insurance Purchase Prediction, Tweet Classification

Project List B (comparatively difficult): Domain-Specific Chatbot, Fake News Detection, Human Action Recognition, Video Classification, Driver Drowsiness Detection, Medical Report Generation Using CT Scans, Sign Language Detection, Image Caption Generator, Celebrity Voice Prediction, Speech Emotion Recognition, Job Recommendation System, Interest Level in Rental Properties, Google Ads Keywords Generator

https://www.analyticsvidhya.com/blog/2018/05/24-ultimate-data-science-projects-to-boost-your-knowledge-and-skills/

https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392

https://data-flair.training/blogs/machine-learning-datasets/# https://amankharwal.medium.com/20-machine-learning-projects-on-future-prediction-with-python-93932d9a7f7f

https://thecleverprogrammer.com/2020/06/01/work-on-data-science-projects/ https://thecleverprogrammer.com/2020/11/15/machine-learning-projects/

https://medium.com/coders-camp/20-deep-learning-projects-with-python-3c56f7e6a721 https://amankharwal.medium.com/12-machine-learning-projects-on-object-detection-46b32adc3c37

https://amankharwal.medium.com/7-python-gui-projects-for-beginners-87ae2c695d78

https://amankharwal.medium.com/20-machine-learning-projects-for-portfolio-81e3dbd167b1 https://amankharwal.medium.com/4-chatbot-projects-with-python-5b32fd84af37

https://amankharwal.medium.com/30-python-projects-solved-and-explained-563fd7473003 https://www.aiquotient.app/projects

https://medium.com/coders-camp/20-machine-learning-projects-on-nlp-582effe73b9c

92.Visual programming (Orange) - https://orange.biolab.si/

93.The Linux Command Handbook-https://www.freecodecamp.org/news/the-linux-commands-handbook/

94.130 Machine Learning Projects Solved and Explained-https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392

95.DataBrew (AWS Glue DataBrew) - drag-and-drop data cleaning without code

96.StrataScratch - https://www.stratascratch.com/

97.5 ways to celebrate TensorFlow's 5th birthday-https://blog.google/technology/ai/5-ways-celebrate-tensorflows-5th-birthday/

98.TensorFlow.js: Machine Learning in Javascript https://blog.tensorflow.org/2018/03/introducing-tensorflowjs-machine-learning-javascript.html

99.Language Interpretability Tool (LIT), an open-source platform for visualizing and understanding NLP models - https://pair-code.github.io/lit/

100.Deep Learning Hardware Guide https://towardsdatascience.com/another-deep-learning-hardware-guide-73a4c35d3e86

101.John Snow Labs Spark NLP - https://nlp.johnsnowlabs.com/ (quickstart: https://nlp.johnsnowlabs.com/docs/en/quickstart, licensed release notes: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes)

103.Mito - edit a spreadsheet and generate Python code - https://trymito.io/?source=twitter1

104.Clarifai-https://www.clarifai.com/ https://analyticsindiamag.com/clarifai/

105.Top 10 DataRobot alternatives for rapidly building and deploying machine learning models - https://analyticsindiamag.com/top-10-datarobot-alternatives-one-must-know/

106.Hive Data full-stack AI https://thehive.ai/hive-data

107.TensorGram - a real-time remote service that sends Keras callback updates, including metric details, to Telegram - https://github.com/ksdkamesh99/TensorGram

108.Language Interpretability Tool - https://pair-code.github.io/lit/demos/

109.Docly - handles code comments for you automatically - http://thedocly.io/

110.Machine Learning Roadmap 2020 - https://whimsical.com/machine-learning-roadmap-2020-CA7f3ykvXpnJ9Az32vYXva

111.Deploying machine learning models with Django (creating Django models) - https://www.deploymachinelearning.com/#create-django-models https://www.deploymachinelearning.com/

112.freeCodeCamp - https://www.freecodecamp.org/learn

113.pytesseract image_to_string - extract text from images

Extract tables in PDFs to pandas DataFrames - tabula-py

114.NLP Pipelines in a single line of code https://medium.com/analytics-vidhya/nlp-pipelines-in-a-single-line-of-code-500b3266ac7b

115.Best and Worst Cases of Machine-Learning Models https://medium.com/towards-artificial-intelligence/best-and-worst-cases-of-machine-learning-models-part-1-36cdb9296611

https://www.youtube.com/watch?v=mlumJPFvooQ&list=PLZoTAELRMXVM0zN0cgJrfT6TK2ypCpQdY

116.aitextgen - for AI text generation

117.Deep learning courses: MIT Introduction to Deep Learning http://introtodeeplearning.com/, Stanford CS231n http://cs231n.stanford.edu/, Stanford CS224n http://web.stanford.edu/class/cs224n/index.html#schedule, YouTube playlists https://www.youtube.com/playlist?list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A https://www.youtube.com/playlist?list=PLwRJQ4m4UJjPiJP3691u-qWwPGVKzSlNP https://www.youtube.com/playlist?list=PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5

I will be very happy if this repository helps you. Thank you for reading.

                                                    HAPPY LEARNING

Contributors: achuthasubhash, rsesha
