
Journey of Machine Learning and Deep Learning


Books & Resources
1. Natural Language Processing with Python
2. Practical Natural Language Processing
3. Fast AI NLP Course
Projects and Notebooks
1. Named Entity Recognition using spaCy

Day1 of MachineLearningDeepLearning

  • Natural Language Processing: It is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. In my journey of MachineLearningDeepLearning, I am brushing up on my NLP skills. Today I got an overview of different text preprocessing steps such as Tokenization, Stopword Removal, Stemming, and Lemmatization, and I implemented a few of them. I hope you will gain some insights and hope you will also spend time learning the topics. Excited about the days ahead!
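
A minimal sketch of these steps with NLTK, assuming the required NLTK data packages (punkt, stopwords, wordnet) have been downloaded:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Requires: nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
text = "The cats were running faster than the dogs."
tokens = nltk.word_tokenize(text)                        # tokenization
stops = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stops]  # stopword removal
print([PorterStemmer().stem(t) for t in content])        # stemming: "running" -> "run"
print([WordNetLemmatizer().lemmatize(t) for t in content])  # lemmatization: "cats" -> "cat"
```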

Day2 of MachineLearningDeepLearning

  • Text Preprocessing: It is a crucial step that involves cleaning and transforming raw text data into a format that can be easily analyzed and understood by machine learning models. Today I learned more about preprocessing and representation steps like One Hot Encoding (OHE), Bag of Words, and N-grams, and implemented them in code. Here, I have shared notes about text representation techniques in the snapshot and hope you will also spend time learning the topics. Excited about the days ahead!
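
A minimal sketch of Bag of Words and bigram counts with scikit-learn; the toy documents are my own illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bites the man", "the man bites the dog"]

bow = CountVectorizer()                        # unigram Bag of Words
print(bow.fit_transform(docs).toarray())       # term-count matrix
print(bow.get_feature_names_out())

bigrams = CountVectorizer(ngram_range=(2, 2))  # bigrams keep some word order
print(bigrams.fit_transform(docs).toarray())
print(bigrams.get_feature_names_out())
```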


Day3 of MachineLearningDeepLearning

  • Text Representation: It is the process of converting raw text data into a structured format that can be analyzed and processed by machine learning algorithms; the goal is to capture the meaning and structure of the text in a way that enables the ML algorithm to make accurate predictions or classifications. Today I learned about One Hot Encoding (OHE), TF-IDF, Word Embeddings, Word2Vec, CBOW, and Skip-gram, implemented them in code, and explored the Gensim library. Here, I have shared the notes about text representation techniques in the snapshot and hope you will also spend time learning the topics. Excited about the days ahead!
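
A minimal sketch of TF-IDF with scikit-learn and Word2Vec with Gensim; the toy corpus and hyperparameters are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

docs = ["dog bites man", "man bites dog", "dog eats meat"]

tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray())  # each document as a TF-IDF weighted vector

sentences = [d.split() for d in docs]
# sg=1 trains Skip-gram; sg=0 (the default) trains CBOW
w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(w2v.wv["dog"][:5])                    # first values of the dense embedding for "dog"
```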


Day4 of MachineLearningDeepLearning

  • Part of Speech Tagging: POS Tagging is the process of assigning grammatical information to words in a sentence, such as nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, and interjections. Its purpose is to analyze the text to understand the meaning of words and the relationships between them in given sentences or texts. Today I learned about POS Tagging, Emission Probability, Transition Probability, Hidden Markov Models, and the Viterbi Algorithm, and revised the previous topics I had covered. I hope you will gain some insights and hope you will also spend time learning the topics. Excited about the days ahead!
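
A minimal sketch of POS tagging with NLTK's off-the-shelf tagger (the HMM/Viterbi machinery discussed above sits behind this one call), assuming the punkt and averaged_perceptron_tagger data are downloaded:

```python
import nltk

# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))  # e.g., [('The', 'DT'), ('quick', 'JJ'), ...]
```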

Day5 of MachineLearningDeepLearning

  • Recurrent Neural Network: A recurrent neural network (RNN) is a type of artificial neural network designed to process sequential data, where the output at the current time step depends not only on the current input but also on previous inputs. RNNs can be used for a variety of tasks, including language modeling, speech recognition, and image captioning. Today I learned about types of RNNs, forward and backward propagation in RNNs, the LSTM RNN and its architecture, and a few more topics. I hope you will gain some insights and hope you will also spend time learning the topics. Excited about the days ahead!
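
A minimal sketch of an LSTM-based classifier in Keras; the vocabulary size and the binary output layer are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 10_000  # hypothetical vocabulary size
model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 64),       # map token ids to dense vectors
    layers.LSTM(64),                        # recurrent layer carries state across time steps
    layers.Dense(1, activation="sigmoid"),  # e.g., a binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```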

Day6 of MachineLearningDeepLearning

  • Sequence to Sequence Learning: A Seq2Seq model is a type of neural network architecture used for tasks involving sequential data, like machine translation, text summarization, etc. The major components of a Seq2Seq model are the encoder and the decoder. Today I learned about Seq2Seq Learning, the encoder, the decoder, and the problems with the encoder-decoder setup, as well as their solutions. I also read two research papers today; a minimal sketch of the architecture follows below. I hope you will gain some insights and hope you will also spend time learning the topics. Excited about the days ahead!
  • Paper:
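
A minimal sketch of an encoder-decoder (Seq2Seq) model in Keras; the vocabulary sizes and dimensions are illustrative assumptions for a translation-style task:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, dim = 5000, 5000, 128  # hypothetical sizes

# Encoder: embeds the source sequence and compresses it into a final state.
enc_in = layers.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, dim)(enc_in)
_, state_h, state_c = layers.LSTM(dim, return_state=True)(enc_emb)

# Decoder: generates the target sequence, initialized with the encoder's state.
dec_in = layers.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, dim)(dec_in)
dec_out, _, _ = layers.LSTM(dim, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
out = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```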


Day7 of MachineLearningDeepLearning

  • "Attention Is All You Need" is a newer approach that has shown promising results in machine translation and other natural language processing tasks. It relies solely on self-attention mechanisms and does not use any recurrent or convolutional neural networks. This approach has several advantages, such as improved parallelism and reduced computation time, and has achieved state-of-the-art results on several benchmarks. Today, I acquired knowledge of the encoder, the decoder, Seq2Seq Learning, self-attention, the embedding layer, and positional encoding. Additionally, I read the paper "Attention Is All You Need", and although I struggled to comprehend most of it, I gained a theoretical understanding of its content. I am planning to apply this acquired knowledge in code tomorrow; a minimal sketch of the core attention computation follows below. Excited about the days ahead!
  • Reference:
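
A minimal sketch of scaled dot-product attention, the building block of the paper, written in PyTorch; the tensor shapes are illustrative:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # query-key similarities, scaled
    weights = torch.softmax(scores, dim=-1)            # attention distribution over positions
    return weights @ v                                 # weighted sum of the values

q = k = v = torch.randn(2, 5, 64)  # (batch, seq_len, d_k); q = k = v is the self-attention case
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 64])
```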


Day8 of MachineLearningDeepLearning

  • The Transformer model consists of an encoder and a decoder, both of which use multi-head self-attention layers and feedforward neural networks. The Transformer architecture relies exclusively on self-attention mechanisms to process input sequences and produce output sequences, without any recurrent or convolutional layers. Today I revised all the previous topics I had covered. Here, I have presented an implementation of Transformers from scratch in PyTorch; a minimal sketch using PyTorch's built-in module follows below. PS: I usually work with TensorFlow, so I'm not very familiar with PyTorch. However, today I enjoyed writing code with PyTorch. Excited about the days ahead!
  • Reference:
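
For comparison with the from-scratch version, a minimal sketch using PyTorch's built-in nn.Transformer; the dimensions and dummy tensors are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=2, num_decoder_layers=2)
src = torch.randn(10, 32, 512)  # (source_len, batch, d_model)
tgt = torch.randn(20, 32, 512)  # (target_len, batch, d_model)
out = model(src, tgt)           # full encoder-decoder forward pass
print(out.shape)                # torch.Size([20, 32, 512])
```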


Day9 of MachineLearningDeepLearning

  • Natural Language Processing typically uses large bodies of linguistic data, or corpora. A text corpus is a large body of text. In my MachineLearningDeepLearning journey, today I started reading the book Natural Language Processing with Python, where I learned the basics of NLP and explored the Gutenberg Corpus, Brown Corpus, Reuters Corpus, Inaugural Address Corpus, and corpora in other languages, as well as the NLTK library. Here, I have shown how to access corpora using NLTK in a simple way. I hope you will gain some insights and hope you will also spend time learning the topics. Excited about the days ahead!
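
A minimal sketch of accessing two of these corpora through NLTK, assuming the corpus data has been downloaded:

```python
import nltk

# Requires: nltk.download("gutenberg"); nltk.download("brown")
from nltk.corpus import gutenberg, brown

print(gutenberg.fileids()[:3])                  # e.g., ['austen-emma.txt', ...]
print(len(gutenberg.words("austen-emma.txt")))  # word tokens in Austen's Emma

print(brown.categories()[:5])                   # the Brown Corpus is organized by genre
print(brown.words(categories="news")[:10])
```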


Day10 of MachineLearningDeepLearning

  • The most important source of texts is undoubtedly the Web. It's convenient to have existing text collections to explore. On my journey of MachineLearningDeepLearning, today I learned about WordNet, the WordNet hierarchy, lexical relations, semantic similarity, and ways to process raw text (dealing with HTML, processing search engine results, processing RSS feeds). Here, I have presented the ways to process raw text from the web. I hope you will gain some insights and hope you will also spend time learning the topics. Excited about the days ahead!
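
A minimal sketch of a WordNet lookup and of pulling raw text out of a web page; the URL is a stand-in:

```python
from urllib.request import urlopen

import nltk
from bs4 import BeautifulSoup
from nltk.corpus import wordnet as wn

# Requires: nltk.download("wordnet")
print(wn.synsets("motorcar"))             # [Synset('car.n.01')]
print(wn.synset("car.n.01").hypernyms())  # climb one level up the WordNet hierarchy

html = urlopen("https://www.example.com").read().decode("utf-8")
text = BeautifulSoup(html, "html.parser").get_text()  # strip the HTML markup
print(text[:80])
```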


Day11 of MachineLearningDeepLearning

  • Text in files will be in a particular encoding, so we need some mechanism for translating it into Unicode; translation into Unicode is called decoding. Conversely, to write out Unicode to a file or a terminal, we need to translate it into a suitable encoding; this is called encoding. On my journey of MachineLearningDeepLearning, today I learned about text processing with Unicode and Regular Expressions for detecting word patterns (basic metacharacters, finding word stems). Here, I have presented how to use regular expressions to identify patterns in the snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
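
A minimal sketch of detecting word patterns with regular expressions, in the style of the book's word-list examples (assumes the NLTK words corpus is downloaded):

```python
import re

import nltk

# Requires: nltk.download("words")
wordlist = [w for w in nltk.corpus.words.words("en") if w.islower()]

print([w for w in wordlist if re.search(r"ed$", w)][:5])         # words ending in "ed"
print([w for w in wordlist if re.search(r"^..j..t..$", w)][:5])  # wildcard "." matches any character
print(re.findall(r"^.*(ing|ly|ed|ious|ies|ive|es|s|ment)$", "processing"))  # crude stem/suffix split
```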


Day12 of MachineLearningDeepLearning

  • Tokenization is the segmentation of a text into basic units or tokens such as words and punctuation. Tokenization based on whitespace is inadequate for many applications because it bundles punctuation together with words. NLTK provides an off-the-shelf tokenizer nltk.word_tokenize().
  • Lemmatization is a process that maps the various forms of a word (such as appeared, appears) to the canonical or citation form of the word, also known as the lexeme or lemma (e.g., appear).
  • Regular expressions are a powerful and flexible method of specifying patterns. On my journey of MachineLearningDeepLearning, today I continued exploring regular expressions: extracting word pieces, finding word stems, searching tokenized text, and normalizing text (stemming and lemmatization). I also read about tokenization, which is an instance of the more general problem of segmentation. Here, I have presented the implementation of tokenization, stemming, and more in the below snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
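
A minimal sketch of regexp-based tokenization and word-piece extraction, following the style of the book's examples:

```python
import re

import nltk

raw = "That U.S.A. poster-print costs $12.40..."
pattern = r"""(?x)           # verbose regex: whitespace and comments are ignored
    (?:[A-Z]\.)+             # abbreviations, e.g. U.S.A.
  | \w+(?:-\w+)*             # words, with optional internal hyphens
  | \$?\d+(?:\.\d+)?%?       # currency amounts and percentages
  | \.\.\.                   # ellipsis
"""
print(nltk.regexp_tokenize(raw, pattern))        # ['That', 'U.S.A.', 'poster-print', 'costs', '$12.40', '...']
print(re.findall(r"[aeiou]{2,}", "sequential"))  # extract word pieces: vowel sequences
```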


Day13 of MachineLearningDeepLearning

  • Part of Speech Tagging: The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tag set. A part-of-speech tagger processes a sequence of words and attaches a part-of-speech tag to each word. On my journey of MachineLearningDeepLearning, today I learned about ways of automatic tagging such as the Default Tagger, Regular Expression Tagger, and Lookup Tagger, and about N-Gram Tagging (Unigram and Bigram Tagging), and also explored tagged corpora. Here, I have presented the implementation of the Default Tagger, Regular Expression Tagger, and Lookup Tagger, as well as the Unigram and Bigram Taggers, in the below snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
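
A minimal sketch of a backoff chain of these taggers, trained on the Brown Corpus news category (assumes the corpus is downloaded; .accuracy() is the current name for .evaluate() in recent NLTK versions):

```python
import nltk

# Requires: nltk.download("brown")
from nltk.corpus import brown

tagged = brown.tagged_sents(categories="news")
train, test = tagged[:4000], tagged[4000:]

default = nltk.DefaultTagger("NN")                    # fallback: tag everything as a noun
unigram = nltk.UnigramTagger(train, backoff=default)  # most likely tag for each word
bigram = nltk.BigramTagger(train, backoff=unigram)    # condition on the previous word too

print(bigram.accuracy(test))                # the backoff chain improves coverage
print(bigram.tag("the jury said".split()))
```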


Day14 of MachineLearningDeepLearning

  • Supervised Classification: Classification is the task of choosing the correct class label for a given input. In basic classification tasks, each input is considered in isolation from all other inputs, and the set of labels is defined in advance. A classifier is called supervised if it is built based on training corpora containing the correct label for each input. On my journey of MachineLearningDeepLearning, today I learned about classification, ways of choosing the right features, overfitting and underfitting, document classification, Naive Bayes classification, and more. Here, I have presented the implementation of document classification and sentence segmentation in the below snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
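
A minimal sketch of supervised document classification with NLTK's Naive Bayes classifier, following the book's movie-review example (assumes the movie_reviews corpus is downloaded):

```python
import random

import nltk

# Requires: nltk.download("movie_reviews")
from nltk.corpus import movie_reviews

docs = [(list(movie_reviews.words(fid)), cat)
        for cat in movie_reviews.categories()
        for fid in movie_reviews.fileids(cat)]
random.shuffle(docs)

# Use the 2,000 most frequent words as binary "contains(word)" features.
top_words = [w for w, _ in nltk.FreqDist(w.lower() for w in movie_reviews.words()).most_common(2000)]

def doc_features(doc):
    words = set(doc)
    return {f"contains({w})": (w in words) for w in top_words}

featuresets = [(doc_features(d), c) for d, c in docs]
train, test = featuresets[100:], featuresets[:100]

classifier = nltk.NaiveBayesClassifier.train(train)
print(nltk.classify.accuracy(classifier, test))
classifier.show_most_informative_features(5)
```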


Day15 of MachineLearningDeepLearning

  • In Natural Language Processing, a pipeline refers to a sequence of processing steps that transform raw text input into a form that is useful for a specific task, such as sentiment analysis, text classification, or named entity recognition. The NLP pipeline typically involves several stages, including tokenization, part-of-speech (POS) tagging, parsing, semantic analysis, and machine learning or deep learning algorithms. On my journey of MachineLearningDeepLearning, today I started reading the book Practical Natural Language Processing, where I learned the generic pipeline for data-driven NLP system development. I explored the ways of data acquisition, text extraction and cleanup, HTML parsing and cleanup, Unicode normalization, spelling correction, and system-specific error correction. Here, I have presented the implementation of HTML parsing and cleanup, Unicode normalization, and a few more in the below snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
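
A minimal sketch of the HTML cleanup and Unicode normalization steps; the HTML snippet is an assumption for illustration:

```python
import unicodedata

from bs4 import BeautifulSoup

html = "<html><body><h1>Title</h1><p>Some &amp; more caf\u00e9 text</p></body></html>"
text = BeautifulSoup(html, "html.parser").get_text(separator=" ")  # strip the markup

# NFKD decomposes accented characters so they can be folded to plain ASCII.
ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
print(text)        # Title Some & more café text
print(ascii_text)  # Title Some & more cafe text
```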


Day16 of MachineLearningDeepLearning

  • Feature Engineering: Feature engineering, also known as feature extraction, encompasses a variety of techniques for transforming textual attributes into a numerical vector that can be comprehended by machine learning algorithms. Two different approaches are taken in practice: feature engineering for a) a classical NLP and traditional ML pipeline, and b) a DL pipeline. On my journey of MachineLearningDeepLearning, today I completed reading chapter 2 of the book Practical Natural Language Processing, where I learned about word tokenization, stemming and lemmatization, code mixing, and transliteration. I also explored feature engineering for classical NLP and deep learning based NLP. Here, I have presented the implementation of word tokenization, the removal of stop words and digits, and a few more. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
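
A minimal sketch of the stopword and digit removal step; the example sentence is illustrative:

```python
import nltk
from nltk.corpus import stopwords

# Requires: nltk.download("punkt"); nltk.download("stopwords")
text = "In 2023, the model achieved 95 percent accuracy on 3 benchmarks."
stops = set(stopwords.words("english"))
tokens = nltk.word_tokenize(text.lower())
cleaned = [t for t in tokens if t.isalpha() and t not in stops]  # drops digits, punctuation, stopwords
print(cleaned)  # ['model', 'achieved', 'percent', 'accuracy', 'benchmarks']
```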


Day17 of MachineLearningDeepLearning

  • Text Classification: Text classification is the task of assigning one or more categories to a given piece of text from a larger set of possible categories. It can be used to organize, structure, and categorize text data from various sources, such as emails, documents, and social media. Some common applications of text classification are sentiment analysis, topic labeling, spam detection, and intent detection. On my journey of MachineLearningDeepLearning, today I learned about the pipeline for building text classification systems. Here, I have presented the implementation of text classification on the Economic News dataset: I applied some pre-processing techniques, applied CountVectorizer to transform the text documents into a matrix of token counts, and implemented the Naive Bayes, Logistic Regression, and Support Vector Machine algorithms in the given snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!

    One typically follows these steps when building a text classification system (a minimal sketch follows the steps):
    1. Collect or create a labeled dataset suitable for the task.
    2. Split the dataset into two (training and test) or three parts: training, validation (i.e., development), and test sets, then decide on evaluation metric(s).
    3. Transform raw text into feature vectors.
    4. Train a classifier using the feature vectors and the corresponding labels from the training set.
    5. Using the evaluation metric(s) from Step 2, benchmark the model performance on the test set.
    6. Deploy the model to serve the real-world use case and monitor its performance.

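A minimal sketch of steps 3-5 with scikit-learn, using CountVectorizer features and a Naive Bayes classifier; the toy texts and labels stand in for the Economic News dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

texts = ["stocks rally on strong earnings", "team wins the championship",
         "markets fall on rate fears", "player signs a record contract"]
labels = [1, 0, 1, 0]  # 1 = economic news (hypothetical labels)

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.5, random_state=0)

vec = CountVectorizer()                                         # Step 3: text -> token-count vectors
clf = MultinomialNB().fit(vec.fit_transform(X_train), y_train)  # Step 4: train the classifier
preds = clf.predict(vec.transform(X_test))                      # Step 5: benchmark on the test set
print(accuracy_score(y_test, preds))
```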

Day18 of MachineLearningDeepLearning

  • Information Extraction: Information Extraction refers to the NLP task of extracting relevant information from text documents. It is used in a wide range of real-world applications, from news articles to social media, and even receipts. The overarching goal of IE is to extract 'knowledge' from the text, and each of the IE tasks provides different information to do that. IE tasks require deeper NLP pre-processing followed by models developed for those specific tasks, and they are typically evaluated in terms of precision, recall, and F1 scores using standard evaluation sets. Today, as I continue my journey of MachineLearningDeepLearning, I explored the topic of Information Extraction.
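
A minimal sketch of how such outputs are scored with precision, recall, and F1; the gold and predicted entity labels here are hypothetical:

```python
from sklearn.metrics import precision_recall_fscore_support

gold = ["PER", "ORG", "O", "LOC", "O", "ORG"]  # reference labels
pred = ["PER", "O",   "O", "LOC", "O", "ORG"]  # a system's predictions
p, r, f1, _ = precision_recall_fscore_support(gold, pred, average="micro")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```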


Day19 of MachineLearningDeepLearning

  • Named Entity Recognition: Named Entity Recognition is a sub-task of information extraction. It deals with finding and classifying named entities mentioned in unstructured text. These entities are classified into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages. NER is used in a variety of applications, such as information retrieval, question answering, and text summarization, among others. On my journey of Machine Learning and Deep Learning, today I learned about Named Entity Recognition with spaCy using Python and Language Processing Pipelines using spaCy. I have presented the implementation of Named Entity Recognition in a document using the spaCy large model in the below snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead!
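
A minimal sketch of NER with spaCy's large English model, assuming en_core_web_lg has been downloaded (python -m spacy download en_core_web_lg):

```python
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g., Apple ORG / U.K. GPE / $1 billion MONEY
```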


Day20 of MachineLearningDeepLearning

  • Topic Modeling: Topic modeling is a technique used to address the problem of finding latent topics in a large collection of documents. It involves identifying the underlying themes or concepts that pervade a collection of texts and grouping them into categories. This can be useful for tasks like document classification, information retrieval, and recommendation systems. Today, I learned about Topic Modeling, Singular Value Decomposition, Non-negative Matrix Factorization, Stemming, and Lemmatization. I have presented the implementation of Topic Modeling below in the snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead.
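
A minimal sketch of topic modeling with TF-IDF features and NMF in scikit-learn; the toy corpus and the choice of two topics are assumptions:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stocks rose as markets rallied", "investors bought shares and bonds"]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0).fit(X)    # factorize into 2 latent topics
terms = tfidf.get_feature_names_out()
for i, topic in enumerate(nmf.components_):
    top = [terms[j] for j in topic.argsort()[-4:]]  # highest-weighted words per topic
    print(f"Topic {i}: {top}")
```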


Day21 of MachineLearningDeepLearning

  • Pretraining RoBERTa Model: The initial BERT models brought innovative features to the original Transformer models, whereas RoBERTa increases the performance of transformers on downstream tasks by improving the mechanics of the pretraining process. KantaiBERT is a Robustly Optimized BERT Pretraining Approach (RoBERTa)-like model based on the architecture of BERT. Today, I learned about the RoBERTa model and tried to build the KantaiBERT model from scratch while exploring Transformers. To pretrain the model, I trained a tokenizer on the dataset, saved the tokenizer, created a customized dataset, trained the RoBERTa model, and saved it. I have presented the implementation of pretraining the KantaiBERT model from scratch below in the snapshot. I hope you will gain some insights and hope you will spend time learning the topics. Excited about the days ahead.
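
A minimal sketch of the tokenizer-training step with Hugging Face's tokenizers library; the input file name and hyperparameters are assumptions for illustration:

```python
import os

from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=["kant.txt"],  # hypothetical training corpus
                vocab_size=52_000, min_frequency=2,
                special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])

os.makedirs("KantaiBERT", exist_ok=True)
tokenizer.save_model("KantaiBERT")   # writes vocab.json and merges.txt
```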

