ankittaxak5717,Ankit Kumar,github

amazonfoodreviews

Amazon Fine Food Reviews Analysis
Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews
The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.
Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10
Attribute Information:
1. Id
2. ProductId - unique identifier for the product
3. UserId - unique identifier for the user
4. ProfileName
5. HelpfulnessNumerator - number of users who found the review helpful
6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
7. Score - rating between 1 and 5
8. Time - timestamp for the review
9. Summary - brief summary of the review
10. Text - text of the review
Objective: Given a review, determine whether the review is positive (rating of 4 or 5) or negative (rating of 1 or 2).
[Q] How to determine if a review is positive or negative?
[Ans] We could use the Score/Rating. A rating of 4 or 5 could be considered a positive review. A rating of 1 or 2 could be considered negative. A rating of 3 is neutral and ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
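The labeling rule described in the answer can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the repository's code: the function name `score_to_polarity` and the sample rows are hypothetical, and only the `Score` and `Summary` column names come from the attribute list.

```python
from typing import Optional


def score_to_polarity(score: int) -> Optional[str]:
    """Map a 1-5 Score to a polarity label; a neutral 3 maps to None."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"Score must be between 1 and 5, got {score!r}")
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return None  # rating of 3: neutral, ignored


# Hypothetical sample rows, made up for illustration only.
reviews = [
    {"Score": 5, "Summary": "Great taste"},
    {"Score": 3, "Summary": "It was okay"},
    {"Score": 1, "Summary": "Arrived stale"},
]

# Keep only reviews with a clear polarity, dropping the neutral ones.
labeled = [
    (r["Summary"], score_to_polarity(r["Score"]))
    for r in reviews
    if score_to_polarity(r["Score"]) is not None
]
```

Returning None for a rating of 3 keeps the "ignored" case explicit, so neutral reviews can be filtered out in one place.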

ankittaxak5717.github.io

awesome-deep-learning

A curated list of awesome Deep Learning tutorials, projects and communities.

awesome-deep-learning-music

List of articles related to deep learning applied to music

awesome-machine-learning

A curated list of awesome Machine Learning frameworks, libraries and software.

awesome-rnn

Recurrent Neural Network - A curated list of resources dedicated to RNN

awesome_opensetrecognition_list

A curated list of papers & resources linked to open set recognition, out-of-distribution, open set domain adaptation and open world recognition

cancerdaignosis

Personalized cancer diagnosis
1. Business Problem
1.1. Description
Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/
Data: Memorial Sloan Kettering Cancer Center (MSKCC)
Download training_variants.zip and training_text.zip from Kaggle.
Context: Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/discussion/35336#198462
Problem statement: Classify the given genetic variations/mutations based on evidence from text-based clinical literature.
1.2. Source/Useful Links
https://www.forbes.com/sites/matthewherper/2017/06/03/a-new-cancer-drug-helped-almost-everyone-who-took-it-almost-heres-what-it-teaches-us/#2a44ee2f6b25
https://www.youtube.com/watch?v=UwbuW7oK8rk
https://www.youtube.com/watch?v=qxXRKVompI8
No low-latency requirement. Interpretability is important. Errors can be very costly. Probability of a data-point belonging to each class is needed.
2. Machine Learning Problem Formulation
2.1. Data
Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/data
We have two data files: one contains the information about the genetic mutations and the other contains the clinical evidence (text) that human experts/pathologists use to classify the genetic mutations. Both data files have a common column called ID.
Data file information:
training_variants (ID, Gene, Variation, Class)
training_text (ID, Text)
2.1.2. Example Data Point
training_variants
ID,Gene,Variation,Class
0,FAM58A,Truncating Mutations,1
1,CBL,W802*,2
2,CBL,Q249E,2
...
training_text
ID,Text
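Because both files share the ID column, they can be combined into one record per data point. The sketch below shows one way to do that join with the Python standard library. It is an assumption-laden illustration: the variant rows are the three example rows shown above, while the text values are placeholders invented here (the source shows no training_text rows), and the helper name `join_on_id` is hypothetical.

```python
import csv
import io

# The three example rows from training_variants shown in the description.
VARIANTS_CSV = """ID,Gene,Variation,Class
0,FAM58A,Truncating Mutations,1
1,CBL,W802*,2
2,CBL,Q249E,2
"""

# Placeholder text keyed by ID; the real clinical text is not reproduced here.
TEXT_BY_ID = {
    "0": "placeholder clinical text for ID 0",
    "1": "placeholder clinical text for ID 1",
    "2": "placeholder clinical text for ID 2",
}


def join_on_id(variants_csv, text_by_id):
    """Inner-join variant rows with their text using the shared ID column."""
    merged = []
    for row in csv.DictReader(io.StringIO(variants_csv)):
        text = text_by_id.get(row["ID"])
        if text is None:
            continue  # skip variants that have no matching text row
        merged.append({**row, "Text": text})
    return merged


merged = join_on_id(VARIANTS_CSV, TEXT_BY_ID)
```

Each merged record then carries Gene, Variation, Class and Text together, which is the shape a classifier over the clinical text would need.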

classical-piano-composer

crimedetection

It is a multi-label classification problem. Multi-label Classification: Multilabel classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data-point that are not mutually exclusive, such as topics that are relevant for a document. There are crime charges (labels) for every article description in the dataset. Our task is to find crime charges for future descriptions. Credit: http://scikit-learn.org/stable/modules/multiclass.html
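To make "a set of target labels" concrete, the sketch below encodes label sets as a binary indicator matrix, the usual target format for multi-label classifiers. It is a plain-Python illustration rather than the repository's code: the charge names and the helper names `fit_label_index` and `to_indicator` are hypothetical.

```python
def fit_label_index(label_sets):
    """Assign a column index to every distinct label, in sorted order."""
    labels = sorted({label for labels_of_sample in label_sets for label in labels_of_sample})
    return {label: i for i, label in enumerate(labels)}


def to_indicator(label_set, index):
    """Encode one sample's label set as a 0/1 row, one column per label."""
    row = [0] * len(index)
    for label in label_set:
        row[index[label]] = 1
    return row


# Hypothetical crime charges attached to three article descriptions.
charges = [{"theft", "assault"}, {"fraud"}, {"theft"}]

index = fit_label_index(charges)
matrix = [to_indicator(sample, index) for sample in charges]
```

A row may contain several ones, which is what distinguishes this setup from multi-class classification, where each sample has exactly one label.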

deep-learning-for-tracking-and-detection

Collection of papers and other resources for object tracking and detection using deep learning

deep-learning-with-python

Example projects I completed to understand Deep Learning techniques with Tensorflow.

deep-metric-learning

CS231N: Project

deep_metric

Deep Metric Learning

deeplearning

Deep learning assignment from the Coursera course by Andrew Ng

docker-flow-proxy

Docker Flow Proxy

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

gan-yhat

GAN implementations

generative_adversarial_networks_live

habermancancerdatavisualisation

3. Plotting for Exploratory data analysis (EDA) (3.1) Haberman's Cancer Survival Dataset Dataset Description

lstm_imdb

Lstm_Imdb-Reviews prediction

magpie

Deep neural network framework for multi-label text classification

malwaredetection

Microsoft Malware detection 1. Business/Real-world Problem 1.1. What is Malware? The term malware is a contraction of malicious software. Put simply, malware is any piece of software that was written with the intent of doing harm to data, devices or people. Source: https://www.avg.com/en/signal/what-is-malware 1.2. Problem Statement In the past few years, the malware industry has grown so rapidly that syndicates invest heavily in technologies to evade traditional protection, forcing anti-malware groups/communities to build more robust software to detect and terminate these attacks. The major part of protecting a computer system from a malware attack is to identify whether a given file/piece of software is malware. 1.3 Source/Useful Links Microsoft has been very active in building anti-malware products over the years, and it runs its anti-malware utilities on over 150 million computers around the world. This generates tens of millions of daily data points to be analyzed as potential malware. In order to be effective in analyzing and classifying such large amounts of data, we need to be able to group them and identify their respective families. This dataset provided by Microsoft contains 9 classes of malware. Source: https://www.kaggle.com/c/malware-classification

models

Models and examples built with TensorFlow

multi-label-text-classification

Multi-label text classification using ConvNet and graph embedding (Tensorflow implementation)

netflixmoviereviews

Netflix is all about connecting people to the movies they love. To help customers find those movies, they developed a world-class movie recommendation system: Cinematch. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. Netflix uses those predictions to make personal movie recommendations based on each customer’s unique tastes. And while Cinematch is doing pretty well, it can always be made better. Now there are a lot of interesting alternative approaches to how Cinematch works that Netflix hasn’t tried. Some are described in the literature, some aren’t. We’re curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business. Credits: https://www.netflixprize.com/rules.html
