ankittaxak5717
Name: Ankit Kumar
Type: User
Location: Kolkata, India
Amazon Fine Food Reviews Analysis
Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews
The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.
Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10
Attribute Information:
1. Id
2. ProductId - unique identifier for the product
3. UserId - unique identifier for the user
4. ProfileName
5. HelpfulnessNumerator - number of users who found the review helpful
6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
7. Score - rating between 1 and 5
8. Time - timestamp for the review
9. Summary - brief summary of the review
10. Text - text of the review
Objective: Given a review, determine whether the review is positive (rating of 4 or 5) or negative (rating of 1 or 2).
[Q] How to determine if a review is positive or negative?
[Ans] We could use the Score/Rating. A rating of 4 or 5 could be considered a positive review. A rating of 1 or 2 could be considered negative. A rating of 3 is neutral and ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
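The labeling rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not code taken from the repository: the function name `polarity_from_score` and the sample rows are hypothetical, and only the `Score` and `Summary` column names come from the attribute list.

```python
from typing import Optional


def polarity_from_score(score: int) -> Optional[str]:
    """Map a 1-5 rating to a polarity label; a rating of 3 is neutral (None)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"Score must be between 1 and 5, got {score!r}")
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return None  # rating of 3: neutral, ignored downstream


# Hypothetical rows, shaped like the Score and Summary columns of the dataset.
reviews = [
    {"Score": 5, "Summary": "Great taste"},
    {"Score": 3, "Summary": "It was okay"},
    {"Score": 1, "Summary": "Arrived stale"},
]

# Keep only reviews with a clear polarity, dropping the neutral ones.
labeled = [
    (row["Summary"], polarity_from_score(row["Score"]))
    for row in reviews
    if polarity_from_score(row["Score"]) is not None
]
```

Here `labeled` keeps the 5-star and 1-star rows and drops the 3-star row, which mirrors how neutral reviews are ignored.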
A curated list of awesome Deep Learning tutorials, projects and communities.
List of articles related to deep learning applied to music
A curated list of awesome Machine Learning frameworks, libraries and software.
Recurrent Neural Network - A curated list of resources dedicated to RNN
A curated list of papers & resources linked to open set recognition, out-of-distribution, open set domain adaptation and open world recognition
Personalized cancer diagnosis
1. Business Problem
1.1. Description
Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/
Data: Memorial Sloan Kettering Cancer Center (MSKCC)
Download training_variants.zip and training_text.zip from Kaggle.
Context: Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/discussion/35336#198462
Problem statement: Classify the given genetic variations/mutations based on evidence from text-based clinical literature.
1.2. Source/Useful Links
https://www.forbes.com/sites/matthewherper/2017/06/03/a-new-cancer-drug-helped-almost-everyone-who-took-it-almost-heres-what-it-teaches-us/#2a44ee2f6b25
https://www.youtube.com/watch?v=UwbuW7oK8rk
https://www.youtube.com/watch?v=qxXRKVompI8
No low-latency requirement. Interpretability is important. Errors can be very costly. The probability of a data point belonging to each class is needed.
2. Machine Learning Problem Formulation
2.1. Data
Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/data
We have two data files: one contains the information about the genetic mutations and the other contains the clinical evidence (text) that human experts/pathologists use to classify the genetic mutations. Both data files have a common column called ID.
Data file information: training_variants (ID, Gene, Variations, Class), training_text (ID, Text)
2.1.2. Example Data Point
training_variants
ID,Gene,Variation,Class
0,FAM58A,Truncating Mutations,1
1,CBL,W802*,2
2,CBL,Q249E,2
...
training_text
ID,Text
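Because the two files share the ID column, they can be combined into one record per data point. The following is a minimal sketch using only the Python standard library. It assumes both files are comma-separated with the headers shown above; the three variant rows are the example rows from the description, while the text values and the helper name `join_on_id` are hypothetical placeholders.

```python
import csv
import io

# Example rows from the description above.
variants_csv = """ID,Gene,Variation,Class
0,FAM58A,Truncating Mutations,1
1,CBL,W802*,2
2,CBL,Q249E,2
"""

# Placeholder text values (hypothetical; the real file holds clinical literature).
text_csv = """ID,Text
0,placeholder clinical text for ID 0
1,placeholder clinical text for ID 1
2,placeholder clinical text for ID 2
"""


def join_on_id(variants_file, text_file):
    """Attach the Text field to each variant row that has a matching ID."""
    text_by_id = {row["ID"]: row["Text"] for row in csv.DictReader(text_file)}
    merged = []
    for row in csv.DictReader(variants_file):
        if row["ID"] in text_by_id:  # inner join: skip rows without text
            merged.append({**row, "Text": text_by_id[row["ID"]]})
    return merged


merged = join_on_id(io.StringIO(variants_csv), io.StringIO(text_csv))
```

Each entry of `merged` then carries ID, Gene, Variation, Class and Text together, which is the shape a classifier over both the mutation and its evidence would need. All values are read as strings, so Class would still need converting to an integer before training.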
It is a multi-label classification problem. Multi-label classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as topics that are relevant for a document. In this dataset, every article description has one or more crime charges (labels). Our task is to find the crime charges for future descriptions. Credit: http://scikit-learn.org/stable/modules/multiclass.html
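One common way to represent such non-exclusive labels is a binary indicator matrix with one column per label. The sketch below shows the idea in plain Python. The charge names and the helper `binarize_labels` are hypothetical and not taken from the repository's data.

```python
def binarize_labels(label_sets):
    """Turn a list of label sets into (sorted class list, 0/1 indicator matrix)."""
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: i for i, label in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # mark every label the sample carries
        matrix.append(row)
    return classes, matrix


# Hypothetical charges per article description; a sample may have several or none.
charges = [{"theft", "fraud"}, {"assault"}, set()]
classes, Y = binarize_labels(charges)
```

With this encoding, a row can contain several ones or none at all. That is the difference from multi-class classification, where each sample gets exactly one label.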
Collection of papers and other resources for object tracking and detection using deep learning
Example projects I completed to understand Deep Learning techniques with Tensorflow.
CS231N: Project
Deep Metric Learning
Deep learning assignments from the Coursera course by Andrew Ng
Docker Flow Proxy
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
GAN implementations
3. Plotting for Exploratory data analysis (EDA) (3.1) Haberman's Cancer Survival Dataset Dataset Description
LSTM-based prediction on IMDB reviews
Deep neural network framework for multi-label text classification
Microsoft Malware detection
1. Business/Real-world Problem
1.1. What is Malware?
The term malware is a contraction of malicious software. Put simply, malware is any piece of software that was written with the intent of doing harm to data, devices or people. Source: https://www.avg.com/en/signal/what-is-malware
1.2. Problem Statement
In the past few years, the malware industry has grown so rapidly that syndicates invest heavily in technologies to evade traditional protection, forcing anti-malware groups and communities to build more robust software to detect and terminate these attacks. The major part of protecting a computer system from a malware attack is identifying whether a given file or piece of software is malware.
1.3. Source/Useful Links
Microsoft has been very active in building anti-malware products over the years, and it runs its anti-malware utilities on over 150 million computers around the world. This generates tens of millions of daily data points to be analyzed as potential malware. To analyze and classify such large amounts of data effectively, we need to be able to group the samples and identify their respective families. This dataset provided by Microsoft contains 9 classes of malware. Source: https://www.kaggle.com/c/malware-classification
Models and examples built with TensorFlow
Multi-label text classification using ConvNet and graph embedding (TensorFlow implementation)
Netflix is all about connecting people to the movies they love. To help customers find those movies, they developed a world-class movie recommendation system: Cinematch. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. Netflix uses those predictions to make personal movie recommendations based on each customer’s unique tastes. And while Cinematch is doing pretty well, it can always be made better. Now there are a lot of interesting alternative approaches to how Cinematch works that Netflix hasn’t tried. Some are described in the literature, some aren’t. We’re curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business. Credits: https://www.netflixprize.com/rules.html
TensorFlow Neural Machine Translation Tutorial
OpenCV 3 with Python 3 2018
PyTorch implementations of Generative Adversarial Networks.
PyTorch Tutorial for Deep Learning Researchers