
Ankit Kumar's Projects

amazonfoodreviews

Amazon Fine Food Reviews Analysis

Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

Attribute Information:
1. Id
2. ProductId - unique identifier for the product
3. UserId - unique identifier for the user
4. ProfileName
5. HelpfulnessNumerator - number of users who found the review helpful
6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
7. Score - rating between 1 and 5
8. Time - timestamp for the review
9. Summary - brief summary of the review
10. Text - text of the review

Objective: Given a review, determine whether the review is positive (rating of 4 or 5) or negative (rating of 1 or 2).

[Q] How to determine if a review is positive or negative?
[Ans] We could use the Score/Rating. A rating of 4 or 5 could be considered a positive review. A rating of 1 or 2 could be considered negative. A rating of 3 is neutral and is ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
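The Score-to-polarity rule described in the answer can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the project's actual code: the function name `score_to_polarity` and the sample rows are assumptions made up for the example.

```python
from typing import Optional


def score_to_polarity(score: int) -> Optional[str]:
    """Map a 1-5 Score to a polarity label; a Score of 3 is neutral and yields None."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"Score must be between 1 and 5, got {score!r}")
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return None  # neutral review, ignored


# Hypothetical (Score, Summary) rows, not taken from the dataset.
reviews = [(5, "Great taste"), (3, "It was okay"), (1, "Stale on arrival")]

# Keep only reviews with a clear polarity, dropping the neutral ones.
labeled = [
    (summary, score_to_polarity(score))
    for score, summary in reviews
    if score_to_polarity(score) is not None
]
print(labeled)
```

Returning None for a Score of 3 keeps the "ignore neutral reviews" decision explicit, so the filtering step is visible to whoever reads the code.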

awesome-rnn

Recurrent Neural Network - A curated list of resources dedicated to RNN

awesome_opensetrecognition_list

A curated list of papers & resources linked to open set recognition, out-of-distribution, open set domain adaptation and open world recognition

cancerdaignosis

Personalized cancer diagnosis

1. Business Problem

1.1. Description
Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/
Data: Memorial Sloan Kettering Cancer Center (MSKCC)
Download training_variants.zip and training_text.zip from Kaggle.
Context: Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/discussion/35336#198462
Problem statement: Classify the given genetic variations/mutations based on evidence from text-based clinical literature.

1.2. Source/Useful Links
https://www.forbes.com/sites/matthewherper/2017/06/03/a-new-cancer-drug-helped-almost-everyone-who-took-it-almost-heres-what-it-teaches-us/#2a44ee2f6b25
https://www.youtube.com/watch?v=UwbuW7oK8rk
https://www.youtube.com/watch?v=qxXRKVompI8

No low-latency requirement.
Interpretability is important.
Errors can be very costly.
Probability of a data-point belonging to each class is needed.

2. Machine Learning Problem Formulation

2.1. Data
Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/data
We have two data files: one contains the information about the genetic mutations and the other contains the clinical evidence (text) that human experts/pathologists use to classify the genetic mutations. Both data files have a common column called ID.
Data file information:
training_variants (ID, Gene, Variation, Class)
training_text (ID, Text)

2.1.2. Example Data Point
training_variants
ID,Gene,Variation,Class
0,FAM58A,Truncating Mutations,1
1,CBL,W802*,2
2,CBL,Q249E,2
...
training_text
ID,Text
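Because both files share the ID column, each variant row can be paired with its clinical text by joining on ID. The sketch below uses only the Python standard library and the three example rows shown above. The text values are placeholders (the real clinical text is not shown here), and the helper name `join_on_id` is hypothetical.

```python
import csv
import io

# Rows copied from the training_variants example data point.
variants_csv = """ID,Gene,Variation,Class
0,FAM58A,Truncating Mutations,1
1,CBL,W802*,2
2,CBL,Q249E,2
"""

# Placeholder text keyed by ID; stands in for the contents of training_text.
text_by_id = {
    "0": "<clinical text for ID 0>",
    "1": "<clinical text for ID 1>",
    "2": "<clinical text for ID 2>",
}


def join_on_id(variants_file, texts):
    """Attach the Text value to each variant row that has the same ID."""
    merged = []
    for row in csv.DictReader(variants_file):
        record = dict(row)
        # Rows without matching text get an empty string.
        record["Text"] = texts.get(row["ID"], "")
        merged.append(record)
    return merged


merged = join_on_id(io.StringIO(variants_csv), text_by_id)
for record in merged:
    print(record["ID"], record["Gene"], record["Variation"], record["Class"])
```

With the full dataset, a dataframe library would likely be more convenient for this join; the dictionary lookup here only illustrates the idea of keying both files on ID.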

crimedetection

This is a multi-label classification problem. Multi-label classification: Multilabel classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as topics that are relevant for a document. Every article description in the dataset has crime charges (labels). Our task is to find the crime charges for future descriptions. Credit: http://scikit-learn.org/stable/modules/multiclass.html
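One common way to represent a set of target labels per sample is a binary indicator matrix, with one column per label and a 1 wherever the label applies. The sketch below builds such a matrix with plain Python. The charge names are made up for illustration and the helper `to_indicator_matrix` is hypothetical, so this is not necessarily how the project encodes its labels.

```python
def to_indicator_matrix(label_sets):
    """Turn a list of label sets into (sorted class list, 0/1 matrix)."""
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: i for i, label in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # labels are not mutually exclusive
        matrix.append(row)
    return classes, matrix


# Hypothetical crime charges for three article descriptions.
article_charges = [{"theft", "assault"}, {"fraud"}, set()]

classes, matrix = to_indicator_matrix(article_charges)
print(classes)
for row in matrix:
    print(row)
```

A row can contain several 1s or none at all, which is what separates this setting from ordinary multi-class classification, where each sample gets exactly one label.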

deeplearning

Deep learning assignments from the Coursera course by Andrew Ng

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

magpie

Deep neural network framework for multi-label text classification

malwaredetection

Microsoft Malware detection

1. Business/Real-world Problem

1.1. What is Malware?
The term malware is a contraction of malicious software. Put simply, malware is any piece of software that was written with the intent of doing harm to data, devices or to people.
Source: https://www.avg.com/en/signal/what-is-malware

1.2. Problem Statement
In the past few years, the malware industry has grown so rapidly that syndicates invest heavily in technologies to evade traditional protection, forcing anti-malware groups/communities to build more robust software to detect and terminate these attacks. The major part of protecting a computer system from a malware attack is identifying whether a given file or piece of software is malware.

1.3. Source/Useful Links
Microsoft has been very active in building anti-malware products over the years, and it runs its anti-malware utilities on over 150 million computers around the world. This generates tens of millions of daily data points to be analyzed as potential malware. To analyze and classify such large amounts of data effectively, we need to be able to group the files and identify their respective families. This dataset provided by Microsoft contains 9 classes of malware.
Source: https://www.kaggle.com/c/malware-classification

models

Models and examples built with TensorFlow

netflixmoviereviews

Netflix is all about connecting people to the movies they love. To help customers find those movies, they developed a world-class movie recommendation system: Cinematch. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. Netflix uses those predictions to make personal movie recommendations based on each customer's unique tastes. And while Cinematch is doing pretty well, it can always be made better. Now there are a lot of interesting alternative approaches to how Cinematch works that Netflix hasn't tried. Some are described in the literature, some aren't. We're curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business. Credits: https://www.netflixprize.com/rules.html

nmt

TensorFlow Neural Machine Translation Tutorial

pytorch-gan

PyTorch implementations of Generative Adversarial Networks.
