applied-ml

Curated papers, articles, and blogs on data science & machine learning in production. ⚙️

Some people collect stamps. I collect these. 😅

Have a new ML project and need a starting point? These resources on ML applied in production share:

  • How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
  • What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
  • Why it works: the science behind it, with research, literature, and references 📂
  • What real-world results were achieved (so you can better assess ROI ⏰💰📈)

Table of Contents

  1. Data Quality
  2. Data Engineering
  3. Classification
  4. Regression
  5. Recommendation
  6. Search/Ranking
  7. Embeddings
  8. Natural Language Processing
  9. Sequence Modelling
  10. Forecasting
  11. Computer Vision
  12. Reinforcement Learning
  13. Anomaly Detection
  14. Graph
  15. Optimization
  16. Information Extraction
  17. Weak Supervision
  18. Validation and A/B Testing
  19. Practices
  20. Failures

Data Quality

  1. Monitoring Data Quality at Scale with Statistical Modeling Uber
  2. An Approach to Data Quality for Netflix Personalization Systems Netflix
  3. Automating Large-Scale Data Quality Verification Amazon
  4. Meet Hodor — Gojek’s Upstream Data Quality Tool Gojek
  5. Reliable and Scalable Data Ingestion at Airbnb Airbnb

Data Engineering

  1. Zipline: Airbnb’s Machine Learning Data Management Platform Airbnb
  2. Sputnik: Airbnb’s Apache Spark Framework for Data Engineering Airbnb
  3. Feast: Bridging ML Models and Data Gojek
  4. Open Sourcing Amundsen: A Data Discovery And Metadata Platform Lyft
  5. Making Netflix’s Data Infrastructure Cost-Effective Netflix
  6. Shopify's Data Science & Engineering Foundations Shopify
  7. Metacat: Making Big Data Discoverable and Meaningful at Netflix Netflix

Classification

  1. High-Precision Phrase-Based Document Classification on a Modern Scale LinkedIn
  2. Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing WalmartLabs
  3. Large-scale Item Categorization for e-Commerce DianPing, eBay
  4. Categorizing Products at Scale Shopify
  5. Learning to Diagnose with LSTM Recurrent Neural Networks Google
  6. Discovering and Classifying In-app Message Intent at Airbnb Airbnb
  7. How We Built the Good First Issues Feature GitHub
  8. Testing Firefox More Efficiently with Machine Learning Mozilla
  9. Prediction of Advertiser Churn for Google AdWords Google
  10. Teaching machines to triage Firefox bugs Mozilla

Regression

  1. Using Machine Learning to Predict Value of Homes On Airbnb Airbnb
  2. Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions Better
  3. Using Machine Learning to Predict the Value of Ad Requests Twitter

Recommendation

  1. Amazon.com Recommendations: Item-to-Item Collaborative Filtering Amazon
  2. Recommending Complementary Products in E-Commerce Push Notifications with Mixture Models Alibaba
  3. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba Alibaba
  4. Session-based Recommendations with Recurrent Neural Networks Telefonica
  5. How 20th Century Fox uses ML to predict a movie audience (paper) 20th Century Fox
  6. Deep Neural Networks for YouTube Recommendations YouTube
  7. Personalized Recommendations for Experiences Using Deep Learning TripAdvisor
  8. E-commerce in Your Inbox: Product Recommendations at Scale Yahoo
  9. Product Recommendations at Scale Yahoo
  10. Powered by AI: Instagram’s Explore recommender system Facebook
  11. Artwork Personalization at Netflix Netflix
  12. To Be Continued: Helping you find shows to continue watching on Netflix Netflix
  13. Learning a Personalized Homepage Netflix
  14. Calibrated Recommendations Netflix
  15. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber
  16. How Music Recommendation Works — And Doesn’t Work Spotify
  17. Music recommendation at Spotify Spotify
  18. Recommending Music on Spotify with Deep Learning Spotify
  19. For Your Ears Only: Personalizing Spotify Home with Machine Learning Spotify
  20. Reach for the Top: How Spotify Built Shortcuts in Just Six Months Spotify
  21. The Evolution of Kit: Automating Marketing Using Machine Learning Shopify
  22. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits Spotify
  23. Using Machine Learning to Predict what File you Need Next (Part 1) Dropbox
  24. Using Machine Learning to Predict what File you Need Next (Part 2) Dropbox
  25. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) LinkedIn
  26. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) LinkedIn
  27. A recommender system in 30 lines of Clojure Findka
  28. How TikTok recommends videos #ForYou ByteDance

Search/Ranking

  1. Amazon Search: The Joy of Ranking Products Amazon
  2. How Lazada Ranks Products to Improve Customer Experience and Conversion Lazada
  3. Using Deep Learning at Scale in Twitter’s Timelines Twitter
  4. Machine Learning-Powered Search Ranking of Airbnb Experiences Airbnb
  5. Applying Deep Learning To Airbnb Search Airbnb
  6. Managing Diversity in Airbnb Search Airbnb
  7. Ranking Relevance in Yahoo Search Yahoo
  8. An Ensemble-Based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy Etsy
  9. Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? Amazon
  10. The AI Behind LinkedIn Recruiter search and recommendation systems LinkedIn
  11. AI at Scale in Bing Microsoft
  12. Query Understanding Engine in Traveloka Universal Search Traveloka
  13. Search-based User Interest Modeling with Lifelong Sequential Behavior Data for CTR Prediction Alibaba
  14. The Secret Sauce Behind Search Personalisation Gojek

Embeddings

  1. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba Alibaba
  2. Embeddings@Twitter Twitter
  3. Listing Embeddings in Search Ranking (Paper) Airbnb
  4. Understanding Latent Style Stitch Fix

Natural Language Processing

  1. Abusive Language Detection in Online User Content Yahoo
  2. How natural language processing helps LinkedIn members get support easily LinkedIn
  3. Building Smart Replies for Member Messages LinkedIn
  4. Smart Reply: Automated Response Suggestion for Email Google
  5. SmartReply for YouTube Creators Google
  6. Using Neural Networks to Find Answers in Tables Google
  7. A Scalable Approach to Reducing Gender Bias in Google Translate Google
  8. Assistive AI Makes Replying Easier Microsoft
  9. AI Advances to Better Detect Hate Speech Facebook
  10. A State-of-the-Art Open Source Chatbot Facebook
  11. A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs Facebook
  12. Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting Amazon
  13. How Gojek Uses NLP to Name Pickup Locations at Scale Gojek
  14. Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want Stitch Fix
  15. The State-of-the-art Open-Domain Chatbot in Chinese and English Baidu

Sequence Modelling

  1. Recommending Complementary Products in E-Commerce Push Notifications with Mixture Models Alibaba
  2. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction Alibaba
  3. Learning to Diagnose with LSTM Recurrent Neural Networks Google
  4. Deep Learning for Understanding Consumer Histories Zalando
  5. Continual Prediction of Notification Attendance with Classical and Deep Network Approaches Telefonica

Forecasting

  1. Forecasting at Uber: An Introduction Uber
  2. Engineering Extreme Event Forecasting at Uber with RNN Uber
  3. Under the Hood of Gojek’s Automated Forecasting Tool Gojek

Computer Vision

  1. Categorizing Listing Photos at Airbnb Airbnb
  2. Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb Airbnb
  3. Powered by AI: Advancing product understanding and building new shopping experiences Facebook
  4. Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning Dropbox
  5. How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors Deepomatic
  6. A Neural Weather Model for Eight-Hour Precipitation Forecasting Google
  7. Converting text to images for product discovery Amazon
  8. How Disney uses PyTorch for animated character recognition Disney
  9. Machine Learning-based Damage Assessment for Disaster Relief Google
  10. RepNet: Counting Repetitions in Videos Google

Reinforcement Learning

  1. Deep Reinforcement Learning for Sponsored Search Real-time Bidding Alibaba
  2. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning Alibaba
  3. Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising Alibaba
  4. Productionizing Deep Reinforcement Learning with Spark and MLflow Zynga
  5. Building AI Trading Systems Denny Britz

Anomaly Detection

  1. Detecting Performance Anomalies in External Firmware Deployments Netflix
  2. Detecting and preventing abuse on LinkedIn using isolation forests LinkedIn
  3. Uncovering Insurance Fraud Conspiracy with Network Learning Ant Financial
  4. How does spam protection work on Stack Exchange? Stack Exchange

Graph

  1. Retail Graph — Walmart’s Product Knowledge Graph Walmart
  2. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber
  3. AliGraph: A Comprehensive Graph Neural Network Platform Alibaba

Optimization

  1. How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats Uber
  2. Next-Generation Optimization for Dasher Dispatch at DoorDash DoorDash
  3. Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) Lyft
  4. The Data and Science behind GrabShare Carpooling Grab

Information Extraction

  1. Unsupervised Extraction of Attributes and Their Values from Product Description Rakuten
  2. Information Extraction from Receipts with Graph Convolutional Networks Nanonets
  3. Using Machine Learning to Index Text from Billions of Images Dropbox
  4. Extracting Structured Data from Templatic Documents Google

Weak Supervision

  1. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale Google
  2. Osprey: Weak Supervision of Imbalanced Extraction Problems without Code Intel
  3. Overton: A Data System for Monitoring and Improving Machine-Learned Products Apple
  4. Bootstrapping Conversational Agents with Weak Supervision IBM

Validation and A/B Testing

  1. The Reusable Holdout: Preserving Validity in Adaptive Data Analysis Google
  2. A/B Testing with Hierarchical Models in Python Domino
  3. Detecting Interference: An A/B Test of A/B Tests LinkedIn
  4. Building Inclusive Products Through A/B Testing LinkedIn
  5. Experimenting to solve cramming Twitter
  6. Announcing a New Framework for Designing Optimal Experiments with Pyro Uber
  7. Enabling 10x More Experiments with Traveloka Experiment Platform Traveloka
  8. Large scale experimentation at StitchFix (Paper) Stitch Fix

Practices

  1. Practical Recommendations for Gradient-Based Training of Deep Architectures Yoshua Bengio
  2. Machine Learning: The High Interest Credit Card of Technical Debt Google
  3. Rules of Machine Learning: Best Practices for ML Engineering Google
  4. Hidden Technical Debt in Machine Learning Systems Google
  5. On Challenges in Machine Learning Model Management Amazon
  6. Machine Learning in production: the Booking.com approach Booking
  7. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com Booking
  8. Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department Stitch Fix

Failures

  1. 160k+ High School Students Will Graduate Only If a Model Allows Them to International Baccalaureate
  2. When It Comes to Gorillas, Google Photos Remains Blind Google

Contributors

eugeneyan, gczh, marco-c, nilesh-patil