ai-platform's Introduction

AI Platform

Background

The artificial intelligence (AI) gig economy is here. Many companies, including Uber, DoorDash, and Fiverr, built their businesses by relying on contingent workers who are available on demand. The great paradox of on-demand services is that they can be easily automated using AI technologies.

  • Where do AI systems come from? Research and development (R&D)
  • What do AI systems do today? Perform tasks (i.e. provide services)
  • What will AI systems be able to do? Automate human tasks (and more)
  • What is R&D? Tasks to automate
  • Where does AI R&D automation lead? Recursive technology improvement
  • Where does that lead? AI services, which include the service of developing new services

"The AI explosion will be enabled by AI-driven AI developments. As AI technologies are being recursively improved, new AI services and new AI tasks will be solved."

-- Slava Kurilyak, Founder / CEO at Produvia

About

AI Platform aims to automate AI R&D tasks. Our vision is to create machine learning models to solve various computer science tasks. Our mission is to achieve automation of AI technologies.

"We need to standardize AI solutions by focusing on solving AI R&D tasks while developing machine learning models that are reusable and easily accessible by all."

-- Slava Kurilyak, Founder / CEO at Produvia

What is our approach?

We are developing service-centered or task-focused machine learning models. These models, or AI services, solve distinct tasks or functions.

Examples of AI tasks include:

  • semantic segmentation (computer vision)
  • machine translation (natural language processing)
  • word embeddings (methodology)
  • recommendation systems (miscellaneous)
  • speech recognition (speech)
  • Atari games (playing games)
  • link prediction (graphs)
  • time series classification (time series)
  • audio generation (audio)
  • visual odometry (robots)
  • music information retrieval (music)
  • dimensionality reduction (computer code)
  • decision making (reasoning)
  • knowledge graphs (knowledge base)
  • adversarial attack (adversarial)

Proposed Folder Structure:

We store machine learning models using the following folder structure:

/tasks/area/task

For example:

/tasks/computer-vision/image-classification
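
As a rough illustration of the convention above, the following sketch builds a task folder path from a human-readable area and task name. The helper names `slugify` and `task_path` are hypothetical and not part of the repository; the lower-case, hyphen-separated naming is an assumption inferred from the example path.

```python
import re
from pathlib import PurePosixPath


def slugify(name: str) -> str:
    """Lower-case a name and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "-".join(words)


def task_path(area: str, task: str, root: str = "tasks") -> PurePosixPath:
    """Return /<root>/<area>/<task> following the proposed folder structure."""
    return PurePosixPath("/") / root / slugify(area) / slugify(task)


print(task_path("Computer Vision", "Image Classification"))
# prints: /tasks/computer-vision/image-classification
```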

Installation

To get the entire project up and running locally:

Clone the repo:

$ git clone https://github.com/produvia/ai-platform.git
$ cd ai-platform

Running Locally

This is an example of how to run object detection on a picture containing a zebra:

$ cd tasks/computer-vision/object-detection/
$ mlflow run . -P photo_name=zebra.jpg
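
The same command can also be launched from Python. The sketch below only assembles the `mlflow run` command line shown above and hands it to `subprocess`; the helper names `build_mlflow_command` and `run_task` are hypothetical, and it assumes the `mlflow` CLI is installed and on the PATH.

```python
import subprocess
from typing import Dict, List


def build_mlflow_command(params: Dict[str, str], uri: str = ".") -> List[str]:
    """Build the argument list for `mlflow run <uri> -P key=value ...`."""
    command = ["mlflow", "run", uri]
    for key, value in params.items():
        command += ["-P", f"{key}={value}"]
    return command


def run_task(task_dir: str, params: Dict[str, str]) -> int:
    """Run the MLflow project located in task_dir and return its exit code."""
    completed = subprocess.run(build_mlflow_command(params), cwd=task_dir)
    return completed.returncode


# Show the command without executing it.
print(" ".join(build_mlflow_command({"photo_name": "zebra.jpg"})))
# prints: mlflow run . -P photo_name=zebra.jpg
```

Calling `run_task("tasks/computer-vision/object-detection", {"photo_name": "zebra.jpg"})` from the repository root would then mirror the two shell commands above.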

Project Components

For more information, check out documentation for the different services:

  • /tasks - for compilation of AI tasks

Supported Programming Languages

AI Platform supports various programming languages:

Python, R, Java

Supported ML Development Frameworks

AI Platform supports various machine learning frameworks and libraries:

TensorFlow, PyTorch, Keras, Apache Spark, scikit-learn, H2O

Supported ML Deployment Frameworks

AI Platform supports various machine learning deployment frameworks and libraries:

Conda, Docker, MLeap, SageMaker, Azure ML, Google Cloud

Common Questions

What is the difference between AI Platform and MLflow?

AI Platform is an open source platform for automating machine learning models (aka tasks), while MLflow is an open source platform for the machine learning lifecycle. AI Platform focuses on automating model building; MLflow focuses on what comes after modeling: managing and deploying models. AI Platform depends on MLflow, but MLflow does not depend on AI Platform.

AI Platform:

AI Platform Tasks: Automate computer science tasks using AI, machine learning, deep learning and data science.

MLflow:

MLflow Tracking: Log parameters, code, and results in machine learning experiments and compare them using an interactive UI.

MLflow Projects: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others.

MLflow Models: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker.

What is the difference between AI Platform and AutoML?

AI Platform aims to make machine learning models reusable and easily accessible by all.

Automated machine learning (AutoML) is a computer science methodology which aims to make machine learning more accessible. AutoML automatically generates data analysis pipelines and performs neural architecture search, hyperparameter optimization, and automated feature engineering.

What is the difference between AI tasks and human tasks?

AI tasks are computer tasks solved using AI and machine learning. Unautomated AI R&D tasks are human tasks. Advances in AI will enable incremental speedup and automation of human tasks.

Datasets

Uploading Your Own Dataset

Do you have your own data? We recommend that you upload your dataset to a public bucket on AWS S3 or to Google Drive, and include a LICENSE file which describes usage rights (commercial or non-commercial).

Finding an Existing Dataset

Are you looking for a dataset? We recommend that you check out DataSetList.com, which lists large machine learning datasets for computer vision (CV), natural language processing (NLP), question answering (QA), audio, and medical applications. You can also search for particular datasets using Google Dataset Search.

Contributing

Code

Do you want to contribute towards AI Platform?

  1. Fork this repo (how to guide).
  2. Develop a new machine learning model or improve an existing one. Review existing tasks and outstanding issues and pull requests.
  3. Integrate MLflow into your code.
  4. Create a pull request from your fork (how to guide).
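
Step 3 might look roughly like the sketch below, which logs parameters and metrics for one run with the MLflow Tracking API (`mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metrics`). The `accuracy` and `log_run` helpers and the parameter names are hypothetical, chosen only for illustration; MLflow is imported lazily so the metric helper works without it installed.

```python
from typing import Dict, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match their labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def log_run(params: Dict[str, object], metrics: Dict[str, float]) -> None:
    """Record one experiment run with MLflow Tracking."""
    import mlflow  # requires `pip install mlflow`

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)


score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(score)  # 0.75
# With MLflow installed, the run could then be recorded with:
# log_run({"model": "baseline"}, {"accuracy": score})
```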

Ideas

Do you have an idea, feature request, or task request for AI Platform?

  1. Create a new issue
  2. Email the Produvia team at [email protected]

Project Dependencies

  • At AI Platform, we use MLflow for model tracking and model deployment. MLflow is an open source platform for the machine learning lifecycle. Familiarize yourself with MLflow by going through the following resources:
    1. Review the MLflow examples hosted on GitHub.
    2. Watch the MLflow intro video on YouTube by Matei Zaharia, Co-founder and Chief Technologist at Databricks.
    3. Watch other MLflow videos on YouTube.

About Produvia

Since 2013, Produvia has partnered with companies from all industries to accelerate the adoption of AI and machine learning technologies.

ai-platform's Issues

Machine Translation

Goal(s)

  • Translate a sentence in a source language to a different target language

Input(s)

  • Sentence

Output(s)

  • Sentence

Objective Function(s)

  • TBD

Image Generation

Goal(s)

  • Generate a new image based on an existing category

Input(s)

  • Category

Output(s)

  • Image

Objective Function(s)

  • TBD

Speech Recognition

Goal(s)

  • Recognize speech within audio and convert it into text

Input(s)

  • Audio

Output(s)

  • Text

Objective Function(s)

  • TBD

Speech Synthesis

Goal(s)

  • Generate speech from text

Input(s)

  • Text

Output(s)

  • Audio

Objective Function(s)

  • TBD

Anomaly Detection

Goal(s)

  • Identify unusual patterns (aka outliers) that do not conform to expected behavior

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Use CNN to classify six signs 0 to 5.

The network I will use:
CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED.
I will implement this CNN in TensorFlow and train it on the SIGNS dataset, which represents the numbers 0 to 5. The approximate accuracy I expect is:
train accuracy: 93-99%
test accuracy: 78-99%

(I got a match on AngelList and you asked me to do a coding assignment. Will implementing this model be sufficient to clear the coding round?)

Text Generation

Goal(s)

  • Generate text which appears indistinguishable from human-written text

Input(s)

  • Parameters

Output(s)

  • Sentence

Objective Function(s)

  • TBD

Audio Generation

Goal(s)

  • Generate raw audio such as speech

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Few-Shot Learning

Goal(s)

  • Generalize to new tasks from limited supervised experience using prior knowledge

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Meta-Learning

Goal(s)

  • Develop a "learning to learn" machine learning algorithm

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Inspiration

Consider three approaches to developing meta-models:

  1. Transfer prior knowledge about what generally works well
  2. Reason about model performance across tasks
  3. Start from models trained earlier on similar tasks

Source: Learning to Learn (NeurIPS 2018)

Node Classification

Goal(s)

  • Determine the labelling of samples (represented as nodes) by looking at the labels of their neighbours

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Facial Expression Recognition

Goal(s)

  • Classify the expressions on face images into various categories such as anger, fear, surprise, sadness, happiness, etc.

Input(s)

  • Image or video

Output(s)

  • Category

Objective Function(s)

  • TBD

Entity Extraction

Goal(s)

  • Choose the right extractor for each entity

Input(s)

  • Sentence

Output(s)

  • Category

Objective Function(s)

  • TBD

Related Tasks

  • See Intent Classification (Issue: #95)

Object Detection

Goal(s)

  • Detect instances of semantic objects in images and videos

Input(s)

  • Image or video

Output(s)

  • Bounding box and label

Objective Function(s)

  • TBD

Managerial Components

Goal

  • Develop managerial components of delegation and communication between AI tasks

Inspiration

  • See MLflow multistep workflows (documentation or code examples) for how multiple machine learning models can communicate with each other

Recommendation Systems

Goal(s)

  • Produce a list of recommendations for a user

Input(s)

  • Any input

Output(s)

  • List

Objective Function(s)

  • TBD

Language Modelling

Goal(s)

  • Predict the next word or character in a document

Input(s)

  • Document

Output(s)

  • Document

Objective Function(s)

  • TBD

Time Series Analysis

Goal(s)

  • Analyze time series data in order to extract meaningful statistics and other characteristics of the data

Input(s)

  • Number

Output(s)

  • TBD

Objective Function(s)

  • TBD

Face Detection

Goal(s)

  • Detect faces in a photo or video and distinguish them from other objects

Input(s)

  • Photo or video

Output(s)

  • Bounding box around faces

Objective Function(s)

  • TBD

Age Estimation

Goal(s)

  • Estimate the age of a person from an image or video

Input(s)

  • Image or video

Output(s)

  • Number

Objective Function(s)

  • TBD

Audio Tagging

Goal(s)

  • TBD

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Gender Prediction

Goal(s)

  • Predict the gender of a person (male or female)

Input(s)

  • Image or video

Output(s)

  • Category

Objective Function(s)

  • TBD

Automation of Search and Discovery

Goal(s)

  • Develop automation for search and discovery of AI tasks, including algorithms, loss functions, and training methods
  • Guide developers in defining or selecting AI tasks
  • Provide feedback on AI-system behaviors

Possible Solution

  • Develop intent classification (Issue #95) and entity extraction (Issue #96) models to understand the user's goals
  • Develop a recommendation system (Issue #97) to suggest AI tasks
  • Develop a text generation (Issue #98) model to provide feedback to users

Scalability

Goals

As a developer, I want to figure out how to manage 100s or 1,000s of AI tasks, so that I can scale the ai-platform better.

As a developer, I want to experiment and create 10s or 100s of variations of each AI task, so that I can determine the most performant machine learning algorithm and method for solving each task.

Motivation

We currently store each AI task in a separate folder (see the /tasks folder). This allows us to develop independent machine learning models and conduct experiments on each task using MLflow. The current folder structure also aligns with the MLflow storage format. As the ai-platform repository grows in size, we need to better scale the management of independent AI tasks.

Possible Solutions

  1. Use distributed storage, such as AWS S3, DBFS, or Google Drive, to minimize the inputs (models or datasets) and outputs (models) stored in the repository.

  2. Use submodules, such as Git Submodules or Google Repo, to manage multi-repository integrations in Git.

  3. Use virtual filesystems or large-file extensions, such as VFS for Git (developed by Microsoft) or Git LFS (started by GitHub), to enable Git to operate at enterprise scale. Microsoft has written about its motivations for launching VFS for Git, and Atlassian has published a tutorial about Git LFS.

Additional Considerations

MLflow was designed to scale to large data sets, large output files (for example, models), and large numbers of experiments (MLflow Docs, 2019).

  1. An individual MLflow run can execute on a distributed cluster, for example, using Apache Spark. You can launch runs on the distributed infrastructure of your choice and report results to a Tracking Server to compare them. MLflow includes a built-in API to launch runs on Databricks.
  2. MLflow supports launching multiple runs in parallel with different parameters, for example, for hyperparameter tuning. You can simply use the Projects API to start multiple runs and the Tracking API to track them.
  3. MLflow Projects can take input from, and write output to, distributed storage systems such as AWS S3 and DBFS. MLflow can automatically download such files locally for projects that can only run on local files, or give the project a distributed storage URI if it supports that. This means that you can write projects that build large datasets, such as featurizing a 100 TB file.
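
Point 2 above can be sketched with the MLflow Projects API. The example below expands a small parameter grid and, in `launch_runs`, starts one non-blocking run per combination with `mlflow.projects.run(..., synchronous=False)` before waiting on each. The helper names and the `learning_rate` / `batch_size` parameters are hypothetical; a real project would use whatever parameters its MLproject file declares.

```python
from itertools import product
from typing import Dict, Iterable, List


def parameter_grid(options: Dict[str, Iterable[object]]) -> List[Dict[str, object]]:
    """Expand {name: [values...]} into one dict per parameter combination."""
    keys = list(options)
    combinations = product(*(options[key] for key in keys))
    return [dict(zip(keys, values)) for values in combinations]


def launch_runs(uri: str, grid: List[Dict[str, object]]) -> None:
    """Start one MLflow project run per combination, then wait for all of them."""
    import mlflow.projects  # requires `pip install mlflow`

    submitted = [
        mlflow.projects.run(uri, parameters=params, synchronous=False)
        for params in grid
    ]
    for run in submitted:
        run.wait()


grid = parameter_grid({"learning_rate": [0.01, 0.1], "batch_size": [32, 64]})
print(len(grid))  # 4
```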

Let's consider dataset sizes used for training machine learning models. Imagine ai-platform contains 100 models and each dataset is 100 GB (e.g. an image dataset). Storing datasets on ai-platform will require 10 TB. If ai-platform contains 1,000 models, that translates to 100 TB.

Let's consider machine learning model sizes used to solve tasks. Imagine ai-platform contains 100 models and each model is 100 MB. Storing models on ai-platform will require 10 GB. If ai-platform contains 1,000 models, that translates to 100 GB.
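
The two estimates above are simple multiplications, reproduced in the sketch below. It assumes decimal units (1 TB = 1,000 GB and 1 GB = 1,000 MB) and one dataset per model, as in the text; the function names are illustrative only.

```python
GB_PER_TB = 1000
MB_PER_GB = 1000


def total_dataset_tb(num_models: int, dataset_gb: float = 100) -> float:
    """Total dataset storage in TB, assuming one dataset per model."""
    return num_models * dataset_gb / GB_PER_TB


def total_model_gb(num_models: int, model_mb: float = 100) -> float:
    """Total model storage in GB."""
    return num_models * model_mb / MB_PER_GB


for count in (100, 1000):
    print(count, "models:", total_dataset_tb(count), "TB of datasets,",
          total_model_gb(count), "GB of models")
```

Datasets dominate storage by three orders of magnitude under these assumptions, so they are the first candidates for moving out of the repository.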

Speaker Verification

Goal(s)

  • Verify the identity of a person from characteristics of the voice

Input(s)

  • Audio

Output(s)

  • Text

Objective Function(s)

  • TBD

Time Series Forecasting

Goal(s)

  • Predict future values based on previously observed values

Input(s)

  • Number

Output(s)

  • Number

Objective Function(s)

  • TBD

Question Answering

Goal(s)

  • Answer a user's question by finding short text segments on the web or some other collection of documents

Input(s)

  • Sentence

Output(s)

  • Sentence

Objective Function(s)

  • TBD

AutoML

Goal(s)

  • Automate machine learning processes using neural architecture search, hyperparameter optimization, and automated feature engineering

Inspiration

Word Embeddings

Goal(s)

  • Map words or phrases from the vocabulary to vectors of real numbers

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Time Series Classification

Goal(s)

  • Classify data points over time based on their behavior

Input(s)

  • Numbers

Output(s)

  • Category

Objective Function(s)

  • TBD

Text Classification

Goal(s)

  • Assign a sentence or document an appropriate category

Input(s)

  • Sentence or document

Output(s)

  • Category

Objective Function(s)

  • TBD

Denoising

Goal(s)

  • Remove noise from an image

Input(s)

  • Image

Output(s)

  • Image

Objective Function(s)

  • TBD

Tasks Database

To scale AI Platform, we created a database that tracks machine learning models and all model implementations, each identified by a universally unique identifier (UUID).

We keep track of tasks, models, and model UUIDs using SQLite. The database can be found at:

/tasks.db
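
As an illustration of the idea, the sketch below stores tasks and UUID-identified models in SQLite. The schema (tables `tasks` and `models` and their columns) and the `register_model` helper are assumptions made for this example and may not match the actual layout of /tasks.db; an in-memory database is used so nothing is written to disk.

```python
import sqlite3
import uuid

# Hypothetical schema: one row per task, one row per model implementation.
SCHEMA = """
CREATE TABLE tasks (
    id   INTEGER PRIMARY KEY,
    area TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (area, name)
);
CREATE TABLE models (
    uuid        TEXT PRIMARY KEY,
    task_id     INTEGER NOT NULL REFERENCES tasks (id),
    description TEXT
);
"""


def register_model(conn: sqlite3.Connection, area: str, task: str, description: str) -> str:
    """Insert the task if needed, add a model under a fresh UUID, and return the UUID."""
    conn.execute("INSERT OR IGNORE INTO tasks (area, name) VALUES (?, ?)", (area, task))
    (task_id,) = conn.execute(
        "SELECT id FROM tasks WHERE area = ? AND name = ?", (area, task)
    ).fetchone()
    model_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO models (uuid, task_id, description) VALUES (?, ?, ?)",
        (model_uuid, task_id, description),
    )
    conn.commit()
    return model_uuid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = register_model(conn, "computer-vision", "object-detection", "baseline")
second = register_model(conn, "computer-vision", "object-detection", "variant")
print(first != second)  # True: each implementation gets its own UUID
```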

We are open to suggestions on how to improve this system.

Keyword Spotting

Goal(s)

  • Identify keywords in utterances

Input(s)

  • Audio

Output(s)

  • Text

Objective Function(s)

  • TBD

Graph Embedding

Goal(s)

  • Learn a mapping from a network to a vector space, while preserving relevant network properties

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Learning To Rank

Goal(s)

  • TBD

Input(s)

  • TBD

Output(s)

  • TBD

Objective Function(s)

  • TBD

Sentiment Analysis

Goal(s)

  • Classify the polarity of a given text

Input(s)

  • Text

Output(s)

  • Text

Objective Function(s)

  • TBD

Super Resolution

Goal(s)

  • Increase image resolution x2 or x4 while maintaining or even improving image quality

Input(s)

  • Image

Output(s)

  • Image

Objective Function(s)

  • TBD

Transfer Learning

Goal(s)

  • Transfer knowledge from a well-labeled domain to a similar but different domain with limited or no labels

Input(s)

  • Model weights trained on one task

Output(s)

  • Model weights trained on another task

Objective Function(s)

  • TBD

Image Classification

Goal(s)

  • Predict a single label (or a distribution over labels) for a given image

Input(s)

  • Image

Output(s)

  • Category

Objective Function(s)

  • TBD

Intent Classification

Goal(s)

  • Describe how user messages should be categorized

Input(s)

  • Sentence

Output(s)

  • Category

Objective Function(s)

  • TBD

Dataset Licenses

Each dataset (public or private) must include a LICENSE file which identifies the usage and permission rights to the data.
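
A check like the one below could enforce this rule before a dataset is accepted. It is a minimal sketch that assumes a dataset is a local directory and that the file is named exactly `LICENSE`; the `has_license` helper is hypothetical.

```python
import tempfile
from pathlib import Path


def has_license(dataset_dir: str) -> bool:
    """Return True if the dataset directory contains a LICENSE file."""
    return (Path(dataset_dir) / "LICENSE").is_file()


# Demonstrate on a temporary directory, before and after adding a LICENSE file.
with tempfile.TemporaryDirectory() as tmp:
    before = has_license(tmp)
    (Path(tmp) / "LICENSE").write_text("CC BY 4.0\n")
    after = has_license(tmp)

print(before, after)  # False True
```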

Commercial Licenses

Examples of commercial licenses include:

  1. Research and commercial: research and commercial licenses are available.
  2. CC BY 3.0 (Attribution 3.0 Unported). You are free to:
       • Share: copy and redistribute
       • Adapt: remix, transform, and build upon, even commercially
     Under the following terms:
       • Attribution: you must give appropriate credit
  3. CC0: no copyright.
  4. CC BY-SA 3.0 (Attribution-ShareAlike 3.0 Unported). You are free to:
       • Share: copy and redistribute
       • Adapt: remix, transform, and build upon, even commercially
     Under the following terms:
       • Attribution: you must give appropriate credit
       • ShareAlike: if you make changes, you must distribute your contributions under the same license
  5. CC BY 4.0 (Attribution 4.0 International). You are free to:
       • Share: copy and redistribute
       • Adapt: remix, transform, and build upon, even commercially
     Under the following terms:
       • Attribution: you must give appropriate credit
  6. MIT: you are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the work. Under the following terms: the work is provided "as is", and you must include the copyright and license notice in all copies or substantial portions of the work.
  7. GPL: you are free to copy, distribute, and modify the software as long as you track changes and dates in source files. Under the following terms: any modifications to, or software including (via compiler), GPL-licensed code must also be made available under the GPL, along with build and install instructions.
  8. Non-commercial and commercial: non-commercial and commercial licenses are available.
  9. CC BY-NC-SA 4.0 or commercial (Attribution-NonCommercial-ShareAlike 4.0 International). You are free to:
       • Share: copy and redistribute
       • Adapt: remix, transform, and build upon
     Under the following terms:
       • Attribution: you must give appropriate credit
       • NonCommercial: you may not use the material for commercial purposes
       • ShareAlike: if you make changes, you must distribute your contributions under the same license

Non-Commercial Licenses

Examples of non-commercial licenses include:

  1. Non-commercial: can only be used for research and educational purposes. Commercial use is prohibited.
  2. CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International). You are free to:
       • Share: copy and redistribute
       • Adapt: remix, transform, and build upon
     Under the following terms:
       • Attribution: you must give appropriate credit
       • NonCommercial: you may not use the material for commercial purposes
       • ShareAlike: if you make changes, you must distribute your contributions under the same license
  3. Various: the dataset contains data from several sources; check the links on the website for individual licenses.
  4. Not Found: license information not found.

Semantic Segmentation

Goal(s)

  • Recognize and understand what is in an image at the pixel-level

Input(s)

  • Image

Output(s)

  • Image

Objective Function(s)

  • TBD
