mlpro's Introduction

MLPro - The Integrative Middleware Framework for Standardized Machine Learning in Python

MLPro provides complete, standardized, and reusable functionalities to support your scientific research, educational tasks or industrial projects in machine learning.

Key Features

a) Open, modular and extensible architecture

  • Overarching software infrastructure (mathematics, data management and plotting, UI framework, logging, ...)
  • Fundamental ML classes for adaptive models and their training and hyperparameter tuning

b) MLPro-RL: Sub-Package for Reinforcement Learning

  • Powerful Environment templates for simulation, training and real operation
  • Templates ranging from single agents and model-based agents (MBRL) with action planning to multi-agents (MARL)
  • Advanced training/tuning functionalities with separate evaluation and progress detection
  • Growing pool of reusable environments from automation and robotics

c) MLPro-GT: Sub-Package for Native Game Theory and Dynamic Games

  • Templates for native game theory regardless of the number of players and the type of game
  • Templates for multi-players in dynamic games, including game boards, players, and many more
  • Reuse of advanced training/tuning classes and multi-agent environments of sub-package MLPro-RL

d) Numerous executable self-study examples

e) Integration of established 3rd party packages

MLPro provides wrapper classes for:

  • Environments of OpenAI Gym and PettingZoo
  • Policy Algorithms of Stable Baselines 3
  • Hyperparameter tuning with Hyperopt

Documentation

The Documentation is available here: https://mlpro.readthedocs.io/

Development

  • Consistent object-oriented design and programming (OOD/OOP)
  • Quality assurance by test-driven development
  • Hosted and managed on GitHub
  • Agile CI/CD approach with automated test and deployment
  • Clean code paradigm

Project and Team

Project MLPro was started in 2021 by the Group for Automation Technology and Learning Systems at the South Westphalia University of Applied Sciences, Germany.

MLPro is designed and developed by Detlef Arend, Steve Yuwono, M Rizky Diprasetya, and further contributors.

How to contribute

If you want to contribute, please read CONTRIBUTING.md.

mlpro's People

Contributors

budiatmadjajawill, detlefarend, laxmikantbaheti, marlonloeppenberg, mlpro-admin, rizkydiprasetya, steveyuwono, syamrajsatheesh, yehiaibrahim

mlpro's Issues

Wrapper classes for OpenAI Gym/Petting Zoo: New method get_cycle_limit()

The interface of class Environment has been extended by a new method get_cycle_limit() that returns the recommended cycle limit for training episodes. The method can be used to configure a proper training process.

In native env implementations it is a static value in the new internal constant C_CYCLE_LIMIT and has to be defined by the creator of a new env.

OpenAI Gym/Petting Zoo provide this value as well. So the new method get_cycle_limit() of the related wrapper classes can be implemented by returning this value from the env registry.

Tasks:

  • Wrapper OpenAI Gym: implementation of method get_cycle_limit()
  • Wrapper Petting Zoo: implementation of method get_cycle_limit()
  • Adaption of related HowTo-files

Requirements:
Branch agent-and-env-model-improvements and the related issues.

See also:
Gym Registry
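
For illustration, a minimal sketch of how the wrapper-side method could look, assuming the wrapper holds a reference to the wrapped Gym env (class and attribute names are simplified; Gym itself exposes the limit as spec.max_episode_steps):

import gym

class WrEnvGYM2MLPro:
    """Simplified sketch of the MLPro wrapper around an OpenAI Gym env."""

    C_CYCLE_LIMIT = 0  # fallback if the wrapped env provides no limit

    def __init__(self, p_gym_env: gym.Env):
        self._gym_env = p_gym_env

    def get_cycle_limit(self) -> int:
        # Gym registers the recommended episode length as max_episode_steps
        # in the env spec; fall back to the internal constant otherwise
        spec = getattr(self._gym_env, 'spec', None)
        if spec is not None and spec.max_episode_steps is not None:
            return spec.max_episode_steps
        return self.C_CYCLE_LIMIT

For example, WrEnvGYM2MLPro(gym.make('CartPole-v1')).get_cycle_limit() would return 500.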

New Wrapper Class WrEnvMLPro2Gym

A new wrapper class WrEnvMLPro2Gym is to be implemented. It inherits the interface of the OpenAI Gym env and wraps an MLPro single-agent environment so that it can be used as an OpenAI Gym env.
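
A rough sketch of the intended direction, using the classic Gym step/reset API (all method names on the MLPro side are assumptions for illustration, not the final interface):

import gym
import numpy as np

class WrEnvMLPro2Gym(gym.Env):
    """Sketch: expose an MLPro single-agent environment through the Gym API.
    All MLPro-side calls (reset, get_state, process_action, ...) are
    assumptions."""

    def __init__(self, p_mlpro_env):
        self._mlpro_env = p_mlpro_env
        # The mapping of MLPro state/action spaces to gym.spaces objects
        # is omitted here; it would be set up in the constructor
        self.observation_space = None
        self.action_space = None

    def reset(self):
        self._mlpro_env.reset()
        return np.array(self._mlpro_env.get_state().get_values())

    def step(self, action):
        self._mlpro_env.process_action(action)
        state = np.array(self._mlpro_env.get_state().get_values())
        reward = self._mlpro_env.compute_reward()
        done = self._mlpro_env.get_done()
        return state, reward, done, {}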

New Exploration class for off-policy exploration

By default, off-policy algorithms sample actions using a basic exploration strategy called epsilon-greedy. However, many other exploration algorithms exist. We therefore need an Exploration class that provides different types of exploration algorithms.
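
A minimal sketch of what such a class hierarchy could look like, with epsilon-greedy as the reference implementation (class and method names are assumptions):

import random

class Exploration:
    """Hypothetical base class: turns a greedy action into an exploratory one."""

    def explore(self, p_greedy_action, p_action_space):
        raise NotImplementedError


class EpsilonGreedy(Exploration):
    """The basic default strategy: explore with probability epsilon."""

    def __init__(self, p_epsilon: float = 0.1):
        self._epsilon = p_epsilon

    def explore(self, p_greedy_action, p_action_space):
        # With probability epsilon pick a random action, otherwise exploit
        if random.random() < self._epsilon:
            return random.choice(p_action_space)
        return p_greedy_action

Further strategies (e.g. Boltzmann exploration or parameter noise) would then just be additional subclasses.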

SciUI scenario "Reinforcement Learning"

Layout and features to be discussed:

  • Plots of states, actions, reward and overall episode/training statistics
  • Host frames/tabs for agent visualisation and specific parameters
  • Host frames/tabs for environment visualisation and specific parameters

Agent: Adaptation Algorithm

The adaptation algorithm of an agent orchestrates various internal adaptation steps. Involved components are:

  • Policy
  • Optional environment model
  • Optional adaptive preprocessing steps (e.g. normalization)

To be discussed...

Environment model - Dynamic latency/Asynchronous processing

In version 1.0 of the reinforcement learning model pool, the Environment class supports a latency that can be changed dynamically using the set_latency() method. However, there is no algorithm behind it, so it is up to the environment implementation to change the latency.
The environment/action model shall be extended by an optional latency for each action component. The dynamic latency of the environment is then computed as follows:

  1. Compute the maximum latency of all action components for each agent
  2. Determine the minimum agent latency
  3. The environment waits/simulates this dynamic latency duration, determines a new state and returns a reward for the respective agent...

Finally, the described extension enables asynchronous action processing.
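
A small numeric sketch of the computation described above (the latency values are made up):

# Hypothetical latency bookkeeping: per agent, every action component
# may carry its own latency (in seconds)
action_latencies = {
    'agent_1': [0.05, 0.20, 0.10],
    'agent_2': [0.15, 0.08],
}

# Step 1: the latency of an agent is the maximum over its action components
agent_latencies = {agent: max(lat) for agent, lat in action_latencies.items()}
# {'agent_1': 0.20, 'agent_2': 0.15}

# Step 2: the dynamic latency of the environment is the minimum agent latency
dynamic_latency = min(agent_latencies.values())  # 0.15

# Step 3: the environment waits/simulates 0.15 s, determines a new state and
# rewards agent_2, whose latency has elapsed; agent_1 is served later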

Related Issues
#672 : Focuses on the latency at the system level.

On-/Off-Policy

Consequences/changes to be worked out. Current state of discussion:

1. On-Policy

  • SAR-Buffer will be cleared after a training episode
  • The basis for policy adaptation is the entire SAR-Buffer

2. Off-Policy

  • SAR-Buffer will not be cleared after a training episode
  • The basis for policy adaptation is a sample of the SAR-Buffer
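
The current state of discussion translates into roughly the following buffer handling (method names on buffer and policy are assumptions):

def adapt_policy(p_policy, p_sar_buffer, p_off_policy: bool, p_sample_size: int = 64):
    """Sketch of the discussed on-/off-policy buffer handling."""
    if p_off_policy:
        # Off-policy: adapt on a sample, keep the buffer across episodes
        p_policy.adapt(p_sar_buffer.extract_sample(p_sample_size))
    else:
        # On-policy: adapt on the entire buffer and clear it afterwards
        p_policy.adapt(p_sar_buffer.get_all())
        p_sar_buffer.clear()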

Wrapper Gym: Detection/mapping of Box spaces

Gym state spaces of type Box shall be mapped to a one-dimensional space where the (one-dimensional) elements contain just a reference to any kind of data object (and some additional metadata). This issue depends on the extension of the Dimension class in bf-math...

Remodelling SAR-Buffer

As discussed on 10th Sep. 2021, the State/Action/Reward-Buffer shall be extended by a new functionality to extract a sample following a specific algorithm. As a consequence, the related class becomes a pool class...

  • New method extract_sample()
  • New subpool folder 'sarbuffers'
  • Reference implementation 'Random sampling' (further implementations for BER, HER postponed)
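
A minimal sketch of the remodelled buffer with the reference implementation 'Random sampling' (sizes and names are illustrative only):

import random

class SARBuffer:
    """Sketch of the remodelled State/Action/Reward-Buffer."""

    def __init__(self, p_size: int = 10000):
        self._data = []
        self._size = p_size

    def add(self, p_sar_element):
        self._data.append(p_sar_element)
        if len(self._data) > self._size:
            self._data.pop(0)  # drop the oldest element

    def extract_sample(self, p_num: int):
        # Reference implementation 'Random sampling'; BER/HER variants
        # would override this method in the new subpool 'sarbuffers'
        return random.sample(self._data, min(p_num, len(self._data)))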

BF-Math: Data object extension

To deal with big data objects like images, point clouds, ... the classes Dimension and Element have to be extended:

  • 1. Extension of class Dimension

A new base set type 'DO' (= Data Objects) shall be added.

  • 2. New class DataObject

def __init__(self, p_data, *p_meta_data)
def get_data()
def get_meta_data()
... maybe the new universal wrapper for data of any type

  • 3. Extension of class Element

Proposal: internal data storage will be changed from np.array to list
Element components of base set type DO will be stored as type DataObject

Further hints: in own applications the framework should be able to deal with special classes derived from the new class DataObject (see 2.), for example images or point clouds. See also: OpenAI Gym Env, space type Box.
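
Based on the signatures listed under 2., the new class could look like this minimal sketch:

class DataObject:
    """Sketch of the proposed universal wrapper for data of any type
    (images, point clouds, ...)."""

    def __init__(self, p_data, *p_meta_data):
        self._data = p_data
        self._meta_data = p_meta_data

    def get_data(self):
        return self._data

    def get_meta_data(self):
        return self._meta_data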

Automatic PyPI Deployment

Idea:
A new branch 'released' shall be created. If the core team decides to publish the next release of MLPro, the main branch will be merged/pushed/copied to branch 'released'. This triggers an automatic procedure that deploys branch 'released' to the Python Package Index (PyPI)...

  • New branch: 'released'
  • Deployment procedure

BF-Various: Review Logging

Logging should be compatible with established mechanisms in third-party packages, especially PyTorch Tensorboard. Maybe a kind of wrapper mechanism or additional methods for class Log to switch logging to a different destination...
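
As a thought experiment, redirecting numeric log entries to Tensorboard could be as simple as the following sketch (the class and method names here are assumptions; SummaryWriter and add_scalar are actual PyTorch API):

from torch.utils.tensorboard import SummaryWriter

class LogToTensorboard:
    """Hypothetical destination switch for class Log: forwards numeric
    log values to a Tensorboard run directory."""

    def __init__(self, p_log_dir: str = './runs/mlpro'):
        self._writer = SummaryWriter(log_dir=p_log_dir)

    def log_scalar(self, p_tag: str, p_value: float, p_step: int):
        self._writer.add_scalar(p_tag, p_value, p_step)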

New property class "ScientificObject"

This class marks derived classes as "scientific objects" with properties like

  • Title
  • Description/abstract
  • DOI, ISBN, ...
  • Citation
  • ...

Potential classes to be enriched with the above properties are: Environment, Agent, ExpRateAlgo, Stream, GameBoard, ...
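
A minimal sketch of such a property class and its use (attribute names are assumptions):

class ScientificObject:
    """Sketch of the proposed property class for scientific metadata."""

    C_SCIREF_TITLE    = None
    C_SCIREF_ABSTRACT = None
    C_SCIREF_DOI      = None
    C_SCIREF_CITATION = None


class MyBenchmarkEnv(ScientificObject):
    """Hypothetical environment enriched with scientific properties."""

    C_SCIREF_TITLE = 'Title of the related publication'
    C_SCIREF_DOI   = '10.0000/example-doi'  # placeholder value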

New Wrapper Class WrEnvMLPro2PZoo

A new wrapper class WrEnvMLPro2PZoo is to be implemented. It inherits the interface of the PettingZoo env and wraps an MLPro multi-agent environment so that it can be used as a PettingZoo env.

Prepare Docu in ReadTheDocs

  • Check the privacy settings in ReadTheDocs
  • Set up docu project based on GitHub project
  • Schedule an internal appointment (short intro, talk about docu structure and further steps)

Howto xy (RL) - Train UR5 Robot environment with A2C Algorithm

A new sample script with the following parts is to be created in the ./examples/rl folder:

  • New local class ScenarioUR5A2C, inherited from class Scenario
  • Method ScenarioUR5A2C._setup(): set up and wire up the UR5 env and an agent based on Rizky's A2C policy
  • Episodic training using the class Training

Environment "Double Pendulum"

The double pendulum problem shall be added as a benchmark environment. Its mathematics and visualization can be adapted from the Matplotlib example "The double pendulum problem".

Included tasks are:

  • 1 Preparation of new env Double Pendulum
    • Hyperparameters: gravity, pole lengths, masses, initial state
      (with sensible default settings)
    • State Space: angle, velocity, acceleration of both poles and motor torque
    • Kinematics adapted from Matplotlib example
    • Visualization adapted from Matplotlib example
    • Pseudo-implementation of reward function
  • 2 New Howto RL-020 - Run Double Pendulum with random agent (with default hp settings)
    • Additional feature: empirical determination of the boundaries of angular velocities and accelerations
      (by running the RL scenario with data logging for state information)
  • 3 DP Env: Implementation of an internal Min/Max normalization of state values based on the boundaries found in Howto RL-020
    (separate protected method _normalize; see the sketch after this list)
  • 4 DP Env: Planning and implementation of a suitable reward function
    A generic approach based on the distance between the old/new state and the goal state (zero vector) shall be implemented. The metric of the underlying state space (method distance) shall be used, independent of the dimensionality of the state space.
  • 5 Documentation on RTD
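
A sketch of the normalization mentioned in task 3, assuming the boundaries found in Howto RL-020 are stored per state dimension (all numbers below are placeholders, not measured values):

import numpy as np

class DoublePendulum:
    """Excerpt sketch: Min/Max normalization of state values."""

    def __init__(self):
        # Placeholder boundaries per state dimension, to be replaced by the
        # values determined empirically in Howto RL-020
        self._state_min = np.array([-np.pi, -8.0, -50.0, -np.pi, -8.0, -50.0])
        self._state_max = np.array([ np.pi,  8.0,  50.0,  np.pi,  8.0,  50.0])

    def _normalize(self, p_state: np.ndarray) -> np.ndarray:
        # Map each state component linearly to [-1, 1]
        return 2 * (p_state - self._state_min) / (self._state_max - self._state_min) - 1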

Prerequisites:
#319

Review Training Mechanisms on Example Howto 09 (SAC with Gym Env)

Currently Howto 09 does not succeed. That raises the following questions:

  • Can we find a better set of (hyper) parameters so that the example succeeds?
  • Do we have a gap in our training process in general (e.g. automatically storing the best policy)?
  • Is MountainCarContinuous the best env to demonstrate the strength of SAC?
  • Do we have to extend our env model by a method that returns the cycle limit per training episode (I am pretty sure that this makes sense)?
  • ...

Class Scenario: optional permanent data logging

Currently, class Training collects runtime data only. For the supervision of permanent realtime/productive control processes, the class Scenario shall be extended by optional permanent data logging. To limit the amount of data, the logging should be cyclic, e.g. the last n frames of every DataStoring object could be stored in separate files...
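
A minimal sketch of the cyclic logging idea, keeping only the last n frames in memory before flushing them to a file (class and method names are assumptions):

from collections import deque

class CyclicDataLogger:
    """Sketch: keeps only the last n frames of a DataStoring object."""

    def __init__(self, p_num_frames: int = 1000):
        self._frames = deque(maxlen=p_num_frames)  # oldest frames drop out

    def log_frame(self, p_frame):
        self._frames.append(p_frame)

    def flush(self, p_filename: str):
        # Persist the current window, e.g. at the end of each cycle
        with open(p_filename, 'w') as f:
            for frame in self._frames:
                f.write(str(frame) + '\n')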

Assurance of Clean Code

Clean code is an internal and external quality feature. It helps to understand/improve/maintain the code and therefore extends the lifecycle of our project. Furthermore, it is something a scientific reviewer can easily assess. Last but not least, it influences the spread of MLPro: people trust code more when it is understandable/hygienic/unified/standardized...

  • Style sheet / module template
  • Info slot in a team meeting
