aidynamicaction / rcognita

rcognita is a flexibly configurable framework for agent-environment simulation with a menu of predictive and safe reinforcement learning controllers

License: MIT License

Python 99.31% Makefile 0.33% Batchfile 0.36%

reinforcement-learning prediction-model python simulation control-systems

rcognita's People

Contributors

Stargazers

Watchers

Forkers

punkrat0ff kompaso limph0nimph dimadobriy myildirimm musubapy

rcognita's Issues

Inline RST documentation does not follow conventional practices. Info field lists should be utilized.

A great deal of code is currently documented in an unconventional fashion. More precisely, there is a tendency to use headings to describe attributes, parameters and returned values, as opposed to using info field lists.

For instance consider the docstring for rcognita.controllers.ctrl_selector:

    Main interface for various controllers.
        Parameters
        ----------
        mode : : string
            Controller mode as acronym of the respective control method.
        Returns
        -------
        action : : array of shape ``[dim_input, ]``.
            Control action.

The conventional way to produce a docstring bearing such information would be:

    Main interface for various controllers.

    :param str mode: Controller mode as acronym of the respective control method.
    :return: Control action
    :rtype: array of shape ``[dim_input, ]``

Tests framework implementation

Requirements for the Rcognita test framework:

The framework should be easy to use
It should come with comprehensive and clear instructions on how to create tests with it
It should cover all currently implemented presets
It should prevent code duplication
It should offer an out-of-the-box way to generate reference data for unit tests

In ctrl_selector, some conceptually wrong arguments are being passed (a crutch)

Fix animation

Fix the simulation animation so that it works in the default Python interpreter, not only in IPython

Command line args need generic initialization in order not to generate error notifications in some IDEs

Suggestion (example for one argument):

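import argparse

parser = argparse.ArgumentParser()
dt = []  # generic initialization so that IDEs do not flag `dt` as undefined
parser.add_argument('--dt', type=float, metavar='dt',
                    default=0.1,
                    help='Controller sampling time.')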

Constructor arguments recorded as "attributes" in class docstrings.

__init__ should have a docstring of its own. The "Attributes" section in class docstrings is reserved for attributes. Violating this convention really messes up the wiki. We should fix that (preferably by the next release, I think) and from now on document new classes conventionally.
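
A minimal sketch of the convention described above. The class `ExampleCritic`, its parameters and its attribute are hypothetical and serve only to show the layout: constructor arguments are documented in the `__init__` docstring with info field lists, while the class docstring describes actual attributes only.

```python
class ExampleCritic:
    """Hypothetical critic, used only to illustrate the docstring layout.

    :ivar weights: Current critic weights (state of the object,
        not a constructor argument).
    :vartype weights: list of float
    """

    def __init__(self, dim_input, gamma=1.0):
        """Create the critic.

        :param int dim_input: Dimension of the input (hypothetical parameter).
        :param float gamma: Discount factor.
        """
        self.gamma = gamma
        self.weights = [0.0] * dim_input
```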

Multiple modifications

Code cleaning and refactoring needed for the modifications done throughout Q1, Q2 of 2021, including those done for education.

This concerns:

actor constraints
new methods: JAC-stab, SQL, SQL-stab, SQL-V etc.
critic constraints for the respective methods, e.g., JAC-stab, SQL-stab etc.
new systems
generic main module (turn it into a class)
model estimation
loggers
ROS integration

and so on

Add parsing of command line arguments in presets

We need a call capability like:

python main_3wrobot_NI.py -ctrl_mode JACS -dt 0.01 ...

Required parameters:

Parameter name	Values	Notes
`ctrl_mode`	string	see description of methods in preset
`dt`	number	controller sampling time
`t1`	number	final time
`x0`	numpy vector	initial state, dimension preset-specific!

Optional parameters, set to default values unless specified otherwise:

Parameter name	Values	Default	Description
`is_log_data`	binary	0
`is_visualization`	binary	1
`is_print_sim_step`	binary	1
`is_est_model`	binary	0	if a model of the env. is to be estimated online
`model_est_stage`	number	1	seconds to learn model until benchmarking controller kicks in
`model_est_period`	number	1*`dt`	model is updated every `model_est_period` seconds
`model_order`	integer	5	order of state-space estimation model
`prob_noise_pow`	number	8	power of probing noise
`uMan`	numpy vector	zeros	manual control action to be fed constant, system-specific!
`Nactor`	integer	3	horizon length (in steps) for predictive controllers
`pred_step_size`	number	`dt`
`buffer_size`	integer	10
`rcost_struct`	string	`quadratic`	structure of running cost function
`R1`	numpy matrix	identity matrix	must have proper dimension
`R2`	numpy matrix	identity matrix	must have proper dimension
`Ncritic`	integer	4	critic stack size (number of TDs)
`gamma`	number	1	discount factor
`critic_period`	number	`dt`	critic is updated every `critic_period` seconds
`critic_struct`	string	`quad-nomix`	structure of critic features
`actor_struct`	string	`quad-nomix`	structure of actor features

This needs to be reflected in the readme, as an example call of an example preset. It could probably be adapted from this text.
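
A possible starting point, as a sketch only: the parser below covers the required parameters and a few of the optional ones from the tables, using single-dash names to mirror the example call. The function names `build_parser` and `parse_preset_args` are assumptions, and the choice of which optional parameters to include is arbitrary.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the preset parameters listed above."""
    parser = argparse.ArgumentParser(description="Hypothetical preset CLI sketch.")
    # Required parameters (single-dash names mirror the example call).
    parser.add_argument("-ctrl_mode", type=str, required=True,
                        help="Controller mode, see the description of methods in the preset.")
    parser.add_argument("-dt", type=float, required=True,
                        help="Controller sampling time.")
    parser.add_argument("-t1", type=float, required=True, help="Final time.")
    parser.add_argument("-x0", type=float, nargs="+", required=True,
                        help="Initial state; the dimension is preset-specific.")
    # A few optional parameters with the defaults from the table.
    parser.add_argument("-is_log_data", type=int, choices=(0, 1), default=0)
    parser.add_argument("-Nactor", type=int, default=3,
                        help="Horizon length (in steps) for predictive controllers.")
    parser.add_argument("-gamma", type=float, default=1.0, help="Discount factor.")
    parser.add_argument("-pred_step_size", type=float, default=None,
                        help="Prediction step size; defaults to dt.")
    return parser


def parse_preset_args(argv=None):
    """Parse arguments and resolve defaults that depend on other arguments."""
    args = build_parser().parse_args(argv)
    if args.pred_step_size is None:
        args.pred_step_size = args.dt
    return args
```

Here `x0` is parsed into a plain list of floats; a preset would presumably convert it to a numpy vector and check its dimension.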

Test everything, incl. the case when N_actor=0

Create a CI with github actions

Formatting tests
Unit-tests
(optional) pre-commit hooks?

minimize test

ROS harnesses

Create a ROS_harnesses.py module to separate a ROS preset and a ROS setting utility

NN model

Make an NN model
Make a torch optimizer for NN model

Move state-space (SS) model estimation out of controllers

Disturbance dynamics

Right now, the full state vector in the closed-loop function of the system interface contains components related to the disturbance, even if the latter is switched off. A case distinction is needed:

is_disturb => dim_full_state = dim_state + dim_disturb
not is_disturb => dim_full_state = dim_state
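
The case distinction can be sketched as a small helper. The function name `full_state_dim` is an assumption; the argument names follow the identifiers used above.

```python
def full_state_dim(dim_state, dim_disturb, is_disturb):
    """Return the dimension of the full closed-loop state vector."""
    # Disturbance components are appended only when the disturbance is on.
    return dim_state + dim_disturb if is_disturb else dim_state
```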

Architecture refactoring

no switch cases inside classes, only on pipeline or configuration level
move self.critic_clock into the Critic class and, in general, put all class-related fields into the respective classes (with corresponding renaming)

Add installation instructions for external dependencies to docs

Ubuntu/Debian:

sudo apt-get install -y build-essential gfortran cmake libopenblas-dev

Arch:

pacman -Sy gcc gcc-fortran cmake base-devel openblas

After that, install scikit-build with either pip or conda:

pip install scikit-build

conda install scikit-build

Create "estimators" and "observers" modules

Create estimators.py and observers.py modules
Embed the NN estimator written by @kefir8888 to estimators.py
Embed the EKF into observers.py
Connect them to the whole pipeline

Implement tabular method and pipeline

Update variable definitions

Update:

Definitions of command line args
Definitions of variables in rlframe

Check animation: data cursor malfunctioning (?), need clean quit, check keys

Sampling-best optimization method

rename rhs to state_dyn inside the state predictor

Github pages needs update

The documentation needs to be updated to match the current rst files

trust-constr clean up

@osinenkop
Is it possible to move it

rcognita/rcognita/controllers.py

Lines 1256 to 1260 in 7d09799

    
    critic_opt_method = 'SLSQP'
    if critic_opt_method == 'trust-constr':
        critic_opt_options = {'maxiter': 200, 'disp': False} #'disp': True, 'verbose': 2}
    else:
        critic_opt_options = {'maxiter': 200, 'maxfev': 1500, 'disp': False, 'adaptive': True, 'xatol': 1e-7, 'fatol': 1e-7} # 'disp': True, 'verbose': 2}

outside the module?
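
One way this could look, as a sketch only: the option dictionaries are copied from the snippet above and wrapped in a helper that could live in a configuration module. The name `critic_opt_options_for` is hypothetical and not part of rcognita.

```python
def critic_opt_options_for(method):
    """Return the solver options for the given critic optimization method."""
    if method == 'trust-constr':
        return {'maxiter': 200, 'disp': False}
    # Options used for every other method in the original snippet.
    return {'maxiter': 200, 'maxfev': 1500, 'disp': False,
            'adaptive': True, 'xatol': 1e-7, 'fatol': 1e-7}


critic_opt_method = 'SLSQP'
critic_opt_options = critic_opt_options_for(critic_opt_method)
```

The controller would then receive `critic_opt_method` and `critic_opt_options` as arguments instead of defining them internally.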

new module for state predictors

Add DiscreteStatePredictor() implementation
Move EulerStatePredictor there from controllers
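
For orientation, a forward-Euler predictor in a standalone module might look like the sketch below. The interface (constructor arguments, method names, plain lists for states) is an assumption and does not reproduce the existing class in controllers; `state_dyn` stands for the right-hand side of the system dynamics.

```python
class EulerStatePredictor:
    """Sketch of a forward-Euler state predictor (interface is an assumption)."""

    def __init__(self, pred_step_size, state_dyn):
        # state_dyn(t, state, action) returns the state derivative as a list.
        self.pred_step_size = pred_step_size
        self.state_dyn = state_dyn

    def predict_state(self, t, state, action):
        """One Euler step: x_next = x + h * f(t, x, u)."""
        deriv = self.state_dyn(t, state, action)
        return [x + self.pred_step_size * dx for x, dx in zip(state, deriv)]

    def predict_sequence(self, t, state, actions):
        """Roll the prediction forward over a sequence of actions."""
        states = []
        for k, action in enumerate(actions):
            state = self.predict_state(t + k * self.pred_step_size, state, action)
            states.append(state)
        return states
```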

Adjust code in forked `mpldatacursor`

Please use this branch https://github.com/AIDynamicAction/mpldatacursor/tree/rcognita-v0.1

Implement new naming

Make logger generic

NN estimator in estimators module

Refresh the documentation

Add command line args

Add command line arguments for main script

Environment configuration is very inconvenient

The solution here is a class which has the following structure:

class abstract_config:
    def __init__(self):
        self.name = "some_agent"
    def argument_parser(self):
        pass
    def post_processing(self):
        pass
    def get_env(self):
        pass

This gives a very intuitive separation between command-line arguments and other arguments, together with their post-processing.
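
A sketch of how such a class could be filled in. The subclass `ExampleConfig`, its arguments and the derived quantity `n_steps` are hypothetical. Passing the parsed arguments into `post_processing` is also an assumption that departs slightly from the signature above.

```python
import argparse
from abc import ABC, abstractmethod


class AbstractConfig(ABC):
    """Base class separating command-line arguments from derived settings."""

    name = "some_agent"

    @abstractmethod
    def argument_parser(self):
        """Return an argparse.ArgumentParser for this environment."""

    @abstractmethod
    def post_processing(self, args):
        """Derive further settings from the parsed arguments."""

    def get_env(self, argv=None):
        """Parse the command line, post-process and return the settings."""
        args = self.argument_parser().parse_args(argv)
        return self.post_processing(args)


class ExampleConfig(AbstractConfig):
    """Hypothetical configuration, for illustration only."""

    name = "example_agent"

    def argument_parser(self):
        parser = argparse.ArgumentParser(description=self.name)
        parser.add_argument("--dt", type=float, default=0.1,
                            help="Controller sampling time.")
        parser.add_argument("--t1", type=float, default=10.0, help="Final time.")
        return parser

    def post_processing(self, args):
        env = vars(args).copy()
        # Derived quantity: number of controller steps over the run.
        env["n_steps"] = round(args.t1 / args.dt)
        return env
```

Calling `ExampleConfig().get_env()` reads `sys.argv`; passing a list of strings instead makes the configuration easy to exercise in tests.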

Needs installer

Scenarios module

CASADI integration

Completely refactor the code: separate the symbolic and numerical cases and move the code out of controllers.py
Create tests for CASADI integration
Make some benchmarks and create a table of comparison

Rcognita github page update

Separate module for optimizers

Create optimizers.py and implement an abstract class and all current realizations in it.
Test out
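
A rough sketch of the intended structure. The class names, the `optimize` signature and the toy grid-search realization are assumptions made for illustration; the actual realizations would presumably wrap the solvers currently used in the controllers.

```python
from abc import ABC, abstractmethod


class BaseOptimizer(ABC):
    """Abstract interface that every optimizer realization implements."""

    @abstractmethod
    def optimize(self, objective, initial_guess, bounds):
        """Return an argument that (approximately) minimizes objective."""


class GridSearchOptimizer(BaseOptimizer):
    """Toy realization: exhaustive search on a uniform 1-D grid."""

    def __init__(self, n_points=101):
        self.n_points = n_points

    def optimize(self, objective, initial_guess, bounds):
        lower, upper = bounds
        step = (upper - lower) / (self.n_points - 1)
        candidates = [lower + i * step for i in range(self.n_points)]
        # The initial guess competes with the grid points.
        candidates.append(initial_guess)
        return min(candidates, key=objective)
```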

Add link to full documentation to readme

Test implementation in jupyter

Test package in jupyter notebook and fix bugs

Turn callback into a method

Refactor presets in a pipeline style

A framework for testing and reference data generation has now been implemented. To make presets testable and to improve the readability of the code, the presets were implemented using a pipeline approach, which turned out to be a good pattern for implementing presets. To move Rcognita to the new preset implementation pattern, it is necessary to:

Implement a CLI consistent with the pipeline
Refactor and implement pipelines for all presets

Python3 ROS

Ubuntu 20 on lab's laptop
main_ros.py

Make bat-file for building rst docs under Windows

Implement Monte-Carlo method and pipeline

Need:

System: pendulum
Scenario for Monte-Carlo learning
REINFORCE

Visualizer: as always (like 3wrobot), but upper left screen: pendulum and its trajectory (dotted line like 3wrobot)

Monte-Carlo scenario:

loop over policy gradient updates
each such update needs several episodes (former runs), so loop over episodes
each episode is like the current main loop, i.e., it iterates over steps
when all episodes are done, experience is used to update policy parameters

The policy must be a PDF (probability distribution function). For useful policy parametrizations, see the Sutton & Barto (S&B) book, p. 322.
REINFORCE algorithm can also be found there
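
The loop structure of the scenario can be sketched as follows. This is a toy under stated assumptions, not the pendulum setup: the system is a scalar integrator, the policy is Gaussian with mean `theta * x` and fixed standard deviation, and all names and hyperparameters are hypothetical. The clipping of `theta` is a numerical safeguard and not part of REINFORCE.

```python
import random


def run_episode(theta, sigma, rng, n_steps=20, dt=0.1, x0=1.0):
    """Simulate one episode; return a list of (state, action, reward)."""
    x, trajectory = x0, []
    for _ in range(n_steps):
        action = rng.gauss(theta * x, sigma)   # sample from the Gaussian policy
        reward = -(x ** 2 + action ** 2) * dt  # negative running cost
        trajectory.append((x, action, reward))
        x = x + dt * action                    # toy integrator dynamics
    return trajectory


def reinforce(n_updates=20, n_episodes=10, lr=1e-3, sigma=0.5, seed=0):
    """Monte-Carlo policy gradient for the policy a ~ N(theta * x, sigma^2)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_updates):                 # loop over policy updates
        grad = 0.0
        for _ in range(n_episodes):            # loop over episodes
            trajectory = run_episode(theta, sigma, rng)
            reward_to_go = 0.0
            # Walk backwards to accumulate the undiscounted return G_t.
            for x, action, reward in reversed(trajectory):
                reward_to_go += reward
                score = (action - theta * x) * x / sigma ** 2  # d/dtheta log pi
                grad += score * reward_to_go
        # All episodes are done: use the collected experience for one update.
        theta += lr * grad / n_episodes
        theta = max(-3.0, min(3.0, theta))     # safeguard, not part of REINFORCE
    return theta
```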
