
rsrl's Introduction

RSRL (api)

Reinforcement learning should be fast, safe and easy to use.

Overview

rsrl provides generic constructs for reinforcement learning (RL) experiments in an extensible framework with efficient implementations of existing methods for rapid prototyping.

Installation

[dependencies]
rsrl = "0.8"

Note that rsrl enables the blas feature of its ndarray dependency, so if you're building a binary, you additionally need to specify a BLAS backend compatible with ndarray. For example, you can add these dependencies:

blas-src = { version = "0.2.0", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.6.0", default-features = false, features = ["cblas", "system"] }

See ndarray's README for more information.
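
In practice the backend crate also has to be referenced from your binary so that it actually gets linked. A minimal sketch of the usual pattern, assuming the blas-src/openblas-src dependencies above:

// main.rs: the `as _` import exists only so that blas-src is linked into
// the final binary; no symbols from it are used directly.
use blas_src as _;

fn main() {
    // ... set up the rsrl experiment as in the Usage section below.
}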

Usage

The code below shows how one could use rsrl to evaluate a QLearning agent using a linear function approximator with Fourier basis projection to solve the canonical mountain car problem.

See examples/ for more...

let env = MountainCar::default();
let n_actions = env.action_space().card().into();

let mut rng = StdRng::seed_from_u64(0);
let (mut ql, policy) = {
    let basis = Fourier::from_space(5, env.state_space()).with_bias();
    let q_func = make_shared(LFA::vector(basis, SGD(0.001), n_actions));
    let policy = Greedy::new(q_func.clone());

    (QLearning {
        q_func,
        gamma: 0.9,
    }, policy)
};

for e in 0..200 {
    // Episode loop:
    let mut j = 0;
    let mut env = MountainCar::default();
    let mut action = policy.sample(&mut rng, env.emit().state());

    for i in 0.. {
        // Trajectory loop:
        j = i;

        let t = env.transition(action);

        ql.handle(&t).ok();
        action = policy.sample(&mut rng, t.to.state());

        if t.terminated() {
            break;
        }
    }

    println!("Batch {}: {} steps...", e + 1, j + 1);
}

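// Evaluate the learned policy out-of-sample: a greedy (mode) rollout capped at 500 steps.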
let traj = MountainCar::default().rollout(|s| policy.mode(s), Some(500));

println!("OOS: {} states...", traj.n_states());

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate and adhere to the angularjs commit message conventions (see here).

License

MIT

rsrl's People

Contributors

joshhansen, jturner314, moxinilian, tspooner

rsrl's Issues

Add state preprocessing module

Handling infinite-dimensional domains is problematic. We've found this especially in dealing with the way they are defined in the OpenAI gym; see PR #16. Further, in many experiments, the engineer may want to perform extensive preprocessing of the environment state before passing it off to the function approximator.

The answer is to add functionality for handling this explicitly with, ideally, a futures-like pipeline structure; this could be a good opportunity to start incorporating futures in the framework. Of course, it should be opt-in, i.e. the code should work with or without a preprocessor.
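
A minimal sketch of what an opt-in pipeline could look like (the trait and type names below are hypothetical, not part of the current API):

pub trait Preprocessor<S> {
    type Output;

    fn process(&self, state: S) -> Self::Output;

    // Chain two preprocessors into a pipeline, combinator-style.
    fn chain<P: Preprocessor<Self::Output>>(self, next: P) -> Chain<Self, P>
    where
        Self: Sized,
    {
        Chain { first: self, second: next }
    }
}

pub struct Chain<A, B> {
    first: A,
    second: B,
}

impl<S, A, B> Preprocessor<S> for Chain<A, B>
where
    A: Preprocessor<S>,
    B: Preprocessor<A::Output>,
{
    type Output = B::Output;

    fn process(&self, state: S) -> Self::Output {
        self.second.process(self.first.process(state))
    }
}

// Identity stage so existing code runs unchanged when no preprocessing is wanted.
pub struct Identity;

impl<S> Preprocessor<S> for Identity {
    type Output = S;

    fn process(&self, state: S) -> S {
        state
    }
}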

Split crate into separate modules

As done by many other crates, it would make sense to split this project into several mini-crates. These could then be used on their own, and the split would also enforce encapsulation of the codebase, which is always a good thing.

For example, one might extract the fa, geometry and domains modules into 3 separate crates. These could then be used by other people in their projects without needing all the boilerplate code in rsrl.
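
As a rough sketch, the split could be hosted in a Cargo workspace along these lines (the member names are illustrative):

[workspace]
members = [
    "rsrl",           # agents, experiment loop, logging
    "rsrl-domains",   # environment/domain implementations
    "rsrl-fa",        # function approximation (currently the fa module)
    "rsrl-geometry",  # spaces and geometry primitives
]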

How to write custom environments for training on a custom problem?

First of all, I would like to say that this is a great library for reinforcement learning; thanks for working on it. The examples are also great, but what I felt when reading them is that I don't have clarity on how to use rsrl for custom environments and problems. Could you please shed some light on this? Thanks for the great work.
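
Pending a proper guide, the rough outline is: model your problem as a struct with its own state, reward, and step logic, and then implement the crate's Domain trait for it, using one of the bundled domains (e.g. MountainCar) as a reference for the exact signatures of the rsrl version you depend on, since they have changed between releases. A toy sketch of the environment half, independent of any rsrl traits:

// A 5x5 grid world; the agent starts at (0, 0) and must reach (4, 4).
pub struct GridWorld {
    position: (usize, usize),
    goal: (usize, usize),
}

impl GridWorld {
    pub fn new() -> Self {
        GridWorld { position: (0, 0), goal: (4, 4) }
    }

    // Expose the state as a feature vector, analogous to MountainCar's
    // (position, velocity) observation.
    pub fn state(&self) -> Vec<f64> {
        vec![self.position.0 as f64, self.position.1 as f64]
    }

    // Apply an action (0: up, 1: down, 2: left, 3: right), returning (reward, done).
    pub fn step(&mut self, action: usize) -> (f64, bool) {
        let (x, y) = self.position;

        self.position = match action {
            0 => (x, y.saturating_sub(1)),
            1 => (x, (y + 1).min(4)),
            2 => (x.saturating_sub(1), y),
            _ => ((x + 1).min(4), y),
        };

        let done = self.position == self.goal;
        (if done { 0.0 } else { -1.0 }, done)
    }
}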

Examples don't seem to build

I'm trying out the library for the first time and I'm having trouble getting it to build (rust 1.33).
I pulled the code from here: https://github.com/tspooner/rsrl/blob/master/examples/greedy_gq.rs

error[E0599]: no function or associated item named `scalar_output` found for type `lfa::lfa::LFA<_, _, _>` in the current scope
  --> src/main.rs:20:39
   |
20 |         let v_func = make_shared(LFA::scalar_output(bases.clone()));
   |                                  -----^^^^^^^^^^^^^
   |                                  |
   |                                  function or associated item not found in `lfa::lfa::LFA<_, _, _>`

error[E0599]: no function or associated item named `vector_output` found for type `lfa::lfa::LFA<_, _, _>` in the current scope
  --> src/main.rs:21:39
   |
21 |         let q_func = make_shared(LFA::vector_output(bases, n_actions));
   |                                  -----^^^^^^^^^^^^^
   |                                  |
   |                                  function or associated item not found in `lfa::lfa::LFA<_, _, _>`

error[E0061]: this function takes 2 parameters but 3 parameters were supplied
  --> src/main.rs:24:34
   |
24 |           let policy = make_shared(EpsilonGreedy::new(
   |  __________________________________^
25 | |             Greedy::new(q_func.clone()),
26 | |             Random::new(n_actions),
27 | |             Parameter::exponential(0.3, 0.001, 0.99),
28 | |         ));
   | |_________^ expected 2 parameters

error: aborting due to 3 previous errors

readme.md's versions may cause errors on MacOS

If you are using the following dependencies (as suggested by the readme.md file):

blas-src = { version = "0.2.0", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.6.0", default-features = false, features = ["cblas", "system"] }

and are getting the following errors:
error: linking with `cc` failed: exit status: 1
  [huge note of directories]
  = note: ld: library not found for -lgfortran
          clang: error: linker command failed with exit code 1 (use -v to see invocation)

Use instead:

ndarray = { version = "0.15.0", features = ["blas"] }
blas-src = { version = "0.8", features = ["openblas"] }
openblas-src = { version = "0.10", features = ["cblas", "system"] }

This seems to have solved the issue for me on MacOS.

Complete logging support

We have some logging utilities, but they are only skin-deep. We need to be able to pass loggers in a structured manner through to the various components of the experiment, such as the agent, function approximator, domain, etc.

In addition to the obvious uses, this would also be useful for warning users when they use agents that have specific, unenforced requirements. For example, GreedyGQ requires a stationary behaviour policy, but we allow the user to pass any policy type.
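
As a loose illustration of that second point (the types below are hypothetical, not existing rsrl items), the idea is that a component holding a logger could surface such warnings at construction time:

// Hypothetical: an agent configuration that carries an optional logger and
// uses it to flag unenforced assumptions instead of failing silently.
pub trait ExperimentLogger {
    fn warn(&self, msg: &str);
}

pub struct AgentConfig<L: ExperimentLogger> {
    pub logger: Option<L>,
}

impl<L: ExperimentLogger> AgentConfig<L> {
    pub fn check_stationary_behaviour(&self, is_stationary: bool) {
        if !is_stationary {
            if let Some(logger) = &self.logger {
                logger.warn("GreedyGQ assumes a stationary behaviour policy");
            }
        }
    }
}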

`Shared<T>` can't be unwrapped or serialized

I'm interested in getting at the internals of a QLearning agent in order to serialize just the q_func member. It's the only part relevant to my application after training, so I'd like to unwrap it from the shared pointer and serialize just the wrapped function approximator. However, because the Rc<RefCell<_>> contained within Shared is private, it seems to be impossible to unwrap the contained value in a safe manner; only borrowing is permitted. And, because Shared does not implement Serialize or Deserialize, I can't serialize it all as a unit.

As an alternative, I would suggest making Shared a type alias:

type Shared<T> = Rc<RefCell<T>>;

That would make many useful methods available for working with these values, such as Rc::try_unwrap. After unwrapping a function approximator, I could then serialize it if it implements the proper traits, which many (or all?) of the types in the lfa crate do.
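
For illustration, the alias plus the unwrapping step described above would look roughly like this (using a plain Vec<f64> as a stand-in for the lfa approximator):

use std::cell::RefCell;
use std::rc::Rc;

// The proposed alias: all of Rc's associated functions become available.
type Shared<T> = Rc<RefCell<T>>;

fn main() {
    let q_func: Shared<Vec<f64>> = Rc::new(RefCell::new(vec![0.0; 10]));

    // After training, recover sole ownership of the function approximator so
    // it can be serialized directly; this fails if other clones still exist.
    match Rc::try_unwrap(q_func) {
        Ok(cell) => {
            let weights = cell.into_inner();
            println!("ready to serialize {} weights", weights.len());
        }
        Err(_) => eprintln!("q_func is still shared elsewhere"),
    }
}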

Happy to send a pull request if you think this is a good direction to move in.

`DerefSlice` is difficult to implement

If I understand correctly, DerefSlice is how domain state is mapped to a feature vector for learning. I am finding that the borrowed &[f64] return type is difficult to work with. My game state is not intrinsically a Vec<f64> or similar, so I have to build the vector within deref_slice. Yet because it is a reference, it needs to point to a value that will outlive the method call. I could put an owned Vec<f64> within the game state struct, but deref_slice takes an immutable &self, so I could not update that vector during the deref_slice call before returning it. That would mean I have to modify the owned feature vector at some other time, such as by reacting to every mutation of the game state, which would be difficult to enforce.

To work around this I have made my own DerefVec trait which returns a Vec<f64> rather than a slice.

The alternative I see is making deref_slice take a mutable reference to self, but that cascades through the entire API, making every reference to a game state have to be mutable. Maybe that would be okay, though?

The DerefVec approach is a simple change, and I could send a pull request if you want. It's not helpful performance-wise, I'd imagine, but maybe the flexibility would be worth it? Or perhaps you could suggest a better approach than what I have been considering.
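
For reference, the workaround trait amounts to something like this (my own trait, not part of rsrl):

// Return an owned feature vector instead of a borrowed slice, so it can be
// built on the fly from &self at the cost of an allocation per call.
pub trait DerefVec {
    fn deref_vec(&self) -> Vec<f64>;
}

pub struct GameState {
    units: Vec<(f64, f64)>,
    score: f64,
}

impl DerefVec for GameState {
    fn deref_vec(&self) -> Vec<f64> {
        let mut features = Vec::with_capacity(2 * self.units.len() + 1);

        for &(x, y) in &self.units {
            features.push(x);
            features.push(y);
        }

        features.push(self.score);
        features
    }
}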

Extensibility to multi-agent settings

Hello, I'm very new to this repo. Is this easily extensible to multi-agent settings? A bit about the problem I want to codify:

  • N agents uniformly distributed on the real line, with N odd.
  • The reward for agent $i$ at timestep $t$ is the total length of the intervals that are closer to $i$ than to any other agent $j \neq i$.
  • If two agents share the same interval, they either each take half the reward, or I flip a coin with probability $1/2$ (or $1/k$ for $k$ agents sharing) to select the agent that receives it.

If you're familiar, it's a Hotelling model with no Nash equilibrium. Gonna dive into the repo, but maybe you can point me to a good place to start! Thanks :)

Undefined symbols for architecture x86_64: "_cblas_sdot", referenced from

I am getting the following error

error: linking with `cc` failed: exit status: 1
  |
  = note: "cc" "-m64" "-arch" "x86_64" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.100y407nzibn0v3o.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.12fpy1ksm5b5771v.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.14whfsbe2lcxoapk.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.192oyids7prk5eak.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1a8guv79wf0w2owl.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1fxtzy95z89s9yds.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1gbs1l25ecmtzmo3.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1ieojwfdnkvtrij2.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1iqk174wrz5bb6g7.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1ngo0ogi26nj7tfs.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1otzxuhrun3jzim7.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1qrk2qr4pbp511z9.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1y1yccjxg0tgog71.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1yzn9rhpag5bpwxn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.20p8fx2bo2u1s4mj.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2956pe0rphowevhw.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2ak4z2efokpnj3td.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2bz8ycgnn6hhev0h.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2ckm2yzjpcl3jny6.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2fled2w4ooq90zuk.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2hwesdyv23sptms1.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2o8y11fk5tgujask.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2owsj8kl90dlaxxp.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2v2padmultfc015y.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2wse9sp58p7tqbtr.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3dwvkphta04bi868.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3fb2rxk5jm893xyb.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3gd2labfrthxxd50.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3himc58hdgpw3unn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3ii8jqj9gxl7f31m.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3j3wixlkhztc6z4t.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3kxtq6scafmnga6h.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3osa6k8a1pqx53v4.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3rf93atp0n8njfsv.rcgu.o" 
"/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3rvs1gkzngol62tu.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3ttuv0ig93sp77rw.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3u5u630lg9kxy47f.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3uceqz4cjnmmfpij.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3y2cd2v75ov8r7ht.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3z7a2debunokcznq.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4gqp4wxhenaovfkw.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4i7h9fqmrfxgd6ez.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4juan1oz6fo9855y.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4svygv1xlv7jqc3c.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4uee5ebjek885baa.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4uk04dte9ypkbwv0.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4uu1ew08p7ffmbq1.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4xr7585xl8e5i4wn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4yxss856ryk9zxh2.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4z16n4prmh1dz2ej.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4zbclpl7m4cb7k7p.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.55gegb2fvsj89bcp.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5cy9y1snjlixn7t4.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5da7irx8hhx62xvn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5dc08uor1t355s61.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5govva21qv8w4ut8.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5r1ln9fhqn2gpfq.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.6uj48lwsvm2hat5.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.ansa3210uweletc.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.c0ecfufw1vhhl68.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.gl13bbtwu50ag1q.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.ied7wclj6f5c0ww.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.k8sijrvuy9j0i7u.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.m1djheb0w9nvpvk.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.muggb6girkci1zj.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.tesie8admfr6rbq.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.txtvniqiyhn7lj3.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.udegtedpktr8urt.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.vwropns9g45fwkm.rcgu.o" 
"/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.z9mh5j704mzse17.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1h90cwm9qwj8iurc.rcgu.o" "-L" "/Users/nbro/Desktop/rsrl/target/debug/deps" "-L" "/Users/nbro/Desktop/rsrl/target/debug/build/special-fun-5e8e40e4208f6444/out" "-L" "/Users/nbro/Desktop/rsrl/target/debug/build/special-fun-5e8e40e4208f6444/out" "-L" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librsrl-1b46111e84e0290f.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librstat-0b8ae75a6e0fde36.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libspecial_fun-35f87730bf5411c3.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum-5c5ed9a8701d379d.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_rational-9049ae633a8bdb91.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_iter-586682b9833ce2cd.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_bigint-4b0e2b6a4cbe708c.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libfailure-1ceba649c9f6c838.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libbacktrace-29a4bced12e05068.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libobject-57fcdcd7d7e9e3cb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libmemchr-7874de389e22e218.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libaddr2line-7d6c45590d806154.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libgimli-7e508834f36b14c6.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librustc_demangle-0f85c436a3cee7bb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librsrl_domains-a475d05b4010bc16.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libndarray-3fbc10761f50bd8a.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libitertools-e4aea557565b531c.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libmatrixmultiply-ca40ec3728f4e9d5.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librawpointer-7dadbbfac92495af.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblfa-670d03b4b4804f77.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libspaces-320eca7f8e0cae32.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libitertools-b4b06e031faaf6a2.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libitertools-622a270bdb9ccd96.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libeither-5c484b219f9abba2.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_distr-c66d8a168421c7d6.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand-23194a10b285f06d.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_chacha-ba74cf216790845e.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libppv_lite86-af0aa57e4bfe14bb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_core-6d6a10f57adb54c6.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libgetrandom-73e7aa3655b4dfdc.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libcfg_if-7e74d31581507915.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libndarray_linalg-9257e937663b298f.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libcauchy-5ff5b9ffdeca56a3.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblapacke-2dc71c5e1ba4988a.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblapacke_sys-7584e61018d0e7f4.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblapack_src-b76f7bfea030cffc.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libblas_src-59200b4e01a5ed4b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libndarray-95a88a976385f141.rlib" 
"/Users/nbro/Desktop/rsrl/target/debug/deps/libmatrixmultiply-f0d22bd9fca3a6da.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_complex-fe0f325e0a48da84.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand-2bb3c96a5044258b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_core-67811649aed310c5.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_core-5fd8ef7f01bd81fb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libserde-edbb9045479ea3ba.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_integer-9d5ea53ce224789b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librawpointer-1e4f5908676fe080.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libapprox-8e2afd6254ec436b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_traits-8133e013eaf58d7a.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libcblas_sys-e4d2e585ba572f04.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblibc-d85ff30af4a50914.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libblas_src-459851aa1265fbbc.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libstd-04b20da5d2b4e02d.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libpanic_unwind-00f5b50d82ace1e3.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libobject-45c041bae4e30a62.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libmemchr-1bc22f5f5be77a23.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libaddr2line-caa784d0cecbd501.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libgimli-0ad46247e89234f6.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_demangle-68dabd8f37218f7c.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libstd_detect-4c67204728564461.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libhashbrown-eca93a0d11ef9a39.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_std_workspace_alloc-6ddabe46ef182f8b.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libunwind-7dc1037a76f2c18c.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcfg_if-100dc4191a6287d7.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/liblibc-c596c47fc21af016.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/liballoc-c7163fef4a8cdd33.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_std_workspace_core-3463abc69f183e66.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcore-12a50039d8929e4e.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcompiler_builtins-a4134bbc9f4f0dab.rlib" "-framework" "Security" "-liconv" "-lSystem" "-lresolv" "-lc" "-lm" "-liconv" "-L" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib" "-o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e" "-Wl,-dead_strip" 
"-nodefaultlibs"
  = note: Undefined symbols for architecture x86_64:
            "_cblas_sdot", referenced from:
                ndarray::linalg::impl_linalg::_$LT$impl$u20$ndarray..ArrayBase$LT$S$C$ndarray..dimension..dim..Dim$LT$$u5b$usize$u3b$$u20$1$u5d$$GT$$GT$$GT$::dot_impl::hdce8f1e718fcf2c1 in liblfa-670d03b4b4804f77.rlib(lfa-670d03b4b4804f77.lfa.1a38f248-cgu.7.rcgu.o)
            "_cblas_ddot", referenced from:
                ndarray::linalg::impl_linalg::_$LT$impl$u20$ndarray..ArrayBase$LT$S$C$ndarray..dimension..dim..Dim$LT$$u5b$usize$u3b$$u20$1$u5d$$GT$$GT$$GT$::dot_impl::hdce8f1e718fcf2c1 in liblfa-670d03b4b4804f77.rlib(lfa-670d03b4b4804f77.lfa.1a38f248-cgu.7.rcgu.o)
          ld: symbol(s) not found for architecture x86_64
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
          

when I attempt to run the example q_learning.rs (listed in rsrl/Cargo.toml) with the command cargo run --example q_learning.

Specifications

  • Mac OS version 12.3.1 (Monterey).
  • rustc 1.60.0
  • cargo 1.60.0

It's not the first time that this error occurs. Here are other related issues

Here you say that you used to use Mac too.

In rsrl/Cargo.toml you only specify ndarray = "0.13" under [dependencies], and you specify the following

blas-src = { version = "0.4", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.7", default-features = false, features = ["cblas", "system"] }

under [dev-dependencies], so it seems that you shouldn't need to add the previous 2 lines under [dependencies] too. In fact, even if you don't have BLAS or OpenBLAS, ndarray should fall back to matrixmultiply (see this), so this error should not even occur. However, in the README, which was updated here, you write (or someone else wrote in a pull request)

so if you're building a binary, you additionally need to specify a BLAS backend compatible with ndarray

An example is indeed a binary, so I also tried to add the following lines

blas-src = { version = "0.4", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.7", default-features = false, features = ["cblas", "system"] }

under [dependencies], but I get the same linking error.

Maybe this is obvious, but don't you need to install OpenBLAS to use it? So I guess that, if I don't have it installed on my system, I can't use it. If you're using a Mac, how should you install it? This information should probably be in the README anyway. You could use brew (here), but is that the right thing to do here? I tried to install OpenBLAS via Homebrew with brew install openblas, then

  1. export LDFLAGS="-L/usr/local/opt/openblas/lib"
  2. export CPPFLAGS="-I/usr/local/opt/openblas/include"
  3. export PKG_CONFIG_PATH="/usr/local/opt/openblas/lib/pkgconfig"

As suggested at the end of the brew installation, I then ran cargo clean and cargo run --example q_learning, but I still get the same error.

Can't we simply use rsrl without OpenBLAS? I just wanted to try the Q-learning example and I don't care about performance right now. (By the way, why do you use OpenBLAS? Is it because of performance?)

Roadmap needed

This project seems to be a good starting point for a good RL library.

However, it doesn't seem to have a roadmap, and it wouldn't be easy for other people to understand which features should or should not be implemented, and why.

If I wanted to contribute to this project, I wouldn't know where to start. Yes, sure, I can look at the code and maybe try to implement some feature, but who knows whether you would accept that pull request; it could be quite a waste of time. This project seems to have started as a personal project, but it could become a good RL library and compete with others (like stable-baselines).

Here are some of the questions that the ROADMAP could/should answer.

  • What is the ultimate goal of this library?
  • What features do we really want to support?
  • How do we decide whether a feature is worth implementing or not? Should we use a voting mechanism?
  • What features have already been implemented?
  • What are the short and long-term goals?
  • How do we organize the efforts?
  • How about the DevOps part?
    • Should all features be tested before being released?
    • How about the versions of the crate? How do you handle them? What about stability?
    • Should we use branches when introducing a new feature?

Can't run examples - linker errors to blas

cargo run --example a2c
-> note: /usr/bin/ld: cannot find -lopenblas

Then I installed openblas instead of blas (Arch Linux). Now it says:

  = note: /usr/bin/ld: /tmp/rsrl/target/debug/deps/liblfa-9d758421e5e2d7cc.rlib(lfa-9d758421e5e2d7cc.lfa.9q6n3def-cgu.1.rcgu.o): in function `ndarray::linalg::impl_linalg::<impl ndarray::ArrayBase<S, ndarray::dimension::dim::Dim<[usize; _]>>>::dot_impl':
          /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_sdot'
          /usr/bin/ld: /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_ddot'
          /usr/bin/ld: /tmp/rsrl/target/debug/deps/liblfa-9d758421e5e2d7cc.rlib(lfa-9d758421e5e2d7cc.lfa.9q6n3def-cgu.1.rcgu.o): in function `ndarray::linalg::impl_linalg::<impl ndarray::ArrayBase<S, ndarray::dimension::dim::Dim<[usize; _]>>>::dot_impl':
          /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_sdot'
          /usr/bin/ld: /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_ddot'
          /usr/bin/ld: /tmp/rsrl/target/debug/deps/liblfa-9d758421e5e2d7cc.rlib(lfa-9d758421e5e2d7cc.lfa.9q6n3def-cgu.12.rcgu.o): in function `ndarray::linalg::impl_linalg::mat_mul_impl':
          /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:422: undefined reference to `cblas_sgemm'
          /usr/bin/ld: /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:422: undefined reference to `cblas_dgemm'
          collect2: error: ld returned 1 exit status

I tried to disable anything related to blas in Cargo.toml; the changed lines look like:

ndarray = { version = "0.12", features = ["serde-1"] }
# ...
blas-src = { version = "0.2", default-features = false, features = [] }
openblas-src = { version = "0.6", default-features = false, features = ["system"] }

But that did not help at all.

Add policy gradient methods and re-visit `policies` module

In order to add policy gradient methods, we need a more principled implementation of policies; in particular, one that enables the calculation of the gradients required for directing the update in PG algorithms.

  • Upgrade policies module.
  • Implement policy gradient methods.

Evolutionary strategies

We should add some methods for solving RL problems using evolutionary strategies, following the work by OpenAI (https://arxiv.org/abs/1703.03864). This includes techniques such as:

  • Vanilla Evolutionary Strategy (ES)
  • Parallelised ES
  • Natural ES
  • CMA-ES
  • Novelty Search (NS)
  • Genetic algorithms (maybe)

The list could go on, but this is a good start.

Why Vector type?

It is not obvious at first glance that rsrl's Vector<T> type is actually equivalent to ndarray's Array1<A>. Maybe it's best to keep it simple and either use ndarray::Array1 directly, or define type Vector<T> = Array1<T>.
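
For reference, the proposed alias and what using it directly would look like:

use ndarray::Array1;

// Keep the familiar name, but make the relationship to ndarray explicit.
type Vector<T> = Array1<T>;

fn main() {
    let v: Vector<f64> = Vector::zeros(3);
    println!("{}", v.sum());
}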

Add some simple divergence detection

At the moment, agents with large learning rates that lead to divergence will often crash due to NaN values when sampling the policy. This is hard to debug, so we need some improved error messages to help users distinguish between coding mistakes and divergence issues.
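
A minimal sketch of the kind of check that would help, catching the problem before it reaches policy sampling:

// Validate values coming out of the function approximator and produce a
// descriptive error instead of a NaN panic deep inside the policy.
fn check_finite(values: &[f64], context: &str) -> Result<(), String> {
    if values.iter().any(|v| !v.is_finite()) {
        Err(format!(
            "{}: non-finite value detected; the step size may be too large \
             and the agent may have diverged",
            context
        ))
    } else {
        Ok(())
    }
}

fn main() {
    let q_values = vec![1.0, f64::NAN, 0.5];

    if let Err(msg) = check_finite(&q_values, "QLearning::predict") {
        eprintln!("{}", msg);
    }
}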

Agents requiring mutability for evaluation (action selection)

Hi! First of all, big thanks for this crate.

I had some questions regarding agents (or controllers) that require mutability (&mut self) for selecting an action, which is currently not supported. An example class of agents is online solvers or planners in partially observable environments. These agents often learn an approximate policy, or some other heuristic, and use it to guide a policy search or other planning method relevant only to the current 'state'. However, this current state is usually some function of the entire history of actions and observations up until that point.

Thus, updating this current state requires mutating some field in the agent, which in turn requires a &mut self. This also implies the agent needs some way to be 'reset' after an episode, akin to the current handle_terminal of the OnlineLearner trait (although a copy could also be sufficient to start from a blank 'initial state').

A concrete example of this is the infinite POMDP [1] (to which my research is related), but in fact it is relevant to any agent that incorporates data from the current episode to have an effect on planning.

Now I was wondering:

  1. Would you be interested in having a compatible API for those agents in this crate?
  2. If so, how would you see an implementation be incorporated here? (I could make a PR)
  • Change the controller trait to take a &mut self and add a handle_terminal method. Conceptually this generalizes the Controller trait, as every struct implementing the current trait could trivially implement the generalization. But this does not seem to be required for the Deep RL agents this crate is focused on (and as such would dirty the interface).
  • Add an OnlineController trait with the proposed changes, and implement it for all the controllers (which should be trivial). Conceptually, you have an implementation of OnlineController whenever you have one for Controller, and it should be possible to express that using Rust's trait system (playground example); see the sketch after this list.
  • Anything else you suggest.
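
A rough sketch of that second option (trait and method names here are placeholders rather than the crate's current API): a mutable controller trait with a blanket implementation for every immutable controller.

pub trait Controller<S, A> {
    fn select_action(&self, state: &S) -> A;
}

pub trait OnlineController<S, A> {
    fn select_action(&mut self, state: &S) -> A;

    // Reset any per-episode internal state (history, beliefs, ...).
    fn handle_terminal(&mut self);
}

// Every stateless controller is trivially an online controller.
impl<S, A, C: Controller<S, A>> OnlineController<S, A> for C {
    fn select_action(&mut self, state: &S) -> A {
        Controller::select_action(self, state)
    }

    fn handle_terminal(&mut self) {}
}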

Any feedback would be appreciated. To me it seems useful to support this case (albeit in a trait alone), as it would allow writing more agents against the type interface this crate provides. But I understand I might be biased.

Thanks in advance.

PS: I have also wondered whether it would be useful to separate some types (mostly the learner, domain and controller ones) into a separate crate, as that would allow implementing an agent against these traits without pulling in all the dependencies for all the other agents and domains. But I'll keep that proposal for a separate issue.

[1] Doshi-velez, Finale. ‘The Infinite Partially Observable Markov Decision Process’. In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, 477–485. Curran Associates, Inc., 2009. http://papers.nips.cc/paper/3780-the-infinite-partially-observable-markov-decision-process.pdf.

Revise trace interface

At the moment we handle traces with an enum type that has multiple implementations, including replacing and accumulating traces, etc. These cannot be extended. Also, the construction of these objects is clunky at best, and even I usually have to look up the constructor definition.

This should be streamlined, and users should be able to extend it themselves. Further, if the user so chooses, it would be nice to hide this away behind sensible defaults, while still allowing more advanced usage.
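
One possible shape for a user-extensible interface (illustrative only, not the existing enum-based API): a small trait covering the two operations every trace needs, with the standard variants provided as implementations.

pub trait Trace {
    fn update(&mut self, gradient: &[f64]);
    fn decay(&mut self, rate: f64);
}

pub struct Accumulating {
    values: Vec<f64>,
}

impl Trace for Accumulating {
    fn update(&mut self, gradient: &[f64]) {
        for (v, g) in self.values.iter_mut().zip(gradient) {
            *v += g;
        }
    }

    fn decay(&mut self, rate: f64) {
        for v in self.values.iter_mut() {
            *v *= rate;
        }
    }
}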

Not able to run the openai code

When trying to run code with the OpenAI platform, I get the error below:

   Compiling rsrl v0.6.0 (/home/nineleaps/thinking/programming-languages/rust-lang/rsrl)
error[E0432]: unresolved imports `crate::geometry::dimensions`, `crate::geometry::RegularSpace`
 --> src/domains/openai/mod.rs:4:5
  |
4 |     dimensions::{Continuous, Discrete},
  |     ^^^^^^^^^^ could not find `dimensions` in `geometry`
5 |     RegularSpace,
  |     ^^^^^^^^^^^^ no `RegularSpace` in the root

error: aborting due to previous error

For more information about this error, try `rustc --explain E0432`.
error: Could not compile `rsrl`.

To learn more, run the command again with --verbose.

It seems like there have been a lot of code changes since the OpenAI features were integrated.

Implicit compatible function approximation

Find a neat solution to using the policy score function as the features of an LFA instance. The issue at the moment is that the project method only takes a single input, whereas the score-function variant would also require the action. There are loads of ways to do this, but we want something that won't require rethinking later down the line.
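
One direction, sketched with a hypothetical trait (the real project method and basis types differ): make projection generic over its input so a state-action pair can be projected just like a state.

pub trait Projection<I> {
    fn project(&self, input: &I) -> Vec<f64>;
}

pub struct ScoreFunctionBasis;

// Projection over (state, action) pairs, as compatible features require.
// Placeholder body: the real implementation would evaluate the policy's
// score function at the given state-action pair.
impl Projection<(Vec<f64>, usize)> for ScoreFunctionBasis {
    fn project(&self, (state, action): &(Vec<f64>, usize)) -> Vec<f64> {
        let mut features = state.clone();
        features.push(*action as f64);
        features
    }
}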

Adaptive function approximation

At the moment, linear function approximation in rsrl requires stationary feature vectors. However, a lot of the research conducted in the area of feature-based representations suggests that learning dependencies online can produce superior results. For example:

  1. Sparse Distributed Memories (SDM)
  2. Adaptive Tile Coding (ATC)
  3. Incremental Feature Dependency Discovery (iFDD)
  4. Bellman-error Basis Functions (BEBF)
  5. Orthogonal Matching Pursuit (OMP)

To do this we need either a new class, perhaps called AdaptiveLinear, which handles changes in the weight vector, or custom linear function approximators for each of the above.
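
A sketch of what the AdaptiveLinear direction might expose (the name comes from the suggestion above; everything else is illustrative): the key addition over a fixed basis is the ability to grow the representation, and its weights, online.

pub struct AdaptiveLinear {
    weights: Vec<f64>,
}

impl AdaptiveLinear {
    pub fn evaluate(&self, features: &[f64]) -> f64 {
        self.weights.iter().zip(features).map(|(w, x)| w * x).sum()
    }

    // Register a newly discovered feature (e.g. from iFDD or OMP),
    // initialising its weight so existing predictions are unchanged.
    pub fn add_feature(&mut self) {
        self.weights.push(0.0);
    }
}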

Reduce code duplication with macros

At the moment there are loads of algorithms in the framework that are almost identical. It can be a bit laborious to write these out every time, so we should develop some robust macros for implementing families of learning algorithms.
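
As a rough illustration of the idea, a macro could stamp out the shared boilerplate (struct definition plus constructor) for a family of one-step TD algorithms; the per-algorithm update rules would then be the only hand-written part.

macro_rules! impl_td_algo {
    ($name:ident) => {
        pub struct $name<Q> {
            pub q_func: Q,
            pub alpha: f64,
            pub gamma: f64,
        }

        impl<Q> $name<Q> {
            pub fn new(q_func: Q, alpha: f64, gamma: f64) -> Self {
                $name { q_func, alpha, gamma }
            }
        }
    };
}

impl_td_algo!(QLearning);
impl_td_algo!(SARSA);
impl_td_algo!(ExpectedSARSA);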

Examples disappeared from master

Hello!

The examples folder seems to be missing in master, and the README.md link points to a 404.
Were the examples moved somewhere else?
