
rsrl's Introduction

RSRL (api)

Reinforcement learning should be fast, safe and easy to use.

Overview

rsrl provides generic constructs for reinforcement learning (RL) experiments in an extensible framework with efficient implementations of existing methods for rapid prototyping.

Installation

[dependencies]
rsrl = "0.8"

Note that rsrl enables the blas feature of its ndarray dependency, so if you're building a binary, you additionally need to specify a BLAS backend compatible with ndarray. For example, you can add these dependencies:

blas-src = { version = "0.2.0", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.6.0", default-features = false, features = ["cblas", "system"] }

See ndarray's README for more information.
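
In practice the backend crate also has to be referenced from your binary so that it actually gets linked. A minimal sketch of the usual pattern, assuming the blas-src/openblas-src dependencies above:

// main.rs: the `as _` import exists only so that blas-src is linked into
// the final binary; no symbols from it are used directly.
use blas_src as _;

fn main() {
    // ... set up the rsrl experiment as in the Usage section below.
}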

Usage

The code below shows how one could use rsrl to evaluate a QLearning agent using a linear function approximator with Fourier basis projection to solve the canonical mountain car problem.

See examples/ for more...

let env = MountainCar::default();
let n_actions = env.action_space().card().into();

let mut rng = StdRng::seed_from_u64(0);
let (mut ql, policy) = {
    let basis = Fourier::from_space(5, env.state_space()).with_bias();
    let q_func = make_shared(LFA::vector(basis, SGD(0.001), n_actions));
    let policy = Greedy::new(q_func.clone());

    (QLearning {
        q_func,
        gamma: 0.9,
    }, policy)
};

for e in 0..200 {
    // Episode loop:
    let mut j = 0;
    let mut env = MountainCar::default();
    let mut action = policy.sample(&mut rng, env.emit().state());

    for i in 0.. {
        // Trajectory loop:
        j = i;

        let t = env.transition(action);

        ql.handle(&t).ok();
        action = policy.sample(&mut rng, t.to.state());

        if t.terminated() {
            break;
        }
    }

    println!("Batch {}: {} steps...", e + 1, j + 1);
}

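// Evaluate the learned policy out-of-sample: a greedy (mode) rollout capped at 500 steps.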
let traj = MountainCar::default().rollout(|s| policy.mode(s), Some(500));

println!("OOS: {} states...", traj.n_states());

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate and adhere to the angularjs commit message conventions (see here).

License

MIT

rsrl's People

Contributors

joshhansen, jturner314, moxinilian, tspooner

rsrl's Issues

Add state preprocessing module

Handling infinite-dimensional domains is problematic. We've found this especially in dealing with the way they are defined in the OpenAI gym; see PR #16. Further, in many experiments, the engineer may want to perform extensive preprocessing of the environment state before passing it off to the function approximator.

The answer is to add functionality for handling this explicitly with, ideally, a futures-like pipeline structure; this could be a good opportunity to start incorporating futures in the framework. Of course, it should be opt-in, i.e. the code should work with or without a preprocessor.
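
A minimal sketch of what an opt-in pipeline could look like (the trait and type names below are hypothetical, not part of the current API):

pub trait Preprocessor<S> {
    type Output;

    fn process(&self, state: S) -> Self::Output;

    // Chain two preprocessors into a pipeline, combinator-style.
    fn chain<P: Preprocessor<Self::Output>>(self, next: P) -> Chain<Self, P>
    where
        Self: Sized,
    {
        Chain { first: self, second: next }
    }
}

pub struct Chain<A, B> {
    first: A,
    second: B,
}

impl<S, A, B> Preprocessor<S> for Chain<A, B>
where
    A: Preprocessor<S>,
    B: Preprocessor<A::Output>,
{
    type Output = B::Output;

    fn process(&self, state: S) -> Self::Output {
        self.second.process(self.first.process(state))
    }
}

// Identity stage so existing code runs unchanged when no preprocessing is wanted.
pub struct Identity;

impl<S> Preprocessor<S> for Identity {
    type Output = S;

    fn process(&self, state: S) -> S {
        state
    }
}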

Split crate into separate modules

As done by many other crates, it would make sense to split this project into several mini-crates. These could then be used on their own, and the split would also enforce encapsulation of the codebase, which is always a good thing.

For example, one might extract the fa, geometry and domains modules into 3 separate crates. These could then be used by other people in their projects without needing all the boilerplate code in rsrl.
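
As a rough sketch, the split could be hosted in a Cargo workspace along these lines (the member names are illustrative):

[workspace]
members = [
    "rsrl",           # agents, experiment loop, logging
    "rsrl-domains",   # environment/domain implementations
    "rsrl-fa",        # function approximation (currently the fa module)
    "rsrl-geometry",  # spaces and geometry primitives
]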

How to write custom environments for training on a custom problem?

First of all, I would like to say that this is a great library for reinforcement learning; thanks for working on it. The examples are also great, but what I felt when reading them is that I don't have clarity on how to use rsrl for custom environments and problems. Could you please shed some light on this? Thanks for the great work.
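
Pending a proper guide, the rough outline is: model your problem as a struct with its own state, reward, and step logic, and then implement the crate's Domain trait for it, using one of the bundled domains (e.g. MountainCar) as a reference for the exact signatures of the rsrl version you depend on, since they have changed between releases. A toy sketch of the environment half, independent of any rsrl traits:

// A 5x5 grid world; the agent starts at (0, 0) and must reach (4, 4).
pub struct GridWorld {
    position: (usize, usize),
    goal: (usize, usize),
}

impl GridWorld {
    pub fn new() -> Self {
        GridWorld { position: (0, 0), goal: (4, 4) }
    }

    // Expose the state as a feature vector, analogous to MountainCar's
    // (position, velocity) observation.
    pub fn state(&self) -> Vec<f64> {
        vec![self.position.0 as f64, self.position.1 as f64]
    }

    // Apply an action (0: up, 1: down, 2: left, 3: right), returning (reward, done).
    pub fn step(&mut self, action: usize) -> (f64, bool) {
        let (x, y) = self.position;

        self.position = match action {
            0 => (x, y.saturating_sub(1)),
            1 => (x, (y + 1).min(4)),
            2 => (x.saturating_sub(1), y),
            _ => ((x + 1).min(4), y),
        };

        let done = self.position == self.goal;
        (if done { 0.0 } else { -1.0 }, done)
    }
}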

Examples don't seem to build

I'm trying out the library for the first time and I'm having trouble getting it to build (rust 1.33).
I pulled the code from here: https://github.com/tspooner/rsrl/blob/master/examples/greedy_gq.rs

error[E0599]: no function or associated item named `scalar_output` found for type `lfa::lfa::LFA<_, _, _>` in the current scope
  --> src/main.rs:20:39
   |
20 |         let v_func = make_shared(LFA::scalar_output(bases.clone()));
   |                                  -----^^^^^^^^^^^^^
   |                                  |
   |                                  function or associated item not found in `lfa::lfa::LFA<_, _, _>`

error[E0599]: no function or associated item named `vector_output` found for type `lfa::lfa::LFA<_, _, _>` in the current scope
  --> src/main.rs:21:39
   |
21 |         let q_func = make_shared(LFA::vector_output(bases, n_actions));
   |                                  -----^^^^^^^^^^^^^
   |                                  |
   |                                  function or associated item not found in `lfa::lfa::LFA<_, _, _>`

error[E0061]: this function takes 2 parameters but 3 parameters were supplied
  --> src/main.rs:24:34
   |
24 |           let policy = make_shared(EpsilonGreedy::new(
   |  __________________________________^
25 | |             Greedy::new(q_func.clone()),
26 | |             Random::new(n_actions),
27 | |             Parameter::exponential(0.3, 0.001, 0.99),
28 | |         ));
   | |_________^ expected 2 parameters

error: aborting due to 3 previous errors

readme.md's versions may cause errors on MacOS

If you are using the following dependencies (as suggested by the readme.md file):

blas-src = { version = "0.2.0", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.6.0", default-features = false, features = ["cblas", "system"] }

and are getting the following errors:
error: linking with `cc` failed: exit status: 1
  [huge note of directories]
  = note: ld: library not found for -lgfortran
          clang: error: linker command failed with exit code 1 (use -v to see invocation)

Use instead:

ndarray = { version = "0.15.0", features = ["blas"] }
blas-src = { version = "0.8", features = ["openblas"] }
openblas-src = { version = "0.10", features = ["cblas", "system"] }

This seems to have solved the issue for me on MacOS.

Complete logging support

We have some logging utilities, but they are only skin-deep. We need to be able to pass loggers in a structured manner through to the various components of the experiment, such as the agent, function approximator, domain, etc.

In addition to the obvious uses, this would also be useful for warning users when they use agents that have specific, unenforced requirements. For example, GreedyGQ requires a stationary behaviour policy, but we allow the user to pass any policy type.
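
As a loose illustration of that second point (the types below are hypothetical, not existing rsrl items), the idea is that a component holding a logger could surface such warnings at construction time:

// Hypothetical: an agent configuration that carries an optional logger and
// uses it to flag unenforced assumptions instead of failing silently.
pub trait ExperimentLogger {
    fn warn(&self, msg: &str);
}

pub struct AgentConfig<L: ExperimentLogger> {
    pub logger: Option<L>,
}

impl<L: ExperimentLogger> AgentConfig<L> {
    pub fn check_stationary_behaviour(&self, is_stationary: bool) {
        if !is_stationary {
            if let Some(logger) = &self.logger {
                logger.warn("GreedyGQ assumes a stationary behaviour policy");
            }
        }
    }
}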

`Shared<T>` can't be unwrapped or serialized

I'm interested in getting at the internals of a QLearning agent in order to serialize just the q_func member. It's the only part relevant to my application after training, so I'd like to unwrap it from the shared pointer and serialize just the wrapped function approximator. However, because the Rc<RefCell<_>> contained within Shared is private, it seems to be impossible to unwrap the contained value in a safe manner; only borrowing is permitted. And, because Shared does not implement Serialize or Deserialize, I can't serialize it all as a unit.

As an alternative, I would suggest making Shared a type alias:

type Shared<T> = Rc<RefCell<T>>;

That would make many useful methods available for working with these values, such as Rc::try_unwrap. After unwrapping a function approximator, I could then serialize it if it implements the proper traits, which many (or all?) of the types in the lfa crate do.
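
For illustration, the alias plus the unwrapping step described above would look roughly like this (using a plain Vec<f64> as a stand-in for the lfa approximator):

use std::cell::RefCell;
use std::rc::Rc;

// The proposed alias: all of Rc's associated functions become available.
type Shared<T> = Rc<RefCell<T>>;

fn main() {
    let q_func: Shared<Vec<f64>> = Rc::new(RefCell::new(vec![0.0; 10]));

    // After training, recover sole ownership of the function approximator so
    // it can be serialized directly; this fails if other clones still exist.
    match Rc::try_unwrap(q_func) {
        Ok(cell) => {
            let weights = cell.into_inner();
            println!("ready to serialize {} weights", weights.len());
        }
        Err(_) => eprintln!("q_func is still shared elsewhere"),
    }
}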

Happy to send a pull request if you think this is a good direction to move in.

`DerefSlice` is difficult to implement

If I understand correctly, DerefSlice is how domain state is mapped to a feature vector for learning. I am finding that the borrowed &[f64] return type is difficult to work with. My game state is not intrinsically a Vec<f64> or similar, so I have to build the vector within deref_slice. Yet because it is a reference, it needs to point to a value that will outlive the method call. I could put an owned Vec<f64> within the game state struct, but deref_slice takes an immutable &self, so I could not update that vector during the deref_slice call before returning it. That would mean I have to modify the owned feature vector at some other time, such as by reacting to every mutation of the game state, which would be difficult to enforce.

To work around this I have made my own DerefVec trait which returns a Vec<f64> rather than a slice.

The alternative I see is making deref_slice take a mutable reference to self, but that cascades through the entire API, making every reference to a game state have to be mutable. Maybe that would be okay, though?

The DerefVec approach is a simple change, and I could send a pull request if you want. It's not helpful performance-wise, I'd imagine, but maybe the flexibility would be worth it? Or perhaps you could suggest a better approach than what I have been considering.
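
For reference, the workaround trait amounts to something like this (my own trait, not part of rsrl):

// Return an owned feature vector instead of a borrowed slice, so it can be
// built on the fly from &self at the cost of an allocation per call.
pub trait DerefVec {
    fn deref_vec(&self) -> Vec<f64>;
}

pub struct GameState {
    units: Vec<(f64, f64)>,
    score: f64,
}

impl DerefVec for GameState {
    fn deref_vec(&self) -> Vec<f64> {
        let mut features = Vec::with_capacity(2 * self.units.len() + 1);

        for &(x, y) in &self.units {
            features.push(x);
            features.push(y);
        }

        features.push(self.score);
        features
    }
}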

Extensibility to multi-agent settings

Hello, I'm very new to this repo. Is this easily extensible to multi-agent settings? A bit about the problem I want to codify:

  • N agents uniformly distributed on the real line, with N odd.
  • The reward for agent $i$ at timestep $t$ is the total length of the intervals that are closer to $i$ than to any other agent $j \neq i$.
  • If two agents share the same interval, they either each take half the reward, or I flip a coin with probability $1/2$ (or $1/k$ for $k$ agents sharing) to select the agent that receives it.

If you're familiar, it's a Hotelling model with no Nash equilibrium. Gonna dive into the repo, but maybe you can point me to a good place to start! Thanks :)

Undefined symbols for architecture x86_64: "_cblas_sdot", referenced from

I am getting the following error

error: linking with `cc` failed: exit status: 1
  |
  = note: "cc" "-m64" "-arch" "x86_64" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.100y407nzibn0v3o.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.12fpy1ksm5b5771v.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.14whfsbe2lcxoapk.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.192oyids7prk5eak.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1a8guv79wf0w2owl.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1fxtzy95z89s9yds.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1gbs1l25ecmtzmo3.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1ieojwfdnkvtrij2.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1iqk174wrz5bb6g7.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1ngo0ogi26nj7tfs.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1otzxuhrun3jzim7.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1qrk2qr4pbp511z9.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1y1yccjxg0tgog71.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1yzn9rhpag5bpwxn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.20p8fx2bo2u1s4mj.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2956pe0rphowevhw.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2ak4z2efokpnj3td.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2bz8ycgnn6hhev0h.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2ckm2yzjpcl3jny6.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2fled2w4ooq90zuk.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2hwesdyv23sptms1.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2o8y11fk5tgujask.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2owsj8kl90dlaxxp.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2v2padmultfc015y.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.2wse9sp58p7tqbtr.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3dwvkphta04bi868.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3fb2rxk5jm893xyb.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3gd2labfrthxxd50.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3himc58hdgpw3unn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3ii8jqj9gxl7f31m.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3j3wixlkhztc6z4t.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3kxtq6scafmnga6h.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3osa6k8a1pqx53v4.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3rf93atp0n8njfsv.rcgu.o" 
"/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3rvs1gkzngol62tu.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3ttuv0ig93sp77rw.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3u5u630lg9kxy47f.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3uceqz4cjnmmfpij.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3y2cd2v75ov8r7ht.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.3z7a2debunokcznq.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4gqp4wxhenaovfkw.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4i7h9fqmrfxgd6ez.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4juan1oz6fo9855y.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4svygv1xlv7jqc3c.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4uee5ebjek885baa.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4uk04dte9ypkbwv0.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4uu1ew08p7ffmbq1.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4xr7585xl8e5i4wn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4yxss856ryk9zxh2.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4z16n4prmh1dz2ej.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.4zbclpl7m4cb7k7p.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.55gegb2fvsj89bcp.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5cy9y1snjlixn7t4.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5da7irx8hhx62xvn.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5dc08uor1t355s61.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5govva21qv8w4ut8.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.5r1ln9fhqn2gpfq.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.6uj48lwsvm2hat5.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.ansa3210uweletc.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.c0ecfufw1vhhl68.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.gl13bbtwu50ag1q.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.ied7wclj6f5c0ww.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.k8sijrvuy9j0i7u.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.m1djheb0w9nvpvk.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.muggb6girkci1zj.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.tesie8admfr6rbq.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.txtvniqiyhn7lj3.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.udegtedpktr8urt.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.vwropns9g45fwkm.rcgu.o" 
"/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.z9mh5j704mzse17.rcgu.o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e.1h90cwm9qwj8iurc.rcgu.o" "-L" "/Users/nbro/Desktop/rsrl/target/debug/deps" "-L" "/Users/nbro/Desktop/rsrl/target/debug/build/special-fun-5e8e40e4208f6444/out" "-L" "/Users/nbro/Desktop/rsrl/target/debug/build/special-fun-5e8e40e4208f6444/out" "-L" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librsrl-1b46111e84e0290f.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librstat-0b8ae75a6e0fde36.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libspecial_fun-35f87730bf5411c3.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum-5c5ed9a8701d379d.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_rational-9049ae633a8bdb91.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_iter-586682b9833ce2cd.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_bigint-4b0e2b6a4cbe708c.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libfailure-1ceba649c9f6c838.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libbacktrace-29a4bced12e05068.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libobject-57fcdcd7d7e9e3cb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libmemchr-7874de389e22e218.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libaddr2line-7d6c45590d806154.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libgimli-7e508834f36b14c6.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librustc_demangle-0f85c436a3cee7bb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librsrl_domains-a475d05b4010bc16.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libndarray-3fbc10761f50bd8a.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libitertools-e4aea557565b531c.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libmatrixmultiply-ca40ec3728f4e9d5.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librawpointer-7dadbbfac92495af.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblfa-670d03b4b4804f77.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libspaces-320eca7f8e0cae32.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libitertools-b4b06e031faaf6a2.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libitertools-622a270bdb9ccd96.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libeither-5c484b219f9abba2.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_distr-c66d8a168421c7d6.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand-23194a10b285f06d.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_chacha-ba74cf216790845e.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libppv_lite86-af0aa57e4bfe14bb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_core-6d6a10f57adb54c6.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libgetrandom-73e7aa3655b4dfdc.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libcfg_if-7e74d31581507915.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libndarray_linalg-9257e937663b298f.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libcauchy-5ff5b9ffdeca56a3.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblapacke-2dc71c5e1ba4988a.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblapacke_sys-7584e61018d0e7f4.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblapack_src-b76f7bfea030cffc.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libblas_src-59200b4e01a5ed4b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libndarray-95a88a976385f141.rlib" 
"/Users/nbro/Desktop/rsrl/target/debug/deps/libmatrixmultiply-f0d22bd9fca3a6da.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_complex-fe0f325e0a48da84.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand-2bb3c96a5044258b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_core-67811649aed310c5.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librand_core-5fd8ef7f01bd81fb.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libserde-edbb9045479ea3ba.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_integer-9d5ea53ce224789b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/librawpointer-1e4f5908676fe080.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libapprox-8e2afd6254ec436b.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libnum_traits-8133e013eaf58d7a.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libcblas_sys-e4d2e585ba572f04.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/liblibc-d85ff30af4a50914.rlib" "/Users/nbro/Desktop/rsrl/target/debug/deps/libblas_src-459851aa1265fbbc.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libstd-04b20da5d2b4e02d.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libpanic_unwind-00f5b50d82ace1e3.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libobject-45c041bae4e30a62.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libmemchr-1bc22f5f5be77a23.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libaddr2line-caa784d0cecbd501.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libgimli-0ad46247e89234f6.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_demangle-68dabd8f37218f7c.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libstd_detect-4c67204728564461.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libhashbrown-eca93a0d11ef9a39.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_std_workspace_alloc-6ddabe46ef182f8b.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libunwind-7dc1037a76f2c18c.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcfg_if-100dc4191a6287d7.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/liblibc-c596c47fc21af016.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/liballoc-c7163fef4a8cdd33.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_std_workspace_core-3463abc69f183e66.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcore-12a50039d8929e4e.rlib" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcompiler_builtins-a4134bbc9f4f0dab.rlib" "-framework" "Security" "-liconv" "-lSystem" "-lresolv" "-lc" "-lm" "-liconv" "-L" "/Users/nbro/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib" "-o" "/Users/nbro/Desktop/rsrl/target/debug/examples/q_learning-ffe1683a1ab5ed1e" "-Wl,-dead_strip" 
"-nodefaultlibs"
  = note: Undefined symbols for architecture x86_64:
            "_cblas_sdot", referenced from:
                ndarray::linalg::impl_linalg::_$LT$impl$u20$ndarray..ArrayBase$LT$S$C$ndarray..dimension..dim..Dim$LT$$u5b$usize$u3b$$u20$1$u5d$$GT$$GT$$GT$::dot_impl::hdce8f1e718fcf2c1 in liblfa-670d03b4b4804f77.rlib(lfa-670d03b4b4804f77.lfa.1a38f248-cgu.7.rcgu.o)
            "_cblas_ddot", referenced from:
                ndarray::linalg::impl_linalg::_$LT$impl$u20$ndarray..ArrayBase$LT$S$C$ndarray..dimension..dim..Dim$LT$$u5b$usize$u3b$$u20$1$u5d$$GT$$GT$$GT$::dot_impl::hdce8f1e718fcf2c1 in liblfa-670d03b4b4804f77.rlib(lfa-670d03b4b4804f77.lfa.1a38f248-cgu.7.rcgu.o)
          ld: symbol(s) not found for architecture x86_64
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
          

when I attempt to run the example q_learning.rs (listed in rsrl/Cargo.toml) with the command cargo run --example q_learning.

Specifications

  • Mac OS version 12.3.1 (Monterey).
  • rustc 1.60.0
  • cargo 1.60.0

It's not the first time that this error occurs. Here are other related issues

Here you say that you used to use Mac too.

In rsrl/Cargo.toml you only specify ndarray = "0.13" under [dependencies], and you specify the following

blas-src = { version = "0.4", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.7", default-features = false, features = ["cblas", "system"] }

under [dev-dependencies], so it seems that you shouldn't need to add the previous 2 lines under [dependencies] too. In fact, even if you don't have BLAS or OpenBLAS, ndarray should fall back to matrixmultiply (see this), so this error should not even occur. However, in the README, which was updated here, you write (or someone else wrote in a pull request)

so if you're building a binary, you additionally need to specify a BLAS backend compatible with ndarray

An example is indeed a binary, so I also tried to add the following lines

blas-src = { version = "0.4", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.7", default-features = false, features = ["cblas", "system"] }

under [dependencies], but I get the same linking error.

Maybe this is obvious, but don't you need to install OpenBLAS to use it? So I guess that, if I don't have it installed on my system, I can't use it. If you're using a Mac, how should you install it? This information should probably be in the README anyway. You could use brew (here), but is that the right thing to do here? I tried to install OpenBLAS via Homebrew with brew install openblas, then

  1. export LDFLAGS="-L/usr/local/opt/openblas/lib"
  2. export CPPFLAGS="-I/usr/local/opt/openblas/include"
  3. export PKG_CONFIG_PATH="/usr/local/opt/openblas/lib/pkgconfig"

As suggested at the end of the brew installation, I then ran cargo clean and cargo run --example q_learning, but I still get the same error.

Can't we simply use rsrl without OpenBLAS? I just wanted to try the Q-learning example and I don't care about performance right now. (By the way, why do you use OpenBLAS? Is it because of performance?)

Roadmap needed

This project seems to be a good starting point for a good RL library.

However, it doesn't seem to have a roadmap, and it wouldn't be easy for other people to understand which features should or should not be implemented, and why.

If I wanted to contribute to this project, I wouldn't know where to start. Yes, sure, I can look at the code and maybe try to implement some feature, but who knows whether you would accept that pull request; it could be quite a waste of time. This project seems to have started as a personal project, but it could become a good RL library and compete with others (like stable-baselines).

Here are some of the questions that the ROADMAP could/should answer.

  • What is the ultimate goal of this library?
  • What features do we really want to support?
  • How do we decide whether a feature is worth implementing or not? Should we use a voting mechanism?
  • What features have already been implemented?
  • What are the short and long-term goals?
  • How do we organize the efforts?
  • How about the DevOps part?
    • Should all features be tested before being released?
    • How about the versions of the crate? How do you handle them? What about stability?
    • Should we use branches when introducing a new feature?

Can't run examples - linker errors to blas

cargo run --example a2c
-> note: /usr/bin/ld: cannot find -lopenblas

Then I installed openblas instead of blas (Arch Linux). Now it says:

  = note: /usr/bin/ld: /tmp/rsrl/target/debug/deps/liblfa-9d758421e5e2d7cc.rlib(lfa-9d758421e5e2d7cc.lfa.9q6n3def-cgu.1.rcgu.o): in function `ndarray::linalg::impl_linalg::<impl ndarray::ArrayBase<S, ndarray::dimension::dim::Dim<[usize; _]>>>::dot_impl':
          /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_sdot'
          /usr/bin/ld: /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_ddot'
          /usr/bin/ld: /tmp/rsrl/target/debug/deps/liblfa-9d758421e5e2d7cc.rlib(lfa-9d758421e5e2d7cc.lfa.9q6n3def-cgu.1.rcgu.o): in function `ndarray::linalg::impl_linalg::<impl ndarray::ArrayBase<S, ndarray::dimension::dim::Dim<[usize; _]>>>::dot_impl':
          /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_sdot'
          /usr/bin/ld: /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:115: undefined reference to `cblas_ddot'
          /usr/bin/ld: /tmp/rsrl/target/debug/deps/liblfa-9d758421e5e2d7cc.rlib(lfa-9d758421e5e2d7cc.lfa.9q6n3def-cgu.12.rcgu.o): in function `ndarray::linalg::impl_linalg::mat_mul_impl':
          /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:422: undefined reference to `cblas_sgemm'
          /usr/bin/ld: /home/ploppz/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.12.1/src/linalg/impl_linalg.rs:422: undefined reference to `cblas_dgemm'
          collect2: error: ld returned 1 exit status

I tried to disable anything related to blas in Cargo.toml; the changed lines look like:

ndarray = { version = "0.12", features = ["serde-1"] }
# ...
blas-src = { version = "0.2", default-features = false, features = [] }
openblas-src = { version = "0.6", default-features = false, features = ["system"] }

But that did not help at all.

Add policy gradient methods and re-visit `policies` module

In order to add policy gradient methods, we need a more principled implementation of policies; in particular, one that enables the calculation of the gradients required for directing the update in PG algorithms.

  • Upgrade policies module.
  • Implement policy gradient methods.

Evolutionary strategies

We should add some methods for solving RL problems using evolutionary strategies, following the work by OpenAI (https://arxiv.org/abs/1703.03864). This includes techniques such as:

  • Vanilla Evolutionary Strategy (ES)
  • Parallelised ES
  • Natural ES
  • CMA-ES
  • Novelty Search (NS)
  • Genetic algorithms (maybe)

The list could go on, but this is a good start.

Why Vector type?

It is not obvious at first glance that rsrl's Vector<T> type is actually equivalent to ndarray's Array1<A>. Maybe it's best to keep it simple and either use ndarray::Array1 directly, or define type Vector<T> = Array1<T>.
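
For reference, the proposed alias and what using it directly would look like:

use ndarray::Array1;

// Keep the familiar name, but make the relationship to ndarray explicit.
type Vector<T> = Array1<T>;

fn main() {
    let v: Vector<f64> = Vector::zeros(3);
    println!("{}", v.sum());
}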

Add some simple divergence detection

At the moment, agents with large learning rates that lead to divergence will often crash due to NaN values when sampling the policy. This is hard to debug, so we need some improved error messages to help users distinguish between coding mistakes and divergence issues.
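
A minimal sketch of the kind of check that would help, catching the problem before it reaches policy sampling:

// Validate values coming out of the function approximator and produce a
// descriptive error instead of a NaN panic deep inside the policy.
fn check_finite(values: &[f64], context: &str) -> Result<(), String> {
    if values.iter().any(|v| !v.is_finite()) {
        Err(format!(
            "{}: non-finite value detected; the step size may be too large \
             and the agent may have diverged",
            context
        ))
    } else {
        Ok(())
    }
}

fn main() {
    let q_values = vec![1.0, f64::NAN, 0.5];

    if let Err(msg) = check_finite(&q_values, "QLearning::predict") {
        eprintln!("{}", msg);
    }
}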

Agents requiring mutability for evaluation (action selection)

Hi! First of all, big thanks for this crate.

I had some questions regarding agents (or controllers) that require mutability (&mut self) for selecting an action, which is currently not supported. An example class of agents is online solvers or planners in partially observable environments. These agents often learn an approximate policy, or some other heuristic, and use it to guide a policy search or other planning method relevant only to the current 'state'. However, this current state is usually some function of the entire history of actions and observations up until that point.

Thus, updating this current state requires mutating some field in the agent, which in turn requires a &mut self. This also implies the agent needs some way to be 'reset' after an episode, akin to the current handle_terminal of the OnlineLearner trait (although a copy could also be sufficient to start from a blank 'initial state').

A concrete example of this is the infinite POMDP [1] (to which my research is related), but in fact it is relevant to any agent that incorporates data from the current episode to have an effect on planning.

Now I was wondering:

  1. Would you be interested in having a compatible API for those agents in this crate?
  2. If so, how would you see an implementation be incorporated here? (I could make a PR)
  • Change the controller trait to take a &mut self and add a handle_terminal method. Conceptually this generalizes the Controller trait, as every struct implementing the current trait could trivially implement the generalization. But this does not seem to be required for the Deep RL agents this crate is focused on (and as such would dirty the interface).
  • Add an OnlineController trait with the proposed changes, and implement it for all the controllers (which should be trivial). Conceptually, you have an implementation of OnlineController whenever you have one for Controller, and it should be possible to express that using Rust's trait system (playground example); see the sketch after this list.
  • Anything else you suggest.
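
A rough sketch of that second option (trait and method names here are placeholders rather than the crate's current API): a mutable controller trait with a blanket implementation for every immutable controller.

pub trait Controller<S, A> {
    fn select_action(&self, state: &S) -> A;
}

pub trait OnlineController<S, A> {
    fn select_action(&mut self, state: &S) -> A;

    // Reset any per-episode internal state (history, beliefs, ...).
    fn handle_terminal(&mut self);
}

// Every stateless controller is trivially an online controller.
impl<S, A, C: Controller<S, A>> OnlineController<S, A> for C {
    fn select_action(&mut self, state: &S) -> A {
        Controller::select_action(self, state)
    }

    fn handle_terminal(&mut self) {}
}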

Any feedback would be appreciated. To me it seems useful to support this case (albeit in a trait alone), as it would allow writing more agents against the type interface this crate provides. But I understand I might be biased.

Thanks in advance.

PS: I have also wondered whether it would be useful to separate some types (mostly the learner, domain and controller ones) into a separate crate, as that would allow implementing an agent against these traits without pulling in all the dependencies for all the other agents and domains. But I'll keep that proposal for a separate issue.

[1] Doshi-velez, Finale. ‘The Infinite Partially Observable Markov Decision Process’. In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, 477–485. Curran Associates, Inc., 2009. http://papers.nips.cc/paper/3780-the-infinite-partially-observable-markov-decision-process.pdf.

Revise trace interface

At the moment we handle traces with an enum type that has multiple implementations, including replacing and accumulating traces, etc. These cannot be extended. Also, the construction of these objects is clunky at best, and even I usually have to look up the constructor definition.

This should be streamlined, and users should be able to extend it themselves. Further, if the user so chooses, it would be nice to hide this away behind sensible defaults, while still allowing more advanced usage.
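
One possible shape for a user-extensible interface (illustrative only, not the existing enum-based API): a small trait covering the two operations every trace needs, with the standard variants provided as implementations.

pub trait Trace {
    fn update(&mut self, gradient: &[f64]);
    fn decay(&mut self, rate: f64);
}

pub struct Accumulating {
    values: Vec<f64>,
}

impl Trace for Accumulating {
    fn update(&mut self, gradient: &[f64]) {
        for (v, g) in self.values.iter_mut().zip(gradient) {
            *v += g;
        }
    }

    fn decay(&mut self, rate: f64) {
        for v in self.values.iter_mut() {
            *v *= rate;
        }
    }
}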

Not able to run the openai code

When trying to run code with the OpenAI platform, I get the error below:

   Compiling rsrl v0.6.0 (/home/nineleaps/thinking/programming-languages/rust-lang/rsrl)
error[E0432]: unresolved imports `crate::geometry::dimensions`, `crate::geometry::RegularSpace`
 --> src/domains/openai/mod.rs:4:5
  |
4 |     dimensions::{Continuous, Discrete},
  |     ^^^^^^^^^^ could not find `dimensions` in `geometry`
5 |     RegularSpace,
  |     ^^^^^^^^^^^^ no `RegularSpace` in the root

error: aborting due to previous error

For more information about this error, try `rustc --explain E0432`.
error: Could not compile `rsrl`.

To learn more, run the command again with --verbose.

It seems like there have been a lot of code changes since the OpenAI features were integrated.

Implicit compatible function approximation

Find a neat solution to using the policy score function as the features of an LFA instance. The issue at the moment is that the project method only takes a single input, whereas the score-function variant would also require the action. There are loads of ways to do this, but we want something that won't require rethinking later down the line.
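
One direction, sketched with a hypothetical trait (the real project method and basis types differ): make projection generic over its input so a state-action pair can be projected just like a state.

pub trait Projection<I> {
    fn project(&self, input: &I) -> Vec<f64>;
}

pub struct ScoreFunctionBasis;

// Projection over (state, action) pairs, as compatible features require.
// Placeholder body: the real implementation would evaluate the policy's
// score function at the given state-action pair.
impl Projection<(Vec<f64>, usize)> for ScoreFunctionBasis {
    fn project(&self, (state, action): &(Vec<f64>, usize)) -> Vec<f64> {
        let mut features = state.clone();
        features.push(*action as f64);
        features
    }
}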

Adaptive function approximation

At the moment, linear function approximation in rsrl requires stationary feature vectors. However, a lot of the research conducted in the area of feature-based representations suggests that learning dependencies online can produce superior results. For example:

  1. Sparse Distributed Memories (SDM)
  2. Adaptive Tile Coding (ATC)
  3. Incremental Feature Dependency Discovery (iFDD)
  4. Bellman-error Basis Functions (BEBF)
  5. Orthogonal Matching Pursuit (OMP)

To do this we need either a new class, perhaps called AdaptiveLinear, which handles changes in the weight vector, or custom linear function approximators for each of the above.
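
A sketch of what the AdaptiveLinear direction might expose (the name comes from the suggestion above; everything else is illustrative): the key addition over a fixed basis is the ability to grow the representation, and its weights, online.

pub struct AdaptiveLinear {
    weights: Vec<f64>,
}

impl AdaptiveLinear {
    pub fn evaluate(&self, features: &[f64]) -> f64 {
        self.weights.iter().zip(features).map(|(w, x)| w * x).sum()
    }

    // Register a newly discovered feature (e.g. from iFDD or OMP),
    // initialising its weight so existing predictions are unchanged.
    pub fn add_feature(&mut self) {
        self.weights.push(0.0);
    }
}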

Reduce code duplication with macros

At the moment there are loads of algorithms in the framework that are almost identical. It can be a bit laborious to write these out every time, so we should develop some robust macros for implementing families of learning algorithms.
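
As a rough illustration of the idea, a macro could stamp out the shared boilerplate (struct definition plus constructor) for a family of one-step TD algorithms; the per-algorithm update rules would then be the only hand-written part.

macro_rules! impl_td_algo {
    ($name:ident) => {
        pub struct $name<Q> {
            pub q_func: Q,
            pub alpha: f64,
            pub gamma: f64,
        }

        impl<Q> $name<Q> {
            pub fn new(q_func: Q, alpha: f64, gamma: f64) -> Self {
                $name { q_func, alpha, gamma }
            }
        }
    };
}

impl_td_algo!(QLearning);
impl_td_algo!(SARSA);
impl_td_algo!(ExpectedSARSA);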

Examples disappeared from master

Hello!

The examples folder seems to be missing in master, and the README.md link points to a 404.
Were the examples moved somewhere else?
