
Comments (3)

milanboers commented on July 28, 2024

I think you could do something like this (pseudo code because I haven't touched Rust in a while 😁):

struct StopAtTerminalState;

impl<S: State> TerminationStrategy<S> for StopAtTerminalState {
    fn should_stop(&mut self, state: &S) -> bool {
        // A state with no possible actions left is terminal.
        state.actions().is_empty()
    }
}

so the algorithm stops on a terminal state.

The problem then is that it may not have discovered much yet, so you'll have to run multiple episodes of the algorithm:

let mut trainer = AgentTrainer::new();

for _ in 0..1000 {
    let mut agent = MyAgent { state: MyState { x: 0, y: 0 }};
    trainer.train(&mut agent,
                  &QLearning::new(0.2, 0.01, 2.),
                  &mut StopAtTerminalState,
                  &RandomExploration::new());
}

Let me know if it helps. I think it's a common scenario, so I could put something in the project to make it easier.


> in many cases the board state cannot be improved any further, and it should backtrack to try a different action

Note that you are supposed to return all possible actions, not just actions that would lead to an improved state.


nyurik commented on July 28, 2024

Thanks @milanboers!

One issue is that calling state.actions() inside should_stop() would repeat the actions-list generation, which in my case is a fairly expensive process. Caching it is tricky because should_stop(&mut self, state: &S) gets a non-mutable state, and fn actions(&self) takes a non-mutable receiver too, so cloning would be required (which might get expensive).
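
One possible workaround (just a sketch, not anything rurel provides): memoize the action list inside the state with interior mutability, so repeated calls to actions() regenerate nothing. MyState, MyAction, and generate_actions here are hypothetical placeholders, and the manual Eq/Hash impls exist so the cache field doesn't affect state identity:

use std::cell::RefCell;
use std::hash::{Hash, Hasher};

#[derive(Clone, PartialEq, Eq, Hash)]
enum MyAction {
    Place(usize),
}

#[derive(Clone)]
struct MyState {
    board: Vec<u8>,
    // Memoized action list; deliberately excluded from Eq/Hash below.
    cached_actions: RefCell<Option<Vec<MyAction>>>,
}

impl MyState {
    // Stand-in for the real, expensive move generation.
    fn generate_actions(&self) -> Vec<MyAction> {
        (0..self.board.len())
            .filter(|&i| self.board[i] == 0)
            .map(MyAction::Place)
            .collect()
    }

    // Generates at most once; later calls only pay for the clone
    // (rurel's actions() returns by value, so one clone is unavoidable).
    fn actions_cached(&self) -> Vec<MyAction> {
        self.cached_actions
            .borrow_mut()
            .get_or_insert_with(|| self.generate_actions())
            .clone()
    }
}

// Identity is the board alone, so the cache never perturbs hashing.
impl PartialEq for MyState {
    fn eq(&self, other: &Self) -> bool {
        self.board == other.board
    }
}
impl Eq for MyState {}
impl Hash for MyState {
    fn hash<H: Hasher>(&self, h: &mut H) {
        self.board.hash(h);
    }
}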

> in many cases the board state cannot be improved any further, and it should backtrack to try a different action

> Note that you are supposed to return all possible actions, not just actions that would lead to an improved state.

Of course. I always generate all available actions; it's just that all my actions are "additive" (like in Go): I keep adding to the board while it's possible, but at some point there are no more valid moves.
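
For a state like that, the State impl might look roughly like this (a sketch using the hypothetical MyState/MyAction above; score() is a placeholder too). The action list shrinks as the board fills up and eventually becomes empty, which is exactly the condition StopAtTerminalState checks:

impl State for MyState {
    type A = MyAction;

    fn reward(&self) -> f64 {
        self.score() // hypothetical scoring function
    }

    // All legal "additive" moves: one per empty cell. Once the
    // board is full this is empty, i.e. the state is terminal.
    fn actions(&self) -> Vec<MyAction> {
        self.actions_cached()
    }
}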

from rurel.

milanboers commented on July 28, 2024

If generating the actions is expensive, you can try to generate only one possible action (at random); you're then essentially combining the action generation with the exploration strategy.
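
That could look roughly like this (a sketch; random_empty_cell is a hypothetical helper that samples one legal placement without enumerating them all):

// Return a single randomly sampled move instead of the full list.
// RandomExploration then has no real choice to make, so move
// generation and exploration collapse into one cheap step.
fn actions(&self) -> Vec<MyAction> {
    match self.random_empty_cell() {
        Some(cell) => vec![MyAction::Place(cell)],
        None => vec![], // no legal moves left: terminal
    }
}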

It's a bit of a shortcoming of putting things into a framework like this. You may just want to implement Q-learning yourself; the algorithm is relatively straightforward.
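
The core of tabular Q-learning is a single update rule, Q(s,a) ← Q(s,a) + α·(r + γ·max_a′ Q(s′,a′) − Q(s,a)). A minimal hand-rolled version (a sketch, assuming hashable state and action types) could look like:

use std::collections::HashMap;
use std::hash::Hash;

// One Q-learning update. alpha: learning rate, gamma: discount factor.
// next_actions holds the actions available in next_state.
fn q_update<S, A>(
    q: &mut HashMap<(S, A), f64>,
    state: S,
    action: A,
    reward: f64,
    next_state: &S,
    next_actions: &[A],
    alpha: f64,
    gamma: f64,
) where
    S: Hash + Eq + Clone,
    A: Hash + Eq + Clone,
{
    // max_a' Q(s', a'); a terminal next state contributes no future value.
    let max_next = if next_actions.is_empty() {
        0.0
    } else {
        next_actions
            .iter()
            .map(|a| *q.get(&(next_state.clone(), a.clone())).unwrap_or(&0.0))
            .fold(f64::NEG_INFINITY, f64::max)
    };

    let entry = q.entry((state, action)).or_insert(0.0);
    *entry += alpha * (reward + gamma * max_next - *entry);
}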

