
Comments (3)

ammarhydr avatar ammarhydr commented on June 10, 2024

Thank you for the questions. I am also a safe-RL learner. Here are my comments about your questions.
1-) There are two constrained RL methods in general. The first is peak-constraint RL, which deals with constraints on the reward function itself. The other is average-constraint RL, which tries to minimize the cost with an extra value function while trying to maximize the reward. So for the average-constraint formulation, yes, it is required. I did not understand what you mean by "actor loss by the cost from off-policy data".

2-) I have not inspected how lambda changes, to be honest, but with a small modification to my code you can also inspect the lambda value. The reason for doing max(0, lambda) in Lagrangian optimization is to keep lambda non-negative, but again I have to work on it to give you a proper answer. These days I am busy with other stuff.
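
As an illustration of the max(0, lambda) step, here is a minimal sketch of a projected dual update, not the code from this repository. The names `update_lambda`, `avg_cost`, `cost_limit` and `lr` are hypothetical. It assumes the multiplier is raised when the measured average cost exceeds a limit and lowered otherwise, then clipped at zero so the penalty never turns into a bonus for violating the constraint.

```python
def update_lambda(lam, avg_cost, cost_limit, lr=0.01):
    """One projected gradient-ascent step on the Lagrange multiplier.

    All argument names are illustrative assumptions, not the repository's API.
    """
    # Dual gradient: positive when the measured cost exceeds the limit.
    lam = lam + lr * (avg_cost - cost_limit)
    # Projection onto lambda >= 0, i.e. the max(0, lambda) step.
    return max(0.0, lam)


if __name__ == "__main__":
    lam = 0.0
    # Made-up average costs per update, only to show how lambda evolves.
    for avg_cost in [30.0, 28.0, 26.0, 20.0, 10.0]:
        lam = update_lambda(lam, avg_cost, cost_limit=25.0)
        print(lam)
```

Printing the value after each update, as in the loop above, is one simple way to inspect how lambda changes during training.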

from sac-lagrangian.

ZhihanLee avatar ZhihanLee commented on June 10, 2024

Thank you so much for your reply.
Maybe "constructing the critic loss by the cost from off-policy data" is the proper wording.
'The cost' is the 'ci' at each step (i is the constraint number).
Let me reorganize my words. The reason I say that is that we now adopt an extra critic network to get the safety value, so the actor loss is: alpha*log(pi) - Q_critic + Q_safety, and the critic loss has two types (Q_critic and Q_safety; both are the distance between the Q prediction and the real Q value coming from sampled data).
However, I think Q_safety can perhaps be replaced by the cost that we collected before, which means there is no Q_safety in the actor loss. Instead, we add the safety consideration to the critic loss: the critic loss is now the distance between the network prediction and (the real Q value minus lambda*cost), where the latter depends only on the sampled data. This is just like SAC with automatic temperature adjustment, which adjusts alpha without an extra network.
I am new to safe RL and hope to receive your suggestions.
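
To make the two options concrete, here is a rough sketch on scalar values, under my own assumptions. The function names are hypothetical, the entropy term is left out of the critic target for brevity, and `lam` weights the safety term in the first function (set `lam=1.0` to get the actor loss exactly as written above).

```python
def actor_loss_separate_critics(alpha, log_pi, q_reward, q_safety, lam=1.0):
    """Option A: a separate safety critic appears in the actor loss."""
    # alpha * log(pi) - Q_critic + lam * Q_safety
    return alpha * log_pi - q_reward + lam * q_safety


def penalized_td_target(reward, cost, lam, gamma, next_q, done):
    """Option B: fold the sampled cost into the target of a single critic."""
    # (usual TD target) - lam * cost, which equals a TD target
    # computed on the penalized reward r - lam * c.
    return (reward - lam * cost) + gamma * (1.0 - done) * next_q


if __name__ == "__main__":
    # Made-up numbers, only to show the shapes of the two expressions.
    print(actor_loss_separate_critics(0.2, -1.0, q_reward=5.0, q_safety=3.0))
    print(penalized_td_target(1.0, 0.5, lam=2.0, gamma=0.99, next_q=10.0, done=0.0))
```

One thing to keep in mind with option B is that the critic target then depends on the current lambda, so the learned Q values shift whenever lambda is updated.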


ammarhydr avatar ammarhydr commented on June 10, 2024

