Comments (3)
Thank you for the questions. I am also a safe-RL learner. Here are my comments about your questions.
1-) There are two constrained RL formulations in general. The first is peak-constraint RL, which deals with constraints on the reward function itself; the other is average-constraint RL, which tries to minimize the cost with an extra value function while trying to maximize the reward. So for the average-constraint formulation, yes, it is required. I did not understand what you mean by "actor loss by the cost from off-policy data".
2-) To be honest, I have not inspected how lambda changes, but with a small modification to my code you can also inspect the lambda value. The reason for doing max(0, lambda) in Lagrangian optimization is to keep lambda non-negative, but again I have to work on it to give you a proper answer. These days I am busy with other stuff.
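To make the max(0, lambda) step concrete, here is a minimal sketch of a projected dual-ascent update. It is not taken from the sac-lagrangian code; the names update_lambda, avg_cost, cost_limit and lr are assumptions chosen for illustration.

```python
def update_lambda(lam, avg_cost, cost_limit, lr=0.01):
    """One projected gradient-ascent step on the Lagrange multiplier.

    All names here are hypothetical; this is a sketch, not the repo's code.
    """
    # Dual ascent: lambda grows when the average cost exceeds the limit
    # and shrinks when the constraint is satisfied.
    lam = lam + lr * (avg_cost - cost_limit)
    # Projection: a multiplier for an inequality constraint must stay >= 0.
    return max(0.0, lam)


if __name__ == "__main__":
    lam = 0.5
    # Two episodes above the limit of 25, then two below it.
    for avg_cost in [30.0, 30.0, 10.0, 10.0]:
        lam = update_lambda(lam, avg_cost, cost_limit=25.0, lr=0.1)
        print(f"avg_cost={avg_cost:5.1f}  lambda={lam:.3f}")
```

Printing lambda at each update like this is also one simple way to inspect how it changes during training.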
from sac-lagrangian.
Thank you so much for your reply.
Maybe "constructing the critic loss from the cost in off-policy data" is a better way to put it.
'The cost' is the 'ci' at each step (i is the constraint index).
Let me rephrase. The reason I say that is that we currently adopt an extra critic network to estimate the safety value, so the actor loss is alpha*log(pi) - Q_critic + Q_safety, and there are two critic losses (one for Q_critic and one for Q_safety; each is the distance between the Q prediction and the real Q value computed from sampled data).
However, I think Q_safety could perhaps be replaced by the cost we have already collected, which means there is no Q_safety in the actor loss. Instead, safety is accounted for in the critic loss, which becomes the distance between the network prediction and (a real Q value minus lambda*cost); the latter depends only on the sampled data. This is just like SAC with automatic temperature adjustment, which adjusts alpha without an extra network.
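The two variants described above can be sketched side by side on scalar values. This is only an illustration under assumptions: the function names, the lam weight on Q_safety, and the soft Bellman target form are mine, not taken from the sac-lagrangian code.

```python
def actor_loss_two_critics(alpha, log_pi, q_critic, q_safety, lam=1.0):
    """Current variant: separate reward and safety critics.

    lam=1.0 reproduces alpha*log(pi) - Q_critic + Q_safety; weighting
    Q_safety by lambda is an assumption added for illustration.
    """
    return alpha * log_pi - q_critic + lam * q_safety


def critic_target_shaped(reward, cost, lam, gamma, next_q,
                         alpha, next_log_pi, done):
    """Proposed variant: fold the sampled cost into the critic target."""
    shaped_reward = reward - lam * cost           # penalize the sampled cost
    soft_next_value = next_q - alpha * next_log_pi  # SAC-style soft value
    return shaped_reward + gamma * (1.0 - done) * soft_next_value


def actor_loss_single_critic(alpha, log_pi, q_shaped):
    """Proposed variant: the actor sees only the shaped critic."""
    return alpha * log_pi - q_shaped


if __name__ == "__main__":
    target = critic_target_shaped(reward=1.0, cost=0.5, lam=2.0, gamma=0.99,
                                  next_q=3.0, alpha=0.2, next_log_pi=-1.0,
                                  done=0.0)
    print("shaped critic target:", target)
    print("single-critic actor loss:",
          actor_loss_single_critic(alpha=0.2, log_pi=-1.0, q_shaped=target))
```

One point to keep in mind with the shaped target: lambda still has to be updated from some estimate of the expected cost (for example, episode costs from sampled data), and every change in lambda shifts the target the single critic is trying to fit.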
I am new to safe RL and hope to receive your suggestions.