agreeablepdsignals's Introduction

This is a small project that tries to get experienced agents to use "social approval" signals to train inexperienced agents.

Agents learn to value approval from the different agents in their surroundings, and this serves to form groups.
Hopefully, humans will be able to step in and play the role of these social approval instructors.

Halfway through, I'll try to integrate this with minimal symbolic communication and OORL.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Initial project: agents that choose whom to play Prisoner's Dilemma with, given prior experiences.

Everyone goes in a circle, each agent picking its opponent for a round of Prisoner's Dilemma. The rewards are such that you want to be picked: payoffs inside a game are positive, and everyone not in the game that round gets 0.
Basically, there are two games going on: which opponent to pick when it's your turn, and how to play within the round.
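
As a rough sketch of the loop described above, one trip around the circle could look like the following. The payoff numbers, the function name `play_cycle`, and the two callback signatures are assumptions made for illustration; the only constraints taken from the description are that in-game payoffs are positive and that sitting out pays 0.

```python
# Hypothetical payoffs (first player, second player). All are positive, so
# being picked always beats sitting out, which pays 0.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (1, 5),
    ("D", "C"): (5, 1),
    ("D", "D"): (2, 2),
}


def play_cycle(n_agents, choose_opponent, choose_action):
    """Run one trip around the circle and return each agent's total reward.

    choose_opponent(picker) -> index of the agent the picker wants to play.
    choose_action(me, other) -> "C" or "D" for the in-game decision.
    """
    totals = [0.0] * n_agents
    for picker in range(n_agents):
        opponent = choose_opponent(picker)
        if opponent == picker:
            raise ValueError("an agent cannot pick itself")
        a = choose_action(picker, opponent)
        b = choose_action(opponent, picker)
        r_picker, r_opponent = PAYOFFS[(a, b)]
        totals[picker] += r_picker
        totals[opponent] += r_opponent
        # Agents outside this game receive 0, so nothing is added for them.
    return totals


if __name__ == "__main__":
    # Everyone picks the next agent in the circle and cooperates.
    print(play_cycle(4, lambda i: (i + 1) % 4, lambda me, other: "C"))
    # prints [6.0, 6.0, 6.0, 6.0]
```

Passing the two decisions as separate callbacks mirrors the "two games" framing: one policy for picking an opponent, one for playing the round.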

Try simple Q-learning first; build models (including the "new" moniker) second.
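
For reference, a minimal tabular Q-learning agent might look like this. It is a generic textbook sketch, not the project's implementation: the class name `QLearner`, the hyperparameter defaults, and the epsilon-greedy exploration are all assumptions. Since the two games have different action sets, one option would be to keep one such learner per phase.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with epsilon-greedy action selection (sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

States can be any hashable value, for example a (phase, opponent ID) pair.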

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Structure:

Environment variable: the current phase, either "in-game" or "choosing player."

"In-game" will also show the User-ID of the opponent, so the agent can learn different strategies based on different users.
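
One way to represent this observation is a small immutable record, sketched below. The names `Phase` and `Observation` and the integer User-ID are assumptions, not part of the project. Making the record hashable lets it serve directly as a key in a tabular Q-function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    CHOOSING_PLAYER = "choosing player"
    IN_GAME = "in-game"


@dataclass(frozen=True)
class Observation:
    """What the agent sees: the phase, plus the opponent's ID while in-game."""

    phase: Phase
    opponent_id: Optional[int] = None  # hypothetical integer User-ID

    def __post_init__(self):
        # The opponent's ID is shown only during the in-game phase.
        if self.phase is Phase.IN_GAME and self.opponent_id is None:
            raise ValueError("in-game observations need an opponent_id")
        if self.phase is Phase.CHOOSING_PLAYER and self.opponent_id is not None:
            raise ValueError("no opponent is known while choosing a player")
```

For example, `Observation(Phase.IN_GAME, opponent_id=3)` and `Observation(Phase.CHOOSING_PLAYER)` are both valid, and two observations with equal fields compare and hash as equal.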

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prediction


Most likely, we'll get tons of defections initially, until I train the agents against ready-made "dummy agents" which always defect or always approve.

Another idea worth exploring is reacting to a "new agent," where the "new" label itself is transferable from one unfamiliar agent to the next.

That way, the agent develops a robust policy for treating "new" agents.
An additional complication can be "Group ID" or "Queue ID," where the agent can gain experience about reputation in different groups and learn to be 
agreeable initially, even if no one from the previous group was willing to engage with it. You'll need a low gamma to make this work, of course, and 
likely some additional dummy agents.
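
A possible way to make the "new" label transferable is to map every opponent with too little history onto one shared key, so that whatever is learned about one newcomer applies to the next. The sketch below assumes a simple encounter-count threshold; the class `OpponentLabeler`, the `min_encounters` parameter, and the threshold rule are illustrative guesses, since the notes above do not say how the label is assigned.

```python
from collections import Counter

NEW = "new"  # shared label for opponents the agent has barely met


class OpponentLabeler:
    """Map raw opponent IDs to Q-table keys, pooling unfamiliar ones (sketch)."""

    def __init__(self, min_encounters=3):
        # Hypothetical threshold: below it, an opponent still counts as "new".
        self.min_encounters = min_encounters
        self.encounters = Counter()

    def label(self, opponent_id):
        """Return the key to use in place of the raw opponent ID."""
        if self.encounters[opponent_id] < self.min_encounters:
            return NEW
        return opponent_id

    def record(self, opponent_id):
        """Count one finished game against this opponent."""
        self.encounters[opponent_id] += 1
```

A Group ID or Queue ID could be folded in the same way, for instance by counting encounters per (group, opponent) pair instead of per opponent.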

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Further Work

Learning to be agreeable in a social setting is a little artificial, and very doable. It's really not what I'm setting out to do, though; 
after that, I will be able to expose the expected-Q-value to different agents (no "conscious" signals yet), and they can react differently to 
"friendly" and "unfriendly" agents.

Then that signal, or something like it, can be used by skilled agents to instruct unskilled agents.

After that, we can start "manipulation" games by giving control over that signal away.
Discrete symbols and instructions can be added after that. Or, more interestingly, model-based symbolic instructions. 
But that is far away.

For now, focus on the PD setting.
