dtqn's Introduction

Hi there 👋, I'm Kevin Esslinger! But you can call me Kev

Want to chat? Feel free to send me a message 📬, a follow 🐦, an invitation to connect 📥, or an email 📧! (Or all of the above)

Bio

  • 👨‍💻 I'm working as a Graduate Software Developer at Flow Traders in Amsterdam, The Netherlands.
  • 🎓 Master of Science in Computer Science from the Khoury College of Computer Sciences at Northeastern University in Boston, Massachusetts. Graduated December 2022. Co-advised by Chris Amato and Robert Platt; worked on using transformers to solve challenging, partially observable tasks with reinforcement learning.
  • 🎓 Bachelor of Science in Computer Science and Mathematics with a minor in Data Science from Temple University in Philadelphia, Pennsylvania. Graduated Summa Cum Laude in May 2020.
  • 🦅 Achieved the rank of Eagle Scout.

My favourite technologies:

  • Programming languages: Python, Java, Go
  • Text editors: Visual Studio Code, Emacs
  • Machine learning libraries: PyTorch, TensorFlow, Keras, NumPy, Pandas
  • Operating system: Linux, Ubuntu

News

  • 📓 My first paper, Deep Transformer Q-Networks for Partially Observable Reinforcement Learning, was published at the NeurIPS 2022 Workshop on Foundation Models for Decision Making! It's publicly available on arXiv. The code for the paper is available at my DTQN repo.

More about me

  • 🚴‍♂️ Casual city biker
  • 🎮 Teamfight Tactics and Magic: the Gathering player
  • 🏀 76ers fan
  • 🏈 Ravens and Eagles fan
  • ☕ Latte art amateur
  • 📜 Check out my resume here

dtqn's People

Contributors

kevslinger, lyu-xg

dtqn's Issues

Getting an error stating environment D does not exist

I tried to run the basic version of the code, python run.py, without installing any of the additional packages, and got this error:

WARNING: ``gym_gridverse`` is not installed. This means you cannot run an experiment with the `gv_*` domains.
WARNING: ``gym_gridverse`` is not installed. This means you cannot run an experiment with the gv_*.yaml domains.
WARNING: ``gym_pomdps`` is not installed. This means you cannot run an experiment with the HeavenHell or Hallway domain. 
WARNING: ``mini_hack`` is not installed. This means you cannot run an experiment with any of the MH- domains.
Loading using gym.make
Environment with id D not found.
Loading using YAML
Traceback (most recent call last):
  File "/...../DTQN-main/utils/env_processing.py", line 34, in make_env
    env = gym.make(id_or_path)
  File "/......./lib/python3.10/site-packages/gym/envs/registration.py", line 569, in make
    _check_version_exists(ns, name, version)
  File "/......./lib/python3.10/site-packages/gym/envs/registration.py", line 219, in _check_version_exists
    _check_name_exists(ns, name)
  File "/....../lib/python3.10/site-packages/gym/envs/registration.py", line 197, in _check_name_exists
    raise error.NameNotFound(
gym.error.NameNotFound: Environment D doesn't exist. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/....../DTQN-main/run.py", line 533, in <module>
    run_experiment(get_args())
  File "/......./DTQN-main/run.py", line 415, in run_experiment
    envs.append(env_processing.make_env(env_str))
  File "/....../DTQN-main/utils/env_processing.py", line 39, in make_env
    inner_env = factory_env_from_yaml(
NameError: name 'factory_env_from_yaml' is not defined

After installing gridverse, I ran python3 run.py --envs DiscreteCarFlags-v0 --device mps, and it ran successfully.
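
The traceback above looks up an environment named `D`, which is the first letter of `DiscreteCarFlags-v0`. One plausible explanation, which nothing on this page confirms, is that the default for `--envs` is a plain string while the code loops over it as if it were a list. The sketch below reproduces that pattern with a hypothetical parser; `build_parser` and its defaults are assumptions for illustration, not code taken from run.py.

```python
import argparse


def build_parser(default):
    # Hypothetical parser, not the one in run.py: --envs accepts one or more ids.
    parser = argparse.ArgumentParser()
    parser.add_argument("--envs", type=str, nargs="+", default=default)
    return parser


# With a bare string default, argparse hands the string back unchanged,
# so looping over it yields single characters.
args = build_parser("DiscreteCarFlags-v0").parse_args([])
string_default_ids = [env_str for env_str in args.envs]

# With a list default, the loop yields whole environment ids.
args = build_parser(["DiscreteCarFlags-v0"]).parse_args([])
list_default_ids = [env_str for env_str in args.envs]

# Passing the flag explicitly produces a list regardless of the default.
args = build_parser("DiscreteCarFlags-v0").parse_args(
    ["--envs", "DiscreteCarFlags-v0"]
)
explicit_ids = [env_str for env_str in args.envs]

print(string_default_ids[0])
print(list_default_ids)
print(explicit_ids)
```

If this is what is happening, it would also be consistent with the run succeeding once `--envs DiscreteCarFlags-v0` is passed explicitly.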

I'm trying to connect the DTQN network with SUMO. What are the inputs of DTQN?

Hello, again.
As I mentioned in the title, I'm trying to connect the DTQN network with SUMO as an environment, and I want to know what the inputs of DTQN are.
From reading the paper, I think the inputs are observations.
But I don't know their format.
I'm struggling to find the input format of DTQN.
Which Python file should I check? Could you please tell me where it is?
Furthermore, I'm trying to use sequences of numbers as states, e.g. the relative distance between the agent and other cars, the agent's velocity, and the current lane number.
I wonder whether the DTQN architecture can handle this kind of state as an observation.
DQN's inputs seem to be sequential images. Are DTQN's inputs sequential images too?
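
If the inputs are a sequence of recent observations, as the question supposes, then numeric features like these could be packed into one flat vector per time step and kept in a fixed-length window. The sketch below shows only that idea in plain Python. `make_observation`, `ObservationHistory`, the feature order, the sample values, and the zero padding are all hypothetical choices, not the repository's interface, and this page does not say which file defines the real input format.

```python
from collections import deque


def make_observation(rel_distance, velocity, lane):
    # Hypothetical feature layout: one flat numeric vector per time step.
    return [float(rel_distance), float(velocity), float(lane)]


class ObservationHistory:
    """Keeps the most recent `context_len` observations, zero-padded at the front."""

    def __init__(self, context_len, obs_dim):
        self.obs_dim = obs_dim
        padding = [[0.0] * obs_dim for _ in range(context_len)]
        # A deque with maxlen drops the oldest entry on every append.
        self.buffer = deque(padding, maxlen=context_len)

    def add(self, obs):
        if len(obs) != self.obs_dim:
            raise ValueError("observation has the wrong dimension")
        self.buffer.append(list(obs))

    def as_sequence(self):
        # Nested list of shape (context_len, obs_dim), oldest step first.
        return [list(obs) for obs in self.buffer]


# Made-up sample values, only to show the shape of the result.
history = ObservationHistory(context_len=4, obs_dim=3)
history.add(make_observation(12.5, 8.0, 1))
history.add(make_observation(11.0, 8.5, 1))
sequence = history.as_sequence()
print(sequence)
```

In an actual environment wrapper, each call to the environment's step function would append one such vector, and the whole window would be handed to the network.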

Error run.py

(DTQN) mds@mds:~/DTQN$ python -u "/home/mds/DTQN/run.py"
Loading using gym.make
Environment with id D not found.
Loading using YAML
Traceback (most recent call last):
  File "/home/mds/DTQN/utils/env_processing.py", line 34, in make_env
    env = gym.make(id_or_path)
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym/envs/registration.py", line 142, in make
    return registry.make(id, **kwargs)
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym/envs/registration.py", line 86, in make
    spec = self.spec(path)
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym/envs/registration.py", line 115, in spec
    raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))
gym.error.Error: Attempted to look up malformed environment ID: b'D'. (Currently all IDs must be of the form ^(?:[\w:-]+/)?([\w:.-]+)-v(\d+)$.)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mds/DTQN/run.py", line 533, in <module>
    run_experiment(get_args())
  File "/home/mds/DTQN/run.py", line 415, in run_experiment
    envs.append(env_processing.make_env(env_str))
  File "/home/mds/DTQN/utils/env_processing.py", line 39, in make_env
    inner_env = factory_env_from_yaml(
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym_gridverse/envs/yaml/factory.py", line 243, in factory_env_from_yaml
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/mds/DTQN/envs/gridverse/D'

When I executed run.py, I got the error above. I think I installed everything correctly, but I might have installed it in the wrong path. I don't understand where envs/gridverse/D comes from.
Please help me run run.py.

Question about transformer with DQN

Hi, Kev
I'm glad to have come across your work on DTQN.
I am very curious why there is so little work combining Transformers and DQN, even though both technologies appeared quite early.
I expected there to be a lot of work in this area, but there is not.
As far as I know, there may be just one paper before yours, 'TRANSFORMER BASED REINFORCEMENT LEARNING FOR GAMES', and in that paper DTQN does not perform as well as DRQN.
Do you have any insight into this?

Reproduction of GridVerse results

Hi!

Thanks for sharing your intriguing ideas on how to set up a transformer-based memory DRL algorithm. I'm interested in how the interface of the transformer works during inference and optimization. So I started out by simply trying to reproduce your GridVerse results as stated in your readme.

I'm currently running 3 repetitions of this experiment:

python run.py --env gv_memory.7x7.yaml --inembed 128 --disable-wandb --verbose

The success rate stays zero for the entire training so far.

[ December 15, 14:00:03 ] Training Steps: 699000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.77
[ December 15, 14:00:23 ] Training Steps: 700000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.78
[ December 15, 14:00:42 ] Training Steps: 701000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.78
[ December 15, 14:01:02 ] Training Steps: 702000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.79

I'm pretty sure I missed something. It would be great if you could help.

edit:
Training on 5x5 looks pretty volatile in comparison to the reported results.

[ December 15, 14:10:52 ] Training Steps: 891000, Success Rate: 0.30, Return: -2.24, Episode Length: 4.90, Hours: 3.92
[ December 15, 14:11:04 ] Training Steps: 892000, Success Rate: 0.70, Return: 1.78, Episode Length: 4.40, Hours: 3.92
[ December 15, 14:11:17 ] Training Steps: 893000, Success Rate: 0.40, Return: -1.20, Episode Length: 4.00, Hours: 3.92
[ December 15, 14:11:30 ] Training Steps: 894000, Success Rate: 0.70, Return: -0.42, Episode Length: 48.40, Hours: 3.93
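
To put a number on "volatile", the success rates in log lines like these can be parsed and summarised. The sketch below is one way to do that with the standard library, using the four 5x5 lines quoted above as input. The regular expression and the helper name `parse_line` are assumptions based only on the format visible in this excerpt.

```python
import re
import statistics

# The four 5x5 log lines quoted in this issue.
LOG_LINES = [
    "[ December 15, 14:10:52 ] Training Steps: 891000, Success Rate: 0.30, Return: -2.24, Episode Length: 4.90, Hours: 3.92",
    "[ December 15, 14:11:04 ] Training Steps: 892000, Success Rate: 0.70, Return: 1.78, Episode Length: 4.40, Hours: 3.92",
    "[ December 15, 14:11:17 ] Training Steps: 893000, Success Rate: 0.40, Return: -1.20, Episode Length: 4.00, Hours: 3.92",
    "[ December 15, 14:11:30 ] Training Steps: 894000, Success Rate: 0.70, Return: -0.42, Episode Length: 48.40, Hours: 3.93",
]

# Pattern inferred from the lines above; other log formats would need changes.
PATTERN = re.compile(
    r"Training Steps: (\d+), Success Rate: ([\d.]+), Return: (-?[\d.]+)"
)


def parse_line(line):
    """Return (steps, success_rate, episode_return), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


records = [rec for rec in map(parse_line, LOG_LINES) if rec is not None]
success_rates = [rec[1] for rec in records]

# Mean and population standard deviation over the parsed evaluations.
mean_success = statistics.mean(success_rates)
spread = statistics.pstdev(success_rates)
print(f"evaluations={len(records)} mean={mean_success:.3f} stdev={spread:.3f}")
```

Four evaluations are far too few to draw conclusions from; running the same summary over the full log would show whether the swings persist across training.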
