
loa's Introduction

Logical Optimal Actions

Logical Optimal Actions (LOA) is an action-decision architecture for reinforcement learning applications built on a neuro-symbolic framework, i.e., a combination of a neural network and a symbolic knowledge-acquisition approach, for natural language interaction games. This repository contains an implementation of the LOA experiments as a Python package on the TextWorld Commonsense (TWC) game.
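At a high level, the agent repeatedly turns a text observation into logical facts and then evaluates the admissible actions against learned logical rules. The sketch below is a minimal illustration of that loop, not the repository's actual API; extract_facts, LogicalPolicy, and select_action are hypothetical names.

# Hypothetical sketch of the LOA decision loop (illustrative only).
def extract_facts(observation):
    """Stand-in for LOA's AMR-based semantic parsing of the observation."""
    return {("at", "player", "kitchen")}  # e.g. the fact at(player, kitchen)

class LogicalPolicy:
    """Stand-in for the trained neuro-symbolic policy."""
    def score(self, facts, action):
        # A real policy evaluates learned logical rules over the facts;
        # this dummy just counts fact arguments mentioned in the action text.
        return sum(1 for _, *args in facts for a in args if a in action)

def select_action(policy, observation, admissible_actions):
    facts = extract_facts(observation)
    # Choose the admissible action with the highest logical evaluation.
    return max(admissible_actions, key=lambda act: policy.score(facts, act))

policy = LogicalPolicy()
print(select_action(policy, "You are in the kitchen.", ["go east", "take apple"]))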

Setup

  • Anaconda 4.10.3
  • Tested on macOS and Linux
git clone --recursive git@github.com:IBM/LOA.git loa
cd loa


# Setup games
git clone git@github.com:IBM/commonsense-rl.git
cp -r commonsense-rl/games ./
rm -rf commonsense-rl


# Setup environment
conda create -n loa python=3.8
conda activate loa
conda install pytorch=1.10.0 torchvision torchaudio nltk=3.6.3 -c pytorch
pip install -r requirements.txt
python -m spacy download en

cd third_party/amr-cslogic
# Execute the installation scripts in INSTALLATION.md to set up AMR-CSLogic
export FLASK_APP=./amr_verbnet_semantics/web_app/__init__.py
python -m flask run --host=0.0.0.0 --port 5000 &
cd ../../
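# Optional sanity check that the server is up. Assumption: the
# /amr_parsing endpoint below is inferred from the server logs quoted in
# the issues section, not from official documentation.
curl "http://localhost:5000/amr_parsing?text=You+are+in+the+kitchen."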

# If you don't want to run the AMR server, download the pre-built parse cache instead:
mkdir -p cache
wget -O cache/amr_cache.pkl https://ibm.box.com/shared/static/klsvx54skc5wlf35qg3klo35ex25dbb0.pkl
# Note: this cache only contains sentences for the "easy" game, which is the default in train.py
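If you want to see what the downloaded cache covers, a quick inspection along these lines should work. Assumption: judging by the training logs quoted in the issues below ("Loaded cache from ./cache/amr_cache.pkl len: 241"), the file is a pickled mapping keyed by sentence.

# inspect_cache.py -- hypothetical helper, not part of the repository
import pickle

with open("cache/amr_cache.pkl", "rb") as f:
    cache = pickle.load(f)

print(len(cache), "cached sentences")  # the issue logs report len: 241
for sentence in list(cache)[:5]:       # peek at a few cached keys
    print(repr(sentence))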

Train and Test

python train.py

# If you have the AMR server running
python train.py --amr_server_ip localhost --amr_server_port 5000
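# The tracebacks in the issues below reference `args.difficulty_level`,
# so train.py appears to accept a difficulty option as well. The flag
# name here is an assumption; confirm it with `python train.py --help`.
python train.py --difficulty_level easy --amr_server_ip localhost --amr_server_port 5000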

Citations

This repository provides code for the following paper. Please cite the paper (and give the repository a star) if you find the paper and code useful for your work.

  • Daiki Kimura, Subhajit Chaudhury, Masaki Ono, Michiaki Tatsubori, Don Joven Agravante, Asim Munawar, Akifumi Wachi, Ryosuke Kohita, and Alexander Gray, "LOA: Logical Optimal Actions for Text-based Interaction Games", ACL-IJCNLP 2021.

    Details and bibtex

    The paper presents an initial demonstration of logical optimal actions (LOA) on TextWorld (TW) Coin-Collector, TW Cooking, TW Commonsense, and Jericho. In this version, a human player can select an action by hand or from the recommended action list produced by LOA, and the acquired knowledge is visualized to improve the interpretability of the trained rules.

    @inproceedings{kimura-etal-2021-loa,
        title = "{LOA}: Logical Optimal Actions for Text-based Interaction Games",
        author = "Kimura, Daiki  and  Chaudhury, Subhajit  and  Ono, Masaki  and  Tatsubori, Michiaki  and  Agravante, Don Joven  and  Munawar, Asim  and  Wachi, Akifumi  and  Kohita, Ryosuke  and  Gray, Alexander",
        booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
        month = aug,
        year = "2021",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2021.acl-demo.27",
        doi = "10.18653/v1/2021.acl-demo.27",
        pages = "227--231"
    }
    

Applications for LOA

  • Subhajit Chaudhury, Sarathkrishna Swaminathan, Daiki Kimura, Prithviraj Sen, Keerthiram Murugesan, Rosario Uceda-Sosa, Michiaki Tatsubori, Achille Fokoue, Pavan Kapanipathi, Asim Munawar and Alexander Gray, "Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning", ACL 2023.

    Details and bibtex

    Text-based reinforcement learning agents have predominantly been neural network-based models with embeddings-based representation, learning uninterpretable policies that often do not generalize well to unseen games. On the other hand, neuro-symbolic methods, specifically those that leverage an intermediate formal representation, are gaining significant attention in language understanding tasks. This is because of advantages such as inherent interpretability, a smaller requirement for training data, and better generalization in scenarios with unseen data. Therefore, in this paper, we propose a modular, NEuro-Symbolic Textual Agent (NESTA) that combines a generic semantic parser with a rule induction system to learn abstract interpretable rules as policies. Our experiments on established text-based game benchmarks show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.

  • Daiki Kimura, Masaki Ono, Subhajit Chaudhury, Ryosuke Kohita, Akifumi Wachi, Don Joven Agravante, Michiaki Tatsubori, Asim Munawar, and Alexander Gray, "Neuro-Symbolic Reinforcement Learning with First-Order Logic", EMNLP 2021.

    Details and bibtex

    The paper presents an initial experiment with LOA, extracting first-order logical facts from the text observation and an external word-meaning network on TextWorld Coin-Collector. The experimental results show that RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods on a TextWorld benchmark.

    @inproceedings{kimura-etal-2021-neuro,
        title = "Neuro-Symbolic Reinforcement Learning with First-Order Logic",
        author = "Kimura, Daiki  and  Ono, Masaki  and  Chaudhury, Subhajit  and  Kohita, Ryosuke  and  Wachi, Akifumi  and  Agravante, Don Joven  and  Tatsubori, Michiaki  and  Munawar, Asim  and  Gray, Alexander",
        booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
        month = nov,
        year = "2021",
        address = "Online and Punta Cana, Dominican Republic",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2021.emnlp-main.283",
        pages = "3505--3511"
    }
    
  • Subhajit Chaudhury, Prithviraj Sen, Masaki Ono, Daiki Kimura, Michiaki Tatsubori, and Asim Munawar, "Neuro-symbolic Approaches for Text-Based Reinforcement Learning", EMNLP 2021.

    Details and bibtex

    The paper presents the SymboLic Action policy for Textual Environments (SLATE) method, which follows the same concept as LOA. The method outperforms previous state-of-the-art methods on the Coin-Collector game while using 5-10x fewer training games.

    @inproceedings{chaudhury-etal-2021-neuro,
        title = "Neuro-Symbolic Approaches for Text-Based Policy Learning",
        author = "Chaudhury, Subhajit  and  Sen, Prithviraj  and  Ono, Masaki  and  Kimura, Daiki  and  Tatsubori, Michiaki  and  Munawar, Asim",
        booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
        month = nov,
        year = "2021",
        address = "Online and Punta Cana, Dominican Republic",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2021.emnlp-main.245",
        pages = "3073--3078"
    }
    
  • Sarathkrishna Swaminathan, Dmitry Zubarev, Subhajit Chaudhury, and Asim Munawar, "Reinforcement Learning with Logical Action-Aware Features for Polymer Discovery", Reinforcement Learning for Real Life Workshop 2021.

    Details and bibtex

    The paper presents the first application of reinforcement learning in the materials discovery domain that explicitly considers the logical structure of the interactions between the RL agent and the environment.

    @conference{swaminathan-etal-2021-reinforcement,
        title = "Reinforcement Learning with Logical Action-Aware Features for Polymer Discovery",
        author = "Swaminathan, Sarathkrishna  and  Zubarev, Dmitry  and  Chaudhury, Subhajit  and  Munawar, Asim",
        booktitle = "Reinforcement Learning for Real Life Workshop",
        year = "2021"
    }
    

License

MIT License

loa's People

Contributors

daiki-kimura, stevemar, tatsubori


loa's Issues

about INSTALLATION.md

Hi! In third_party/amr-cslogic/INSTALLATION.md, the author explains how to download the AMR parsing models from CCC, but despite searching extensively I couldn't find the website for CCC. Could you tell me the specific website of CCC?

Error in requirements.txt?

First off, thanks for your efforts in putting this work out there!

I was following the installation instructions and hit this error:

AttributeError: module 'cdd' has no attribute 'Matrix'

Digging into this, I believe your requirements.txt is meant to list the pycddlib PyPI package rather than cdd. I didn't open a PR because I don't know the version you're actually using -- there is no pycddlib==0.1.4.

Thank you!
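A workaround along the lines the reporter suggests would be to swap the package name and install it unpinned, since the correct pycddlib version is unknown; the sed pattern below assumes the offending line starts with cdd.

sed -i 's/^cdd.*/pycddlib/' requirements.txt
pip install -r requirements.txt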

Unable to install pycddlib

I have tried to install pycddlib in the requirements.txt file but am running into an error

Installing collected packages: pycddlib
  Running setup.py install for pycddlib ... error
  error: subprocess-exited-with-error
  
  × Running setup.py install for pycddlib did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      /home/vboxuser/anaconda3/envs/loa/lib/python3.8/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
        warnings.warn(
      running install
      /home/vboxuser/anaconda3/envs/loa/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_ext
      skipping 'cdd.c' Cython extension (up-to-date)
      building 'cdd' extension
      creating build
      creating build/temp.linux-x86_64-cpython-38
      creating build/temp.linux-x86_64-cpython-38/cddlib
      creating build/temp.linux-x86_64-cpython-38/cddlib/lib-src
      gcc -pthread -B /home/vboxuser/anaconda3/envs/loa/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DGMPRATIONAL -Icddlib/lib-src -I/home/vboxuser/anaconda3/envs/loa/include/python3.8 -c cdd.c -o build/temp.linux-x86_64-cpython-38/cdd.o
      In file included from cddlib/lib-src/cdd.h:17,
                       from cdd.c:601:
      cddlib/lib-src/cddmp.h:30:11: fatal error: gmp.h: No such file or directory
         30 |  #include "gmp.h"
            |           ^~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> pycddlib

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

I'm unsure what I am missing here; kindly suggest a solution. Thank you :)
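The "fatal error: gmp.h: No such file or directory" above indicates the GMP development headers are missing from the build environment; installing them before retrying usually resolves this class of pycddlib build failure.

sudo apt-get install libgmp-dev     # Debian/Ubuntu
# or, inside the conda environment:
conda install -c conda-forge gmp
pip install pycddlib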

TWC_HOME

Thanks for releasing your code! I have followed the instructions and get the following error when I run it (probably just some missing steps in the instructions?):

(loa) peter@neutronium:~/github/LOA-ScienceWorld$ python train.py
Could not find TWC_HOME. Using default path...
Could not find DDLNN_HOME. Using default path...
Traceback (most recent call last):
  File "train.py", line 78, in <module>
    loa_agent.obtain_admissible_verb(
  File "/home/peter/github/LOA-ScienceWorld/loa_agent.py", line 200, in obtain_admissible_verb
    LogicalTWCQuantifier(difficulty_level,
  File "/home/peter/github/LOA-ScienceWorld/logical_twc.py", line 173, in __init__
    load_twc_game(difficulty_level,
  File "/home/peter/github/LOA-ScienceWorld/logical_twc.py", line 88, in load_twc_game
    game_file_names = game_file_names[game_no]
IndexError: list index out of range

Also, am I correct in that the AMR processor is required to run this model on different environments? Is it possible to get access to it?
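For context, an empty game_file_names list at logical_twc.py line 88 would produce exactly this IndexError, so one plausible cause (an assumption, not a confirmed diagnosis) is that the ./games directory from the "Setup games" step above is missing or empty:

ls games/   # should list the TWC game files copied from commonsense-rl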

Broken AMR?

I do not have access to the provided AMR-CSLogic link found in the setup section; however, I noted that this repo is available: https://github.com/IBM/AMR-CSLogic. Nonetheless, whether I use that repo or the cache .pkl file provided in the setup section, the training doesn't work.

When using the cached .pkl provided, I get the following error at line 142 in amr_parser.py:

Exception: Need the AMR server for "On the dark carpet you can make out a clean red dress."

If I instead run the Flask server with the other AMR-CSLogic repo linked above, an exception occurs in the server when it tries to process the same sentence (i.e., "On the dark carpet you can make out a clean red dress."):

loa/third_party/amr-cslogic/amr_verbnet_semantics/service/amr.py", line 25, in _get_parser
    from transition_amr_parser.parse import AMRParser
ModuleNotFoundError: No module named 'transition_amr_parser'
500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
127.0.0.1 - - [14/Jan/2022 16:26:40] "GET /amr_parsing?text=On+the+dark+carpet+you+can+make+out+a+clean+red+dress. HTTP/1.1" 500 -

What is the fundamental purpose of the AMR, and can these issues be avoided? Further, it seems that this AMR server only applies to the TWC game; what about other TextWorld games? How might we use this code for such games?

KeyError: 'amr_parse'

I'm not sure what this error means; kindly suggest a solution. I executed the recommended AMR-CSLogic installation scripts and am using the Flask app to run the model.

Found 32 observations
Mincount:  8
Loaded cache from ./cache/amr_cache.pkl len: 241
 47%|███████        | 15/32 [00:02<00:02,  5.69it/s]
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    loa_agent.extract_fact2logic(difficulty_level=args.difficulty_level,
  File "/home/vboxuser/loa/loa_agent.py", line 633, in extract_fact2logic
    get_verbnet_preds_from_obslist(
  File "/home/vboxuser/loa/amr_parser.py", line 60, in get_verbnet_preds_from_obslist
    rest_amr.obs2facts(obs_text,
  File "/home/vboxuser/loa/amr_parser.py", line 380, in obs2facts
    ret = self.text2amr(text)
  File "/home/vboxuser/loa/amr_parser.py", line 147, in text2amr
    self.cache[sent] = ret[self.json_key][0]
KeyError: 'amr_parse'
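This KeyError means the JSON returned by the AMR server had no 'amr_parse' field for that sentence, which typically happens when the server returns an error payload instead of a parse. A quick probe like the following shows what the server actually returns; the endpoint path is taken from the server logs quoted above, while the payload structure is an assumption.

# debug_amr.py -- hypothetical probe, not part of the repository
import json
import urllib.request
from urllib.parse import quote

sent = "You are in the kitchen."
url = "http://localhost:5000/amr_parsing?text=" + quote(sent)
with urllib.request.urlopen(url) as resp:
    payload = json.loads(resp.read())

print(list(payload.keys()))  # expect an 'amr_parse' key on success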

"python train.py" command fails without AMR server

Hi!
Thank you for releasing this repo!

I am attempting to train a LOA model by running the command:

python train.py

As I understand from the README, setting up the AMR server is not necessary for training; instead, we can use the .pkl file obtained by running the following two commands:

mkdir -p cache
wget -O cache/amr_cache.pkl https://ibm.box.com/shared/static/klsvx54skc5wlf35qg3klo35ex25dbb0.pkl 

However, attempting to train using the .pkl file results in an error with the message:

Exception: Need the AMR server for "On the desk you make out a clean red dress."

The full traceback of the error:

Found 32 observations
Mincount:  8
AMR is cache only mode
Loaded cache from ./cache/amr_cache.pkl len: 241
  3%|█████▏                                                                                                                                                                 | 1/32 [00:00<00:07,  4.24it/s]
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    loa_agent.extract_fact2logic(difficulty_level=args.difficulty_level,
  File "/home/sushanth/Desktop/eee587/loa/loa_agent.py", line 633, in extract_fact2logic
    get_verbnet_preds_from_obslist(
  File "/home/sushanth/Desktop/eee587/loa/amr_parser.py", line 60, in get_verbnet_preds_from_obslist
    rest_amr.obs2facts(obs_text,
  File "/home/sushanth/Desktop/eee587/loa/amr_parser.py", line 380, in obs2facts
    ret = self.text2amr(text)
  File "/home/sushanth/Desktop/eee587/loa/amr_parser.py", line 143, in text2amr
    raise Exception('Need the AMR server for "' + sent + '"')
Exception: Need the AMR server for "On the desk you make out a clean red dress."

Could you please let me know whether training is possible without setting up the AMR server? If it is, what might be the issue here?
Any insights you might have will be helpful.

Thanks!

PS: I was successfully able to run the nesa-demo repo and visualise the workings of a trained LOA model. So, I am sure the requirements are all good.
