miniwob-plusplus's Introduction

The MiniWoB++ (Mini World of Bits++) library contains a collection of over 100 web interaction environments, along with JavaScript and Python interfaces for programmatically interacting with them. The Python interface follows the Gymnasium API and uses Selenium WebDriver to perform actions on the web browser.

MiniWoB++ is an extension of the OpenAI MiniWoB benchmark, and was introduced in the paper Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration.

The documentation website is at miniwob.farama.org. Development is ongoing to bring MiniWoB++ up to the Farama Standards for mature projects, after which it will be maintained long term. See the Project Roadmap for more details. If you'd like to help out, you can join our Discord server: https://discord.gg/PfR7a79FpQ.

Installation

MiniWoB++ supports Python 3.8+ on Linux and macOS.

Installing the MiniWoB++ Library

To install the MiniWoB++ library, use pip install miniwob.

Installing Chrome/Chromium and ChromeDriver

We strongly recommend using Chrome or Chromium as the web browser, as other browsers may render the environments differently.

The MiniWoB++ Python interface uses Selenium, which interacts with the browser via the WebDriver API. Follow one of the documented methods to install ChromeDriver. The simplest is to download the ChromeDriver release whose version matches your browser, unzip it, and then add the directory containing the chromedriver executable to the PATH environment variable:

export PATH=$PATH:/path/to/chromedriver

For Chromium, the driver may also be available in a software package; for example, in Debian/Ubuntu:

sudo apt install chromium-driver
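
Before launching an environment, it can help to confirm that the driver is actually discoverable. The following is a minimal sketch, assuming the executable is named chromedriver; the helper find_chromedriver is hypothetical and not part of MiniWoB++.

```python
import shutil
from typing import Optional


def find_chromedriver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of the driver if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        # Assumption: without a driver on PATH, Selenium may not be able to start Chrome.
        print("chromedriver not found on PATH")
    else:
        print(f"Found chromedriver at {path}")
```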

Example Usage

The following code performs a deterministic action on the click-test-2 environment.

import time
import gymnasium
import miniwob
from miniwob.action import ActionTypes

gymnasium.register_envs(miniwob)

env = gymnasium.make('miniwob/click-test-2-v1', render_mode='human')

# Wrap the code in try-finally to ensure proper cleanup.
try:
  # Start a new episode.
  obs, info = env.reset()
  assert obs["utterance"] == "Click button ONE."
  assert obs["fields"] == (("target", "ONE"),)
  time.sleep(2)       # Only here to let you look at the environment.
  
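  # Find the HTML element with text "ONE".
  for element in obs["dom_elements"]:
    if element["text"] == "ONE":
      break
  else:
    # Without this guard, a failed search would silently click the last element.
    raise RuntimeError('No element with text "ONE" was found.')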

  # Click on the element.
  action = env.unwrapped.create_action(ActionTypes.CLICK_ELEMENT, ref=element["ref"])
  obs, reward, terminated, truncated, info = env.step(action)

  # Check if the action was correct.
  print(reward)      # Should be around 0.8 since 2 seconds have passed.
  assert terminated is True
  time.sleep(2)

finally:
  env.close()

See the documentation for more information.

Environments

The full list of environments included in the MiniWoB++ library can be found in the documentation. All environments share the same observation space, while the action space can be configured during environment construction.

Citation

To cite this project please use:

@inproceedings{liu2018reinforcement,
 author = {Evan Zheran Liu and Kelvin Guu and Panupong Pasupat and Tianlin Shi and Percy Liang},
 title = {Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration},
 booktitle = {International Conference on Learning Representations ({ICLR})},
 url = {https://arxiv.org/abs/1802.08802},
 year = {2018},
}

miniwob-plusplus's People

Contributors

elliottower, jjshoots, jkterry1, mgoulao, ppasupat, pseudo-rnd-thoughts, rodrigodelazcano, younik

miniwob-plusplus's Issues

Recording Your Own Demonstrations: Not Working

Hi! I'm an RL beginner and I tried to record a demonstration on click-test.html.
When I open the path file:///path/to/miniwob-plusplus/html/miniwob/click-test.html, the page is rendered correctly.
Then, as explained in the instructions, I created a directory named 'out' and launched the record.py script.

My problem is that when I add '?record=true' to the URL, I get 'ERR_FILE_NOT_FOUND' in my browser (Google Chrome).

Thank you in advance for your answer.

[Proposal] Implement missing actions

Proposal

Implement the missing actions.

Motivation

Some actions cannot be performed, preventing some environments from being solved:

  • Dragging
  • Scroll wheel
  • Emit keys (e.g., Ctrl+C)

Pitch

Implement such actions in Selenium.

Alternatives

Most of these actions look doable but could run into issues:

  • Getting observation in the middle of a mouse drag might cause issues.
  • Scroll wheel needs an arbitrary scroll offset.
  • Clipboard might not work or might interfere with the system clipboard (since we're not sandboxing the environment).

There are probably hacks for these issues (e.g., replace clipboard copy/paste with JS calls). The worst case scenario is that they cannot be done, in which case we might need to consider moving away from Selenium.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Proposal] Put task fields in the observation space.

Proposal

Put task fields in the observation space.

Motivation

Task fields are the fields extracted from the natural language instructions using RegExp. For example:

"Click on the ONE button." --> {"target": "ONE"}

Many previous works on MiniWoB++ use task fields in the action space (mostly for emit-text actions). Currently the task fields are extracted using ad-hoc methods (fields.py) and are not part of the observation space (they are in the infos dict).
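
As an illustration of the kind of extraction described above, here is a minimal sketch for the example utterance. The pattern and the function name extract_fields are hypothetical and are not the contents of fields.py; each task would need its own pattern.

```python
import re
from typing import Dict

# Hypothetical pattern for a single task, written for this example only.
CLICK_BUTTON_PATTERN = re.compile(r"Click on the (?P<target>.+) button\.")


def extract_fields(utterance: str) -> Dict[str, str]:
    """Extract task fields from an utterance; return {} if nothing matches."""
    match = CLICK_BUTTON_PATTERN.fullmatch(utterance)
    return match.groupdict() if match else {}


print(extract_fields("Click on the ONE button."))  # {'target': 'ONE'}
```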

Pitch

  • Move the field extractors to the environment classes.
  • Include the fields (keys + values, with a fixed key ordering) as a sequence of fixed length to the observation space. The fixed length makes it easy for RL agents to predict. The padding entries can have empty strings as keys and values.
  • Remove the "dummy" field. It was a hack for the RL agent in the MiniWoB paper.
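
The fixed-length encoding in the second bullet could look roughly like the sketch below. The length of 4 and the choice of sorting by key as the "fixed key ordering" are assumptions made for illustration; pad_fields is a hypothetical helper.

```python
from typing import Dict, List, Tuple

MAX_FIELDS = 4  # hypothetical fixed length


def pad_fields(
    fields: Dict[str, str], length: int = MAX_FIELDS
) -> Tuple[Tuple[str, str], ...]:
    """Return (key, value) pairs sorted by key and padded with ("", "")."""
    pairs: List[Tuple[str, str]] = sorted(fields.items())
    if len(pairs) > length:
        raise ValueError(f"got {len(pairs)} fields, but the fixed length is {length}")
    padding = [("", "")] * (length - len(pairs))
    return tuple(pairs + padding)


print(pad_fields({"target": "ONE"}))
# (('target', 'ONE'), ('', ''), ('', ''), ('', ''))
```

Every observation then has the same shape, and an agent can recognize padding entries by their empty keys.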

Alternatives

  • Put the field extractors in the HTML/JS code.
  • Use a variable-length sequence for fields. This makes it difficult to numpify.

Checklist

  • I have checked that there is no similar issue in the repo (required)

Project Roadmap

This is a loose roadmap of our plans for major changes to MiniWoB++:

May:

  • Release:
    • PyPI release and GitHub CI for auto-release

June:

  • Documentation
    • Write a guide for authoring new environments.
  • Code
    • Handle Unicode properly.
    • Add FlightWoB tasks.

July:

  • Release:
    • Mature version release
  • Documentation
    • Document other properties of the environments such as reward function, time limit, etc.
  • Environments
    • Add the missing tasks.

TBD:

  • Code
    • Enhance the demonstration recorder.
    • Refactor Instance to support other backends than Selenium (e.g., X11).
  • Environments
    • Add more FlightWoB-style tasks.

Development is ongoing to bring MiniWoB++ up to the Farama Standards for mature projects, after which it will be maintained long term. To contribute, please read CONTRIBUTING.md and join our Discord server: https://discord.gg/PfR7a79FpQ.

[Proposal] Legacy code release for backward compatibility

Proposal

Create a release containing the legacy code (before migrating to Farama).

Motivation

A few projects depend on the old code structure. Examples include:

To prevent breaking such codebases, it would be nice to create a release containing the legacy MiniWoB++ code.

Pitch

Create a release at commit 833a477a8fbfbd2497e95fee019f76df2b9bd75e, which is right before the migration.

Alternatives

The release should have minimal effects on the current code development.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Bug Report] Missing FlightWoB environments

Describe the bug

Three FlightWoB environments are listed in the documentation under "Flight Search Tasks" but they are not implemented.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Proposal] Alternative action spaces

Proposal

Add a way to customize the action space.

Motivation

Previous works on MiniWoB++ use different action spaces:

Many RL methods work best on discrete action spaces, but discretization trades off against generality (e.g., binned cursor positions vs. arbitrary cursor positions).

Pitch

Add ActionSpaceConfig that allows the user to customize:

  • The list of action types to support
  • Whether the cursor position should be binned
  • The list of allowed keyboard keys
  • Whether the emitted text is freeform or selected from a task field
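
To make the four options concrete, here is a rough sketch of how such a config might be grouped. The class name ActionSpaceConfigSketch, its field names, the action type strings, and the screen size are all hypothetical choices for illustration, not the proposed or actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ActionSpaceConfigSketch:
    """Hypothetical container for the four options listed above."""

    action_types: Tuple[str, ...]           # action types to support
    coord_bins: Optional[Tuple[int, int]]   # (x bins, y bins); None means not binned
    allowed_keys: Tuple[str, ...] = ()      # allowed keyboard keys
    text_from_fields: bool = False          # True: text is selected from a task field

    def bin_cursor(
        self, x: float, y: float, width: float, height: float
    ) -> Tuple[int, int]:
        """Map a cursor position on a width x height screen to bin indices."""
        if self.coord_bins is None:
            raise ValueError("cursor positions are not binned in this config")
        nx, ny = self.coord_bins
        # Clamp so that a position on the far edge falls in the last bin.
        bx = min(int(x / width * nx), nx - 1)
        by = min(int(y / height * ny), ny - 1)
        return bx, by


config = ActionSpaceConfigSketch(
    action_types=("CLICK_COORDS", "TYPE_FIELD"),  # illustrative names
    coord_bins=(8, 8),
    text_from_fields=True,
)
print(config.bin_cursor(80.0, 105.0, width=160.0, height=210.0))  # (4, 4)
```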

Alternatives

One alternative is to always include all action types in the space (including redundant ones like clicking elements + clicking coordinates). Downsides include:

  • The RL agent can no longer define a simple multinomial distribution over the action types (a mask is needed).
  • If we add new action types, the action indices will change.
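
The first downside can be illustrated with a small sketch of the masking an agent would need if all action types were always present. masked_softmax is a hypothetical helper written in plain Python; a real agent would likely do this inside its ML framework.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over allowed entries only; disallowed entries get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    allowed = [logit for logit, keep in zip(logits, mask) if keep]
    if not allowed:
        raise ValueError("at least one action type must be allowed")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) if keep else 0.0 for logit, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# The third action type is unavailable, so its large logit is ignored.
print(masked_softmax([1.0, 1.0, 5.0], [True, True, False]))  # [0.5, 0.5, 0.0]
```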

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Bug Report] Documentation: Environment pages were not generated.

Describe the bug
The environment list https://farama-foundation.github.io/miniwob-plusplus/environments/list/ contains links that go nowhere.

From the GitHub Actions log, the "Build Envs Docs" step (docs/_scripts/gen_mds.py) has the following errors:

ID: miniwob/bisect-angle-v1
No module named 'miniwob.envs'
ID: miniwob/book-flight-v1
No module named 'miniwob.envs'
...

I couldn't reproduce this locally. On my local machine I was able to generate the environment pages. Probably a dependency issue.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Proposal] Add documentation in docs/

Proposal

Add documentation in docs/.

Motivation

The documentation currently lives in README.md. It is quite incomplete and out-of-date. The list of tasks also disappeared.

Pitch

Write the documentation as Markdown in docs/, to be rendered with Sphinx. Follow examples from other projects such as MAgent2.

Checklist

  • I have checked that there is no similar issue in the repo (required)

Checklist for Maturity

(Copied from Farama-Foundation/stable-retro#21)

  • Fully deterministic (as far as possible)
  • Explicit Versioning
  • Farama Notifications
  • PyPI
  • Full Package Description
  • Deploy via GH Actions
  • Website
  • Linux and MacOS
  • Precommit
  • Typehinting
  • Docstrings
  • CI Testing
  • Logo
  • CoC (code of conduct)
  • Gymnasium API
  • Python Versions
  • JKTerry and Mark as owners (pypi)
  • TOTP (time based one time password)
  • Google Analytics
  • License
  • Sponsor this project button

[Proposal] Add type hinting where reasonable

Proposal

Add type hints and enable the Pyright type checker in git pre-commit.

Motivation

Type hinting is part of the Farama project standards: https://farama.org/project_standards.

Pitch

  • Add type hints to all files inside the miniwob package. Use type hint features compatible with the oldest supported Python (3.7).
  • Enable Pyright type checker in git pre-commit (currently it is commented out).

Alternatives

None.

Additional context

None.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Question] The format of the 12K human demonstrations

Question

Hi,
I'm confused by the human demonstrations provided in https://github.com/stanfordnlp/miniwob-plusplus-demos. The demonstrations seem messy: a single trajectory can contain dozens (e.g., 20+) of states with low-level mouse up/down and keyboard up/down events. Is there a way to get cleaned or simplified actions, e.g., {'action': 'click', 'ref': '6'}, {'action': 'type', 'ref': '10', 'typed_text': 'John'}? I want to use these 12K demonstrations for supervised fine-tuning of my own model.

Thanks a lot!

[Proposal] Add Continuous Integration tests

Proposal

Add Continuous Integration (CI) tests to the codebase.

Motivation

Currently, the tests in the Python module require a manual pytest run, and they check only correctness, not code quality. We want to add automatic tests that ensure both.

Pitch

  • Create CI tests with Pyright.
  • Make sure the tests pass on both Linux and MacOS.

Alternatives

None. This is a required project standard.

Additional context

One potential issue is that the environment is not sandboxed: it uses Selenium and depends on an external Chrome installation. This could cause flaky tests (e.g., in a small number of subtasks, navigating away from the Chrome window could fail the test). We will need to find a way to address this.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Proposal] Port MiniWoBEnvironment to the gym.Env interface

Proposal

Port MiniWoBEnvironment to the gym.Env interface.

Motivation

This is an effort toward making the MiniWoB codebase compatible with Gymnasium.

Pitch

Perform the following steps:

  • Convert MiniWoBEnvironment to the gym.Env interface, following the instructions on the "Make your own custom environment" page.
  • Make sure it passes the test in gymnasium.utils.env_checker.check_env.

Alternatives

None. This is a required project standard.

Additional context

None.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Proposal] Add tests covering all environments

Proposal

Add more tests to cover all existing environments in MiniWoB++.

Motivation

The current tests only cover a subset of elements. In particular, the tests do not cover the more complex tasks and FlightWoB tasks.

Pitch

Add the following tests for all existing environments in MiniWoB++:

  • Start a web server automatically for FlightWoB tasks (the file:// protocol does not work).
  • Confirm that reset() and step() work on the task without errors, and that getting a reward is possible.
  • Confirm seed determinism.
  • For tasks with algorithmic solutions, confirm that the tasks can be solved (with positive rewards).
  • Confirm that all observations and actions are in the spaces (e.g., test if the numerical values are within the bounds; test the length and charset of Text spaces).
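
The seed determinism check in the list above could be sketched as follows. check_seed_determinism is a hypothetical helper that only assumes a Gymnasium-style reset(seed=...) and close(); ToyEnv is a stand-in so that the example runs without a browser and is not a MiniWoB++ environment.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple


def check_seed_determinism(make_env: Callable[[], Any], seed: int = 0) -> bool:
    """Return True if two fresh environments give equal first observations."""
    observations = []
    for _ in range(2):
        env = make_env()
        try:
            obs, _info = env.reset(seed=seed)
            observations.append(obs)
        finally:
            env.close()
    return observations[0] == observations[1]


class ToyEnv:
    """Stand-in with a Gymnasium-style reset(); not a MiniWoB++ environment."""

    def reset(self, seed: Optional[int] = None) -> Tuple[Dict[str, str], Dict[str, Any]]:
        rng = random.Random(seed)
        target = rng.choice(["ONE", "TWO"])
        return {"utterance": f"Click button {target}."}, {}

    def close(self) -> None:
        pass


print(check_seed_determinism(ToyEnv, seed=42))  # True
```

Observations that contain NumPy arrays would need an element-wise comparison instead of ==.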

Alternatives

  • Manual verification (e.g., interacting with the tasks on Google Chrome).

Additional context

None.

Checklist

  • I have checked that there is no similar issue in the repo (required)
