
progprompt-vh's Introduction

ProgPrompt on VirtualHome

This is the code release for the paper ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. It contains code for replicating the results on the VirtualHome dataset.

Setup

Create a conda environment (or your virtualenv):

conda create -n progprompt python==3.9

Install dependencies:

pip install -r requirements.txt

Clone the VirtualHome repository and, from its root directory, install it by running:

pip install -e .

Note: If you encounter an error about the wrong number of arguments to the function execute, then in the file virtualhome/src/virtualhome/simulation/evolving_graph/execution.py, line 67, add *args to the signature as follows:

    def execute(self, script: Script, state: EnvironmentState, info: ExecutionInfo, char_index, *args):  # *args absorbs any extra positional arguments passed by the caller

This was tested on VirtualHome commit f84ee28a75b23318ee1bf652862b1c993269cd06.

Finally, download the VirtualHome Unity simulator and make sure it runs. The simulator can run on the desktop or on a virtual X server.
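
A quick smoke test you can adapt to confirm the simulator is reachable. This is a sketch, not part of the repository; it assumes the pip-installed package exposes virtualhome.simulation.unity_simulator, and the binary path, port, and display are placeholders for your system:

    from virtualhome.simulation.unity_simulator import comm_unity

    # Launch the downloaded simulator binary and connect to it.
    comm = comm_unity.UnityCommunication(
        file_name="/path/to/v2.3_virtualhome_sim.x86_64",  # placeholder path to the simulator binary
        port="8080",
        x_display="0",  # only needed when running on a virtual X server
    )
    comm.reset(0)  # load scene 0
    success, graph = comm.environment_graph()  # query the scene graph
    print(success, len(graph["nodes"]), "nodes in the scene graph")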

Running evaluation

Here is a minimal example of how to run the evaluation script. Replace the {arguments in curly braces} with values appropriate to your system:

python3 scripts/run_eval.py --progprompt-path $(pwd) --expt-name {expt_name} --openai-api-key {key} --unity-filename {v2.3_virtualhome_sim} --display {0}

For more options and arguments, look inside scripts/run_eval.py.


progprompt-vh's Issues

evaluation metric

In run_eval.py:

    results["overall"] = {
        'PSR': sum(sr)/len(sr),
        "SR": sr.count(1.0)/len(sr),
        "Precision": 1 - sum(unchanged_conds)/sum(total_unchanged_conds),
        "Exec": sum(exec_per_task)/len(exec_per_task)
    }

Could you please explain which of these correspond to "SR", "Exec", and "GCR" in the paper? My understanding is that the paper's SR is computed from "PSR" or "SR" here, and the paper's Exec comes from "Exec" in the code. But how is "GCR" obtained? Is it the same as "Precision", i.e. checking that the executor keeps unchanged the states that should remain unchanged throughout the execution, and translating that into the overlap between the final achieved state g' and the ground-truth final state g?
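
For reference, a rough sketch of the overlap computation described above: the fraction of ground-truth goal conditions that hold in the achieved final state. This is not the repository's implementation; condition_holds is a hypothetical helper that checks a single goal condition against a state:

    def goal_condition_recall(final_state, goal_conditions, condition_holds):
        # Fraction of ground-truth goal conditions satisfied in the achieved final state.
        satisfied = sum(1 for cond in goal_conditions if condition_holds(final_state, cond))
        return satisfied / len(goal_conditions)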

After the program runs, the action commands can be issued, but the VirtualHome window is always a black screen and the animation cannot be seen

[WALK] (205) [0]
[FIND] (247) [0]
[FIND] (272) [0]
[GRAB] (272) [0]
[PUTIN] (272) (247) [0]
[FIND] (271) [0]
[GRAB] (271) [0]
[PUTIN] (271) (247) [0]
Executing: throw away the lime

[WALK] (205) [0]
[FIND] (229) [0]
[FIND] (440) [0]
[GRAB] (440) [0]
[FIND] (229) [0]
[OPEN] (229) [0]
[PUTIN] (440) (229) [0]
[CLOSE] (229) [0]
Executing: put the wine glass in the kitchen cabinet

[WALK] (205) [0]
[FIND] (198) [0]
[GRAB] (198) [0]
[FIND] (236) [0]
[OPEN] (236) [0]
[OPEN] (236) [0]
[PUTIN] (198) (236) [0]
[OPEN] (236) [0]
[CLOSE] (236) [0]
Executing: put the candle on the living room shelf

[WALK] (335) [0]
[FIND] (69) [0]
[GRAB] (69) [0]
[FIND] (43) [0]
[FIND] (69) [0]
[GRAB] (69) [0]
[PUTIN] (69) (250) [0]
Executing: listen to radio

[WALK] (335) [0]
[FIND] (176) [0]
[FIND] (262) [0]
[SWITCHOFF] (262) [0]
[SWITCHON] (262) [0]
[FIND] (176) [0]
[FIND] (428) [0]
[SWITCHON] (428) [0]
[FIND] (176) [0]
[FIND] (176) [0]
[SWITCHOFF] (176) [0]
[SWITCHON] (176) [0]
Executing: bring pillow to the sofa

[WALK] (335) [0]
[FIND] (186) [0]
[GRAB] (186) [0]
[FIND] (368) [0]
[PUTIN] (186) (368) [0]
Executing: open window

[WALK] (70) [0]
[FIND] (70) [0]
[OPEN] (70) [0]
Executing: cut apple

[WALK] (205) [0]
[FIND] (438) [0]
[GRAB] (438) [0]
[FIND] (283) [0]
[GRAB] (283) [0]
[FIND] (231) [0]
[FIND] (438) [0]
[GRAB] (438) [0]
[PUTIN] (438) (231) [0]
[FIND] (231) [0]
[PUTIN] (283) (231) [0]
Executing: wash mug

[WALK] (205) [0]
[FIND] (247) [0]
[FIND] (248) [0]
[FIND] (50) [0]
[FIND] (248) [0]
[SWITCHON] (248) [0]
[FIND] (194) [0]
[FIND] (447) [0]
[FIND] (447) [0]
[GRAB] (447) [0]
[FIND] (247) [0]
[PUTIN] (447) (247) [0]
[FIND] (267) [0]
[GRAB] (267) [0]
[FIND] (247) [0]
[PUTIN] (267) (247) [0]
[FIND] (266) [0]
[GRAB] (266) [0]
[FIND] (247) [0]
[PUTIN] (266) (247) [0]

----Results----
{'PSR': 0.30428571428571427, 'SR': 0.2, 'Precision': 0.9966354241768806, 'Exec': 0.8809396159396158}


OpenAI API Update

Kindly update the code in utils_execute.py to be compatible with the new OpenAI SDK.
When crafting a response (line 38), one now needs to use "openai.completions.create" instead of "openai.Completion.create". The relevant discussion is linked here.
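
A sketch of the migration for the legacy completions endpoint, assuming openai>=1.0; the API key, model name, and prompt below are placeholders, not values from this repository:

    # Old style (openai < 1.0):
    #   response = openai.Completion.create(model=..., prompt=..., max_tokens=...)
    #   text = response["choices"][0]["text"]

    # New style (openai >= 1.0):
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_KEY")  # placeholder key
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # example completions-capable model
        prompt="walk to the kitchen",    # placeholder prompt
        max_tokens=256,
    )
    text = response.choices[0].text

In openai>=1.0 the module-level call openai.completions.create(...) also works, which matches the suggestion above.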

I also suggest updating the README to mention the filepath changes that need to be made in virtualhome/virtualhome/__init__.py, and where virtualhome needs to be cloned relative to progprompt, to make setup easier for first-time users.

Sincerely

question for test dataset

I noticed in your paper that the dataset has 70 tasks, but I only found 34 of them on GitHub. Where are the remaining tasks? Thank you.

question for executing generated plan

I ran your code and generated a plan, but while the plan is executing the simulator just switches images very quickly from frame to frame without playing any animation. I wondered whether this is due to a different version of the simulator, but I noticed we are both using version 2.3. If you could provide a download link for the simulator you used, that would be great! Thank you.

gpt_version issues

Hello, author. I am very interested in your work, but I have encountered some issues while studying it and hope you can answer them. I set "gpt_version" in run_eval.py to "gpt-3.5-turbo-instruct" and "gpt-3.5-turbo-instruct-0914". However, with these versions all SR values come out as zero. Is this normal? Also, when using "code-davinci-002", it reports that the model does not exist. Could you please let me know which model versions I can use for training and testing with the current API key?
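
As a side note, a quick way to see which models a given API key can access, sketched with the current OpenAI SDK (openai>=1.0); nothing here is specific to this repository:

    from openai import OpenAI

    client = OpenAI(api_key="YOUR_KEY")  # placeholder key
    # Print the model IDs available to this API key.
    for model in client.models.list():
        print(model.id)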
