princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
Home Page: https://intercode-benchmark.github.io/
License: MIT License
Hi John,
Thanks for the great work. I'm following the steps for building from source and hit the following error when I run python run_demo.py sql:
Traceback (most recent call last):
File "/net/nfs2.mosaic/yuchenl/intercode/run_demo.py", line 7, in <module>
from experiments.policies import HumanPolicy
File "/net/nfs2.mosaic/yuchenl/intercode/experiments/policies.py", line 5, in <module>
from .utils import ACTION_PARSER_MAP, PROMPT_MAP, CompletionGPT, ChatGPT, PalmChat, PalmCompletion, HFChat
File "/net/nfs2.mosaic/yuchenl/intercode/experiments/utils/__init__.py", line 16, in <module>
from .palm_api import (
File "/net/nfs2.mosaic/yuchenl/intercode/experiments/utils/palm_api.py", line 15, in <module>
assert(api_key != None)
AssertionError
By the way, I also found that rpyc is not listed in the requirements, so I had to install it manually (a minor issue).
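For what it's worth, the assertion fires before any request is made, so the fix is just exporting the key the module expects. A minimal sketch of that fail-fast pattern (the variable name PALM_API_KEY and the helper are my assumptions, not necessarily what palm_api.py actually reads):

```python
import os

def load_api_key(var: str = "PALM_API_KEY") -> str:
    # Raise a descriptive error instead of a bare AssertionError.
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running run_demo.py")
    return key

os.environ["PALM_API_KEY"] = "dummy-key"  # illustration only; use a real key
print(load_api_key())
```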
Hi,
I was trying to run the tests using pytest and realized that many of the data files the tests depend on do not exist in the repository. Would it be possible to include them as well?
For example: ./data/test/bash_queries.json, which is required by tests/test_env_bash.py.
I am planning to use InterCode as an isolated execution environment for my cybersecurity-related competition. It would be very helpful if you could include the test data files in the repo.
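Until those data files ship, one stopgap (a sketch; the fixture path comes from the issue above, the test class and method names are hypothetical) is to skip data-dependent tests whenever the fixture is absent, e.g. with stdlib unittest:

```python
import json
import pathlib
import unittest

DATA = pathlib.Path("./data/test/bash_queries.json")

class TestEnvBash(unittest.TestCase):
    @unittest.skipUnless(DATA.exists(), f"missing fixture: {DATA}")
    def test_queries_load(self):
        # Only runs when the fixture exists in the checkout.
        queries = json.loads(DATA.read_text())
        self.assertGreater(len(queries), 0)

# Run with: python -m unittest <module name>
```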
I ran eval_n_turn.py to reproduce the single-turn handicap SQL results:
python -m experiments.eval_n_turn \
--data_path ./data/sql/spider/ic_spider_dev.json \
--dialogue_limit 5 \
--env sql \
--image_name docker-env-sql \
--log_dir logs/experiments \
--max_turns 1 \
--policy chat \
--template game_sql \
--model gpt-3.5-turbo \
--handicap \
--verbose
I use this script to compute the success rate:
import json

result_file_path = './logs/experiments/ic_sql_multiturn_gpt-3.5-turbo_1_turns.json'
with open(result_file_path, 'r') as f:
    data = json.load(f)

result = {key: {'success': 0, 'total': 0} for key in ['easy', 'medium', 'hard', 'extra', 'all']}
for index in data.keys():
    if data[index]['summary']['max_reward'] == 1.0:
        result[data[index]['hardness']]['success'] += 1
        result['all']['success'] += 1
    result[data[index]['hardness']]['total'] += 1
    result['all']['total'] += 1

for key in result.keys():
    success = result[key]['success']
    total = result[key]['total']
    print(f"{key} Success rate: {success}/{total} ({success/total:.2%})")
and get this result:
easy Success rate: 202/248 (81.45%)
medium Success rate: 281/446 (63.00%)
hard Success rate: 75/174 (43.10%)
extra Success rate: 37/166 (22.29%)
all Success rate: 595/1034 (57.54%)
It is lower than the result in the paper.
Did I do something wrong?
I also ran eval_n_turn.py to reproduce the single-turn SQL results (without the handicap).
python -m experiments.eval_n_turn \
--data_path ./data/sql/spider/ic_spider_dev.json \
--dialogue_limit 5 \
--env sql \
--image_name docker-env-sql \
--log_dir logs/experiments \
--max_turns 1 \
--policy chat \
--template game_sql \
--model gpt-3.5-turbo
Result is here:
easy Success rate: 41/248 (16.53%)
medium Success rate: 28/446 (6.28%)
hard Success rate: 3/174 (1.72%)
extra Success rate: 2/166 (1.20%)
all Success rate: 74/1034 (7.16%)
Did I do something wrong?
In run_sql.py, you initialize the policy with policy = HumanPolicy().
Shouldn't it be ChatGPTPolicy?
@john-b-yang I just wanted to know before I invest time checking whether it's still up to date and spinning up a Hugging Face endpoint to eval a model.
Thanks!
Hi,
Thanks for your contribution and the excellent benchmark! However, the results from running eval_n_turn with GPT-3.5 and n=10 don't match the Try Again baseline reported in Table 2 of your paper. I tested in the Bash environment, and the success rate in Bash 2 is 15%. Is this due to internal changes in the OpenAI API? eval_n_turn with GPT-3.5 and n=20 roughly matches the results your paper reports for n=10.
Best,
Shenao
Thanks for releasing the environment code; this is really nice work!
During my re-implementation of the experiments in your paper, I was not sure what system prompt you use for the NL2Bash single-turn code generation.
When I run the following script (SQL call):
python -m experiments.eval_n_turn \
--data_path ./data/sql/spider/ic_spider_dev.json \
--dialogue_limit 5 \
--env sql \
--image_name docker-env-sql \
--log_dir logs/experiments \
--max_turns 10 \
--policy chat \
--template game_sql \
--model gpt-3.5-turbo \
--handicap \
--verbose
It throws an exception telling me the docker image is not found:
`(intercode) user@ubuntu:~/botao/intercode-master$ python main.py
Traceback (most recent call last):
File "/home/user/botao/intercode-master/main.py", line 11, in <module>
from experiments.policies import (
File "/home/user/botao/intercode-master/experiments/policies.py", line 5, in <module>
from .utils import ACTION_PARSER_MAP, PROMPT_MAP, CompletionGPT, ChatGPT, PalmChat, PalmCompletion, HFChat
File "/home/user/botao/intercode-master/experiments/utils/__init__.py", line 20, in <module>
from .open_api import (
File "/home/user/botao/intercode-master/experiments/utils/open_api.py", line 13, in <module>
assert(access_token)
AssertionError
(intercode) user@ubuntu:~/botao/intercode-master$ vim experiments/utils/open_api.py
(intercode) user@ubuntu:~/botao/intercode-master$ python main.py
Traceback (most recent call last):
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/api/client.py", line 265, in _raise_for_status
response.raise_for_status()
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.43/images/docker-env-sql/json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/botao/intercode-master/main.py", line 215, in <module>
expr_wrapper = ExperimentWrapper(args)
File "/home/user/botao/intercode-master/main.py", line 55, in __init__
self.env = SqlEnv(image_name=args.image_name,
File "/home/user/botao/intercode-master/intercode/envs/sql/sql_env.py", line 27, in __init__
super(SqlEnv, self).__init__(image_name, **kwargs)
File "/home/user/botao/intercode-master/intercode/envs/ic_env.py", line 77, in __init__
self.container = get_container(self.container_name, self.image_name, **kwargs)
File "/home/user/botao/intercode-master/intercode/utils/utils.py", line 48, in get_container
image = client.images.get(image_name)
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/models/images.py", line 333, in get
return self.prepare_model(self.client.api.inspect_image(name))
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/utils/decorators.py", line 19, in wrapped
return f(self, resource_id, *args, **kwargs)
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/api/image.py", line 251, in inspect_image
return self._result(
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/api/client.py", line 271, in _result
self._raise_for_status(response)
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/api/client.py", line 267, in _raise_for_status
raise create_api_error_from_http_exception(e) from e
File "/home/user/anaconda3/envs/intercode/lib/python3.9/site-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation) from e
docker.errors.ImageNotFound: 404 Client Error for http+docker://localhost/v1.43/images/docker-env-sql/json: Not Found ("No such image: docker-env-sql:latest")`
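The ImageNotFound error means the docker-env-sql image was never built locally. A quick way to check, as a hedged sketch (the helper name is mine; it shells out to the docker CLI, and the setup-script name in the comment is an assumption about the repo layout):

```python
import subprocess

def image_exists(name: str) -> bool:
    """Return True if a local Docker image with this tag exists (uses the docker CLI)."""
    try:
        proc = subprocess.run(
            ["docker", "image", "inspect", name],
            capture_output=True,
        )
    except FileNotFoundError:  # docker CLI not installed
        return False
    return proc.returncode == 0

# Usage: rebuild the environment images if the tag is missing, e.g.
# if not image_exists("docker-env-sql"):
#     subprocess.run(["bash", "setup.sh"], check=True)  # script name assumed
```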
When I run this script to test NL2Bash, I get the error: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
python -m experiments.eval_n_turn \
--data_path ./data/nl2bash/nl2bash_fs_2.json \
--dialogue_limit 7 \
--env bash \
--image_name intercode-nl2bash \
--log_dir logs/experiments \
--max_turns 10 \
--policy chat \
--template v2 \
--model gpt-3.5-turbo \
--verbose
This is the log
Query 37: Recursively find the latest modified file in the /system directory
- Turn 0
-- Action: find /system -type f -printf '%t@ %p\n' | sort -n | tail -1 | cut -d' ' -f2
-- Observation: find: '/system': No such file or directory
- Turn 1
-- Action: find / -type d -name "system" 2>/dev/null | xargs -I {} find {} -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2
-- Observation: Command timed out
- Turn 2
-- Action: find / -type f -name "*" -printf '%T@ %p\n' 2>/dev/null | sort -n | tail -1 | cut -d' ' -f2
-- Observation:
ERRO[2024-03-19T22:23:28.934840930+08:00] Error running exec 4f326e717202bfe9efbd2065082594160d5013167216f0c6616b67b8548fb118 in container: exec a
ttach failed: error attaching stdout stream: write unix /var/tmp/xdg_runtime_dir_zhoupeng/docker.sock->@: write: broken pipe
- Turn 3
-- Action: find /system -type f -printf '%T@ %p\n' 2>/dev/null | sort -n | tail -1 | cut -d' ' -f2
-- Observation:
- Turn 4
-- Action: find /system -type f -exec stat --format='%Y : %n' {} + 2>/dev/null | sort -n | tail -1 | cut -d' ' -f3
-- Observation:
- Turn 5
-- Action: find /system -type f -exec stat --format='%Y : %n' {} + 2>/dev/null | sort -n | tail -1 | cut -d' ' -f3
-- Observation:
- Turn 6
-- Action: find /system -type f -exec stat --format='%Y : %n' {} + 2>/dev/null | sort -n | tail -n 1 | cut -d' ' -f3
-- Observation:
- Turn 7
-- Action: find /system -type f -exec stat --format='%Y : %n' {} + 2>/dev/null | sort -nr | head -n 1 | cut -d' ' -f3
-- Observation:
- Turn 8
-- Action: find /system -type f -printf '%T@ %p\n' 2>/dev/null | sort -n | tail -n 1 | cut -d' ' -f2-
-- Observation:
ERRO[2024-03-19T22:23:40.527309951+08:00] Error running exec 3c3630d421ca422ba88b6ed33fe82fea20d32c761077d6c5a82ad64a078d31a8 in container: exec a
ttach failed: error attaching stderr stream: write unix /var/tmp/xdg_runtime_dir_zhoupeng/docker.sock->@: write: broken pipe
ERRO[2024-03-19T22:23:40.905832218+08:00] stream copy error: reading from a closed fifo
ERRO[2024-03-19T22:23:40.905902049+08:00] stream copy error: reading from a closed fifo
ERRO[2024-03-19T22:23:40.907188057+08:00] Error running exec 5b84a6bf90ddd458529cb7e96f52eec3b8091e42d36e2e4288adf6737c4bbfb2 in container: OCI ru
ntime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
ERRO[2024-03-19T22:23:41.059118502+08:00] stream copy error: reading from a closed fifo
ERRO[2024-03-19T22:23:41.063127206+08:00] stream copy error: reading from a closed fifo
ERRO[2024-03-19T22:23:41.064658406+08:00] Error running exec 3722625ff81b2a28f3c2735dd61f113f7e44093acbdb535d27d201b2f3e99400 in container: OCI ru
ntime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
- Turn 9
-- Action: find /system -type f -exec stat --format='%Y : %n' {} + 2>/dev/null | sort -nr | head -n 1 | cut -d' ' -f3
-- Observation: OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or
directory: unknown
Query 37 Finished
-Reward: 0.8200000000000001 -Turns: 10
ERRO[2024-03-19T22:23:41.211589420+08:00] stream copy error: reading from a closed fifo
ERRO[2024-03-19T22:23:41.211596422+08:00] stream copy error: reading from a closed fifo
ERRO[2024-03-19T22:23:41.213089845+08:00] Error running exec 285b36c54baa3d5cc84ab194a215470986d9915ee9f3d4c251db947a4e48ae5d in container: OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
INFO[2024-03-19T22:23:51.243086638+08:00] Container failed to exit within 10s of signal 15 - using the force container=56653d248d010916f88eb3562c
bd5e261191b32415f8fbf37b3ad13a1453c992
ERRO[2024-03-19T22:23:51.275239748+08:00] Error running exec a19b9f9275762b661d1703acdfff36073cf3be170b8a9442d85df9af7dac33a2 in container: exec a
ttach failed: error attaching stderr stream: write unix /var/tmp/xdg_runtime_dir_zhoupeng/docker.sock->@: write: broken pipe
ERRO[2024-03-19T22:24:01.273464072+08:00] Container failed to exit within 10s of kill - trying direct SIGKILL container=56653d248d010916f88eb3562
cbd5e261191b32415f8fbf37b3ad13a1453c992 error="context deadline exceeded"
ERRO[2024-03-19T22:24:04.460413534+08:00] Error running exec 8aac40fe9cf45cfed74caaf8fc7ac7a5f343d17ee0164e9c13ec9e0da19d4b71 in container: exec attach failed: error attaching stderr stream: write unix /var/tmp/xdg_runtime_dir_zhoupeng/docker.sock->@: write: broken pipe
INFO[2024-03-19T22:24:04.477007128+08:00] ignoring event container=56653d248d010916f88eb3562cbd5e261191b32415f8fbf3
7b3ad13a1453c992 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2024-03-19T22:24:04.476982691+08:00] shim disconnected id=56653d248d010916f88eb3562cbd5e261191b32415f8fbf37b3ad13
a1453c992 namespace=moby
WARN[2024-03-19T22:24:04.477138964+08:00] cleaning up after shim disconnected id=56653d248d010916f88eb3562cbd5e261191b32415f8fbf37b3ad13a1453c992 namespace=moby
INFO[2024-03-19T22:24:04.477158816+08:00] cleaning up dead shim namespace=moby
INFO[2024-03-19T22:24:14.728170504+08:00] Container failed to exit within 10s of signal 15 - using the force container=9ad3cbf1df1f6950a2e24fd172
a4a047ed593fd08bc94ac31cb023a93810d149
INFO[2024-03-19T22:24:14.825408778+08:00] ignoring event container=9ad3cbf1df1f6950a2e24fd172a4a047ed593fd08bc94ac3
1cb023a93810d149 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2024-03-19T22:24:14.825971676+08:00] shim disconnected id=9ad3cbf1df1f6950a2e24fd172a4a047ed593fd08bc94ac31cb023a93810d149 namespace=moby
WARN[2024-03-19T22:24:14.826073440+08:00] cleaning up after shim disconnected id=9ad3cbf1df1f6950a2e24fd172a4a047ed593fd08bc94ac31cb023a
93810d149 namespace=moby
INFO[2024-03-19T22:24:14.826121839+08:00] cleaning up dead shim namespace=moby
WARN[2024-03-19T22:24:15.498477176+08:00] cleanup warnings time="2024-03-19T22:24:15+08:00" level=warning msg="failed to remove runc container" er
ror="runc did not terminate successfully: exit status 255: " runtime=io.containerd.runc.v2 namespace=moby
Traceback (most recent call last):
File "/home/zhoupeng/miniconda3/envs/intercode/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/zhoupeng/miniconda3/envs/intercode/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/zhoupeng/project/LLM/Code_LLM_Survey/intercode/experiments/eval_n_turn.py", line 212, in <module>
expr_wrapper.run_expr()
File "/home/zhoupeng/project/LLM/Code_LLM_Survey/intercode/experiments/eval_n_turn.py", line 94, in run_expr
self.env.reset(idx)
File "/home/zhoupeng/project/LLM/Code_LLM_Survey/intercode/intercode/envs/ic_env.py", line 142, in reset
self.reset_container()
File "/home/zhoupeng/project/LLM/Code_LLM_Survey/intercode/intercode/envs/bash/bash_env.py", line 37, in reset_container
raise RuntimeError(f"Failed to reset `{self.container_name}` container successfully: {output}")
RuntimeError: Failed to reset `intercode-nl2bash_ic_ctr` container successfully: b'OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown\r\n'
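The "stat /bin/bash: no such file or directory" failures suggest the intercode-nl2bash image does not ship bash (or the wrong image got built). One way to probe which shell an image actually contains, as a sketch (helper name mine; requires the docker CLI):

```python
import subprocess

def container_shell(image: str):
    """Return the first shell present in the image, or None if neither is found."""
    for shell in ("/bin/bash", "/bin/sh"):
        try:
            proc = subprocess.run(
                ["docker", "run", "--rm", image, "ls", shell],
                capture_output=True,
            )
        except FileNotFoundError:  # docker CLI not installed
            return None
        if proc.returncode == 0:
            return shell
    return None

# Usage: container_shell("intercode-nl2bash") should return "/bin/bash"
# if the image was built correctly.
```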
Currently the latest release on PyPI is from June 2023:
https://pypi.org/project/intercode-bench/0.1.22/
Can you upload the new 1.0.1 release containing the rewritten CTF environment?
Hi Authors,
Thanks for building this environment, that's a really great contribution.
I was wondering if it's possible to extend the codebase and either get rid of the dependency on Docker or make it compatible with other container technologies (such as Apptainer)?
The reason I'm asking is that Docker is not available on the Canadian cluster (and probably on some other clusters too) because of its security risks (https://docs.alliancecan.ca/wiki/Apptainer#Other_Linux_container_technologies).
Thank you.
Hi,
Thanks for building this environment, that's a really great contribution!
I am just wondering why there are no MBPP results in the paper and on the leaderboard?
Best,
Jiyang Zhang
Hello, thanks for your outstanding work!
When I run scripts/expr_react.sh:
python -m experiments.eval_react \
--data_path ./data/sql/spider/ic_spider_dev.json \
--env sql \
--image_name docker-env-sql \
--log_dir logs/experiments/sql_gpt3test \
--max_turns 5 \
--verbose
I encounter the error:
Traceback (most recent call last):
File "/data/home/miniconda3/envs/intercode/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/home/miniconda3/envs/intercode/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/home/data_creation/intercode/experiments/eval_react.py", line 195, in <module>
expr_wrapper.run_expr()
File "/data/home/data_creation/intercode/experiments/eval_react.py", line 101, in run_expr
self.env.reset(idx)
File "/data/home/data_creation/intercode/intercode/envs/ic_env.py", line 150, in reset
raise RuntimeError(f"Preprocess command failed to execute successfully: {self.preprocess(self.record)}")
RuntimeError: Preprocess command failed to execute successfully: ['use poker_player']
It seems that the problem is caused by function:
def preprocess_sql(record: Dict) -> List:
    db = record["db"]
    print(f"db {db}")
    return [f"use {db}"]
And the error is raised in
if self.preprocess is not None:
    preprocess_cmds = self.preprocess(self.record)
    for cmd in preprocess_cmds:
        self.exec_action(cmd)
        if not self.info[ACTION_EXEC]:
            raise RuntimeError(f"Preprocess command failed to execute successfully: {self.preprocess(self.record)}")
Additionally, I find that the function exec_action is not implemented, and that self.info ends up as {'action_executed': False}:
if self.preprocess is not None:
    # self.info is {}
    preprocess_cmds = self.preprocess(self.record)
    for cmd in preprocess_cmds:
        self.exec_action(cmd)
        # self.info is {'action_executed': False}
        if not self.info[ACTION_EXEC]:
            raise RuntimeError(f"Preprocess command failed to execute successfully: {self.preprocess(self.record)}")
So how can I solve the problem, or is there anything I missed? I'd appreciate your response. :-)
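To pinpoint which preprocess command fails and why, one option is to surface the per-command result instead of re-running self.preprocess in the error message. A debugging sketch against the ic_env.py interface quoted above (the function name is mine, and env.observation is my assumption for where the container's output lands):

```python
def run_preprocess(env):
    """Execute each preprocess command and report the first failure verbosely.

    Sketch only: assumes env.preprocess, env.record, env.exec_action, and
    env.info['action_executed'] behave as in the ic_env.py snippet above.
    """
    for cmd in env.preprocess(env.record):
        env.exec_action(cmd)
        if not env.info.get("action_executed", False):
            raise RuntimeError(
                f"Preprocess command failed: {cmd!r}; "
                f"observation: {getattr(env, 'observation', None)!r}"
            )
```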
You have LangChain listed in the requirements (i.e. environment.yml), but I cannot find any use of it in the code. Do you use it anywhere? Also, is there any way to integrate InterCode with LangChain? If so, please describe how you would do this.
I want to access the JSON results of trajectories on the Bash dataset.
However, the ./data/results directory, which should contain the experiment results, is missing, and the Google Drive link does not contain files for the Bash dataset. The ./data/docker/bash_scripts/setup_nl2b_fs_*.sh scripts, which should contain the file system definitions, are missing too :(
Would you please provide me with the JSON trajectory results for the Bash dataset, or the setup_nl2b_fs_*.sh scripts, which would help me reproduce the results?