
Executable Code Actions Elicit Better LLM Agents

πŸ“ƒ Paper β€’ πŸ€— Data (CodeActInstruct) β€’ πŸ€— Model (CodeActAgent-Mistral-7b-v0.1) β€’ πŸ€– Chat with CodeActAgent!

We propose to use executable Python code to consolidate LLM agents’ actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations (e.g., code execution results) through multi-turn interactions (check out this example!).
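To make this concrete, here is a minimal, hypothetical sketch of the interaction loop; the `llm` and `interpreter` interfaces are illustrative placeholders, not this repo's actual API.

# Illustrative sketch of a CodeAct-style loop (hypothetical interfaces, not this repo's API):
# the LLM emits Python code as its action, the interpreter executes it, and the
# execution result is fed back into the conversation as the next observation.
def codeact_loop(llm, interpreter, task, max_turns=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = llm(history)  # assumed to return an object with .is_code, .text, .code
        if not action.is_code:
            return action.text  # final natural-language answer
        observation = interpreter.run(action.code)  # execute the code action
        history.append({"role": "assistant", "content": action.text})
        history.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped after max_turns turns."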

Overview

Why CodeAct?

Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark M3ToolEval shows that CodeAct outperforms widely used alternatives like Text and JSON (up to 20% higher success rate). Please check our paper for more detailed analysis!

Figure: Comparison between CodeAct and Text/JSON as action.

Figure: Quantitative results comparing CodeAct and {Text, JSON} on M3ToolEval.

πŸ“ CodeActInstruct

We collect an instruction-tuning dataset, CodeActInstruct, consisting of 7k multi-turn interactions using CodeAct. The dataset is released on πŸ€— Hugging Face. Please refer to the paper and this section for details of data collection.

Figure: Dataset statistics. Token statistics are computed using the Llama-2 tokenizer.
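To inspect the data programmatically, here is a minimal sketch using the πŸ€— datasets library; the dataset id is assumed from the πŸ€— data link above, and split names should be checked on the loaded object.

# Minimal sketch: load CodeActInstruct from the Hugging Face Hub.
# The dataset id "xingyaoww/code-act" is assumed from the πŸ€— link above.
from datasets import load_dataset

ds = load_dataset("xingyaoww/code-act")
print(ds)                              # inspect available splits and sizes
first_split = next(iter(ds.values()))
print(first_split[0])                  # one multi-turn CodeAct interaction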

πŸͺ„ CodeActAgent

Trained on CodeActInstruct and general conversations, CodeActAgent excels at out-of-domain agent tasks compared to open-source models of the same size, while not sacrificing generic performance (e.g., knowledge, dialog). We release two variants of CodeActAgent:

  • CodeActAgent-Mistral-7b-v0.1 (recommended, model link): using Mistral-7b-v0.1 as the base model with 32k context window.
  • CodeActAgent-Llama-7b (model link): using Llama-2-7b as the base model with 4k context window.

Figure: Evaluation results for CodeActAgent. ID and OD stand for in-domain and out-of-domain evaluation, respectively. The overall averaged performance normalizes the MT-Bench score to be consistent with other tasks and excludes in-domain tasks for a fair comparison.
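To try either variant directly with πŸ€— transformers, here is a minimal loading sketch; it assumes a CUDA GPU and the accelerate package for automatic device placement.

# Minimal sketch: load CodeActAgent-Mistral with πŸ€— transformers.
# Assumes a GPU and the `accelerate` package (for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xingyaoww/CodeActAgent-Mistral-7b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # requires `accelerate`
)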

Please check out πŸ“ƒ our paper for more details about data collection, model training, evaluation, and more!

πŸš€ Use CodeActAgent for Your Application!

Video: codeact-demo.mp4 (demo of the chat interface).

Serve the Model with vLLM as an OpenAI-Compatible API

# Download the model first; here is an example for CodeActAgent-Mistral
cd $YOUR_DIR_TO_DOWNLOADED_MISTRAL_MODEL
git lfs install
git clone https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1
./scripts/chat/start_vllm.sh $YOUR_DIR_TO_DOWNLOADED_MISTRAL_MODEL/CodeActAgent-Mistral-7b-v0.1
# OR, for the Llama variant
# ./scripts/chat/start_vllm.sh $YOUR_DIR_TO_DOWNLOADED_LLAMA_MODEL/CodeActAgent-Llama-7b

This script (Docker required) serves the model on the GPUs specified by CUDA_VISIBLE_DEVICES and exposes it on port 8080 by default, so you can point OPENAI_API_BASE to http://localhost:8080/v1. Check the OpenAI API's official documentation for detailed instructions, and vLLM's official documentation for more information.
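Once the server is up, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the openai Python package (v1+); the localhost:8080 endpoint and placeholder API key are assumptions based on the defaults above.

# Minimal sketch: query the vLLM OpenAI-compatible endpoint.
# Assumes the server started by start_vllm.sh is listening on localhost:8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")  # placeholder key; adjust if your server enforces one
response = client.chat.completions.create(
    model="xingyaoww/CodeActAgent-Mistral-7b-v0.1",
    messages=[{"role": "user", "content": "What is 123 * 456? Write code if needed."}],
    temperature=0,
)
print(response.choices[0].message.content)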

Start your Code Execution Engine!

We implemented a containerized code execution engine based on JupyterKernelGateway. The idea is to start a Jupyter server inside a Docker container per chat session to support code execution requests from the model (the session times out after a fixed period). It requires Docker to be installed locally.

# Start a code execution server at 8081
./scripts/chat/code_execution/start_jupyter_server.sh 8081
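To sanity-check the engine, you can send it a small request. This sketch is hypothetical: it assumes the engine accepts a JSON body with a "code" field at the /execute path that demo.py points to; consult the engine's source under scripts/chat/code_execution for the actual request and response schema.

# Hypothetical smoke test for the code execution engine started above.
# The JSON payload shape ({"code": ...}) is an assumption; check the engine's
# source for the actual schema before relying on this.
import requests

resp = requests.post(
    "http://localhost:8081/execute",
    json={"code": "print(1 + 1)"},
    timeout=30,
)
print(resp.status_code, resp.text)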

🌟 Contributions welcome: We are still looking for a more scalable (and more secure) way to implement this (e.g., docker-in-docker, better management of pre-defined tools, support for web browsing, file upload/management, etc.). Recommendations or PRs are welcome!

Interact with the system!

via a simple Python script

If you don't want to spin up a fancy interface and just want to play with it from the command line, we've got you covered!

# Make sure you have started vLLM and the code execution engine before running this!
python3 scripts/chat/demo.py \
    --model_name xingyaoww/CodeActAgent-Mistral-7b-v0.1 \
    --openai_api_base http://$YOUR_VLLM_HOST:$YOUR_VLLM_PORT/v1 \
    --jupyter_kernel_url http://$YOUR_CODE_EXEC_ENGINE_HOST:$YOUR_CODE_EXEC_ENGINE_PORT/execute

via Chat-UI

If you've served the model and started the code execution engine, you can run your own chat interface like this!

If you want user management, you may need to start your own MongoDB instance:

./scripts/chat/start_mongodb.sh $YOUR_MONGO_DB_PASSWORD
# The database will be created at `pwd`/data/mongodb and available at localhost:27017

Then, you can configure your chat-ui interface.

cp chat-ui/.env.template chat-ui/.env.local
# Make sure you adapt .env.local to your configuration by correctly filling in:
# 1. JUPYTER_API_URL
# 2. the model endpoint (search for 'TODO_OPENAI_BASE_URL')
# 3. MONGODB_URL - you may leave this empty; chat-ui will automatically start a
#    database (but it will be deleted once the container is stopped)

Now you can build and start your own web application (Docker required)!

./scripts/chat/run_chat_ui.sh
# It starts the interface on localhost:5173 by default

# Run this script for debug mode
# ./scripts/chat/run_chat_ui_debug.sh

For more information (e.g., if you don't want to use Docker), please check out chat-ui's documentation!

πŸŽ₯ Reproduce Experiments in the Paper

git clone https://github.com/xingyaoww/code-act
cd code-act
# Clone all submodules
git submodule update --init --recursive

πŸ“‚ Data Generation (Optional)

Recommended: You may download the processed CodeActInstruct from the πŸ€— Hugging Face dataset hub.

For reproducibility: You can optionally follow the instructions in docs/DATA_GENERATION.md to generate the interaction data yourself.

πŸ“˜ Model Training

We use a fork of Megatron-LLM for training. You can follow docs/MODEL_TRAINING.md for detailed instructions.

πŸ“Š Evaluation

Please refer to docs/EVALUATION.md for detailed instructions.

πŸ“š Citation

@misc{wang2024executable,
      title={Executable Code Actions Elicit Better LLM Agents}, 
      author={Xingyao Wang and Yangyi Chen and Lifan Yuan and Yizhe Zhang and Yunzhu Li and Hao Peng and Heng Ji},
      year={2024},
      eprint={2402.01030},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
