EffiBench: Benchmarking the Efficiency of Automatically Generated Code

๐Ÿ“ Abstract

Code generation models have increasingly become integral to aiding software development, offering assistance in tasks such as code completion, debugging, and code translation. Although current research has thoroughly examined the correctness of code produced by code generation models, a vital aspect, the efficiency of the generated code, has often been neglected. This paper presents EffiBench, a benchmark with 1,000 efficiency-critical coding problems for assessing the efficiency of code generated by code generation models. EffiBench contains a diverse set of LeetCode coding problems. Each problem is paired with an executable human-written canonical solution. With EffiBench, we empirically examine the capability of 21 Large Language Models (13 open-sourced and 8 closed-sourced) in generating efficient code. The results demonstrate that GPT-4-turbo generates the most efficient code, significantly outperforming Palm-2-chat-bison, Claude-instant-1, Gemini-pro, GPT-4, and GPT-3.5. Nevertheless, its code efficiency is still worse than the efficiency of human-written canonical solutions. In particular, the average / worst execution time of GPT-4-turbo generated code is 1.69 / 45.49 times that of the canonical solutions.
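
The slowdown figures quoted in the abstract are ratios against the canonical solution. As a minimal sketch of that arithmetic, assuming per-problem timings are already available: the timings and the helper name `slowdown_ratios` below are invented for illustration, and the paper's exact metric definitions may differ.

```python
from statistics import mean


def slowdown_ratios(generated, canonical):
    """Per-problem ratio of generated-code time to canonical-solution time."""
    return [generated[pid] / canonical[pid] for pid in generated]


# Hypothetical timings in seconds, invented purely for illustration.
generated = {"p1": 0.30, "p2": 0.50, "p3": 4.00}
canonical = {"p1": 0.20, "p2": 0.50, "p3": 1.00}

ratios = slowdown_ratios(generated, canonical)
average_ratio = mean(ratios)  # average slowdown across problems
worst_ratio = max(ratios)     # slowdown on the single worst problem
print(f"average: {average_ratio:.2f}x, worst: {worst_ratio:.2f}x")
```

A ratio above 1 means the generated code is slower than the canonical solution on that problem.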

🚀 Updates

02/21/2024: Code released

04/15/2024: HuggingFace: EffiBench

Installation

git clone git@github.com:huangd1999/EffiBench.git
cd EffiBench
pip install -r requirements.txt

Evaluation on EffiBench

Our evaluation consists of two steps: generation and metrics calculation.
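
The two steps can also be chained from Python. The following is only a sketch: it reuses the script names and the `--model` flag shown in this README, while the helper names `build_pipeline` and `run_pipeline` are hypothetical, and it assumes the metrics scripts accept the same `--model` value that was passed to the generation script.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline(model, generation_script):
    """Return the argv lists for generation followed by metrics calculation."""
    return [
        [sys.executable, generation_script, "--model", model],
        [sys.executable, "code_efficiency_calculator.py", "--model", model],
        [sys.executable, "report_overhead.py", "--model", model],
    ]


def run_pipeline(model, generation_script, src_dir="src"):
    """Run each step inside src_dir, stopping at the first failing step."""
    src = Path(src_dir)
    (src / "results").mkdir(exist_ok=True)  # mirrors `mkdir results`
    for argv in build_pipeline(model, generation_script):
        subprocess.run(argv, cwd=src, check=True)


# Print the commands without executing them.
for argv in build_pipeline("gpt-3.5-turbo-0301", "closed_source_model_completion.py"):
    print(" ".join(argv[1:]))
```

Calling `run_pipeline(...)` from the repository root would execute the steps in order; `check=True` raises an exception if any step exits with a non-zero status.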

Generation

Open-sourced Models

For open-sourced models such as StarCoder and DeepSeek-Coder, we provide batch inference scripts for fast inference on EffiBench.

cd ./src
mkdir results
python open_source_model_completion.py \
  --model codellama/CodeLlama-70b-Instruct-hf 

OpenAI models

OpenAI models are accessible through an API. You may use the following script:

cd ./src
mkdir results
python closed_source_model_completion.py \
  --model gpt-3.5-turbo-0301 

Metrics Calculation

After generating the completions, calculate the final metrics:

cd ./src
python code_efficiency_calculator.py \
  --model gpt-3.5-turbo-0301
python report_overhead.py \
  --model gpt-3.5-turbo-0301
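
This README does not describe how `code_efficiency_calculator.py` measures efficiency. To show the general idea of comparing a generated solution with a canonical one, here is a rough sketch using only the Python standard library. The `profile` helper and both toy solutions are hypothetical and are not taken from the repository.

```python
import time
import tracemalloc


def profile(func, *args, repeats=5):
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    # Measure memory in a separate run so tracing overhead does not skew timing.
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


# Two hypothetical solutions to the same toy problem: the sum of 0..n-1.
def generated_solution(n):
    return sum(list(range(n)))  # materializes a full list first


def canonical_solution(n):
    return n * (n - 1) // 2     # closed form, constant extra memory


gen_time, gen_peak = profile(generated_solution, 200_000)
can_time, can_peak = profile(canonical_solution, 200_000)
print(f"generated: {gen_time:.6f}s, peak {gen_peak} bytes")
print(f"canonical: {can_time:.6f}s, peak {can_peak} bytes")
```

Taking the best of several runs reduces timing noise. `tracemalloc` only tracks allocations made through Python's allocator, so it is a coarse stand-in for a full memory profiler.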

Citation

@article{huang2024effibench,
  title={EffiBench: Benchmarking the Efficiency of Automatically Generated Code},
  author={Huang, Dong and Zhang, Jie M and Qing, Yuhao and Cui, Heming},
  journal={arXiv preprint arXiv:2402.02037},
  year={2024}
}

Questions

Please feel free to email us (email addresses are in the paper). You may also submit an issue in this repo.

License

This project is licensed under the Apache-2.0 License.
