
AlphaCodium's Introduction

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

Paper | Dataset

Official Implementation

Tal Ridnik, Dedy Kredo, Itamar Friedman
CodiumAI

News 2024-05-17

Updated the AlphaCodium leaderboard with scores for new GPT models and Claude 3 Opus. GPT-4o is currently the leading model on AlphaCodium.



Abstract

Code generation problems differ from common natural language problems - they require matching the exact syntax of the target language, identifying happy paths and edge cases, paying attention to numerous small details in the problem spec, and addressing other code-specific issues and requirements. Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks.

In this work, we propose a new approach to code generation by LLMs, which we call AlphaCodium - a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems.

We tested AlphaCodium on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow.

Many of the principles and best practices we acquired in this work, we believe, are broadly applicable to general code generation tasks.

Installation

(1) Set up a virtual environment:

python3 -m venv venv
source ./venv/bin/activate

and run: pip install -r requirements.txt.

(2) Duplicate the file alpha_codium/settings/.secrets_template.toml, rename it as alpha_codium/settings/.secrets.toml, and fill in your OpenAI API key:

[openai]
key = "..."

(3) Download the processed CodeContests validation and test datasets from Hugging Face, extract the zip file, and place the extracted folder in the root of the project.

How to run

Configuration

The file alpha_codium/settings/configuration.toml contains the configuration for the project. In the config section, you can choose the model you want to use ("gpt-4", "gpt-3.5-turbo-16k", or others).
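For orientation, a minimal sketch of what that section might look like (the model key name is an assumption for illustration; configuration.toml itself is the authoritative reference):

[config]
model = "gpt-4"  # or "gpt-3.5-turbo-16k", etc.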

Solving a specific problem from CodeContests

To solve a specific problem with AlphaCodium, from the root folder run:

python -m alpha_codium.solve_problem \
--dataset_name /path/to/dataset \
--split_name test \
--problem_number 0
  • The dataset_name is the path to the dataset folder you downloaded in the installation step.
  • The validation set contains 117 problems and the test set contains 165 problems, so the problem_number parameter should fall in the corresponding range (it is zero-based).
  • The split_name can be either valid or test.
  • The following sections of the configuration file let you adjust the different stages of the flow: solve, self_reflection, possible_solutions, generate_ai_tests, initial_code_generation, public_tests, and ai_tests.
  • Each run logs the results to a file named alpha_codium/example.log. Reviewing the log file is a good way to understand what is going on in each stage of the flow.


Solving an entire CodeContest dataset split

To solve the entire dataset with AlphaCodium, from the root folder run:

python -m alpha_codium.solve_dataset \
--dataset_name /path/to/dataset \
--split_name test \
--database_solution_path /path/to/output/dir/dataset_output.json
  • The split_name can be either valid or test.
  • database_solution_path is the path to the directory where the solutions will be saved.
  • The dataset section in the configuration file contains the configuration for running and evaluating a dataset.
  • Note that this is a long process; it may take a few days to complete with large models (e.g., GPT-4) and several iterations per problem.
  • dataset.num_iterations defines the number of iterations for each problem (pass@K). For a large number of iterations, it is recommended to introduce some randomness and different options for each iteration to achieve top results (see the illustrative snippet below).
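As an illustrative sketch only (dataset.num_iterations is the parameter named above; the value and comment are ours, not the repo's defaults):

[dataset]
num_iterations = 5  # pass@5; varying sampling options per iteration helps reach top results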

Running the evaluation

Once you generate a solution for the entire dataset (valid or test), you can evaluate it by running:

python -m alpha_codium.evaluate_dataset \
--dataset_name /path/to/dataset \
--split_name test \
--database_solution_path /path/to/output/dir/dataset_output.json

Solving a new problem (CodeContests format)

To solve a custom problem with AlphaCodium, first create a JSON file that includes the CodeContests problem fields, and then from the root folder run:

python -m alpha_codium.solve_my_problem \
--my_problem_json_file /path/to/my_problem.json
  • The my_problem_json_file is the path to the custom problem JSON file.

See my_problem_example.json for an example of a custom problem. The JSON file should include the following fields (a minimal illustrative file appears after the list):

  • name is the name of the problem.
  • description is a description of the problem.
  • (optional) public_tests with the following fields:
    • input is a list of strings that represent the input.
    • output is a list of strings that represent the output.
  • (optional) private_tests, which follows the same structure as public_tests
  • (optional) generated_tests, which follows the same structure as public_tests
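For illustration, here is a minimal, hypothetical custom-problem file consistent with the fields above (the problem itself is invented; my_problem_example.json in the repo is the authoritative reference):

{
    "name": "sum_two_numbers",
    "description": "Read two integers a and b from a single line and print their sum.",
    "public_tests": {
        "input": ["1 2\n", "10 -3\n"],
        "output": ["3\n", "7\n"]
    }
}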

Technical Q&A

Below, we aggregate some technical questions we received about this project:


Q: How much time did you spend on "prompt engineering" compared to "flow engineering"?

A: Structured output almost completely eliminates the need for simple prompt engineering. We estimate that ~95% of the time went into higher-level design, reasoning, and injecting data at the correct places, a.k.a. "flow engineering".


Q: How do you know that there wasn't a data leakage?

A: The test set of the CodeContests dataset comprises problems published after September 2021, while the GPT-4 model variant we used (gpt-4-0613) has a data cutoff of September 2021. Hence, there is no data leakage for GPT-4 on the test set. For other models, like DeepSeek, we cannot be sure. However, note that our main result is a comparison of "direct prompt" vs. the "AlphaCodium flow". Data leakage would help both approaches, so the relative improvement of the AlphaCodium flow remains valid.


Q: Is this project relevant only to specific programming languages?

A: No. The proposed flow is language agnostic. We generated solutions in Python, but the flow can be applied to any language.


Q: How did you manage the context window?

A: We used models with a context window of 8192 tokens, and we did not encounter cases where it was insufficient. However, we clearly observed that as the context used in practice grows larger (say, above 4000 tokens), the model starts to "ignore" some of the information in it. Hence, there is a clear tradeoff:

  • Injecting the results of previous stages into the context may help the model generate better code.
  • However, it may also cause the model to ignore specific details and nuances from the problem description.

Q: Is this work "realistic" in terms of the number of LLM calls?

A: Compared to AlphaCode, we make four orders of magnitude (!) fewer calls (AlphaCodium makes 15-20 calls per solution). Yet we acknowledge that for some applications this may still be too much, and further optimizations are needed. We believe, however, that many of the ideas and principles we acquired in this work are broadly applicable even when the number of calls is further limited.


Q: Why do you iterate only on the generated code, and not on the AI-generated tests?

A: For code problems in CodeContests, the tests are a list of input-output pairs. Hence, you don't really learn anything new when you "fix" a test - you just change its output to the prediction of the generated code. Instead of fixing tests, we preferred to always try to fix the code, while using "test anchors" (see the paper for more details). However, for other code generation tasks, where the tests are more complex and contain runnable code, iterating on the tests, in addition to iterating on the generated code, may be beneficial.
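To make the "test anchors" idea concrete, here is a minimal Python sketch of such an iteration loop. The run_test and fix_code callables are hypothetical stand-ins for the repo's execution and LLM stages; this is a simplified illustration of the idea, not the repo's actual flow:

from typing import Callable, List, Tuple

Test = Tuple[str, str]  # (input, expected output)

def iterate_with_test_anchors(
    code: str,
    public_tests: List[Test],
    ai_tests: List[Test],
    run_test: Callable[[str, Test], bool],  # executes code on one test; True if it passes
    fix_code: Callable[[str, Test], str],   # asks the LLM to fix code for a failing test
    max_attempts: int = 3,
) -> str:
    # Public tests the current code already passes become "anchors".
    anchors = [t for t in public_tests if run_test(code, t)]
    for test in ai_tests:
        for _ in range(max_attempts):
            if run_test(code, test):
                anchors.append(test)  # a passing AI test becomes a new anchor
                break
            candidate = fix_code(code, test)
            # Accept a fix only if it still passes every anchor test, so that
            # fixing one test never silently breaks previously correct behavior.
            if all(run_test(candidate, t) for t in anchors):
                code = candidate
    return code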

Broader Applicability

While this work presents results on the CodeContests dataset, we believe it has broader applicability.

First and foremost, we feel that the proposed AlphaCodium flow, with reasonable adjustments, can be used as a more general framework for other code generation tasks.

Secondly, many of the design concepts, principles, and tricks we acquired in this work are broadly applicable as-is to general code generation tasks. For example:

  • YAML structured output: asking the model to generate output in YAML format, equivalent to a given Pydantic class (see the sketch after this list).
  • Semantic reasoning via bullet-point analysis: bullet-point analysis encourages an in-depth understanding of the problem, and forces the model to divide the output into logical semantic sections, leading to improved results.
  • LLMs do better when generating modular code: when we ask the model to divide the generated code into small sub-functions with meaningful names and functionality, we observe better code, fewer bugs, and higher success rates in the iterative fixing stages.
  • Soft decisions with double validation: we add an extra step where, given a generated output, the model is asked to re-generate the same output, correcting it if needed.
  • Leave room for exploration: since the model can be wrong, it is better to avoid irreversible decisions and leave room for exploration and code iterations with different possible solutions.
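As an illustration of the first bullet, here is a minimal sketch of YAML structured output validated against a Pydantic class. The schema and prompt here are hypothetical, not the ones used in the repo:

import yaml
from pydantic import BaseModel

class SelfReflection(BaseModel):
    # Hypothetical schema; AlphaCodium defines its own structure per stage.
    problem_summary: str
    difficulty: str

PROMPT_SUFFIX = (
    "Answer in YAML, with exactly these fields:\n"
    "problem_summary: <one-paragraph restatement of the problem>\n"
    "difficulty: <easy|medium|hard>\n"
)

def parse_yaml_output(raw: str) -> SelfReflection:
    # Strip markdown fences that some models emit despite instructions
    # (the gpt-3.5-turbo-1106 issue below shows this exact failure mode).
    text = raw.strip().removeprefix("```yaml").removesuffix("```").strip()
    return SelfReflection(**yaml.safe_load(text))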

The list above is partial. See the paper for more details. The code provided in this repo can be used as a reference for better understanding the proposed concepts, and for applying them to other code generation tasks.

Example Problem

In this section, we present an example of a full problem from the CodeContests dataset (test set, problem 1), to demonstrate the complexity of the problems in the dataset and the challenges they pose to LLMs.

problem name: '1575_B. Building an Amusement Park'

problem description:
Mr. Chanek lives in a city represented as a plane. He wants to build an amusement park in the shape of a circle of radius r. 
The circle must touch the origin (point (0, 0)).
There are n bird habitats that can be a photo spot for the tourists in the park. The i-th bird habitat is at point p_i = (x_i, y_i). 

Find the minimum radius r of a park with at least k bird habitats inside. 

A point is considered to be inside the park if and only if the distance between p_i and the center of the park is less than or equal 
to the radius of the park.
Note that the center and the radius of the park do not need to be integers.

In this problem, it is guaranteed that the given input always has a solution with r ≤ 2 ⋅ 10^5.

Input

The first line contains two integers n and k (1 ≤ n ≤ 10^5, 1 ≤ k ≤ n) — the number of bird habitats in the city and the number of bird 
habitats required to be inside the park.
The i-th of the next n lines contains two integers x_i and y_i (0 ≤ |x_i|, |y_i| ≤ 10^5) — the position of the i-th bird habitat.

Output

Output a single real number r denoting the minimum radius of a park with at least k bird habitats inside. It is guaranteed that the given 
input always has a solution with r ≤ 2 ⋅ 10^5.
Your answer is considered correct if its absolute or relative error does not exceed 10^{-4}.
Formally, let your answer be a, and the jury's answer be b. Your answer is accepted if and only if |a - b| / max(1, |b|) ≤ 10^{-4}.

Examples

Input

8 4
-3 1
-4 4
1 5
2 2
2 -2
-2 -4
-1 -1
-6 0

Output

3.1622776589


Input

1 1
0 0


Output

0.0000000000

Note

In the first example, Mr. Chanek can put the center of the park at (-3, -1) with radius √{10} ≈ 3.162. It can be proven this is the minimum r.
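For reference, here is a compact, unofficial Python sketch of one standard solution: binary search on the radius r, plus an angular sweep over candidate centers c = r(cos θ, sin θ). It only illustrates the reasoning a model must reproduce; it is not taken from the repository:

import math

def min_radius(points, k):
    # A circle of radius r touching the origin has its center c at distance r
    # from the origin. A point p at distance d and angle phi is inside iff
    # p . c >= d^2 / 2, i.e. cos(theta - phi) >= d / (2r) for center angle theta,
    # giving an angular interval of half-width acos(d / (2r)) around phi.
    free = sum(1 for x, y in points if x == 0 and y == 0)  # the origin is always inside
    if free >= k:
        return 0.0
    polar = [(math.hypot(x, y), math.atan2(y, x))
             for x, y in points if (x, y) != (0, 0)]
    need = k - free

    def feasible(r):
        events = []
        for d, phi in polar:
            if d > 2 * r:
                continue  # this point cannot be covered at radius r
            alpha = math.acos(min(1.0, d / (2 * r)))
            lo = (phi - alpha) % (2 * math.pi)  # normalize interval start into [0, 2*pi)
            for shift in (0.0, -2 * math.pi):   # duplicated copy handles wrap-around
                events.append((lo + shift, 1))
                events.append((lo + 2 * alpha + shift, -1))
        events.sort(key=lambda e: (e[0], -e[1]))  # interval starts before ends on ties
        best = cur = 0
        for _, delta in events:
            cur += delta
            best = max(best, cur)
        return best >= need

    lo, hi = 0.0, 2e5  # the statement guarantees a solution with r <= 2 * 10^5
    for _ in range(60):  # far more precision than the required 1e-4
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example 1 from above: prints ~3.16227766 (sqrt(10))
# print(min_radius([(-3, 1), (-4, 4), (1, 5), (2, 2), (2, -2), (-2, -4), (-1, -1), (-6, 0)], 4))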

Acknowledgments

Our processed CodeContests dataset is based on the original CodeContests dataset. We removed the train set (which is not relevant to our work) and did some post-processing and cleaning on the validation and test sets.

Citation

@misc{ridnik2024code,
      title={Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering}, 
      author={Tal Ridnik and Dedy Kredo and Itamar Friedman},
      year={2024},
      eprint={2401.08500},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}


AlphaCodium's Issues

AI test timeout is hard-coded to 3 seconds

Hello, I noticed that while running AI tests, some tests were mistakenly marked as failed due to timeouts. Upon reviewing the code, I found that the timeout duration is set to 3 seconds and is not configurable.
However, when I run a failing test case individually with the generated code, the result is as expected; it's just that the execution time is a bit long.

Benchmark on SWE-Bench

It would be interesting to see the performance on SWE-Bench benchmarks, so that this project can be more clearly differentiated from the increasing number of other coding agents.

For context, the SWE-bench paper abstract (Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan):

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We consider real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. We therefore introduce SWE-bench, an evaluation framework including 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and GPT-4 solve a mere 4.8% and 1.7% of instances respectively, even when provided with an oracle retriever. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.

Support for Claude 3

Can AlphaCodium run on Claude 3 Opus?

It would be great to see how AlphaCodium using Claude 3 performs compared to AlphaCodium using GPT-4

Got errors when using gpt-3.5-turbo-1106

Hi,

python -m alpha_codium.solve_problem --dataset_name valid_and_test_processed --split_name test --problem_number 0

The above script works well when I use gpt-3.5-turbo-0613. But when I use 'gpt-3.5-turbo-1106', it always shows the following error. Can you have a try on 'gpt-3.5-turbo-1106' to see if you have the same error? Thanks!

2024-01-25 21:11:46.480 | INFO | alpha_codium.gen.coding_competitor:solve_problem:118 - problem['name']: 1575_A. Another Sorting Problem
2024-01-25 21:11:46.484 | INFO | alpha_codium.gen.coding_competitor:run:60 - Running code contests competitor, model gpt-3.5-turbo-1106
2024-01-25 21:11:46.485 | INFO | alpha_codium.gen.stages.run_self_reflect:run_self_reflect:18 - --reflection stage--
2024-01-25 21:11:46.491 | INFO | alpha_codium.llm.ai_handler:chat_completion:86 - -----------------
2024-01-25 21:11:46.491 | INFO | alpha_codium.llm.ai_handler:chat_completion:87 - Running inference ...
2024-01-25 21:11:56.045 | INFO | alpha_codium.llm.ai_handler:chat_completion:133 - done
2024-01-25 21:11:56.045 | INFO | alpha_codium.llm.ai_handler:chat_completion:134 - -----------------
ERROR:root:'run_self_reflect' stage, counter_retry 0, Error: while scanning for the next token
found character '`' that cannot start any token
in "", line 1, column 1:
```yaml
^

httpx.ConnectError: All connection attempts failed

Hi,

I run
python -m alpha_codium.solve_problem --dataset_name /workspace/xxx/codes/AlphaCodium/valid_and_test_processed --split_name test --problem_number 1

It always shows the problem:
2024-03-28 14:14:21.151 | INFO | alpha_codium.gen.coding_competitor:solve_problem:116 - problem_name: 1575_B. Building an Amusement Park
2024-03-28 14:14:21.156 | INFO | alpha_codium.gen.coding_competitor:solve_problem:120 - problem['name']: 1575_B. Building an Amusement Park
2024-03-28 14:14:21.159 | INFO | alpha_codium.gen.coding_competitor:run:63 - Running code contests competitor, model gpt-3.5-turbo-16k
2024-03-28 14:14:21.164 | INFO | alpha_codium.llm.ai_handler:chat_completion:86 - -----------------
2024-03-28 14:14:21.164 | INFO | alpha_codium.llm.ai_handler:chat_completion:87 - Running inference ...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

...
httpcore.ConnectError: All connection attempts failed.
...

Traceback (most recent call last):
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/llm/ai_invoker.py", line 15, in send_inference
return await f(model)
^^^^^^^^^^^^^^
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/gen/coding_competitor.py", line 52, in _run
response, finish_reason = await self.ai_handler.chat_completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/llm/ai_handler.py", line 127, in chat_completion
raise APIError from e
TypeError: APIError.__init__() missing 5 required positional arguments: 'status_code', 'message', 'llm_provider', 'model', and 'request'

ERROR:root:Error: APIError.__init__() missing 5 required positional arguments: 'status_code', 'message', 'llm_provider', 'model', and 'request'

...

2024-03-28 14:43:50.504 | INFO | alpha_codium.gen.coding_competitor:solve_my_problem:184 - evaluating solution on generated tests...
Process Process-6:
Traceback (most recent call last):
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/code_contests/eval/local_exec.py", line 89, in unsafe_execute
with create_tempdir():
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/contextlib.py", line 144, in exit
next(self.gen)
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/code_contests/eval/local_exec.py", line 278, in create_tempdir
with tempfile.TemporaryDirectory() as dirname:
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/tempfile.py", line 943, in exit
self.cleanup()
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/tempfile.py", line 947, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/tempfile.py", line 929, in _rmtree
_shutil.rmtree(name, onerror=onerror)
TypeError: 'NoneType' object is not callable
2024-03-28 14:43:50.627 | INFO | alpha_codium.gen.coding_competitor:solve_my_problem:188 -
test_passed_generate: 0, test_passed_private: 0, test_passed_public: 0
test_failed_generate: 0, test_failed_private: 0, test_failed_public: 0
test_timeout_generate: 0, test_timeout_private: 0, test_timeout_public: 0

Invalid IPC stream: negative continuation token

I followed all the steps in the readme to set up the environment, then extracted the folder to the AlphaCodium directory. I ran this command: python -m alpha_codium.solve_problem --dataset_name "C:\Users\****\Downloads\py\AlphaCodium\valid_and_test_processed" --split_name valid --problem_number 0
and got the following output:

  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\solve_problem.py", line 16, in <module>
    solve_problem(dataset_name=args.dataset_name,
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\gen\coding_competitor.py", line 109, in solve_problem
    data_provider = CodeContestDataProvider(dataset_location=dataset_name)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\code_contests\data\provider.py", line 29, in __init__
    self.dataset = self.load_dataset()
                   ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\code_contests\data\provider.py", line 131, in load_dataset
    return f(self.dataset_location)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\load.py", line 2636, in load_from_disk
    return DatasetDict.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\dataset_dict.py", line 1369, in load_from_disk
    dataset_dict[k] = Dataset.load_from_disk(
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\arrow_dataset.py", line 1706, in load_from_disk
    arrow_table = concat_tables(
                  ^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 1765, in concat_tables
    tables = list(tables)
             ^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\arrow_dataset.py", line 1707, in <genexpr>
    table_cls.from_file(posixpath.join(dest_dataset_path, data_file["filename"]))
  File "C:\Users\loydni\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 1022, in from_file
    table = _memory_mapped_arrow_table_from_file(filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 64, in _memory_mapped_arrow_table_from_file
    opened_stream = _memory_mapped_record_batch_reader_from_file(filename)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 50, in _memory_mapped_record_batch_reader_from_file
    return pa.ipc.open_stream(memory_mapped_stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\pyarrow\ipc.py", line 190, in open_stream
    return RecordBatchStreamReader(source, options=options,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\pyarrow\ipc.py", line 52, in __init__
    self._open(source, options=options, memory_pool=memory_pool)
  File "pyarrow\ipc.pxi", line 974, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow\error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 91, in pyarrow.lib.check_status
OSError: Invalid IPC stream: negative continuation token

Custom Problem

Is it possible to modify the code so that it works for a custom problem that is not included in the dataset?

How to use deepseek

When I modified the configuration file to use the model="deepseek-coder-33b-instruct" and ran the code for model inference, it failed with the following error:

alpha_codium.llm.ai_handler:chat_completion:87 - Running inference ...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

ERROR:root:Error during OpenAI inference

It seems that litellm does not support deepseek, and I would like to know how to resolve this issue.

Enhancements to the Iterative Flow Mechanism in AlphaCodium for Robust Code Generation

Dear Tal Ridnik, Dedy Kredo, and Itamar Friedman,

I have been thoroughly engrossed in the study of your work on AlphaCodium as detailed in your recent GitHub repository. The methodology you have proposed for code generation through the use of a test-based, multi-stage iterative flow is indeed revolutionary and appears to have the potential to significantly improve the accuracy of language models on code-related tasks.

However, upon delving into the intricacies of your approach, I have identified a few areas where the iterative flow mechanism could possibly be enhanced to ensure even more robust code generation. I am listing these below, along with suggestions for potential improvements:

  1. Context Management Optimisation: As noted in your Technical Q&A section, the model tends to overlook certain details in the problem description when the context grows too large. Would it be feasible to implement a more dynamic context management strategy that prioritises the most relevant information from previous iterations, ensuring that the model retains focus on the key aspects of the problem?

  2. Enhanced Feedback Loop for Test Generation: While iterating on the generated code is the current focus, could there be merit in establishing a feedback loop for the AI-generated tests as well? For instance, tests that consistently fail could trigger a deeper analysis of specific code segments, potentially uncovering subtle bugs that are not immediately apparent.

  3. Granular Control Over Iterative Steps: Could the configuration file expose more granular control over the iterative steps? For example, allowing users to specify different iteration strategies for certain types of problems or to adjust the iteration count based on the complexity of the task at hand.

  4. Integration with Real-world Development Environments: How might AlphaCodium be integrated into real-world development environments to support live coding scenarios? Would it be possible to create plugins or extensions for popular Integrated Development Environments (IDEs) that utilise AlphaCodium's flow to assist developers in real-time?

  5. Cross-language Applicability and Testing: While the flow is language-agnostic, have there been any efforts to test its efficacy across a broader range of programming languages? Insights gained from such tests could help refine the flow to better accommodate the idiosyncrasies of different programming paradigms.

I believe that addressing these points could further elevate the practicality and effectiveness of AlphaCodium in real-world coding applications. I am eager to hear your thoughts on these suggestions and whether they could be incorporated into your future work.

Thank you for your pioneering contributions to the field of AI-driven code generation. I look forward to your response and am excited about the potential advancements that your continued research will bring to the developer community.

Best regards,
yihong1120

Problems during the AlphaCodium installation process

Hi,

I encountered some problems during the AlphaCodium installation process.

  • OS : MacOS Sonoma 14.3
  • Python : 3.12.1

When I executed the command pip install -r requirements.txt, I got error logs like the ones below:

Collecting PyYAML==6.0 (from -r requirements.txt (line 12))
Downloading PyYAML-6.0.tar.gz (124 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.0/125.0 kB 11.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
running egg_info
writing lib/PyYAML.egg-info/PKG-INFO
writing dependency_links to lib/PyYAML.egg-info/dependency_links.txt
writing top-level names to lib/PyYAML.egg-info/top_level.txt
Traceback (most recent call last):
File "/Users/jholim/workspace/codeai/demo/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
main()
File "/Users/jholim/workspace/codeai/demo/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jholim/workspace/codeai/demo/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/k8/27k28wmn09j3109z45qxk9xr0000gn/T/pip-build-env-gondi16m/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I passed that point using the following information, but other issues arose.

Could you give some advice on how I can solve this problem?
Or please let me know your environment info, including the Python version.

Thanks in advance.

Recommended approach for local models? i.e. Swappable model support.

Looks like litellm does a (too) good job of encapsulating the calls to openai, making calls to local openai-api-based models require a proxy to intercept and re-route.

Is this the recommended approach for the time being? Any plans to drop the litellm dependency, use one that's a little more open, or write your own layer?

It would be nice to use this with swappable models especially since AC seems to generalize across general instruct models and not require function-calling models.

Meaning of parameters in 'configuration.toml'

Hi,

In the 'configuration.toml' file, I see a range of parameters, but not sure what those parameters control. Could you please provide one example config file that can produce the result in Table 1 and Table 2 of the paper? One example for each table will be great! Thank you!

