
aiXcoder-7B Code Large Language Model

🏠 Official website | 🛠 VS Code Plugin | 🛠 JetBrains Plugin | 🤗 Model Weights | WeChat | WeChat Official Account

Welcome to the official repository of the aiXcoder-7B Code Large Language Model. This model is designed to understand and generate code across multiple programming languages, offering state-of-the-art performance in code completion, comprehension, generation, and other programming tasks.

Table of Contents

  1. Model Introduction
  2. Quickstart
  3. Data for aiXcoder 7B
  4. Training
  5. Details of Experimental Results
  6. License
  7. Acknowledgments

Model Introduction

As the capabilities of large code models are gradually being uncovered, aiXcoder has consistently considered how to make these models more useful in real development scenarios. To this end, we have open-sourced aiXcoder 7B Base, which has undergone extensive training on 1.2T unique tokens, with pre-training tasks and contextual information designed specifically for real-world code generation contexts.

aiXcoder 7B Base stands out as the most effective model for code completion among models of similar parameter size, and it also surpasses mainstream models such as CodeLlama 34B and StarCoder2 15B in average performance on multilingual NL2Code benchmarks.

In our ongoing exploration to apply large code models, the release of aiXcoder 7B Base represents a significant milestone. The current version of aiXcoder 7B Base is a foundational model that focuses on improving the efficiency and accuracy of code completion and code generation tasks, aiming to provide robust support for developers in these scenarios. It is important to note that this version has not undergone specific instruct-tuning, which means it might not yet offer optimal performance for specialized higher-level tasks such as test case generation and code debugging.

However, we have plans for further development of the aiXcoder model series already in motion. In the near future, we aim to release new versions of the model that have been meticulously instruct-tuned for a wider range of programming tasks, including but not limited to test case generation and code debugging. Through these instruct-tuned models, we anticipate offering developers more comprehensive and deeper programming support, helping them to maximize efficiency at every stage of software development.

table_1

aiXcoder 7B surpasses mainstream models on NL2Code benchmarks. aiXcoder-7B is an enhancement of aiXcoder-7B-Base, fine-tuned for one epoch on one hundred thousand data entries similar to Evol-Instruct.



table_3

aiXcoder 7B Base surpasses mainstream models in code completion scenarios.



Quickstart

Environment Requirements

Option 1: Build Env

To run the model inference code, you'll need the following environment setup:

  • Python 3.8 or higher
  • PyTorch 2.1.0 or higher
  • sentencepiece 0.2.0 or higher
  • transformers 4.34.1 or higher (if running inference with the transformers library)

Please ensure all dependencies are installed using the following commands:

conda create -n aixcoder-7b python=3.11
conda activate aixcoder-7b
git clone [email protected]:aixcoder-plugin/aiXcoder-7b.git
cd aiXcoder-7b
pip install -r requirements.txt

requirements.txt lists all the necessary libraries and their versions.

To achieve faster inference speeds, especially for large models, we recommend installing flash attention. Flash attention is an optimized attention mechanism that significantly reduces computation time for transformer-based models without sacrificing accuracy.

Before proceeding, ensure your environment meets the CUDA requirements as flash attention leverages GPU acceleration. Follow these steps to install flash attention:

git clone [email protected]:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install

Option 2: Docker

For a consistent and isolated environment, we recommend running the model inference code using Docker. Here's how to set up and use Docker for our model:

  1. Install Docker: If you haven't already, install Docker on your machine.

  2. Pull the Docker Image: Pull the Docker image from Docker Hub.

docker pull pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel

  3. Run the Container: Once the image is pulled, you can run the model inside a Docker container.
docker run --gpus all -it -v /dev/shm:/dev/shm --name aix_instance pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel /bin/bash
pip install sentencepiece
git clone [email protected]:aixcoder-plugin/aiXcoder-7b.git
cd aiXcoder-7b

This command starts a container named aix_instance from the pytorch image. You can interact with the model inside this container.

To achieve faster inference speeds, especially for large models, we recommend installing flash attention.

git clone [email protected]:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install

  4. Model Inference: Within the Docker container, you can run the model inference code as described in the Inference Example section.

Using Docker provides a clean, controlled environment that minimizes issues related to software versions and dependencies.

Model Weights

You can download the model weights from Hugging Face (the examples below use the aiXcoder/aixcoder-7b-base repository) or from ModelScope.
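
For example, the weights can be fetched ahead of time with the huggingface_hub library; this is a minimal sketch, and the local directory name is an arbitrary assumption:

from huggingface_hub import snapshot_download

# Minimal sketch: pre-download the weights. The repo id matches the one used in
# the inference examples below; the local directory name is an arbitrary choice.
snapshot_download(
    repo_id="aiXcoder/aixcoder-7b-base",
    local_dir="./aixcoder-7b-base",
)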

Inference Example

Command Line Execution

For a quick start, you can run the model inference directly from the command line:

torchrun --nproc_per_node 1 sess_megatron.py --model_dir "path/to/model_weights_dir"

Replace "path/to/model_weights_dir" with the actual path to your downloaded model weights.

Or run inference with Hugging Face's transformers:

python sess_huggingface.py

Python Script Execution

Alternatively, you can invoke the model programmatically within your Python scripts. This method provides more flexibility for integrating the model into your applications or workflows. Here's a simple example of how to do it:

from sess_megatron import TestInference

infer = TestInference()
res = infer.run_infer(
    # for FIM style input, code_string stands for prefix context
    code_string="""# 快速排序算法""", 
    # for FIM style input, later_code stands for suffix context
    later_code="\n",
    # file_path should be a path from project to file
    file_path="test.py",
    # max num for generated tokens
    max_new_tokens=256,
)
print(res)

"""output:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    less = [i for i in arr[1:] if i <= pivot]
    greater = [i for i in arr[1:] if i > pivot]
    return quick_sort(less) + [pivot] + quick_sort(greater)


# 测试
arr = [3, 2, 1, 4, 5]
print(quick_sort(arr))  # [1, 2, 3, 4, 5]
"""
import torch
import sys
from hf_mini.utils import input_wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

tokenizer = AutoTokenizer.from_pretrained("aiXcoder/aixcoder-7b-base")
model = AutoModelForCausalLM.from_pretrained("aiXcoder/aixcoder-7b-base", torch_dtype=torch.bfloat16)


text = input_wrapper(
    # for FIM style input, code_string stands for prefix context
    code_string="# 快速排序算法",
    # for FIM style input, later_code stands for suffix context
    later_code="\n# 测试\narr = [3, 2, 1, 4, 5]\nprint(quick_sort(arr))  # [1, 2, 3, 4, 5]",
    # file_path should be a path from project to file
    path="test.py"
)

if len(text) == 0:
    sys.exit()

inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)

inputs = inputs.to(device)
model.to(device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))



"""output:
def quick_sort(arr):
    # 如果数组长度小于等于1,直接返回
    if len(arr) <= 1:
        return arr
    # 选择数组的第一个元素作为基准
    pivot = arr[0]
    # 初始化左右指针
    left, right = 1, len(arr) - 1
    # 循环直到左指针小于右指针
    while left < right:
        # 从右到左找到第一个小于基准的元素,与左指针元素交换
        if arr[right] < pivot:
            arr[left], arr[right] = arr[right], arr[left]
            left += 1
        # 从左到右找到第一个大于等于基准的元素,与右指针元素交换
        if arr[left] >= pivot:
            right -= 1
    # 将基准元素与左指针元素交换
    arr[left], arr[0] = arr[0], arr[left]
    # 对左半部分进行递归排序
    quick_sort(arr[:left])
    # 对右半部分进行递归排序
    quick_sort(arr[left + 1:])
    return arr</s>
"""

Quantization with bitsandbytes

You can also install bitsandbytes via pip install bitsandbytes and simply add a quantization configuration to run int8 or int4 inference (if you need to further reduce the temporary memory used at runtime, installing FlashAttention is recommended):

import sys
import torch
from hf_mini.utils import input_wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig    

# to use 4bit use `load_in_4bit=True` instead
bnb_config = BitsAndBytesConfig(load_in_8bit=True) 

device = "cuda" # the device to load the model onto

tokenizer = AutoTokenizer.from_pretrained("aiXcoder/aixcoder-7b-base")
model = AutoModelForCausalLM.from_pretrained("aiXcoder/aixcoder-7b-base", quantization_config=bnb_config, device_map=device, attn_implementation='flash_attention_2')

text = input_wrapper(
    code_string="# 快速排序算法",
    later_code="\n",
    path="test.py"
)

if len(text) == 0:
    sys.exit()

inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)

inputs = inputs.to(device)    

outputs = model.generate(**inputs, max_new_tokens=256)
print(f"Model memory footprint: {model.get_memory_footprint() / 2**20:.2f} MB")
print(f"Torch max memory allocated: {torch.cuda.max_memory_allocated() / 2**20:.2f} MB")

"""
load_in_4bit=True:
    - Model memory footprint: 5656.52 MB
    - Torch max memory allocated: 6448.89 MB

load_in_8bit=True:
    - Model memory footprint: 9008.52 MB
    - Torch max memory allocated: 10061.51 MB
"""

Fine-tuning example

If you want to fine-tune the model on your own code, you can quickly get started with training using Hugging Face's PEFT tools. Before doing so, install the necessary libraries with pip install -r requirements_peft.txt.

Then, execute the training command:

accelerate launch finetune.py \
        --model_id "aiXcoder/aixcoder-7b-base" \
        --dataset_name "bigcode/the-stack-smol" \
        --subset "data/rust" \
        --dataset_text_field "content" \
        --split "train" \
        --max_seq_length 1024 \
        --max_steps 10000 \
        --micro_batch_size 1 \
        --gradient_accumulation_steps 8 \
        --learning_rate 5e-6 \
        --warmup_steps 20 \
        --fim_rate 0.5 \
        --num_proc "$(nproc)"

In the fine-tuning script, we have constructed a simple random FIM (Fill-In-the-Middle) training task that trains the model's completion and generation capabilities on your own data. Note that aiXcoder-7b-base uses structured FIM during pre-training, which constructs a complete code block as the MIDDLE; creating such training data requires syntactic parsing, which developers may need to implement themselves. A sketch of that idea follows.
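
As an illustration only, a structured-FIM split for Python sources could be derived with the standard ast module, picking one complete statement node as the MIDDLE span. This is a hedged sketch of the idea, not aiXcoder's actual data pipeline:

import ast
import random

def structured_fim_split(source):
    """Return a (prefix, middle, suffix) split where MIDDLE is a complete AST node.

    Illustrative only: offsets assume ASCII source, since col_offset is a byte offset.
    """
    tree = ast.parse(source)
    # Keep statement nodes that carry end positions (available in Python 3.8+).
    nodes = [n for n in ast.walk(tree)
             if isinstance(n, ast.stmt) and getattr(n, "end_lineno", None)]
    if not nodes:
        return None
    node = random.choice(nodes)
    lines = source.splitlines(keepends=True)
    start = sum(len(l) for l in lines[:node.lineno - 1]) + node.col_offset
    end = sum(len(l) for l in lines[:node.end_lineno - 1]) + node.end_col_offset
    return source[:start], source[start:end], source[end:]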

Data for aiXcoder 7B

The data for aiXcoder is divided into a core dataset and an extended dataset. The core dataset comprises the programming languages commonly used in development, as well as natural languages closely related to code. The core dataset's programming languages mainly include nearly a hundred mainstream languages such as C++, Python, Java, and JavaScript, while the natural language component primarily consists of StackOverflow Q&As, technical blogs, code documentation, and computer science papers. The extended data mainly consists of filtered open-source code datasets, high-quality English natural language datasets, and high-quality Chinese natural language datasets.

The aiXcoder core dataset is mainly used to enhance the performance of the large code model in the aforementioned programming languages, undergoing a rigorous filtering and selection process. Specifically, this process includes the following steps: 1) Selection of raw data; 2) Comprehensive ranking and selection of projects; 3) Code deduplication and the removal of automatically generated code using methods such as MinHashes (Broder, 2000); 4) Identification and handling of personal sensitive information; 5) Cleaning of commented code; 6) Syntactic analysis to filter incorrect or anomalous code files; 7) Static analysis to detect and eliminate 163 types of high-risk bugs and 197 types of defects in mainstream programming languages such as Java, C++, Python, and JavaScript.

  1. Raw Data Selection
    • Exclude projects under copyleft licenses.
    • Deduplicate projects gathered from various code hosting platforms and open-source datasets
  2. Project-Level Comprehensive Ranking
    • Calculate project metrics, including the number of Stars, Git Commit counts, and the quantity of Test files.
    • Exclude the lowest 10% of data based on a comprehensive score.
  3. Code File-Level Filtering
    • Remove automatically generated code.
    • Employ near-deduplication for redundancy removal.
  4. Sensitive Information Removal
    • Use named entity recognition models to identify and delete sensitive information such as names, IP addresses, account passwords, and URLs.
  5. Commented Code
    • Randomly delete large sections of commented code.
  6. Syntax Analysis
    • Delete code with syntax parsing errors or syntactical errors in the top fifty languages.
  7. Static Analysis
    • Utilize static analysis tools to scan for and locate 161 types of Bugs affecting code reliability and maintainability, as well as 197 types of vulnerabilities impacting code security.
# "__init__" method should not return a value

# Noncompliant: a TypeError will be raised
class MyClass(object):
    def __init__(self):
        self.message = 'HelloWorld'
        return self  

# Compliant solution
class MyClass(object):
    def __init__(self):
        self.message = 'HelloWorld'

The code above illustrates a Python bug pattern in which the __init__ method should not return a value.
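
For the near-deduplication step (3) above, a common approach is MinHash-based similarity search in the spirit of Broder (2000). The sketch below uses the datasketch library, word shingling, and a 0.85 threshold purely as illustrative assumptions, not as the pipeline actually used:

from datasketch import MinHash, MinHashLSH

def minhash_of(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for token in set(text.split()):
        m.update(token.encode("utf-8"))
    return m

def near_duplicates(files, threshold=0.85):
    """files: dict mapping file path -> source text. Returns path -> similar paths."""
    lsh = MinHashLSH(threshold=threshold, num_perm=128)
    signatures = {path: minhash_of(content) for path, content in files.items()}
    for path, sig in signatures.items():
        lsh.insert(path, sig)
    # For each file, report other files whose estimated Jaccard similarity
    # exceeds the threshold.
    return {path: [p for p in lsh.query(sig) if p != path]
            for path, sig in signatures.items()}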

Training

Training Hyperparameters

Tokenizer:

  • Byte-level Byte Pair Encoding (BPE)
  • Vocabulary size of 49,152

Model Structure:

  • RoPE (Rotary Positional Embedding) for relative position encoding
  • SwiGLU as the intermediate layer
  • Grouped Query Attention

Training Parameters:

  • Structured FIM (Fill in the middle) training tasks make up 70% of the training, while autoregressive training tasks account for 30%
  • Pretraining sequence length of 32,768

Batch processing method

After preprocessing, our code data is organized by project, and the ordering of files within a project reflects a mix of rules and randomness. Specifically, we attempt to cluster similar or dependent code files together using methods such as calling graphs, K-Means clustering, file path similarity, and TF-IDF distance, to help the model better understand the relationships between code files. However, the file ordering also incorporates randomness, since in real programming scenarios projects are incomplete and similar or dependent code files may not all have been written yet. A sketch of such an ordering step is shown below.
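
As an illustration of the file-ordering idea only, files within a project could be grouped by TF-IDF similarity with K-Means and then emitted cluster by cluster with local randomness. The scikit-learn usage here is an assumption and not the actual procedure:

import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def order_project_files(files, n_clusters=4):
    """files: dict mapping file path -> source text. Returns an ordered list of paths."""
    paths = list(files)
    tfidf = TfidfVectorizer().fit_transform([files[p] for p in paths])
    n_clusters = min(n_clusters, len(paths))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(tfidf)
    ordered = []
    for cluster in random.sample(range(n_clusters), n_clusters):  # random cluster order
        members = [p for p, label in zip(paths, labels) if label == cluster]
        random.shuffle(members)  # local randomness within a cluster
        ordered.extend(members)
    return ordered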

By ensuring that the project's code files are globally random yet locally similar or dependent, we flatten them into one long sequence and organize batches in a Transformer-XL style. Even though a single batch already reaches a sequence length of 32,768 during pre-training, this method still allows the effective visible sequence length to be extended even further, as sketched below.
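
A minimal sketch of the packing step, under the assumption that ordered files are simply tokenized, concatenated, and sliced into fixed 32,768-token windows; the actual Transformer-XL style batching with cross-window memory is more involved:

from typing import Callable, Iterable, List

def pack_into_windows(ordered_files: Iterable[str],
                      tokenize: Callable[[str], List[int]],
                      seq_len: int = 32768) -> List[List[int]]:
    # Concatenate all files of a project into one token stream.
    stream = []
    for text in ordered_files:
        stream.extend(tokenize(text))
    # Slice into fixed-length windows; a real pipeline would pad the remainder
    # or carry it over into the next batch instead of dropping it.
    return [stream[i:i + seq_len]
            for i in range(0, len(stream) - seq_len + 1, seq_len)]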

Pre-training Tasks

Unlike other natural-language or code large models, aiXcoder considers the structural characteristics of code itself, aiming to have the model predict complete code nodes. In simple terms, the aiXcoder 7B training tasks combine fill-in-the-middle (FIM, Bavarian et al., 2022) with parser-generator techniques. When constructing training data, we parse the code into an abstract syntax tree (AST) and randomly select a complete node to construct a FIM task. The rationale behind this approach is twofold: first, the input data should be relatively complete, with both the preceding and subsequent parts at the same hierarchical level; second, the model's predictions should also be more complete, with the generated code having a full hierarchical structure.

for i in range(20):
    if i % 5 == 0:
        print("Hello World")

table_0

Given that simple code can be parsed into an abstract syntax tree (AST), we will construct structured Fill In the Middle (FIM) training tasks based on the nodes of the AST.



Suppose we select the IF node in the above AST, then we will construct training samples from the IF node and its subtree. The following two examples are equivalent:

# fill in the middle, SPM mode
"<s>▁<AIX-SPAN-PRE>▁<AIX-SPAN-POST>        print(\"Hello World\")\n▁<AIX-SPAN-MIDDLE># the file path is: test.py\n# the code file is written by Python\nfor i in range(20):\n    if i % 5 == 0:<\s>"

# fill in the middle, PSM mode
"<s>▁<AIX-SPAN-PRE># the file path is: test.py\n# the code file is written by Python\nfor i in range(20):\n    if ▁<AIX-SPAN-POST>        print(\"Hello World\")\n▁<AIX-SPAN-MIDDLE>i % 5 == 0:<\s>"

Details of Experimental Results

NL2Code Benchmarks

Table 1 shows the performance of the aiXcoder-7B Base model on standalone method generation benchmarks. Our model achieves the current best results among large-scale pre-trained base models with up to hundreds of billions of parameters.

table_1

Code Completion (Fill in the Middle)

Different from the standalone nl2code task in Table 1, in real-world programming scenarios, we need to consider the code completion capability in the context of the cursor position. Generally, various open-source large language models for code incorporate the Fill in the Middle (FIM) mode during pre-training to enhance the model's ability to generate more accurate results when considering the code context. Therefore, we will use FIM as the default code completion method to evaluate the performance of each model in real-world programming scenarios.

Currently, the mainstream evaluation dataset for context-aware code completion is the single-line evaluation method proposed by SantaCoder (Ben Allal et al., 2023). This evaluation dataset extracts single lines of code from HumanEval or MultiPL-E and, given the complete preceding and following context, evaluates the Exact Match metric of the model's generated results, as sketched below.
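
A minimal sketch of that single-line exact-match protocol; the generate_fn callable and whitespace handling here are illustrative assumptions, not the benchmark's harness:

def evaluate_single_line(samples, generate_fn):
    """samples: iterable of (prefix, target_line, suffix); generate_fn fills the middle."""
    samples = list(samples)
    hits = 0
    for prefix, target_line, suffix in samples:
        completion = generate_fn(prefix, suffix)
        # Score only the first generated line against the reference line.
        first_line = completion.splitlines()[0] if completion else ""
        hits += first_line.strip() == target_line.strip()  # Exact Match criterion
    return hits / len(samples) if samples else 0.0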

table_2

To further evaluate the code completion capabilities of large language models for code in a more fine-grained manner, aiXcoder has built an evaluation dataset that is larger in size, more diverse in the code being tested, longer in the context length of the code being tested, and closer to real-world development projects. This evaluation dataset will also be open-sourced on GitHub simultaneously. During the evaluation process, we ensure that different large language models for code use the same maximum sequence length of 16K and evaluate the generation performance in different scenarios, such as generating complete method blocks, conditional blocks, loop processing blocks, exception handling blocks, and a total of thirteen cases.

Table 3 shows the average generation performance of different models in different languages. The final evaluation results are the average of all completion scenarios and evaluation samples. The aiXcoder 7B Base model achieves the best performance across major programming languages and various evaluation criteria, indicating that aiXcoder 7B Base has the best basic code completion capability among all open-source models of the same scale and is the most suitable base model for providing code completion capabilities in real-world programming scenarios.

table_3

For each evaluation result in Table 3, there are more detailed evaluation dimensions. Tables 4 to 7 show the details of the multi-dimensional evaluation of different models in different languages:

  • Method signature indicates the model's capability to generate method signatures based on context.
  • Method body represents the model's ability to generate a complete method body based on context, including the function signature.
  • Single line refers to the completion of single lines of code.
  • Method with comment denotes generating a corresponding function body based on context, including function signatures and comments.
  • Empty indicates the model's ability to predict that nothing needs to be generated when the context is already complete.
  • Method body top, mid, bottom show the code generation performance respectively in the upper part of the function body, the middle part, and the lower part.
  • If, for, while, try, and switch statements represent the quality of generating conditional blocks, loop blocks, exception-handling blocks, and switch-branch blocks.

table_4

table_5

table_6

table_7

Cross-file Code Evaluation

Another important capability of large language models for code is the ability to understand code context across files, as developers often need to consider information from other files within the current project when writing code. Therefore, we adopted the CrossCodeEval (Ding et al., 2023) evaluation dataset to assess the model's ability to extract cross-file contextual information.

In Table 8, we fix the context length for all models at 16K and format the input using the PSM pattern in FIM. After the model completes inference, all output results are decoded using Greedy Search. First, as a baseline, we evaluate the generation capabilities of various large code models in a single-file scenario.

Then, using BM25 as the similarity metric, we search for the three most similar code blocks within the project and use them as prompts to reassess the model's generation performance. Finally, "w/Ref." indicates that we assume the correct reference code is known, and then search for the three most similar code snippets within the project as prompts to re-evaluate the model's generation performance. A sketch of the retrieval step is shown below.
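
A sketch of the BM25 retrieval step; the rank_bm25 package and whitespace tokenization are assumptions standing in for whatever retriever the evaluation actually used:

from rank_bm25 import BM25Okapi

def top3_similar_blocks(query, candidate_blocks):
    """Return the three candidate code blocks most similar to the query under BM25."""
    tokenized_corpus = [block.split() for block in candidate_blocks]
    bm25 = BM25Okapi(tokenized_corpus)
    # get_top_n returns the n highest-scoring documents for the tokenized query.
    return bm25.get_top_n(query.split(), candidate_blocks, n=3)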

Ultimately, the aiXcoder-7B model performs very well in all languages, demonstrating our model's ability to extract contextual information, especially cross-file contextual information.

table_8

License

The source code in this repository is licensed under the Apache-2.0 License - see the LICENSE file for details. The model weights are licensed under the Model License for academic research use; for commercial use, please apply by sending an email to [email protected].

Acknowledgments

We would like to thank all contributors to the open-source projects and datasets that made this work possible.

Thank you for your interest in our Code Large Language Model. We look forward to your contributions and feedback!

aixcoder-7b's People

Contributors

dingjz, eltociear, geeeekexplorer, horatiojsy, near500


aixcoder-7b's Issues

Can I configure the JetBrains plugin to point to a locally self-hosted aiXcoder model?

I have deployed an aiXcoder model locally, but when using the aiXcoder plugin, I noticed that it requires me to log in. After a successful login, I couldn't find an option to configure the plugin to point to my self-hosted aiXcoder model. Is there a way to set the plugin to use a locally deployed model? Or are there any plans to introduce this configuration option in the future? Many thanks!

HumanEval performance testing

If I want to use aixcoder-7b for a pure generation task rather than FIM-style input, for example a code Q&A web page, how should the parameters be set?
For example, when testing aixcoder-7b on generation tasks such as HumanEval Python, how should later_code, file_path, the decoding method, and the inference parameters be set?

[Question] Training about aiXcoder-7B

Congratulations on this wonderful work! I noticed that the Evol-Instruct method is utilized in aiXcoder-7B training. There are some differences between the traditional implementation of Evol-Instruct and aiXcoder's prompts modified based on FIM. Is there any specific implementation strategy or example for it? Thanks!

For enabling scripting option via this code

We can record all the commands executed during an activity using the script functionality in Linux, and this code generator can then be used to create a script that performs that activity.

-- All the parameters used during the activity can be requested in one go in the form of a batch file.

ChatGPT was used to elaborate:

In our current workflow, documenting and replicating complex activities in Linux environments is time-consuming and prone to errors. Manually recording every command executed during a task is tedious and often leads to inconsistencies. Moreover, recreating these activities requires manual intervention and may result in deviations from the original process. This inefficiency not only hampers productivity but also poses risks to the reliability and stability of our operations.

Objective:
The primary objective of this project is to develop a tool that automates the recording of commands executed during activities in Linux environments. This tool will capture the command sequence and generate a script that can be used to replicate the activity accurately. Furthermore, the tool will incorporate functionality to create batch files that prompt users for parameters, simplifying the execution of tasks with varying inputs.

Solution Overview:
The proposed solution consists of two main components:

Command Recording Module: This module will intercept and record all commands executed within a designated session or timeframe. It will capture the command sequence along with relevant metadata such as timestamps and user identifiers. The recorded data will be stored in a structured format for further processing. --> This part is already done using the "script" command in Linux.

Script Generation Module: Upon completion of an activity, the recorded commands will be processed by the script generation module to create a reproducible script. This script will encapsulate the sequence of commands required to perform the activity, ensuring consistency and accuracy in subsequent executions. Additionally, the module will provide an option to generate batch files that prompt users for input parameters, enhancing flexibility and usability.

Benefits:

  • Time Savings: By automating the process of recording and scripting activities, we can significantly reduce the time required to document and replicate tasks.
  • Accuracy and Consistency: The generated scripts ensure that activities are performed consistently, reducing the risk of errors and deviations.
  • Usability: Batch file generation simplifies the execution of tasks by prompting users for input parameters, making it easier to adapt scripts to different scenarios.
  • Knowledge Sharing and Collaboration: Standardized scripts enable seamless sharing of best practices and facilitate collaboration among team members.
  • Audit Trail: The recorded command history provides a detailed audit trail of activities, enhancing accountability and compliance.

Is multi-GPU deployment supported?

As a beginner, I ran into an OOM (out-of-memory) problem during deployment. How can I deploy the model across multiple GPUs? Could you give me an example? I would be extremely grateful!

[New Feature] Add some instructions for Hugging Face based methods

Since the previous version did not explain how the weights should be loaded, most runs used the default loading method (downloading from ModelScope). In this version, instructions will be added to help those who want to run the model with the Hugging Face framework and load local weights.

Could you provide the pre-training script?

For continued pre-training on company data, could you provide the pre-training script, including lr, max_seq_len, and other details that need attention?

How should code completion be used?

I looked at your sess_huggingface.py file. I want to build a code completion demo, but the responses contain too much content, and I wonder whether this is related to the parameters passed to input_wrapper.
An example call is as follows:
"code_string": "The programming language I am using is Java. I only need you to complete the possible code that may be written at the end of my code, without providing any extra explanation or description. Please directly complete the code for your response field. If it cannot be processed, return an empty string. The current code is public static void main(String[] args) {\n SpringApplication application \u003d new SpringApplication(TranCoderApplication.class);\n application.addInitializers(SpringBeanUtil::setApplicationContext);\n spring."
I expect the API to return run(args);
But the API actually returns:

the file path is: test.py

the code file is written by Python

The programming language I am using is Java. I only need you to complete the possible code that may be written at the end of my code, without providing any extra explanation or description. Please directly complete the code for your response field. If it cannot be processed, return an empty string. The current code is public static void main(String[] args) {
SpringApplication application = new SpringApplication(TranCoderApplication.class);
application.addInitializers(SpringBeanUtil::setApplicationContext);
spring.application.run(args);
}

Assistant:

public static void main(String[] args) {
SpringApplication application = new SpringApplication(TranCoderApplication.class);
application.addInitializers(SpringBeanUtil::setApplicationContext);
spring.application.run(args);
}

How should the prompt for ordinary Q&A mode be wrapped?

In the source code, input_wrapper wraps code-generation prompts nicely.

1. How should the prompt be wrapped for ordinary Q&A mode, i.e. ChatGPT-style question answering?
For example:
Q: Please generate a quick sort algorithm in C#
A: xxxx

2. I also found an issue: the end of every generated completion always carries an extra trailing token (screenshot omitted).
