ChatGPT is a text-based AI assistant by OpenAI. This is an analysis of ChatGPT.
ChatGPT can be used with two different models:
- gpt-3.5-turbo
- gpt-4
The model `gpt-3.5-turbo` is different from previous models. It uses a new vocabulary, `cl100k_base`, with roughly 100,000 tokens, and the Chat Markup Language (ChatML).
If we send the message `[{"role": "user", "content": "13+37="}]` to the model, we get the following chat completion response:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "50",
        "role": "assistant"
      }
    }
  ],
  "created": 123,
  "id": "chatcmpl-XXX",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 1,
    "prompt_tokens": 12,
    "total_tokens": 13
  }
}
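A request like the one above could be built as follows. This is a minimal sketch using only the Python standard library. It assumes the public chat completions endpoint `https://api.openai.com/v1/chat/completions` with bearer-token authentication; `build_chat_request` and the placeholder API key are names made up for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed endpoint of the chat completions API.
API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(model, messages, api_key):
    """Build (but do not send) a chat completion request. Hypothetical helper."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


messages = [{"role": "user", "content": "13+37="}]
request = build_chat_request("gpt-3.5-turbo", messages, api_key="YOUR_API_KEY")

# Sending requires a valid key and network access:
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
#     print(completion["usage"])
```

Keeping construction separate from sending makes it easy to inspect the exact payload before spending any tokens.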
The numbers of prompt tokens and completion tokens are computed as follows:
prompt_tokens = ['<|im_start|>', 'user', '\n', '13', '+', '37', '=', '<|im_end|>', '\n', '<|im_start|>', 'assistant']
# len(tokens) is 11
It's unclear why the model returns 12 prompt tokens instead of 11. Maybe a newline is added after the word `assistant`.
completion_tokens = ['50']
# len(tokens) is 1
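The layout of the token lists above can be sketched in a few lines of Python. Note the assumptions: `toy_tokenize` is a hypothetical stand-in that happens to split this particular example the same way as the lists above, and it is not the real `cl100k_base` tokenizer. `chatml_prompt_tokens` is likewise an illustrative name, and the ChatML framing it applies is the one inferred from the token list, not a documented specification.

```python
import re


def toy_tokenize(text):
    """Stand-in tokenizer (NOT cl100k_base): digit runs, letter runs,
    single whitespace characters, and single other characters."""
    return re.findall(r"\d+|[A-Za-z]+|\s|[^A-Za-z\d\s]", text)


def chatml_prompt_tokens(messages, tokenize=toy_tokenize):
    """Wrap each message in ChatML markers and open the assistant turn."""
    tokens = []
    for message in messages:
        tokens.append("<|im_start|>")
        tokens.extend(tokenize(message["role"]))
        tokens.append("\n")
        tokens.extend(tokenize(message["content"]))
        tokens.extend(["<|im_end|>", "\n"])
    # The prompt ends with the start of the assistant's reply.
    tokens.extend(["<|im_start|>", "assistant"])
    return tokens


prompt_tokens = chatml_prompt_tokens([{"role": "user", "content": "13+37="}])
completion_tokens = toy_tokenize("50")
print(len(prompt_tokens), len(completion_tokens))
```

Under these assumptions the sketch reproduces the 11 prompt tokens and 1 completion token listed above; it does not explain the twelfth prompt token reported by the API.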
The `gpt-4` model also uses the new vocabulary `cl100k_base`, but it returns a different number of prompt tokens than `gpt-3.5-turbo`. If we send the message `[{"role": "user", "content": "13+37="}]` to the model, it returns the following chat completion response:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "50",
        "role": "assistant"
      }
    }
  ],
  "created": 123,
  "id": "chatcmpl-XXX",
  "model": "gpt-4-0314",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 1,
    "prompt_tokens": 11,
    "total_tokens": 12
  }
}
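The difference between the two responses is easiest to see by comparing their `usage` fields directly. The following sketch copies the two `usage` objects from the responses shown in this document and checks them; `usage_is_consistent` is an illustrative helper name.

```python
import json

# The "usage" objects copied from the two responses in this document.
gpt35_usage = json.loads(
    '{"completion_tokens": 1, "prompt_tokens": 12, "total_tokens": 13}'
)
gpt4_usage = json.loads(
    '{"completion_tokens": 1, "prompt_tokens": 11, "total_tokens": 12}'
)


def usage_is_consistent(usage):
    """Check that total_tokens is the sum of prompt and completion tokens."""
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


# Extra prompt tokens reported by gpt-3.5-turbo for the identical message.
prompt_token_gap = gpt35_usage["prompt_tokens"] - gpt4_usage["prompt_tokens"]

print(usage_is_consistent(gpt35_usage), usage_is_consistent(gpt4_usage))
print(prompt_token_gap)
```

Both totals add up, and the only difference is a single extra prompt token for `gpt-3.5-turbo-0301`.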
We have evaluated the ChatGPT model `gpt-4-0314` with the HumanEval dataset. Out of 164 programming problems, the model solves 78.66% (pass@1).
Model name | Pass@1 | Date | Comment | Completions of evaluation run | Prompt |
---|---|---|---|---|---|
gpt-4-0314 | 78.66% | 2023-03-17 | https://openai.com/api/ | 2023-03-17-samples-gpt-4-0314.jsonl | Complete the following code:\n{code} |
gpt-3.5-turbo-0301 | 72.56% | 2023-03-01 | https://openai.com/api/ | 2023-03-01-samples-gpt-3.5-turbo-0301.jsonl | |
text-davinci-002-render-sha | 70.12% | 2023-02-19 | https://chat.openai.com/ | 2023-02-19-samples-text-davinci-002-render-sha.jsonl | |
text-davinci-002-render | 56.10% | 2022-12-03 | https://chat.openai.com/ | | |
cushman-ml | 56.10% | 2022-10-23 | Copilot | | |
code-davinci-002 | 46.95% | 2022-10-23 | https://openai.com/api/ | | |
code-cushman-001 | 32.93% | 2022-10-23 | https://openai.com/api/ | | |
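The percentages in the table can be converted back to a number of solved problems. The sketch below assumes that each run generated a single completion per problem, so that Pass@1 is simply the fraction of the 164 problems solved. The resulting counts are inferred from the rounded percentages and are not stated in the evaluation data itself.

```python
TOTAL_PROBLEMS = 164  # size of the HumanEval dataset

# Pass@1 values copied from the table, in percent.
pass_at_1 = {
    "gpt-4-0314": 78.66,
    "gpt-3.5-turbo-0301": 72.56,
    "text-davinci-002-render-sha": 70.12,
    "text-davinci-002-render": 56.10,
    "cushman-ml": 56.10,
    "code-davinci-002": 46.95,
    "code-cushman-001": 32.93,
}


def solved_count(percent, total=TOTAL_PROBLEMS):
    """Infer the number of solved problems from a rounded percentage."""
    return round(percent / 100 * total)


solved = {model: solved_count(percent) for model, percent in pass_at_1.items()}

for model, count in solved.items():
    # Recompute the percentage from the inferred count as a sanity check.
    print(f"{model}: {count}/{TOTAL_PROBLEMS} = {100 * count / TOTAL_PROBLEMS:.2f}%")
```

Under this assumption, `gpt-4-0314` solves 129 of the 164 problems, ten more than `gpt-3.5-turbo-0301`.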