When I modified “run_example.sh” and changed backend to vllm, I got

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hello <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

Thanks <a class="user-mention notranslate" data-hovercard-type="user" dat

run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely about deepspeedexamples HOT 11 CLOSED

awan-10 commented on September 22, 2024

run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely

from deepspeedexamples.

Comments (11)

delock commented on September 22, 2024 2

Hi @lekurile the benchmark will proceed but will hit some other error when running on CPU. I'll check with vllm cpu engineers to investigate these errors. I also submitted a PR adding a flag allowing start the server in seperate command line.
#900

from deepspeedexamples.

awan-10 commented on September 22, 2024

@delock - FYI. Created this issue so we can track and fix it. Please work with folks assigned on this issue.

from deepspeedexamples.

lekurile commented on September 22, 2024

Hello @delock,

Thank you for raising this issue. I ran a local vllm benchmark with the microsoft/Phi-3-mini-4k-instruct model using the following code:

# Run benchmark
python ./run_benchmark.py \
        --model microsoft/Phi-3-mini-4k-instruct \
        --tp_size 1 \
        --num_replicas 1 \
        --max_ragged_batch_size 768 \
        --mean_prompt_length 2600 \
        --mean_max_new_tokens 60 \
        --stream \
        --backend vllm \
        --overwrite_results \

### Gernerate the plots
python ./src/plot_th_lat.py --data_dirs results_vllm/

echo "Find figures in ./plots/ and log outputs in ./results/"

I also had to add the "--trust-remote-code", argument to the vllm_cmd here:

DeepSpeedExamples/benchmarks/inference/mii/src/server.py

Line 39 in 1be0fc7

args.model,

Here's the resulting plot:

To reproduce the issue you show above, can you please provide a reproduction script so I can test on my end?

To answer your question:

Is it possible to run this script to benchmark a local API server? I kind of thinking run vllm serving in separate command, and use this benchmark to test the api server vllm started. So I would have better control on how the vllm server started and see all the error message from vllm server if it fails.

We can update the benchmarking script and add an additional argument, where existing local server information is provided and the script will not stand up a new server, but will instead target the existing server using the information provided.

from deepspeedexamples.

delock commented on September 22, 2024

@awan-10 @lekurile Thanks for start this thread. I met this error when I tried to run this example on Xeon server with CPU. I suspect this is a configuration issue. Currently, I plan to modify the script to run client code only, and start the server on seperate command line, so will be able to see more error message and get better understanding.

from deepspeedexamples.

delock commented on September 22, 2024

Hi @lekurile
Now I can start the server from seperate command line and run benchmark on this server with reduced test size (max batch 128, avg prompt128) to start with.

Yet I met the following error during post processing, I suspect this is due to transformers version. What is the transformers version you are using? My version is transformers==4.40.1

Traceback (most recent call last):
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/./run_benchmark.py", line 44, in <module>
    run_benchmark()
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/./run_benchmark.py", line 36, in run_benchmark
    print_summary(client_args, response_details)
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/src/utils.py", line 235, in print_summary
    ps = get_summary(vars(args), response_details)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/src/postprocess_results.py", line 80, in get_summary
    [
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/src/postprocess_results.py", line 81, in <listcomp>
    (len(get_tokenizer().tokenize(r.prompt)) + len(get_tokenizer().tokenize(r.generated_tokens)))
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 396, in tokenize
    return self.encode_plus(text=text, text_pair=pair, add_special_tokens=add_special_tokens, **kwargs).tokens()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3037, in encode_plus
    return self._encode_plus(
           ^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

from deepspeedexamples.

lekurile commented on September 22, 2024

Hi @delock,

I'm using transformers==4.40.1 as well.

After #895 was committed to the repo, I'm seeing the same error on my end as well.

  File "/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

Can you please try detaching your repo HEAD to fab5d06, one commit prior, and running again? I'll look into this PR and see if we need to revert or not.

Thanks,
Lev

from deepspeedexamples.

lekurile commented on September 22, 2024

@delock, here's the PR fixing the tokens_per_sec metric to work for both the streaming and non-streaming cases:
#897

You should be able to get past your error above with this PR, but I'm curious if you're seeing any failures still.

from deepspeedexamples.

delock commented on September 22, 2024

Yes, the latest version can going forward. Will see whether it can continue.

@delock, here's the PR fixing the tokens_per_sec metric to work for both the streaming and non-streaming cases: #897

You should be able to get past your error above with this PR, but I'm curious if you're seeing any failures still.

from deepspeedexamples.

loadams commented on September 22, 2024

Thanks @delock - can we close this issue for now?

from deepspeedexamples.

delock commented on September 22, 2024

Thanks @delock - can we close this issue for now?

Yes, this is no longer an issue now, thanks!

from deepspeedexamples.

loadams commented on September 22, 2024

Thanks!

from deepspeedexamples.

run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely about deepspeedexamples HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent