Comments (5)
I have no tools to measure it, so I handled it manually. I prompted 'please produce a 500-token Star Wars story', then copy-pasted the generated text into https://platform.openai.com/tokenizer to count the tokens, timing the generation with a stopwatch.
I got the following results:
673 tokens in 50 sec: 13.46 t/sec
318 tokens in 25 sec: 12.72 t/sec
256 tokens in 23 sec: 11.13 t/sec
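For anyone repeating the measurement, the arithmetic above is just tokens divided by elapsed seconds; a minimal Python sketch using the (tokens, seconds) pairs quoted above:

```python
# Recompute tokens/sec from the manual stopwatch measurements above.
measurements = [(673, 50), (318, 25), (256, 23)]  # (tokens, seconds)

for tokens, seconds in measurements:
    rate = tokens / seconds
    print(f"{tokens} tokens in {seconds} sec: {rate:.2f} t/sec")
```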
By the way, I noticed:
Usage: mlc_chat [--help] [--version] [--device-name VAR] [--artifact-path VAR] [--model VAR] [--dtype VAR] [--params VAR] [--evaluate]
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
--device-name [default: "auto"]
--artifact-path [default: "dist"]
--model [default: "vicuna-v1-7b"]
--dtype [default: "auto"]
--params [default: "auto"]
--evaluate
Is it just a matter of documentation, or can we already play with these arguments?
from mlc-llm.
Hey, thanks for the data! This is super valuable to us!
We updated mlc_chat_cli this morning to include a /stats command. Would you mind updating the conda environment to pick up this change?
To include this update, you will have to remove the package and install it again (conda update doesn't work for some reason):
conda remove mlc-chat-nightly
conda install -c conda-forge -c mlc-ai mlc-chat-nightly
Then the help message will show up when the program initializes, and you may use /stats to get some details:
Thanks a bunch!
OK, thanks! Now with /stats:
USER: /stats
encode: 35.4 tok/s, decode: 16.7 tok/s
USER: continue
ASSISTANT: In this epic (... removed ...)
USER: /stats
encode: 11.1 tok/s, decode: 14.0 tok/s
USER: continue
ASSISTANT: Sure, here's the continuation:
(... removed ...)
USER: /stats
encode: 37.7 tok/s, decode: 17.0 tok/s
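A quick way to summarize several /stats readouts like the ones above, sketched in Python (the regex assumes the exact "encode: X tok/s, decode: Y tok/s" format shown in the log):

```python
import re

# Average the encode/decode tok/s figures printed by /stats across turns.
stats_lines = [
    "encode: 35.4 tok/s, decode: 16.7 tok/s",
    "encode: 11.1 tok/s, decode: 14.0 tok/s",
    "encode: 37.7 tok/s, decode: 17.0 tok/s",
]

pattern = re.compile(r"encode: ([\d.]+) tok/s, decode: ([\d.]+) tok/s")
pairs = [tuple(map(float, pattern.match(line).groups())) for line in stats_lines]
encode = [e for e, _ in pairs]
decode = [d for _, d in pairs]

print(f"mean encode: {sum(encode) / len(encode):.1f} tok/s")
print(f"mean decode: {sum(decode) / len(decode):.1f} tok/s")
```

The encode rate varies a lot between turns (prompt length dominates it), while the decode rate stays in a narrow band, so the decode average is the more meaningful single number for this hardware.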
Thank you for sharing the information! We are currently gathering data points on runnable devices and their speed. Would you be willing to assist us in this effort by sharing the tokens/sec data from your GTX 1060?
Thanks a lot for your swift response! The data is super valuable to us!