Comments (2)
Hi Nathan, for FP8 quantization there are currently two choices: SmoothQuant and AWQ.
To enable FP8 SmoothQuant, for example, you can add these options:
option.quantize = smoothquant
option.smoothquant_alpha = 0.8
option.smoothquant_per_channel = true
option.smoothquant_per_token = true
option.dtype = fp8
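As a minimal sketch of how these options could be put in place, the snippet below renders them as `key = value` lines and writes them to a `serving.properties` file in a model directory. The option names and values are copied from the list above. The helper names (`render_properties`, `write_serving_properties`) are hypothetical, and any other entries a real deployment needs (engine, model location, and so on) are deliberately left out.

```python
from pathlib import Path

# Option names and values are copied from the comment above; they are not
# validated here against any particular DJL Serving release.
SMOOTHQUANT_FP8_OPTIONS = {
    "option.quantize": "smoothquant",
    "option.smoothquant_alpha": "0.8",
    "option.smoothquant_per_channel": "true",
    "option.smoothquant_per_token": "true",
    "option.dtype": "fp8",
}


def render_properties(options):
    """Render a dict as `key = value` lines (hypothetical helper)."""
    return "\n".join(f"{key} = {value}" for key, value in options.items()) + "\n"


def write_serving_properties(model_dir, options):
    """Write the options to <model_dir>/serving.properties and return the path."""
    path = Path(model_dir) / "serving.properties"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_properties(options), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Print the rendered lines instead of touching the filesystem.
    print(render_properties(SMOOTHQUANT_FP8_OPTIONS), end="")
```

Keeping the options in a dict makes it easy to switch a single value, for example `option.dtype`, and regenerate the file while comparing configurations.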
Ah thanks @ydm-amazon - I was aware of both, but I am concerned about the quality difference in the model outputs, given the reported MMLU decrease of SmoothQuant versus "native" FP8. TGI recently added fp8 but indicates it only works on Hopper architecture onward, I suppose because Hopper is the first architecture that natively supports fp8 operations.
Couple of follow-up questions:
- Are there still plans to support this "native" fp8 mentioned in TRT-LLM and recently added to TGI?
- Can you confirm that when using smoothquant, dtype should be set to fp8? The examples in the DJL docs seem to keep option.dtype = fp16 when using both smoothquant and awq.
- I'm not sure about smoothquant, but I believe AWQ requires calibration, and there are two option. parameters regarding calibration. Which dataset is used as the calibration set for calibrated quantization methods if we use JIT engine compilation? Is it possible to pack a calibration dataset with model files for JIT AWQ compilation if needed?