Comments (6)
@fxmarty Thanks for the prompt response. I can see that torch 2.0.1 helps reduce the variance. However, I still see a decent variance in the results when comparing the models in fp16 mode (by calling model.half()).
@fxmarty
Adding model.half() to the above code:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-large")
original_model = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-large").to('cuda:0')
transformed_model = BetterTransformer.transform(original_model, keep_original_model=True).to('cuda:0')

# Cast both models to fp16 after loading
original_model.half()
transformed_model.half()

sentences_batch = [['do you like fox cookies', 'fox big brown fox']]
inputs = tokenizer(sentences_batch, padding=True, truncation=True, return_tensors="pt", max_length=512).to('cuda:0')

better_transformer_scores = transformed_model(**inputs, return_dict=True).logits.view(-1).float()
print(f"BetterTransformer output: {better_transformer_scores.detach().cpu().numpy().tolist()}")
vanilla_model_scores = original_model(**inputs, return_dict=True).logits.view(-1).float()
print(f"Vanilla model output: {vanilla_model_scores.detach().cpu().numpy().tolist()}")
produces the output:

BetterTransformer output: [-7.35546875]
Vanilla model output: [-7.3515625]
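As a side note on expected fp16 precision, the single-pair difference above is exactly one fp16 unit in the last place. A minimal check (plain Python, not from the thread; the score values are copied from the outputs above):

```python
import math

# For magnitudes in [4, 8), fp16 (10 mantissa bits) has a spacing of
# 2**(2 - 10) = 0.00390625 between representable values.
bt, vanilla = -7.35546875, -7.3515625
diff = abs(bt - vanilla)
ulp = 2.0 ** (math.floor(math.log2(abs(vanilla))) - 10)
print(diff, ulp)  # 0.00390625 0.00390625
```

So a one-ULP gap on the single example is within normal fp16 rounding behavior.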
I've also observed a higher degree of variability with batch sizes larger than 1. For instance, with

sentences_batch = [
    ['do you like fox cookies', 'fox big brown fox'],
    ['do you like fox cookies', 'fox big big brown fox'],
    ['do you like fox cookies', 'fox small tiny brown fox'],
    ['n the middl just loading a mookies', 'fox small tiny brown fox happen in the middl just loading a monthly rollup table'],
    ['do you like fox cookies', 'fox big hello world from the Since most of these loads happen in the middl just loading a monthly rollup table when the regular table load happens. I chose a replace into option, brown fox'],
]
I see the output below, where the relative difference for the 4th item is high (around 0.23%):

BetterTransformer output: [-7.35546875, -7.51171875, -8.21875, -1.27734375, -9.4765625]
Vanilla model output: [-7.35546875, -7.515625, -8.2265625, -1.2802734375, -9.4765625]
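The per-item relative differences can be recomputed directly from the two output lists quoted above (plain Python, no torch needed):

```python
# Score lists copied from the BetterTransformer vs. vanilla outputs above
bt = [-7.35546875, -7.51171875, -8.21875, -1.27734375, -9.4765625]
vanilla = [-7.35546875, -7.515625, -8.2265625, -1.2802734375, -9.4765625]

# Relative difference of each item, taking the vanilla score as reference
for i, (a, b) in enumerate(zip(bt, vanilla), start=1):
    rel = abs(a - b) / abs(b)
    print(f"item {i}: relative difference = {rel:.4%}")
# the 4th item shows the largest gap, roughly 0.23%
```

The 4th item stands out because its score has the smallest magnitude, so the same absolute fp16 rounding error translates into a larger relative error.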
Hi @kapilsingh93, thank you, I can reproduce (only on a CUDA device though). This is not expected, sorry for the issue. Let me fix it shortly.
@kapilsingh93 Interestingly, downgrading to torch 2.0.1 fixes the issue... It may be a torch regression. I hit the issue even with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=False), and only on a CUDA device.
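For reference, a minimal sketch of pinning scaled dot-product attention to the math (reference) backend using the context-manager form of that API (available in torch 2.0–2.2; later releases deprecate it in favor of torch.nn.attention.sdpa_kernel). The tensor shapes here are arbitrary, not from the thread:

```python
import torch
import torch.nn.functional as F

# Arbitrary (batch, heads, seq_len, head_dim) tensors just to exercise SDPA
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Disable the flash and memory-efficient kernels, leaving only the math backend
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([1, 8, 16, 64])
```

If the mismatch persists even with the math backend forced, the divergence likely comes from something other than the fused attention kernels, which is consistent with it being a torch regression.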
@kapilsingh93 It would help to debug if you could confirm whether using torch 2.0.1 brings back equal outputs.
@kapilsingh93 Can you share a reproduction in fp16?