Comments (8)
I knew it would break when I rushed the fix without a unit test :(
This is a regression caused by last week's PR. Fixed with #3095.
from djl.
You are right, SentenceTransformer has an extra Linear layer. We use the huggingface AutoModel to trace the model, which doesn't include the pooling and dense layers. In our TranslatorFactory we apply the pooling, but we didn't add the Linear layer.
There are two solutions:
- Trace the model with SentenceTransformer itself, so the scripted model contains the missing layer.
- Write a custom Translator that manually loads the Linear weights and applies the linear layer.
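For intuition, the missing dense layer is just a matrix multiply: it projects the 768-dimensional pooled transformer output down to the 512-dimensional CLIP embedding space. A minimal numpy sketch (random weights stand in for the real values in model.safetensors; the 768/512 sizes are assumed from the model card):

```python
import numpy as np

# Stand-in for the dense layer weights that SentenceTransformer applies after
# pooling; the real values would come from the model's Linear layer.
rng = np.random.default_rng(0)
weight = rng.standard_normal((512, 768)).astype(np.float32)  # (out, in), PyTorch convention

pooled = rng.standard_normal(768).astype(np.float32)  # mean-pooled AutoModel output

# A PyTorch Linear with no bias computes x @ W.T
embedding = pooled @ weight.T

print(embedding.shape)  # (512,)
```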
Here is demo code that produces a result identical to SentenceTransformer's:
Criteria<String, float[]> criteria =
        Criteria.builder()
                .setTypes(String.class, float[].class)
                .optModelUrls("djl://ai.djl.huggingface.pytorch/sentence-transformers/clip-ViT-B-32-multilingual-v1")
                .optTranslatorFactory(new TextEmbeddingTranslatorFactory())
                .optArgument("normalize", "false")
                .optEngine("PyTorch")
                .build();

try (ZooModel<String, float[]> model = criteria.loadModel();
        Predictor<String, float[]> predictor = model.newPredictor()) {
    float[] embeddings = predictor.predict("fast blue car");
    NDManager manager = model.getNDManager();
    // Load the dense layer weights exported from the SentenceTransformer model.
    try (InputStream is = Files.newInputStream(Paths.get("model.safetensors"))) {
        NDList list = NDList.decode(manager, is);
        NDArray weight = list.get(0);
        NDArray array = manager.create(embeddings);
        // Apply the missing Linear layer (no bias) to the pooled embedding.
        NDArray output = array.getNDArrayInternal().linear(array, weight, null).get(0);
        float[] finalEmbeddings = output.toFloatArray();
        System.out.println(finalEmbeddings.length);
    }
}
@frankfliu - many thanks for the sample code. Once I had downloaded the model.safetensors file so it is accessible, I ran your code, but it only returned 1 rather than 512. To be precise, it returned a float array with this single value: [0.0038979053]. Any ideas what is wrong here?
I also tried your first suggestion of tracing the model, using this code:
from opensearch_py_ml.ml_models import SentenceTransformerModel
model_id = "sentence-transformers/clip-ViT-B-32-multilingual-v1"
folder_path = "/data/output/djl/clip-ViT-B-32-multilingual-v1"
pre_trained_model = SentenceTransformerModel(model_id=model_id, folder_path=folder_path, overwrite=True)
model_path = pre_trained_model.save_as_pt(model_id=model_id, sentences=["fast blue car"])
However, when I try to use this model from within djl-serving, I get this error:
TRACE Predictor Predictor input data: [ND: (5) cpu() int64
[ 101, 15040, 23254, 13000, 102]
, ND: (5) cpu() int64
[ 1, 1, 1, 1, 1]
]
DEBUG JniUtils Expected at most 2 argument(s) for operator 'forward', but received 3 argument(s). Declaration: forward(__torch__.sentence_transformers.SentenceTransformer.SentenceTransformer self, Dict(str, Tensor) input) -> Dict(str, Tensor)
Exception raised from checkAndNormalizeInputs at ../aten/src/ATen/core/function_schema_inl.h:392 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x77360a6b405b in /data/nuix/djl-serving/cache/pytorch/2.0.1-cpu-linux-x86_64/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xbf (0x77360a6aef6f in /data/nuix/djl-serving/cache/pytorch/2.0.1-cpu-linux-x86_64/libc10.so)
frame #2: void c10::FunctionSchema::checkAndNormalizeInputs<c10::Type>(std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) const + 0x6ae (0x7734c1dbd03e in /data/nuix/djl-serving/cache/pytorch/2.0.1-cpu-linux-x86_64/libtorch_cpu.so)
frame #3: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) const + 0x173 (0x7734c4dc0b53 in /data/nuix/djl-serving/cache/pytorch/2.0.1-cpu-linux-x86_64/libtorch_cpu.so)
frame #4: <unknown function> + 0x12c950 (0x77353172c950 in /data/nuix/djl-serving/cache/pytorch/2.0.1-cpu-linux-x86_64/0.26.0-libdjl_torch.so)
frame #5: Java_ai_djl_pytorch_jni_PyTorchLibrary_moduleRunMethod + 0x183 (0x77353172cbbc in /data/nuix/djl-serving/cache/pytorch/2.0.1-cpu-linux-x86_64/0.26.0-libdjl_torch.so)
frame #6: [0x7735ec80fa70]
WARN WorkerThread Failed to predict
ai.djl.translate.TranslateException: ai.djl.engine.EngineException: Expected at most 2 argument(s) for operator 'forward', but received 3 argument(s). Declaration: forward(__torch__.sentence_transformers.SentenceTransformer.SentenceTransformer self, Dict(str, Tensor) input) -> Dict(str, Tensor)
at ai.djl.inference.Predictor.batchPredict(Predictor.java:192) ~[api-0.26.0.jar:?]
at ai.djl.serving.wlm.ModelInfo$ModelThread.lambda$run$1(ModelInfo.java:1088) ~[wlm-0.26.0.jar:?]
at ai.djl.serving.wlm.Job.runAll(Job.java:68) ~[wlm-0.26.0.jar:?]
at ai.djl.serving.wlm.ModelInfo$ModelThread.run(ModelInfo.java:1088) ~[wlm-0.26.0.jar:?]
at ai.djl.serving.wlm.WorkerThread.runJobs(WorkerThread.java:137) ~[wlm-0.26.0.jar:?]
at ai.djl.serving.wlm.WorkerThread.run(WorkerThread.java:87) [wlm-0.26.0.jar:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: ai.djl.engine.EngineException: Expected at most 2 argument(s) for operator 'forward', but received 3 argument(s). Declaration: forward(__torch__.sentence_transformers.SentenceTransformer.SentenceTransformer self, Dict(str, Tensor) input) -> Dict(str, Tensor)
at ai.djl.pytorch.jni.PyTorchLibrary.moduleRunMethod(Native Method) ~[pytorch-engine-0.26.0.jar:?]
at ai.djl.pytorch.jni.IValueUtils.forward(IValueUtils.java:57) ~[pytorch-engine-0.26.0.jar:?]
at ai.djl.pytorch.engine.PtSymbolBlock.forwardInternal(PtSymbolBlock.java:146) ~[pytorch-engine-0.26.0.jar:?]
at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:79) ~[api-0.26.0.jar:?]
at ai.djl.nn.Block.forward(Block.java:127) ~[api-0.26.0.jar:?]
at ai.djl.inference.Predictor.predictInternal(Predictor.java:143) ~[api-0.26.0.jar:?]
at ai.djl.inference.Predictor.batchPredict(Predictor.java:170) ~[api-0.26.0.jar:?]
... 10 more
I've probably done something dumb here... any ideas what? The predictor input data is, I believe, as expected, and is produced by basically this sort of code:
private final HuggingFaceTokenizer tokenizer =
        HuggingFaceTokenizer.newInstance("sentence-transformers/clip-ViT-B-32-multilingual-v1");

@Override
public NDList processInput(TranslatorContext ctx, Input input) {
    // Tokenize the input text and return (input_ids, attention_mask) as an NDList.
    Encoding encoding = tokenizer.encode(input.getAsString(0));
    return encoding.toNDList(ctx.getNDManager(), false);
}
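The error above matches the traced model's signature: a SentenceTransformer traced whole exposes forward(self, Dict(str, Tensor)), while the NDList produced by processInput is passed to forward as two positional tensors. A toy Python sketch of the mismatch, and of a wrapper that adapts one calling convention to the other (the model here is a stub, not real tracing code):

```python
# Stub standing in for a traced SentenceTransformer: forward takes ONE dict.
def traced_forward(features: dict) -> dict:
    return {"sentence_embedding": [features["input_ids"], features["attention_mask"]]}

input_ids = [101, 15040, 23254, 13000, 102]
attention_mask = [1, 1, 1, 1, 1]

# Passing the tensors positionally (what the engine does with a 2-tensor NDList)
# fails, mirroring "Expected at most 2 argument(s) ... but received 3".
try:
    traced_forward(input_ids, attention_mask)
except TypeError as e:
    print("mismatch:", e)

# A wrapper with a positional signature can bridge the two conventions.
def wrapper_forward(input_ids, attention_mask):
    return traced_forward({"input_ids": input_ids, "attention_mask": attention_mask})

out = wrapper_forward(input_ids, attention_mask)
print(sorted(out))  # ['sentence_embedding']
```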
@frankfliu - many thanks for your work on this. I've also updated my djl checkout to get your changes and ran this command:
djl/extensions/tokenizers$ python3 src/main/python/model_zoo_importer.py -m sentence-transformers/clip-ViT-B-32-multilingual-v1
I then copied the ZIP file to the appropriate location so I can run this unit test, and I'm using 0.27.0-SNAPSHOT versions of the DJL libraries.
@Test
public void testMe() throws Exception {
    Criteria<String, float[]> criteria = Criteria.builder()
            .setTypes(String.class, float[].class)
            .optModelPath(Paths.get("/data/djl-serving/models/clip-ViT-B-32-multilingual-v1.zip"))
            .optTranslatorFactory(new DeferredTranslatorFactory())
            .optProgress(new ProgressBar())
            .build();
    try (ZooModel<String, float[]> model = criteria.loadModel();
            Predictor<String, float[]> predictor = model.newPredictor()) {
        System.out.println("Embeddings length is: " + predictor.predict("fast blue car").length);
    }
}
However, the embeddings length still comes back as 768:
[Test worker] 4993 INFO ai.djl.pytorch.engine.PtEngine - Number of intra-op threads is 14
[Test worker] 5372 INFO ai.djl.translate.DeferredTranslatorFactory - Using TranslatorFactory: ai.djl.huggingface.translator.TextEmbeddingTranslatorFactory
[Test worker] 5375 INFO ai.djl.util.Platform - Found matching platform from: jar:file:/home/sits/.gradle/caches/modules-2/files-2.1/ai.djl.huggingface/tokenizers/0.27.0-SNAPSHOT/10c26f2548c7993477443654c28030d140a443ec/tokenizers-0.27.0-SNAPSHOT.jar!/native/lib/tokenizers.properties
[Test worker] 5379 INFO ai.djl.huggingface.tokenizers.jni.LibUtils - Extracting native/lib/linux-x86_64/libtokenizers.so to cache ...
Embeddings length is: 768
Any ideas what is wrong here?
The changes didn't make it into the 0.27.0 release; you need to use the 0.28.0-SNAPSHOT release.
The model zoo model has been updated, so you can use "djl://ai.djl.huggingface.pytorch/sentence-transformers/clip-ViT-B-32-multilingual-v1" now. But please make sure to clean the cache so you get the latest model from the model zoo.
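Cleaning the cache can be done by deleting the downloaded artifacts; a sketch assuming the default DJL cache location (~/.djl.ai) - if the DJL_CACHE_DIR environment variable is set, use that path instead:

```python
import shutil
from pathlib import Path

# Default DJL cache location (assumed); model-zoo artifacts live under "cache".
cache = Path.home() / ".djl.ai" / "cache"

# Remove the cached models; they are re-downloaded on the next loadModel() call.
shutil.rmtree(cache, ignore_errors=True)
```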
Thanks @frankfliu - everything is working now as expected.
@frankfliu - something has changed in the latest 0.28.0 snapshots, as the test code I posted on this issue is now failing again. Any ideas?