deepjavalibrary / djl
An Engine-Agnostic Deep Learning Framework in Java
Home Page: https://djl.ai
License: Apache License 2.0
Create a Normalize class that has two methods: one to normalize a number and one to de-normalize it.
The Normalize object must also have parameters: min and max (the minimum and maximum of our number range) and interval (the target interval, either 0 to 1 or -1 to 1).
Example:
We have a range of real numbers that needs to be normalized: [1, 5, 7, 12, 16, 19, 23, 3, 6, 33]
Normalize class will have:
With all this information, we can normalize a number so that it is ready for training or testing the model.
Each number that enters the network input as a normalized number will have its Normalize object defined.
During training we can easily de-normalize every number and compare it with our test data set (which also needs to be de-normalized).
Yes, normalization should be part of the data set. Each number in an INDArray should also have a normalization object, so each number that reaches the network input is normalized to 0 to 1 or -1 to 1. Also, when the network is training and we use a listener, predicted numbers can easily be de-normalized and then compared with the de-normalized test data set.
Everybody. Working with normalized data sets is simplified when normalization and de-normalization are provided for the numbers that enter the network input, including while training is in progress.
Example: https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/MultiDataNormalization.java
I think this one is not good enough; it is only a simple normalization.
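The request above can be sketched in plain Java as a min-max scaler. This is only an illustration of the idea, not DJL or DL4J API: the class name MinMaxNormalizer, the fit helper, and the lo/hi parameters (standing in for the "interval") are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical min-max scaler: maps [min, max] linearly onto [lo, hi] and back. */
final class MinMaxNormalizer {
    private final double min;
    private final double max;
    private final double lo;
    private final double hi;

    MinMaxNormalizer(double min, double max, double lo, double hi) {
        if (max <= min || hi <= lo) {
            throw new IllegalArgumentException("need min < max and lo < hi");
        }
        this.min = min;
        this.max = max;
        this.lo = lo;
        this.hi = hi;
    }

    /** Derives min and max from the data itself (data must not be empty). */
    static MinMaxNormalizer fit(double[] data, double lo, double hi) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        return new MinMaxNormalizer(min, max, lo, hi);
    }

    /** Maps a raw value into the target interval. */
    double normalize(double x) {
        return lo + (x - min) * (hi - lo) / (max - min);
    }

    /** Inverse of normalize: maps a scaled value back to the raw range. */
    double denormalize(double y) {
        return min + (y - lo) * (max - min) / (hi - lo);
    }

    public static void main(String[] args) {
        double[] data = {1, 5, 7, 12, 16, 19, 23, 3, 6, 33};
        MinMaxNormalizer n = MinMaxNormalizer.fit(data, 0.0, 1.0);
        for (double x : data) {
            double scaled = n.normalize(x);
            System.out.println(x + " -> " + scaled + " -> " + n.denormalize(scaled));
        }
    }
}
```

Keeping min, max, and the interval inside one object means a prediction can be de-normalized with the same parameters that were used on the input, which is the comparison workflow described above.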
The "BBC Japan" example is great, but doesn't quite give enough info for the user to start experimenting on their own.
[...] paragraph variable?
[...] QAInput(question, paragraph, 384) - should it be adjusted up or down, and when?
[...] predict?
No.
New users.
examples/docs/BERT_question_and_answer.md
Early stopping configuration: Specifies the various configuration options for running training with early stopping.
We can configure model training to stop when one of the conditions above is met.
Early stopping should be implemented as a training listener: the early stopping configuration listens for any of the conditions above and terminates training.
Everybody. We can easily configure when learning will end.
Reference implementation:
https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/EarlyStoppingConfiguration.java
There are other implementations in different NN frameworks.
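As a rough sketch of what such a configuration could look like, the class below stops training on either of two conditions. The conditions themselves (a maximum epoch count and a "patience" window without improvement), as well as every name in the code, are assumptions chosen for illustration; they are not taken from DJL or from the DL4J reference implementation.

```java
/** Hypothetical early-stopping rule: stop on max epochs or on stalled validation loss. */
final class EarlyStopping {
    private final int maxEpochs;
    private final int patience;
    private final double minImprovement;
    private double bestLoss = Double.POSITIVE_INFINITY;
    private int epochsWithoutImprovement;

    EarlyStopping(int maxEpochs, int patience, double minImprovement) {
        this.maxEpochs = maxEpochs;
        this.patience = patience;
        this.minImprovement = minImprovement;
    }

    /**
     * Call once at the end of every epoch (epoch is 1-based).
     * Returns true when training should be terminated.
     */
    boolean shouldStop(int epoch, double validationLoss) {
        if (bestLoss - validationLoss > minImprovement) {
            // The loss improved by more than the threshold: remember it and reset the counter.
            bestLoss = validationLoss;
            epochsWithoutImprovement = 0;
        } else {
            // No sufficient improvement (a NaN loss also lands here).
            epochsWithoutImprovement++;
        }
        return epoch >= maxEpochs || epochsWithoutImprovement >= patience;
    }

    double bestLoss() {
        return bestLoss;
    }

    public static void main(String[] args) {
        EarlyStopping rule = new EarlyStopping(100, 2, 0.0);
        double[] losses = {1.0, 0.8, 0.9, 0.85};
        for (int epoch = 1; epoch <= losses.length; epoch++) {
            boolean stop = rule.shouldStop(epoch, losses[epoch - 1]);
            System.out.println("epoch " + epoch + " stop=" + stop);
        }
    }
}
```

A training listener would call shouldStop from its end-of-epoch callback and abort the loop when it returns true.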
Can I use my own pre-trained model for inference?
Hello, guys. Could you help me?
When I tried to build the app, I got an error:
"Unable to load library 'C:\Users\Default.mxnet\cache\1.6.0-b-SNAPSHOT-20200120mkl-win-x86_64\mxnet.dll':
2020-01-23T03:20:04.060+0300 [ERROR] [system.err] %1 is not a valid Win32 application."
The build command is "./gradlew run -Dmain=ai.djl.examples.training.TrainMnist"
This is strange, because my environment is as shown below:
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
Microsoft Windows [Version 10.0.17763.864]
OS Name Microsoft Windows 10 Enterprise LTSC
Version 10.0.17763 Build 17763
System Type x64-based PC
Gradle 6.0.1
Build time: 2019-11-18 20:25:01 UTC
Revision: fad121066a68c4701acd362daf4287a7c309a0f5
Kotlin: 1.3.50
Groovy: 2.5.8
Ant: Apache Ant(TM) version 1.10.7 compiled on September 1 2019
JVM: 1.8.0_162 (Oracle Corporation 25.162-b12)
OS: Windows 10 10.0 amd64
I'm newish to Maven, but I don't think this is what is normally done.
From mvn dependency:tree:
[INFO] +- ai.djl:examples:jar:0.2.1:compile
[INFO] | +- commons-cli:commons-cli:jar:1.4:runtime
[INFO] | +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.12.1:runtime
[INFO] | | +- org.apache.logging.log4j:log4j-api:jar:2.12.1:runtime
[INFO] | | \- org.apache.logging.log4j:log4j-core:jar:2.12.1:runtime
I ask because it gives off warnings like:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/ME/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.12.1/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/ME/.m2/repository/org/slf4j/slf4j-simple/1.7.30/slf4j-simple-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
All the other ai.djl:* dependencies look OK.
DJL should include a NN web or GUI visualization tool like DL4J to help optimize NN parameters and NN layers.
This will be a new API.
Users who want to monitor training jobs and help determine why their model is not training successfully.
To compile the artifact quickly, I sometimes want to skip tests while building, but this is prevented by the Gradle task ':integration:jacocoTestReport'.
The project should be able to compile without running any tests.
$ ./gradlew build -x test
> Configure project :basicdataset
GPU 0: GeForce RTX 2070 (UUID: GPU-ccda497c-7a55-7df7-49ad-b68e31743286)
> Configure project :examples
GPU 0: GeForce RTX 2070 (UUID: GPU-ccda497c-7a55-7df7-49ad-b68e31743286)
> Configure project :integration
GPU 0: GeForce RTX 2070 (UUID: GPU-ccda497c-7a55-7df7-49ad-b68e31743286)
> Configure project :mxnet:mxnet-engine
[WARN ] Header file has been changed in open source project: mxnet/c_api.h.
> Task :integration:jacocoTestReport FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':integration:jacocoTestReport'.
> Unable to read execution data file /home/peng/git-release/djl/integration/build/jacoco/test.exec
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.0/userguide/command_line_interface.html#sec:command_line_warnings
BUILD FAILED in 1m 4s
66 actionable tasks: 32 executed, 34 up-to-date
Just run ./gradlew build -x test in the project directory.
./gradlew build -x test
I tried to run this demo to get familiar with djl, but could not run training on GPU because I have CUDA 10.2 installed:
Caused by: java.lang.UnsatisfiedLinkError: Unable to load library '/home/andrej/.mxnet/cache/1.6.0-a-20191127cu101mkl-linux-x86_64/libmxnet.so':
libcudart.so.10.1: cannot open shared object file: No such file or directory
libcudart.so.10.1: cannot open shared object file: No such file or directory
Native library (home/andrej/.mxnet/cache/1.6.0-a-20191127cu101mkl-linux-x86_64/libmxnet.so) not found in resource path (/home/andrej/projects/djl-demo/build/classes/java/main:/home/andrej/projects/djl-demo/build/resources/main:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-model-zoo/0.2.0/355dfb3163430430f25c7e86267fc5c621276179/mxnet-model-zoo-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-engine/0.2.0/7841bd1c3fc2f44fe76cb2e8a083dfead4de3f9a/mxnet-engine-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/net.java.dev.jna/jna/5.3.0/4654d1da02e4173ba7b64f7166378847db55448a/jna-5.3.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.apache.httpcomponents/httpcore/4.4.12/21ebaf6d532bc350ba95bd81938fa5f0e511c132/httpcore-4.4.12.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/commons-cli/commons-cli/1.4/c51c00206bb913cd8612b24abd9fa98ae89719b1/commons-cli-1.4.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.29/82ae07f95088577987a15d90171de12b00d81847/slf4j-simple-1.7.29.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.apache.commons/commons-csv/1.7/cb5d05520f8fe1b409aaf29962e47dc5764f8f39/commons-csv-1.7.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/basicdataset/0.2.0/fa73e42fb774b56f23a030a6c95159a1987d8110/basicdataset-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/model-zoo/0.2.0/dbe300ddc19ec809002ed9a6214dac11e39a1055/model-zoo-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/repository/0.2.0/266c3a327e89b82234c03a713f05067567c2e9dd/repository-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/com.google.code.gson/gson/2.8.5/f645ed69d595b24d4cf8b3fbb64cc505bede8829/gson-2.8.5.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/api/0.2.0/c83672c1e7178830ea9c43b98603d5fa7737fd78/api-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-native-cu101mkl/1.6.0-a/c67432f4f6ba4273a13c3f9
efff52e5f2710c888/mxnet-native-cu101mkl-1.6.0-a-linux-x86_64.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-api/1.7.29/e56bf4473a4c6b71c7dd397a833dce86d1993d9d/slf4j-api-1.7.29.jar)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:302)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:455)
at com.sun.jna.Library$Handler.<init>(Library.java:192)
at com.sun.jna.Native.load(Native.java:596)
at com.sun.jna.Native.load(Native.java:570)
at ai.djl.mxnet.jna.LibUtils.loadLibrary(LibUtils.java:80)
at ai.djl.mxnet.jna.JnaUtils.<clinit>(JnaUtils.java:68)
at ai.djl.mxnet.engine.MxEngine.<init>(MxEngine.java:36)
at ai.djl.mxnet.engine.MxEngineProvider.<clinit>(MxEngineProvider.java:21)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:779)
... 9 more
Suppressed: java.lang.UnsatisfiedLinkError: libcudart.so.10.1: cannot open shared object file: No such file or directory
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
... 22 more
CUDA 10.2 seems to be mainstream at the moment and it's easier to find installation instructions for this version.
Rolling back to CUDA 10.1 might be troublesome and will repel some users from using this library.
No
All potential users of DJL, including myself.
Hi,
This is not really a feature request but I thought it would be easy to reach out to the DJL community here, so please close this issue whenever you wish.
I represent the official group responsible for maintaining and enhancing the support of TensorFlow on the JVM. We have just heard of your initiative and we are very excited about it. We would like to know if there is anything we can do to help with the integration of TensorFlow in DJL.
Also, we would like to open a discussion about NDArray standardization. There are already a few implementations of this interface available on the market (e.g. MXNet has one, DL4J has one, we have just created one and now AWS has one). To improve portability between various frameworks and libraries, we believe that such an interface should eventually end up in the JDK itself, and it would be a good candidate for a JSR/JEP. It would be interesting to see all parties involved in the development of a "NumPy equivalent" for Java agree on a common interface that could then be proposed to the Java community, and on top of which higher-level APIs can be built.
If you are interested, you can reach us directly on one of the following channels:
Google Group: [email protected]
Gitter: tensorflow/sig-jvm
GitHub: https://github.com/tensorflow/java
Thanks, hoping to hear from you soon,
Karl
TrainWithOptimizers throws TrainingDivergedException
Possibly caused by Metric name not found: epoch
Run most things in ai.djl.examples.training.* out of the box.
mymac:examples me $ ./gradlew run -Dmain=ai.djl.examples.training.TrainWithOptimizers
> Task :run
[INFO ] - Running ExampleTrainingListener on: cpu(0).
[INFO ] - Load library 1.6.0 in 0.106 ms.
Training: 0% |█ | Accuracy: 0.19, SoftmaxCrossEntropyLoss: 4.72, speed: 8.18 images/sec [INFO ] - Training: 1562 batches15s]
[INFO ] - Validation: 312 batches
[INFO ] - train P50: 3914.031 ms, P90: 3914.031 ms
[INFO ] - forward P50: 20.862 ms, P90: 21.610 ms
[INFO ] - training-metrics P50: 3606.591 ms, P90: 5841.697 ms
[INFO ] - backward P50: 8.703 ms, P90: 11.023 ms
[INFO ] - step P50: 39.751 ms, P90: 39.751 ms
Exception in thread "main" ai.djl.TrainingDivergedException: The Loss became NaN, try reduce learning rate,add clipGradient option to your optimizer, check input data and loss calculation.
at ai.djl.examples.training.util.ExampleTrainingListener.onTrainingBatch(ExampleTrainingListener.java:87)
at ai.djl.mxnet.engine.MxTrainer.lambda$trainBatch$3(MxTrainer.java:142)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
at ai.djl.mxnet.engine.MxTrainer.trainBatch(MxTrainer.java:142)
at ai.djl.examples.training.util.TrainingUtils.fit(TrainingUtils.java:47)
at ai.djl.examples.training.TrainWithOptimizers.runExample(TrainWithOptimizers.java:107)
at ai.djl.examples.training.TrainWithOptimizers.main(TrainWithOptimizers.java:69)
Suppressed: java.lang.IllegalArgumentException: Metric name not found: epoch
at ai.djl.metric.Metrics.percentile(Metrics.java:135)
at ai.djl.examples.training.util.ExampleTrainingListener.onTrainingEnd(ExampleTrainingListener.java:168)
at ai.djl.mxnet.engine.MxTrainer.lambda$close$10(MxTrainer.java:344)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
at ai.djl.mxnet.engine.MxTrainer.close(MxTrainer.java:344)
at ai.djl.examples.training.TrainWithOptimizers.runExample(TrainWithOptimizers.java:96)
... 1 more
> Task :run FAILED
FAILURE: Build failed with an exception.
Clone the repository with git, then start running the examples.
./gradlew run -Dmain=ai.djl.examples.training.TrainWithOptimizers
ND4J, the accelerated linear algebra backend that powers Eclipse DeepLearning4J, should have all the necessary features to become a proper backend for DJL.
No API changes should be necessary for the basics. If ND4J provides an opportunity for an API enhancement, we can discuss it as a separate issue.
DL4J has support for more native architectures (including ARM64 (Raspberry Pi/iOS/Android) and OpenPower), so it will allow people with those devices to take advantage of DJL. DL4J supports importing Keras, TensorFlow and ONNX models, so if we design this correctly we can expand the types of pre-trained models that users can use.
Mish is a novel activation function proposed in this paper.
It has shown promising results so far and has been adopted in several packages including:
All benchmarks, analysis and links to official package implementations can be found in this repository
Mish was also recently used for a submission on the Stanford DAWN CIFAR-10 Training Time Benchmark, where it obtained 94% accuracy in just 10.7 seconds, which is the current best score on 4 GPUs and the second fastest overall. Additionally, Mish has been shown to improve the convergence rate by requiring fewer epochs. Reference -
Mish has also shown consistently improved ImageNet scores and is more robust. Reference -
Additional ImageNet benchmarks along with network architectures and weights are available on my repository.
Summary of Vision related results:
It would be nice to have Mish as an option within the activation function group.
This is the comparison of Mish with other conventional activation functions in a SEResNet-50 for CIFAR-10:
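For readers unfamiliar with the function, Mish is defined as mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x). A minimal scalar sketch in plain Java follows; it is not a DJL Activation implementation, and the class and method names are assumptions made for this example.

```java
/** Scalar sketch of the Mish activation: mish(x) = x * tanh(ln(1 + e^x)). */
final class MishActivation {

    /** Numerically stable softplus: ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)). */
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    static double mish(double x) {
        return x * Math.tanh(softplus(x));
    }

    /** Applies Mish element-wise and returns a new array. */
    static double[] mish(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = mish(values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        for (double x : new double[] {-3.0, -1.0, 0.0, 1.0, 3.0}) {
            System.out.println("mish(" + x + ") = " + mish(x));
        }
    }
}
```

The stable softplus form avoids overflow in Math.exp for large positive inputs; for large x the function approaches the identity, and for large negative x it approaches zero from below.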
I am working on an implementation of the covid19-detection example code, but using Quarkus to serve the model and also support GraalVM Native Image.
The project is located here:
https://github.com/murphye/djl-demo/tree/master/covid19-detection-quarkus
The application runs fine on the JVM, but when running in native mode, the TensorFlow libraries are not being loaded (i.e. System.loadLibrary).
Important: I cannot find a reference to System.loadLibrary in the DJL code, and I do not understand how the TensorFlow libraries are actually loaded. If I better understood how the mechanism worked, I could better diagnose it. It does seem to be related to Bytedeco, which I am not familiar with.
Here is the code that I am running to demonstrate the issue:
LibUtils.loadLibrary(); // Forcing the library to load to demo error
System.out.println("Library Path: " + System.getProperty("org.bytedeco.javacpp.platform.preloadpath"));
// See if TF loaded correctly or not. If not, expect java.lang.UnsatisfiedLinkError
TfEngine.getInstance().debugEnvironment();
Here is the output of the error when running this code. The TF libraries are downloaded and placed in /Users/ermurphy/.tensorflow/cache/2.1.0-a-SNAPSHOT-cpu-osx-x86_64 correctly.
__ ____ __ _____ ___ __ ____ ______
--/ __ \/ / / / _ | / _ \/ //_/ / / / __/
-/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2020-05-11 19:22:10,136 INFO [io.quarkus] (main) covid19-detection-quarkus 1.0-SNAPSHOT (powered by Quarkus 1.4.2.Final) started in 0.056s. Listening on: http://0.0.0.0:8080
2020-05-11 19:22:10,137 INFO [io.quarkus] (main) Profile prod activated.
2020-05-11 19:22:10,137 INFO [io.quarkus] (main) Installed features: [cdi, resteasy, resteasy-jackson]
2020-05-11 19:22:23,584 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libstdc++.6.dylib ...
2020-05-11 19:22:24,752 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libjnitensorflow.dylib ...
2020-05-11 19:22:24,987 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libgcc_s.1.dylib ...
2020-05-11 19:22:25,203 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading THIRD_PARTY_TF_JNI_LICENSES ...
2020-05-11 19:22:25,448 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libtensorflow.2.dylib ...
2020-05-11 19:22:37,506 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libjnimklml.dylib ...
2020-05-11 19:22:37,656 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libiomp5.dylib ...
2020-05-11 19:22:38,182 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libmkldnn.0.dylib ...
2020-05-11 19:22:38,934 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading LICENSE ...
2020-05-11 19:22:39,038 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libmklml.dylib ...
2020-05-11 19:22:42,655 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libjnimkldnn.dylib ...
2020-05-11 19:22:42,833 INFO [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libgomp.1.dylib ...
Library Path: /Users/ermurphy/.tensorflow/cache/2.1.0-a-SNAPSHOT-cpu-osx-x86_64
2020-05-11 19:22:42,979 INFO [ai.djl.eng.Engine] (executor-thread-1) Engine name: TensorFlow
2020-05-11 19:22:42,980 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-1) HTTP Request to /predict failed, error id: 45121a3e-fbf8-4684-911c-4e9250ed8f41-1: java.lang.UnsatisfiedLinkError: org.tensorflow.internal.c_api.global.tensorflow.TF_Version()Lorg/bytedeco/javacpp/BytePointer; [symbol: Java_org_tensorflow_internal_c_1api_global_tensorflow_TF_1Version or Java_org_tensorflow_internal_c_1api_global_tensorflow_TF_1Version__]
at com.oracle.svm.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:145)
at com.oracle.svm.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:57)
at org.tensorflow.internal.c_api.global.tensorflow.TF_Version(tensorflow.java)
at org.tensorflow.TensorFlow.version(TensorFlow.java:37)
at ai.djl.tensorflow.engine.TfEngine.getVersion(TfEngine.java:64)
at ai.djl.engine.Engine.debugEnvironment(Engine.java:171)
at com.examples.ExampleService.<init>(ExampleService.java:42)
When running in a GraalVM Native Image executable, the libraries should be loaded and usable through the JNI bridge. I have proven this in the past with this PoC:
https://github.com/murphye/quarkus-tensorflow-inception/blob/master/src/main/java/io/quarkus/tensorflow/LoadTensorFlow.java
I need some guidance on how TensorFlow is loaded in DJL if it's not using System.loadLibrary as shown here:
https://github.com/murphye/quarkus-tensorflow-inception/blob/master/src/main/java/io/quarkus/tensorflow/LoadTensorFlow.java#L98
How else does the TensorFlow library get loaded, and how can I further diagnose the issue when running in Native mode?
We want to have multi-array labels/predictions. Currently the API can handle only one label (the first element in the NDArray). An obvious example is multi-digit number recognition, where we predict multiple digits from the provided input. Example of this: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/captcharecognition/MultiDigitNumberRecognition.java
There are also many cases where we want multiple array labels, especially when dealing with just numbers (outputs) and predictions.
There is a small thread about this (Python):
https://datascience.stackexchange.com/questions/23614/keras-multiple-softmax-in-last-layer-possible
Documentation for this is very well hidden or non-existent, and the same goes for examples. Everyone is using just a single array label with multiple classes (output neurons):
[0.2, 0.3, 0.6,...0.12] -> 0.6
We want to have multiple array labels, multiple classes (out neurons):
[0.2, 0.3, 0.6,...0.12] -> 0.6
[0.4, 0.2, 0.7,...0.88] -> 0.88
[0.11, 0.77, 0.55,...0.33] -> 0.77
:
:
Hopefully only a new loss function (class) needs to be implemented, one that calculates the loss over multi-array labels (split the tensor into parts, compute softmax separately per part, and concatenate the tensor parts at the end): SoftmaxCrossEntropyLossMulti.
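The split-then-softmax idea can be sketched on plain Java arrays. This is not DJL code and not an implementation of the proposed SoftmaxCrossEntropyLossMulti class; the flat logits layout (parts stored back to back), the averaging over parts, and all names below are assumptions made for illustration.

```java
/** Sketch: softmax cross-entropy computed separately over equal-sized parts of one output vector. */
final class MultiSoftmaxCrossEntropy {

    /**
     * @param logits     flat network output, length = labels.length * numClasses
     * @param labels     one class index per part
     * @param numClasses number of output neurons per part
     * @return mean cross-entropy over all parts
     */
    static double loss(double[] logits, int[] labels, int numClasses) {
        if (logits.length != labels.length * numClasses) {
            throw new IllegalArgumentException("logits length must equal parts * numClasses");
        }
        double total = 0.0;
        for (int part = 0; part < labels.length; part++) {
            int offset = part * numClasses;
            // Log-sum-exp with max subtraction for numerical stability.
            double max = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < numClasses; c++) {
                max = Math.max(max, logits[offset + c]);
            }
            double sumExp = 0.0;
            for (int c = 0; c < numClasses; c++) {
                sumExp += Math.exp(logits[offset + c] - max);
            }
            double logSumExp = max + Math.log(sumExp);
            // -log softmax(part)[label] = logSumExp - logit of the true class.
            total += logSumExp - logits[offset + labels[part]];
        }
        return total / labels.length;
    }

    /** Returns the arg-max class index for each part. */
    static int[] predict(double[] logits, int numClasses) {
        int parts = logits.length / numClasses;
        int[] result = new int[parts];
        for (int part = 0; part < parts; part++) {
            int best = 0;
            for (int c = 1; c < numClasses; c++) {
                if (logits[part * numClasses + c] > logits[part * numClasses + best]) {
                    best = c;
                }
            }
            result[part] = best;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] logits = {0.2, 0.3, 0.6, 0.9, 0.2, 0.7};
        System.out.println(java.util.Arrays.toString(predict(logits, 3)));
        System.out.println(loss(logits, new int[] {2, 0}, 3));
    }
}
```

In a tensor library the same computation would typically be a reshape to (parts, numClasses) followed by a softmax along the last axis, so each part gets its own probability distribution.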
Can you provide LSTM example code with training and inference, preferably with time series data?
No.
Anyone who wants to build an LSTM model using the API.
Please add the ability to load modules from TensorFlow Hub. For example, I'd want to be able to download, save and use their universal-sentence-encoder model.
Implement training as a listener, so that we can subscribe to the training listener. We need more information in each of the provided listener callbacks:
In this way we can do whatever we want:
Yes, training should be implemented as a listener, so that we can subscribe to it.
Interface should have:
onStart
onEpoch
onCompletion
Everybody. By subscribing to the listener we get all the information we need.
Reference implementation: https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/listener/EarlyStoppingListener.java
However, we need more information inside each of the onStart, onEpoch and onCompletion methods (see above).
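One possible shape for such an interface is sketched below. Only the three method names onStart, onEpoch and onCompletion come from the request; the interface name, the parameters passed to each callback, and the simulated training loop are assumptions made for illustration and do not reflect DJL's actual listener API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener interface; the callback parameters are illustrative only. */
interface TrainingProgressListener {
    void onStart(int totalEpochs);

    void onEpoch(int epoch, double validationLoss);

    void onCompletion(int epochsRun, double bestValidationLoss);
}

final class ListenerDemo {

    /** Listener that records every event as a string, e.g. for logging or tests. */
    static final class RecordingListener implements TrainingProgressListener {
        final List<String> events = new ArrayList<>();

        @Override
        public void onStart(int totalEpochs) {
            events.add("start:" + totalEpochs);
        }

        @Override
        public void onEpoch(int epoch, double validationLoss) {
            events.add("epoch:" + epoch + ":" + validationLoss);
        }

        @Override
        public void onCompletion(int epochsRun, double bestValidationLoss) {
            events.add("done:" + epochsRun + ":" + bestValidationLoss);
        }
    }

    /** Stand-in for a real training loop: the losses are supplied instead of computed. */
    static void runTraining(double[] validationLosses, List<TrainingProgressListener> listeners) {
        listeners.forEach(l -> l.onStart(validationLosses.length));
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < validationLosses.length; i++) {
            final int epoch = i + 1;
            final double loss = validationLosses[i];
            best = Math.min(best, loss);
            listeners.forEach(l -> l.onEpoch(epoch, loss));
        }
        final double bestLoss = best;
        listeners.forEach(l -> l.onCompletion(validationLosses.length, bestLoss));
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        List<TrainingProgressListener> listeners = new ArrayList<>();
        listeners.add(listener);
        runTraining(new double[] {0.9, 0.5, 0.7}, listeners);
        listener.events.forEach(System.out::println);
    }
}
```

Because the loop only talks to the interface, subscribers such as loggers, early-stopping rules, or de-normalizing evaluators can be added without touching the training code.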
When running training examples on multiple GPUs, I came across a CUDA illegal memory access error in examples that use LSTM operators:
at ai.djl.examples.training.TrainSeq2Seq.runExample(TrainSeq2Seq.java:107)
at ai.djl.examples.training.TrainSeq2Seq.main(TrainSeq2Seq.java:64)
Suppressed: java.lang.IllegalArgumentException: Metric name not found: step
at ai.djl.metric.Metrics.percentile(Metrics.java:135)
at ai.djl.training.listener.LoggingTrainingListener.onTrainingEnd(LoggingTrainingListener.java:167)
at ai.djl.training.Trainer.lambda$close$5(Trainer.java:348)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at ai.djl.training.Trainer.close(Trainer.java:348)
at ai.djl.examples.training.TrainSeq2Seq.runExample(TrainSeq2Seq.java:119)
... 1 more
Suppressed: ai.djl.engine.EngineException: MXNet engine call failed: cuDNN: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) : CUDNN_STATUS_INTERNAL_ERROR
Stack trace:
File "src/operator/./rnn-inl.h", line 768
at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788)
at ai.djl.mxnet.jna.JnaUtils.waitAll(JnaUtils.java:466)
at ai.djl.mxnet.engine.MxModel.close(MxModel.java:161)
at ai.djl.examples.training.TrainSeq2Seq.runExample(TrainSeq2Seq.java:122)
... 1 more
[18:23:19] src/resource.cc:230: Ignore CUDA Error [18:23:19] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:73: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
[18:23:19] src/resource.cc:279: Ignore CUDA Error [18:23:19] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered
[18:23:19] src/resource.cc:279: Ignore CUDA Error [18:23:19] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered
[18:23:19] src/engine/threaded_engine_perdevice.cc:275: Ignore CUDA Error [18:23:19] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
We can reproduce by running the TrainSeq2Seq example on a machine with more than 1 GPU.
Operations on MxNDArray that take a single Number argument truncate the argument's decimal places, which leads to erroneous calculation results. This seems to be due to a German locale setting on the host.
E.g.:
System.out.println(NDManager.newBaseManager().create(1.3).add(0.7));
prints:
ND: () gpu(0) float64
1.3
This seems to affect all math operations with a single Number argument, like add, gt, gte, lt, etc.
float and double values should be passed correctly, and mathematical operations should yield correct results. The above line should print:
ND: () gpu(0) float64
2.0
N/A
An example of the error with the console output on a German system can be found here:
https://gist.github.com/chenkelmann/2bfa9627d79a9aaab34a46227d81aea5
Run the main method in the above example on a Linux system with LC_NUMERIC=de_DE.UTF-8
The problem can be circumvented by creating an NDArray with the argument instead of using the methods that take Number:
System.out.println(manager.create(1.3).add(manager.create(new double[]{0.7})));
Setting LC_NUMERIC="en_US.UTF-8" for the current process also fixes the issue (but is very fragile, as the correct working of the code depends on the current environment variables...)
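The report does not identify the root cause. One plausible mechanism, stated here purely as an assumption, is that a number is passed as text with a '.' decimal separator and then parsed by locale-aware native code that expects ',' and stops at the first unexpected character. The plain-Java sketch below only simulates that behaviour; parseLeadingNumber is a hypothetical helper for non-negative inputs and is not part of DJL or MXNet.

```java
import java.util.Locale;

/** Demonstrates how a decimal-separator mismatch can silently drop the fraction. */
final class LocaleNumberDemo {

    /** Formats a value with one decimal place using the given locale's separator. */
    static String formatWith(Locale locale, double value) {
        return String.format(locale, "%.1f", value);
    }

    /**
     * Simulated locale-aware parser: consumes digits and at most one occurrence of the
     * expected decimal separator, then stops at the first character it does not expect.
     */
    static double parseLeadingNumber(String text, char decimalSeparator) {
        int end = 0;
        boolean seenSeparator = false;
        while (end < text.length()) {
            char c = text.charAt(end);
            if (c >= '0' && c <= '9') {
                end++;
            } else if (c == decimalSeparator && !seenSeparator) {
                seenSeparator = true;
                end++;
            } else {
                break;
            }
        }
        String accepted = text.substring(0, end).replace(decimalSeparator, '.');
        if (accepted.isEmpty() || accepted.equals(".")) {
            return 0.0;
        }
        return Double.parseDouble(accepted);
    }

    public static void main(String[] args) {
        // Double.toString is locale-independent and always uses '.'.
        String sent = Double.toString(0.7);
        System.out.println("text sent:          " + sent);
        System.out.println("German formatting:  " + formatWith(Locale.GERMANY, 0.7));
        // A parser expecting ',' stops at '.', so only the integer part survives.
        System.out.println("parsed with ',':    " + parseLeadingNumber(sent, ','));
        System.out.println("parsed with '.':    " + parseLeadingNumber(sent, '.'));
        System.out.println("1.3 + truncated:    " + (1.3 + parseLeadingNumber(sent, ',')));
    }
}
```

Under this assumption, 0.7 is read as 0 and the sum stays at 1.3, which matches the output in the report; it would also explain why passing the operand as an NDArray, which does not go through text, avoids the problem.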
Please run the command ./gradlew debugEnv from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below:
[INFO ] - ----------System Properties----------
[INFO ] - sun.cpu.isalist:
[INFO ] - sun.desktop: gnome
[INFO ] - sun.io.unicode.encoding: UnicodeLittle
[INFO ] - sun.cpu.endian: little
[INFO ] - java.vendor.url.bug: http://bugreport.sun.com/bugreport/
[INFO ] - file.separator: /
[INFO ] - java.vendor: Private Build
[INFO ] - sun.boot.class.path: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/classes
[INFO ] - java.ext.dirs: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext:/usr/java/packages/lib/ext
[INFO ] - java.version: 1.8.0_242
[INFO ] - java.vm.info: mixed mode
[INFO ] - awt.toolkit: sun.awt.X11.XToolkit
[INFO ] - org.apache.logging.log4j.assignedSequences: 8786
[INFO ] - user.language: en
[INFO ] - java.specification.vendor: Oracle Corporation
[INFO ] - sun.java.command: ai.djl.integration.util.DebugEnvironment
[INFO ] - java.home: /usr/lib/jvm/java-8-openjdk-amd64/jre
[INFO ] - sun.arch.data.model: 64
[INFO ] - java.vm.specification.version: 1.8
[INFO ] - java.class.path: /home/christoph/IdeaProjects/djl/integration/build/classes/java/main:/home/christoph/IdeaProjects/djl/integration/build/resources/main:/home/christoph/.gradle/caches/modules-2/files-2.1/commons-cli/commons-cli/1.4/c51c00206bb913cd8612b24abd9fa98ae89719b1/commons-cli-1.4.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-slf4j-impl/2.12.1/14973e22497adaf0196d481fb99c5dc2a0b58d41/log4j-slf4j-impl-2.12.1.jar:/home/christoph/IdeaProjects/djl/basicdataset/build/libs/basicdataset-0.5.0-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/model-zoo/build/libs/model-zoo-0.5.0-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/testing/build/libs/testing-0.5.0-SNAPSHOT.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.testng/testng/6.8.1/8aebea980eee079365df20f0cf7fcac900d50250/testng-6.8.1.jar:/home/christoph/IdeaProjects/djl/mxnet/mxnet-model-zoo/build/libs/mxnet-model-zoo-0.5.0-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/mxnet/mxnet-engine/build/libs/mxnet-engine-0.5.0-SNAPSHOT.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-native-auto/1.7.0-a-SNAPSHOT/a65beb2ad0ce1f49012bda3e5898979320278027/mxnet-native-auto-1.7.0-a-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/api/build/libs/api-0.5.0-SNAPSHOT.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-api/1.7.26/77100a62c2e6f04b53977b9f541044d7d722693d/slf4j-api-1.7.26.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-core/2.12.1/4382e93136c06bfb34ddfa0bb8a9fb4ea2f3df59/log4j-core-2.12.1.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-api/2.12.1/a55e6d987f50a515c9260b0451b4fa217dc539cb/log4j-api-2.12.1.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.beanshell/bsh/2.0b4/a05f0a0feefa8d8467ac80e16e7de071489f0d9c/bsh-2.0b4.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/com.beust/jcommander/1.27/58c9cbf0f1fa296f93c712f2cf46de50471920f9
/jcommander-1.27.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.yaml/snakeyaml/1.6/a1e23e31c424d566ee27382e373d73a28fdabd88/snakeyaml-1.6.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/com.google.code.gson/gson/2.8.5/f645ed69d595b24d4cf8b3fbb64cc505bede8829/gson-2.8.5.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/net.java.dev.jna/jna/5.3.0/4654d1da02e4173ba7b64f7166378847db55448a/jna-5.3.0.jar
[INFO ] - user.name: christoph
[INFO ] - file.encoding: UTF-8
[INFO ] - java.specification.version: 1.8
[INFO ] - java.awt.printerjob: sun.print.PSPrinterJob
[INFO ] - user.timezone: Europe/Berlin
[INFO ] - user.home: /home/christoph
[INFO ] - os.version: 5.3.0-46-generic
[INFO ] - sun.management.compiler: HotSpot 64-Bit Tiered Compilers
[INFO ] - java.specification.name: Java Platform API Specification
[INFO ] - java.class.version: 52.0
[INFO ] - java.library.path: /usr/local/cuda/lib64::/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
[INFO ] - sun.jnu.encoding: UTF-8
[INFO ] - os.name: Linux
[INFO ] - user.variant:
[INFO ] - java.vm.specification.vendor: Oracle Corporation
[INFO ] - java.io.tmpdir: /tmp
[INFO ] - line.separator:
[INFO ] - java.endorsed.dirs: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed
[INFO ] - os.arch: amd64
[INFO ] - java.awt.graphicsenv: sun.awt.X11GraphicsEnvironment
[INFO ] - java.runtime.version: 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
[INFO ] - java.vm.specification.name: Java Virtual Machine Specification
[INFO ] - user.dir: /home/christoph/IdeaProjects/djl/integration
[INFO ] - sun.java.launcher: SUN_STANDARD
[INFO ] - user.country: US
[INFO ] - sun.os.patch.level: unknown
[INFO ] - java.vm.name: OpenJDK 64-Bit Server VM
[INFO ] - file.encoding.pkg: sun.io
[INFO ] - path.separator: :
[INFO ] - java.vm.vendor: Private Build
[INFO ] - java.vendor.url: http://java.oracle.com/
[INFO ] - sun.boot.library.path: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64
[INFO ] - java.vm.version: 25.242-b08
[INFO ] - java.runtime.name: OpenJDK Runtime Environment
[INFO ] -
[INFO ] - ----------Environment Variables----------
[INFO ] - PATH: /usr/local/cuda/bin:/home/christoph/.local/bin:/home/christoph/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
[INFO ] - LC_MEASUREMENT: de_DE.UTF-8
[INFO ] - XAUTHORITY: /home/christoph/.Xauthority
[INFO ] - LC_TELEPHONE: de_DE.UTF-8
[INFO ] - XDG_DATA_DIRS: /usr/share/cinnamon:/usr/share/gnome:/home/christoph/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
[INFO ] - GDMSESSION: cinnamon
[INFO ] - DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus
[INFO ] - XDG_CURRENT_DESKTOP: X-Cinnamon
[INFO ] - SSH_AGENT_PID: 1493
[INFO ] - COLORTERM: truecolor
[INFO ] - LD_LIBRARY_PATH: /usr/local/cuda/lib64:
[INFO ] - LC_PAPER: de_DE.UTF-8
[INFO ] - SESSION_MANAGER: local/bishop:@/tmp/.ICE-unix/1428,unix/bishop:/tmp/.ICE-unix/1428
[INFO ] - LOGNAME: christoph
[INFO ] - PWD: /home/christoph/IdeaProjects/djl
[INFO ] - LANGUAGE: en_US
[INFO ] - GJS_DEBUG_TOPICS: JS ERROR;JS LOG
[INFO ] - SHELL: /bin/bash
[INFO ] - LESSOPEN: | /usr/bin/lesspipe %s
[INFO ] - LC_ADDRESS: de_DE.UTF-8
[INFO ] - OLDPWD: /home/christoph/IdeaProjects/djl
[INFO ] - GNOME_DESKTOP_SESSION_ID: this-is-deprecated
[INFO ] - GNOME_TERMINAL_SCREEN: /org/gnome/Terminal/screen/cdbd2b41_45b6_4c94_aa70_241e01b6353f
[INFO ] - GTK_MODULES: gail:atk-bridge
[INFO ] - XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0
[INFO ] - XDG_SESSION_DESKTOP: cinnamon
[INFO ] - LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
[INFO ] - SHLVL: 1
[INFO ] - LC_IDENTIFICATION: de_DE.UTF-8
[INFO ] - LESSCLOSE: /usr/bin/lesspipe %s %s
[INFO ] - LC_MONETARY: de_DE.UTF-8
[INFO ] - TERM: xterm-256color
[INFO ] - XDG_CONFIG_DIRS: /etc/xdg/xdg-cinnamon:/etc/xdg
[INFO ] - GNOME_TERMINAL_SERVICE: :1.84
[INFO ] - LANG: en_US.UTF-8
[INFO ] - XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0
[INFO ] - XDG_SESSION_ID: c2
[INFO ] - XDG_SESSION_TYPE: x11
[INFO ] - DISPLAY: :0
[INFO ] - CINNAMON_VERSION: 4.4.8
[INFO ] - LC_NAME: de_DE.UTF-8
[INFO ] - _: ./gradlew
[INFO ] - GDM_LANG: en_US
[INFO ] - XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/christoph
[INFO ] - GPG_AGENT_INFO: /run/user/1000/gnupg/S.gpg-agent:0:1
[INFO ] - DESKTOP_SESSION: cinnamon
[INFO ] - USER: christoph
[INFO ] - VTE_VERSION: 5202
[INFO ] - QT_ACCESSIBILITY: 1
[INFO ] - LC_NUMERIC: de_DE.UTF-8
[INFO ] - GJS_DEBUG_OUTPUT: stderr
[INFO ] - SSH_AUTH_SOCK: /run/user/1000/keyring/ssh
[INFO ] - XDG_SEAT: seat0
[INFO ] - GTK_OVERLAY_SCROLLING: 1
[INFO ] - QT_QPA_PLATFORMTHEME: qt5ct
[INFO ] - XDG_VTNR: 7
[INFO ] - XDG_RUNTIME_DIR: /run/user/1000
[INFO ] - HOME: /home/christoph
[INFO ] -
[INFO ] - ----------Default Engine----------
(The output printed nothing about the Default Engine. It hung for minutes after that, and I had to abort.)
The output of locale is:
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_DE.UTF-8
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=
The culprit is LC_NUMERIC. If it is set to en_US.UTF-8, the calculations yield the expected result.
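The report does not say why LC_NUMERIC changes the results. One plausible mechanism (an assumption, not confirmed here) is that some code path formats or parses decimal numbers using the active locale, where German conventions use a comma as the decimal separator. The sketch below only illustrates that class of problem with Java's own Locale API, which is separate from the C library's LC_NUMERIC setting. The class and method names are made up for the illustration.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LocaleNumericDemo {
    // Formats a value with two decimals using the conventions of the given locale.
    static String format(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    // Parses text using the conventions of the given locale.
    static double parse(Locale locale, String text) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 0.5));      // 0.50
        System.out.println(format(Locale.GERMANY, 0.5)); // 0,50
        System.out.println(parse(Locale.US, "1.5"));     // 1.5
        // Under German conventions '.' is not the decimal separator,
        // so the same text does not come back as 1.5.
        System.out.println(parse(Locale.GERMANY, "1.5"));
    }
}
```

Until the cause is fixed, the observation above suggests running with LC_NUMERIC set to en_US.UTF-8 as a workaround.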
There is a bug when trying to get "-1" through NDIndex. It can be reproduced by changing the following test:
@Test
public void testGet() {
    try (NDManager manager = NDManager.newBaseManager()) {
        NDArray original = manager.create(new float[] {1f, 2f, 3f, 4f}, new Shape(2, 2));
        Assert.assertEquals(original.get(new NDIndex()), original);
        original.get("-1"); // added line that triggers the failure
    }
}
Workaround: using get(":-1") works.
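Note that, under NumPy-style indexing (assumed here; the report does not spell out the intended semantics), "-1" and ":-1" are not the same request: the first picks the last element along the axis and drops that axis, while the second keeps every element except the last. The plain-Java sketch below mimics both on a 2x2 array without using the DJL API. All names in it are hypothetical.

```java
import java.util.Arrays;

public class NegativeIndexDemo {
    // Resolves a possibly negative index against an axis length.
    static int resolve(int index, int length) {
        int resolved = index < 0 ? index + length : index;
        if (resolved < 0 || resolved >= length) {
            throw new IndexOutOfBoundsException("index " + index + " for length " + length);
        }
        return resolved;
    }

    // Analogue of indexing with "-1" on the first axis: one row, axis dropped.
    static float[] pick(float[][] data, int index) {
        return data[resolve(index, data.length)];
    }

    // Analogue of slicing with ":-1" on the first axis: rows [0, end), axis kept.
    static float[][] sliceTo(float[][] data, int end) {
        int stop = end < 0 ? end + data.length : end;
        stop = Math.max(0, Math.min(stop, data.length));
        return Arrays.copyOfRange(data, 0, stop);
    }

    public static void main(String[] args) {
        float[][] original = {{1f, 2f}, {3f, 4f}};
        System.out.println(Arrays.toString(pick(original, -1)));        // [3.0, 4.0]
        System.out.println(Arrays.deepToString(sliceTo(original, -1))); // [[1.0, 2.0]]
    }
}
```

If the same semantics apply to NDIndex, the workaround avoids the exception but returns different data than "-1" would.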
Stack trace:
[ERROR] - Test ai.djl.integration.tests.ndarray.NDArrayOtherOpTest.testGet FAILED
[ERROR] -
ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Check failed: dshape[axes[i]] == 1 (0 vs. 1) : cannot select an axis to squeeze out which has size=0 not equal to one
Stack trace:
File "../src/operator/numpy/np_matrix_op.cc", line 438
at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.jna.JnaUtils.imperativeInvoke(JnaUtils.java:500) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:82) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:66) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:319) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:337) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.engine.MxNDArray.squeeze(MxNDArray.java:1200) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.mxnet.engine.MxNDArray.get(MxNDArray.java:431) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
at ai.djl.ndarray.NDArray.get(NDArray.java:500) ~[api-0.6.0-SNAPSHOT.jar:?]
at ai.djl.integration.tests.ndarray.NDArrayOtherOpTest.testGet(NDArrayOtherOpTest.java:33) ~[main/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
at ai.djl.integration.IntegrationTest$TestClass.runTest(IntegrationTest.java:350) [main/:?]
at ai.djl.integration.IntegrationTest.runTests(IntegrationTest.java:111) [main/:?]
at ai.djl.integration.IntegrationTest.runTests(IntegrationTest.java:80) [main/:?]
at ai.djl.integration.IntegrationTests.runIntegrationTests(IntegrationTests.java:23) [test/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84) [testng-6.8.1.jar:?]
at org.testng.internal.Invoker.invokeMethod(Invoker.java:714) [testng-6.8.1.jar:?]
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901) [testng-6.8.1.jar:?]
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231) [testng-6.8.1.jar:?]
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127) [testng-6.8.1.jar:?]
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111) [testng-6.8.1.jar:?]
at org.testng.TestRunner.privateRun(TestRunner.java:767) [testng-6.8.1.jar:?]
at org.testng.TestRunner.run(TestRunner.java:617) [testng-6.8.1.jar:?]
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334) [testng-6.8.1.jar:?]
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329) [testng-6.8.1.jar:?]
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291) [testng-6.8.1.jar:?]
at org.testng.SuiteRunner.run(SuiteRunner.java:240) [testng-6.8.1.jar:?]
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) [testng-6.8.1.jar:?]
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) [testng-6.8.1.jar:?]
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) [testng-6.8.1.jar:?]
at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) [testng-6.8.1.jar:?]
at org.testng.TestNG.run(TestNG.java:1057) [testng-6.8.1.jar:?]
at org.gradle.api.internal.tasks.testing.testng.TestNGTestClassProcessor.runTests(TestNGTestClassProcessor.java:141) [gradle-testing-jvm-6.4.1.jar:6.4.1]
at org.gradle.api.internal.tasks.testing.testng.TestNGTestClassProcessor.stop(TestNGTestClassProcessor.java:90) [gradle-testing-jvm-6.4.1.jar:6.4.1]
at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61) [gradle-testing-base-6.4.1.jar:6.4.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) [gradle-messaging-6.4.1.jar:6.4.1]
at com.sun.proxy.$Proxy2.stop(Unknown Source) [?:?]
at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:132) [gradle-testing-base-6.4.1.jar:6.4.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:413) [gradle-messaging-6.4.1.jar:6.4.1]
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64) [gradle-base-services-6.4.1.jar:6.4.1]
at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48) [gradle-base-services-6.4.1.jar:6.4.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56) [gradle-base-services-6.4.1.jar:6.4.1]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
Unclear which dependencies are required to run the examples, because the examples are hosted in the same module.
Example pom.xml that allows an example to run.
Exception in thread "main" java.util.ServiceConfigurationError: ai.djl.engine.EngineProvider: Provider ai.djl.mxnet.engine.MxEngineProvider could not be instantiated
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:581)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:803)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:721)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1394)
at ai.djl.engine.Engine.initEngine(Engine.java:47)
at ai.djl.engine.Engine.<clinit>(Engine.java:42)
at ai.djl.Model.newInstance(Model.java:69)
at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:96)
at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:85)
at ai.djl.repository.zoo.ModelLoader.loadModel(ModelLoader.java:84)
at org.example.HelloKt.predictAnswer(Hello.kt:48)
at org.example.HelloKt.main(Hello.kt:72)
at org.example.HelloKt.main(Hello.kt)
Caused by: java.lang.UnsatisfiedLinkError: Unable to load library 'mxnet':
dlopen(libmxnet.dylib, 9): image not found
dlopen(libmxnet.dylib, 9): image not found
Native library (darwin/libmxnet.dylib) not found in resource path (/Users/MYNAME/Desktop/workspace/untitled/target/classes:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib/1.3.61/kotlin-stdlib-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-common/1.3.61/kotlin-stdlib-common-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/annotations/13.0/annotations-13.0.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar:/Users/MYNAME/.m2/repository/ai/djl/api/0.2.1/api-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/basicdataset/0.2.1/basicdataset-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/repository/0.2.1/repository-0.2.1.jar:/Users/MYNAME/.m2/repository/com/google/code/gson/gson/2.8.5/gson-2.8.5.jar:/Users/MYNAME/.m2/repository/ai/djl/model-zoo/0.2.1/model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-model-zoo/0.2.1/mxnet-model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-engine/0.2.1/mxnet-engine-0.2.1.jar:/Users/MYNAME/.m2/repository/net/java/dev/jna/jna/5.3.0/jna-5.3.0.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-api/1.7.30/slf4j-api-1.7.30.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-simple/1.7.30/slf4j-simple-1.7.30.jar)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:302)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:455)
at com.sun.jna.Library$Handler.<init>(Library.java:192)
at com.sun.jna.Native.load(Native.java:596)
at com.sun.jna.Native.load(Native.java:570)
at ai.djl.mxnet.jna.LibUtils.loadLibrary(LibUtils.java:80)
at ai.djl.mxnet.jna.JnaUtils.<clinit>(JnaUtils.java:68)
at ai.djl.mxnet.engine.MxEngine.<init>(MxEngine.java:36)
at ai.djl.mxnet.engine.MxEngineProvider.<clinit>(MxEngineProvider.java:21)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:779)
... 11 more
Suppressed: java.lang.UnsatisfiedLinkError: dlopen(libmxnet.dylib, 9): image not found
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
... 24 more
Suppressed: java.lang.UnsatisfiedLinkError: dlopen(libmxnet.dylib, 9): image not found
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:204)
... 24 more
Suppressed: java.io.IOException: Native library (darwin/libmxnet.dylib) not found in resource path (/Users/MYNAME/Desktop/workspace/untitled/target/classes:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib/1.3.61/kotlin-stdlib-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-common/1.3.61/kotlin-stdlib-common-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/annotations/13.0/annotations-13.0.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar:/Users/MYNAME/.m2/repository/ai/djl/api/0.2.1/api-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/basicdataset/0.2.1/basicdataset-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/repository/0.2.1/repository-0.2.1.jar:/Users/MYNAME/.m2/repository/com/google/code/gson/gson/2.8.5/gson-2.8.5.jar:/Users/MYNAME/.m2/repository/ai/djl/model-zoo/0.2.1/model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-model-zoo/0.2.1/mxnet-model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-engine/0.2.1/mxnet-engine-0.2.1.jar:/Users/MYNAME/.m2/repository/net/java/dev/jna/jna/5.3.0/jna-5.3.0.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-api/1.7.30/slf4j-api-1.7.30.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-simple/1.7.30/slf4j-simple-1.7.30.jar)
at com.sun.jna.Native.extractFromResourcePath(Native.java:1095)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:276)
... 24 more
fun main() {
    val paragraph = ("""BBC Japan was a general entertainment Channel.
Which operated between December 2004 and April 2006.
It ceased operations after its Japanese distributor folded.""")
    val criteria = mapOf(
        "backbone" to "bert",
        "dataset" to "book_corpus_wiki_en_uncased"
    )
    arrayOf(
        "When did BBC Japan start broadcasting?",
        "When did BBC Japan stop broadcasting?"
    ).forEach { question ->
        val input = QAInput(question, paragraph, 384)
        println("Paragraph: ${input.paragraph}")
        println("Question: ${input.question}")
        MxModelZoo.BERT_QA.loadModel(criteria, ProgressBar()).use { model ->
            model.newPredictor().use { predictor ->
                println("Answer: ${predictor.predict(input)}")
            }
        }
    }
}
pom.xml with
<dependency>
<groupId>ai.djl</groupId>
<artifactId>api</artifactId>
<version>0.2.1</version>
</dependency>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>api</artifactId>
<version>0.2.1</version>
</dependency>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>basicdataset</artifactId>
<version>0.2.1</version>
</dependency>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>model-zoo</artifactId>
<version>0.2.1</version>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-model-zoo</artifactId>
<version>0.2.1</version>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-engine</artifactId>
<version>0.2.1</version>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-native-cu92mkl</artifactId>
<version>1.6.0-b</version>
<classifier>macosx-x86_64-gpu</classifier>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-native-cu101mkl</artifactId>
<version>1.6.0-b</version>
<classifier>macosx-x86_64-gpu</classifier>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-native-mkl</artifactId>
<version>1.6.0-b</version>
<classifier>macosx-x86_64-gpu</classifier>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.30</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.30</version>
</dependency>
It ran successfully from the checkout of this project. I'm trying to get it to run using released libs.
I tried various values in the <classifier>macosx-x86_64-gpu</classifier>
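The stack trace above shows JNA looking for darwin/libmxnet.dylib on the classpath and not finding it, which suggests that no native jar was resolved for the classifier that was tried. As a rough aid, the sketch below derives a platform string from the JVM's os.name and os.arch properties. The classifier names it returns (osx-x86_64, linux-x86_64, win-x86_64) are an assumption based on the naming used in later DJL documentation, not something stated in this report, so check them against the artifacts actually published for the version in use.

```java
import java.util.Locale;

public class NativeClassifier {
    // Maps JVM system properties to a classifier-style platform string.
    // The returned names are assumptions; verify them against the published artifacts.
    static String classifierFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        String cpu = (arch.equals("amd64") || arch.equals("x86_64")) ? "x86_64" : arch;
        if (os.startsWith("mac")) {
            return "osx-" + cpu;
        }
        if (os.startsWith("win")) {
            return "win-" + cpu;
        }
        if (os.startsWith("linux")) {
            return "linux-" + cpu;
        }
        throw new IllegalArgumentException("Unsupported OS: " + osName);
    }

    public static void main(String[] args) {
        // Prints the platform string for the machine running this program.
        System.out.println(
                classifierFor(System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

Only one native dependency should normally be needed, so listing the cu92mkl, cu101mkl and mkl artifacts together is unlikely to help.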
This issue is my proposal to rename the Block class (https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/nn/Block.java) to LearnedFunction. I am hoping to collect feedback and have a discussion with the community about it.
Right now, we use Block as the main class for representing a neural network. We chose Block because it conveyed the idea of composability: that the various Blocks can combine like Lego blocks. This addresses the question of how neural networks are built up from small differentiable functions (operators) into a full network.
My concern with Block is that it doesn't convey a sense of freedom. Blocks are more rigid and can only go together in relatively fixed ways. However, the ways Blocks can go together are not quite clear. Are SequentialBlock and ParallelBlock sufficient for everything you need? Can blocks have a variable number of children, or is it fixed? How do conditionals or loops fit into the analogy?
That is why I am thinking that LearnedFunction might be a clearer representation. It can do pretty much anything a function can do, and any programmer should be aware of what functions do. This makes it clear that you can compose functions, call other functions, and use control flow.
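To make the function analogy concrete, here is a minimal sketch of what "composition, calling other functions, and control flow" could look like. The LearnedFunction interface below is hypothetical and deliberately tiny: it works on double arrays instead of NDList, has no training logic, and is not the DJL API.

```java
import java.util.Arrays;

public class LearnedFunctionSketch {
    /** Hypothetical stand-in for the proposed concept; not a DJL type. */
    interface LearnedFunction {
        double[] apply(double[] input);

        // Composition: feed this function's output into the next one.
        default LearnedFunction andThen(LearnedFunction next) {
            return input -> next.apply(apply(input));
        }
    }

    // A function with a single "learned" parameter: multiplies each element by weight.
    static LearnedFunction scale(double weight) {
        return input -> {
            double[] out = new double[input.length];
            for (int i = 0; i < input.length; i++) {
                out[i] = weight * input[i];
            }
            return out;
        };
    }

    // Loop: applies the same function a fixed number of times.
    static LearnedFunction repeat(LearnedFunction f, int times) {
        return input -> {
            double[] current = input;
            for (int i = 0; i < times; i++) {
                current = f.apply(current);
            }
            return current;
        };
    }

    // Conditional: picks a branch based on the sign of the input's sum.
    static LearnedFunction branchOnSign(LearnedFunction ifNonNegative, LearnedFunction ifNegative) {
        return input -> {
            double sum = 0;
            for (double v : input) {
                sum += v;
            }
            return sum >= 0 ? ifNonNegative.apply(input) : ifNegative.apply(input);
        };
    }

    public static void main(String[] args) {
        LearnedFunction net = scale(2.0).andThen(repeat(scale(3.0), 2));
        System.out.println(Arrays.toString(net.apply(new double[] {1.0, -1.0}))); // [18.0, -18.0]
    }
}
```

Sequential composition, loops and conditionals all fall out of ordinary function calls here, which is the freedom the rename is meant to convey.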
It is also a clearer representation of what the Block class actually represents. The first two paragraphs of the Block javadoc, copied below, clearly show the idea of a LearnedFunction:
A {@code Block} is a composable function that forms a neural network.
Blocks serve a purpose similar to functions that convert an input NDList to an output NDList. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the functions represented by the blocks get more and more accurate. Each block consists of the following components:
There are also some concerns about this rename. First, the name of Block is used by other frameworks like Gluon (although new TF/Keras use layers, PT uses Module).
The other concern is that LearnedFunction is a more abstract concept than Block. Block, although not a perfectly accurate description, would be easier to understand. This could make it easier for new users to adapt to deep learning with DJL. Using a very abstract concept, on the other hand, would make it more difficult.
Please comment below if you have any other thoughts, ideas, or concerns regarding this. Also, add a reaction to the main description with thumbs up (+1) if you agree with the rename and a thumbs down (-1) if you think it is a bad idea.
The main DJL project compiles with Gradle when tests are disabled (-x test). Unfortunately, I cannot build the examples folder with Gradle: it fails with compile errors. Based on my investigation, some DJL classes are not visible from the examples project.
The examples project should compile.
Microsoft Windows [Version 10.0.18362.657]
(c) 2019 Microsoft Corporation. All rights reserved.
C:\MyWorks\DJL-AI\djl\examples>gradlew jar
Found C:\MyWorks\DJL-AI\djl\examples\\gradle\wrapper\gradle-wrapper.jar
> Task :compileJava FAILED
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ActionRecognition.java:20: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\benchmark\util\AbstractBenchmark.java:21: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\BertQaInference.java:20: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\InstanceSegmentation.java:21: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ObjectDetection.java:21: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:24: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\TrainWithOptimizers.java:38: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\transferlearning\TrainResnetWithCifar10.java:38: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
^
symbol: class Criteria
location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ActionRecognition.java:54: error: cannot find symbol
Criteria<BufferedImage, Classifications> criteria =
^
symbol: class Criteria
location: class ActionRecognition
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ActionRecognition.java:55: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class ActionRecognition
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\benchmark\util\AbstractBenchmark.java:191: error: package Criteria does not exist
Criteria.Builder<I, O> builder =
^
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\benchmark\util\AbstractBenchmark.java:192: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class AbstractBenchmark<I,O>
where I,O are type-variables:
I extends Object declared in class AbstractBenchmark
O extends Object declared in class AbstractBenchmark
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\BertQaInference.java:64: error: cannot find symbol
Criteria<QAInput, String> criteria =
^
symbol: class Criteria
location: class BertQaInference
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\BertQaInference.java:65: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class BertQaInference
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\InstanceSegmentation.java:58: error: cannot find symbol
Criteria<BufferedImage, DetectedObjects> criteria =
^
symbol: class Criteria
location: class InstanceSegmentation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\InstanceSegmentation.java:59: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class InstanceSegmentation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ObjectDetection.java:58: error: cannot find symbol
Criteria<BufferedImage, DetectedObjects> criteria =
^
symbol: class Criteria
location: class ObjectDetection
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ObjectDetection.java:59: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class ObjectDetection
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:76: error: cannot find symbol
Criteria<BufferedImage, DetectedObjects> criteria =
^
symbol: class Criteria
location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:77: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:114: error: cannot find symbol
Criteria<BufferedImage, Joints> criteria =
^
symbol: class Criteria
location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:115: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\TrainWithOptimizers.java:128: error: package Criteria does not exist
Criteria.Builder<BufferedImage, Classifications> builder =
^
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\TrainWithOptimizers.java:129: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class TrainWithOptimizers
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\transferlearning\TrainResnetWithCifar10.java:123: error: package Criteria does not exist
Criteria.Builder<BufferedImage, Classifications> builder =
^
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\transferlearning\TrainResnetWithCifar10.java:124: error: cannot find symbol
Criteria.builder()
^
symbol: variable Criteria
location: class TrainResnetWithCifar10
26 errors
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 1s
1 actionable task: 1 executed
C:\MyWorks\DJL-AI\djl\examples>
Used the official version from GitHub.
https://github.com/awslabs/djl
gradlew build -x test
cd examples
gradlew jar
Please run the command ./gradlew debugEnv from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below:
Microsoft Windows [Version 10.0.18362.657]
(c) 2019 Microsoft Corporation. All rights reserved.
C:\MyWorks\DJL-AI\djl>gradlew debugEnv
Found C:\MyWorks\DJL-AI\djl\\gradle\wrapper\gradle-wrapper.jar
> Configure project :mxnet:mxnet-engine
[WARN ] Header file has been changed in open source project: mxnet/c_api.h.
[WARN ] Header file has been changed in open source project: nnvm/c_api.h.
> Task :integration:debugEnv
[INFO ] - ----------System Properties----------
[INFO ] - sun.desktop: windows
[INFO ] - awt.toolkit: sun.awt.windows.WToolkit
[INFO ] - java.specification.version: 12
[INFO ] - sun.cpu.isalist: amd64
[INFO ] - sun.jnu.encoding: Cp1252
[INFO ] - java.class.path: C:\MyWorks\DJL-AI\djl\integration\build\classes\java\main;C:\MyWorks\DJL-AI\djl\integration\build\resources\main;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\commons-cli\commons-cli\1.4\c51c00206bb913cd8612b24abd9fa98ae89719b1\commons-cli-1.4.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.apache.logging.log4j\log4j-slf4j-impl\2.12.1\14973e22497adaf0196d481fb99c5dc2a0b58d41\log4j-slf4j-impl-2.12.1.jar;C:\MyWorks\DJL-AI\djl\basicdataset\build\libs\basicdataset-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\model-zoo\build\libs\model-zoo-0.3.0-SNAPSHOT.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.testng\testng\6.8.1\8aebea980eee079365df20f0cf7fcac900d50250\testng-6.8.1.jar;C:\MyWorks\DJL-AI\djl\mxnet\mxnet-model-zoo\build\libs\mxnet-model-zoo-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\mxnet\mxnet-engine\build\libs\mxnet-engine-0.3.0-SNAPSHOT.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\ai.djl.mxnet\mxnet-native-auto\1.6.0-c-SNAPSHOT\88086d340572c8452ce22c76b233e05974add594\mxnet-native-auto-1.6.0-c-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\pytorch\pytorch-model-zoo\build\libs\pytorch-model-zoo-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\pytorch\pytorch-engine\build\libs\pytorch-engine-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\repository\build\libs\repository-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\api\build\libs\api-0.3.0-SNAPSHOT.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.slf4j\slf4j-api\1.7.26\77100a62c2e6f04b53977b9f541044d7d722693d\slf4j-api-1.7.26.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.apache.logging.log4j\log4j-core\2.12.1\4382e93136c06bfb34ddfa0bb8a9fb4ea2f3df59\log4j-core-2.12.1.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.apache.logging.log4j\log4j-api\2.12.1\a55e6d987f50a515c9260b0451b4fa217dc539cb\log4j-api-2.12.1.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.beanshell\bsh\2.0b4\a05f0a0feefa8d8467ac80e16e7d
e071489f0d9c\bsh-2.0b4.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\com.beust\jcommander\1.27\58c9cbf0f1fa296f93c712f2cf46de50471920f9\jcommander-1.27.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.yaml\snakeyaml\1.6\a1e23e31c424d566ee27382e373d73a28fdabd88\snakeyaml-1.6.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\com.google.code.gson\gson\2.8.5\f645ed69d595b24d4cf8b3fbb64cc505bede8829\gson-2.8.5.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\net.java.dev.jna\jna\5.3.0\4654d1da02e4173ba7b64f7166378847db55448a\jna-5.3.0.jar
[INFO ] - java.vm.vendor: Oracle Corporation
[INFO ] - sun.arch.data.model: 64
[INFO ] - user.variant:
[INFO ] - java.vendor.url: https://java.oracle.com/
[INFO ] - user.timezone: America/Toronto
[INFO ] - java.vm.specification.version: 12
[INFO ] - os.name: Windows 10
[INFO ] - org.apache.logging.log4j.assignedSequences: 159
[INFO ] - user.country: CA
[INFO ] - sun.java.launcher: SUN_STANDARD
[INFO ] - sun.boot.library.path: C:\Program Files\Java\jdk-12.0.2\bin
[INFO ] - sun.java.command: ai.djl.integration.util.DebugEnvironment
[INFO ] - jdk.debug: release
[INFO ] - sun.cpu.endian: little
[INFO ] - user.home: C:\Users\MasudRahman
[INFO ] - user.language: en
[INFO ] - java.specification.vendor: Oracle Corporation
[INFO ] - java.version.date: 2019-07-16
[INFO ] - java.home: C:\Program Files\Java\jdk-12.0.2
[INFO ] - file.separator: \
[INFO ] - java.vm.compressedOopsMode: 32-bit
[INFO ] - line.separator:
[INFO ] - java.vm.specification.vendor: Oracle Corporation
[INFO ] - java.specification.name: Java Platform API Specification
[INFO ] - java.awt.graphicsenv: sun.awt.Win32GraphicsEnvironment
[INFO ] - user.script:
[INFO ] - sun.management.compiler: HotSpot 64-Bit Tiered Compilers
[INFO ] - java.runtime.version: 12.0.2+10
[INFO ] - user.name: MasudRahman
[INFO ] - path.separator: ;
[INFO ] - os.version: 10.0
[INFO ] - java.runtime.name: Java(TM) SE Runtime Environment
[INFO ] - file.encoding: windows-1252
[INFO ] - java.vm.name: Java HotSpot(TM) 64-Bit Server VM
[INFO ] - java.vendor.url.bug: https://bugreport.java.com/bugreport/
[INFO ] - java.io.tmpdir: C:\Users\MASUDR~1\AppData\Local\Temp\
[INFO ] - java.version: 12.0.2
[INFO ] - user.dir: C:\MyWorks\DJL-AI\djl\integration
[INFO ] - os.arch: amd64
[INFO ] - java.vm.specification.name: Java Virtual Machine Specification
[INFO ] - sun.os.patch.level:
[INFO ] - java.library.path: C:\Program Files\Java\jdk-12.0.2\bin;C:\windows\Sun\Java\bin;C:\windows\system32;C:\windows;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\Program Files\MiKTeX 2.9\miktex\bin\x64\;C:\Program Files\Git\cmd;C:\Program Files\Java\jdk-12.0.2\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.4.0\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\WiX Toolset v3.11\bin;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\MyWorks\MySofts\apache-maven-3.6.3\bin;C:\MyWorks\DJL-AI\gradle-6.2\bin;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\Scripts\;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\;C:\Users\MasudRahman\AppData\Local\Microsoft\WindowsApps;C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.3\bin;;C:\Users\MasudRahman\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\;C:\Users\MasudRahman\.dotnet\tools;.
[INFO ] - java.vm.info: mixed mode, sharing
[INFO ] - java.vendor: Oracle Corporation
[INFO ] - java.vm.version: 12.0.2+10
[INFO ] - sun.io.unicode.encoding: UnicodeLittle
[INFO ] - java.class.version: 56.0
[INFO ] -
[INFO ] - ----------Environment Variables----------
[INFO ] - USERDOMAIN_ROAMINGPROFILE: LAPTOP-9GR27E2K
[INFO ] - PROCESSOR_LEVEL: 6
[INFO ] - RegionCode: NA
[INFO ] - SESSIONNAME: Console
[INFO ] - ALLUSERSPROFILE: C:\ProgramData
[INFO ] - PROCESSOR_ARCHITECTURE: AMD64
[INFO ] - PSModulePath: C:\Program Files\WindowsPowerShell\Modules;C:\windows\system32\WindowsPowerShell\v1.0\Modules
[INFO ] - SystemDrive: C:
[INFO ] - MOZ_PLUGIN_PATH: C:\Program Files (x86)\Foxit Software\Foxit Reader\plugins\
[INFO ] - DIRNAME: C:\MyWorks\DJL-AI\djl\
[INFO ] - USERNAME: MasudRahman
[INFO ] - CMD_LINE_ARGS: debugEnv
[INFO ] - ProgramFiles(x86): C:\Program Files (x86)
[INFO ] - APP_HOME: C:\MyWorks\DJL-AI\djl\
[INFO ] - CUDA_PATH_V10_1: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
[INFO ] - PATHEXT: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
[INFO ] - DriverData: C:\Windows\System32\Drivers\DriverData
[INFO ] - OneDriveConsumer: C:\Users\MasudRahman\OneDrive
[INFO ] - platformcode: KV
[INFO ] - PyCharm Community Edition: C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.3\bin;
[INFO ] - ProgramData: C:\ProgramData
[INFO ] - ProgramW6432: C:\Program Files
[INFO ] - HOMEPATH: \Users\MasudRahman
[INFO ] - NVCUDASAMPLES10_1_ROOT: C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1
[INFO ] - PROCESSOR_IDENTIFIER: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
[INFO ] - ProgramFiles: C:\Program Files
[INFO ] - PUBLIC: C:\Users\Public
[INFO ] - windir: C:\windows
[INFO ] - =::: ::\
[INFO ] - _SKIP: 2
[INFO ] - LOCALAPPDATA: C:\Users\MasudRahman\AppData\Local
[INFO ] - USERDOMAIN: LAPTOP-9GR27E2K
[INFO ] - LOGONSERVER: \\LAPTOP-9GR27E2K
[INFO ] - JAVA_HOME: C:\Program Files\Java\jdk-12.0.2
[INFO ] - PROMPT: $P$G
[INFO ] - OneDrive: C:\Users\MasudRahman\OneDrive
[INFO ] - =C:: C:\MyWorks\DJL-AI\djl
[INFO ] - APPDATA: C:\Users\MasudRahman\AppData\Roaming
[INFO ] - DOWNLOAD_URL: "https://raw.githubusercontent.com/gradle/gradle/master/gradle/wrapper/gradle-wrapper.jar"
[INFO ] - JAVA_EXE: C:\Program Files\Java\jdk-12.0.2/bin/java.exe
[INFO ] - NVTOOLSEXT_PATH: C:\Program Files\NVIDIA Corporation\NvToolsExt\
[INFO ] - CommonProgramFiles: C:\Program Files\Common Files
[INFO ] - Path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\Program Files\MiKTeX 2.9\miktex\bin\x64\;C:\Program Files\Git\cmd;C:\Program Files\Java\jdk-12.0.2\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.4.0\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\WiX Toolset v3.11\bin;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\MyWorks\MySofts\apache-maven-3.6.3\bin;C:\MyWorks\DJL-AI\gradle-6.2\bin;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\Scripts\;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\;C:\Users\MasudRahman\AppData\Local\Microsoft\WindowsApps;C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.3\bin;;C:\Users\MasudRahman\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\;C:\Users\MasudRahman\.dotnet\tools
[INFO ] - OS: Windows_NT
[INFO ] - COMPUTERNAME: LAPTOP-9GR27E2K
[INFO ] - NVCUDASAMPLES_ROOT: C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1
[INFO ] - CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
[INFO ] - OnlineServices: Online Services
[INFO ] - PROCESSOR_REVISION: 8e0c
[INFO ] - CLASSPATH: C:\MyWorks\DJL-AI\djl\\gradle\wrapper\gradle-wrapper.jar
[INFO ] - CommonProgramW6432: C:\Program Files\Common Files
[INFO ] - ComSpec: C:\windows\system32\cmd.exe
[INFO ] - APP_BASE_NAME: gradlew
[INFO ] - NVCUDASAMPLES9_0_ROOT: C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0
[INFO ] - SystemRoot: C:\windows
[INFO ] - TEMP: C:\Users\MASUDR~1\AppData\Local\Temp
[INFO ] - HOMEDRIVE: C:
[INFO ] - USERPROFILE: C:\Users\MasudRahman
[INFO ] - TMP: C:\Users\MASUDR~1\AppData\Local\Temp
[INFO ] - CUDA_PATH_V9_0: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
[INFO ] - CommonProgramFiles(x86): C:\Program Files (x86)\Common Files
[INFO ] - NUMBER_OF_PROCESSORS: 8
[INFO ] -
[INFO ] - ----------Default Engine----------
Exception in thread "main" java.util.ServiceConfigurationError: ai.djl.engine.EngineProvider: Provider ai.djl.mxnet.engine.MxEngineProvider could not be instantiated
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:583)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:805)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:723)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1395)
at ai.djl.engine.Engine.initEngine(Engine.java:46)
at ai.djl.engine.Engine.<clinit>(Engine.java:41)
at ai.djl.integration.util.DebugEnvironment.main(DebugEnvironment.java:51)
Caused by: java.lang.ExceptionInInitializerError
at ai.djl.mxnet.engine.MxEngine.<init>(MxEngine.java:40)
at ai.djl.mxnet.engine.MxEngineProvider.<clinit>(MxEngineProvider.java:21)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:781)
... 5 more
Caused by: java.lang.IllegalStateException: Failed to download MXNet native library
at ai.djl.mxnet.jna.LibUtils.findLibraryInClasspath(LibUtils.java:134)
at ai.djl.mxnet.jna.LibUtils.getLibName(LibUtils.java:76)
at ai.djl.mxnet.jna.LibUtils.loadLibrary(LibUtils.java:67)
at ai.djl.mxnet.jna.JnaUtils.<clinit>(JnaUtils.java:69)
... 13 more
Caused by: java.nio.file.FileAlreadyExistsException: C:\Users\MasudRahman\.mxnet\cache\1.6.0-c-SNAPSHOT-20200218mkl-win-x86_64
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:351)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1424)
at ai.djl.mxnet.jna.LibUtils.downloadMxnet(LibUtils.java:316)
at ai.djl.mxnet.jna.LibUtils.findLibraryInClasspath(LibUtils.java:132)
... 16 more
> Task :integration:debugEnv FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':integration:debugEnv'.
> Process 'command 'C:\Program Files\Java\jdk-12.0.2\bin\java.exe'' finished with non-zero exit value 1
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.0.1/userguide/command_line_interface.html#sec:command_line_warnings
BUILD FAILED in 10s
29 actionable tasks: 2 executed, 27 up-to-date
C:\MyWorks\DJL-AI\djl>
Visual representation of the neural network: model score, parameter ratios, learning parameter, ...
All aspects of the neural network's parameters that can be visualized over time (per iteration).
Present the parameters the network already provides, visually.
Everybody. With visualization you can quickly see overfitting, underfitting, and general network behavior. It makes it easier to determine the right network parameters and the right network architecture.
A good example of web-based visualization (local or remote): https://deeplearning4j.konduit.ai/tuning-and-training/visualization
There are a few bugs in DJL right now when you try to use multiple engines.
For example, suppose we use the MXNet and TensorFlow engines together. If I set -Dai.djl.default_engine=MXNet and call a TfEngine or TfModelZoo method, MxEngine and MxModelZoo are actually returned.
TfEngine.getInstance() will return the default engine instead of TfEngine.
TfModelZoo.RESNET.loadModel() will return MxModel.RESNET if the default engine is MXNet, and PyModel.RESNET if the default engine is PyTorch, even though the user explicitly asked for TfModelZoo.
In Criteria.builder(), the .optEngine("TensorFlow") option is not used by ModelZoo when loading the model.
Right now the 2 ways to work with multiple engines are:
Switching ai.djl.default_engine back and forth.
newInstance with engineName will return the correct implementation under that engine: Model tfModel = Model.newInstance(Device.defaultDevice(), "TensorFlow")
Implement a YOLO model and add it to the DJL model zoo
Is there any guide on how to train an instance segmentation model for DJL?
There is a potential ArrayIndexOutOfBoundsException in the method latestMetric in class ai.djl.metric.Metrics.
In the following code snippet, if list is empty (not null), the return list.get(list.size() - 1); statement will throw an ArrayIndexOutOfBoundsException.
public Metric latestMetric(String name) {
List<Metric> list = metrics.get(name);
if (list == null) {
throw new IllegalArgumentException("Could not find metric: " + name);
}
return list.get(list.size() - 1);
}
Change if (list == null) to if (list == null || list.isEmpty())
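The proposed guard can be sketched in isolation. MetricStore below is a hypothetical stand-in for ai.djl.metric.Metrics that stores plain double values; only the null-or-empty check mirrors the suggestion above, and the rest is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ai.djl.metric.Metrics, reduced to the lookup logic.
final class MetricStore {
    private final Map<String, List<Double>> metrics = new HashMap<>();

    // Registers a name without adding a value, producing the "empty, not null" case.
    void register(String name) {
        metrics.computeIfAbsent(name, k -> new ArrayList<>());
    }

    void add(String name, double value) {
        metrics.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    double latest(String name) {
        List<Double> list = metrics.get(name);
        // A missing list and an empty list are rejected the same way.
        if (list == null || list.isEmpty()) {
            throw new IllegalArgumentException("Could not find metric: " + name);
        }
        return list.get(list.size() - 1);
    }
}
```

With this check, a metric name that was registered but never populated fails with the same IllegalArgumentException as an unknown name, instead of an index error.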
DJL currently has : notation support for ndarray access using NDIndex but no ... notation, which is typically used to signify unspecified dimensions.
An existing :-based index can be simplified, e.g. :, :, :, 2 => ..., 2
It should also make it easier to port Python examples that use such notation to DJL.
https://python-reference.readthedocs.io/en/latest/docs/brackets/ellipsis.html
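One way to think about the request is as a rewrite step that expands ... into the right number of : entries before the index is parsed. The sketch below works on plain strings and does not touch NDIndex; the class EllipsisIndex and its expand method are hypothetical names, and the array rank is assumed to be known by the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: rewrites "..., 2" into ":, :, :, 2" for a rank-4 array.
final class EllipsisIndex {
    static String expand(String index, int rank) {
        List<String> tokens = new ArrayList<>();
        int ellipsisAt = -1;
        for (String part : index.split(",")) {
            String token = part.trim();
            if (token.equals("...")) {
                if (ellipsisAt >= 0) {
                    throw new IllegalArgumentException("Only one ... is allowed");
                }
                ellipsisAt = tokens.size();
            }
            tokens.add(token);
        }
        if (ellipsisAt < 0) {
            return String.join(", ", tokens); // nothing to expand
        }
        // Every token except the ellipsis itself consumes one dimension.
        int missing = rank - (tokens.size() - 1);
        if (missing < 0) {
            throw new IllegalArgumentException("Too many indices for rank " + rank);
        }
        tokens.remove(ellipsisAt);
        tokens.addAll(ellipsisAt, Collections.nCopies(missing, ":"));
        return String.join(", ", tokens);
    }
}
```

The sketch treats every other token as consuming exactly one dimension, so it ignores cases such as new-axis markers that NumPy also allows.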
When I specify a model contained in a .zip such as:
-Dai.djl.repository.zoo.location=https://djl-tensorflow-javacpp.s3.amazonaws.com/tensorflow-models/covid-19/saved_model.zip
This is the output of the extraction:
% ls /Users/ermurphy/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/9cd10ffd7f1adba3a00d0425403b69f7
saved_model
Notice that a saved_model subdirectory is present. This is where the model files reside.
% ls /Users/ermurphy/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/9cd10ffd7f1adba3a00d0425403b69f7/saved_model
assets saved_model.pb variables
Here is the output when running the model from this .zip:
org.tensorflow.exceptions.TensorFlowException: Could not find SavedModel .pb or .pbtxt at supplied export directory path: /Users/ermurphy/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/9cd10ffd7f1adba3a00d0425403b69f7
This results in an error because it's looking in the parent directory, not the saved_model directory. Note that saved_model is missing from the path in the TensorFlowException.
There should be a way to specify the subdirectory of a .zip in which the model files reside. It cannot be assumed that they will be in the parent directory. The DJL Java code might also search the subdirectories first to find the .pb before giving a path to TensorFlow.
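The "search the subdirectories first" idea could look roughly like this with plain java.nio. SavedModelLocator is a hypothetical helper, not part of DJL, and the file names saved_model.pb and saved_model.pbtxt are assumed from the exception message above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: finds the directory that actually holds the SavedModel file.
final class SavedModelLocator {
    static Optional<Path> findModelDir(Path extractedRoot) throws IOException {
        // Files.walk visits the root itself and then every nested entry.
        try (Stream<Path> paths = Files.walk(extractedRoot)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.equals("saved_model.pb") || name.equals("saved_model.pbtxt");
                    })
                    .map(Path::getParent)
                    .findFirst();
        }
    }
}
```

The returned directory, rather than the extraction root, would then be the path handed to TensorFlow. If the archive contains several SavedModel files, this sketch simply returns the first one encountered.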
The Criteria used when loading the model, taken from the COVID example code:
Criteria<BufferedImage, Classifications> criteria = Criteria.builder()
.setTypes(BufferedImage.class, Classifications.class).optTranslator(new MyTranslator())
.optProgress(new ProgressBar()).build();
Dynamically modified learning parameter over time. We want the learning parameter to change dynamically while the network is learning.
Cycle (learning parameter alternation) mode should have multiple types, with a configurable time interval and minimum and maximum learning parameter values.
Yes, implement a learning parameter that changes dynamically over iterations/epochs.
Everybody; this will enable the network to learn more quickly.
Here is an implementation of dynamic learning parameter change, a kind of sawtooth wave over time:
https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/CycleSchedule.java
We want to extend this implementation - see above.
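To make the request concrete, here is a minimal sketch of one possible cycle type: a triangular wave that rises linearly from a minimum to a maximum value and back within a fixed number of iterations. It assumes "learning parameter" means the learning rate. TriangularCycleSchedule is a hypothetical class, not DJL's or ND4J's API, and it is a simpler shape than the linked CycleSchedule.

```java
// Hypothetical schedule: value oscillates between minValue and maxValue
// once every cycleLength iterations.
final class TriangularCycleSchedule {
    private final double minValue;
    private final double maxValue;
    private final int cycleLength;

    TriangularCycleSchedule(double minValue, double maxValue, int cycleLength) {
        if (cycleLength < 2 || minValue > maxValue) {
            throw new IllegalArgumentException("Need cycleLength >= 2 and minValue <= maxValue");
        }
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.cycleLength = cycleLength;
    }

    double valueAt(int iteration) {
        int position = iteration % cycleLength;   // position inside the current cycle
        double half = cycleLength / 2.0;
        // Fraction climbs from 0 to 1 in the first half and falls back in the second.
        double fraction = position <= half ? position / half : (cycleLength - position) / half;
        return minValue + (maxValue - minValue) * fraction;
    }
}
```

Other cycle types (sawtooth, cosine, and so on) would only need a different fraction computation, which suggests making the shape a pluggable function while keeping the interval and the minimum and maximum values as common configuration.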
This is a task to add at least one part of speech tagging dataset. These datasets help provide an example of an NLP token classification task, as well as having some use for training multi-purpose NLP models. A good example might be one from Universal Dependencies.
When I try to run inference on pre-trained embeddings while training NLP models, I see a NullPointerException because Predictor is not designed to work with multiple devices.
[INFO ] - Load MXNet Engine Version 1.7.0 in 0.181 ms.
[INFO ] - forward P50: 3.519 ms, P90: 3.519 ms
[INFO ] - training-metrics P50: 0.048 ms, P90: 0.048 ms
[INFO ] - backward P50: 1.721 ms, P90: 1.721 ms
Exception in thread "main" java.lang.NullPointerException
at ai.djl.training.ParameterStore.getValue(ParameterStore.java:105)
at ai.djl.nn.core.Embedding.opInputs(Embedding.java:257)
at ai.djl.nn.core.Embedding.forward(Embedding.java:162)
at ai.djl.nn.Block.forward(Block.java:118)
at ai.djl.inference.Predictor.predict(Predictor.java:117)
at ai.djl.inference.Predictor.batchPredict(Predictor.java:144)
at ai.djl.inference.Predictor.predict(Predictor.java:112)
at ai.djl.modality.nlp.embedding.ModelZooTextEmbedding.embedText(ModelZooTextEmbedding.java:57)
at ai.djl.examples.training.TrainSentimentAnalysis$EmbeddingDataManager.getData(TrainSentimentAnalysis.java:277)
at ai.djl.training.Trainer.trainBatch(Trainer.java:159)
at ai.djl.examples.training.util.TrainingUtils.fit(TrainingUtils.java:36)
at ai.djl.examples.training.TrainSentimentAnalysis.runExample(TrainSentimentAnalysis.java:133)
at ai.djl.examples.training.TrainSentimentAnalysis.main(TrainSentimentAnalysis.java:89)
Suppressed: java.lang.IllegalArgumentException: Metric name not found: step
at ai.djl.metric.Metrics.percentile(Metrics.java:135)
at ai.djl.training.listener.LoggingTrainingListener.onTrainingEnd(LoggingTrainingListener.java:167)
at ai.djl.training.Trainer.lambda$close$5(Trainer.java:349)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at ai.djl.training.Trainer.close(Trainer.java:349)
at ai.djl.examples.training.TrainSentimentAnalysis.runExample(TrainSentimentAnalysis.java:159)
... 1 more
The documentation in DJL was originally written with the expectation that users are reasonably familiar with deep learning. So, it does not go out of the way to define and explain some of the key concepts. To help users who are newer to deep learning, we created a documentation convention for what explanation is required to get a basic understanding of the relevant topics. We now need to update the existing javadocs to contain all the required information.
Activation
Pool
This issue is fairly big for a single person, so I want to set this up for multiple people to work on. Comment below if you are interested in helping with any of the documentation and which of the items above you want to work on. Also comment if you notice any other javadoc that does not match the convention. I will edit this description to keep it up to date as the documentation is updated.
I am attempting to reproduce this ObjectDetection example in Scala. However, when I specify the model criteria as illustrated in the example code, I get an error that the model is not found.
Model should be loaded and objects in image should be detected as in example
See error message below:
[2019-12-15 12:38:57,736] [ERROR] [akka.actor.SupervisorStrategy] [HelloAkkaHttpServer-akka.actor.default-dispatcher-13] [akka://HelloAkkaHttpServer/user] - Model not found.
ai.djl.repository.zoo.ModelNotFoundException: Model not found.
at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:71)
at ai.djl.repository.zoo.ModelLoader.loadModel(ModelLoader.java:84)
To reproduce:
build.sbt file
libraryDependencies += "ai.djl" % "api" % "0.2.0"
libraryDependencies += "ai.djl" % "model-zoo" % "0.2.0"
sbt import log
[error] (update) sbt.librarymanagement.ResolveException: Error downloading ai.djl.mxnet:basicdataset:0.2.0
[error] Not found
[error] Not found
[error] not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/basicdataset/0.2.0/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/ai/djl/mxnet/basicdataset/0.2.0/basicdataset-0.2.0.pom
[error] Error downloading ai.djl.mxnet:examples:0.2.0
[error] Not found
[error] Not found
[error] not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/examples/0.2.0/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/ai/djl/mxnet/examples/0.2.0/examples-0.2.0.pom
[error] Error downloading ai.djl.mxnet:mxnet-native-mkl:0.2.0
[error] Not found
[error] Not found
[error] not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/mxnet-native-mkl/0.2.0/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/ai/djl/mxnet/mxnet-native-mkl/0.2.0/mxnet-native-mkl-0.2.0.pom
[error] (ssExtractDependencies) sbt.librarymanagement.ResolveException: Error downloading ai.djl.mxnet:basicdataset:0.2.0
[error] Not found
[error] Not found
[error] not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/basicdataset/0.2.0/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/ai/djl/mxnet/basicdataset/0.2.0/basicdataset-0.2.0.pom
[error] Error downloading ai.djl.mxnet:examples:0.2.0
[error] Not found
[error] Not found
[error] not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/examples/0.2.0/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/ai/djl/mxnet/examples/0.2.0/examples-0.2.0.pom
[error] Error downloading ai.djl.mxnet:mxnet-native-mkl:0.2.0
[error] Not found
[error] Not found
[error] not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/mxnet-native-mkl/0.2.0/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/ai/djl/mxnet/mxnet-native-mkl/0.2.0/mxnet-native-mkl-0.2.0.pom
[error] Total time: 3 s, completed Dec 15, 2019 1:07:13 PM
import scala.collection.JavaConverters._
val img = BufferedImageUtils.fromUrl(inputImageUrl)
val criteria = Map(
"size" -> "512",
"backbone" -> "resnet50",
"flavor" -> "v1",
"dataset" -> "voc"
).asJava
try {
val model = ModelZoo.SSD.loadModel(criteria, new ProgressBar())
val predictor = model.newPredictor()
val detectedObjects: DetectedObjects = predictor.predict(img)
detectedObjects
}
CocoDetection downloads the COCO data and creates the wrong directory structure:
/root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/
-- annotations
---- annotations
-- train2017
----train2017
--val2017
-----val2017
and
/root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/annotations/
--annotations
---- captions_train2017.json
---- captions_val2017.json
---- instances_train2017.json
---- instances_val2017.json
---- person_keypoints_train2017.json
---- person_keypoints_val2017.json
The .json files and images should be in the upper directories.
If we use CocoDataset to prepare data for a training pipeline similar to TrainPikachu, the following error appears:
root@e1414f287bc3:~/git/danhlephuoc/djl/examples# mvn exec:java -Dexec.mainClass="ai.djl.examples.training.TrainCoco"
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------------< ai.djl:examples >---------------------------
[INFO] Building examples 0.6.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ examples ---
[WARNING]
java.nio.file.NoSuchFileException: /root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/annotations/instances_train2017.json
at sun.nio.fs.UnixException.translateToIOException (UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException (UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException (UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel (UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel (Files.java:361)
at java.nio.file.Files.newByteChannel (Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream (FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream (Files.java:152)
at java.nio.file.Files.newBufferedReader (Files.java:2784)
at java.nio.file.Files.newBufferedReader (Files.java:2816)
at ai.djl.basicdataset.CocoUtils.prepare (CocoUtils.java:54)
at ai.djl.basicdataset.CocoDetection.prepareData (CocoDetection.java:146)
at ai.djl.repository.dataset.ZooDataset.prepare (ZooDataset.java:104)
at ai.djl.examples.training.TrainCoco.getDataset (TrainCoco.java:147)
at ai.djl.examples.training.TrainCoco.runExample (TrainCoco.java:79)
at ai.djl.examples.training.TrainCoco.main (TrainCoco.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:748)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.145 s
[INFO] Finished at: 2020-05-31T07:57:03+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project examples: An exception occured while executing the Java class. /root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/annotations/instances_train2017.json -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
After I manually moved the .json files and .jpg files to the upper directories, the error was gone.
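The manual workaround can be scripted. The sketch below assumes the layout shown above, where a directory contains a single nested directory of the same name (for example annotations/annotations), and moves the nested entries up one level. NestedDirFlattener is a hypothetical helper and not part of DJL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: turns dir/dir/* into dir/* for one directory.
final class NestedDirFlattener {
    static void flatten(Path dir) throws IOException {
        Path name = dir.getFileName();
        if (name == null) {
            return; // a file system root has no name to duplicate
        }
        Path nested = dir.resolve(name);
        if (!Files.isDirectory(nested)) {
            return; // layout is already flat
        }
        // Collect first so the directory listing is closed before anything is moved.
        List<Path> children;
        try (Stream<Path> stream = Files.list(nested)) {
            children = stream.collect(Collectors.toList());
        }
        for (Path child : children) {
            Files.move(child, dir.resolve(child.getFileName()));
        }
        Files.delete(nested);
    }
}
```

Calling flatten on each of annotations, train2017, and val2017 under the cache directory would reproduce the manual fix. The sketch fails with a FileAlreadyExistsException if a nested entry has the same name as an existing entry one level up.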
Add the dataset to the basic datasets.
I suggest renaming ModelZoo, ZooModel classes because:
Just my 2 cents :)
I noticed multiple instances where the code contains the following Javadoc: /** {@inheritDoc} */
This behavior is already the default. Try removing the Javadoc and regenerating the documentation and you will notice that nothing has changed.
The only time you need to use {@inheritDoc} is if you plan to augment the inherited Javadoc, which in the above case you do not.
I suggest removing such instances from the source-code.
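A small illustration of the two cases, using hypothetical Shape, Square, and Circle types that are not taken from the DJL code base. Square relies on the javadoc tool's automatic comment inheritance; Circle uses {@inheritDoc} because it adds a sentence of its own.

```java
/** A shape with a measurable area. */
interface Shape {
    /**
     * Returns the area of this shape.
     *
     * @return the area, in square units
     */
    double area();
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    // No Javadoc here: the generated documentation reuses the comment from Shape#area().
    @Override
    public double area() {
        return side * side;
    }
}

final class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    /**
     * {@inheritDoc}
     *
     * <p>Computed as pi times the squared radius.
     */
    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}
```

One caveat: a project whose style checker requires a Javadoc comment on every public method may keep the bare /** {@inheritDoc} */ form to satisfy that rule, so it is worth checking the build configuration before removing them in bulk.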
Add the WikiText2 dataset to the basic datasets.
Feature to allow passing argument overrides when loading models. Such argument overrides may, for example, allow changing default threshold for translators or adjust other parameters.
Yes, the API will change, specifically, load methods.
For example at present the method signature is:
ai.djl.repository.zoo.ModelLoader#loadModel(Map<java.lang.String,java.lang.String> criteria);
It would be nice to get a flavor like:
ai.djl.repository.zoo.ModelLoader#loadModel(Map<java.lang.String,java.lang.String> criteria, Map<> argumentOverrides);
The above is a trivialized approach that may be too simplistic.
Library consumers who would like to load models and adjust default parameters.
This may be beneficial if customers are trying to externalize configuration parameters needed to load models, predictors, translators and allow creating generic configuration in IoC environments like Spring.
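The override semantics could be as simple as layering a caller-supplied map over the model's default arguments. The sketch below is only an illustration of that merge: ModelArguments, the Map<String, Object> types, and the "threshold" key are assumptions, and no such method exists on ModelLoader.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing how argument overrides could be applied.
final class ModelArguments {
    static Map<String, Object> merge(Map<String, Object> defaults, Map<String, Object> overrides) {
        // Copy so the defaults shipped with the model are never mutated.
        Map<String, Object> merged = new HashMap<>(defaults);
        if (overrides != null) {
            merged.putAll(overrides); // caller-supplied values win
        }
        return merged;
    }
}
```

A loader accepting argumentOverrides could pass the merged map to the translator factory, which keeps externalized configuration (for example, values injected by Spring) separate from the defaults bundled with the model.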
The ObjectDetection example with the pre-trained Yolo models (dataset=coco, backbone=darknet53) returns an error:
"MXNet engine call failed: CUDA: Check failed: e == cudaSuccess: an illegal memory access was encountered"
The ObjectDetection example should work with the different pre-trained Yolo models. Note that Yolo models trained on Pascal VOC work just fine.
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ examples ---
Loading: 100% |████████████████████████████████████████|
[11:32:04] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.6.0. Attempting to upgrade...
[11:32:04] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
model yolo
[11:32:13] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[WARNING]
ai.djl.engine.EngineException: MXNet engine call failed: CUDA: Check failed: e == cudaSuccess: an illegal memory access was encountered
Stack trace:
File "/codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h", line 81
at ai.djl.mxnet.jna.JnaUtils.checkCall (JnaUtils.java:1788)
at ai.djl.mxnet.jna.JnaUtils.syncCopyToCPU (JnaUtils.java:473)
at ai.djl.mxnet.engine.MxNDArray.toByteBuffer (MxNDArray.java:283)
at ai.djl.ndarray.NDArray.toIntArray (NDArray.java:279)
at ai.djl.modality.cv.translator.YoloTranslator.processOutput (YoloTranslator.java:40)
at ai.djl.modality.cv.translator.YoloTranslator.processOutput (YoloTranslator.java:26)
at ai.djl.inference.Predictor.processOutputs (Predictor.java:202)
at ai.djl.inference.Predictor.batchPredict (Predictor.java:160)
at ai.djl.inference.Predictor.predict (Predictor.java:112)
at ai.djl.examples.inference.ObjectDetectionBench.predict (ObjectDetectionBench.java:71)
at ai.djl.examples.inference.ObjectDetectionBench.main (ObjectDetectionBench.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:748)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.461 s
[INFO] Finished at: 2020-05-30T11:32:18+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project examples: An exception occured while executing the Java class. MXNet engine call failed: CUDA: Check failed: e == cudaSuccess: an illegal memory access was encountered
[ERROR] Stack trace:
[ERROR] File "/codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h", line 81
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[11:32:18] src/resource.cc:279: Ignore CUDA Error [11:32:18] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered
[[[[11:32:18] 11:32:18] src/engine/threaded_engine_perdevice.cc11:32:18src/engine/threaded_engine_perdevice.cc] src/engine/threaded_engine_perdevice.cc:27511:32:18:275:: 275: Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
] src/engine/threaded_engine_perdevice.cc:275Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
: : Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
terminate called after throwing an instance of 'dmlc::Error'
what(): [11:32:18] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered
Aborted (core dumped)
from
.optFilter("backbone", "resnet50")
to
.optFilter("dataset", "coco")
.optFilter("imageSize","416")
.optFilter("backbone", "darknet53")
Ubuntu, CUDA 10.2, GPU V100
Could you please allow loading the native library from a jar? I didn't find a way to do it. You could use NativeUtils to load libraries from a jar and, for example, expose a system variable for specifying the path inside the jar.
No
End users
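A common JDK-only way to do what this request describes is to copy the library out of the jar into a temporary file and pass that file to `System.load`. The sketch below illustrates the technique under stated assumptions: `JarNativeLoader`, `extractToTemp`, and `loadFromClasspath` are hypothetical names, not DJL or NativeUtils API, and the resource path and file suffix are supplied by the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class JarNativeLoader {
    private JarNativeLoader() {}

    /** Copies a stream (for example, a jar resource) to a temporary file and returns its path. */
    public static Path extractToTemp(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("native-", suffix);
        // Ask the JVM to remove the extracted copy on normal exit.
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Extracts the classpath resource at resourcePath and loads it as a native library. */
    public static void loadFromClasspath(String resourcePath, String suffix) throws IOException {
        try (InputStream in = JarNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resourcePath);
            }
            // System.load requires an absolute file-system path, hence the extraction step.
            System.load(extractToTemp(in, suffix).toAbsolutePath().toString());
        }
    }
}
```

A caller would invoke `loadFromClasspath` with whatever path the library has inside the jar. That path could come from a system property, as the request suggests, but the property name would have to be defined by the project.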
Could you provide a Jupyter notebook sample for a BERT inference demo using PyTorch?
It would be just like the current BERT inference demo using MXNet.
People who have already pretrained or fine-tuned their own models using PyTorch could then easily migrate to this project.
Thanks.
Weight decay regularization, in addition to L1 and L2 regularization.
Another implementation of regularization.
Everybody; learning will be faster.
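To make the request concrete, here is a minimal sketch of a plain SGD step with weight decay, where each weight is shrunk by `lr * wd * w` in addition to the usual gradient step. The class name, method name, and parameters are hypothetical and not part of DJL. The issue does not say how this should map onto DJL's optimizers.

```java
public final class WeightDecaySketch {
    private WeightDecaySketch() {}

    /**
     * Performs one in-place SGD step with weight decay:
     * w[i] <- w[i] - lr * (grad[i] + wd * w[i])
     */
    public static void sgdStep(double[] w, double[] grad, double lr, double wd) {
        if (w.length != grad.length) {
            throw new IllegalArgumentException("weights and gradients must have equal length");
        }
        for (int i = 0; i < w.length; i++) {
            // The wd * w[i] term pulls every weight toward zero on each step.
            w[i] -= lr * (grad[i] + wd * w[i]);
        }
    }
}
```

With `wd` set to 0 this reduces to an ordinary SGD step, so the decay term can be enabled independently of any L1 or L2 penalty in the loss.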
I am trying out djl and am really excited to finally try object detection in Java. What are the steps to label images for object detection?
I see that images need to be separated into train and test directories, and I also see an index.file containing the coordinates of annotated objects. What tool can be used to annotate images to generate index.file?
Right now there are builds of mxnet-native* for osx-x86_64 and linux-x86_64.
It would benefit DJL a lot if it could run on IBM's Power servers (ppc64le).
No API changes.
Everyone who uses IBM's Power servers.
DJL seems to complain when I try to set multiple elements via a function while using NDIndex slicing.
I was expecting the example below to apply the sigmoid function to elements 2-3 in each array.
Setting just a number works though:
array.set(new NDIndex(":, 2:"), 2);
ND: (2, 4) gpu(0) float32
[[7.4022, 9.2099, 0.3902, 9.6896],
[9.2514, 4.4635, 6.6732, 1.0993],
]
Exception in thread "main" ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Check failed: src.Size() == dst->Size() (4 vs. 0) : Cannot reshape array of size 4 into shape [2,0]
Stack trace:
File "C:\source\mxnet-distro\mxnet-build\src\operator\numpy\np_matrix_op.cc", line 145
at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788)
at ai.djl.mxnet.jna.JnaUtils.imperativeInvoke(JnaUtils.java:500)
at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:82)
at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:66)
at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:329)
at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:347)
at ai.djl.mxnet.engine.MxNDArray.reshape(MxNDArray.java:1167)
at ai.djl.mxnet.engine.MxNDArray.set(MxNDArray.java:348)
at ai.djl.ndarray.NDArray.set(NDArray.java:472)
at Main.main(Main.java:11)
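import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.index.NDIndex;
import ai.djl.ndarray.types.Shape;

public class Main {
    public static void main(String[] args) {
        NDManager manager = NDManager.newBaseManager();
        NDArray array = manager.randomUniform(0, 10, new Shape(2, 4));
        System.out.println(array);
        array.set(new NDIndex(":, 2:"), arr -> arr.getNDArrayInternal().sigmoid()); // fails here
        System.out.println(array);
    }
}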
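For reference, the behavior the reporter expected can be sketched on a plain Java array without DJL: apply sigmoid in place to the columns from index 2 onward in every row, leaving the other columns untouched. The class and method names are hypothetical. This illustrates the intended semantics of the `":, 2:"` slice and is not a workaround for the engine error.

```java
public final class SliceSigmoidSketch {
    private SliceSigmoidSketch() {}

    /** Logistic sigmoid: 1 / (1 + e^(-x)). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Applies sigmoid in place to columns [fromCol, rowLength) of every row. */
    public static void applyToColumns(double[][] a, int fromCol) {
        for (double[] row : a) {
            for (int c = fromCol; c < row.length; c++) {
                row[c] = sigmoid(row[c]);
            }
        }
    }
}
```

Calling `applyToColumns(values, 2)` on a 2 x 4 array changes only the last two entries of each row, which is what the function-based `set` call was expected to do.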