Comments (25)
I've updated PR #197 with the fix discussed in opensearch-project/OpenSearch#1649.
from tribuo.
The fix has ben merged into main
and we've released v4.1.1
today. It'll take several days to get it on Maven Central, there's an internal process we need to go through as we don't have automatic deployments setup yet.
from tribuo.
4.2.0
has been tagged, and both 4.1.1
and 4.2.0
are now available in Maven Central. The search indices are slowly updating, but the jars themselves are available.
from tribuo.
Is there a specific security.policy file that triggers this or is it any installation of a SecurityManager?
from tribuo.
Also can you paste the stack trace you get? Tribuo doesn't use the default ForkJoinPool anywhere so I'm surprised we're hitting this issue.
from tribuo.
I ran a quick check and ended up with this stack trace, which is what I assume you're getting.
java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "modifyThread")
at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.base/java.security.AccessController.checkPermission(AccessController.java:897)
at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:322)
at java.base/java.util.concurrent.ForkJoinPool.checkPermission(ForkJoinPool.java:667)
at java.base/java.util.concurrent.ForkJoinPool.<init>(ForkJoinPool.java:2320)
at java.base/java.util.concurrent.ForkJoinPool.<init>(ForkJoinPool.java:2165)
at org.tribuo.clustering.kmeans.KMeansTrainer.train(KMeansTrainer.java:197)
at org.tribuo.clustering.kmeans.KMeansTrainer.train(KMeansTrainer.java:298)
It's documented that the ForkJoinPool
needs this permission in the javadoc for it's constructor (https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/ForkJoinPool.html#%3Cinit%3E(int)) so this isn't unexpected behaviour. The solution is to modify the security policy to grant Tribuo's jar the following permission:
grant codebase "file:/path/to/tribuo-clustering-kmeans-4.1.0.jar" {
permission java.lang.RuntimePermission "modifyThread";
};
We'll update the docs to note that this permission is necessary when running with a security manager.
For single threaded operation we could potentially modify the code not to use a ForkJoinPool
, but it would require extensive refactoring to migrate multithreaded use over to a ThreadPoolExecutor
and the performance may suffer.
Some of the modes of KNNModel
also use a ForkJoinPool
, we'll add a note to the docs there as well. There are modes of that model which use a regular executor service so those should be unaffected.
from tribuo.
@Craigacp Thanks a lot! BTW, are you running on Java 8?
from tribuo.
I think there are deployments on 8 and 11, and we run the CI on 8, 11 and 17. Not sure if any deployments are using a SecurityManager though.
Did granting that privilege fix your issue?
from tribuo.
No, this is the stack trace
java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "modifyThreadGroup")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:?]
at java.security.AccessController.checkPermission(AccessController.java:1036) ~[?:?]
at java.lang.SecurityManager.checkPermission(SecurityManager.java:408) ~[?:?]
at org.opensearch.secure_sm.SecureSM.checkThreadGroupAccess(SecureSM.java:205) ~[opensearch-secure-sm-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at org.opensearch.secure_sm.SecureSM.checkAccess(SecureSM.java:160) ~[opensearch-secure-sm-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at java.lang.ThreadGroup.checkAccess(ThreadGroup.java:311) ~[?:?]
at java.lang.Thread.<init>(Thread.java:421) ~[?:?]
at java.lang.Thread.<init>(Thread.java:707) ~[?:?]
at java.lang.Thread.<init>(Thread.java:540) ~[?:?]
at java.util.concurrent.ForkJoinWorkerThread.<init>(ForkJoinWorkerThread.java:100) ~[?:?]
at java.util.concurrent.ForkJoinPool$DefaultForkJoinWorkerThreadFactory$1.run(ForkJoinPool.java:720) ~[?:?]
at java.util.concurrent.ForkJoinPool$DefaultForkJoinWorkerThreadFactory$1.run(ForkJoinPool.java:717) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:391) ~[?:?]
at java.util.concurrent.ForkJoinPool$DefaultForkJoinWorkerThreadFactory.newThread(ForkJoinPool.java:716) ~[?:?]
at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1329) ~[?:?]
at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1353) ~[?:?]
at java.util.concurrent.ForkJoinPool.signalWork(ForkJoinPool.java:1478) ~[?:?]
at java.util.concurrent.ForkJoinPool.externalPush(ForkJoinPool.java:1912) ~[?:?]
at java.util.concurrent.ForkJoinPool.externalSubmit(ForkJoinPool.java:1930) ~[?:?]
at java.util.concurrent.ForkJoinPool.submit(ForkJoinPool.java:2506) ~[?:?]
at org.tribuo.clustering.kmeans.KMeansTrainer.train(KMeansTrainer.java:189) ~[tribuo-clustering-kmeans-4.0.2.jar:?]
at org.tribuo.clustering.kmeans.KMeansTrainer.train(KMeansTrainer.java:253) ~[tribuo-clustering-kmeans-4.0.2.jar:?]
But looks like caused by org.opensearch.secure_sm.SecureSM
, will take a look at that part
from tribuo.
Do you have the part of the stack trace which has a Tribuo class in it?
from tribuo.
I looked into the SecureSM
class a little bit, and that notes that it has tighter thread group restrictions than the default SecurityManager
- https://github.com/opensearch-project/OpenSearch/blob/main/libs/secure-sm/src/main/java/org/opensearch/secure_sm/SecureSM.java#L62. That's presumably why I didn't see this when I was testing before as I didn't realise ElasticSearch/OpenSearch had it's own implementation of SecurityManager
.
Try giving the Tribuo jar modifyThreadGroup
too:
grant codebase "file:/path/to/tribuo-clustering-kmeans-4.1.0.jar" {
permission java.lang.RuntimePermission "modifyThread";
permission java.lang.RuntimePermission "modifyThreadGroup";
};
from tribuo.
Do you have the part of the stack trace which has a Tribuo class in it?
Updated the original comment by adding the Tribuo class part. We are using
compile group: 'org.tribuo', name: 'tribuo-clustering-kmeans', version: '4.0.2'
Try giving the Tribuo jar modifyThreadGroup too:
I tried, not work. Like @spbjss said "the security policies doesn’t get propagated to ForkJoinPool worker threads if a SecurityManager is present"
from tribuo.
Well, I've managed to get it to work using Tribuo 4.0.2 and the built in Java SecurityManager
. The policy needs to be granted to all the code in the stack trace as the FJP creation isn't inside AccessController.doPrivileged
. I successfully ran one of the unit tests using the following policy (saved as tribuo.policy
):
grant codeBase "file:/path/to/tribuo/test/classes" {
permission java.lang.RuntimePermission "modifyThread";
permission java.lang.RuntimePermission "modifyThreadGroup";
};
grant codeBase "file:/path/to/tribuo/Clustering/KMeans/target/tribuo-clustering-kmeans-4.0.2-jar-with-dependencies.jar" {
permission java.lang.RuntimePermission "modifyThread";
permission java.lang.RuntimePermission "modifyThreadGroup";
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
permission java.util.logging.LoggingPermission "control";
permission java.io.FilePermission "<<ALL FILES>>", "read";
permission java.util.PropertyPermission "*", "read,write";
};
The extra permissions are used to allow our reflective provenance system to work, and could be minimized further depending on which file locations are used. I used the following command line:
tribuo$ java -cp Clustering/KMeans/target/tribuo-clustering-kmeans-4.0.2-jar-with-dependencies.jar:. -Djava.security.manager -Djava.security.policy=tribuo.policy TestKMeans
Whenever I try to activate SecureSM on JVM startup in Java 11 it causes the JVM to crash. Can you provide me a JVM startup string which successfully enables the SecureSM?
I was using the following command line:
tribuo$ java -cp opensearch-secure-sm-1.2.0.jar:Clustering/KMeans/target/tribuo-clustering-kmeans-4.0.2-jar-with-dependencies.jar:. -Djava.security.manager=org.opensearch.secure_sm.SecureSM -Djava.security.policy=tribuo.policy TestKMeans
from tribuo.
Looks like adding System.setSecurityManager(new org.opensearch.secure_sm.SecureSM());
to the start of my program correctly installs SecureSM
and then I can replicate the issue. This seems to be a specific interaction between SecureSM
and ForkJoinPool
, which isn't present when using the default Java SecurityManager
, so it's really a problem for the developers of SecureSM, or OpenJDK's core-libs team.
Is single threaded execution sufficient when running under SecureSM
? We could modify KMeansTrainer
so it doesn't use a ForkJoinPool
for a single thread, which should allow it to run, but as I mentioned above rewriting it to use a different threading mechanism for multi-threaded execution would be a breaking change and take some time. Given SecurityManager
is deprecated for removal in Java 17, I'm not sure rearchitecting Tribuo to allow running with a non-standard SecurityManager
like SecureSM
is worthwhile.
from tribuo.
Thanks @Craigacp.
Yes, SecureSM
has stricter check than default SecurityManager
. Kmeans will break as we are using SecureSM
.
I think it's reasonable to remove ForkJoinPool
for a single thread. But we may need to look for other ML lib if single-thread Kmeans doesn't perform well. So it will be good if you can replace ForkJoinPool
for multi-thread as well, so we can save effort to integrate another ML lib.
from tribuo.
Ok, I've got a branch which doesn't use the FJP in single threaded computation - 4c62478. Do you know if the data you are clustering is sparse or not? If it's actually dense, then there are further performance optimisations which will let Tribuo use dense vectors rather than sparse ones which we can apply. We did this for linear sgd models in 4.1, but haven't got around to applying that optimisation everywhere yet.
We can probably backport the single threaded fix into 4.1.1
which will arrive around the same time 4.2.0
lands, but it's not going to be backported to 4.0.2
.
from tribuo.
Thanks a lot! I think that's ok to not backport to 4.0.2
, we can use 4.1.1
, or upgrade to 4.2.0
if it works.
from tribuo.
@Craigacp , thanks for adding custom thread factory. Is this still true for your release plan?
We can probably backport the single threaded fix into 4.1.1 which will arrive around the same time 4.2.0 lands, but it's not going to be backported to 4.0.2.
from tribuo.
Yes, that's still the plan. I also implemented the dense vector optimization for KMeansTrainer
, but I'm undecided if that should go into 4.2.0
as it makes a backwards incompatible change to a protected method signature. I've not seen anyone subclass KMeansTrainer
, but that doesn't mean it hasn't happened.
from tribuo.
Got it. If the BWC is a problem, how about split the PR so you can merge custom thread factory fix first? You can have some time to rethink how to support BWC and send a separate PR. One way I can think is adding a new method and deprecate current one.
from tribuo.
Sorry, I was unclear, the performance optimization is in a separate branch. We will merge the thread factory fix first and that will definitely be in 4.2.0
and 4.1.1
.
from tribuo.
Oh, I misunderstood, thought the optimization is in the same PR. That sounds good. The performance optimization is not blocking issue for us now. Will be good if we can have that in next release 4.2.0, but respect your plan.
from tribuo.
@Craigacp seems v4.1.1 still not ready, I checked this https://mvnrepository.com/artifact/org.tribuo/tribuo-clustering-kmeans
from tribuo.
Yeah, sorry it got held up a little bit by some internal process. Still working through the 4.2.0 release process as well, which should hopefully be tagged next week, though the binaries for that will take longer as everyone is on vacation between Christmas & New Year.
from tribuo.
Sure, thanks a lot. No worry, not so urgent. Happy holiday!
from tribuo.
Related Issues (20)
- Error on irises-tribuo-v4.ipynb HOT 1
- TransformedModel doesn't have a protobuf
- mRMR HOT 1
- FS using wrapper approaches HOT 7
- Docs recommending IJava HOT 2
- Problem deserializing the XGBoostModel HOT 1
- Do you have any plans to support time-series predictions? HOT 1
- When packaged into docker container: FileNotFoundException: File /lib/linux-musl/x86_64/libxgboost4j.so HOT 6
- Memory and SQLDataSource HOT 1
- About csvLoader.loadDataSource HOT 4
- Configuring HyperParameters HOT 4
- Question about input feature mapping HOT 9
- Llama APIs HOT 1
- Load data from List obj in memory HOT 1
- MLP HOT 1
- TensorFlow Isuue
- Training loss HOT 1
- Weight and Bias in NN HOT 3
- HDBSCAN implementation in 4.3+ HOT 4
- Clustering Issue with Loading the Data HOT 9
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from tribuo.