cheng-li / pyramid
Open source Machine Learning library written in Java
License: Apache License 2.0
[chengli@fiji11 pyramid-0.1.0]$ python visualization/visualizer.py /huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports
Traceback (most recent call last):
  File "visualization/visualizer.py", line 1624, in <module>
    main()
  File "visualization/visualizer.py", line 1602, in main
    f1 = open(directoryName + configName, 'r')
IOError: [Errno 2] No such file or directory: '/huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/data_config.json'
[chengli@fiji11 pyramid-0.1.0]$ python visualization/visualizer.py /huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports/
Json:/huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports/report_1.json load successfully.
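The two runs above differ only in the trailing slash of the directory argument. A minimal sketch of a path helper that accepts both forms is shown here; the function name `config_path` is hypothetical and this is not the code in `visualizer.py`, only an illustration assuming the config file sits inside the report directory.

```python
import os


def config_path(report_dir, config_name="data_config.json"):
    """Build the path to config_name inside report_dir.

    normpath drops any trailing separator, and join then adds exactly one,
    so 'train_reports' and 'train_reports/' resolve to the same file.
    """
    return os.path.join(os.path.normpath(report_dir), config_name)


if __name__ == "__main__":
    # Both spellings of the directory produce the same config path.
    print(config_path("reports/train_reports"))
    print(config_path("reports/train_reports/"))
```

Building the path with `os.path.join` instead of string concatenation removes the dependence on how the user types the directory.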
When I run CBM with 10K training samples, 10K features and 12 labels (70 label sets), it works fine. But when I increase the number of training samples to 150K, it throws the following error. Most of the features are real-valued. The dataset is around 95% sparse. The code uses around 50GB of memory out of 128GB.
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at edu.neu.ccs.pyramid.application.AppLauncher.invokeMain(AppLauncher.java:72)
	at edu.neu.ccs.pyramid.application.AppLauncher.launch(AppLauncher.java:39)
	at edu.neu.ccs.pyramid.application.AppLauncher.main(AppLauncher.java:24)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.String.split(String.java:2338)
	at java.lang.String.split(String.java:2410)
	at edu.neu.ccs.pyramid.dataset.TRECFormat.fillMultiLabelClfDataSet(TRECFormat.java:332)
	at edu.neu.ccs.pyramid.dataset.TRECFormat.loadMultiLabelClfDataSet(TRECFormat.java:159)
	at edu.neu.ccs.pyramid.dataset.TRECFormat.loadMultiLabelClfDataSet(TRECFormat.java:106)
	at edu.neu.ccs.pyramid.application.App5.train(App5.java:58)
	at edu.neu.ccs.pyramid.application.App5.main(App5.java:38)
	... 7 more
From what I found online, this error is raised when most of the run time is spent in garbage collection and progress becomes too slow; Java raises it because it suspects the program may never finish. According to the paper, CBM can handle the TMC2007 dataset, which is relatively large, so I am hoping there is a solution to this issue.
Any idea how to fix this issue?
[INFO] pyramid 0.12.6 ..................................... SUCCESS [ 0.238 s]
[INFO] phrase-count-plugin 1.0 ............................ SUCCESS [ 4.893 s]
[INFO] pyramid 0.12.6 ..................................... FAILURE [ 15.589 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20.851 s
[INFO] Finished at: 2020-03-07T20:08:16-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single (default) on project pyramid: Execution default of goal org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single failed: user id '16779829' is too big ( > 2097151 ). -> [Help 1]
I have attached the error message. Do you know how to deal with it?
Model files are being saved in the output folder. What do the model files contain, and how can they be used?
How can we get the predictions for the test samples?
Hi @cheng-li,
Is it possible to save the confidence scores of the predictions? That is, when a label subset is predicted for a test data point, what is the confidence (or probability) of that label subset? This would help us identify the hard-to-classify data points.
Thanks in advance.
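To illustrate what the request above is after, here is a small sketch of ranking test points by the confidence of their predicted label subset. It does not use pyramid's API: the function `rank_by_confidence` and the input layout (one dict per test point mapping a label subset to its probability) are assumptions made only for this example.

```python
def rank_by_confidence(subset_probs):
    """Rank test points from least to most confident.

    subset_probs: list with one dict per test point, mapping a
    frozenset of labels to the (hypothetical) probability of that subset.
    Returns a list of (index, predicted_subset, confidence) tuples.
    """
    ranked = []
    for index, probs in enumerate(subset_probs):
        # The predicted subset is the one with the highest probability.
        best = max(probs, key=probs.get)
        ranked.append((index, best, probs[best]))
    # Lowest confidence first: these are the hard-to-classify points.
    ranked.sort(key=lambda item: item[2])
    return ranked


if __name__ == "__main__":
    example = [
        {frozenset({"a"}): 0.9, frozenset({"a", "b"}): 0.1},
        {frozenset({"a"}): 0.4, frozenset({"b"}): 0.35, frozenset(): 0.25},
    ]
    for index, subset, confidence in rank_by_confidence(example):
        print(index, sorted(subset), confidence)
```

With scores saved in some such form, the first entries of the ranking would be the data points worth inspecting by hand.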