cheng-li / pyramid

187.0 31.0 57.0 4.93 MB

Open source Machine Learning library written in Java

License: Apache License 2.0

Java 98.32% HTML 1.68%

machine-learning machine-learning-library java gradient-boosting multi-label-classification

pyramid's Issues

the visualizer folder parameter has to end with "/"

[chengli@fiji11 pyramid-0.1.0]$ python visualization/visualizer.py /huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports
Traceback (most recent call last):
  File "visualization/visualizer.py", line 1624, in <module>
    main()
  File "visualization/visualizer.py", line 1602, in main
    f1 = open(directoryName + configName, 'r')
IOError: [Errno 2] No such file or directory: '/huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/data_config.json'
[chengli@fiji11 pyramid-0.1.0]$ python visualization/visualizer.py /huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports/
Json:/huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports/report_1.json load successfully.
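The traceback shows the config path being built by plain string concatenation (`directoryName + configName`), which is why the result depends on a trailing "/". A minimal sketch of a separator-independent alternative follows. The helper name `config_path` and the relative directory are hypothetical and not taken from visualizer.py; how the script derives `directoryName` from its argument is not shown in the log.

```python
import os


def config_path(directory, config_name="data_config.json"):
    """Return the path of config_name inside directory.

    Hypothetical helper for illustration; not part of visualizer.py.
    """
    # os.path.join adds a separator only when one is missing, and
    # os.path.normpath collapses redundant or mixed separators.
    return os.path.normpath(os.path.join(directory, config_name))


# The same directory, given with and without the trailing slash.
with_slash = config_path("reports/train_reports/")
without_slash = config_path("reports/train_reports")
print(with_slash == without_slash)
```

With this approach both invocations from the log would resolve to the same file, so the folder argument would no longer need to end with "/".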

CBM: java.lang.OutOfMemoryError: GC overhead limit exceeded

When I run CBM with 10K training samples, 10K features, and 12 labels (70 label sets), it works fine. But when I increase the number of training samples to 150K, it throws the following error. Most of the features are real-valued, and the dataset is around 95% sparse. The code uses around 50GB of memory out of 128GB.

Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at edu.neu.ccs.pyramid.application.AppLauncher.invokeMain(AppLauncher.java:72)
at edu.neu.ccs.pyramid.application.AppLauncher.launch(AppLauncher.java:39)
at edu.neu.ccs.pyramid.application.AppLauncher.main(AppLauncher.java:24)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.split(String.java:2338)
at java.lang.String.split(String.java:2410)
at edu.neu.ccs.pyramid.dataset.TRECFormat.fillMultiLabelClfDataSet(TRECFormat.java:332)
at edu.neu.ccs.pyramid.dataset.TRECFormat.loadMultiLabelClfDataSet(TRECFormat.java:159)
at edu.neu.ccs.pyramid.dataset.TRECFormat.loadMultiLabelClfDataSet(TRECFormat.java:106)
at edu.neu.ccs.pyramid.application.App5.train(App5.java:58)
at edu.neu.ccs.pyramid.application.App5.main(App5.java:38)
... 7 more

From what I found online, this error is raised when the JVM spends most of its run time in garbage collection and makes very little progress. Java raises it because it suspects the program may never finish. According to the paper, CBM can handle the TMC2007 dataset, which is relatively large, so I am hoping there is a solution to this issue.

Any idea how to fix this issue?
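For scale, here is a back-of-the-envelope estimate of the raw matrix size from the figures quoted above. It is only a sketch: the 8-byte value and 4-byte index sizes are assumptions, and it says nothing about how pyramid actually stores the data or about the temporary strings created while parsing (the stack trace fails inside `String.split` during loading).

```python
# Figures quoted in the issue above.
n_samples = 150_000
n_features = 10_000
nonzero_percent = 5  # "around 95% sparse"

# Assumptions: 8-byte double per value, 4-byte int per column index.
BYTES_PER_VALUE = 8
BYTES_PER_INDEX = 4

cells = n_samples * n_features
nonzeros = cells * nonzero_percent // 100

# Dense storage keeps every cell; sparse storage keeps only the
# nonzero values plus one index per value.
dense_bytes = cells * BYTES_PER_VALUE
sparse_bytes = nonzeros * (BYTES_PER_VALUE + BYTES_PER_INDEX)

print(f"dense:  {dense_bytes / 1e9:.1f} GB")
print(f"sparse: {sparse_bytes / 1e9:.1f} GB")
```

Under these assumptions both figures are well below 128GB, so it may be worth checking the maximum heap size actually given to the JVM (the standard `-Xmx` option) rather than the physical memory of the machine.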

Compilation Error "maven-assembly-plugin:2.5.5:single failed: user id '16779829' is too big ( > 2097151 )"

[INFO] pyramid 0.12.6 ..................................... SUCCESS [ 0.238 s]
[INFO] phrase-count-plugin 1.0 ............................ SUCCESS [ 4.893 s]
[INFO] pyramid 0.12.6 ..................................... FAILURE [ 15.589 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20.851 s
[INFO] Finished at: 2020-03-07T20:08:16-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single (default) on project pyramid: Execution default of goal org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single failed: user id '16779829' is too big ( > 2097151 ). -> [Help 1]

I have attached the error message. Do you know how to deal with it?

How to get the predictions for the test samples

Model files are saved in the output folder. What do these files contain, and how can they be used?

How can we get the predictions for the test samples?

Confidence scores of the Predictions

Hi @cheng-li ,
Is it possible to save the confidence scores of the predictions? That is, when a label subset is predicted for a test data point, what is the confidence (or probability) of that label subset? This would help us identify the hard-to-classify data points.

Thanks in advance.
