
rma's People

Contributors

aik07, aik7, jeckstei

rma's Issues

Error: bad_alloc for parallel RMA with printBBdetails=true

$ mpirun -np 2 ./build/rma --branchSelection=1 --perCachedCutPts=1  --printBBdetails=true ../../data/paper/spam.data 
User-specified solver options: 
branchSelection 1
perCachedCutPts 1
printBBdetails true

(mxn): 4601	40
m^+ m^-: 2788	1813
[0] Using default values for all solver options

PEBBL Configuration:
-------------------
1 cluster of size 2
2 processors
1 pure worker  ( 50.0%)
1 worker-hub   ( 50.0%)

Target timeslice: 0.01 seconds.

[0] GRMA Solution: +0.430559	CPU Time: 0.003221


Best Solution:  Value = 0.46142143012388614487

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[LCC-155978:07566] *** Process received signal ***
[LCC-155978:07566] Signal: Aborted (6)
[LCC-155978:07566] Signal code:  (-6)
[LCC-155978:07566] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f7a1e75a8a0]
[LCC-155978:07566] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f7a1e395f47]
[LCC-155978:07566] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f7a1e3978b1]
[LCC-155978:07566] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f7a1efa9957]
[LCC-155978:07566] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ae6)[0x7f7a1efafae6]
[LCC-155978:07566] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b21)[0x7f7a1efafb21]
[LCC-155978:07566] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d54)[0x7f7a1efafd54]
[LCC-155978:07566] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x932dc)[0x7f7a1efb02dc]
[LCC-155978:07566] [ 8] ./build/rma(_ZNSt6vectorIjSaIjEE17_M_default_appendEm+0xc3)[0x55e631178e03]
[LCC-155978:07566] [ 9] ./build/rma(_ZrsIjERN6utilib12UnPackBufferES2_RSt6vectorIT_SaIS4_EE+0x5bb)[0x55e631188c9b]
[LCC-155978:07566] [10] ./build/rma(_ZTv0_n112_N8pebblRMA11rmaSolution14unpackContentsERN6utilib12UnPackBufferE+0x2e)[0x55e631191b5e]
[LCC-155978:07566] [11] ./build/rma(_ZN5pebbl8solution6unpackERN6utilib12UnPackBufferE+0x98)[0x55e6311a4928]
[LCC-155978:07566] [12] ./build/rma(_ZN5pebbl17parallelBranching14unpackSolutionERN6utilib12UnPackBufferE+0x19b)[0x55e6311ca1db]
[LCC-155978:07566] [13] ./build/rma(_ZN5pebbl17parallelBranching11getSolutionEi+0x2a4)[0x55e631221544]
[LCC-155978:07566] [14] ./build/rma(_ZN5pebbl17parallelBranching13printSolutionEPKcS2_RSo+0x45)[0x55e6312242c5]
[LCC-155978:07566] [15] ./build/rma(_ZN5pebbl17parallelBranching14solutionToFileEv+0x189)[0x55e6312201a9]
[LCC-155978:07566] [16] ./build/rma(_ZN5pebbl17parallelBranching5solveEv+0xcb)[0x55e6311c870b]
[LCC-155978:07566] [17] ./build/rma(_ZN3rma9DriverRMA13solveExactRMAEv+0x11d)[0x55e63117a36d]
[LCC-155978:07566] [18] ./build/rma(_ZN3rma9DriverRMA8solveRMAEv+0x473)[0x55e63117ab73]
[LCC-155978:07566] [19] ./build/rma(main+0x33)[0x55e6311637e3]
[LCC-155978:07566] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f7a1e378b97]
[LCC-155978:07566] [21] ./build/rma(_start+0x2a)[0x55e63116460a]
[LCC-155978:07566] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node LCC-155978 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
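
Frames 8 to 10 of the trace demangle to `std::vector<unsigned>::_M_default_append` (a `resize`), called from utilib's `operator>>(UnPackBuffer&, std::vector<unsigned>&)` inside `rmaSolution::unpackContents`. One possible cause, offered as a hypothesis rather than a diagnosis, is that the length prefix read from the buffer does not match what was packed, so `resize` is asked for an absurd element count. The sketch below shows a guarded unpack under that assumption. `ByteReader`, `packValue`, and `unpackVector` are hypothetical names for illustration and are not part of utilib or PEBBL.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an unpack buffer; NOT utilib's UnPackBuffer.
class ByteReader {
public:
  explicit ByteReader(const std::vector<unsigned char>& bytes) : bytes_(bytes) {}

  std::size_t remaining() const { return bytes_.size() - pos_; }

  // Read one trivially copyable value, refusing to run past the end.
  template <typename T>
  T read() {
    if (remaining() < sizeof(T))
      throw std::runtime_error("unpack: buffer exhausted");
    T value;
    std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
    pos_ += sizeof(T);
    return value;
  }

private:
  const std::vector<unsigned char>& bytes_;  // caller keeps the bytes alive
  std::size_t pos_ = 0;
};

// Append a value's raw bytes to a buffer (the matching "pack" side).
template <typename T>
void packValue(std::vector<unsigned char>& out, const T& value) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
  out.insert(out.end(), p, p + sizeof(T));
}

// Unpack a length-prefixed vector, validating the length BEFORE resizing.
std::vector<unsigned> unpackVector(ByteReader& in) {
  const std::uint64_t n = in.read<std::uint64_t>();
  if (n > in.remaining() / sizeof(unsigned))
    throw std::runtime_error("unpack: length prefix exceeds buffer size");
  std::vector<unsigned> v(static_cast<std::size_t>(n));
  for (auto& x : v) x = in.read<unsigned>();
  return v;
}
```

With a check like this, a pack/unpack mismatch would surface as a descriptive exception at the point of unpacking instead of a `std::bad_alloc` from deep inside `resize`.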

An error occurred in MPI_Testsome

[e3c-167:21948] *** An error occurred in MPI_Testsome
[e3c-167:21948] *** reported by process [140737472954369,1]
[e3c-167:21948] *** on communicator MPI_COMM_WORLD
[e3c-167:21948] *** MPI_ERR_TRUNCATE: message truncated
[e3c-167:21948] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[e3c-167:21948] ***    and potentially your MPI job)

Parallel RMA: sometimes slower than serial RMA

$ ./build/rma  ../../data/paper/skin.data
Using default values for all solver options
(mxn): 245057	3
m^+ m^-: 194198	50859
Using default values for all solver options
GRMA Solution: +0.674843	CPU Time: 0.022393
#5 pool=3  inc=0.6748430 bnd=0.7843400 gap=13.960%

Best Solution:  Value = 0.67484299571120176520

Subproblems
-----------
Created            280  100.0%
Started Bounding   280  100.0%
Bounded            280  100.0%
Started Splitting  185   66.1%
Split              185   66.1%
Dead                95   33.9%

CPU run time          = 19.3 seconds
CPU total time        = 19.3 seconds
Wall clock total time = 19.3 seconds

x2<=170, 
ERMA Solution: 0.674843	CPU time: 19.281	Num of Nodes: 280

$ mpirun -np 2 ./build/rma  ../../data/paper/skin.data
Using default values for all solver options
(mxn): 245057	3
m^+ m^-: 194198	50859
[0] Using default values for all solver options

PEBBL Configuration:
-------------------
1 cluster of size 2
2 processors
1 pure worker  ( 50.0%)
1 worker-hub   ( 50.0%)

Target timeslice: 0.01 seconds.

[0] GRMA Solution: +0.674843	CPU Time: 0.02843
[0] h#3 pool=2  inc=0.674843 bnd=0.7912 gap=14.706%
[0] h#109 pool=28  inc=0.674843 bnd=0.68654 gap=1.704%


Best Solution:  Value = 0.6748429957112017652

x2<=170, 
Subproblems
-----------
Created                    276  100.0%
Started Bounding           276  100.0%
Bounded                    276  100.0%
Started Splitting          190   68.8%
Split                      190   68.8%
Dead                        86   31.2%

Bounded during ramp up       2    0.7%
In pool at end of ramp up    1    0.4%
Tokens at Hub              124   44.9%
Scattered to Hub            44   15.9%
Rebalanced to Hub           80   29.0%
Dispatched from a Hub      124   44.9%
Moved between Workers       52   18.8%

Average search time (CPU)           24.4 seconds.
Maximum search time (CPU)           24.4 seconds.
Average search time (Wall clock)    30.8 seconds.
Maximum search time (Wall clock)    30.8 seconds.

0 quiescence polls, 1 termination check.

                           Average   % of % where       Messages   % of
Thread/Function     Number Seconds  Total  Active   COV Received   Msgs
------------------- ------ ------- ------ ------- ----- -------- ------
Problem Broadcast        2     0.1   0.4%    0.4%  0.24        2   0.5%
Preprocessing            2     0.0   0.0%    0.0%  0.00        0   0.0%
Ramp-up                  2     6.4  26.3%   26.3%  0.00       33   8.9%
Worker                   2    16.4  67.1%   67.1%  0.09        0   0.0%
Hub                      1     0.0   0.0%    0.0%  0.00      186  50.1%
Incumbent Broadcast      2     0.0   0.0%    0.0%  0.00        0   0.0%
Subproblem Receiver      2     0.0   0.0%    0.0%  0.07       49  13.2%
Subproblem Server        2     0.0   0.0%    0.0%  1.00       51  13.7%
Termination Check        1     0.0   0.0%    0.0%  0.00        1   0.3%
Worker Auxiliary         1     0.0   0.0%    0.0%  0.00       44  11.9%
Cut Point Receiver       2     0.0   0.0%    0.0%  0.29        4   1.1%
Scheduler                2     0.5   2.1%    2.1%  0.99        0   0.0%
Solution Output          2     0.0   0.0%    0.0%  0.65        1   0.3%
Other Overhead           2     0.0   0.0%    0.0%  0.37        0   0.0%
Idle                     2     1.0   3.9%    3.9%  1.00        0   0.0%
                                   ------               -------- ------
Total                    2    24.4 100.0%  100.0%  0.00      371 100.0%

Messages per subproblem: 1.34
Hub loading factor: 2%

[0] ERMA Solution: 0.674843	CPU time: 24.2982	Num of Nodes: 276

setPebblParameters is really ugly

The setPebblParameters routine is ugly, and we shouldn't need something like it. Rethink how the parameters are structured so that it can be removed.

Confusing observation indices

When using dataOrigTrain, dataIntTrain, dataStandTrain, make sure to use indTrain(idx) to get the proper observation index.

In RMA, we remove zero-weight observations from the training set, but we should still integerize and standardize all data.
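
The indexing rule above can be sketched as follows. This is a minimal illustration, assuming the data arrays keep one row per original observation and that `indTrain` maps a position in the training set to the original row. The `TrainData` struct, `buildIndTrain`, and `trainValue` are hypothetical, and `std::vector` indexing stands in for the `indTrain(idx)` call syntax used in RMA.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: dataIntTrain holds one row per ORIGINAL observation,
// while indTrain lists the original indices of the observations kept for training.
struct TrainData {
  std::vector<std::vector<unsigned>> dataIntTrain;  // integerized rows, all observations
  std::vector<std::size_t> indTrain;                // training position -> original index
};

// Build indTrain by dropping zero-weight observations; the data rows are untouched.
std::vector<std::size_t> buildIndTrain(const std::vector<double>& weights) {
  std::vector<std::size_t> ind;
  for (std::size_t i = 0; i < weights.size(); ++i)
    if (weights[i] != 0.0) ind.push_back(i);
  return ind;
}

// Value of attribute j for the idx-th TRAINING observation.
unsigned trainValue(const TrainData& d, std::size_t idx, std::size_t j) {
  const std::size_t obs = d.indTrain.at(idx);  // map training position to original row
  return d.dataIntTrain.at(obs).at(j);
}
```

Routing every access through one helper such as `trainValue` keeps the mapping in a single place, so a loop over training positions cannot accidentally index the data arrays directly.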

A stack trace disabled

[0] /home/kagawa/Projects/thesis/installpebbl/include/pebbl/utilib/stl_auxiliary.h:320: !is: operator>> - unpack problem.(PN0)
[0] [stack trace disabled: compile with UTILIB_HAVE_EXECINFO_H]

This error occurs intermittently in parallel RMA.

Need to copy PEBBL parameters

// RMA constructor
RMA::RMA(pebblParams *param) : workingSol(this), numCC_SP(0) { //, numTotalCutPts(0)

  this->debug = param->debug;
  this->maxCPUMinutes = param->maxCPUMinutes;
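
One way to avoid copying parameters field by field, sketched here under the assumption that the parameters can be grouped into a single copyable object, is to copy the whole bundle in one step. `SolverParams` and `Solver` are hypothetical names. The field names mirror the excerpt above, but this is not PEBBL's actual `pebblParams` class, and whether `pebblParams` supports plain copy assignment would need to be checked.

```cpp
// Hypothetical parameter bundle; NOT PEBBL's pebblParams.
struct SolverParams {
  int debug = 0;
  double maxCPUMinutes = 0.0;
};

// Hypothetical solver that keeps its own copy of the whole bundle.
class Solver {
public:
  // One copy transfers every field, including any added later.
  explicit Solver(const SolverParams& p) : params_(p) {}

  const SolverParams& params() const { return params_; }

private:
  SolverParams params_;
};
```

Because the copy is made once in the constructor, later changes to the caller's parameter object do not affect the solver, and adding a parameter does not require touching the constructor.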

'std::runtime_error' : Attempt to destruct a solution with refCounter=1 -- use dispose() instead of delete

On the pebbl_sol branch, I added the following line to /src/solveRMA.cpp:

+      delete rma; 

This produces the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  /home/aik/RMA/external/pebbl/src/pebbl/bb/branching.h:661: Attempt to destruct a solution with refCounter=1 -- use dispose() instead of delete(PN0) 

If I instead try

+      rma->workingSol.dispose();
+
+      delete rma; 

I get this error:

free(): invalid pointer
[eckstein-enthoo:07700] *** Process received signal ***
[eckstein-enthoo:07700] Signal: Aborted (6)
[eckstein-enthoo:07700] Signal code:  (-6)
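
Both failures are consistent with an intrusive reference-counting scheme, although the messages alone do not prove it. The first message says the destructor refuses to run while references remain. In the second attempt, `workingSol` is accessed with `.`, which suggests it is a member object rather than a separately allocated one. If `dispose()` ends in `delete this`, calling it on a member would free memory that was never returned by `new`, which commonly shows up as `free(): invalid pointer`. The sketch below illustrates the pattern with a hypothetical `Counted` class. It is not PEBBL's `solution` class.

```cpp
// Hypothetical intrusive reference-counted object, loosely modeled on the
// behavior the error message describes; NOT PEBBL's actual solution class.
class Counted {
public:
  static int alive;  // live-instance counter, only for demonstration

  Counted() { ++alive; }

  // Register one more holder of this object.
  void acquire() { ++refCounter_; }

  // Drop one reference; the object deletes itself when none remain.
  // Only valid for objects created with new.
  void dispose() {
    if (--refCounter_ <= 0) delete this;
  }

private:
  ~Counted() { --alive; }  // private: forces callers through dispose()
  int refCounter_ = 1;     // the creator holds the first reference
};

int Counted::alive = 0;
```

Making the destructor private is a design choice in this sketch: it turns both a direct `delete` and a by-value member of this type into compile-time errors instead of runtime aborts.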

The incumbent does not work if you set --initGuess=false

$ ./rma --initGuess=false --debug=0 ../data/cleveland.dat 
User-specified solver options: 
initGuess false
debug 0

(mxn): 297	35
m^+ m^-: 137	160
Using default values for all solver options
ERMA Solution: inf	CPU time: 0.018902	Num of Nodes: 1

$ ./rma --initGuess=true --debug=0 ../data/cleveland.dat 
User-specified solver options: 
initGuess true
debug 0

(mxn): 297	35
m^+ m^-: 137	160
Using default values for all solver options
GRMA Solution: -0.329966	CPU Time: 0.004389
ERMA Solution: 0.329966	CPU time: 0.335751	Num of Nodes: 206
