Hi all. In the quick start demo, I have tested LR, WE+LR, and WE+CNN successfully. But

LSTM running error ! about paddle HOT 13 CLOSED

lqniunjunlper commented on May 8, 2024

LSTM running error !

from paddle.

Comments (13)

reyoung commented on May 8, 2024 1

OK, we have located the problem. It seems that the LSTM layer uses some AVX instructions. We will fix it in a few days.


lqniunjunlper commented on May 8, 2024

Before building Paddle, the error "version GLIBC_2.14 not found" occurred, so I updated glibc from 2.12 to 2.14. Is this OK?


reyoung commented on May 8, 2024

It's very strange that PaddlePaddle didn't print a call stack. If it is convenient for you, can you rebuild PaddlePaddle with the flag '-DCMAKE_BUILD_TYPE=Debug' and rerun this training? Or can you give us the core dump files?

You can also refer to this link: http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault
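
Generating a core dump generally requires raising the core file size limit (the effect of `ulimit -c`) before the crashing process starts. The following is only a minimal sketch of that idea, assuming a POSIX system; the helper name `run_with_core_dumps` is hypothetical, and where the core file ends up depends on the system's core_pattern setting.

```python
import resource
import subprocess


def _raise_core_limit():
    """Raise the soft core-file size limit to the hard limit (child only)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(command):
    """Run `command` with core dumps enabled as far as the hard limit allows.

    `command` is an argument list, exactly as it would be typed in a shell.
    """
    # preexec_fn runs in the child just before exec, so the limits of the
    # calling process are left untouched.
    result = subprocess.run(command, preexec_fn=_raise_core_limit)
    return result.returncode
```

Pass the same argument list you would normally type to start training. A negative return code means the child was killed by a signal, which is the case in which a core file may be written.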


lqniunjunlper commented on May 8, 2024

@reyoung
I0907 17:26:32.151026 1053 Util.cpp:144] commandline: /data11/dis_ml/deeplearning/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output_lstm --trainer_count=4 --log_period=1000 --num_passes=15 --use_gpu=false --show_parameter_stats_period=2000 --test_all_data_in_one_period=1
I0907 17:26:32.151208 1053 Util.cpp:113] Calling runInitFunctions
I0907 17:26:32.151401 1053 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 17:26:32,723 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 17:26:32,723 networks.py:1125] The output order is [cost_0]
I0907 17:26:32.740944 1053 Trainer.cpp:169] trainer mode: Normal
I0907 17:26:32.826501 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856484 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856694 1053 GradientMachine.cpp:134] Initing parameters..
I0907 17:26:33.070418 1053 GradientMachine.cpp:141] Init parameters done.
I0907 17:26:33.346114 1062 ThreadLocal.cpp:39] thread use undeterministic rand seed:1063
I0907 17:26:33.367995 1065 ThreadLocal.cpp:39] thread use undeterministic rand seed:1066
I0907 17:26:33.373780 1064 ThreadLocal.cpp:39] thread use undeterministic rand seed:1065
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473240393 (unix time) try "date -d @1473240393" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 1053 (TID 0x7f50fe12e700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f510f76c710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f510e8743d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f510f7649d1 start_thread
Current Layer forward/backward stack is
@ 0x7f510e0598fd clone
/data11/dis_ml/deeplearning/paddle/bin/paddle: line 46: 1053 Illegal instruction ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}


reyoung commented on May 8, 2024

@NIULQfromNJU Hello, it seems that PaddlePaddle uses some CPU instructions (AVX) that your CPU does not support. Please try rebuilding PaddlePaddle with AVX support disabled, using -DWITH_AVX=OFF. That will solve your problem.

There is a TODO in the CMake file to automatically select the AVX flag depending on the machine's CPU, but it has not been implemented yet.

Please rebuild PaddlePaddle with -DCMAKE_BUILD_TYPE=Debug -DWITH_AVX=OFF and make sure there are no errors. Then you can set -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_AVX=OFF, and install that build to train your model.
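
Until the flag is selected automatically, the choice can be checked by hand before running CMake. The following is a minimal sketch, not PaddlePaddle's actual build logic: it assumes a Linux host where /proc/cpuinfo exposes a "flags" line, and the helper names `cpu_flags` and `with_avx_option` are hypothetical. Only the -DWITH_AVX option itself comes from this thread.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # All logical processors normally report the same flags,
            # so the first "flags" line is enough.
            return set(value.split())
    return set()


def with_avx_option(flags):
    """Map the detected flags to the CMake option discussed in this thread."""
    # Set membership is an exact match, so "avx2" alone does not count as "avx".
    return "-DWITH_AVX=ON" if "avx" in flags else "-DWITH_AVX=OFF"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # Assumption: fall back to no flags (AVX off) when the file is missing.
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(with_avx_option(cpu_flags(text)))
```

A flags line that lists sse4_2 but no avx entry maps to -DWITH_AVX=OFF with this check.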


lqniunjunlper commented on May 8, 2024

Hi @reyoung, I rebuilt Paddle with -DWITH_AVX=OFF and then ran the quick start demo, but I have the same problem as before: LR, WE+LR, and WE+CNN run successfully, while WE+LSTM aborts. So strange! Is there any other instruction in the LSTM example that my CPU does not support?
The error output follows:
I0907 20:30:21.711181 10069 Util.cpp:144] commandline: /data11/paddle/pd/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output --trainer_count=4 --log_period=20 --num_passes=15 --use_gpu=false --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0907 20:30:21.711364 10069 Util.cpp:113] Calling runInitFunctions
I0907 20:30:21.711556 10069 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 20:30:22,156 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 20:30:22,157 networks.py:1129] The output order is [cost_0]
I0907 20:30:22.174654 10069 Trainer.cpp:169] trainer mode: Normal
I0907 20:30:22.262153 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288261 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288434 10069 GradientMachine.cpp:134] Initing parameters..
I0907 20:30:22.491011 10069 GradientMachine.cpp:141] Init parameters done.
I0907 20:30:22.681430 10100 ThreadLocal.cpp:39] thread use undeterministic rand seed:10101
I0907 20:30:22.683939 10101 ThreadLocal.cpp:39] thread use undeterministic rand seed:10102
I0907 20:30:22.699645 10098 ThreadLocal.cpp:39] thread use undeterministic rand seed:10099
I0907 20:30:22.701810 10099 ThreadLocal.cpp:39] thread use undeterministic rand seed:10100
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473251422 (unix time) try "date -d @1473251422" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 10069 (TID 0x7f92afa00700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f92c202d710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f92c11353d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f92c20259d1 start_thread
Current Layer forward/backward stack is
@ 0x7f92c091a8fd clone
/data11/paddle/pd/bin/paddle: line 46: 10069 Illegal instruction ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}


lqniunjunlper commented on May 8, 2024

@reyoung great！


reyoung commented on May 8, 2024

@NIULQfromNJU Please give us your CPU info. Just run cat /proc/cpuinfo.


lqniunjunlper commented on May 8, 2024

@reyoung

processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2401.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.24
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:


reyoung commented on May 8, 2024

@NIULQfromNJU The code that will fix this error is under review: #51


reyoung commented on May 8, 2024

@NIULQfromNJU The fix has been merged into the master branch. Please check out the latest code; LSTM should work now.


lqniunjunlper commented on May 8, 2024

@reyoung Well done! The updated Paddle now runs LSTM successfully! Thanks~


reyoung commented on May 8, 2024

@NIULQfromNJU You're welcome.

If there is anything I can help with, don't hesitate to ask.

Thank you for your attention.
