Comments (13)
OK, we have located the problem. It seems that the LSTM layer uses some AVX instructions. We will fix it in a few days.
from paddle.
Before building Paddle, the error "version GLIBC_2.14 not found" occurred, so I updated glibc from 2.12 to 2.14. Is this OK?
from paddle.
It's very strange that PaddlePaddle didn't print a call stack. If it is convenient for you, could you rebuild PaddlePaddle with the flag '-DCMAKE_BUILD_TYPE=Debug' and rerun this training? Or could you give us the core dump files?
For generating a core dump, you can refer to this link: http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault
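For anyone who prefers to enable core dumps from a script rather than the shell, here is a minimal sketch using Python's standard `resource` module. It raises the soft core-file size limit to the hard limit for the current process, so any child process it starts inherits the setting. The helper names `enable_core_dumps` and `run_with_core_dumps` are made up for illustration and are not part of PaddlePaddle.

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise the soft core-file size limit to the current hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit needs no extra privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run argv as a child process that inherits the raised core limit."""
    enable_core_dumps()
    return subprocess.call(argv)
```

Pass the same trainer command line you normally use as the `argv` list. Where the core file ends up (and whether one is written at all) also depends on the system's `kernel.core_pattern` setting and on the hard limit not being zero.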
from paddle.
@reyoung
I0907 17:26:32.151026 1053 Util.cpp:144] commandline: /data11/dis_ml/deeplearning/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output_lstm --trainer_count=4 --log_period=1000 --num_passes=15 --use_gpu=false --show_parameter_stats_period=2000 --test_all_data_in_one_period=1
I0907 17:26:32.151208 1053 Util.cpp:113] Calling runInitFunctions
I0907 17:26:32.151401 1053 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 17:26:32,723 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 17:26:32,723 networks.py:1125] The output order is [cost_0]
I0907 17:26:32.740944 1053 Trainer.cpp:169] trainer mode: Normal
I0907 17:26:32.826501 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856484 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856694 1053 GradientMachine.cpp:134] Initing parameters..
I0907 17:26:33.070418 1053 GradientMachine.cpp:141] Init parameters done.
I0907 17:26:33.346114 1062 ThreadLocal.cpp:39] thread use undeterministic rand seed:1063
I0907 17:26:33.367995 1065 ThreadLocal.cpp:39] thread use undeterministic rand seed:1066
I0907 17:26:33.373780 1064 ThreadLocal.cpp:39] thread use undeterministic rand seed:1065
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473240393 (unix time) try "date -d @1473240393" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 1053 (TID 0x7f50fe12e700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f510f76c710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f510e8743d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f510f7649d1 start_thread
Current Layer forward/backward stack is
@ 0x7f510e0598fd clone
/data11/dis_ml/deeplearning/paddle/bin/paddle: line 46: 1053 Illegal instruction ${DEBUGGER}
from paddle.
@NIULQfromNJU Hello, it seems that PaddlePaddle uses some CPU instructions (AVX) that your CPU does not support. Please rebuild PaddlePaddle with AVX support disabled by passing -DWITH_AVX=OFF to CMake. That should solve your problem.
There is a TODO in the CMake file to select the AVX flag automatically depending on the machine's CPU, but it has not been implemented yet.
Please rebuild PaddlePaddle with -DCMAKE_BUILD_TYPE=Debug -DWITH_AVX=OFF first and make sure there is no error. Then you can rebuild with -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_AVX=OFF and install it to train your model.
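Until the automatic selection mentioned above exists, the choice can be made by hand from the CPU's feature flags. The following is only a sketch of that idea, not the logic PaddlePaddle's CMake files use: it reads text in the format of `/proc/cpuinfo` and returns the `WITH_AVX` option to pass to CMake. The function names `cpu_flags` and `with_avx_option` are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Flags are whitespace-separated; the first "flags" line is enough.
            return set(value.split())
    return set()


def with_avx_option(cpuinfo_text):
    """Pick the WITH_AVX CMake option based on whether 'avx' is listed."""
    if "avx" in cpu_flags(cpuinfo_text):
        return "-DWITH_AVX=ON"
    return "-DWITH_AVX=OFF"
```

On Linux, calling `with_avx_option(open("/proc/cpuinfo").read())` gives the option for the machine the script runs on. The check is an exact match on the `avx` flag, so a CPU that lists only SSE variants yields `-DWITH_AVX=OFF`.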
from paddle.
Hi @reyoung, I rebuilt Paddle with -DWITH_AVX=OFF and then ran the quick start demo. But I have the same problem as before: LR, WE+LR, and WE+CNN run successfully, while WE+LSTM aborts. So strange! Is there any other instruction in the LSTM example that my CPU does not support?
The following is the error output:
I0907 20:30:21.711181 10069 Util.cpp:144] commandline: /data11/paddle/pd/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output --trainer_count=4 --log_period=20 --num_passes=15 --use_gpu=false --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0907 20:30:21.711364 10069 Util.cpp:113] Calling runInitFunctions
I0907 20:30:21.711556 10069 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 20:30:22,156 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 20:30:22,157 networks.py:1129] The output order is [cost_0]
I0907 20:30:22.174654 10069 Trainer.cpp:169] trainer mode: Normal
I0907 20:30:22.262153 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288261 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288434 10069 GradientMachine.cpp:134] Initing parameters..
I0907 20:30:22.491011 10069 GradientMachine.cpp:141] Init parameters done.
I0907 20:30:22.681430 10100 ThreadLocal.cpp:39] thread use undeterministic rand seed:10101
I0907 20:30:22.683939 10101 ThreadLocal.cpp:39] thread use undeterministic rand seed:10102
I0907 20:30:22.699645 10098 ThreadLocal.cpp:39] thread use undeterministic rand seed:10099
I0907 20:30:22.701810 10099 ThreadLocal.cpp:39] thread use undeterministic rand seed:10100
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473251422 (unix time) try "date -d @1473251422" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 10069 (TID 0x7f92afa00700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f92c202d710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f92c11353d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f92c20259d1 start_thread
Current Layer forward/backward stack is
@ 0x7f92c091a8fd clone
/data11/paddle/pd/bin/paddle: line 46: 10069 Illegal instruction ${DEBUGGER}
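The signal, faulting address, and paddle frames in this trace are the same as in the earlier trace (SIGILL at 0x8024f0, first symbolized frame at 0x587470). When comparing several such logs, a small parser saves some eyeballing. This is a minimal sketch that assumes the trace format shown in this thread; the regular expressions and the name `summarize_crash` are illustrative and not part of glog or PaddlePaddle.

```python
import re

# Matches e.g. "*** SIGILL (@0x8024f0) received by PID 10069 ..."
SIGNAL_RE = re.compile(r"\*\*\* (SIG[A-Z0-9]+) \(@(0x[0-9a-f]+)\) received by PID (\d+)")
# Matches stack frame lines such as "@ 0x587470 paddle::LstmLayer::forward()"
FRAME_RE = re.compile(r"^@\s+(0x[0-9a-f]+)\s+(.+)$")


def summarize_crash(log_text):
    """Return (signal, fault_address, first_symbolized_frame) from a trace."""
    signal = address = frame = None
    for line in log_text.splitlines():
        line = line.strip()
        match = SIGNAL_RE.search(line)
        if match:
            signal, address = match.group(1), match.group(2)
            continue
        match = FRAME_RE.match(line)
        # Skip frames the symbolizer could not resolve.
        if match and frame is None and match.group(2) != "(unknown)":
            frame = match.group(2)
    return signal, address, frame
```

Running it on both logs and comparing the returned tuples shows quickly whether a rebuild changed where the process crashes.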
from paddle.
@reyoung great!
from paddle.
@NIULQfromNJU Please give us your CPU info. Just run cat /proc/cpuinfo and paste the output.
from paddle.
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2401.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.24
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
from paddle.
@NIULQfromNJU The code that will fix this error is under review: #51
from paddle.
@NIULQfromNJU The fix has been merged into the master branch. Please check out the latest master and rebuild; LSTM should work now.
from paddle.
@reyoung Well done! The updated Paddle now runs LSTM successfully! Thx~
from paddle.
@NIULQfromNJU You're welcome.
If there is anything I can help with, don't hesitate to ask.
Thank you for your attention.
from paddle.