Comments (19)
A user running a CPU reported this same issue on Keybase. I don't have such a system to test on, but if you can share a plot ID that causes this, it would be useful for attempting to reproduce it.
from bladebit.
I just got one that got stuck:
Creating 10 plots:
Output path : /mnt/t10/
Thread count : 256
Warm start enabled : true
Farmer public key :
ae04efd1aa703cb659bdb694fee4eabdfe803c1166f0986aaae43ba2edf15ec7260a2c0c5a7e7f29e6d4920a0f0e738b
Pool contract address : xch1t0fq7g90dj4wxuc0tyhxq8e4t4v8xhwzghq0d9m64vkvxrlvw42s6fuq30
System Memory: 490/503 GiB.
Memory required: 416 GiB.
Allocating buffers.
Generating plot 1 / 10: d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192
...
Running Phase 3
Compressing tables 1 and 2...
Finished compressing tables 1 and 2 in 38.57 seconds
Table 1 now has 3429180105 / 4294854473 entries ( 79.84% ).
Compressing tables 2 and 3...
Finished compressing tables 2 and 3 in 29.91 seconds
Table 2 now has 3439458240 / 4294695855 entries ( 80.09% ).
Compressing tables 3 and 4...
Finished compressing tables 3 and 4 in 34.76 seconds
Table 3 now has 3465317203 / 4294419427 entries ( 80.69% ).
Compressing tables 4 and 5...
[Froze here]
It seems P3 copies the partial plot to the output folder as it goes, so the plot was left half written when P3 froze:
42906103808 Jul 15 18:00 d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192.plot
A complete plot would be about 108833386496 bytes.
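For scale, the two sizes quoted above can be compared directly. The snippet below is only arithmetic on those two numbers; the variable names are illustrative.

```python
# Sizes in bytes, copied from the file listing and the estimate above.
partial_bytes = 42906103808
complete_bytes = 108833386496

# Share of the expected final size that is already on disk.
fraction_written = partial_bytes / complete_bytes
# Amount still missing, expressed in GiB (2**30 bytes).
missing_gib = (complete_bytes - partial_bytes) / 2**30

print(f"{fraction_written:.1%} of the expected size is on disk")
print(f"{missing_gib:.1f} GiB still to be written")
```

So a bit under two fifths of the plot had reached the output folder when the process froze.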
from bladebit.
OK great, thank you, this might help.
Yes, in Phase 3 the plot starts being written in the background, to avoid waiting too long for the buffers to become available for the next plot.
Does it freeze if you re-use that same plot ID? You can specify an explicit plot ID with the -i <plot_id> parameter.
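As a rough illustration of why a partial file shows up on disk, here is a minimal sketch of writing finished tables in the background while later tables are still being produced. This is not bladebit's code (bladebit is not written in Python); the function name, the queue-based design, and the in-memory sink are all assumptions made for the example.

```python
import io
import queue
import threading


def write_in_background(chunks, sink):
    """Write chunks to sink on a worker thread while the caller keeps producing."""
    pending = queue.Queue()

    def writer():
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: the producer has finished
                break
            sink.write(chunk)

    worker = threading.Thread(target=writer)
    worker.start()
    for chunk in chunks:  # stands in for "compress tables N and N+1"
        pending.put(chunk)
    pending.put(None)
    worker.join()


# Hypothetical stand-ins for the seven tables of a plot.
sink = io.BytesIO()
tables = (f"table{i};".encode() for i in range(1, 8))
write_in_background(tables, sink)
written = sink.getvalue()
print(written)
```

If the producer loop stalled partway through, the sink would already hold the earlier chunks, which matches the half-written plot file seen in the report above.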
from bladebit.
./bladebit -i d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192
Fatal Error:
A farmer public key is required. Please specify a farmer public key.
How can I specify the plot ID? If I specify both the farmer public key and the contract address/pool key, the plot ID won't be the one I explicitly specified.
from bladebit.
./bladebit -i d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192
Fatal Error:
A farmer public key is required. Please specify a farmer public key.
How can I specify the plot ID? If I specify both the farmer public key and the contract address/pool key, the plot ID won't be the one I explicitly specified.
I am blind lol
Well, the second time, the plot with the same ID was able to run to the end without freezing:
Finished Phase 4 in 0.68 seconds.
Writing final plot tables to disk
Plot /mnt/t10/9a186801d56fb3e6f8973f25a2be5b369a4e9a82ffff7c17729c1e2aebe6228b.plot finished writing to disk:
Table 1 pointer : 4096 ( 0x0000000000001000 )
Table 2 pointer : 14838173696 ( 0x00000003746c9000 )
Table 3 pointer : 28818788352 ( 0x00000006b5bbd000 )
Table 4 pointer : 42904256512 ( 0x00000009fd4b0000 )
Table 5 pointer : 57258422272 ( 0x0000000d54de3000 )
Table 6 pointer : 72341868544 ( 0x00000010d7e95000 )
Table 7 pointer : 89778573312 ( 0x00000014e7385000 )
C1 table pointer : 107472904192 ( 0x0000001905e26000 )
C2 table pointer : 107474620416 ( 0x0000001905fc9000 )
C3 table pointer : 107474624512 ( 0x0000001905fca000 )
Finished writing tables to disk in 25.18 seconds.
Finished plotting in 539.86 seconds (9.00 minutes).
from bladebit.
Just had the same issue. It hung here:
Running Phase 3
Compressing tables 1 and 2...
Finished compressing tables 1 and 2 in 19.47 seconds
Table 1 now has 3429382928 / 4294967296 entries ( 79.85% ).
Compressing tables 2 and 3...
Finished compressing tables 2 and 3 in 19.38 seconds
Table 2 now has 3439916689 / 4294967296 entries ( 80.09% ).
Compressing tables 3 and 4...
Plot ID: e1b2de291813d05e6657ef93d51ab8e858976f5c6ece4c680606897d4d5f7920
from bladebit.
I just killed the frozen plotting process and ran another one, and this one also hung:
Running Phase 3
Compressing tables 1 and 2...
Finished compressing tables 1 and 2 in 18.06 seconds
Table 1 now has 3429347663 / 4294967296 entries ( 79.85% ).
Compressing tables 2 and 3...
Finished compressing tables 2 and 3 in 19.60 seconds
Table 2 now has 3439894362 / 4294967296 entries ( 80.09% ).
Compressing tables 3 and 4...
Finished compressing tables 3 and 4 in 19.57 seconds
Table 3 now has 3466138767 / 4294967296 entries ( 80.70% ).
Compressing tables 4 and 5...
Finished compressing tables 4 and 5 in 19.60 seconds
Table 4 now has 3532983064 / 4294943989 entries ( 82.26% ).
Compressing tables 5 and 6...
Finished compressing tables 5 and 6 in 20.72 seconds
Table 5 now has 3713666620 / 4294935750 entries ( 86.47% ).
Compressing tables 6 and 7...
Plot ID: a15e67cd5c1753ba7341751f7b625dbb2a337aed23c1d8c0ee9dcbad39363e15
EDIT: To avoid spamming further: I ran another process and it also hung. So after installing bladebit, only the first plot finished successfully; the next 3 after that froze.
from bladebit.
Can you tell me what settings and what hardware you were using? Especially the thread count
from bladebit.
Can you tell me what settings and what hardware you were using? Especially the thread count
2 x AMD EPYC 7532 (NUMA). I didn't specify a thread count, so by default bladebit should use all threads (and looking at htop, it did). I didn't specify any other parameters, only my farmer key and pool contract ID. So I ran bladebit with ./bladebit -f [key] -c [key] /data/destination
from bladebit.
Looking at the datasheet, it says the 7532 has 32 cores/64 threads. So a total of 128 threads running, correct?
If you wouldn’t mind, can you run one with 64 threads only? I am trying to diagnose if this is thread related. I don’t have such a system to test on, so I’m navigating in the dark here.
from bladebit.
Looking at the datasheet, it says the 7532 has 32 cores/64 threads. So a total of 128 threads running, correct?
If you wouldn’t mind, can you run one with 64 threads only? I am trying to diagnose if this is thread related. I don’t have such a system to test on, so I’m navigating in the dark here.
128 threads, correct. I provisioned a new server to try 64 cores, but I first tried to reproduce the error, and this time 3 plots were created correctly one after another (with 128 threads). Nothing froze. I'll try again tomorrow. I can also give you access to the machine if you'd like to test it yourself. Drop me an email at [email protected] if you're interested.
from bladebit.
Specifying the thread count so that hyperthreading is deliberately not used (128 threads instead of the 256 threads available on dual 64-core EPYC) also does not trigger the hang. I also found no plotting speed gain from using HT. More rounds of testing are needed to confirm this, so I will keep updating here.
Could this be an issue related to HT on AMD EPYC?
from bladebit.
I have this issue on Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (run on 70 threads) and Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (28 threads). Both ran in Docker inside a Proxmox VM.
from bladebit.
I have this issue on Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (run on 70 threads) and Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (28 threads). Both ran in Docker inside a Proxmox VM.
Have you tried specifying the thread count with the -t option so that it matches only the physical core count? From my tests, there seems to be no real benefit from hyperthreading in terms of plotting speed.
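One way to pick a value for -t is to count physical cores rather than logical CPUs. The sketch below assumes a Linux x86 /proc/cpuinfo layout with `physical id` and `core id` fields, which may be missing or misleading inside VMs and containers. The helper names and the sample text are hypothetical, and the -t, -f, and -c flags are used only as they appear in this thread.

```python
def count_physical_cores(cpuinfo_text):
    """Count unique (socket, core) pairs in /proc/cpuinfo-style text."""
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):  # one block per logical CPU
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "core id" in fields:
            cores.add((fields.get("physical id", "0"), fields["core id"]))
    return len(cores)


def build_command(threads, farmer_key, contract, out_dir):
    """Assemble a bladebit command line; the argument order is an assumption."""
    return ["./bladebit", "-t", str(threads), "-f", farmer_key, "-c", contract, out_dir]


# Made-up sample: one socket, two cores, two hardware threads per core.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 1

processor\t: 2
physical id\t: 0
core id\t\t: 0

processor\t: 3
physical id\t: 0
core id\t\t: 1
"""

cores = count_physical_cores(SAMPLE)
command = build_command(cores, "[key]", "[key]", "/data/destination")
print(" ".join(command))
```

On a real Linux host, the contents of /proc/cpuinfo would be passed in place of SAMPLE.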
from bladebit.
These are multi-purpose servers, so I care more about stability than performance. I will finish plotting my remaining space in a week, so I don't want to spend time on fine-tuning :) But thank you for the suggestion.
from bladebit.
I just experienced the same issue. I am using a 64-core Threadripper Pro with 512 GB of ECC RAM.
Running Phase 3, Compressing Tables 1 and 2....
It hangs here, with CPU utilization stuck at 100% on every thread.
from bladebit.
Should be fixed by #25. Pending tests.
from bladebit.
The fix has now been merged to master. Thanks to the folks at Poolchia for giving me access to an instance stuck with this bug so that I could debug it and issue a fix.
from bladebit.