Comments (19)

harold-b avatar harold-b commented on July 29, 2024

A user running a CPU reported this same issue on Keybase. I don't have such a system to test on, but if you can share a plot id that causes this, it would be useful for attempting to reproduce it.

from bladebit.

paspy avatar paspy commented on July 29, 2024

I just got one that got stuck:

 Creating 10 plots:
 Output path           : /mnt/t10/
 Thread count          : 256
 Warm start enabled    : true
 Farmer public key     : ae04efd1aa703cb659bdb694fee4eabdfe803c1166f0986aaae43ba2edf15ec7260a2c0c5a7e7f29e6d4920a0f0e738b
 Pool contract address : xch1t0fq7g90dj4wxuc0tyhxq8e4t4v8xhwzghq0d9m64vkvxrlvw42s6fuq30

System Memory: 490/503 GiB.
Memory required: 416 GiB.
Allocating buffers.
Generating plot 1 / 10: d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192

...

Running Phase 3
  Compressing tables 1 and 2...
  Finished compressing tables 1 and 2 in 38.57 seconds
  Table 1 now has 3429180105 / 4294854473 entries ( 79.84% ).
  Compressing tables 2 and 3...
  Finished compressing tables 2 and 3 in 29.91 seconds
  Table 2 now has 3439458240 / 4294695855 entries ( 80.09% ).
  Compressing tables 3 and 4...
  Finished compressing tables 3 and 4 in 34.76 seconds
  Table 3 now has 3465317203 / 4294419427 entries ( 80.69% ).
  Compressing tables 4 and 5...
  [Froze here]

It seems P3 copies the partial plot to the output folder, so the plot is left half written when P3 freezes:

42906103808 Jul 15 18:00 d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192.plot

A complete plot would be something like 108833386496 bytes.
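To put the two sizes in proportion, here is a small sketch that only does the arithmetic on the numbers quoted above. The variable names are made up for illustration, and the "complete" size is the approximate figure given in the comment, not a value reported by bladebit for this plot.

```python
# Sizes in bytes, copied from the comment above.
partial_bytes = 42_906_103_808     # half-written .plot file left behind by the hang
complete_bytes = 108_833_386_496   # approximate size of a finished plot (as quoted)

# Share of the final file that had reached disk when Phase 3 froze.
fraction_written = partial_bytes / complete_bytes

print(f"{fraction_written:.1%} of the expected plot size was on disk")
print(f"{partial_bytes / 2**30:.1f} GiB of {complete_bytes / 2**30:.1f} GiB")
```

Roughly two fifths of the file had been written, which fits a background write that starts during Phase 3 and stops when the phase stalls.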

harold-b avatar harold-b commented on July 29, 2024

OK great, thank you, this might help.

Yes, in Phase 3 the plot starts being written in the background, so that the next plot does not have to wait too long for the buffers.

Does it freeze if you re-use that same plot id? You can specify an explicit plot id with the -i <plot_id> parameter.

paspy avatar paspy commented on July 29, 2024
./bladebit -i d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192
Fatal Error:
  A farmer public key is required. Please specify a farmer public key.

How can I specify the plot id? If I specify both the farmer public key and the contract address/pool key, the plot id won't be the one I explicitly specified.

paspy avatar paspy commented on July 29, 2024
> ./bladebit -i d1476057a46d887f6770ee1c435049853a60ca6352206799b0a27a49670bb192
> Fatal Error:
>   A farmer public key is required. Please specify a farmer public key.
>
> How can I specify the plot id? If I specify both the farmer public key and the contract address/pool key, the plot id won't be the one I explicitly specified.

I am blind lol

Well, the second time, the plot with the same ID was able to plot to the end without freezing:

Finished Phase 4 in 0.68 seconds.
Writing final plot tables to disk

Plot /mnt/t10/9a186801d56fb3e6f8973f25a2be5b369a4e9a82ffff7c17729c1e2aebe6228b.plot finished writing to disk:
  Table 1 pointer  :             4096 ( 0x0000000000001000 )
  Table 2 pointer  :      14838173696 ( 0x00000003746c9000 )
  Table 3 pointer  :      28818788352 ( 0x00000006b5bbd000 )
  Table 4 pointer  :      42904256512 ( 0x00000009fd4b0000 )
  Table 5 pointer  :      57258422272 ( 0x0000000d54de3000 )
  Table 6 pointer  :      72341868544 ( 0x00000010d7e95000 )
  Table 7 pointer  :      89778573312 ( 0x00000014e7385000 )
  C1 table pointer :     107472904192 ( 0x0000001905e26000 )
  C2 table pointer :     107474620416 ( 0x0000001905fc9000 )
  C3 table pointer :     107474624512 ( 0x0000001905fca000 )

Finished writing tables to disk in 25.18 seconds.
Finished plotting in 539.86 seconds (9.00 minutes).

DavidZisky avatar DavidZisky commented on July 29, 2024

Just had the same issue. It hung here:

Running Phase 3
  Compressing tables 1 and 2...
  Finished compressing tables 1 and 2 in 19.47 seconds
  Table 1 now has 3429382928 / 4294967296 entries ( 79.85% ).
  Compressing tables 2 and 3...
  Finished compressing tables 2 and 3 in 19.38 seconds
  Table 2 now has 3439916689 / 4294967296 entries ( 80.09% ).
  Compressing tables 3 and 4...

Plot ID: e1b2de291813d05e6657ef93d51ab8e858976f5c6ece4c680606897d4d5f7920
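When comparing reports like these, it helps to pull out the step that started but never finished. The following is a minimal sketch that assumes the Phase 3 messages look exactly like the ones quoted in this thread; `stalled_step` and the regular expressions are illustrative helpers, not part of bladebit.

```python
import re

# Patterns based on the Phase 3 messages quoted in this thread (an assumption
# about the log format, not something taken from bladebit's source).
STARTED = re.compile(r"Compressing tables (\d) and (\d)\.\.\.")
FINISHED = re.compile(r"Finished compressing tables (\d) and (\d)")


def stalled_step(log_text):
    """Return the (left, right) table pair that started but never finished, or None."""
    started = [tuple(map(int, m.groups())) for m in STARTED.finditer(log_text)]
    finished = {tuple(map(int, m.groups())) for m in FINISHED.finditer(log_text)}
    pending = [pair for pair in started if pair not in finished]
    return pending[-1] if pending else None


# Shortened copy of the log above (the "Table N now has ..." lines are omitted).
log = (
    "Running Phase 3 Compressing tables 1 and 2... "
    "Finished compressing tables 1 and 2 in 19.47 seconds "
    "Compressing tables 2 and 3... "
    "Finished compressing tables 2 and 3 in 19.38 seconds "
    "Compressing tables 3 and 4..."
)
print(stalled_step(log))  # prints (3, 4)
```

The function works on both the multi-line and the flattened single-line form of the log, since it only searches for the two message patterns.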

DavidZisky avatar DavidZisky commented on July 29, 2024

I just killed the frozen plotting process and ran another one, and that one also hung:

Running Phase 3
  Compressing tables 1 and 2...
  Finished compressing tables 1 and 2 in 18.06 seconds
  Table 1 now has 3429347663 / 4294967296 entries ( 79.85% ).
  Compressing tables 2 and 3...
  Finished compressing tables 2 and 3 in 19.60 seconds
  Table 2 now has 3439894362 / 4294967296 entries ( 80.09% ).
  Compressing tables 3 and 4...
  Finished compressing tables 3 and 4 in 19.57 seconds
  Table 3 now has 3466138767 / 4294967296 entries ( 80.70% ).
  Compressing tables 4 and 5...
  Finished compressing tables 4 and 5 in 19.60 seconds
  Table 4 now has 3532983064 / 4294943989 entries ( 82.26% ).
  Compressing tables 5 and 6...
  Finished compressing tables 5 and 6 in 20.72 seconds
  Table 5 now has 3713666620 / 4294935750 entries ( 86.47% ).
  Compressing tables 6 and 7...

Plot ID: a15e67cd5c1753ba7341751f7b625dbb2a337aed23c1d8c0ee9dcbad39363e15

EDIT: To not spam anymore: I ran another process and it also hung. So after installing bladebit, only the first plot finished successfully; the next 3 after that froze.

harold-b avatar harold-b commented on July 29, 2024

Can you tell me what settings and what hardware you were using? Especially the thread count

DavidZisky avatar DavidZisky commented on July 29, 2024

> Can you tell me what settings and what hardware you were using? Especially the thread count

2 x AMD EPYC 7532 (NUMA). I didn't specify a thread count, so by default bladebit should use all threads (and looking at htop, it did). I didn't specify any other extra parameters, just my farmer key and pool contract id. So I ran bladebit with ./bladebit -f [key] -c [key] /data/destination

harold-b avatar harold-b commented on July 29, 2024

Looking at the datasheet, it says the 7532 has 32 cores/64 threads. So a total of 128 threads running, correct?

If you wouldn’t mind, can you run one with 64 threads only? I am trying to diagnose if this is thread related. I don’t have such a system to test on, so I’m navigating in the dark here.

DavidZisky avatar DavidZisky commented on July 29, 2024

> Looking at the datasheet, it says the 7532 has 32 cores/64 threads. So a total of 128 threads running, correct?
>
> If you wouldn’t mind, can you run one with 64 threads only? I am trying to diagnose if this is thread related. I don’t have such a system to test on, so I’m navigating in the dark here.

128 threads, correct. I provisioned a new server to try 64 cores, but first I tried to reproduce the error, and this time 3 plots were created one after another correctly (with 128 threads). Nothing froze. I'll try again tomorrow. I can also give you access to the machine if you'd like to test it yourself. Drop me an email at [email protected] if you're interested.

paspy avatar paspy commented on July 29, 2024

Specifying the thread count so that hyperthreading is intentionally not used (128 threads instead of the 256 threads available on dual 64-core EPYC) also does not trigger the hang. I also found that there is no plotting speed gain from using HT. More rounds of testing are required to confirm, so I will keep updating here.
Could this be an issue related to HT on AMD EPYC?

abranski avatar abranski commented on July 29, 2024

I have this issue on an Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (run on 70 threads) and an Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (28 threads). Both ran in Docker inside a Proxmox VM.

paspy avatar paspy commented on July 29, 2024

> I have this issue on an Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (run on 70 threads) and an Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (28 threads). Both ran in Docker inside a Proxmox VM.

Have you tried specifying the thread count with the -t option so that it matches only the physical core count? From my tests, there seems to be no real benefit from hyperthreading in terms of plotting speed.
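A minimal sketch of that suggestion: build a bladebit command line that asks for one thread per physical core. It uses only the -t, -f and -c options mentioned in this thread. The helper name, the placeholder keys, the default of 2 hardware threads per core, and the order of the options are assumptions for illustration.

```python
def bladebit_args(farmer_key, contract_address, out_dir, logical_cpus, threads_per_core=2):
    """Build a bladebit argv that uses one thread per physical core.

    threads_per_core=2 assumes SMT/hyperthreading with two hardware threads
    per core; adjust it for the actual machine.
    """
    physical_cores = max(1, logical_cpus // threads_per_core)
    return [
        "./bladebit",
        "-t", str(physical_cores),   # thread count, limited to physical cores
        "-f", farmer_key,            # farmer public key
        "-c", contract_address,      # pool contract address
        out_dir,
    ]


# Dual 64-core EPYC from this thread: 256 logical CPUs, 128 physical cores.
args = bladebit_args("<farmer_key>", "<contract_address>", "/mnt/t10/", logical_cpus=256)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args)` once the placeholders are replaced with real values.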

abranski avatar abranski commented on July 29, 2024

They are multi-purpose servers, so what I miss is stability more than performance. I will finish plotting my remaining space in a week, so I don't want to spend time on fine-tuning :) But thank you for the suggestion.

brianwfreeman avatar brianwfreeman commented on July 29, 2024

I just experienced the same issue. I am using a 64-core Threadripper Pro with 512 GB of ECC RAM.

Running Phase 3, Compressing Tables 1 and 2....

It hangs here, and CPU utilization is stuck at 100% for every thread.

harold-b avatar harold-b commented on July 29, 2024

Should be fixed by #25. Pending tests.

harold-b avatar harold-b commented on July 29, 2024

The fix has now been merged to master. Thanks to the folks at Poolchia for giving me access to an instance stuck with this bug so that I could debug it and issue a fix.
