
mmx-binaries's People

Contributors

felixbrucker, madmax43v3r, mwpastore, scrutinously, stotiks


mmx-binaries's Issues

Windows version always exits after [P2] Setup took ... sec

Hi @madMAx43v3r, thanks for making this great piece of software again.

I just wanted to let you know that I've tried to run it on Windows, and for all k sizes and all stream sizes it crashes after [P2] Setup...
Toggling Direct IO also does not change anything.
No errors; here is what it looks like:
image
In the logs it looks like this terminal problem:
image

My setup:
Win 10 22H2 19045.2486
AMD Ryzen 9 3900X
128 GB DDR4 3200
RTX 3080 Ti 12 GB
Can you suggest what to troubleshoot further?

Gigahorse 1.1.8-40a8b16 3060Ti: sporadic WritePark(): ans_length (857) > max_ans_length (856) (y = 0, i = 465)

Hi,

I got this message once; other runs worked fine (several runs with S2, S3 and S4):
43 WritePark(): ans_length (857) > max_ans_length (856) (y = 0, i = 465)

Machine: Z8G4, GPU: 3060 Ti 8 GB GDDR6X, OS: Linux

Is this a known problem? What does it mean?

    1	Chia k32 next-gen CUDA plotter - 40a8b16
     2	Network Port: 11337 [MMX] (unique)
     3	Direct IO: disabled
     4	No. Streams: 2
     5	Final Directory: ./
     6	Shared Memory limit: unlimited
     7	Number of Plots: 1
     8	Initialization took 0.145 sec
     9	Crafting plot 1 out of 1 (2023/01/26 14:52:19)
    10	Process ID: 1093
    11	Pool Puzzle Hash:  xxx
    12	Farmer Public Key: xxx
    13	Working Directory:   ./
    14	Working Directory 2: @RAM
    15	Compression Level: C1 (xbits = 15, final table = 3)
    16	Plot Name: plot-mmx-k32-c1-2023-01-26-14-52-xxx
    17	[P1] Setup took 0.372 sec
    18	[P1] Table 1 took 21.398 sec, 4294967296 entries, 16789591 max, 66680 tmp, 0 GB/s up, 3.97235 GB/s down
    19	[P1] Table 2 took 18.113 sec, 4294787100 entries, 16788254 max, 66636 tmp, 1.76669 GB/s up, 2.81567 GB/s down
    20	[P1] Table 3 took 42.174 sec, 4294630088 entries, 16790836 max, 66772 tmp, 1.13809 GB/s up, 2.82165 GB/s down
    21	[P1] Table 4 took 30.958 sec, 4294142118 entries, 16787042 max, 66708 tmp, 2.58394 GB/s up, 3.84393 GB/s down
    22	[P1] Table 5 took 23.029 sec, 4293252612 entries, 16780269 max, 66603 tmp, 3.47321 GB/s up, 4.42921 GB/s down
    23	[P1] Table 6 took 19.32 sec, 4291374057 entries, 16776826 max, 66601 tmp, 3.31131 GB/s up, 4.3996 GB/s down
    24	[P1] Table 7 took 10.617 sec, 4287652952 entries, 16760702 max, 66558 tmp, 4.51727 GB/s up, 4.40334 GB/s down
    25	Phase 1 took 166.124 sec
    26	[P2] Setup took 0.214 sec
    27	[P2] Table 7 took 2.795 sec, 11.4295 GB/s up, 0.190072 GB/s down
    28	[P2] Table 6 took 2.8 sec, 11.419 GB/s up, 0.189732 GB/s down
    29	[P2] Table 5 took 2.801 sec, 11.4199 GB/s up, 0.189664 GB/s down
    30	[P2] Table 4 took 2.801 sec, 11.4223 GB/s up, 0.189664 GB/s down
    31	Phase 2 took 11.558 sec
    32	[P3] Setup took 0.241 sec
    33	[P3] Table 3 LPSK took 6.605 sec, 3439230328 entries, 14275581 max, 56418 tmp, 4.92487 GB/s up, 7.72146 GB/s down
    34	[P3] Table 3 NSK took 7.284 sec, 3439230328 entries, 13449078 max, 56418 tmp, 5.27683 GB/s up, 7.88047 GB/s down, 778 / 947 ANS bytes
    35	[P3] Table 4 PDSK took 6.63 sec, 3464822977 entries, 13554146 max, 53915 tmp, 4.90575 GB/s up, 7.05132 GB/s down
    36	[P3] Table 4 LPSK took 6.88 sec, 3464822977 entries, 13860614 max, 55428 tmp, 8.88369 GB/s up, 7.41283 GB/s down
    37	[P3] Table 4 NSK took 7.539 sec, 3464822977 entries, 13551282 max, 55078 tmp, 5.13628 GB/s up, 7.61392 GB/s down, 810 / 854 ANS bytes
    38	[P3] Table 5 PDSK took 6.184 sec, 3530630783 entries, 13817287 max, 54853 tmp, 5.25849 GB/s up, 7.55987 GB/s down
    39	[P3] Table 5 LPSK took 6.949 sec, 3530630783 entries, 14247963 max, 57008 tmp, 8.91994 GB/s up, 7.33922 GB/s down
    40	[P3] Table 5 NSK took 7.607 sec, 3530630783 entries, 13803591 max, 56652 tmp, 5.18705 GB/s up, 7.54586 GB/s down, 809 / 853 ANS bytes
    41	[P3] Table 6 PDSK took 6.371 sec, 3709537729 entries, 14508681 max, 57630 tmp, 5.10194 GB/s up, 7.33798 GB/s down
    42	[P3] Table 6 LPSK took 7.173 sec, 3709537729 entries, 15086020 max, 60444 tmp, 8.96526 GB/s up, 7.11003 GB/s down
    43	WritePark(): ans_length (857) > max_ans_length (856) (y = 0, i = 465)
    44	[P3] Table 6 NSK took 7.942 sec, 3709537729 entries, 14503701 max, 59903 tmp, 5.22001 GB/s up, 7.22757 GB/s down, 805 / 857 ANS bytes
    45	[P3] Table 7 PDSK took 7.109 sec, 4287652952 entries, 16773863 max, 66558 tmp, 6.1788 GB/s up, 6.57621 GB/s down
    46	[P3] Table 7 LPSK took 7.897 sec, 4287652952 entries, 17198764 max, 68807 tmp, 9.06208 GB/s up, 6.45818 GB/s down
    47	[P3] Table 7 NSK took 8.66 sec, 4287652952 entries, 16760702 max, 68226 tmp, 5.53329 GB/s up, 6.62833 GB/s down, 792 / 837 ANS bytes
    48	Phase 3 took 101.182 sec
    49	[P4] Setup took 0.071 sec
    50	[P4] total_p7_parks = 2093581
    51	[P4] total_c3_parks = 428765, 2385 / 2452 ANS bytes
    52	Phase 4 took 4.962 sec, 6.43803 GB/s up, 3.68469 GB/s down
    53	Total plot creation time was 283.916 sec (4.73194 min)
    54	Flushing to disk took 9.294 sec

Plots created with compression are not recognized in mmx-node

Hi @madMAx43v3r,
Am I doing something wrong, or is mmx-node not yet able to use compressed MMX plots generated with the CUDA plotter?
Failed to load plot 'G:plot-mmx-k32-c8-2023-01-24-15-51-3bf9f629f184706d2bf70b32f948029f4a4027057face2b06e65508fbacae7d2.plot' due to: Invalid plot file format
Plot generation completed successfully on Windows.
BTW, the latest mmx-node (v0.9.6) is used.

Gigahorse 1.1.8-ff505a5 does not run on M4000 with compute capability 5.2: after Phase 1, terminate called, what(): invalid device ordinal, signal 6

Hi Max,

On a Quadro M4000 it terminates with an error.

The Readme says:
All GPUs for compute capability 5.2 (Maxwell 2.0), 6.0, 6.1 (Pascal), 7.0 (Volta), 7.5 (Turing) and 8.0, 8.6, 8.9 (Ampere).
Which includes: GTX 1000 series, GTX 1600 series, RTX 2000 series, RTX 3000 series and RTX 4000 series

According to this: https://developer.nvidia.com/cuda-gpus
compute capability 5.2 cards are the GTX 900 series and Quadro M series.

Can you please check / enable compute capability 5.2 support?

     1	bf328434cd57a12ae38d0c41aa266a57  cuda_plot_k26_e4ed5f
     2	Chia k26 next-gen CUDA plotter - ff505a5
     3	Plot Format: v2.4
     4	Network Port: 11337 [MMX] (unique)
     5	No. GPUs: 1
     6	No. Streams: 4
     7	Final Destination: ./
     8	Shared Memory limit: unlimited
     9	Number of Plots: 1
    10	Initialization took 0.143 sec
    11	Crafting plot 1 out of 1 (2023/01/28 10:36:18)
    12	Process ID: 27
    13	Pool Puzzle Hash:  xxx
    14	Farmer Public Key: xxx
    15	Working Directory:   ./
    16	Working Directory 2: @RAM
    17	Compression Level: C1 (xbits = 15, final table = 3)
    18	Plot Name: plot-mmx-k26-c1-2023-01-28-10-36-xxx
    19	[P1] Setup took 0.059 sec
    20	[P1] Table 1 took 1.051 sec, 67108864 entries, 1051360 max, 16884 tmp, 0 GB/s up, 0.594686 GB/s down
    21	[P1] Table 2 took 1.263 sec, 67083606 entries, 1049983 max, 16882 tmp, 0.395883 GB/s up, 0.680436 GB/s down
    22	[P1] Table 3 took 1.031 sec, 67026740 entries, 1048853 max, 16818 tmp, 0.666577 GB/s up, 1.89441 GB/s down
    23	[P1] Table 4 took 1.12 sec, 66915113 entries, 1047733 max, 16856 tmp, 0.9475 GB/s up, 1.67412 GB/s down
    24	[P1] Table 5 took 1.091 sec, 66704641 entries, 1044894 max, 16759 tmp, 0.971065 GB/s up, 1.50379 GB/s down
    25	[P1] Table 6 took 1.033 sec, 66288742 entries, 1038189 max, 16731 tmp, 0.841945 GB/s up, 1.36134 GB/s down
    26	[P1] Table 7 took 0.48 sec, 65450408 entries, 1025813 max, 16545 tmp, 1.41479 GB/s up, 1.62764 GB/s down
    27	Phase 1 took 7.19 sec
    28	terminate called after throwing an instance of 'std::runtime_error'
    29	  what():  invalid device ordinal
    30	Command terminated by signal 6
    31	2.42user 5.55system 0:09.16elapsed 87%CPU (0avgtext+0avgdata 8366844maxresident)k
    32	21144inputs+96outputs (5major+2082253minor)pagefaults 0swaps
    33

P.S. 
The previous version showed another error behaviour:
Chia k26 next-gen CUDA plotter - 40a8b16
...
   48	Phase 1 took 4.939 sec
    49	[P2] Setup took 0.004 sec
    50	[P2] Table 7 took 0.051 sec, 9.11458 GB/s up, 0.16276 GB/s down
    51	[P2] Table 6 took 0.051 sec, 9.11458 GB/s up, 0.16276 GB/s down
    52	[P2] Table 5 took 0.05 sec, 9.29688 GB/s up, 0.166016 GB/s down
    53	[P2] Table 4 took 0.051 sec, 9.11458 GB/s up, 0.16276 GB/s down
    54	Phase 2 took 0.209 sec
    55	[P3] Setup took 0.041 sec
    56	[P3] Table 3 LPSK took 0.107 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 5.04253 GB/s up, 8.03168 GB/s down
    57	WritePark(): ans_length (65535) > max_ans_length (1022) (y = 0, i = 0)
    58	WritePark(): ans_length (57343) > max_ans_length (1022) (y = 0, i = 1)
...
Command terminated by signal 11

[Fixed] mmx-cuda-plotter doesn't back off when a new plot is finished, but rather starts a second plot transfer per HD

I started mmx-cuda-plotter with 4 target drives:

mm/cuda_plot_k32_f24 -n -1 -x 11337 -C 7 -t /mnt/nvme/mmx/tmp1/ -d /mnt/d1/mmx/plots/ -d /mnt/d2/mmx/plots/ -d /mnt/d3/mmx/plots/ -d /mnt/d4/mmx/plots/ -f fff -p ppp

It looks like at some point writes to the destination HDs slowed down, and when a new plot was finished, instead of letting it sit on the NVMe, the plotter started another write to a drive that was already being written to. This basically kills that drive's write performance, so those overlapping writes are snowballing right now.

Would it be possible to have the new plot sit on the NVMe (yes, eventually choking the NVMe), as that could potentially help those drives recover?

Here is what I see right now:

-rw-rw-r-- 1 bull bull  56G Feb  5 14:33 /mnt/d1/mmx/plots/plot-mmx-k32-c7-2023-02-05-14-15-e9c8f895c2cb3a17d45d9c81f8ff74337906e0a2517d1c2ac07179a6dbbec090.plot.tmp
-rw-rw-r-- 1 bull bull 8.8G Feb  5 14:33 /mnt/d1/mmx/plots/plot-mmx-k32-c7-2023-02-05-14-27-8323f3b96a9cc09b172e7dcdd50780ff78cb2a8c978a6b8dad2f1e60d5814fa1.plot.tmp
-rw-rw-r-- 1 bull bull  47G Feb  5 14:33 /mnt/d2/mmx/plots/plot-mmx-k32-c7-2023-02-05-14-18-06db62c80f193dbe9e5f9a9c8acc7e8d7d69276c77f43e1e7148a20d4f5e6c14.plot.tmp
-rw-rw-r-- 1 bull bull  75G Feb  5 14:33 /mnt/d3/mmx/plots/plot-mmx-k32-c7-2023-02-05-14-09-a56b5de62045ce8981badc876cab9d9d21b966f6cc806e3d790a844b4bcff8a3.plot.tmp
-rw-rw-r-- 1 bull bull  30G Feb  5 14:33 /mnt/d3/mmx/plots/plot-mmx-k32-c7-2023-02-05-14-21-194df4e3a53e12dc4dde1df68daf8df32ce971025dda3b959c07e252570be269.plot.tmp
-rw-rw-r-- 1 bull bull  41G Feb  5 14:33 /mnt/d4/mmx/plots/plot-mmx-k32-c7-2023-02-05-14-24-6646093fd7a0e5255e2ebc90f22db8b1fe9ce3276cabff8ff6e17cedd5fd9078.plot.tmp

There are 2 writes on d1 and d3.

Actually, plot-sink already has a nice feature that permits adding "overflow" HDs that can be connected if needed. It would be really nice if that feature were also available in the cuda-plotter.

Out of memory

Are there any minimum requirements? I am getting "Out of memory".
I tried a Core i5, 16 GB RAM, and a GTX 1650 Super. I would also like to try an RX 6600 XT, but I'm not sure if AMD is supported.

cuda_plot Feature Request: a -2 for every -g

In partial RAM mode, with multiple GPUs, it might help performance to be able to set a -2 for each GPU (placed by the operator in the same NUMA node as the GPU).

How much it might help would depend on how much cross-traffic there is between GPUs and the temporary files they're reading and writing. If each GPU works mostly independently, reading and writing its own temporary files, then this feature might help a lot. If the inputs and outputs of each GPU are being split and merged over and over again throughout the various phases and tables, it might not help at all.
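Until something like this exists, a possible workaround (a sketch on my part, assuming numactl is available and that running one plotter instance per GPU is acceptable; the binary name, NUMA node numbers, paths and keys below are placeholders) is to run one plotter process per GPU and pin each one, together with its own -t/-2 on node-local storage, to the NUMA node that hosts that GPU:

# One plotter instance per GPU, each pinned to the NUMA node hosting its GPU
# and given its own temp directories on node-local storage.
# GPU 0 on NUMA node 0:
numactl --cpunodebind=0 --membind=0 \
  ./cuda_plot_k32 -n -1 -g 0 -x 11337 -C 7 \
    -t /mnt/nvme0/tmp/ -2 /mnt/nvme0/tmp2/ \
    -d /mnt/d1/mmx/plots/ -f <farmer_key> -p <pool_key> &

# GPU 1 on NUMA node 1:
numactl --cpunodebind=1 --membind=1 \
  ./cuda_plot_k32 -n -1 -g 1 -x 11337 -C 7 \
    -t /mnt/nvme1/tmp/ -2 /mnt/nvme1/tmp2/ \
    -d /mnt/d2/mmx/plots/ -f <farmer_key> -p <pool_key> &

wait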

Node becomes 'Disconnected from Node'

This is happening while moving plots over 1G lan network.
Furthermore, after the file transfer completes, the node does not recover from this situation and hangs on this:
image

Potential enhancements to cuda-plot and plot-sink combo

I started plotting last night to 4x 10 TB drives. Around 5 TB in, plot-sink crashed. However, the plotter was fine: it was waiting for free space on the NVMe and complaining that it could not talk to any plot-sink.

The box I am using runs Ubuntu 22.10 on a Dell T7610 (dual Xeon E5-2600 v2, currently depopulated to just one CPU and 256 GB RAM to save on power draw, as I have a 2695 v2; I will be getting a couple of low-power v2 CPUs later this week). On the GPU side I have a 3060 Ti.

The initial disk transfer speed (yesterday) was ~200 MB/s, so basically at most 3 HDs were used at a time. Today, when I got a new plot-sink running (without stopping the plotter or rebooting the box), speeds were around 100 MB/s; once the NVMe was cleared they are closer to 120-130 MB/s right now (still kind of low).

Unfortunately, I don't know what triggered the plot-sink crash, and I am not sure (yet?) why the write speed dropped that much (those are WD Red Pro drives). One culprit could be the NVMe, but I cannot say for sure.

I would like to ask to consider two enhancements for cuda-plotter:

  1. Be able to specify two temp folders (e.g., -t /mnt/nvme1/ -t /mnt/nvme2/). This way those NVMes could potentially move completely out of the critical path of writing / reading plots.
  2. Use extra RAM (if available) as a first-level buffer before writing finished plots to the NVMe(s), or pass them to plot-sink directly (taking the NVMe out of the loop). Assuming the box has 512 GB RAM, that could potentially eliminate at least 75% of NVMe reads / writes, both saving the NVMes and removing extra IO cycles. A rough external sketch of this idea follows the list.
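This is only a rough external approximation of item 2, done outside the plotter (it assumes enough RAM is left over after plotting for at least one finished plot; sizes and paths are placeholders): let finished plots land in a RAM-backed directory and drain them to the destination drives in the background.

# Create a RAM-backed buffer for finished plots.
sudo mkdir -p /mnt/ramplots
sudo mount -t tmpfs -o size=200G tmpfs /mnt/ramplots

# The plotter then writes finished plots into RAM instead of NVMe, e.g.:
#   ./cuda_plot_k32 ... -t /mnt/nvme/mmx/tmp1/ -d /mnt/ramplots/ ...

# Background drain loop: move completed plots (*.plot only, never *.plot.tmp)
# round-robin onto the destination drives.
dests=(/mnt/d1/mmx/plots /mnt/d2/mmx/plots /mnt/d3/mmx/plots /mnt/d4/mmx/plots)
i=0
while true; do
  for f in /mnt/ramplots/*.plot; do
    [ -e "$f" ] || continue
    mv "$f" "${dests[$((i % ${#dests[@]}))]}/"
    i=$((i + 1))
  done
  sleep 10
done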

On Windows, running the MMX node makes the system unresponsive for a few milliseconds at a time

Every few seconds the system reaches an unstable point where everything freezes for a few milliseconds.
It wouldn't be a big deal, but the system is not responsive enough to catch every keystroke (sometimes letters get dropped).
So I believe it could be a serious problem for people running an MMX node on the systems they use for daily work...

Tested on:
i9-10900x, 96 GB RAM DDR4 3200 MHz
GPU: GTX 980 Ti 6 GB RAM
System drive: Samsung 980 PRO
All plots on hdds
TimeLord disabled

MMX node v0.9.8

plots check

Will a plots check function be introduced, as in Gigahorse? It is not practical to check every plot through ProofOfSpace.exe. Thank you.

[Fixed] plot-sink possibly stuck on selecting the next drive, if the drive to be first selected doesn't have expected folder structure

I was trying to do a disk set swap to add new target drives, and it looks like the plot-sink got stuck on unmounted folders.

I started with d1-d4 (the first set) as the active target drives, and d5-d8 as unmounted folders (still provided as target folders at startup) but having the proper folder structure and a do-not-use file there. All was working fine. When the first set was close to being topped off, I mounted 4 overflow drives in the d5-d8 structure. Plot-sink nicely transitioned from set one to set two once the drives from set one were topped off. I unmounted all drives from the first set and started removing them from the cage. I had not noticed that the plot-sink got stuck trying to use drives from the first set, even though it still had available drives in the second set. It kept issuing the following message:

accept() failed with: filesystem error: cannot get free space: No such file or directory [/mnt/d2/mmx/plots/]

However, it was not skipping the d2 folder (in this case), but rather aborting the whole target selection process. I noticed that I had forgotten to create the proper folder structure on the mount folders from the first set. Once I added that folder structure to a drive, the plot-sink immediately picked it up. However, until I started doing that, all but one drive from the overflow set were basically idling while the plotter was busy pumping out new plots.

Would it be possible to just ignore such a target and move on to check the next one in line?
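For illustration, here is a sketch of the selection behavior being requested (shell pseudologic on my part, not the actual plot-sink code; the paths and the plot-size constant are placeholders): query each candidate's free space, skip any entry that cannot be queried or is marked do-not-use, and only give up if the whole list is exhausted.

# Pick the next usable target directory, skipping unusable ones instead of aborting.
PLOT_SIZE_BYTES=$((90 * 1024 * 1024 * 1024))   # rough k32 plot size, placeholder

pick_target() {
  for dir in /mnt/d{1..8}/mmx/plots/; do
    [ -e "$dir/do-not-use" ] && continue                 # operator marked it off-limits
    avail=$(df --output=avail -B1 "$dir" 2>/dev/null | tail -n1)
    [ -n "$avail" ] || continue                          # free-space query failed (e.g. not mounted) -> skip
    if [ "$avail" -gt "$PLOT_SIZE_BYTES" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1                                               # no usable target right now
}

target=$(pick_target) && echo "next target: $target"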

Support for CUDA Compute Capability 8.0

Hi there,

In your readme, you explicitly mention that it supports "All GPUs for compute capability 6.1 (Pascal), 7.5 (Turing) and 8.6 (Ampere)."
Can you also compile it against Compute Capability 8.0? It is Ampere, but I suspect that 8.6 is not backward compatible with 8.0.

I get a floating point exception when trying to run your plotter on a device with compute capability 8.0,
probably because total_p7_parks = 0 during Phase 4, as shown below.

Chia k32 next-gen CUDA plotter - 7becb2a
Network Port: 11337 [MMX] (unique)
Direct IO: disabled
Final Directory: /home/bagus/chia/plots/
Number of Plots: 2
Initialization took 0.927 sec
Crafting plot 1 out of 2 (2023/01/17 22:07:20)
Process ID: 73210
Pool Public Key:   8cfec5f0915d0b06026905ef3f86d86bc94a3b43a5f6e06d1c0ab1957985ca64205dd3bd888f2d51cb3ba2118dec33ee
Farmer Public Key: 96416acb6d03333c677cc048d3a008f1748cc33386364110374770fa3a37b026380da15a9d506fcbfcb90e91b2510f9d
Working Directory:   /raid/bagus/
Working Directory 2: @RAM
Compression Level: C7 (xbits = 9, final table = 4)
Plot Name: plot-mmx-k32-c7-2023-01-17-22-07-666da885db995045c1c3dc1eeec0c4525851897374e59066066199741d340952
[P1] Setup took 1.064 sec
[P1] Table 1 took 3.521 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.1409 GB/s down
[P1] Table 2 took 2.097 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3206 GB/s down
[P1] Table 3 took 3.492 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3414 GB/s down
[P1] Table 4 took 5.059 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3626 GB/s down
[P1] Table 5 took 4.186 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.367 GB/s down
[P1] Table 6 took 3.491 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3484 GB/s down
[P1] Table 7 took 1.922 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3237 GB/s down
Phase 1 took 25.259 sec
[P2] Setup took 0.156 sec
[P2] Table 7 took 0.001 sec, 0 GB/s up, 531.25 GB/s down
[P2] Table 6 took 0.225 sec, 0 GB/s up, 2.36111 GB/s down
[P2] Table 5 took 0.24 sec, 0 GB/s up, 2.21354 GB/s down
Phase 2 took 0.851 sec
[P3] Setup took 0.666 sec
[P3] Table 4 LPSK took 2.099 sec, 0 entries, 0 max, 0 tmp, 0.253097 GB/s up, 24.2974 GB/s down
[P3] Table 4 NSK took 2.361 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3123 GB/s down, 0 / 0 ANS bytes
[P3] Table 5 PDSK took 1.924 sec, 0 entries, 0 max, 0 tmp, 0.276117 GB/s up, 24.2985 GB/s down
[P3] Table 5 LPSK took 2.096 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3322 GB/s down
[P3] Table 5 NSK took 2.36 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3226 GB/s down, 0 / 0 ANS bytes
[P3] Table 6 PDSK took 1.925 sec, 0 entries, 0 max, 0 tmp, 0.275974 GB/s up, 24.2858 GB/s down
[P3] Table 6 LPSK took 2.096 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3322 GB/s down
[P3] Table 6 NSK took 2.361 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3123 GB/s down, 0 / 0 ANS bytes
[P3] Table 7 PDSK took 1.923 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3111 GB/s down
[P3] Table 7 LPSK took 2.095 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.3438 GB/s down
[P3] Table 7 NSK took 2.362 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 24.302 GB/s down, 0 / 0 ANS bytes
Phase 3 took 24.546 sec
[P4] Setup took 0.205 sec
[P4] total_p7_parks = 0
Floating point exception (core dumped)

Thanks!

[-t was and is equivalent to -stage, so no need for this implementation] Add "stage" destination to cuda-plotter

Would it be possible to add a stage destination to the cuda-plotter? Yes, it would increase the write load on the NVMes, but it would potentially offload finished plots from the stage drive quickly, making writes to the destination drives a little faster.

My box is a Dell T7610 with one E5-2695 v2 (I removed the second CPU to lower power consumption) and a Samsung 970 EVO Plus. When I run the plotter with both -t and -d on the same NVMe, I get plot times just over 140 sec. On the other hand, when -d is, for example, 4 drives (10 TB WD Red Pro), plots take a bit over 180 sec. Using a "stage" folder could potentially shave some of that 40 sec difference.
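For reference, given the note in the title that -t already acts as the stage, the setup being discussed (NVMe as -t, HDDs as -d) looks roughly like the command line from the earlier issue; binary name, paths and keys are placeholders:

# -t is the NVMe staging/temp area; each -d is a final HDD destination.
./cuda_plot_k32 -n -1 -x 11337 -C 7 \
  -t /mnt/nvme/mmx/tmp1/ \
  -d /mnt/d1/mmx/plots/ -d /mnt/d2/mmx/plots/ \
  -d /mnt/d3/mmx/plots/ -d /mnt/d4/mmx/plots/ \
  -f <farmer_key> -p <pool_key>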

-d flag in plotter

It would be very convenient if we had -d1, -d2, -d3, ... parameters (not just a single -d); it would be easier than using separate file-distribution tools.
A basic loop that starts at d1 and moves to the next destination after every plot generation (d = d + 1)
would let us plot to many HDDs from a single command line.
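Note that, judging by the command line in the earlier issue, the plotter already accepts -d multiple times and distributes finished plots across the given destinations, which covers most of this request; a minimal example with placeholder binary name, paths and keys:

# Several -d arguments already spread finished plots across multiple HDDs.
./cuda_plot_k32 -n -1 -x 11337 -C 7 -t /mnt/nvme/tmp/ \
  -d /mnt/hdd1/plots/ -d /mnt/hdd2/plots/ -d /mnt/hdd3/plots/ \
  -f <farmer_key> -p <pool_key>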

Closing the program causes a critical system error (ucrtbase.dll) in the system stability log

image

image

Source
mmx_node

Summary
Stopped working

Date
01.05.2023 16:18

Status
Report sent

Description
Faulting application path: D:\MMX\mmx_node.exe

Problem signature
Problem Event Name: BEX64
Application Name: mmx_node.exe
Application Version: 0.10.1.0
Application Timestamp: 644a033c
Fault Module Name: ucrtbase.dll
Fault Module Version: 10.0.22621.608

Fault Module Timestamp: f5fc15a3
Exception Offset: 000000000007f61e
Exception Code: c0000409
Exception Data: 0000000000000007
OS Version: 10.0.22621.2.0.0.768.101
Locale ID: 1049
Additional Information 1: 0000
Additional Information 2: 00000000000000000000000000000000
Additional Information 3: 0000
Additional Information 4: 00000000000000000000000000000000

Additional information about the problem
Bucket ID: 462bdd9e46e0172b7ff554a2c57b0ade (2302839842544487134)

Gigahorse 1.1.8-9d66b86: GPU address limit 1TB/40-bit problem: instance of 'std::runtime_error', signal 6, swiotlb buffer is full, NVRM: Failed to create a DMA mapping!

Hi,

On a 1.5 TB machine, K34 plots do not work (GPU: Quadro M6000).
I believe this is due to the GPU's 40-bit address limit.

Here it is mentioned:
https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iommu-dma-remapping
" This page describes the IOMMU DMA remapping feature that was introduced in Windows 11 22H2 (WDDM 3.0).
...
Upcoming servers and high end workstations can be configured with over 1TB of memory which crosses the common 40-bit address space limitation of many GPUs."

So it seems that while Windows 11 22H2 can handle it, on Linux it can be a problem (kernel 4.18.0-425.10.1.el8_7).

Also note that increasing swiotlb delays the termination until the 2nd plot, but even the 1st plot
might be corrupt, as there are ten thousand (!) such (and other) messages:

    200 park_delta(): LP_1 < LP_0 (1875189930, 18446744073709551615) (x = 1348, y = 6770)
    201 park_delta(): LP_1 < LP_0 (2255150717, 18446744073709551615) (x = 1351, y = 6770)
    202 park_delta(): LP_1 < LP_0 (1891597597, 18446744073709551615) (x = 1353, y = 6770)
    203 park_delta(): LP_1 < LP_0 (1267797774, 1891597597) (x = 1354, y = 6770)
    204 park_delta(): LP_1 < LP_0 (2922005224, 3001459040) (x = 1356, y = 6770)
    205 park_delta(): LP_1 < LP_0 (30753450, 2922005224) (x = 1357, y = 6770)

Furthermore, these messages also appear for K32 after some successful plots when running with "-n -1",
so while that did not terminate, it might also produce corrupt plots.

This workstation has a BIOS option "1TB Memory Cap":

"If 1 TB of memory is installed, limits usable memory to 1 TB - 64 MB for compatibility with graphics cards that
can't address 1 TB or more of memory."

I will try that next.
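For reference, the swiotlb increase mentioned above is a kernel boot parameter; a rough sketch of how it can be set on this RHEL 8-style system (the slab count is a placeholder; slabs are roughly 2 KiB each, so 262144 slabs is about 512 MiB of bounce buffer; the grub.cfg path shown is the BIOS layout and differs for EFI):

# Check current swiotlb status and related errors.
dmesg -T | grep -i swiotlb

# Append the parameter to GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX="... swiotlb=262144"
# then regenerate the GRUB config and reboot.
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot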

Logs

    46 Chia k34 next-gen CUDA plotter - 9d66b86
     47 Plot Format: v2.4
     48 Network Port: 11337 [MMX] (unique)
     49 No. GPUs: 1
     50 No. Streams: 4
     51 Final Destination: ./
     52 Shared Memory limit: unlimited
     53 Number of Plots: 5
     54 Initialization took 0.106 sec
     55 Crafting plot 1 out of 5 (2023/02/01 16:36:20)
     56 Process ID: 1993
     57 Pool Puzzle Hash:  xxx
     58 Farmer Public Key: xxx
     59 Working Directory:   ./
     60 Working Directory 2: @RAM
     61 Compression Level: C1 (xbits = 15, final table = 3)
     62 Plot Name: plot-mmx-k34-c1-2023-02-01-16-36-xxx
     63 [P1] Setup took 0.894 sec
     64 [P1] Table 1 took 77.857 sec, 17179869184 entries, 16789935 max, 17020 tmp, 0 GB/s up, 2.31198 GB/s down
     65 [P1] Table 2 took 143.196 sec, 17179636764 entries, 16790768 max, 17010 tmp, 1.00561 GB/s up, 1.81572 GB/s down
     66 [P1] Table 3 took 319.245 sec, 17178866356 entries, 16787901 max, 16960 tmp, 0.651528 GB/s up, 1.8168 GB/s down
     67 terminate called after throwing an instance of 'std::runtime_error'
     68   what():  OS call failed or operation not supported on this OS
     69 Command terminated by signal 6
     70 223.30user 287.34system 10:14.91elapsed 83%CPU (0avgtext+0avgdata 541219792maxresident)k

This can be seen with "dmesg -T" or /var/log/messages:

61238 Feb  1 17:45:35 m8 kernel: nvidia 0000:2d:00.0: swiotlb buffer is full (sz: 4194304 bytes), total 32768 (slots), used 0 (slots)
61239 Feb  1 17:45:35 m8 kernel: NVRM: 0000:2d:00.0: Failed to create a DMA mapping!

With some experimenting I also got this failure mode:

Feb  1 22:43:28 m8 kernel: NVRM: GPU 0000:2d:00.0: RmInitAdapter failed! (0x25:0x65:1457)
Feb  1 22:43:28 m8 kernel: NVRM: GPU 0000:2d:00.0: rm_init_adapter failed, device minor number 0

Related documentation:
https://lenovopress.lenovo.com/lp1467.pdf
An Introduction to IOMMU Infrastructure in the Linux Kernel

[Solved] plot-sink reading target drives from a file - enabling "hot swap"

Would it be possible to have plot-sink re-read the target drives from a file before selecting the next drive to write to? This would let us modify that file on the fly, removing full drives and adding new ones to the target set.

Also, if possible, when new drives are added before the first batch is full, maybe the plot-sink could start using all drives (both from the old set and the new one); this way the old set would be topped off and the transition to the new set would be smoother. This could potentially help overcome the slowdown drives show when close to being full.

Number of expected challenges a farm should receive

MOVED FROM CHIA-GIGAHORSE (posted there by mistake)

@madMAx43v3r, you have mentioned on Discord that the number of eligible plots per day should follow this rule:

"total number of eligible plots per 24h should be 9340 / PLOT_FILTER * number_of_plots"
assuming PLOT_FILTER is either 256 or 512 (tn10 vs tn9)

Using that formula, we can calculate that challenges arrive every ~9.25 sec (rounded to the second decimal place), as restated below.
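For clarity, restating that arithmetic (assuming the 9340-per-24h figure from the quoted formula and an 86,400-second day):

expected eligible plots per 24 h ≈ (9340 / PLOT_FILTER) × number_of_plots
average challenge interval ≈ 86400 s / 9340 ≈ 9.25 s
expected challenges in a window of T seconds ≈ T / 9.25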

Having that number, we can compare the number of challenges a harvester receives with the expected / calculated one. When we do that, all harvesters on 2 farmers consistently get about 7.5% fewer challenges than expected. Those calculations were done on two independent farms over the last month or two. Here are a few such reports:

TN10:
F1/H1
---- challenges:
------ filter ratio: 256 : 1, plots efficiency: 99.92%
------ challenges: expected: 5,387.1, received: 4,982 (eligible: 4982, empty: 0), total eligible plots: 35,001
---------- ⚠️ Missed challenges: 405.1, percentage: 7.52%
F2/H2
---- challenges:
------ filter ratio: 251 : 1, plots efficiency: 101.93%
------ challenges: expected: 2,096.0, received: 1,936 (eligible: 1936, empty: 0), total eligible plots: 12,758
---------- ⚠️ Missed challenges: 160.0, percentage: 7.63%
(one calculation was done over ~13 hrs and the second over ~5 hrs, but I also have the same calculations over 24 hours, and, as expected, there is basically no difference regardless of the period the report covers)

TN9
F2/H2
---- challenges:
------ filter ratio: 514 : 1
------ challenges: expected: 9339.5, received: 8631 (eligible: 8546, empty: 85), total eligible plots: 66129
---------- ⚠️ Missed challenges: 708.5, percentage: 7.59%

The only factor at play is how we count received challenges: the code counts harvester-issued "XYZ plots were eligible for height ..." lines.

The calculated percentage hovers around 7.5%, so whatever mistake is being made is fairly consistent. Although, looking at the calculated PLOT_FILTER, the day-to-day differences are in the third digit, whereas here they are in the second.

Any idea what could be at play here? Maybe the 9340 number in the original formula is off, or maybe those "missing" challenges are some sort of synchronization challenges that the farmer does not send to harvesters?

Expected lookup times with compressed plots

I have added 4x 10 TB drives with k32 C7 compressed plots to my node. As expected, the lookup times are longer, and also less stable than for non-compressed plots. What should we roughly expect to see for those lookups, and at what point should we consider the lookup times too long (i.e., time to either upgrade the GPU or use lower compression)?

Here is the most recent chart with lookup times. I added those 4 drives past midnight, and the chart changes sharply around that time.
image

The bottom band (roughly below 0.5 sec) is lookups for the non-compressed plots. I am not sure why there are 2 bands (around 1 sec and 1.6 sec) above it. I am also not sure why a couple of random lookups go up to 10 sec. That harvester has ~400 TB worth of plots, so those 4 drives are a rather small fraction (which makes it less likely those variations are due to multiple eligible compressed plots). The chart may not properly reflect the lookup distribution, as lookups in those banded areas overlap and hide the density. If needed, I can also get a percentage distribution per time slice.

The good part, though, is that no other stats were affected (same distribution for eligible plots, and VDF processing looks rock solid, with no changes whatsoever after adding those compressed plots).

Here for comparison is the lookup chart for only non-compressed plots:
image

Actually, one more question about VDF times. The node is running on a K2200 for now (waiting for remote decompression so I can use a remote node / farmer to decompress plots, as I am out of PCIe slots). Besides the fact that this card is not going to cut it in the long run, do we understand why the card keeps shifting its VDF processing times from time to time? I didn't find any correlation with anything else happening on that node; it just switches at random. It looks like a significant shift on the chart, but actually is not, as the Y axis is truncated and thus emphasizes those small differences. It is just curiosity, not something of much relevance.
image

Directory Structure for Windows

What is the right way to specify the temp directory on Windows?
I tried the usual C:\Users\Temp and got "Invalid tmpdir: C:\Users\Temp", and I also tried /mnt/disk0/ and got "Failed to write to tmpdir directory: '/mnt/disk0/'".
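I am not certain this is the cause, but with this plotter family the directory arguments generally need to point at an existing, writable path and carry a trailing separator, so (assuming that is the issue here; the binary name, paths and keys are placeholders) a Windows invocation would look roughly like:

cuda_plot_k32.exe -t C:\Users\Temp\ -d D:\plots\ -x 11337 -C 1 -f <farmer_key> -p <pool_key>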

cuda_plot keeps GPU x open when only GPU y is selected

When I select, e.g., CUDA device 1 with -g 1, I can see in nvtop that cuda_plot keeps a process open on CUDA device 0. I'm not sure what impact this has, if any, but I can see some undesirable RMA in numatop, and this is the most likely explanation in my case. I think it would be better if cuda_plot completely disengaged the CUDA device(s) not selected with -g.
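A possible workaround in the meantime (an assumption, not a cuda_plot fix): hide the unwanted devices from the process with the standard CUDA_VISIBLE_DEVICES environment variable, so only the selected GPU is enumerated at all; paths and keys below are placeholders.

# Only physical device 1 is visible to the process; it is enumerated as device 0 inside,
# so -g 0 (or the default) selects it and physical device 0 is never touched.
CUDA_VISIBLE_DEVICES=1 ./cuda_plot_k32 -g 0 -n -1 -x 11337 -C 7 -t /mnt/nvme/tmp/ -d /mnt/d1/mmx/plots/ -f <farmer_key> -p <pool_key>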

Gigahorse 1.1.8-217b8ba: sporadic park_delta(): LP_1 < LP_0 (2183940545018, 2199560346605), park_delta_split(): small_delta > 255, ans_encode_sym(): index out of bounds: 32694, double free or corruption (!prev), signal 6

Hi,
When testing various C levels, one run terminated with signal 6.
Rerunning with the same parameters (k32, -C 3) works fine.

Error message (also see below):
59 park_delta(): LP_1 < LP_0 (2183940545018, 2199560346605) (x = 501, y = 0)
60 park_delta_split(): small_delta > 255 (786) (x = 500, y = 0, stub = 26)
61 ans_encode_sym(): index out of bounds: 32694
62 double free or corruption (!prev)
63 Command terminated by signal 6

I would like to rerun with the same plot ID to see if this reproduces.
Is that possible, or would this need a new (debug) option?
P.S. Can you explain what -Z/--unique does?

[root@287c4448ccc0 data1]# cat -n giga1169.cuda_plot_k32_834ee25.27247.out
     1	Sun Jan 29 10:12:46 2023       
     2	+-----------------------------------------------------------------------------+
     3	| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
     4	|-------------------------------+----------------------+----------------------+
     5	| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     6	| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     7	|                               |                      |               MIG M. |
     8	|===============================+======================+======================|
     9	|   0  Quadro M4000        Off  | 00000000:84:00.0 Off |                  N/A |
    10	| 82%   59C    P0    47W / 120W |      0MiB /  8192MiB |      1%      Default |
    11	|                               |                      |                  N/A |
    12	+-------------------------------+----------------------+----------------------+
    13	                                                                               
    14	+-----------------------------------------------------------------------------+
    15	| Processes:                                                                  |
    16	|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    17	|        ID   ID                                                   Usage      |
    18	|=============================================================================|
    19	|  No running processes found                                                 |
    20	+-----------------------------------------------------------------------------+
    21	    Product Name                          : Quadro M4000
    22	Sun Jan 29 10:12:47 UTC 2023
    23	8fbbd13e2321599f221f2f289b7316c2  cuda_plot_k32_834ee25
    24	Calling /cuda_plot_k32_834ee25 -c xxx -f xxx -x 11337 -t ./ -C 3
    25	Chia k32 next-gen CUDA plotter - 217b8ba
    26	Plot Format: v2.4
    27	Network Port: 11337 [MMX] (unique)
    28	No. GPUs: 1
    29	No. Streams: 4
    30	Final Destination: ./
    31	Shared Memory limit: unlimited
    32	Number of Plots: 1
    33	Initialization took 0.151 sec
    34	Crafting plot 1 out of 1 (2023/01/29 10:12:48)
    35	Process ID: 1177
    36	Pool Puzzle Hash:  xxx
    37	Farmer Public Key: xxx
    38	Working Directory:   ./
    39	Working Directory 2: @RAM
    40	Compression Level: C3 (xbits = 13, final table = 3)
    41	Plot Name: plot-mmx-k32-c3-2023-01-29-10-12-xxx
    42	[P1] Setup took 0.871 sec
    43	[P1] Table 1 took 21.87 sec, 4294967296 entries, 16791110 max, 66551 tmp, 0 GB/s up, 1.55465 GB/s down
    44	[P1] Table 2 took 49.297 sec, 4294867239 entries, 16789748 max, 66891 tmp, 0.649127 GB/s up, 1.03455 GB/s down
    45	[P1] Table 3 took 86.297 sec, 4294707539 entries, 16787530 max, 66630 tmp, 0.556206 GB/s up, 1.32971 GB/s down
    46	[P1] Table 4 took 92.214 sec, 4294364423 entries, 16784491 max, 66738 tmp, 0.867495 GB/s up, 1.29048 GB/s down
    47	[P1] Table 5 took 80.065 sec, 4293720114 entries, 16786710 max, 66611 tmp, 0.999048 GB/s up, 1.27397 GB/s down
    48	[P1] Table 6 took 67.752 sec, 4292403787 entries, 16780888 max, 66627 tmp, 0.944347 GB/s up, 1.25458 GB/s down
    49	[P1] Table 7 took 47.582 sec, 4289820387 entries, 16771351 max, 66486 tmp, 1.00818 GB/s up, 0.98252 GB/s down
    50	Phase 1 took 446.442 sec
    51	[P2] Setup took 0.469 sec
    52	[P2] Table 7 took 6.994 sec, 4.56987 GB/s up, 0.075958 GB/s down
    53	[P2] Table 6 took 6.212 sec, 5.14825 GB/s up, 0.08552 GB/s down
    54	[P2] Table 5 took 5.983 sec, 5.34693 GB/s up, 0.0887932 GB/s down
    55	[P2] Table 4 took 5.9 sec, 5.42297 GB/s up, 0.0900424 GB/s down
    56	Phase 2 took 25.827 sec
    57	[P3] Setup took 0.635 sec
    58	[P3] Table 3 LPSK took 7.099 sec, 3439413469 entries, 14270566 max, 56421 tmp, 4.01881 GB/s up, 7.18414 GB/s down
    59	park_delta(): LP_1 < LP_0 (2183940545018, 2199560346605) (x = 501, y = 0)
    60	park_delta_split(): small_delta > 255 (786) (x = 500, y = 0, stub = 26)
    61	ans_encode_sym(): index out of bounds: 32694
    62	double free or corruption (!prev)
    63	Command terminated by signal 6
    64	167.79user 116.78system 8:49.11elapsed 53%CPU (0avgtext+0avgdata 196450004maxresident)k
    65	0inputs+200outputs (0major+50260110minor)pagefaults 0swaps

chia-plot-sink: suggestions for improvement

Currently, newly completed plots are transferred to the specified directory over the LAN, but previously completed plots are not processed.

Could it be modified to also transfer previously completed plots to the specified directory?

This text is machine translated and may not read smoothly. I hope you can understand~

You are really good~~
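A possible interim workaround, not a plot-sink feature (host name and paths below are placeholders): push the already-completed plots to the destination box with a generic tool such as rsync.

# One-shot transfer of previously completed plots over the LAN;
# --remove-source-files deletes each local copy only after it has arrived.
rsync -av --progress --remove-source-files /mnt/local/plots/*.plot user@sinkhost:/mnt/dest/plots/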
