Comments (12)
WHY does it still make a temp file on the drive when plotting in ram?
from bladebit.
It writes to the final plot file during the plotting process, so the final write takes less time, since it can commit some of the data as it goes.
from bladebit.
Any ideas regarding the crashing of plots higher than C1, and their being invalid even though the log says they completed?
from bladebit.
No one? I've got one. It's just poorly, poorly optimized. I read somewhere that bladebit does not like 2 NUMA nodes. I tried 1; it worked at first, but it still crashed after 2 invalid plots. All ramplots are invalid, C0 through C5. Even diskplot: on a cheap NVMe (read ±400, write 750 MB) it takes 121 min. On a better NVMe (read ±700, write ±1.1 to 1.4 GB a second) it takes 62 min, so that scales fairly decently. On a datacenter stripe array that reads 5.5 GB a second and writes around 7 GB (they are sustained-write optimized) it takes 76 minutes. There is absolutely no rhyme or reason to these times. I get the gnarly feeling that it is mostly AMD optimized. I could be wrong, but other diskplotters with similar read and write speeds that are AMD based hit sub-20 to low-20 min. Some are even faster....
The thought creeps up on me that, in their haste to catch up in the CUDA compression arms race, they are focusing on optimising the CUDA plotter and putting the disk and RAM plotters out to pasture. Or, bluntly stated, leaving them standing with their pants around their ankles. Then again, that's my 2 cents' worth of experience so far. It looks like I and others like me are forced into CUDA plotting......
I don't think that this is on par with the original Chia mindset.
from bladebit.
No one? I've got one. It's just poorly, poorly optimized. I read somewhere that bladebit does not like 2 NUMA nodes.
This is incorrect. bladebit's ramplot, I believe, is the only plotter explicitly coded w/ NUMA in mind. You might have to choose the best settings for high memory throughput in your BIOS. ramplot has not changed since its first versions, w/ users running it for months at a time with no crashes, so it's the most stable of the 3 variants.
The only things that changed in the latest v3 were minor things to accommodate compressed plotting. There could potentially be a bug there in phase 3, but we never encountered any during testing. If you can provide plot IDs and compression levels for the plots you created that were invalid, I could reproduce them locally to see if I encounter the problem as well. If you have some full logs of ramplot's faulty plots, please post a few as well and I can take a look.
from bladebit.
Are the corrupt ones only in Windows, by the way?
from bladebit.
Yes sir, they are Windows based. Where could I find these? Do you mean the actual IDs of the plots? If so, I will post them. As for logs: at first it just crapped out at the start, and after reading about the NUMA dislike I changed it to no more than 16 threads, thus using 1 NUMA node, and that worked at first.... Later it produced no more than 2 or 3 plots before crashing. That's just it: the logs state completed successfully, no errors. But the farmer node does not recognise them as valid.
And of my 2 plotters, the 1st ran diskplot for days with, I think, Chia 2.0 without any problems. But after updating to 2.1 and later 2.1.1, even my original plotter crashes on diskplot (which ran stable as a rock on the 2.0 version).
from bladebit.
OK, I think I got the compressed plots not opening sorted. It was my bad and not the harvester: decompression was turned off. I discovered this after reading the debug logs. I started a series of 10 C5 plots. The 1st C5 did seize up at the start and never got to allocating buffers. I deleted it and, surprisingly, it started with the 2nd plot... I've got an auto-remover, so I hope it will crank out 10 plots. Oh no, sorry, 9: the first one did choke on start.
from bladebit.
So it is hit and miss. It never makes it past the third plot: it makes 1, I delete the next where it keeps hanging, and then it will start the one after. It is not set and forget but rather set and forget about it. As if plotting on its own when automated did not take long enough, I now have to keep an eye out for when it crashes, because it will!! Otherwise it runs all night burning electricity on a plot that is never going to finish. Maybe code in a sort of counter that exits the current plot when the phase time equals or exceeds some number x and then starts the next in the batch. It's a fairly safe assumption that when a phase, or rather a subset of it (let's say propagating, sorting or computing fx), is not completed after say 1800 seconds, it most likely never will be.
It could be as simple as:
timer.start
if timer.count >= x
then
    timer.stop
    exit current plot
else
    keep going....
and start the next iteration
You will most likely have to define a subroutine or a function for the counter. This will be a one-time definition that you can call upon any time it's needed. It most definitely will put some overhead on the process, but I think a counter function or sub will be negligible compared to the alternative: running a routine all night that is never going to finish. And I know for a fact that this is a problem more users have when the plotter craps out. It would be quite useful to know that, in the event that it does crash while plotting, the program will stop, clean up after itself and go about its business plotting. This can be applied equally to RAM, disk and CUDA plotting, I think. And quite frankly, I think my old coding teacher would have killed me for overlooking such an obvious single point of failure and not coding in a safety to handle it.
I am not trying to attack you, but rather to provide a maybe fresh view and think constructively about how to further the efficiency and quality.
I will put my money where my mouth is, pull the source code, make an attempt to put in this (I think) quite simple function, and put it up for review by more qualified people.
I do not know which language you used, but you get the intention.
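The timeout idea above can also be tried from outside the plotter with a small wrapper script. What follows is only a sketch under assumptions, not anything from bladebit itself: it treats "no new output line for N seconds" as a stand-in for "this phase has stalled", the names run_with_stall_timeout and run_batch are made up for illustration, and the commands are whatever plotter command lines you already run. It does not delete the partial plot or temp files of a killed run.

```python
import subprocess
import sys
import threading
import time


def run_with_stall_timeout(cmd, stall_seconds, poll_seconds=1.0):
    """Run cmd and kill it if it prints nothing for stall_seconds.

    Returns (exit_code, timed_out). Hypothetical helper, not part of bladebit.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    last_output = [time.monotonic()]  # list so the reader thread can update it

    def pump():
        # Echo the child's output and remember when the last line arrived.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            sys.stdout.write(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > stall_seconds:
            timed_out = True
            proc.kill()  # give up on this plot
            break
        time.sleep(poll_seconds)

    proc.wait()
    reader.join(timeout=5)
    return proc.returncode, timed_out


def run_batch(commands, stall_seconds=1800, poll_seconds=1.0):
    """Run each command in turn; a stalled one is killed and the next starts."""
    results = []
    for cmd in commands:
        results.append(run_with_stall_timeout(cmd, stall_seconds, poll_seconds))
    return results
```

Each entry in commands is an argument list, one per plot. One caveat: if the plotter buffers its output when it is not writing to a terminal, a healthy run could look stalled, so the 1800 second default from the comment above may need to be more generous in practice.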
from bladebit.
OK, it plowed through 7 plots uninterrupted. That's a plus. Now I can move on to GPU plotting, as the riser and bracket for my server finally got delivered in the mail this afternoon. But I'm still going to check whether I can add a boundary of some sort to define the scope of "normal" operating parameters. I still think that 30 minutes for allocating buffers and resources will more than suffice, at least for RAM and CUDA plotting. No need to reinvent the wheel: that is well beyond the scope of my coding capabilities and, let's be honest, after my first peek into the code it is a very well designed wheel already. Even the temp cleanup is already arranged in the code, I think, because it literally is done at the end.
from bladebit.
After carefully reading your reply: stating it is the most stable out of the 3 (so you do know there are issues with the 3) is like spinning 3 spinning tops, A, B and C, at 100 mph, 75 mph and 50 mph, and stating that A is the most stable (spins the longest), which it is due to its higher RPM and the gyroscopic effect. But the end result will be the same for all 3: they end up lying on their side.....
from bladebit.
That said, the GPU plotter crashed after 1 plot out of 10. After deleting it, it did finish. On a go of 25 it crashed after 16. Luckily I caught it, and after checking the logs from the mover I concluded that it hung on allocating resources for over 45, or should I say only 45, minutes.
Off go 100 plots, alternating C1 through C5 between RAM and hybrid disk: 1 C1 RAM, 1 C1 hybrid, 1 C2 RAM, 1 C2 hybrid, and so on. To be continued...
from bladebit.