Comments (3)
Dear Courtney,
From the log files, it looks like HALFpipe is running into a problem when writing to its own log file. This problem keeps the rest of the program from running.
One of the goals of HALFpipe is to have a single log file so that identifying and reporting errors is easy. However, this leads to a technical challenge when there are multiple nodes on a cluster trying to log to the same file. If, by chance, one node starts to write a log file entry while another one is already writing, then the two log entries may end up corrupted and unreadable.
To prevent this from happening, HALFpipe uses a simple library called fasteners. In brief, this library asks the file system to temporarily lock the log file so that only one process on one node can write at a time.
My hypothesis is that your cluster's file system software may not be compatible with these locks or may have some unique quirks in how it implements locking. As a result, the jobs end up deadlocked.
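To make the locking idea concrete, here is a minimal sketch of writing to a shared log file under an exclusive lock. It uses Python's standard `fcntl` module rather than fasteners itself, and the function name `append_log_entry` is made up for illustration, so please read it as an approximation of the idea and not as HALFpipe's actual code. It only works on Unix-like systems.

```python
import fcntl
import os


def append_log_entry(log_path: str, entry: str) -> None:
    """Append one entry to a shared log file while holding an exclusive lock."""
    with open(log_path, "a") as log_file:
        # Ask the file system for an exclusive advisory lock on the file.
        # This call waits until no other process holds the lock.
        fcntl.flock(log_file, fcntl.LOCK_EX)
        try:
            log_file.write(entry + "\n")
            # Push the entry to disk before other writers get their turn.
            log_file.flush()
            os.fsync(log_file.fileno())
        finally:
            # Always release the lock, even if writing failed.
            fcntl.flock(log_file, fcntl.LOCK_UN)
```

If the hypothesis above is right, the lock request is the step where a job would be expected to wait indefinitely on a file system that does not handle such requests correctly.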
There are three possible solutions to the problem:
- Solution A: Stop having a single log file and just have one per job/chunk. What do you think, @ilyaveer?
- Solution B: Do some research on how it may be possible to create locks more robustly, for example as suggested by https://www.php.net/manual/en/function.flock.php#82521
- Solution C: Work with the cluster administrators to try to find out what goes wrong when HALFpipe tries to lock files, and research whether this may be fixable on their end. This may be difficult to coordinate, as I (the developer) do not have any insight into your cluster, @cchaswell.
I think I will work on B for now :-)
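For comparison, Solution A could look roughly like the sketch below, which gives each job its own log file so that no locking is needed at all. The function name `make_job_logger`, the `log.<job_id>.txt` naming scheme, and the job ID are assumptions made for this example and do not come from HALFpipe.

```python
import logging
import os


def make_job_logger(log_dir: str, job_id: str) -> logging.Logger:
    """Return a logger that writes only to this job's own log file."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(f"job.{job_id}")
    logger.setLevel(logging.INFO)
    # Only attach a file handler once, so repeated calls do not
    # produce duplicate log entries.
    if not logger.handlers:
        # Hypothetical naming scheme: one file per job/chunk.
        log_path = os.path.join(log_dir, f"log.{job_id}.txt")
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(name)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

The trade-off is the one described above: each job writes without contention, but errors then have to be collected from several files instead of one.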
from halfpipe.
Thanks Lea!
I forwarded this to our cluster admin and they wanted to know if there was a way to change where I write out the log file. Is it possible for me to set a different directory for that output? That could help troubleshoot on our end since we are currently writing to our Isilon server where the data folder is.
"Can you send your log to a different location than the output (ie: home or a temp folder)? Asking because one thing you're doing is writing data to the duke isilon, which we have no control over and is quite a bit slower than our storage."
Also: "Additionally … looks like hard links are disabled on isilon, which is something she pointed to in her solution B link."
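To help with troubleshooting, a small probe like the one below could be run against different directories (for example the Isilon data folder and a home or temp folder) to compare how they behave. It is only a sketch using Python's standard library on a Unix-like system. The function name `probe_directory` is made up for this example, and which locking call fasteners uses internally is not stated in this thread, so the probe tries both `flock` and `lockf`.

```python
import fcntl
import os
import tempfile


def probe_directory(directory: str) -> dict:
    """Check whether hard links and advisory locks work in `directory`."""
    results = {"hard_links": False, "flock": False, "lockf": False}
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        target = os.path.join(scratch, "target")
        link = os.path.join(scratch, "link")
        with open(target, "w") as handle:
            handle.write("probe\n")
            handle.flush()
            # Locks are requested as non-blocking so that a refused
            # request raises OSError instead of waiting.
            for name, lock in (("flock", fcntl.flock), ("lockf", fcntl.lockf)):
                try:
                    lock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    lock(handle, fcntl.LOCK_UN)
                    results[name] = True
                except OSError:
                    pass
        try:
            # A second directory entry for the same file should raise
            # the link count to 2 if hard links are supported.
            os.link(target, link)
            results["hard_links"] = os.stat(target).st_nlink == 2
        except OSError:
            pass
    return results
```

Calling `probe_directory("/path/to/folder")` returns a dictionary of three booleans. A `False` entry for a given folder would point to a feature that the file system behind it refuses, although a lock that succeeds on one node does not prove that locking works correctly across nodes.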
I might be having some of the same locking issues on our other server as well, @HippocampusGirl.
I manually created a slice timing file to get around the issue with the milliseconds: I just divided the values that HALFpipe read in by 1000. The program seems to get past those errors, but it still hangs and does not finish after about an hour of running. I am asking for 40G per node. I get a few crash files reporting that files were not found. I am not sure whether those files are getting created or not, but the crashes do not occur for all of the subjects. It seems like that would not cause everything to just pause.
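For anyone who wants to reproduce the slice timing workaround, the conversion amounts to something like the sketch below. The file layout (plain text, one slice time per line) and the function name are assumptions made for this example, so the reading and writing parts may need adjusting to the actual file.

```python
def convert_slice_times_ms_to_s(in_path: str, out_path: str) -> list:
    """Read slice times in milliseconds and write them back in seconds."""
    with open(in_path) as in_file:
        # Assumed layout: one numeric value per line, blank lines ignored.
        times_ms = [float(line) for line in in_file if line.strip()]
    times_s = [value / 1000 for value in times_ms]
    with open(out_path, "w") as out_file:
        for value in times_s:
            out_file.write(f"{value:.6f}\n")
    return times_s
```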
I will copy over the error and crash files so maybe you can pinpoint where this error is happening. This is data that has gone through the pipeline in previous versions.
crash-20201015-125100-ch186-bold_to_std_transform.a0-ff41d240-dd89-4646-9746-487e9e8e6c23.txt
crash-20201015-125101-ch186-bold_to_std_transform.a0-a7159412-7b56-4bd1-a438-c01e419998b4.txt
crash-20201015-125101-ch186-ds_t1w_tpl_inv_xfm-5c2b338e-ed0d-46b2-a42c-deb6a65112c1.txt
log.txt
err.txt