monashbi / arcana-legacy
Abstraction of Repository-Centric ANAlysis - a Python framework
License: Apache License 2.0
Make it possible to request a list of desired outputs instead of just one.
FieldSpec and OptionSpec should use the traits package instead of the Python int|float|str classes directly. This would allow the specification of more complex types if required (e.g. a list of floats), which could be permitted on a case-by-case basis (e.g. if the archive supports it).
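A minimal sketch of the idea, assuming the Enthought traits package (already a Nipype dependency); the class and method names below are illustrative, not the existing arcana API:

```python
# Hypothetical FieldSpec that holds a trait instance rather than a bare
# int/float/str class, so compound types such as a list of floats can be
# declared and validated.
from traits.api import Float, HasTraits, List


class FieldSpec(object):

    def __init__(self, name, trait, pipeline_name=None):
        self.name = name
        self.trait = trait
        self.pipeline_name = pipeline_name

    def validate(self, value):
        # Build a throwaway HasTraits holder so the trait does the checking;
        # a mismatch raises TraitError
        holder = type('Holder', (HasTraits,), {self.name: self.trait})()
        setattr(holder, self.name, value)
        return getattr(holder, self.name)


echo_times = FieldSpec('echo_times', List(Float))
echo_times.validate([0.0049, 0.0073])   # OK
# echo_times.validate('oops')           # would raise TraitError
```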
Add an option 'single_visit' to the LocalRepository (to be renamed DirectoryRepository) to avoid having to restructure the project directories to include the visit sub-directories.
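A sketch of how a 'single_visit' flag could be interpreted when walking the directory tree; the function and argument names are assumptions about the proposed DirectoryRepository, not existing code:

```python
import os


def iter_sessions(root_dir, single_visit=False, default_visit_id='VISIT1'):
    for subject_id in sorted(os.listdir(root_dir)):
        subject_dir = os.path.join(root_dir, subject_id)
        if not os.path.isdir(subject_dir):
            continue
        if single_visit:
            # The subject directory itself holds the (only) session, so no
            # visit sub-directories need to be created
            yield subject_id, default_visit_id, subject_dir
        else:
            for visit_id in sorted(os.listdir(subject_dir)):
                yield subject_id, visit_id, os.path.join(subject_dir,
                                                         visit_id)
```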
Instead of passing kwargs in pipeline getters as kwargs to the create_pipeline method, they should be passed in as a required dictionary argument "mods", to ensure that they are always passed through.
Also, getters should be able to take a name_map and store it in the pipeline itself, mapping any connections to inputs and outputs using the name map. This would bring modification functionality that is currently only available in MultiStudy to all Study classes and reduce the need to create explicit factory methods (i.e. methods could be modified in sub-classes without altering the base class).
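A standalone sketch of the proposed convention (all names are assumptions, not the current arcana signature): a required 'mods' dict plus an optional 'name_map' that is stored on the pipeline and used to re-route input/output connections.

```python
class Pipeline(object):

    def __init__(self, name, inputs, outputs, mods, name_map=None):
        self.name = name
        self.mods = mods                      # always present, even if empty
        self.name_map = dict(name_map or {})
        self.inputs = [self._map(i) for i in inputs]
        self.outputs = [self._map(o) for o in outputs]

    def _map(self, conn_name):
        # Re-map a connection name, falling back to the original
        return self.name_map.get(conn_name, conn_name)


pipeline = Pipeline('brain_extraction',
                    inputs=['magnitude'], outputs=['brain_mask'],
                    mods={'f_threshold': 0.4},
                    name_map={'magnitude': 't1_magnitude'})
print(pipeline.inputs)   # ['t1_magnitude']
```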
When restarting a crashed job, checking the local archive always returns nothing to do, as if the final output were already there. This happens even if only a few pre-computed outputs exist, and to run the pipeline I have to delete both the outputs from the archive and the working directory.
Since 'project' isn't referenced anywhere else in the package, 'per_project' should probably be renamed to 'per_study'.
Unit tests are required to check the regex, order and DICOM-field matching performed by DatasetMatch.
Pipeline options/version information needs to be stored in the archive alongside the data, and then checked before new runs are performed.
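A hedged sketch of one way the option/version record could be stored next to the derived data and compared before a re-run; the file name and layout are assumptions rather than an existing arcana convention:

```python
import json
import os


def save_provenance(session_dir, pipeline_name, options, versions):
    record = {'pipeline': pipeline_name,
              'options': options,
              'software_versions': versions}
    path = os.path.join(session_dir, pipeline_name + '_prov.json')
    with open(path, 'w') as f:
        json.dump(record, f, indent=2, sort_keys=True)


def requires_rerun(session_dir, pipeline_name, options, versions):
    path = os.path.join(session_dir, pipeline_name + '_prov.json')
    if not os.path.exists(path):
        return True                      # never run before
    with open(path) as f:
        previous = json.load(f)
    # Re-run if either the pipeline options or the tool versions changed
    return (previous['options'] != options or
            previous['software_versions'] != versions)
```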
FileFormat should either have an optional argument that takes a function that can extract a data array from a given dataset (so it can be plotted), or, alternatively, formats for which this is possible should extend the FileFormat class.
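A sketch of the first option, an optional loader callable; the class and attribute names are assumptions:

```python
import numpy as np


class FileFormat(object):

    def __init__(self, name, extension, array_loader=None):
        self.name = name
        self.extension = extension
        self._array_loader = array_loader

    @property
    def plottable(self):
        return self._array_loader is not None

    def data_array(self, path):
        if self._array_loader is None:
            raise NotImplementedError(
                "No array loader registered for format '{}'".format(self.name))
        return self._array_loader(path)


# e.g. a NIfTI format could register nibabel's loader, a text matrix numpy's
text_matrix_format = FileFormat('text_matrix', '.txt', array_loader=np.loadtxt)
```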
It seems that when you use pipeline.create_map_node, the requirements that you specify when the map_node is created are not passed to all the sub-nodes. This causes a 'command not found' error for all of them.
Check whether it is possible to pass the requirements to all the sub-nodes.
Currently, data specs are determined to be acquired or derived depending on whether a pipeline name is provided or not. It would probably be cleaner to define two separate classes for each case, as they have different attributes and methods. The specs that are provided as inputs/outputs to Pipeline.__init__ should probably also be named as such, i.e. DatasetInput or DatasetOutput.
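An illustrative split into two classes (hypothetical names): acquired specs never carry a pipeline, derived specs always do.

```python
class BaseDatasetSpec(object):

    def __init__(self, name, format):  # 'format' shadows the builtin, as in
        self.name = name               # the existing specs
        self.format = format


class AcquiredDatasetSpec(BaseDatasetSpec):
    """Provided as a study input; has no generating pipeline."""
    derived = False


class DerivedDatasetSpec(BaseDatasetSpec):
    """Generated by a pipeline, whose name is now a required argument."""
    derived = True

    def __init__(self, name, format, pipeline_name):
        super(DerivedDatasetSpec, self).__init__(name, format)
        self.pipeline_name = pipeline_name
```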
Edit the .travis.yml to install a local XNAT instance (using the Docker-compose script) so unit tests can be run locally
With Nipype updating its syntax, it could be easier and simpler to adopt the new syntax in Arcana now, to avoid having to change things later once more pipelines have been written.
In addition to the environment-modules loading code (or perhaps in preference to it), run all pipeline nodes within Singularity containers if Singularity is present.
URIs to the singularity containers can be kept in Requirement objects.
For env modules, need some code to map version names when creating a Runner.
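A sketch of both points: a Requirement that can carry a Singularity/Docker image URI, plus the kind of version-name map a Runner might need for environment modules. All names and URIs here are assumptions about a possible design.

```python
class Requirement(object):

    def __init__(self, name, min_version, container_uri=None):
        self.name = name
        self.min_version = min_version
        # e.g. 'docker://...' or 'shub://...'; None means "use env modules"
        self.container_uri = container_uri


fsl_req = Requirement('fsl', '5.0.9',
                      container_uri='docker://example/fsl:5.0.9')

# For environment modules, map requirement names onto local module names
module_name_map = {'fsl': 'fsl-parallel', 'mrtrix': 'mrtrix3'}
```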
Datasets should be able to be distinguished (matched) on the basis of their dicom fields (e.g. gre field mapping phase or mag). The new DatasetMatch infrastructure should make this possible
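A minimal sketch of header-value matching, assuming pydicom is available and that the check is applied to one representative file per series:

```python
import pydicom


def dicom_values_match(dicom_path, required_values):
    """required_values maps DICOM keywords to expected values, e.g.
    {'ImageType': ['ORIGINAL', 'PRIMARY', 'M', 'ND']} to select the
    magnitude series of a GRE field map."""
    header = pydicom.dcmread(dicom_path, stop_before_pixels=True)
    return all(getattr(header, keyword, None) == expected
               for keyword, expected in required_values.items())
```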
Allow explicit datasets and fields to be passed to Study inputs (in ExplicitDatasets|ExplicitFields objects). Iterables of Dataset|Field objects should be able to be passed as an input when using the dictionary inputs form.
Will need to implement the match(subject_id=None, visit_id=None) method, drawing the appropriate subject and visit IDs from the provided datasets (although they will typically be of 'per_project' frequency).
This will enable templates (e.g. atlases) to be passed to studies as inputs. A class attribute default_inputs could also be used to specify default templates to use for particular pipelines.
All datasets|fields should know which session|visit|subject they belong to (if any)
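A sketch (hypothetical names) of how an explicit collection of Dataset objects could implement the match() method, keyed on the subject/visit IDs each dataset already knows about; 'per_project' items would pass None for both IDs:

```python
class ExplicitDatasets(object):

    def __init__(self, datasets):
        # each dataset is assumed to expose .subject_id and .visit_id
        self._lookup = {(d.subject_id, d.visit_id): d for d in datasets}

    def match(self, subject_id=None, visit_id=None):
        try:
            return self._lookup[(subject_id, visit_id)]
        except KeyError:
            raise KeyError('No dataset for subject={}, visit={}'.format(
                subject_id, visit_id))
```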
Edit unit tests to use basic dataset types (e.g. text_format) that can be created on the fly to avoid the need to download data from XNAT and use data_formats defined in nianalysis.
Using the future package, make arcana Python 2+3 compatible
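The usual per-module header when porting with the future package (a common convention rather than anything arcana-specific):

```python
from __future__ import absolute_import, division, print_function
from builtins import object, range, str, zip  # provided by 'future' on Py2
```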
Error message:

```
  File "run_motion_detection.py", line 72, in <module>
    run_md(args.input_dir, dynamic=args.dynamic_md, xnat_id=args.xnat_id)
  File "run_motion_detection.py", line 48, in run_md
    visit_ids=[session_id], work_dir=WORK_PATH)
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/pipeline.py", line 189, in run
    self.connect_to_archive(complete_workflow, **kwargs)
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/pipeline.py", line 310, in connect_to_archive
    visit_ids=visit_ids)
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/archive/xnat.py", line 771, in project
    processed=processed),
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/archive/xnat.py", line 888, in _get_datasets
    multiplicity=mult, location=None))
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/dataset.py", line 161, in __init__
    super(Dataset, self).__init__(name, format, multiplicity)
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/dataset.py", line 86, in __init__
    super(BaseDataset, self).__init__(name=name, multiplicity=multiplicity)
  File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/dataset.py", line 24, in __init__
    assert isinstance(name, basestring)
AssertionError
```
If I use the local archive instead of the xnat archive, everything works fine.
To reproduce this error you can run the script called assertion_error.py in my mbi-analysis branch (resting_state); it is in mbi-analysis/debug.
Maybe have a flag that enables you to reuse old work directories when generating data
Create a custom SlurmPlugin (that can also work with SGE) that only submits long jobs to the queue and processes short book-keeping nodes locally.
Implement submission over SSH with paramiko, so the main pipeline container can run on the XNAT server.
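A hedged sketch of the SSH-submission idea using paramiko; the host, user and command below are placeholders, and error handling is omitted:

```python
import paramiko


def submit_remote(host, username, key_filename, command):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=username, key_filename=key_filename)
    try:
        _, stdout, stderr = client.exec_command(command)
        return stdout.read().decode(), stderr.read().decode()
    finally:
        client.close()


# e.g. submit the batch script for a pipeline from the XNAT server side
# out, err = submit_remote('cluster.example.edu', 'arcana', '~/.ssh/id_rsa',
#                          'sbatch ~/jobs/run_pipeline.sh')
```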
Optionally send progress information to a PIMS server
Instead of running pipelines explicitly, pipelines should be run when requesting a dataset/field. For this to happen, the Study object needs to have its own "Runner" object to determine how and where the processing pipelines are run (e.g. locally single/multi process or submitted to SLURM scheduler)
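A conceptual sketch only (all names are assumptions): requesting a derived spec from the Study triggers its pipeline through a Runner, which decides how and where the pipeline actually executes.

```python
class LinearRunner(object):
    """Runs pipelines one after another in the current process; a SLURM or
    multi-process runner would override run()."""

    def run(self, pipeline):
        pipeline.execute()


class Study(object):

    def __init__(self, runner):
        self._runner = runner
        self._derived = set()

    def data(self, spec_name):
        pipeline = self.pipeline_for(spec_name)     # None for acquired specs
        if pipeline is not None and spec_name not in self._derived:
            self._runner.run(pipeline)              # derive on demand
            self._derived.add(spec_name)
        return self.fetch_from_repository(spec_name)

    # pipeline_for() and fetch_from_repository() are left abstract here
```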
Repositories that are to be combined into a single study may not have the same ID scheme, and for XNAT repositories the session ID can depend on the subject and project ID (at least in the case of MBI-XNAT). So there needs to be a custom way to map between the IDs provided to the study and those of the repository.
A pair of lambda functions or an IDMapper object might be a good solution.
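A sketch of both options together (purely illustrative names): a small IDMapper object wrapping a pair of callables that translate between study-side IDs and repository-side IDs.

```python
class IDMapper(object):

    def __init__(self, to_repository, from_repository):
        self.to_repository = to_repository
        self.from_repository = from_repository


# e.g. MBI-XNAT style session labels of the form <PROJECT>_<SUBJECT>_<VISIT>
mbi_xnat_mapper = IDMapper(
    to_repository=lambda project, subject, visit: '{}_{}_{}'.format(
        project, subject, visit),
    from_repository=lambda session_label: tuple(session_label.split('_', 2)))

print(mbi_xnat_mapper.to_repository('MRH017', '001', 'MR01'))
# -> MRH017_001_MR01
```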
Similar to the MultiStudyMetaClass, should write a meta class that all Study classes should use to construct class members such as data_specs and default_options
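A minimal sketch of what such a metaclass could do: merge the 'add_data_specs' lists declared on the class and its bases into a single 'data_specs' dict. The attribute names mirror the MultiStudyMetaClass idea but are assumptions here.

```python
class StudyMetaClass(type):

    def __new__(mcs, name, bases, dct):
        combined = {}
        for base in reversed(bases):                 # bases first ...
            combined.update(getattr(base, 'data_specs', {}))
        for spec in dct.get('add_data_specs', []):   # ... then this class
            combined[spec.name] = spec
        dct['data_specs'] = combined
        return super(StudyMetaClass, mcs).__new__(mcs, name, bases, dct)
```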
For sub-studies that encapsulate different runs of the same type within a session (e.g. multiple fMRI tasks), need to come up with a way to conveniently set the BIDS "run" number and have all the default bids matches updated
Instead of creating new "MR Sessions" to store derived data, the QIB datatype should be used instead.
LocalRepository kwargs need to be specified for each Study field. Those could be passed to pybids queries. Alternatively, use a function pointer with a BIDSLayout argument passed in for more complex queries (think fieldmaps). More on pybids: https://github.com/INCF/pybids/tree/master/bids/grabbids
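A rough sketch of both alternatives against the grabbids-era pybids API; the exact keyword/entity names vary between pybids versions, so treat the filters below as assumptions:

```python
from bids.grabbids import BIDSLayout


def fieldmap_query(layout, subject_id, visit_id):
    """Function-pointer style: arbitrary logic run against the BIDSLayout."""
    return layout.get(subject=subject_id, session=visit_id,
                      modality='fmap', return_type='file')


layout = BIDSLayout('/path/to/bids/dataset')
# kwargs style: simple selections expressed directly as pybids filters
bold_files = layout.get(subject='01', type='bold', return_type='file')
# function-pointer style for the more complex cases (e.g. fieldmaps)
fmap_files = fieldmap_query(layout, '01', '01')
```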
When passing inputs to a study as a dictionary, *Match objects should be allowed not to have names, and just take the name from the dictionary key.
Similarly for formats: these should be allowed to be an optional input that is detected from the matches themselves (and checked against the provided format, if one is given).
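A sketch of the proposed dictionary-input form (the attributes and helper below are hypothetical, not the current DatasetMatch API): the match takes its name from the dict key, and its format is detected from the matched data unless one is given explicitly.

```python
class DatasetMatch(object):

    def __init__(self, pattern, format=None, name=None):
        self.pattern = pattern
        self.format = format          # None -> detect from the matched data
        self.name = name              # None -> filled in from the dict key

    def bound_to(self, name):
        self.name = self.name if self.name is not None else name
        return self


inputs = {'magnitude': DatasetMatch(r'.*gre.*mag.*'),
          'phase': DatasetMatch(r'.*gre.*phase.*')}
matches = [match.bound_to(key) for key, match in inputs.items()]
```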
Instead of storing in *_PROC sessions (on XNAT) derived inputs should be stored in separate sessions for each study, e.g. MRH017_001_MR01_MYANALYSIS.
This would allow us to remove the study-name prefix for the derived datasets/fields, although care will have to be taken to allow it to remain for sub-study prefixes.
For the local archive, the derived outputs should be stored in separate sub-directories.
This should be a bit neater but will also make it easier to store and retrieve provenance data.
BaseDataset/BaseField objects will need to have an additional 'study' field to specify which study they were derived from and the 'get_tree' methods will need to search for all studies that are listed in the input matches.
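A small sketch of the session-naming scheme described above; the exact label format is an assumption based on the MRH017_001_MR01_MYANALYSIS example:

```python
def derived_session_label(subject_id, visit_id, study_name):
    return '{}_{}_{}'.format(subject_id, visit_id, study_name.upper())


print(derived_session_label('MRH017_001', 'MR01', 'myanalysis'))
# -> MRH017_001_MR01_MYANALYSIS
```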
It seems there is a bug when trying to restart a crashed workflow using the local archive (I haven't tested with xnat). When checking the precomputed outputs, it always returns that there is nothing to do, even if the final output has not already been produced. To make the pipeline work you need to delete the cache dir and all the previously produced outputs.