
arcana-legacy's People

Contributors

tclose


arcana-legacy's Issues

Switch to using Traits instead of dtype in FieldSpec and OptionSpec

FieldSpec and OptionSpec should use the traits package instead of the Python int|float|str classes directly. This would allow more complex types to be specified if required (e.g. a list of floats), which could be permitted on a case-by-case basis (e.g. if the archive supports it).
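
Below is a minimal sketch of what a traits-based FieldSpec could look like; the class shown here and its validate() helper are hypothetical, not the current API:

    from traits.api import HasTraits, Float, List, TraitType


    class FieldSpec:
        """Hypothetical traits-based spec (sketch only, not the current API)."""

        def __init__(self, name, trait, pipeline_name=None):
            # Accept any trait instance, so complex types such as List(Float)
            # can be declared where the archive supports them.
            if not isinstance(trait, TraitType):
                raise TypeError("trait must be a traits.api trait instance")
            self.name = name
            self.trait = trait
            self.pipeline_name = pipeline_name

        def validate(self, value):
            # Validate by assigning to a throwaway HasTraits object that
            # carries this spec's trait; traits raises TraitError on mismatch.
            holder = HasTraits()
            holder.add_trait(self.name, self.trait)
            setattr(holder, self.name, value)
            return getattr(holder, self.name)


    # A list-of-floats field, which a plain int|float|str dtype cannot express.
    echo_times = FieldSpec('echo_times', List(Float))
    print(echo_times.validate([1.0, 2.5, 3.0]))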

Formalise pipeline "mods" and replace TranslatedPipeline

Instead of forwarding the kwargs of pipeline getter methods straight to the create_pipeline method, they should be passed in as a required dictionary argument, "mods", to ensure that they are always passed through.

Pipelines should also be able to take a name_map and store it in the pipeline itself, mapping any connections to inputs and outputs using the name map. This would bring modification functionality that is currently only available in MultiStudy classes to all Study classes and reduce the need to create explicit factory methods (i.e. methods could be modified in sub-classes without altering the base class).
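
A minimal, self-contained sketch of how a "name_map" entry in the proposed "mods" dictionary could remap a pipeline's input and output names (the function and key names are illustrative only, not the actual Arcana API):

    def apply_mods(inputs, outputs, mods):
        """Return (inputs, outputs) with any name_map in 'mods' applied."""
        name_map = mods.get('name_map', {})

        def remap(names):
            return [name_map.get(n, n) for n in names]

        return remap(inputs), remap(outputs)


    inputs, outputs = apply_mods(
        inputs=['t1', 't2'], outputs=['t2_reg'],
        mods={'name_map': {'t1': 'mprage', 't2_reg': 'flair_reg'}})
    print(inputs, outputs)  # ['mprage', 't2'] ['flair_reg']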

Pre-computed outputs check is not working properly in local archive

When restarting a crashed job, the check of the local archive always returns nothing to do because the final output is already there. This happens even when only some of the pre-computed outputs exist, and to run the pipeline I have to delete both the outputs from the archive and the working directory.

Add method to extract data to FileFormat

FileFormat should either have an optional argument that takes a function that can extract a data array from a given dataset (so it can be plotted), or, alternatively, the formats for which this is possible should extend the FileFormat class.
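
A sketch of the two options, where FileFormat is a stand-in for the real class and the data_loader argument is a hypothetical name:

    import numpy as np


    class FileFormat:

        def __init__(self, name, extension, data_loader=None):
            self.name = name
            self.extension = extension
            # Option 1: an optional callable returning an array from a path.
            self._data_loader = data_loader

        def load_array(self, path):
            if self._data_loader is None:
                raise NotImplementedError(
                    "'{}' format cannot be loaded as an array".format(self.name))
            return self._data_loader(path)


    # Option 2: formats for which extraction is possible extend FileFormat.
    class NiftiFormat(FileFormat):

        def load_array(self, path):
            import nibabel as nib  # assumed to be available for this sketch
            return np.asarray(nib.load(path).dataobj)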

passing pipeline requirements to map_node

It seems that when you use pipeline.create_map_node, the requirements that you specify when the map node is created are not passed to all the sub-nodes. This causes a "command not found" error for all of them.
Check whether it is possible to pass the requirements to all the sub-nodes.

Split data spec into Acquired, Derived and Input/Output

Currently, data specs are determined to be acquired or derived depending on whether a pipeline name is provided or not. It would probably be cleaner to define a separate class for each, as they have different attributes and methods. Also, the specs that are provided as inputs/outputs to Pipeline.__init__ should probably be named as such, i.e. DatasetInput or DatasetOutput.
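
An illustrative sketch of how the split might look; the class names and attributes below are hypothetical:

    class BaseDataSpec:

        def __init__(self, name, file_format):
            self.name = name
            self.file_format = file_format


    class AcquiredDataSpec(BaseDataSpec):
        """Spec for data supplied to the study as an input (no pipeline)."""


    class DerivedDataSpec(BaseDataSpec):
        """Spec for data produced by one of the study's own pipelines."""

        def __init__(self, name, file_format, pipeline_name):
            super().__init__(name, file_format)
            self.pipeline_name = pipeline_name


    # Specs passed as inputs/outputs to Pipeline.__init__ would be named as such.
    class DatasetInput(BaseDataSpec):
        pass


    class DatasetOutput(BaseDataSpec):
        pass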

Rework pipeline syntax to match Nipype 2.0

With Nipype updating its syntax, it could be easier and simpler to jump ahead and implement the new syntax in Arcana now, to avoid having to change it later once more pipelines have been written.

In addition to environment modules run nodes within containers

In addition to the environment-modules loading code (or perhaps in preference to it), run all pipeline nodes within Singularity containers if Singularity is present.

URIs to the singularity containers can be kept in Requirement objects.

For environment modules, some code is needed to map version names when creating a Runner.
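
A sketch of how a container URI and a module-version map might hang off a Requirement; the attribute names and the example URI below are assumptions, not the existing API:

    class Requirement:

        def __init__(self, name, min_version, container_uri=None,
                     module_version_map=None):
            self.name = name
            self.min_version = min_version
            # Singularity image to run the node in, when Singularity is present.
            self.container_uri = container_uri
            # Maps canonical version strings to the names used by the local
            # environment-modules installation (applied when creating a Runner).
            self.module_version_map = module_version_map or {}


    fsl_req = Requirement(
        'fsl', '5.0.9',
        container_uri='docker://example.org/fsl:5.0.9',  # illustrative URI
        module_version_map={'5.0.9': 'fsl/5.0.9-gcc5'})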

Dataset matching via DICOM fields

Datasets should be able to be distinguished (matched) on the basis of their DICOM fields (e.g. GRE field-mapping phase or magnitude). The new DatasetMatch infrastructure should make this possible.
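
A self-contained sketch of the idea, matching on DICOM header values as well as a scan-name pattern; the DatasetMatch shown here is a stand-in for the real class, and the header dict would normally be read with something like pydicom:

    import re


    class DatasetMatch:

        def __init__(self, name, pattern, dicom_values=None):
            self.name = name
            self.pattern = re.compile(pattern)
            self.dicom_values = dicom_values or {}

        def matches(self, scan_name, header):
            # Require both the name pattern and all DICOM criteria to match.
            if not self.pattern.match(scan_name):
                return False
            return all(header.get(k) == v for k, v in self.dicom_values.items())


    phase_match = DatasetMatch(
        'field_map_phase', r'gre_field_mapping.*',
        dicom_values={'ImageType': ['ORIGINAL', 'PRIMARY', 'P', 'ND']})
    print(phase_match.matches(
        'gre_field_mapping_3mm',
        {'ImageType': ['ORIGINAL', 'PRIMARY', 'P', 'ND']}))  # True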

Allow explicit Dataset and Fields to be passed as Study inputs

Allow explicit Dataset and Field objects to be passed to Study inputs (in ExplicitDatasets|ExplicitFields objects). Iterables of Dataset|Field objects should be able to be passed as an input when using the dictionary input form.

The match(subject_id=None, visit_id=None) method will need to be implemented, drawing the appropriate subject and visit IDs from the provided datasets (although these will typically be of 'per_project' frequency).

This will enable templates (e.g. atlases) to be passed to studies as inputs. A class attribute default_inputs could also be used to specify default templates to use for particular pipelines.
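
A self-contained sketch of the proposed ExplicitDatasets container and its match() method; the Dataset tuple and its fields are assumptions for illustration:

    from collections import namedtuple

    Dataset = namedtuple('Dataset', ['name', 'path', 'subject_id', 'visit_id'])


    class ExplicitDatasets:

        def __init__(self, datasets):
            # Index by (subject_id, visit_id); 'per_project' items such as
            # atlas templates have both IDs set to None.
            self._index = {(d.subject_id, d.visit_id): d for d in datasets}

        def match(self, subject_id=None, visit_id=None):
            try:
                return self._index[(subject_id, visit_id)]
            except KeyError:
                # Fall back to a project-wide entry (e.g. an atlas template).
                return self._index[(None, None)]


    atlas = Dataset('atlas', '/path/to/MNI152_T1_2mm.nii.gz', None, None)
    print(ExplicitDatasets([atlas]).match(subject_id='S01', visit_id='V01').name)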

BidsSelector input to FilesetSpec

  • Add a bids kwarg to FilesetSpec to pass a BidsSelector object.
  • Also add a bids_run kwarg to Study and pass it on to the selector when the class is initialised (see the sketch below).
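
A self-contained sketch of what a BidsSelector passed via the proposed bids kwarg might resolve to; the class itself and the entities used (modality, suffix, run) are illustrative:

    class BidsSelector:

        def __init__(self, modality, suffix, run=None):
            self.modality = modality
            self.suffix = suffix
            self.run = run  # default run, overridable by the Study's bids_run

        def path(self, subject_id, session_id, run=None):
            run = run if run is not None else self.run
            prefix = ['sub-{}'.format(subject_id), 'ses-{}'.format(session_id)]
            entities = list(prefix)
            if run is not None:
                entities.append('run-{:02d}'.format(run))
            fname = '_'.join(entities + [self.suffix]) + '.nii.gz'
            return '/'.join(prefix + [self.modality, fname])


    selector = BidsSelector(modality='anat', suffix='T1w')
    print(selector.path('01', 'pre'))
    # sub-01/ses-pre/anat/sub-01_ses-pre_T1w.nii.gz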

Assertion error when trying to run motion detection using XNATArchive

Error message:
File "run_motion_detection.py", line 72, in
run_md(args.input_dir, dynamic=args.dynamic_md, xnat_id=args.xnat_id)
File "run_motion_detection.py", line 48, in run_md
visit_ids=[session_id], work_dir=WORK_PATH)
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/pipeline.py", line 189, in run
self.connect_to_archive(complete_workflow, **kwargs)
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/pipeline.py", line 310, in connect_to_archive
visit_ids=visit_ids)
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/archive/xnat.py", line 771, in project
processed=processed),
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/archive/xnat.py", line 888, in _get_datasets
multiplicity=mult, location=None))
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/dataset.py", line 161, in init
super(Dataset, self).init(name, format, multiplicity)
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/dataset.py", line 86, in init
super(BaseDataset, self).init(name=name, multiplicity=multiplicity)
File "/Users/francescosforazzini/git/NiAnalysis/nianalysis/dataset.py", line 24, in init
assert isinstance(name, basestring)
AssertionError

If I use the local archive instead of the XNAT archive, everything works fine.
To reproduce this error, you can run the script called assertion_error.py in my mbi-analysis branch (resting_state); it is in mbi-analysis/debug.

Create modified SlurmPlugin to submit only long jobs to the queue

Create a custom SlurmPlugin (that could also work with SGE) that only submits long jobs to the queue and processes short bookkeeping nodes locally.

Implement submission over SSH with paramiko, so the main pipeline container can run on the XNAT server.

Optionally send progress information to a PIMS server
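
A self-contained sketch of the submission policy only (not an actual nipype plugin): nodes whose estimated duration exceeds a threshold are sent to the scheduler, while short bookkeeping nodes run in the local process. The est_duration attribute is an assumption, not a nipype field:

    from types import SimpleNamespace

    SHORT_JOB_THRESHOLD = 300  # seconds


    def dispatch(node, run_locally, submit_to_queue):
        # Short utility/bookkeeping nodes stay local; everything else is
        # handed to the scheduler (SLURM or SGE).
        if getattr(node, 'est_duration', 0) < SHORT_JOB_THRESHOLD:
            run_locally(node)
        else:
            submit_to_queue(node)


    dispatch(SimpleNamespace(name='merge_outputs', est_duration=5),
             run_locally=lambda n: print('running locally:', n.name),
             submit_to_queue=lambda n: print('submitting to SLURM:', n.name))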

Runner classes

Instead of pipelines being run explicitly, they should be run when a dataset/field is requested. For this to happen, the Study object needs its own "Runner" object to determine how and where the processing pipelines are run (e.g. locally in a single or multiple processes, or submitted to a SLURM scheduler).
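
A minimal, self-contained sketch of the "run on request" idea, in which the Study asks its Runner to execute the generating pipeline only when a derived item is requested; all names here are illustrative, not the Arcana API:

    class LocalRunner:

        def run(self, pipeline_name):
            print('running {} in the local process'.format(pipeline_name))


    class Study:

        def __init__(self, runner, derived_specs):
            self._runner = runner
            self._derived_specs = derived_specs  # name -> generating pipeline
            self._cache = {}

        def data(self, name):
            if name not in self._cache:
                # Derive on demand; a SlurmRunner could submit the job instead.
                self._runner.run(self._derived_specs[name])
                self._cache[name] = '<derived {}>'.format(name)
            return self._cache[name]


    study = Study(LocalRunner(), {'t2_reg': 'registration'})
    print(study.data('t2_reg'))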

ID mapping in Repository object

Repositories that are to be combined into a single study may not share the same ID scheme, and for XNAT repositories the session ID can depend on the subject and project IDs (at least in the case of MBI-XNAT). So there needs to be a custom way to map between the IDs provided to the study and those of the repository.

A pair of lambda functions or an IDMapper object might be a good solution.
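
A sketch of the lambda-pair idea: callables that translate the IDs used by the study into repository-specific IDs (the MBI-XNAT-style scheme shown here is only illustrative):

    # Translate study-level subject/visit IDs into repository IDs.
    subject_id_map = lambda project, subject: '{}_{:03d}'.format(
        project, int(subject))
    session_id_map = lambda project, subject, visit: '{}_{:03d}_{}'.format(
        project, int(subject), visit)

    print(subject_id_map('MRH017', '1'))           # MRH017_001
    print(session_id_map('MRH017', '1', 'MR01'))   # MRH017_001_MR01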

Add study meta class

Similar to the MultiStudyMetaClass, a meta class should be written that all Study classes use to construct class members such as data_specs and default_options.
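
An illustrative sketch of a StudyMetaClass that merges add_data_specs lists declared on a class and its bases into a single data_specs dict; the attribute names are assumptions modelled on MultiStudyMetaClass:

    from collections import namedtuple

    DataSpec = namedtuple('DataSpec', ['name', 'file_format'])


    class StudyMetaClass(type):

        def __new__(mcs, name, bases, dct):
            specs = {}
            # Collect specs from base classes first, then this class.
            for base in reversed(bases):
                specs.update(getattr(base, 'data_specs', {}))
            for spec in dct.get('add_data_specs', []):
                specs[spec.name] = spec
            dct['data_specs'] = specs
            return super().__new__(mcs, name, bases, dct)


    class BaseStudy(metaclass=StudyMetaClass):
        add_data_specs = [DataSpec('t1', 'nifti_gz')]


    class MyStudy(BaseStudy):
        add_data_specs = [DataSpec('t2', 'nifti_gz')]


    print(sorted(MyStudy.data_specs))  # ['t1', 't2']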

Names for *Match objects taken from dictionary, format from found files

  • When passing inputs to a study as a dictionary, *Match objects should be allowed to omit their names and just take the name from the dictionary key.

  • Similarly, formats should be allowed to be optional and detected from the matches themselves (and checked against the provided format if one is given); see the sketch below.
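
A self-contained sketch of the dictionary-input form, where names come from the dictionary keys and formats may be left out to be detected later; the FilesetMatch shown is a stand-in with only the fields needed here:

    class FilesetMatch:

        def __init__(self, pattern, file_format=None, name=None):
            self.pattern = pattern
            self.file_format = file_format
            self.name = name


    inputs = {
        't1': FilesetMatch(pattern=r'.*mprage.*'),
        't2': FilesetMatch(pattern=r'.*t2_space.*'),
    }

    for name, match in inputs.items():
        if match.name is None:
            match.name = name  # name taken from the dictionary key
        # A file_format of None would later be detected from the matched
        # files and cross-checked against any explicitly provided format.

    print([m.name for m in inputs.values()])  # ['t1', 't2']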

Move derived inputs into sessions named by study

Instead of being stored in *_PROC sessions (on XNAT), derived data should be stored in separate sessions for each study, e.g. MRH017_001_MR01_MYANALYSIS.

This would allow us to remove the study-name prefix from the derived datasets/fields, although care will have to be taken to allow it to remain for sub-study prefixes.

For the local archive, the derived outputs should be stored in separate sub-directories.

This should be a bit neater but will also make it easier to store and retrieve provenance data.

BaseDataset/BaseField objects will need an additional 'study' field to specify which study they were derived from, and the 'get_tree' methods will need to search for all studies that are listed in the input matches.

bug when checking previously computed outputs

It seems there is a bug when trying to restart a crashed workflow using the local archive (I haven't tested with XNAT). When checking the pre-computed outputs, it always returns that there is nothing to do, even though the final output has not yet been produced. To make the pipeline work, you need to delete the cache dir and all the previously produced outputs.
