
mapca's Introduction

Multi-echo ICA (ME-ICA) Processing of fMRI Data

โš ๏ธ PLEASE NOTE This code base is currently unmaintained. No new feature enhancements or bug fixes will be considered at this time.

We encourage prospective users to instead consider tedana, which maintains and extends many of the multi-echo-specific features of ME-ICA.

For fMRI processing more generally, we refer users to AFNI or fMRIPrep.

This repository is a fork of the (also unmaintained) Bitbucket repository.

Dependencies

  1. AFNI
  2. Python 2.7
  3. numpy
  4. scipy

Installation

Install Python and other dependencies. If you have AFNI installed and on your path, you should already have an up-to-date version of ME-ICA on your path. Running meica.py without any options will check for dependencies and let you know if they are met. If you don't have numpy/scipy (or appropriate versions) installed, I would strongly recommend using the Enthought Canopy Python Distribution. Click here for more installation help.

Important Files and Directories

  • meica.py : a master script that performs preprocessing and calls the ICA/TE-dependence analysis script tedana.py
  • meica.libs : a folder that includes utility functions for TE-dependence analysis for denoising and anatomical-functional co-registration
  • meica.libs/tedana.py : performs ICA and TE-dependence calculations

Usage

Assume the fMRI data are named rest_e1.nii.gz, rest_e2.nii.gz, rest_e3.nii.gz, etc., and the anatomical is mprage.nii.gz.

meica.py and tedana.py have a number of options which you can view using the -h flag.

Here's an example use:

meica.py -d rest1_e1.nii.gz,rest1_e2.nii.gz,rest1_e3.nii.gz -e 15,30,45 -b 15s -a mprage.nii --MNI --prefix sub1_rest

This means:

  • -e 15,30,45 : the echo times in milliseconds
  • -d rest_e1.nii.gz,rest_e2... : the 4-D time series datasets (a comma-separated list, one dataset per TE) from a multi-echo fMRI acquisition
  • -a ... : a "raw" mprage with a skull
  • -b 15s : drop the first 15 seconds of data for equilibration
  • --MNI : warp the anatomical to MNI space using a built-in high-resolution MNI template
  • --prefix sub1_rest : prefix for the final functional output datasets, i.e. sub1_rest_....nii.gz

Again, see meica.py -h for handling other situations such as: anatomical with no skull, no anatomical at all, applying FWHM smoothing, non-linear warp to standard space, etc.

Click here for more info on group analysis.

Output

  • ./meica.rest1_e1/ : contains preprocessing intermediate files. Click here for detailed listing.
  • sub1_rest_medn.nii.gz : 'Denoised' BOLD time series after: basic preprocessing, T2* weighted averaging of echoes (i.e. 'optimal combination'), ICA denoising. Use this dataset for task analysis and resting state time series correlation analysis. See here for information on degrees of freedom in denoised data.
  • sub1_rest_tsoc.nii.gz : 'Raw' BOLD time series dataset after: basic preprocessing and T2* weighted averaging of echoes (i.e. 'optimal combination'). 'Standard' denoising or task analyses can be assessed on this dataset (e.g. motion regression, physio correction, scrubbing, blah...) for comparison to ME-ICA denoising.
  • sub1_rest_mefc.nii.gz : Component maps (in units of \delta S) of accepted BOLD ICA components. Use this dataset for ME-ICR seed-based connectivity analysis.
  • sub1_rest_mefl.nii.gz : Component maps (in units of \delta S) of ALL ICA components.
  • sub1_rest_ctab.txt : Table of component Kappa, Rho, and variance-explained values, plus a listing of component classifications. See here for more info.

For a step-by-step guide on how to assess ME-ICA results in more detail, click here.

Some Notes

  • Make sure your datasets have slice timing information in the header. If not sure, specify a --tpattern option to meica.py. Check AFNI documentation of 3dTshift to see slice timing codes.
  • For more info on T2*-weighted anatomical-functional coregistration, click here
  • FWHM smoothing is not recommended. tSNR boost is provided by optimal combination of echoes. For better overlap of 'blobs' across subjects, use non-linear standard space normalization instead with meica.py ... --qwarp

mapca's People

Contributors

eurunuela, handwerkerd, notzaki, tsalo


mapca's Issues

Documentation missing

I believe we should add a documentation page that clearly explains what the algorithm does.

I think it would make it easier to understand for users of tedana especially, as we receive some questions about it in Neurostars (see this post for example).

Identified part of the issue with component misestimation

I have a dataset where the number of components estimated with any criterion was sometimes way too high or way too low. This is an issue that has been reported by others. I've identified one reason this is happening.

The MAPCA algorithm depends on estimating the spatial dependence of voxels and then running a cost function on a sparse sampling of the voxels that are far enough apart to be independent & identically distributed (i.i.d.). That is, if neighboring voxels have dependence, then apply the cost function to every other voxel in 3D space (i.e. use 1/(2^3) of the total voxels).
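For intuition, here is a minimal, hypothetical sketch of that subsampling idea (not mapca's actual implementation): a subsampling factor of 2 keeps every other voxel along each spatial axis, i.e. 1/(2^3) of the voxels.

    import numpy as np

    def subsample_iid(data_4d, factor):
        """Keep every `factor`-th voxel along each spatial axis of a 4D
        (x, y, z, time) array, so roughly 1/factor**3 of the voxels remain."""
        return data_4d[::factor, ::factor, ::factor, :]

    data = np.random.rand(10, 10, 10, 50)
    print(subsample_iid(data, 2).shape)  # (5, 5, 5, 50)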

I made a branch of mapca that outputs much more logging, including this subsampling factor (sub_iid_sp_median). The attached figure shows this subsampling factor on the x axis and the estimated number of components on the y axis. The different markers are different run types, but all have 300-350 volumes. When the subsampling factor is 1 (i.e. use every voxel), the number-of-components estimate always fails by giving way too many components. When the subsampling factor is 3 (1/27 of the total voxels used for estimation), the estimate fails by being way too low. Given the very large range of estimated components when the subsampling factor is 2, I suspect there are other issues, but this discrete parameter should be constant for data collected with the same acquisition parameters, and bad things happen when it's not.

I'm starting to think about options for completely different approaches to estimating the number of components, but I wanted to document this here. As an intermediate step, we might want to add a few more values, such as this subsampling factor, to the logging.
[Figure: estimated number of components (y axis) vs. subsampling factor sub_iid_sp_median (x axis), by run type]

As a tangent, I realized mapca uses an LGR (logger) but wasn't writing out the log. In my branch above, I changed the LGR declaration to "general" and changed a few other things. MAPCA still didn't output its own log, but the LGR output from mapca was then included in the tedana log.
https://github.com/handwerkerd/mapca/blob/a03727b3346c0f92380b5a1699b7c001d43c6956/mapca/mapca.py#L31
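For reference, here is a minimal sketch of the module-level logger pattern being discussed (names are illustrative, not mapca's exact code); with no handler of its own, the logger's records only show up wherever the calling package, e.g. tedana, has attached handlers.

    import logging

    # Module-level logger; records propagate up to whatever handlers the
    # calling package (e.g. tedana) has configured on the logger hierarchy.
    LGR = logging.getLogger("general")

    def report_subsampling(sub_iid_sp_median):
        # A value like the subsampling factor could be surfaced this way.
        LGR.info("Subsampling factor (sub_iid_sp_median): %s", sub_iid_sp_median)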

Replacing utils._kurtn with scipy.stats.kurtosis

It looks like utils._kurtn(arr) can be replaced with scipy.stats.kurtosis(arr, axis=0, fisher=True).
Any cons with making this replacement?
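As a rough sketch of the proposed replacement (toy data; this mirrors what I understand _kurtn to compute per column, namely excess kurtosis):

    import numpy as np
    from scipy.stats import kurtosis

    data = np.random.rand(100, 5)  # (observations, components)

    # Proposed replacement: Fisher (excess) kurtosis of each column.
    kurt_scipy = kurtosis(data, axis=0, fisher=True)

    # Roughly what _kurtn does per column: demean, divide by the standard
    # deviation, then take the mean of the fourth power minus 3.
    data_norm = (data - data.mean(axis=0)) / data.std(axis=0)
    kurt_manual = (data_norm ** 4).mean(axis=0) - 3

    print(np.allclose(kurt_scipy, kurt_manual))  # True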

A few other notes:

  • The test for _kurtn converts an input with shape (2, 3, 4) into (3, 1). I was expecting the output shape to be (3, 4). Is this intentional? (A quick shape check is sketched after this list.)

        def test_kurtn():
            """
            Unit test for _kurtn function
            """
            test_data = np.random.rand(2, 3, 4)
            kurt = _kurtn(test_data)
            assert kurt.shape == (3, 1)
  • The docstring for _kurtn says that the kurtosis is calculated along the second dimension, but the code calculates kurtosis along every dimension except the second dimension.

    mapca/mapca/utils.py

    Lines 317 to 329 in 0d01a2b

        Returns
        -------
        kurt : (1:N) array-like
            The kurtosis of each vector in x along the second dimension. For
            tedana, this will be the kurtosis of each PCA component.
        """
        kurt = np.zeros((data.shape[1], 1))
        for i in range(data.shape[1]):
            data_norm = detrend(data[:, i], type="constant")
            data_norm /= np.std(data_norm)
            kurt[i] = np.mean(data_norm ** 4) - 3
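A quick, hypothetical check of the shapes in question, assuming the scipy call proposed above:

    import numpy as np
    from scipy.stats import kurtosis

    test_data = np.random.rand(2, 3, 4)

    # scipy collapses only the chosen axis, so axis=0 yields shape (3, 4) ...
    print(kurtosis(test_data, axis=0, fisher=True).shape)  # (3, 4)

    # ... whereas _kurtn returns one value per column of the second dimension,
    # i.e. shape (test_data.shape[1], 1) == (3, 1), which is what the test asserts.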

Operate on images in subsampling function

If the subsampling function operates on images, it will be easier to use nilearn's functions for masking/unmasking the data. See #32.

Original post:

How do you all feel about using nilearn.image.resample_img() for utils.subsampling()? I am working on #29, and I think it would be easier to have a function that downsamples and returns images instead of arrays.
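A minimal sketch of what that could look like (the function name and the affine-scaling approach are illustrative, not an agreed API):

    from nilearn.image import resample_img

    def subsample_img(img, factor):
        """Downsample an image by `factor` along each spatial axis by scaling
        the affine, returning an image rather than a raw array."""
        target_affine = img.affine.copy()
        target_affine[:3, :3] *= factor
        return resample_img(img, target_affine=target_affine,
                            interpolation="nearest")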

Scaler parameters overridden within MovingAveragePCA

@eurunuela in #26, it looks like you commented out code that generated a Scaler within the subsampling procedure, and instead reused the one for the general scaling:

mapca/mapca/mapca.py

Lines 174 to 177 in 6c1d58c

# Perform Variance Normalization
# scaler = StandardScaler(with_mean=True, with_std=True)
dat = self.scaler_.fit_transform(dat.T).T
# data = ((dat.T - dat.T.mean(axis=0)) / dat.T.std(axis=0)).T

My concern is that re-fitting self.scaler_ based on new data will make it invalid for rescaling data back to the original scale, as here:

mapca/mapca/mapca.py

Lines 331 to 332 in 6c1d58c

if self.normalize:
    X_orig = self.scaler_.inverse_transform(X_orig.T).T

Does that make sense, or was there a specific reason you got rid of the original approach? Apologies for not noticing this in #26 and bringing it up while that PR was open.
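A small toy illustration of the concern:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(10, 5, size=(100, 3))      # "original" data
    X_other = rng.normal(0, 1, size=(50, 3))  # data from the subsampling step

    scaler = StandardScaler(with_mean=True, with_std=True)
    X_scaled = scaler.fit_transform(X)

    # Re-fitting the same scaler on different data overwrites its mean_ and
    # scale_, so inverse_transform no longer recovers the original X.
    scaler.fit_transform(X_other)
    X_back = scaler.inverse_transform(X_scaled)
    print(np.allclose(X_back, X))  # False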

maPCA in tedana does not currently z-score data before performing PCA

As discussed in tedana #636, the maPCA in tedana currently does not normalize data before performing PCA.

@tsalo already added the normalize option when he restructured the toolbox to be sklearn-compatible. However, as I mentioned in the PR, I think we should add an if statement that checks whether the data are already z-scored and normalizes them when they aren't. I'd remove the normalize option and do the normalization by default.
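A minimal sketch of the kind of check being proposed (the tolerance is illustrative):

    import numpy as np

    def zscore_if_needed(data, tol=1e-3):
        """Z-score each column unless the data already look z-scored
        (mean ~ 0 and standard deviation ~ 1 along the first axis)."""
        means = data.mean(axis=0)
        stds = data.std(axis=0)
        if np.allclose(means, 0, atol=tol) and np.allclose(stds, 1, atol=tol):
            return data
        return (data - means) / stds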

Drop spatial elements from sklearn-style class

Currently we pass around shape_3d (a three-element tuple with the shape of the image) and mask_vec (a 1D array indicating whether a given voxel falls within the mask) throughout the MovingAveragePCA class, and I would love to support 2D data, if that's possible.

Make class accept images instead of arrays

Originally, we wanted to have the function operate on images and the class operate on arrays, in case we could get rid of the spatial elements of the algorithm (see #5). Since that's not possible anymore, the way we pass around a 3D shape tuple and a 1D mask vector is more than a little awkward. We can just move the image-related code from the function to the class and pass in a data image and a mask image to clean up the interface.
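A rough sketch of the interface change, using nilearn's masking utilities (the method body is illustrative, not the final design):

    from nilearn.masking import apply_mask, unmask

    class MovingAveragePCA:
        # Hypothetical image-based fit: mask the 4D image internally instead of
        # passing shape_3d and mask_vec alongside a 2D array.
        def fit(self, img, mask_img):
            data = apply_mask(img, mask_img)  # (n_volumes, n_voxels_in_mask)
            self.mask_img_ = mask_img         # kept so results can be unmasked later
            # ... run the PCA on `data`, then unmask component maps, e.g.:
            # self.u_img_ = unmask(self.u_.T, mask_img)
            return self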

[BUG] Dimensions of masked data not considered for fit_transform

While working on the integration test, I saw there is a little bug in the code that probably comes from losing the context we had in tedana.

Basically, we are not considering that the data have been masked when giving shape3d to fit_transform(). We're currently giving the original dimensions when we should be giving the masked ones.

Edit: The error comes from the fact that apply_mask from nilearn only keeps the values inside the mask, while mapca in tedana keeps everything.
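A toy illustration of the difference (shapes made up for the example):

    import numpy as np
    import nibabel as nib
    from nilearn.masking import apply_mask

    img = nib.Nifti1Image(np.random.rand(10, 10, 10, 20), affine=np.eye(4))
    mask = nib.Nifti1Image(
        (np.random.rand(10, 10, 10) > 0.5).astype(np.int8), affine=np.eye(4)
    )

    data = apply_mask(img, mask)
    print(img.shape)   # (10, 10, 10, 20) -- the original dimensions
    print(data.shape)  # (20, n_in_mask)  -- only in-mask voxels remain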

Tests fail due to make command not being installed

As of today, tests fail due to the following error:

Package make is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'make' has no installation candidate

Exited with code exit status 100

Set up continuous integration

I copied the relevant tests from tedana over but we'll need to set up CI with workflows for the unit tests, coverage, and linting.

Deploy to PyPI

We'll want to deploy to PyPI to make it easier to use this package in tedana and other tools.

Add license

We need to incorporate information for the appropriate license from GIFT.

Update package installation and minimum Python version

Working on #59 has made me realize we probably want to increase the minimum Python version we support if we want to stay compatible with new versions of scikit-learn.

I would also follow the installation changes we did in tedana and move everything to the pyproject.toml file.

`n_features_` from sklearn's PCA is deprecated

n_features_ from sklearn's PCA is deprecated and has to be updated to n_features_in_.

/export/home/mflores/.conda/envs/tedana/lib/python3.9/site-packages/sklearn/utils/deprecation.py:101: FutureWarning: Attribute `n_features_` was deprecated in version 1.2 and will be removed in 1.4. Use `n_features_in_` instead.
  warnings.warn(msg, category=FutureWarning)
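The change would presumably be a one-line attribute swap along these lines (attribute names as in scikit-learn; the surrounding code is illustrative):

    import numpy as np
    from sklearn.decomposition import PCA

    pca = PCA(n_components=3).fit(np.random.rand(50, 10))

    # Deprecated in scikit-learn 1.2 and removed in 1.4:
    # n_features = pca.n_features_
    # Replacement:
    n_features = pca.n_features_in_
    print(n_features)  # 10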

Document typical results for different datasets/acquisition parameters

This stems from ME-ICA/tedana#849 and is related to #42. Basically, I think it would be awesome if we had some idea of how many components are "typical" for the different criteria, depending on a few factors, such as (1) number of volumes, (2) temporal resolution, and (3) spatial resolution. We could then plot and share those results in the MAPCA documentation, much like how the tedana documentation includes distributions of typical ME-EPI parameters in the literature.

Codecov not working

I thought #11 would fix the Codecov issue but apparently, it didn't.

The CircleCI config looks good to me. I'll probably look into it tomorrow.
