jfortin1 / neuroCombat
Harmonization of multi-site imaging data with ComBat (Python)
License: MIT License
I'm trying to use NeuroCombat for segmentation harmonisation. I have GM, WM, and CSF segmentations for many subjects and sites, and am using the individual voxel values as features. As expected, some voxels end up being zero across all sites and subjects even after skull-stripping: in the CSF maps, for example, the voxel values in the central WM regions are zero for all images.
My understanding is that this poses problems for the NeuroCombat model, as these features will have zero mean and variance. I can of course simply remove the voxels that exhibit this behaviour, but I wanted to check that this is the best course of action before doing so.
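Removing the all-zero voxels before harmonization (and restoring them afterwards) can be sketched in plain NumPy; the array contents below are made up for illustration:

```python
import numpy as np

# dat has shape (features, samples), as neuroCombat expects;
# the first voxel is zero for every subject (e.g. CSF values inside WM)
dat = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.2, 1.1, 0.9, 1.0],
    [0.5, 0.7, 0.6, 0.4],
])

# keep only voxels with non-zero variance across subjects/sites
keep = dat.var(axis=1) > 0
dat_filtered = dat[keep]          # safe to pass to ComBat

# after harmonization, re-insert the removed voxels as zeros
restored = np.zeros_like(dat)
restored[keep] = dat_filtered
```

The same `keep` mask lets you map the harmonized features back to their original voxel positions.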
neuroCombat/neuroCombat/neuroCombat.py
Line 107 in cab1853
neuroCombat/neuroCombat/neuroCombat.py
Line 95 in cab1853
The following line solves the bug:
ref_indices = np.argwhere(covars[:, batch_col] == ref_batch).squeeze()
From @FinLouarn at ncullen93/neuroCombat#6
The first example, "Correcting from Numpy Array as Data", works smoothly, but the second example, "Correcting from pandas.DataFrame as Data", fails with the shape-not-aligned error below, unless you feed neuroCombat data.T instead of data.
python 3.7.7
pandas 0.25.3
ValueError Traceback (most recent call last)
in
15 batch_col=batch_col,
16 discrete_cols=discrete_cols,
---> 17 continuous_cols=continuous_cols)
/neuroCombat/neuroCombat/neuroCombat.py in neuroCombat(data, covars, batch_col, discrete_cols, continuous_cols)
97 # standardize data across features
98 print('Standardizing data across features..')
---> 99 s_data, s_mean, v_pool = standardize_across_features(data, design, info_dict)
100
101 # fit L/S models and find priors
/neuroCombat/neuroCombat/neuroCombat.py in standardize_across_features(X, design, info_dict)
159 sample_per_batch = info_dict['sample_per_batch']
160
--> 161 B_hat = np.dot(np.dot(la.inv(np.dot(design.T, design)), design.T), X.T)
162 grand_mean = np.dot((sample_per_batch/ float(n_sample)).T, B_hat[:n_batch,:])
163 var_pooled = np.dot(((X - np.dot(design, B_hat).T)**2), np.ones((n_sample, 1)) / float(n_sample))
<array_function internals> in dot(*args, **kwargs)
ValueError: shapes (8,57) and (22283,57) not aligned: 57 (dim 1) != 22283 (dim 0)
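Given the shapes in the traceback (57 samples, 22283 features), the reported workaround is just an orientation fix: dat must be (features, samples). A minimal sketch, with a random DataFrame standing in for the tutorial data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# rows = samples, columns = features, as pandas data usually comes
data = pd.DataFrame(rng.normal(size=(57, 22283)))

# neuroCombat expects dat of shape (features, samples),
# hence the data.T workaround reported above
dat = data.T.to_numpy()
```

With `dat` oriented this way, the design-matrix products inside `standardize_across_features` align.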
Warning message:
In xtfrm.data.frame(x) : cannot xtfrm data frames
How can I fix this warning message?
Hello,
We are trying to harmonize DTI maps of subjects scanned at 6 different sites. Since most of the participants are patients, to avoid any bias due to the disease we would like to estimate the harmonization using only the healthy controls and then, for each site, apply it to all the other subjects.
Looking at the code, we saw that the function "adjust_data_final" uses the parameters returned in the "estimates" dictionary; however, it does not take the original data as input but "s_data". So, to solve our problem, we thought of doing the following:
Is this approach correct or is there a more straightforward way to do it?
Thank you for the help.
Best,
Ilaria
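The estimate-on-controls / apply-to-everyone pattern described above can be illustrated with a simplified per-site location/scale model in plain NumPy (this is not ComBat itself, and all sizes below are hypothetical; the package's neuroCombatFromTraining function, discussed in another issue on this page, implements the idea for the full model):

```python
import numpy as np

rng = np.random.default_rng(42)
n_feat = 10

# healthy controls from one site, with that site's shift and scale
controls = rng.normal(loc=2.0, scale=1.5, size=(n_feat, 30))

# estimate location/scale parameters on the controls only
loc = controls.mean(axis=1, keepdims=True)
scale = controls.std(axis=1, keepdims=True)

# apply the same estimated transform to every subject of that site,
# patients included (patients simulated from the same site model here)
patients = rng.normal(loc=2.0, scale=1.5, size=(n_feat, 50))
controls_adj = (controls - loc) / scale
patients_adj = (patients - loc) / scale
```

The key point is that the parameters are frozen after fitting on controls; patients are transformed with those same parameters rather than re-estimated ones.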
Hello, we are trying to run ComBat with "raw" BOLD images (one 4D image per subject). The datasets are from 2 different sites: one used a TR of 2 s and the other a TR of 0.8 s, which means that, over the same time range, the 0.8 s TR dataset has more volumes. One dataset has 177 volumes (2 s TR) and the other has 450 volumes (0.8 s TR). In summary, is it possible to use a non-square matrix to run ComBat for these two datasets? The variable in question is the "dat" variable with a (p x n) composition.
Dear Jean-Phillippe,
I was wondering whether I can also use neuroCombat when my ref_batch has a smaller number of subjects than the batch I want to harmonise. My reference batch has 94 subjects and the batch I want to harmonise has 112 subjects, both with 105 radiomic features. How do I implement this in the main code? Thank you for your time!
Greetings, Lieke
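Assuming the current neuroCombat signature with a ref_batch argument (the batch labels below are invented, and the array sizes are chosen to match the numbers in the question), unequal batch sizes are not a problem; you pass the label of the reference batch:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# 105 radiomic features; 94 reference subjects plus 112 to harmonize
dat = rng.normal(size=(105, 94 + 112))
covars = pd.DataFrame({"batch": [1] * 94 + [2] * 112})

try:
    from neuroCombat import neuroCombat
    # batch 1 (94 subjects) is the reference; batch sizes need not match
    out = neuroCombat(dat=dat, covars=covars, batch_col="batch", ref_batch=1)
    harmonized = out["data"]
except ImportError:
    harmonized = None  # neuroCombat not installed in this environment
```

With ref_batch set, the reference batch is left (approximately) unchanged and the other batch is mapped onto it.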
I am a new user of NeuroCombat. We have multi-site DWI images; before ComBat the DTI matrices showed a site effect, and ComBat worked great to remove it. For the same data set we also have a measurement from MRtrix3; this matrix did not show a site effect before ComBat, but I thought I would simply apply the same approach to all the matrices. For this MRtrix3 matrix we found some negative values after ComBat, which does not seem plausible. How could this happen, and is there a way to keep the values from going negative?
Jian
Hello, I was going through the source code and found the neuroCombatFromTraining function. It currently says it is under development, but it seemed to work fine when I tested it on a couple of examples.
Is the under-development tag there because it has not been fully validated, or are there parts that still need to be added? It looks pretty complete from my cursory look at it.
I see in the documentation of the R implementation that the demographic information will be protected. I am not sure whether the same holds for the Python implementation.
Hi,
I would like to try ComBat for harmonizing clinical data from quite a lot of scanners. However, I am struggling with the right inputs. Are sample or dummy data available? Or could you elaborate a bit more, as in the short MATLAB tutorial script?
Best,
Falk
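A self-contained toy example can be built with random numbers (all covariate names and sizes below are invented; note that older releases used discrete_cols instead of categorical_cols, so check your installed version):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# 200 features (e.g. ROI measures) for 40 subjects from 2 scanners
dat = rng.normal(size=(200, 40))
covars = pd.DataFrame({
    "scanner": [1] * 20 + [2] * 20,    # batch variable: which scanner
    "sex": rng.integers(0, 2, 40),     # categorical covariate to preserve
    "age": rng.uniform(20, 80, 40),    # continuous covariate to preserve
})

try:
    from neuroCombat import neuroCombat
    out = neuroCombat(
        dat=dat,                        # shape (features, samples)
        covars=covars,                  # one row per sample
        batch_col="scanner",
        categorical_cols=["sex"],
        continuous_cols=["age"],
    )
    harmonized = out["data"]
except ImportError:
    harmonized = None  # neuroCombat not installed here
```

The essential inputs are a (features x samples) array and a covariates table with one row per sample, including the batch column.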
Please let me know what the scanner/batch covariate is. Thanks
Dear Jean-Philippe,
I was wondering what effect adding variables to be preserved has on the outcome.
I am working with volumetric data in a disorder. So far, I have preserved sex and disorder stage as categorical variables, as well as age and eTIV as continuous variables. As my model appears to change when leaving out, e.g., disorder stage or eTIV as a preserved variable, I was wondering whether there are criteria for deciding which variables should be preserved in the model.
As I regress out eTIV, age, and sex later on in an ANCOVA anyway, I have trouble understanding whether and why they should also be preserved in ComBat.
Any comments on that are appreciated!
Best, Melissa
Dear neuroCombat expert,
I have three fMRI datasets preprocessed with fMRIPrep that have different numbers of volumes due to different TRs: 1) 91x109x91x192 (TR=2510, slice thickness=2.51); 2) 91x109x91x570 (TR=720, slice thickness=0.75); 3) 91x109x91x240 (TR=2000, slice thickness=2). When I convert each 4D image to a vector, I get different vector lengths. Given that neuroCombat requires the same vector length, I wonder how I can reslice/resample the data to obtain the same vector length. Any suggestions are appreciated!
Best wishes
Joyce
Hi there,
Thank you for this amazing project. I was using the ADHD200 dataset to remove site effects. I computed Pearson coefficients for AAL functional connectivities as inputs, and some sites provided biological features such as gender, handedness, and IQ. However, when calling neuroCombat, I got the following error in the function get_beta_with_nan:
LinAlgError: Singular matrix
neuroCombat/neuroCombat/neuroCombat.py
Lines 211 to 221 in ac82a06
I am no expert in maths, but according to this solution it can be attributed to identical rows in the matrix, so the inverse does not exist. I did not fully understand this part of the code, but I guess some patients may share exactly the same biological features, and that caused the issue (an assumption, though). After I changed la.inv to la.pinv, everything works fine now. I was wondering whether you have better insight into this and whether the modification is sound.
Thanks for the help :)
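The singular-matrix situation is easy to reproduce: when two columns of the design matrix are identical (e.g. a covariate encoded redundantly), X^T X has no inverse, while the pseudo-inverse still yields a least-squares solution. A toy illustration, not the package's actual design matrix:

```python
import numpy as np
import numpy.linalg as la

# design matrix with two identical columns -> X^T X is singular
X = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [1., 1., 2.]])
y = np.array([1., 2., 3.])

xtx = X.T @ X
# la.inv(xtx) would raise LinAlgError: Singular matrix here;
# la.pinv returns the Moore-Penrose pseudo-inverse instead
beta = la.pinv(xtx) @ X.T @ y

# the pseudo-inverse solution still reproduces the fitted values
fitted = X @ beta
```

The caveat is that the pseudo-inverse silently picks the minimum-norm solution among infinitely many, so individual coefficients of the redundant columns are not identifiable even though the fit itself is fine.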
Hey Jean-Philippe,
thanks for the nice and helpful software! :) I have successfully used neuroCombat for Desikan-Killiany ROIs already, but I am trying to extend the analysis to voxel-wise data. However, the docs say that dat has to be of shape (features, samples), e.g. cortical thickness measurements or image voxels. How do I put 3D voxel information into one dimension? How can I input voxel-wise data per subject into neuroCombat?
Best, Melissa
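Flattening is just a reshape: stack each subject's 3D volume as one column of dat, and reshape back afterwards. A sketch with a made-up volume size:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects = 5
vol_shape = (4, 4, 3)                 # stand-in for e.g. 91 x 109 x 91

# one 3D volume per subject
imgs = rng.normal(size=(n_subjects, *vol_shape))

# flatten each volume to a 1D feature vector, stack as (features, samples)
dat = imgs.reshape(n_subjects, -1).T  # shape (voxels, subjects)

# after harmonization, each column reshapes back into a 3D volume
first_vol = dat[:, 0].reshape(vol_shape)
```

Combined with a zero-variance mask (as discussed in the all-zero-voxel issue above), this gives a (voxels, subjects) matrix that neuroCombat accepts directly.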
Hi @Jfortin1!
Happy to greet you! First of all, thank you very much for your contributions in the field of harmonization!
I'm trying to use neuroHarmonize, which runs on top of ComBat, but I get an error that I have seen you solve before:
I've tried to modify the combat.py code but I can't get the whole implementation to work.
Could you indicate me which lines of code should I modify please?
The problem is in line 190 in harmonizationLearn.py
Thank you very much in advance!!