azizilab / starfysh

Spatial Transcriptomic Analysis using Reference-Free auxiliarY deep generative modeling and Shared Histology

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 99.71% Python 0.29% Shell 0.01%

starfysh's Introduction

Starfysh: Spatial Transcriptomic Analysis using Reference-Free deep generative modeling with archetYpes and Shared Histology

Starfysh is an end-to-end toolbox for the analysis and integration of Spatial Transcriptomic (ST) datasets. In summary, the Starfysh framework enables reference-free deconvolution of cell types and cell states and can be improved with the integration of paired histology images of tissues, if available. Starfysh is capable of integrating data from multiple tissues. In particular, Starfysh identifies common or sample-specific spatial “hubs” with unique composition of cell types. To uncover mechanisms underlying local and long-range communication, Starfysh can be used to perform downstream analysis on the spatial organization of hubs.

Tutorials on Colab (recommended)

Quickstart tutorials

Please refer to Starfysh Documentation for additional tutorials & APIs

Installation

Github-version installation:

# Step 1: Clone the Repository
git clone https://github.com/azizilab/starfysh.git

# Step 2: Navigate to the Repository
cd starfysh

# Step 3: Install the Package
pip install .

Model Input:

Spatial Transcriptomics count matrix
Annotated signature gene sets (see example)
(Optional): paired H&E image

Features:

Deconvolving cell types & discovering novel, unannotated cell states
Integrating with histology images and multi-sample integration
Downstream analysis: spatial hub identification, cell-type colocalization networks & receptor-ligand (R-L) interactions

Directories

.
├── data:           Spatial Transcriptomics & synthetic simulation datasets
├── notebooks:      Sample tutorial notebooks
├── starfysh:       Starfysh core model

How to cite Starfysh

Please cite the Starfysh paper published in Nature Biotechnology:

He, S., Jin, Y., Nazaret, A. et al.
Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor–immune hubs.
Nat Biotechnol (2024).
https://doi.org/10.1038/s41587-024-02173-8

BibTeX

@article{He2024,
  title = {Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor–immune hubs},
  ISSN = {1546-1696},
  url = {http://dx.doi.org/10.1038/s41587-024-02173-8},
  DOI = {10.1038/s41587-024-02173-8},
  journal = {Nature Biotechnology},
  publisher = {Springer Science and Business Media LLC},
  author = {He,  Siyu and Jin,  Yinuo and Nazaret,  Achille and Shi,  Lingting and Chen,  Xueer and Rampersaud,  Sham and Dhillon,  Bahawar S. and Valdez,  Izabella and Friend,  Lauren E. and Fan,  Joy Linyue and Park,  Cameron Y. and Mintz,  Rachel L. and Lao,  Yeh-Hsing and Carrera,  David and Fang,  Kaylee W. and Mehdi,  Kaleem and Rohde,  Madeline and McFaline-Figueroa,  José L. and Blei,  David and Leong,  Kam W. and Rudensky,  Alexander Y. and Plitas,  George and Azizi,  Elham},
  year = {2024},
  month = mar 
}

If you have questions, please contact the authors:

Siyu He - [email protected]
Yinuo Jin - [email protected]

starfysh's People

Forkers

yinuojin forget999 codylslater lsudupe siyuh aadimator apricottt duanwei617 aaabioinfo shengxinbaixiaosheng ateeq-khaliq schae211

starfysh's Issues

starfysh/AA.py:230 IndexError: index 1172 is out of bounds for axis 0 with size 1162

-03-28 21:32:23]Computing intrinsic dimension to estimate k...
-03-28 21:32:23]Estimating lower bound of # archetype as 9...
30 components are retained using conditional_number=30.00
-03-28 21:32:52]Calculating UMAPs for counts + Archetypes...
-03-28 21:32:54]Calculating UMAPs for counts + Archetypes...
-03-28 21:32:57]0.7599 variance explained by raw archetypes.
Merging raw archetypes within 20 NNs to get major archetypes
-03-28 21:32:57]Finding 20 nearest neighbors for each archetype...
-03-28 21:32:57]Finding 30 top marker genes for each archetype...

IndexError Traceback (most recent call last)
Cell In[21], line 8
5 arche_df = aa_model.find_archetypal_spots(major=True)
7 # (2). Find marker genes associated with each archetypal cluster
----> 8 markers_df = aa_model.find_markers(display=False)
10 # (3). Map archetypes to closest anchors within r nearest neighbors
11 # Choose the top 5% anchors of each cell type for mapping
12 percent_anchor = 0.05

File ~/miniconda3/envs/bioinfo_env/lib/python3.10/site-packages/starfysh/AA.py:230, in ArchetypalAnalysis.find_markers(self, n_markers, display)
227 for col in self.arche_df.columns:
228 # Annotate in-group (current archetype) vs. out-of-group
229 annots = np.zeros(self.n_spots, dtype=np.int64).astype(str)
--> 230 annots[self.arche_df[col]] = col
231 adata.obs[col] = annots
232 adata.obs[col] = adata.obs[col].astype('category')

IndexError: index 1172 is out of bounds for axis 0 with size 1162

This happens while I'm running Starfysh_tutorial_real.ipynb:

aa_model = AA.ArchetypalAnalysis(adata_orig=adata_normed)
archetype, arche_dict, major_idx, evs = aa_model.compute_archetypes()

#(1). Find archetypal spots & archetypal clusters
arche_df = aa_model.find_archetypal_spots(major=True)

#(2). Find marker genes associated with each archetypal cluster
markers_df = aa_model.find_markers(display=False)

#(3). Map archetypes to closest anchors within r nearest neighbors
#Choose the top 5% anchors of each cell type for mapping
percent_anchor = 0.05
n_ancs = int(percent_anchor*adata.shape[0])

anchors_df = visium_args.get_anchors()
map_df, map_dict = aa_model.assign_archetypes(anchor_df=anchors_df[:n_ancs],
r=n_ancs)

AttributeError: 'VisiumArguments' object has no attribute 'sig_mean_znorm'

I ran the same code as in the tutorials. When I run:
model, loss = utils.run_starfysh(visium_args,
n_repeats=n_repeats,
epochs=epochs,
device=device
)

I get this error:

AttributeError Traceback (most recent call last)
Cell In[14], line 2
1 # Run models
----> 2 model, loss = utils.run_starfysh(visium_args,
3 n_repeats=n_repeats,
4 epochs=epochs,
5 device=device
6 )

File ~\AppData\Roaming\Python\Python311\site-packages\starfysh\utils.py:389, in run_starfysh(visium_args, n_repeats, lr, epochs, batch_size, alpha_mul, poe, device, seed, verbose)
386 dl_func = VisiumDataset
387 train_func = train
--> 389 trainset = dl_func(adata=adata, args=visium_args)
390 trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, drop_last=True)
392 # Running Starfysh with multiple starts

File ~\AppData\Roaming\Python\Python311\site-packages\starfysh\dataloader.py:26, in VisiumDataset.__init__(self, adata, args)
24 x = adata.X if isinstance(adata.X, np.ndarray) else adata.X.A
25 self.expr_mat = pd.DataFrame(x, index=spots, columns=genes)
---> 26 self.gexp = args.sig_mean_znorm
27 self.anchor_idx = args.pure_idx
28 self.library_n = args.win_loglib

AttributeError: 'VisiumArguments' object has no attribute 'sig_mean_znorm'

However, when I run

visium_args = utils.VisiumArguments(adata,
adata_normed,
gene_sig,
img_metadata,
n_anchors=60,
window_size=3,
sample_id=sample_id
)

There are some warnings but no error:

[2024-03-29 15:15:08] Subsetting highly variable & signature genes ...
[2024-03-29 15:15:45] Smoothing library size by taking averaging with neighbor spots...
[2024-03-29 15:15:45] Retrieving & normalizing signature gene expressions...
WARNING: genes are not in var_names and ignored: Index(['IGLV3.25', 'HLA.DRA', 'STRA13', 'IGKV1.5', 'HULC', 'RP1.60O19.1',
'HLA.DPB1'],
dtype='object')
WARNING: genes are not in var_names and ignored: Index(['AP000769.1'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['CLEC3A', 'DSCAM.AS1', 'HLA.DRB1', 'MYEOV2', 'MLLT4'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['FYB', 'IL21'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['CD95L'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['MIR466I', 'MNDAL', 'TMEM55B', 'FAM196B'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['CCL3 CCL3L1'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['CCL3L3', 'SEPP1', 'GPX1'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['FAM26F', 'CCL3L3', 'GPX1'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['IGKV3-15', 'IGHV6-1', 'IGLV1-47', 'IGHV3-30', 'IGLV2-23',
'CH17-224D4.2', 'IGKV1-8', 'IGLV4-69', 'IGKV1-16', 'IGKV3D-11',
'IGHV3-15'],
dtype='object')
WARNING: genes are not in var_names and ignored: Index(['ARG1'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['GPX1', 'FAM26F', 'RP11-1143G9.4', 'CCL3L3'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['CD11b', 'CD11C', 'CD16', 'CD123', 'CD141', 'EpCAM', 'HLA-DR',
'CD172A'],
dtype='object')
WARNING: genes are not in var_names and ignored: Index(['CD123', 'IFNA', 'IFNB', 'TNFA'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['IGF2'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['PRKCDBP'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['LHFP'], dtype='object')
WARNING: genes are not in var_names and ignored: Index(['SDPR', 'PTRF'], dtype='object')
[2024-03-29 15:15:49] Identifying anchor spots (highly expression of specific cell-type signatures)...
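Several of the ignored names in the warnings above contain dots (for example 'HLA.DRA'), which may mean the signature file uses a different naming convention than the dataset. The sketch below lists the signature genes that are missing from the dataset and suggests a replacement when swapping '.' for '-' gives a known gene. It assumes the signature table is a pandas DataFrame with one column per cell type; the helper names and the toy data are hypothetical and not part of Starfysh.

```python
import pandas as pd


def missing_signature_genes(gene_sig: pd.DataFrame, var_names) -> dict:
    """Return, per cell type, the signature genes absent from var_names."""
    known = set(var_names)
    missing = {}
    for cell_type in gene_sig.columns:
        genes = gene_sig[cell_type].dropna().astype(str)
        absent = [g for g in genes if g not in known]
        if absent:
            missing[cell_type] = absent
    return missing


def suggest_renames(missing: dict, var_names) -> dict:
    """Suggest a rename when replacing '.' with '-' yields a known gene."""
    known = set(var_names)
    suggestions = {}
    for genes in missing.values():
        for g in genes:
            candidate = g.replace(".", "-")
            if candidate != g and candidate in known:
                suggestions[g] = candidate
    return suggestions


# Toy data (hypothetical): two cell types and a handful of dataset gene names
gene_sig = pd.DataFrame({
    "B cell": ["CD79A", "HLA.DRA", "IGLV3.25"],
    "T cell": ["CD3E", "FYB", None],
})
var_names = ["CD79A", "HLA-DRA", "CD3E", "IGLV3-25", "MS4A1"]

missing = missing_signature_genes(gene_sig, var_names)
renames = suggest_renames(missing, var_names)
print(missing)
print(renames)
```

Genes that remain missing after such a rename (here 'FYB') would need to be checked by hand, for example against current gene symbol aliases.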

ERROR: Cannot install large-image and large-image[sources]==1.0.0 because these package versions have conflicting dependencies.

Thank you for creating this interesting tool!

I have been struggling to install Starfysh following the installation guide. I get the dependency conflict below.

ERROR: Cannot install large-image and large-image[sources]==1.0.0 because these package versions have conflicting dependencies.

I have tried different versions of Python, both with and without virtual environments. Any help would be greatly appreciated.

Compatibility with Visium HD data?

Hey, I was wondering whether Starfysh works with Visium HD data (over 200-300k spots/bins). I'm getting memory errors both on my local machine and on a server with a large amount of memory, so I'd like to know whether it is optimized for Visium HD data.
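For a rough sense of scale: the tracebacks elsewhere on this page show the data loader converting `adata.X` to a dense array and `get_anchor_spots` calling `adata_sample.to_df()`, both of which materialize a spots-by-genes matrix. The sketch below estimates the size of one such dense copy. The gene counts and the float32 assumption are illustrative, not measured from Starfysh.

```python
def dense_gib(n_spots: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Memory in GiB for one dense n_spots x n_genes matrix (float32 by default)."""
    return n_spots * n_genes * bytes_per_value / 2**30


# Illustrative sizes (assumptions): a standard Visium slide vs. a Visium HD slide
standard = dense_gib(5_000, 18_000)
hd = dense_gib(300_000, 18_000)

print(f"standard Visium: {standard:.2f} GiB per dense copy")
print(f"Visium HD:       {hd:.2f} GiB per dense copy")
```

Each temporary dense copy adds to this figure, so subsetting genes or bins before densifying is one way to reduce the footprint.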

Thank you

Feature request: enhance plot_spatial_gene

Hi, it would be great to be able to pass point_size and scale (log vs. linear, etc.) to the plot_spatial_gene function. I tried changing figsize, to no avail.

Thanks!

run_starfysh() got an unexpected keyword argument 'patience'

The argument 'patience' might have been removed. I can't find it in the run_starfysh source code.
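One way to check which keyword arguments an installed version accepts is to inspect the function signature before calling it. The sketch below uses a stand-in function whose parameter names are copied from the tracebacks on this page; its default values and the `supported_kwargs` helper are hypothetical.

```python
import inspect


def run_starfysh(visium_args, n_repeats=3, lr=1e-3, epochs=100, batch_size=32,
                 alpha_mul=50, poe=False, device="cpu", seed=0, verbose=True):
    """Stand-in only: parameter names follow the tracebacks, defaults are invented."""
    return None


def supported_kwargs(func, kwargs: dict) -> dict:
    """Keep only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means every keyword is accepted
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


requested = {"n_repeats": 3, "epochs": 200, "patience": 50}
accepted = supported_kwargs(run_starfysh, requested)
dropped = sorted(set(requested) - set(accepted))

print("accepted:", accepted)
print("dropped: ", dropped)
```

With the real package, the same helper could be pointed at `utils.run_starfysh` to see whether `patience` is still part of the installed signature.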

Error when using 'utils_integrate.run_starfysh'

Thank you for the nice tool!
I was trying to integrate ST Visium data, and all the previous steps went well until I got this error at the model training step.

[2024-04-18 22:36:37] Running Starfysh with 1 restarts, choose the model with best parameters...
[2024-04-18 22:36:37] === Restart Starfysh 1 ===

[2024-04-18 22:36:37] Initializing model parameters...

RuntimeError Traceback (most recent call last)
Cell In[26], line 5
3 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
4 ###
----> 5 model, loss = utils_integrate.run_starfysh(integrated_args,
6 n_repeats=n_repeats,
7 epochs=epochs,
8 poe=True,
9 device=device
10 )
11 # Save model
12 #torch.save(model.state_dict(), os.path.join(data_path, 'integ_model.pt'))

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/starfysh/utils_integrate.py:480, in run_starfysh(visium_args, n_repeats, lr, epochs, batch_size, alpha_mul, poe, device, verbose)
477 scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)
479 for epoch in range(epochs):
--> 480 result = train_func(model, trainloader, device, optimizer)
481 torch.cuda.empty_cache()
483 loss_tot, loss_reconst, loss_u, loss_z, loss_c, loss_n, corr_list = result

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/starfysh/starfysh.py:821, in train_poe(model, dataloader, device, optimizer)
819 inference_outputs = model.inference(x) # inference for 1D expr. data
820 generative_outputs = model.generative(inference_outputs, xs_k)
--> 821 img_outputs = model.predictor_img(img) # inference & generative for 2D img. data
822 poe_outputs = model.predictor_poe(inference_outputs, img_outputs) # PoE generative outputs
824 # Check for NaNs

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/starfysh/starfysh.py:507, in AVAE_PoE.predictor_img(self, y)
504 y_n = torch.log1p(y)
506 # q(z | y)
--> 507 hidden_z = self.img_z_enc(y_n)
508 qz_m = self.img_z_enc_m(hidden_z)
509 qz_logv = self.img_z_enc_logv(hidden_z)

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1510 else:
-> 1511 return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
1515 # If we don't have any hooks, we want to skip the rest of the logic in
1516 # this function, and just call forward.
1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1518 or _global_backward_pre_hooks or _global_backward_hooks
1519 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520 return forward_call(*args, **kwargs)
1522 try:
1523 result = None

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/torch/nn/modules/container.py:217, in Sequential.forward(self, input)
215 def forward(self, input):
216 for module in self:
--> 217 input = module(input)
218 return input

File ~/miniconda3/envs/stlearn/lib/python3.8/site-packages/torch/nn/modules/linear.py:116, in Linear.forward(self, input)
115 def forward(self, input: Tensor) -> Tensor:
--> 116 return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x2028 and 676x256)

I have not been able to solve this issue so far. Looking forward to your help. Thank you!

A more complete tutorial is needed

I found that v1.1.1 has a data integration function.

run_starfysh error with poe argument

When I run the run_starfysh function with the poe argument, following the tutorial, I get an error:

{
"name": "RuntimeError",
"message": "mat1 and mat2 shapes cannot be multiplied (32x3072 and 1024x256)",
"stack": "---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[42], line 1
----> 1 model, loss = utils.run_starfysh(visium_args,
2 n_repeats=n_repeats,
3 epochs=epochs,
4 poe=True,
5 device=device
6 )

File ~/.local/lib/python3.9/site-packages/Starfysh-1.1.1-py3.9.egg/starfysh/utils.py:437, in run_starfysh(visium_args, n_repeats, lr, epochs, batch_size, alpha_mul, poe, device, seed, verbose)
435 try:
436 for epoch in range(epochs):
--> 437 result = train_func(model, trainloader, device, optimizer)
438 torch.cuda.empty_cache()
440 loss_tot, loss_reconst, loss_u, loss_z, loss_c, loss_n, corr_list = result

File ~/.local/lib/python3.9/site-packages/Starfysh-1.1.1-py3.9.egg/starfysh/starfysh.py:821, in train_poe(model, dataloader, device, optimizer)
819 inference_outputs = model.inference(x) # inference for 1D expr. data
820 generative_outputs = model.generative(inference_outputs, xs_k)
--> 821 img_outputs = model.predictor_img(img) # inference & generative for 2D img. data
822 poe_outputs = model.predictor_poe(inference_outputs, img_outputs) # PoE generative outputs
824 # Check for NaNs

File ~/.local/lib/python3.9/site-packages/Starfysh-1.1.1-py3.9.egg/starfysh/starfysh.py:507, in AVAE_PoE.predictor_img(self, y)
504 y_n = torch.log1p(y)
506 # q(z | y)
--> 507 hidden_z = self.img_z_enc(y_n)
508 qz_m = self.img_z_enc_m(hidden_z)
509 qz_logv = self.img_z_enc_logv(hidden_z)

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/container.py:215, in Sequential.forward(self, input)
213 def forward(self, input):
214 for module in self:
--> 215 input = module(input)
216 return input

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x3072 and 1024x256)"
}
I don't know what mat1 and mat2 are.
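In `F.linear(input, weight, bias)`, mat1 is the input batch (batch size by input features) and mat2 is the transposed weight of the linear layer (expected input features by output features). Here the layer expects 1024 features per patch but receives 3072, which is exactly 3 x 1024; the integration report on this page shows the same pattern (2028 = 3 x 676). That is consistent with 3-channel image patches being flattened for an encoder sized for a single channel, although this is an inference from the numbers and not a confirmed diagnosis. The sketch below reproduces the shape arithmetic with NumPy as a stand-in for PyTorch; the 32 x 32 patch size and the channel-major layout are assumptions.

```python
import numpy as np

batch_size = 32
patch_side = 32   # assumption: 32 x 32 pixel patches (32 * 32 = 1024)
n_channels = 3    # assumption: RGB patches
hidden = 256

# "mat2": transposed Linear weight, 1024 x 256
weight_t = np.zeros((patch_side * patch_side, hidden))
# "mat1": flattened RGB patches, 32 x 3072
rgb_batch = np.zeros((batch_size, n_channels * patch_side * patch_side))

try:
    rgb_batch @ weight_t          # inner dimensions 3072 vs. 1024 do not match
    mismatch = False
except ValueError:
    mismatch = True

# Illustration only: collapsing the channel axis yields a compatible shape.
# Whether averaging channels is appropriate for the model is a separate question.
gray_batch = rgb_batch.reshape(batch_size, n_channels, -1).mean(axis=1)
out = gray_batch @ weight_t

print(mismatch, gray_batch.shape, out.shape)
```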

Data availability

Hi, Thank you for the very useful tool. Where can I find the spatial transcriptomic data used (or if not available do you have a rough date for when it will be?) Thanks!

Issue with utils.VisiumArguments

I did all the previous steps as the tutorial shows. When I run

visium_args = utils.VisiumArguments(adata, adata_normed,
gene_sig,
img_metadata,
n_anchors=60,
window_size=3,
sample_id=sample_id)

The error occurs:
[2024-03-29 12:08:42] Subsetting highly variable & signature genes ...
[2024-03-29 12:09:13] Smoothing library size by taking averaging with neighbor spots...
[2024-03-29 12:09:13] Retrieving & normalizing signature gene expressions...
[2024-03-29 12:09:13] Identifying anchor spots (highly expression of specific cell-type signatures)...

ValueError Traceback (most recent call last)
Cell In[112], line 1
----> 1 visium_args = utils.VisiumArguments(adata, adata_normed,
2 gene_sig,
3 img_metadata,
4 n_anchors=60,
5 window_size=3,
6 sample_id=sample_id)

File ~\AppData\Roaming\Python\Python311\site-packages\starfysh\utils.py:123, in VisiumArguments.__init__(self, adata, adata_norm, gene_sig, img_metadata, **kwargs)
119 # self.sig_mean_znorm = self._norm_sig(z_axis=self.params['z_axis'])
120
121 # Get anchor spots
122 LOGGER.info('Identifying anchor spots (highly expression of specific cell-type signatures)...')
--> 123 anchor_info = get_anchor_spots(self.adata,
124 self.sig_mean_znorm,
125 v_low=self.params['vlow'],
126 v_high=self.params['vhigh'],
127 n_anchor=self.params['n_anchors']
128 )
129 self.pure_spots, self.pure_dict, self.pure_idx = anchor_info
130 del self.adata.raw, self.adata_norm.raw

File ~\AppData\Roaming\Python\Python311\site-packages\starfysh\utils.py:782, in get_anchor_spots(adata_sample, sig_mean, v_low, v_high, n_anchor)
746 """
747 Calculate the top anchor spot enriched for the given cell type
748 (determined by normalized expression values from each signature)
(...)
773 Binary indicators of anchor spots (dim: [S, n_anchor])
774 """
775 highq_spots = (((adata_sample.to_df() > 0).sum(axis=1) > np.percentile((adata_sample.to_df() > 0).sum(axis=1), v_low)) &
776 ((adata_sample.to_df()).sum(axis=1) > np.percentile((adata_sample.to_df()).sum(axis=1), v_low)) &
777 ((adata_sample.to_df() > 0).sum(axis=1) < np.percentile((adata_sample.to_df() > 0).sum(axis=1), v_high)) &
778 ((adata_sample.to_df()).sum(axis=1) < np.percentile((adata_sample.to_df()).sum(axis=1), v_high))
779 )
781 pure_spots = np.transpose(
--> 782 sig_mean.loc[highq_spots, :].index[
783 (-np.array(sig_mean.loc[highq_spots, :])).argsort(axis=0)[:n_anchor, :]
784 ]
785 )
786 pure_dict = {
787 ct: spot
788 for (spot, ct) in zip(pure_spots, sig_mean.columns)
789 }
791 adata_pure = np.zeros([adata_sample.n_obs, 1])

File E:\anaconda\Lib\site-packages\pandas\core\indexes\base.py:5386, in Index.__getitem__(self, key)
5384 # Because we ruled out integer above, we always get an arraylike here
5385 if result.ndim > 1:
-> 5386 disallow_ndim_indexing(result)
5388 # NB: Using _constructor._simple_new would break if MultiIndex
5389 # didn't override __getitem__
5390 return self._constructor._simple_new(result, name=self._name)

File E:\anaconda\Lib\site-packages\pandas\core\indexers\utils.py:341, in disallow_ndim_indexing(result)
333 """
334 Helper function to disallow multi-dimensional indexing on 1D Series/Index.
335
(...)
338 in GH#30588.
339 """
340 if np.ndim(result) > 1:
--> 341 raise ValueError(
342 "Multi-dimensional indexing (e.g. obj[:, None]) is no longer "
343 "supported. Convert to a numpy array before indexing instead."
344 )

ValueError: Multi-dimensional indexing (e.g. obj[:, None]) is no longer supported. Convert to a numpy array before indexing instead.
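As the message says, this pandas version refuses to index an `Index` with a 2-D array, which is what the `argsort(...)[:n_anchor, :]` result is. A possible workaround, following the hint in the message, is to convert the index to a NumPy array before indexing. The sketch below reproduces the anchor-selection step on toy data; it is a local workaround under that assumption, not necessarily the fix the maintainers chose, and the toy `sig_mean` and `highq_spots` values are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (hypothetical): 10 spots, 3 cell-type signatures
rng = np.random.default_rng(0)
spots = [f"spot_{i}" for i in range(10)]
sig_mean = pd.DataFrame(rng.random((10, 3)), index=spots, columns=["A", "B", "C"])
highq_spots = np.array([True] * 8 + [False] * 2)
n_anchor = 4

subset = sig_mean.loc[highq_spots, :]

# Row positions of the top-n_anchor spots per signature, shape (n_anchor, n_types)
order = (-subset.to_numpy()).argsort(axis=0)[:n_anchor, :]

# Index a NumPy array of spot names (not the pandas Index) with the 2-D positions
pure_spots = np.transpose(subset.index.to_numpy()[order])  # (n_types, n_anchor)

pure_dict = {ct: spot for spot, ct in zip(pure_spots, sig_mean.columns)}
print(pure_dict)
```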

Potential Bug in `IntegrativeDataset`

If I understand the code correctly, IntegrativeDataset is actually used when we run the model without an H&E image, so self.image can (and should) be None, in contrast to IntegrativePoEDataset, where self.image should indeed not be None.
These are the lines I am referring to:

starfysh/starfysh/dataloader.py

Lines 113 to 132 in 7407267

    
    class IntegrativeDataset(VisiumDataset):
        """
        Loading multiple preprocessed ST sample AnnDatas, gene signature & Anchor spots for Starfysh training
        """

        def __init__(
            self,
            adata,
            args,
        ):
            super(IntegrativeDataset, self).__init__(adata, args)
            self.image = args.img
            self.map_info = args.map_info
            self.r = args.params['patch_r']

            assert self.image is not None,\
                "Empty paired H&E image," \
                "please use regular `Starfysh` without PoE integration" \
                "if your dataset doesn't contain histology image"

Normalization of Images

In the paper it says that the "original H&E images are first normalized to $[0,1]$ per channel". But I don't see that in the code if hchannel=False (or rather the normalization is commented out). Referring to the code here:

starfysh/starfysh/utils.py

Line 725 in 7407267

    
           # adata_image = (adata_image-adata_image.min())/(adata_image.max()-adata_image.min())

Did you decide to not scale the image channels, or is this happening somewhere else in the code?
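For reference, the commented-out line scales the whole image with a single global minimum and maximum, which is not the same as the per-channel normalization described in the paper. A minimal sketch of per-channel min-max scaling is shown below; it assumes a height x width x channel array, and the function name and `eps` guard are hypothetical rather than taken from Starfysh.

```python
import numpy as np


def normalize_per_channel(image, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale each channel of an H x W x C image to [0, 1] independently."""
    image = np.asarray(image, dtype=np.float32)
    lo = image.min(axis=(0, 1), keepdims=True)   # per-channel minimum
    hi = image.max(axis=(0, 1), keepdims=True)   # per-channel maximum
    # eps avoids division by zero for a constant channel
    return (image - lo) / np.maximum(hi - lo, eps)


# Toy RGB image (hypothetical) whose third channel has a narrower intensity range
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
img[..., 2] = img[..., 2] // 2

normed = normalize_per_channel(img)
print(normed.min(axis=(0, 1)), normed.max(axis=(0, 1)))
```

With global scaling, the third channel in this toy image would top out near 0.5 instead of 1, which is the difference the question is about.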

bug in find_archetypal_spots when major=True

I have been going through the Starfysh_tutorial_real.ipynb and find it always crashes in this optional code block for when gene signatures aren't provided:

aa_model = AA.ArchetypalAnalysis(adata_orig=adata_normed)
archetype, arche_dict, major_idx, evs = aa_model.compute_archetypes(r=40)

# (1). Find archetypal spots & archetypal clusters
arche_df = aa_model.find_archetypal_spots(major=True)

# (2). Define "signature genes" as marker genes associated with each archetypal cluster
gene_sig = aa_model.find_markers(n_markers=30, display=False)
gene_sig.head()

It crashes in the find_markers function (with IndexError: index 1172 is out of bounds for axis 0 with size 1162), but I believe the error is in find_archetypal_spots. The arche_df object contains values as high as 1172, whereas aa_model.n_spots=1162. If I run find_archetypal_spots as above, but with major=False, then the max value of arche_df is 1161 as expected, and the remaining code runs fine. I've looked through the underlying code a bit and think there is an indexing error in find_archetypal_spots or get_knns (which it calls) but have not quite pinpointed it. Thanks.
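A quick way to confirm this before calling find_markers is to scan the table of archetypal spots for positions outside the valid range. The sketch below assumes, as the traceback suggests, that each column of arche_df holds integer spot positions for one archetype; the helper name and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def out_of_range_entries(arche_df: pd.DataFrame, n_spots: int) -> pd.DataFrame:
    """List entries of arche_df that cannot index an array of length n_spots."""
    values = arche_df.to_numpy()
    bad = (values < 0) | (values >= n_spots)
    rows, cols = np.nonzero(bad)
    return pd.DataFrame({
        "archetype": arche_df.columns[cols],
        "row": rows,
        "spot_index": values[rows, cols],
    })


# Toy table (hypothetical) mimicking the report: 1162 spots, one index of 1172
n_spots = 1162
arche_df = pd.DataFrame({
    "arch_0": [3, 17, 1161],
    "arch_1": [5, 1172, 40],
})

bad = out_of_range_entries(arche_df, n_spots)
print(bad)
```

An empty result means every entry is a valid spot position; any rows returned point at the archetype columns worth tracing back through the neighbor search.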

POE analysis fails on run_starfysh

Great package! When trying to run

model, loss = utils.run_starfysh(visium_args,
                                 n_repeats=n_repeats,
                                 epochs=epochs,
                                 poe=True, 
                                 # Turn on/off for model with/without histology integration
                                 device=device) # 16'

I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/starfysh/utils.py", line 440, in run_starfysh
    result = train_func(model, trainloader, device, optimizer)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/starfysh/starfysh.py", line 821, in train_poe
    img_outputs = model.predictor_img(img)  # inference & generative for 2D img. data
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/starfysh/starfysh.py", line 507, in predictor_img
    hidden_z = self.img_z_enc(y_n)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/shared/apps/anaconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x3072 and 1024x256)

When running with poe = False it works. I attach the model and the img_metadata.
model_img.zip

Thanks for any help!

advice for making a custom signature matrix

Hello, I am interested in using Starfysh on a large number of Visium datasets.
I have a lot of scRNA-seq data with high-resolution cell types and want to deconvolute the spatial data at the same resolution.
I don't think the archetypal method for getting signatures will line up exactly with my fine-grained cell types, so I want to use a custom signature matrix, similar to the one described in the tutorial.
I was wondering if you could provide some advice on how to define signatures, especially between related cell types such as subtypes of CD8 T cells.

Should I just use all genes that pass a certain fold-change cutoff in DGE analysis of the scRNA-seq data, or should I use the genes that are most specific to each cell type? If two cell types, such as subtypes of macrophages, share genes that separate them from other cell types but not from each other, should I include those?

Thank you,
Ido
