Spatial transcriptomics (ST) data present challenges such as uneven cell density, low sampling rates, and complex spatial structures. Traditional spot-based analysis strategies struggle to address these issues. STMiner instead explores ST data through the spatial distribution of each gene, which avoids the biases these conditions can introduce into the results.
Here we propose STMiner. The three key steps of analyzing ST data with STMiner are depicted.
(Left top) STMiner first uses Gaussian mixture models (GMMs) to represent the spatial distribution of each gene as well as the overall spatial distribution. (Left bottom) STMiner then identifies spatially variable genes (SVGs) by calculating the cost of transporting the overall spatial distribution to each gene's spatial distribution. Genes with high costs show significant spatial variation, meaning their expression patterns differ considerably across regions of the tissue. A distance array between SVGs is built the same way: genes with similar spatial structures have a low cost of transport to each other, and genes with dissimilar structures have a high cost. (Right) The distance array is embedded into a low-dimensional space by multidimensional scaling (MDS), so that genes with similar spatial expression patterns can be clustered into distinct functional gene sets whose spatial structures can then be recovered.
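The caption does not spell out how the transport cost between two GMMs is computed. One standard formulation, assumed here and not necessarily the one STMiner implements, treats each mixture component as a unit of mass and solves a discrete optimal-transport problem whose ground cost is the closed-form squared 2-Wasserstein distance between two Gaussians. The helper names `gaussian_w2_sq` and `gmm_ot_cost`, and the toy distributions, are hypothetical; each GMM is passed as a `(weights, means, covariances)` tuple.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_sq(m1, c1, m2, c2):
    """Closed-form squared 2-Wasserstein distance between N(m1, c1) and N(m2, c2)."""
    root_c2 = sqrtm(c2)
    cross = np.real(sqrtm(root_c2 @ c1 @ root_c2))  # drop round-off imaginary parts
    return float(np.sum((m1 - m2) ** 2) + np.trace(c1 + c2 - 2 * cross))


def gmm_ot_cost(w_a, mu_a, cov_a, w_b, mu_b, cov_b):
    """Hypothetical OT cost between two GMMs, with Gaussian W2^2 as the ground cost."""
    n, m = len(w_a), len(w_b)
    cost = np.array([[gaussian_w2_sq(mu_a[i], cov_a[i], mu_b[j], cov_b[j])
                      for j in range(m)] for i in range(n)])
    # The transport plan P (n x m) is flattened row-major; row sums must equal w_a
    # and column sums must equal w_b, so all mass of one mixture moves to the other.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        a_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([w_a, w_b])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return float(res.fun)


eye = np.eye(2)
# Toy "overall" distribution: two equal-weight components spread across the tissue
overall = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [10.0, 0.0]]), np.array([eye, eye]))
# Toy "gene" distribution: all mass near the first component
gene = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([eye]))

print(gmm_ot_cost(*overall, *overall))  # approximately 0.0 (identical distributions)
print(gmm_ot_cost(*overall, *gene))     # approximately 50.0 (half the mass moves a squared distance of 100)
```

A spatially concentrated gene receives a higher cost than one that mirrors the overall distribution, which is the intuition behind ranking genes by this cost.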
Please visit STMiner Documents for installation and detailed usage.
```python
from STMiner import SPFinder
```
You can download test data here.
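```python
sp = SPFinder()
file_path = 'Path/to/your/h5ad/file'
sp.read_h5ad(file=file_path)
# Build the sparse (CSR) gene expression array used by the later steps
sp.get_genes_csr_array(min_cells=500, log1p=False)
# Compute each gene's OT distance from the overall distribution (stored in sp.global_distance)
sp.spatial_high_variable_genes()
```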
You can check the distance of each gene with:

```python
sp.global_distance
```
| Gene  | Distance |
|-------|----------|
| geneA | 9998     |
| geneB | 9994     |
| ...   | ...      |
| geneC | 8724     |
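The next step passes `list(sp.global_distance[:1000]['Gene'])`, which takes the first 1000 rows and therefore assumes the table is already sorted by descending distance, as the output above suggests. If the order is uncertain, sorting explicitly is safer. The sketch below uses pandas `DataFrame.nlargest` on a toy DataFrame with the same two columns; the rows and values are illustrative only.

```python
import pandas as pd

# Toy stand-in for sp.global_distance (same columns as above; rows are made up)
global_distance = pd.DataFrame({
    "Gene": ["geneC", "geneA", "geneD", "geneB"],
    "Distance": [8724, 9998, 100, 9994],
})

# Sort by distance explicitly instead of relying on row order, then keep the top n
top_n = 2
top_genes = global_distance.nlargest(top_n, "Distance")["Gene"].tolist()
print(top_genes)  # ['geneA', 'geneB']
```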
```python
# Fit a GMM to each of the first 1000 genes in sp.global_distance
sp.fit_pattern(n_comp=20, gene_list=list(sp.global_distance[:1000]['Gene']))
```
Each gene's GMM has 20 components (`n_comp=20`).
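How STMiner converts per-spot expression into GMM input is not described here. One common approach, assumed in this sketch, is to repeat each spot's coordinate once per count and fit scikit-learn's `GaussianMixture` to the resulting point cloud, since `GaussianMixture.fit` does not accept per-sample weights. The grid and counts are synthetic, and 4 components are used instead of 20 only to keep the toy example small.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy spot coordinates on a 20 x 20 grid
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
# Toy counts for one gene: expressed mainly in the lower-left corner
counts = rng.poisson(lam=np.where((coords[:, 0] < 8) & (coords[:, 1] < 8), 5.0, 0.2))

# Repeat each spot once per count so that higher expression carries more weight
points = np.repeat(coords, counts, axis=0)
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0).fit(points)
print(gmm.weights_.round(2))
print(gmm.means_.round(1))
```

The fitted `weights_`, `means_`, and `covariances_` together describe the gene's spatial distribution as a small set of Gaussians, which is what the transport cost is computed on.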
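```python
# Compute pairwise distances between the fitted gene GMMs
sp.build_distance_array()
# Embed the distance array with MDS (20 dimensions) and cluster the genes into 6 groups
sp.cluster_gene(n_clusters=6, mds_components=20)
```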
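These two calls correspond to the right-hand panel of the overview: pairwise GMM distances are embedded with MDS, and the embedding is clustered (the `kmeans_fit_result` attribute suggests k-means). The sketch below illustrates the idea with classical (Torgerson) MDS written in NumPy and scikit-learn's `KMeans` on a toy distance matrix. STMiner may use a different MDS variant; `classical_mds` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans


def classical_mds(dist, n_components):
    """Classical (Torgerson) MDS: embed points so Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)                  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]   # keep the largest ones
    vals = np.clip(vals[order], 0, None)            # discard negative (non-Euclidean) parts
    return vecs[:, order] * np.sqrt(vals)


# Toy gene-by-gene distance matrix: genes 0-2 and genes 3-5 form two spatial groups
rng = np.random.default_rng(0)
positions = np.vstack([rng.normal(0, 0.1, (3, 2)), rng.normal(5, 0.1, (3, 2))])
dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

features = classical_mds(dist, n_components=2)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```

Embedding first lets k-means, which needs coordinates rather than a distance matrix, operate on genes whose only available description is their pairwise transport cost.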
The result is stored in `genes_labels`:

```python
sp.genes_labels
```
The output looks like the following:
|     | gene_id        | labels |
|-----|----------------|--------|
| 0   | Cldn5          | 2      |
| 1   | Fyco1          | 2      |
| 2   | Pmepa1         | 2      |
| 3   | Arhgap5        | 0      |
| 4   | Apc            | 5      |
| ... | ...            | ...    |
| 95  | Cyp2a5         | 0      |
| 96  | X5730403I07Rik | 0      |
| 97  | Ltbp2          | 2      |
| 98  | Rbp4           | 4      |
| 99  | Hist1h1e       | 4      |
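Each label identifies one spatial pattern, so a natural follow-up is to collect the genes of each pattern into a gene set, for example for enrichment analysis. A minimal pandas sketch, assuming `genes_labels` is a regular DataFrame with `gene_id` and `labels` columns as shown above (the toy rows are copied from that output):

```python
import pandas as pd

# Toy stand-in for sp.genes_labels, using rows from the output above
genes_labels = pd.DataFrame({
    "gene_id": ["Cldn5", "Fyco1", "Pmepa1", "Arhgap5", "Apc", "Rbp4"],
    "labels": [2, 2, 2, 0, 5, 4],
})

# One gene list per spatial pattern label
gene_sets = genes_labels.groupby("labels")["gene_id"].apply(list).to_dict()
print(gene_sets)
# {0: ['Arhgap5'], 2: ['Cldn5', 'Fyco1', 'Pmepa1'], 4: ['Rbp4'], 5: ['Apc']}
```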
```python
sp.get_pattern_array(vote_rate=0.3)
```
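The overview does not define what `vote_rate` controls. Purely as a hypothetical illustration of how a vote threshold can turn a gene cluster into a single spatial pattern (not STMiner's documented behavior), the sketch below marks each spot where at least `vote_rate` of the cluster's genes are detected. `vote_pattern` and the toy matrix are made up for this example.

```python
import numpy as np


def vote_pattern(expr, vote_rate):
    """Hypothetical: mark spots where at least `vote_rate` of the genes are expressed.

    expr: array of shape (n_genes, n_spots) with counts for the genes of one cluster.
    """
    expressed = expr > 0                # binarize each gene's expression per spot
    share = expressed.mean(axis=0)      # fraction of genes "voting" for each spot
    return share >= vote_rate


# Toy counts: 3 genes (rows) measured at 4 spots (columns)
expr = np.array([
    [3, 0, 1, 0],
    [2, 0, 0, 0],
    [5, 1, 0, 0],
])
print(vote_pattern(expr, vote_rate=0.3))  # [ True  True  True False]
```

Under this reading, a lower threshold yields a broader pattern and a higher one keeps only spots shared by most genes in the cluster.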
```python
sp.plot.plot_pattern(vmax=99,
                     heatmap=False,
                     s=5,
                     reverse_y=True,
                     reverse_x=True,
                     image_path='Path/to/your/image.png',
                     rotate_img=True,
                     k=4,
                     aspect=0.55)
```
```python
sp.plot.plot_intersection(pattern_list=[0, 1],
                          image_path='Path/to/your/image.png',
                          reverse_y=True,
                          reverse_x=True,
                          aspect=0.55,
                          s=20)
```
```python
sp.plot.plot_genes(label=0, vmax=99)
```
| Attribute            | Type         | Description                                                   |
|----------------------|--------------|---------------------------------------------------------------|
| adata                | AnnData      | AnnData object holding the loaded spatial data                |
| global_distance      | pd.DataFrame | OT distance between each gene and the overall (background) distribution |
| genes_labels         | pd.DataFrame | Gene names and their pattern labels                           |
| genes_patterns       | dict         | GMM fitted for each gene                                      |
| genes_distance_array | pd.DataFrame | Pairwise distances between the gene GMMs                      |
| kmeans_fit_result    | obj          | Result of k-means clustering                                  |
| mds_features         | pd.DataFrame | Embedding features after MDS                                  |
- Peisen Sun
- Kai Ye