Comments (10)
Thanks for those recommendations. Yes, this makes it clearer, and I'm continuing to experiment with the DGE and prediction analyses by drawing more samples, etc.
I did run into masking issues for one of my samples, which seem very similar to the problem described in [(https://github.com//issues/33)]. The sample does lie partially outside the fiducial boundary, so I may try out the branch that you have created which reads the position list.
Thank you for your help!
from xfuse.
Hey Sanja, hope you're doing good! :)
The annotation file is an HDF5 file. Top-level keys should correspond to the names of the annotation layers. Each annotation layer is represented by an integer matrix of the same size as the image data containing the annotation labels. For example, for the analysis of the mitral cell layer in the manuscript, we had an annotation file with the following structure:
import h5py
import numpy as np
mcl_annotation = h5py.File("./mcl_annotation.h5", "r")
print(mcl_annotation.keys()) # outputs: <KeysViewHDF5 ['mitral', 'non-mitral']>
print(mcl_annotation['mitral']) # outputs: <HDF5 dataset "mitral": shape (9910, 9275), type "|u1">
print(np.unique(mcl_annotation['mitral'])) # outputs: [ 0 255]
In the config file, there are two different analysis modes that make use of annotations: analyses.differential_expression and analyses.imputation. In the former, all non-zero pixels are positive for the annotation. In the latter, expression values are imputed for each unique pixel value. Our analysis of the mitral cell layer had the following entry in its config file:
[analyses.differential_expression]
annotation_layer1 = "mitral"
annotation_layer2 = "non-mitral"
normalize_sample_differences = true
num_samples = 100
The process is a little bit involved right now. I'm looking at ways to improve the DGE and imputation analyses from a usability standpoint and to add concrete examples. Since the manuscript is under review, I will probably hold this change for now, but I'll keep this issue open as a reminder.
All the best and take care,
Ludvig
from xfuse.
Cool! Thanks for this!
I was thinking that you used the annotation info during training but you use it only during analysis as far as I can see. And totally good luck with the paper! I know things get hectic so will catch up with you later :)
from xfuse.
Yes, exactly. Thanks! Sounds good, come visit us when you're in the neighborhood! :)
from xfuse.
Hello authors!
Thanks for sharing the xfuse tool. I've been experimenting with it for a while and really like the super-resolution plots from the integration of my Visium data. So I'm trying to see what else is possible here, including DGE and predictions. It's been a while since the last comment on this thread, but it seems to me that the input parameters and/or the formats have changed since then. I have a few questions about the latest version and how I can run DGE on my dataset.
I have a single-section Visium dataset for which I was able to run metagenes and gene-maps. Next I would like to run DGE, and it looks like it requires an 'annotation_layer' value in the config. After some digging through your code and issues, I prepared one such file resembling the one described above, containing, say, the 'tumor' and 'stroma' regions in my section. After that I ran 'xfuse convert visium' along with the annotation file to scale it down appropriately.
Based on your current design, how can I run DGE for both my 'tumor' and 'stroma' regions? Do I need to run them as separate jobs?
Also, it looks like the analysis requires a 'comparisons' value in the config as well. What is its purpose, and what format should it have in the config?
If you have any current documentation or guidance, I can take a look at that as well, in case it answers my questions and tells me whether I have the basic approach right. Thanks in advance!
from xfuse.
Hi, Thanks for the interest in our work!
You're right, the format for the annotation has changed slightly. The DGE module lacks documentation unfortunately, but it seems you're quite close to getting it to work!
Only a single annotation layer is used now, so you will need to merge the tumor and stroma annotations, encoding them with distinct values (e.g., 1 => tumor, 2 => stroma). The annotation file could look something like this:
import h5py
import numpy as np
annotation = h5py.File("./annotation.h5", "r")
print(annotation.keys()) # outputs: <KeysViewHDF5 ['tissue_type']>
print(annotation['tissue_type']) # outputs: <HDF5 dataset "tissue_type": shape (9910, 9275), type "<u2">
print(np.unique(annotation['tissue_type'])) # outputs: array([ 0, 1, 2], dtype=uint16)
To compare tumor to stroma, you would then add the following entry to your config.toml:
[analyses.analysis-differential_expression]
type = "differential_expression"
[analyses.analysis-differential_expression.options]
annotation_layer = "tissue_type"
comparisons = [[1, 2]]
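If your tumor and stroma annotations currently live in separate masks, a merged file like the one above could be written roughly as follows. This is only a minimal sketch: `merge_masks` and `write_annotation` are hypothetical helper names (they are not part of xfuse), and it assumes you already have two boolean masks with the same shape as the image.

```python
import numpy as np


def merge_masks(tumor_mask, stroma_mask):
    """Merge two boolean masks into a single integer label matrix.

    Encoding (as suggested above): 0 = unannotated, 1 = tumor, 2 = stroma.
    """
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    stroma_mask = np.asarray(stroma_mask, dtype=bool)
    if tumor_mask.shape != stroma_mask.shape:
        raise ValueError("masks must have the same shape as the image")
    if np.any(tumor_mask & stroma_mask):
        raise ValueError("tumor and stroma masks overlap")
    labels = np.zeros(tumor_mask.shape, dtype=np.uint16)
    labels[tumor_mask] = 1
    labels[stroma_mask] = 2
    return labels


def write_annotation(path, labels, layer_name="tissue_type"):
    """Write the label matrix as a top-level dataset named after the layer."""
    import h5py  # imported here so merge_masks can be used without h5py

    with h5py.File(path, "w") as f:
        f.create_dataset(layer_name, data=labels, compression="gzip")
```

You would then call something like `write_annotation("./annotation.h5", merge_masks(tumor, stroma))` and pass the resulting file to `xfuse convert visium` via `--annotation`.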
Let me know if you run into any problems!
from xfuse.
Thanks for your response. Unfortunately, I do get an error with identifying comparisons again although a different one this time. Let me first explain the steps that I am taking:
- I begin with preparing the annotation file containing tumor, stromal and 'other' tissue. This is what my annotation file looks like:
annotation.keys() # <KeysViewHDF5 ['tissue_type']>
annotation['tissue_type'] # <HDF5 dataset "tissue_type": shape (30360, 30708), type "<u2">
np.unique(annotation['tissue_type']) # array([0, 1, 2, 3], dtype=uint16)
- I then xfuse convert the Visium data into the desired format:
xfuse convert visium --image Image.tiff --bc-matrix filtered_feature_bc_matrix.h5 --tissue-positions tissue_positions_list.csv --scale-factors scalefactors_json.json --scale 0.05 --annotation annotation.h5 --save-path dge
The annotation file used here is the one generated in the first step. Here is also where things start to get confusing to me. The file structure for the scaled down h5 file that also contains the annotations looks like this:
data = h5py.File('dge/data.h5', 'r')
data['annotation'].keys() # <KeysViewHDF5 ['tissue_type']>
data['annotation']['tissue_type'] # <HDF5 group "/annotation/tissue_type" (2 members)>
data['annotation']['tissue_type'].keys() # <KeysViewHDF5 ['label', 'names']>
data['annotation']['tissue_type']['label'] # <HDF5 dataset "label": shape (1587, 1668), type "<u2">
data['annotation']['tissue_type']['names'] # <HDF5 group "/annotation/tissue_type/names" (2 members)>
data['annotation']['tissue_type']['names'].keys() # <KeysViewHDF5 ['keys', 'values']>
data['annotation']['tissue_type']['names']['keys'] # <HDF5 dataset "keys": shape (4,), type "<i8">
data['annotation']['tissue_type']['names']['values'] # <HDF5 dataset "values": shape (4,), type "|O">
Running np.unique on the annotation dataset obviously looks different:
np.unique(data['annotation']['tissue_type']) # array(['label', 'names'], dtype='<U5')
- Now when I run the differential expression analysis, I immediately get an error when it tries to pick up the 'comparisons' values:
[2022-03-03 19:39:36,040] INFO : Running analysis "analysis-differential_expression"
[2022-03-03 19:40:09,516] ERROR : KeyError: "None of [Int64Index([1, 2], dtype='int64')] are in the [columns]"
Traceback (most recent call last):
File "/home/lib/python3.9/site-packages/xfuse/run.py", line 191, in run
_analyses[analysis_type].function(**options)
File "/home/lib/python3.9/site-packages/xfuse/analyze/differential_expression.py", line 99, in _run_differential_expression_analysis
_save_comparison(a, b)
File "/home/lib/python3.9/site-packages/xfuse/analyze/differential_expression.py", line 79, in _save_comparison
samples[[a, b]]
File "/usr/prog/python/3.9.1/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
File "/usr/prog/python/3.9.1/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
File "/usr/prog/python/3.9.1/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([1, 2], dtype='int64')] are in the [columns]"
My hunch is that I'm not providing the annotations in the desired format. Do I need to provide them as a separate file and not as part of the xfuse convert output? Do I need to run xfuse convert differently from how I am right now? I'd appreciate your help in understanding this.
from xfuse.
I think you're doing the right thing! Can you check the contents of data['annotation']['tissue_type']['names']['values']? The values you provide to comparisons should correspond to the entries there. It could be that you need to specify comparisons = [["1", "2"]] in your config.toml file, as the values appear to be strings, or there may be some other difference.
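A quick way to check this is to print the key/name mapping stored in the converted file. The snippet below is a rough sketch that assumes the layout shown in your listing (a `label` dataset plus a `names` group holding `keys` and `values`); `annotation_names` and `label_values` are hypothetical helpers, not xfuse functions.

```python
import numpy as np


def annotation_names(layer):
    """Map each integer label to its stored name, decoding bytes to str.

    `layer` is the group at data['annotation'][<layer name>]; anything that
    supports layer['names']['keys'] and layer['names']['values'] works.
    """
    keys = np.asarray(layer["names"]["keys"])
    values = np.asarray(layer["names"]["values"])
    return {
        int(k): (v.decode() if isinstance(v, bytes) else str(v))
        for k, v in zip(keys, values)
    }


def label_values(layer):
    """Unique label values; note these live in the 'label' dataset, not the group."""
    return np.unique(np.asarray(layer["label"]))
```

With the file open via `h5py.File('dge/data.h5', 'r')`, calling `annotation_names(data['annotation']['tissue_type'])` should show whether the names are strings such as "1" and "2", which is what `comparisons` would then need to contain.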
from xfuse.
Hi,
That was it. By default, the annotation layers are being encoded as dtype object. I am able to match them by specifying strings in the toml file, so using comparisons = [["1", "2"]] works! Thanks for pointing that out.
Now that I'm able to run both differential expression and the predictions analyses, I'm trying to interpret the results of both of them as well.
Differential expression seems straightforward to understand with the value for each gene corresponding to the log normalized ratio between the two annotation layers.
How do I interpret the values in the prediction table? In short, what are we trying to predict here? Here's a snippet of the table:
,tissue_type,section,sample,A1BG,A1CF,A2M,A2ML1,A3GALT2,A4GALT
0,1,section0,1,88.8493,0.84028137,17132.33,13.994153,22.280005,859.4006
1,2,section0,1,3.354892,0.033604957,796.58997,0.53242856,0.83570814,113.00109
2,3,section0,1,5.2650294,0.049886085,1049.2319,0.8353669,1.3254343,58.289112
The tissue_type '0' contains the masked out and non-annotated regions so I'm really focusing on tissue_type 1 & 2 as also noted in the values in the 'comparisons' config.
from xfuse.
Great to hear that it's working (and sorry for giving you the wrong instructions initially!).
The interpretation depends on the config options passed to the prediction analysis. By default (predict_mean=True, normalize_scale=False, normalize_size=False), it will return samples from the posterior of the mean total expression of each gene in the specified tissue_type and section. If you are comparing areas of different sizes, then I'd recommend setting normalize_size=True to instead compute the average expression across space. It's also a good idea to set num_samples to a value greater than one, say 10+. This will draw multiple samples from the posterior (indexed by the sample column), and the sample variance can then give you an idea of how uncertain the prediction is.
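Once you have several samples per region, one way to summarize them is to group the table by tissue_type and section and take the mean and standard deviation over the sample column. The following is just a sketch with pandas, assuming the table has the same column layout as your snippet; `summarize_predictions` is a hypothetical helper name.

```python
import pandas as pd


def summarize_predictions(table, id_columns=("tissue_type", "section")):
    """Per-gene mean and sample standard deviation across posterior samples.

    Every column other than the id columns and 'sample' is treated as a gene.
    """
    genes = table.columns.difference([*id_columns, "sample"])
    grouped = table.groupby(list(id_columns))[list(genes)]
    # Result has one row per (tissue_type, section) and (gene, statistic) columns.
    return grouped.agg(["mean", "std"])
```

The table can be loaded with `pd.read_csv(path, index_col=0)`, since the first unnamed column in the output is just a row index. A large standard deviation relative to the mean for a gene suggests that the prediction for that region is uncertain.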
Let me know if anything is unclear!
from xfuse.