Hey Lil'L! What does an example annotation file look like?

Annotation file (xfuse) · open · 10 comments

mssanjavickovic commented on September 15, 2024

Annotation file

Comments (10)

angadps commented on September 15, 2024

Thanks for those recommendations. Yes, this makes it clearer, and I'm continuing to play around with the DGE and prediction analyses by drawing more samples, etc.
I did run into a masking issue with one of my samples that seems very similar to the problem described in https://github.com//issues/33. The sample does lie partially outside the fiducial boundary, so I may try out the branch you created that reads the position list.
Thank you for your help!

ludvb commented on September 15, 2024

Hey Sanja, hope you're doing good! :)

The annotation file is an HDF5 file whose top-level keys correspond to the names of the annotation layers. Each annotation layer is an integer matrix, the same size as the image data, containing the annotation labels. For example, for the analysis of the mitral cell layer in the manuscript, we had an annotation file with the following structure:

import h5py
import numpy as np

mcl_annotation = h5py.File("./mcl_annotation.h5", "r")
print(mcl_annotation.keys())  # outputs: <KeysViewHDF5 ['mitral', 'non-mitral']>
print(mcl_annotation['mitral'])  # outputs: <HDF5 dataset "mitral": shape (9910, 9275), type "|u1">
print(np.unique(mcl_annotation['mitral']))  # outputs: [  0 255]
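For the reverse direction, here is a minimal sketch of how a file with this layout could be written with h5py. The image size, the rectangular region and the output file name are hypothetical placeholders; in practice each layer must have the same (rows, cols) shape as your own image, and the masks would come from your annotation tool.

```python
import h5py
import numpy as np

# Hypothetical image size; use the (rows, cols) shape of your own image instead.
height, width = 200, 300

# Hypothetical masks: a rectangular "mitral" region and everything else as "non-mitral".
# 255 marks pixels inside a layer and 0 outside, matching the values printed above.
mitral = np.zeros((height, width), dtype=np.uint8)
mitral[50:100, 80:220] = 255
non_mitral = np.where(mitral == 0, 255, 0).astype(np.uint8)

# One top-level dataset per annotation layer, each with the same shape as the image.
with h5py.File("mcl_annotation.h5", "w") as f:
    f.create_dataset("mitral", data=mitral, compression="gzip")
    f.create_dataset("non-mitral", data=non_mitral, compression="gzip")

# Read it back the same way as above.
with h5py.File("mcl_annotation.h5", "r") as f:
    print(list(f.keys()))
    print(f["mitral"].shape, f["mitral"].dtype)
    print(np.unique(f["mitral"][()]))
```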

In the config file, two analysis modes make use of annotations: analyses.differential_expression and analyses.imputation. In the former, all non-zero pixels are treated as positive for the annotation. In the latter, expression values are imputed for each unique pixel value. Our analysis of the mitral cell layer had the following entry in its config file:

[analyses.differential_expression]
annotation_layer1 = "mitral"
annotation_layer2 = "non-mitral"
normalize_sample_differences = true
num_samples = 100
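To make the two modes described above concrete, here is a toy NumPy illustration of how each one reads the same label matrix. It is based only on the description in this comment and is not xfuse's own code; the matrix values are arbitrary. Whether xfuse gives the value 0 its own region during imputation is not stated here, so the sketch simply keeps every unique value as the text says.

```python
import numpy as np

# Toy label matrix standing in for one annotation layer; the values are arbitrary.
layer = np.array([
    [0, 0, 255, 255],
    [0, 3, 255, 255],
    [3, 3, 0, 0],
], dtype=np.uint8)

# Differential-expression reading: any non-zero pixel is inside the annotation.
positive = layer != 0
print("positive pixels:", int(positive.sum()))

# Imputation reading: one region per unique pixel value.
regions = {int(v): layer == v for v in np.unique(layer)}
for value, mask in regions.items():
    print(f"value {value}: {int(mask.sum())} pixels")
```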

The process is a bit involved right now. I'm looking at ways to improve the DGE and imputation analyses from a usability standpoint and to add concrete examples. Since the manuscript is under review, I will probably hold off on this change for now, but I'll keep this issue open as a reminder.

All the best and take care,
Ludvig

mssanjavickovic commented on September 15, 2024

Cool! Thanks for this!

I thought you used the annotation info during training, but as far as I can see you only use it during analysis. And good luck with the paper! I know things get hectic, so I'll catch up with you later :)

ludvb commented on September 15, 2024

Yes, exactly. Thanks! Sounds good, come visit us when you're in the neighborhood! :)

angadps commented on September 15, 2024

Hello authors!

Thanks for sharing the xfuse tool. I've been experimenting with it for a while and definitely like the super-resolution plots from integrating my Visium data, so I'm trying to see what else is possible, including DGE and predictions. It's been a while since the last comment on this thread, but it seems the input parameters and/or formats have changed since then? I have a few questions about the latest version and how I can run DGE on my dataset.

I have a single-section Visium dataset for which I was able to run metagenes and gene maps. Next I would like to run DGE, and it looks like it requires an 'annotation_layer' value in the config. After some digging through your code and issues, I prepared an annotation file resembling the one above, containing, say, the 'tumor' and 'stroma' regions of my section. I then ran 'xfuse convert visium' with the annotation file so that it is scaled down appropriately.

Based on your current design, how can I run DGE for both my 'tumor' and 'stroma' regions? Do I need to run them as separate jobs?
It also looks like the analysis requires a 'comparisons' value in the config. What is its purpose, and what format does it take?

If there is any current documentation or guidance, I can take a look at that as well, in case it answers my questions and tells me whether my basic approach is right. Thanks in advance, guys!

ludvb commented on September 15, 2024

Hi, thanks for the interest in our work!

You're right, the format for the annotation has changed slightly. The DGE module lacks documentation unfortunately, but it seems you're quite close to getting it to work!

Only a single annotation layer is used now, so you will need to merge the tumor and stroma annotations, encoding them with distinct values (e.g., 1 => tumor, 2 => stroma). The annotation file could look something like this:

import h5py
import numpy as np

annotation = h5py.File("./annotation.h5", "r")
print(annotation.keys())  # outputs: <KeysViewHDF5 ['tissue_type']>
print(annotation['tissue_type'])  # outputs: <HDF5 dataset "tissue_type": shape (9910, 9275), type "<u2">
print(np.unique(annotation['tissue_type']))  # outputs: [0 1 2]
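A minimal sketch of the merge step described above, assuming you already have one boolean mask per region at full image resolution. The mask shapes and rectangles are hypothetical placeholders, and the rule that tumor wins where the masks overlap is an arbitrary choice for illustration. The layer name `tissue_type` and the uint16 dtype follow the example output above.

```python
import h5py
import numpy as np

# Hypothetical boolean masks with the same shape as the full-resolution image.
# In practice these would come from your own annotation tool.
height, width = 120, 160
tumor = np.zeros((height, width), dtype=bool)
stroma = np.zeros((height, width), dtype=bool)
tumor[10:60, 10:80] = True
stroma[50:110, 60:150] = True

# Merge into one label matrix: 0 = unannotated, 1 = tumor, 2 = stroma.
# Where the masks overlap, tumor wins here because it is written last.
tissue_type = np.zeros((height, width), dtype=np.uint16)
tissue_type[stroma] = 2
tissue_type[tumor] = 1

# A single annotation layer named "tissue_type".
with h5py.File("annotation.h5", "w") as f:
    f.create_dataset("tissue_type", data=tissue_type, compression="gzip")

with h5py.File("annotation.h5", "r") as f:
    print(list(f.keys()))
    print(f["tissue_type"].dtype, f["tissue_type"].shape)
    print(np.unique(f["tissue_type"][()]))
```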

To compare tumor to stroma, you would then add the following entry to your config.toml:

[analyses.analysis-differential_expression]
type = "differential_expression"
[analyses.analysis-differential_expression.options]
annotation_layer = "tissue_type"
comparisons = [[1, 2]]
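Since `comparisons` is a list of pairs, several pairs can presumably be listed in one run when a layer has more than two labels (an assumption; this thread only shows a single pair). The sketch below builds every unordered pair of non-zero labels for a hypothetical tumor/stroma/other scheme and renders the config entry above. `json.dumps` is used because its output for ints and simple double-quoted strings is also a valid TOML inline array, which matters because the labels later turn out to need string form.

```python
import json
from itertools import combinations

# Hypothetical label scheme for one annotation layer; 0 (unannotated) is left out.
labels = {1: "tumor", 2: "stroma", 3: "other"}

# Every unordered pair of annotated labels.
comparisons = [[a, b] for a, b in combinations(sorted(labels), 2)]

# Render the config entry above with the expanded comparisons list.
toml_entry = "\n".join([
    "[analyses.analysis-differential_expression]",
    'type = "differential_expression"',
    "[analyses.analysis-differential_expression.options]",
    'annotation_layer = "tissue_type"',
    f"comparisons = {json.dumps(comparisons)}",
])
print(toml_entry)
```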

Let me know if you run into any problems!

angadps commented on September 15, 2024

Thanks for your response. Unfortunately, I still get an error related to the comparisons, although a different one this time. Let me first explain the steps I am taking:

I begin by preparing the annotation file containing tumor, stromal, and 'other' tissue. This is what my annotation file looks like:

annotation.keys()  # <KeysViewHDF5 ['tissue_type']>
annotation['tissue_type']  # <HDF5 dataset "tissue_type": shape (30360, 30708), type "<u2">
np.unique(annotation['tissue_type'])  # array([0, 1, 2, 3], dtype=uint16)

I then run xfuse convert to put the Visium data into the required format:

xfuse convert visium --image Image.tiff --bc-matrix filtered_feature_bc_matrix.h5 --tissue-positions tissue_positions_list.csv --scale-factors scalefactors_json.json --scale 0.05 --annotation annotation.h5 --save-path dge

The annotation file used here is the one generated in the first step. This is also where things start to get confusing for me. The scaled-down HDF5 file, which also contains the annotations, has the following structure:

data = h5py.File('dge/data.h5', 'r')
data['annotation'].keys()  # <KeysViewHDF5 ['tissue_type']>

data['annotation']['tissue_type'] # <HDF5 group "/annotation/tissue_type" (2 members)>
data['annotation']['tissue_type'].keys() # <KeysViewHDF5 ['label', 'names']>
data['annotation']['tissue_type']['label'] # <HDF5 dataset "label": shape (1587, 1668), type "<u2">
data['annotation']['tissue_type']['names'] # <HDF5 group "/annotation/tissue_type/names" (2 members)>
data['annotation']['tissue_type']['names'].keys() # <KeysViewHDF5 ['keys', 'values']>
data['annotation']['tissue_type']['names']['keys'] # <HDF5 dataset "keys": shape (4,), type "<i8">
data['annotation']['tissue_type']['names']['values'] # <HDF5 dataset "values": shape (4,), type "|O">

Running np.unique on data['annotation']['tissue_type'], which is now a group rather than a dataset, obviously gives a different result:

np.unique(data['annotation']['tissue_type']) # array(['label', 'names'], dtype='<U5')
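A minimal sketch for reading the converted layer, assuming the layout printed above: the pixel labels live in the `label` dataset inside the group, and `names/keys` and `names/values` pair each label value with its stored name. The helper name `read_annotation` is hypothetical and not part of xfuse.

```python
import h5py
import numpy as np


def read_annotation(path, layer):
    """Return (label matrix, {label value: name}) for one layer of a converted data.h5.

    Assumes the layout printed above: annotation/<layer>/label holds the pixel
    labels, and annotation/<layer>/names/{keys,values} pair label values with names.
    """
    with h5py.File(path, "r") as f:
        group = f["annotation"][layer]
        label = np.asarray(group["label"])  # read the label dataset, not the group
        keys = group["names"]["keys"][()]
        values = group["names"]["values"][()]
    # h5py 3.x returns variable-length strings as bytes; decode them to str.
    names = {
        int(k): v.decode("utf-8") if isinstance(v, bytes) else str(v)
        for k, v in zip(keys, values)
    }
    return label, names
```

With the file above, `label, names = read_annotation("dge/data.h5", "tissue_type")` followed by `np.unique(label)` gives the label values rather than the group's member names, and `names` shows how each value is stored.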

Now, when I run the differential expression analysis, I immediately get an error when it tries to pick up the 'comparisons' values:

[2022-03-03 19:39:36,040] INFO : Running analysis "analysis-differential_expression"
[2022-03-03 19:40:09,516] ERROR : KeyError: "None of [Int64Index([1, 2], dtype='int64')] are in the [columns]"
Traceback (most recent call last):
  File "/home/lib/python3.9/site-packages/xfuse/run.py", line 191, in run
    _analyses[analysis_type].function(**options)
  File "/home/lib/python3.9/site-packages/xfuse/analyze/differential_expression.py", line 99, in _run_differential_expression_analysis
    _save_comparison(a, b)
  File "/home/lib/python3.9/site-packages/xfuse/analyze/differential_expression.py", line 79, in _save_comparison
    samples[[a, b]]
  File "/usr/prog/python/3.9.1/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/usr/prog/python/3.9.1/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/usr/prog/python/3.9.1/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([1, 2], dtype='int64')] are in the [columns]"

My hunch is that I'm not providing the annotations in the desired format. Do I need to provide them as a separate file rather than as part of the xfuse convert output? Do I need to run xfuse convert differently from how I am right now? I'd appreciate your help in understanding this.

ludvb commented on September 15, 2024

I think you're doing the right thing! Can you check the contents of data['annotation']['tissue_type']['names']['values']? The values you provide to comparisons should correspond to the entries there. It could be that you need to specify comparisons = [["1", "2"]] in your config.toml file, as the values appear to be strings, or there may be some other difference.
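A small sketch of that check: cast each comparison entry to a string and confirm it matches one of the stored names before writing it into config.toml. The helper `normalize_comparisons` is hypothetical (not part of xfuse), and the example names assume the `values` dataset holds the label numbers as strings, which is what the follow-up below suggests.

```python
import json


def normalize_comparisons(comparisons, names):
    """Cast comparison entries to str and check them against the stored annotation names.

    `names` holds the decoded entries of data['annotation'][<layer>]['names']['values'].
    Raises ValueError if an entry does not match any stored name.
    """
    known = set(names)
    normalized = [[str(a), str(b)] for a, b in comparisons]
    missing = sorted({entry for pair in normalized for entry in pair} - known)
    if missing:
        raise ValueError(f"{missing} not among annotation names {sorted(known)}")
    return normalized


# Hypothetical names, assuming the values dataset holds the label numbers as strings.
names = ["0", "1", "2", "3"]
fixed = normalize_comparisons([[1, 2]], names)
print("comparisons = " + json.dumps(fixed))
```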

angadps commented on September 15, 2024

Hi,

That was it. By default, the annotation names are stored with dtype object (strings), so I have to give them as strings in the TOML file: using comparisons = [["1", "2"]] works! Thanks for pointing that out.
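The original KeyError can be reproduced with pandas alone, assuming (as the `samples[[a, b]]` line in the traceback suggests) that the analysis selects DataFrame columns labelled by the annotation names. The frame below is a toy stand-in with arbitrary values; only its string column labels matter.

```python
import pandas as pd

# Toy frame whose column labels are strings, like the stored annotation names.
# The values are arbitrary; only the column labels matter here.
samples = pd.DataFrame({"1": [0.5, 0.7], "2": [0.2, 0.1]})

try:
    samples[[1, 2]]  # integer labels: no column matches, so pandas raises KeyError
except KeyError as err:
    print("KeyError:", err)

print(samples[["1", "2"]].columns.tolist())  # string labels select both columns
```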

Now that I'm able to run both the differential expression and the prediction analyses, I'm trying to interpret their results.
Differential expression seems straightforward to understand: the value for each gene corresponds to the log-normalized ratio between the two compared annotation labels.
How do I interpret the values in the prediction table? In short, what are we trying to predict here? Here's a snippet of the table:

,tissue_type,section,sample,A1BG,A1CF,A2M,A2ML1,A3GALT2,A4GALT
0,1,section0,1,88.8493,0.84028137,17132.33,13.994153,22.280005,859.4006
1,2,section0,1,3.354892,0.033604957,796.58997,0.53242856,0.83570814,113.00109
2,3,section0,1,5.2650294,0.049886085,1049.2319,0.8353669,1.3254343,58.289112

Tissue type 0 contains the masked-out and unannotated regions, so I'm really focusing on tissue types 1 and 2, which are also the values listed in the 'comparisons' config.

ludvb commented on September 15, 2024

Great to hear that it's working (and sorry for giving you the wrong instructions initially!).

The interpretation depends on the config options passed to the prediction analysis. By default (predict_mean=True, normalize_scale=False, normalize_size=False), it will return samples from the posterior of the mean total expression of each gene in the specified tissue_type and section. If you are comparing areas of different sizes, then I'd recommend setting normalize_size=True to instead compute the average expression across space. It's also a good idea to set num_samples to a value greater than one, say 10+. This will draw multiple samples from the posterior (indexed by the sample column), and the sample variance can then give you an idea of how uncertain the prediction is.
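Once num_samples is above one, a per-region summary across the `sample` column gives the posterior mean and spread for each gene. The sketch below assumes the CSV layout quoted earlier in the thread (an unnamed index, then `tissue_type`, `section`, `sample`, and one column per gene) and reuses those three rows verbatim. Because that snippet holds a single sample per tissue type, the standard deviation comes out as NaN; with 10+ samples it becomes the uncertainty estimate described above. The `summarize` helper is hypothetical.

```python
import io

import pandas as pd

# The three rows quoted earlier in the thread (one posterior sample per tissue type).
csv = """,tissue_type,section,sample,A1BG,A1CF,A2M,A2ML1,A3GALT2,A4GALT
0,1,section0,1,88.8493,0.84028137,17132.33,13.994153,22.280005,859.4006
1,2,section0,1,3.354892,0.033604957,796.58997,0.53242856,0.83570814,113.00109
2,3,section0,1,5.2650294,0.049886085,1049.2319,0.8353669,1.3254343,58.289112
"""
predictions = pd.read_csv(io.StringIO(csv), index_col=0)


def summarize(df):
    """Mean and standard deviation across posterior samples per (section, tissue_type)."""
    genes = df.columns.difference(["tissue_type", "section", "sample"])
    return df.groupby(["section", "tissue_type"])[list(genes)].agg(["mean", "std"])


summary = summarize(predictions)
print(summary)
```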

Let me know if anything is unclear!
