Comments (10)
So my question would be, which objects in the AIRR data model do we connect with epitope and specificity?
It sounds like linking to Receptor is the main one, but what about individual Rearrangements? If I have a Rearrangement that is a T-cell with a CDR3 of "AALIQGAQKLV" would you want in the AIRR TSV Rearrangement file the ability to record that that CDR3 has a known antigen or specificity (that is there is a TCR in IEDB with "chain1_cdr3_seq": "AALIQGAQKLV").
Does it make sense to add a field to the Rearrangement that is "receptor_id":"IEDB_RECEPTOR:57"
Definitely. Actually I think this was the original idea, like in the recent U24 proposal for links between AIRR rearrangements and IEDB. Now Receptor may be the more logical link, but I don't know if a rearrangement will always be able to be tied to a receptor or not.
For example, tcrmatch is a tool that could suggest links between rearrangements and IEDB receptors.
At the very least, we (AIRR) should coordinate our plans for Receptor with IEDB as they also have the same conceptual object.
from airr-standards.
My 2 cents:
- Biologically, the reactivity of a Receptor is determined by both chains (i.e., two Rearrangements) unless proven otherwise. There are examples of one of the chains dominating the reactivity, but these are exceptions and should not guide our data model. Therefore having a receptor_id in the Rearrangement object is a no-go. Note that this is different from our discussion whether Cell and Receptor objects MAY be merged into a combined tabular structure for on-disk representation.
- I agree that we want interoperability with IEDB and in the short run this could be implemented via a direct reference to their database. But this should not stop us from developing a universal ID generation mechanism for Receptor objects.
We are probably talking about separate questions here:
What is in a reference?
(independent of the level of experimental/inferential support that the referenced object has)
What does a reference (OWL: Object Property) connecting a rearrangement with an external Receptor description signify?
1. IS_COMPONENT_OF_ABSTRACT_RECEPTOR, i.e., this sequence would give rise to a component of the receptor described in the referenced record. This implies neither that the actual receptor has been observed in the current experiment nor that there is any evidence that the annotated rearrangement has actually been paired with the other component REQUIRED for the receptor.
2. IS_COMPONENT_OF_OBSERVED_RECEPTOR, i.e., this sequence gives rise to a component of a receptor that has been observed (directly or indirectly) within the study. This implies that
   a. all other REQUIRED components MUST also have been observed in the study and
   b. that these components MAY be grouped solely based on receptor_id.
3. CONTAINS_RECEPTOR as merged reference from Cell, i.e., this sequence has been observed in a Cell together with all other REQUIRED components for the referenced receptor.
4. nothing, i.e., such a reference does not exist in the Schema and we decide not to allow for merged objects in our data representations.

- Ad 1: IMO there is nothing problematic in making this statement, but it will make datasets very noisy, as ANY match based on a single Receptor component MAY be annotated. Furthermore, it could be mistaken as a way to group rearrangements, which it is explicitly not. So in @schristley's example, the references would be a (weak) suggestion of what the rearrangement might be involved in. I am not convinced that this provides an added value to the user.
- Ad 2: This introduces a separate grouping mechanism into the Schema and is at variance with what we discussed in #410, i.e. that there is no link in the Schema between Rearrangement and Receptor as Cell is the object responsible for observed and inferred pairing of receptor components.
- Ad 3: This is closest to the current situation. It suffers from the fact that we have different representations and assumes that we do allow for merged objects.
What is in a receptor?
In terms of the Schema, it would not be a problem to use the proposed Receptor object also for inferred reactivities, if these reactivities are based on complete receptor sequences. Therefore aggregated scores of, e.g., a tetramer matrix (or something similar) could be attached as properties to a Receptor object that contains the full-length amino acid sequence of both chains, but they (the scores) cannot exist in isolation. Therefore we would need to discuss whether we would like to link to hypothetical receptors derived from such predictions or find a semantic distinction to link such information.
How do queries for reactivity work?
I see mainly two scenarios:
- Alice wants to know whether there is any reactivity data associated with a rearrangement. Assuming Reference Option (4) (see "What is in a reference") this would mean:
  - query /rearrangement for some features of a rearrangement, retrieving the cell_ids
  - query /cell for the receptor_ids associated with the respective Cell
  - resolve the identifier and retrieve the Receptor description from an external repository
- Bob wants to find rearrangements that are associated with a given reactivity. Assuming Reference Option (4) this would mean:
  - query the external repository for the reactivity, retrieve a receptor_id (which MAY be a CURIE)
  - query /cell for the Cells that contain the receptor_ids. Note that the endpoint will return Cell objects that contain direct references to Rearrangements
  - retrieve the respective rearrangement_ids from /rearrangement.
Notes:
- Reference Option (3) would allow skipping the /cell query if API queries are allowed to return merged objects.
- "external repository" means external to the ADC, otherwise this would be a standardized query to the /receptor endpoint
- Assuming Reference Option (1), Alice will potentially get swamped with receptor annotations that are difficult to validate
Here is what we've arrived at with the VDJdb database: https://github.com/antigenomics/vdjdb-db
It lacks links to IEDB IDs, but that is a planned feature and is quite easy to add given a linear peptide sequence. Things with non-peptidic epitopes are far more complex. Moreover, the specificity can be inferred by other methods (e.g. expanded T-cell clones from culture).
text copied from #524
IEDB has released a beta query API for their database.
Of interest is they have a curie_map endpoint which returns a structure similar to our CURIEMap. I've noticed however that the IRI goes to the human-readable html page instead of the API which returns machine-readable JSON. For example, IEDB_EPITOPE:7355 resolves to:
https://www.iedb.org/epitope/7355
versus
https://query-api.iedb.org/epitope_search?structure_id=eq.7355
The nice thing is that now we should be able to add a single field in Rearrangement if we want to link to an epitope.
Though that opens the question on whether we should put entries into our CURIEMap for resolving IEDB_EPITOPE or if we should use IEDB's?
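To make the HTML-versus-API distinction concrete, here is a minimal sketch of resolving a CURIE against a local map. The map layout (one "html" and one "api" template per prefix) and the function name are assumptions for illustration and do not reflect the actual structure of the AIRR CURIEMap or of IEDB's curie_map; only the two URL patterns are taken from the example above.

```python
# Hypothetical local CURIE map: prefix -> URL templates for two resolution targets.
CURIE_MAP = {
    "IEDB_EPITOPE": {
        "html": "https://www.iedb.org/epitope/{id}",
        "api": "https://query-api.iedb.org/epitope_search?structure_id=eq.{id}",
    },
}


def resolve_curie(curie, target="api"):
    """Expand 'PREFIX:local_id' into an IRI for the requested target."""
    prefix, sep, local_id = curie.partition(":")
    local_id = local_id.strip()  # tolerate 'IEDB_EPITOPE: 7355'
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in CURIE_MAP:
        raise KeyError(f"unknown CURIE prefix: {prefix}")
    return CURIE_MAP[prefix][target].format(id=local_id)
```

Keeping both targets under one prefix would let a client choose the machine-readable endpoint without needing a second CURIE prefix, whichever map ends up being authoritative.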
We also can consider how we might link with receptors. The IEDB API has tcr_search and bcr_search endpoints:
https://query-api.iedb.org/tcr_search?limit=5
https://query-api.iedb.org/bcr_search?limit=5
The receptor ids have IEDB_RECEPTOR as their CURIE though oddly it's missing from the above curie_map. However, we might want to consider how we can link our Receptor to IEDB's.
So my question would be, which objects in the AIRR data model do we connect with epitope and specificity?
It sounds like linking to Receptor is the main one, but what about individual Rearrangements? If I have a Rearrangement that is a T-cell with a CDR3 of "AALIQGAQKLV" would you want in the AIRR TSV Rearrangement file the ability to record that that CDR3 has a known antigen or specificity (that is there is a TCR in IEDB with "chain1_cdr3_seq": "AALIQGAQKLV").
Does it make sense to add a field to the Rearrangement that is "receptor_id":"IEDB_RECEPTOR:57"
Or is that plain wrong...
@bussec just to be clear, I was suggesting a receptor_id as an ID that points to external information - essentially another annotation as per tcrmatch if I understand correctly.
I did not mean to imply that it was the ID of the Receptor object in the AIRR data model.
Not sure if that clarification helps or not, but my use of receptor_id was ambiguous...
Therefore having a receptor_id in the Rearrangement object is a no-go.
To make sure I understand, this is because we'd need to put arrays in the receptor_id fields which we are trying to avoid with the TSV format? Or it's because normally only one chain is sequenced, so identifying it with a specific receptor would be misleading? If you knew both chains, it seems reasonable to link them with receptor_id in the TSV.
TRA, receptor_id==abc
TRB, receptor_id==abc
I likely confused the matter by mixing a biologically-measured receptor pair, with inferences about similar receptors and similar specificities with tools like tcrmatch. Those inferred links will need to be put in their own fields versus using receptor_id which should be reserved for the biological receptor pair, but the ID nomenclature may be the same for both.
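As a rough sketch of that separation, the snippet below reads a small TSV in which receptor_id marks a measured pair and a second column carries inferred matches. Neither column is part of the current Rearrangement schema: receptor_id is the field under discussion here, inferred_receptor_match is a hypothetical name invented for this example, and the rows are illustrative.

```python
import csv
import io
from collections import defaultdict

# Illustrative TSV: two paired chains plus one bulk chain with only an inferred match.
TSV = (
    "sequence_id\tlocus\treceptor_id\tinferred_receptor_match\n"
    "s1\tTRA\tabc\t\n"
    "s2\tTRB\tabc\t\n"
    "s3\tTRB\t\tIEDB_RECEPTOR:57\n"
)


def read_rows(tsv_text):
    """Parse the TSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def measured_pairs(rows):
    """Group loci by receptor_id; rows without a receptor_id are left out."""
    pairs = defaultdict(list)
    for row in rows:
        if row["receptor_id"]:
            pairs[row["receptor_id"]].append(row["locus"])
    return dict(pairs)


def inferred_matches(rows):
    """Collect inferred links separately so they are never mistaken for pairing."""
    return {row["sequence_id"]: row["inferred_receptor_match"]
            for row in rows if row["inferred_receptor_match"]}
```

With this layout each TSV cell still holds a single value, and s3 never appears in a measured pair even though it carries an IEDB-style identifier.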
For the record, this query to IEDB looks for info for a TCR CDR3:
curl 'https://query-api.iedb.org/tcr_search?chain2_cdr3_seq=eq.ASSIRSSYEQY'
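For scripted lookups, the same URL can be assembled in Python. This is a small sketch rather than an official client: the helper name is made up, and accepting chain1 as a filter is an assumption based on the chain1_cdr3_seq field mentioned earlier in this thread. Only the chain2 form is shown in the curl call above.

```python
from urllib.parse import urlencode

IEDB_API = "https://query-api.iedb.org"


def tcr_search_url(cdr3_aa, chain="chain2"):
    """Build a tcr_search URL filtering on an exact CDR3 amino acid sequence."""
    if chain not in ("chain1", "chain2"):
        raise ValueError("chain must be 'chain1' or 'chain2'")
    # The 'eq.' prefix is the equality operator used in the query above.
    query = urlencode({f"{chain}_cdr3_seq": f"eq.{cdr3_aa}"})
    return f"{IEDB_API}/tcr_search?{query}"
```

The returned string can be passed to any HTTP client; no request is made by the helper itself.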
The main points originally raised by Bjoern are now included in #404.