schristley commented on July 17, 2024

So my question would be, which objects in the AIRR data model do we connect with epitope and specificity?

It sounds like linking to Receptor is the main one, but what about individual Rearrangements? If I have a Rearrangement that is a T-cell with a CDR3 of "AALIQGAQKLV", would you want the AIRR TSV Rearrangement file to be able to record that this CDR3 has a known antigen or specificity (that is, there is a TCR in IEDB with "chain1_cdr3_seq": "AALIQGAQKLV")?

Does it make sense to add a field to the Rearrangement that is "receptor_id":"IEDB_RECEPTOR:57"?

Definitely. Actually, I think this was the original idea, as in the recent U24 proposal for links between AIRR rearrangements and IEDB. Receptor may now be the more logical link, but I don't know whether a rearrangement can always be tied to a receptor.

For example, tcrmatch is a tool that could suggest links between rearrangements and IEDB receptors.

At the very least, we (AIRR) should coordinate our plans for Receptor with IEDB as they also have the same conceptual object.
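As far as I know, a tool such as tcrmatch proposes these links by CDR3 similarity. As a much cruder stand-in, the sketch below suggests links only on exact CDR3 identity. The records, the `EXAMPLE_RECEPTOR:*` identifiers and the `suggest_links` helper are hypothetical; only the `chain1_cdr3_seq`/`chain2_cdr3_seq` field names come from the IEDB examples in this thread.

```python
from typing import Dict, List, Optional

# Hypothetical IEDB-style receptor records (identifiers are made up).
RECORDS: List[Dict[str, Optional[str]]] = [
    {"receptor_id": "EXAMPLE_RECEPTOR:1", "chain1_cdr3_seq": "AALIQGAQKLV", "chain2_cdr3_seq": None},
    {"receptor_id": "EXAMPLE_RECEPTOR:2", "chain1_cdr3_seq": None, "chain2_cdr3_seq": "ASSIRSSYEQY"},
]


def suggest_links(cdr3: str, records: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return receptor IDs whose chain 1 or chain 2 CDR3 equals `cdr3` exactly.

    A hit is only a suggestion: matching one chain says nothing about whether
    the rearrangement was ever paired with the receptor's other chain.
    """
    return [
        rec["receptor_id"]
        for rec in records
        if cdr3 in (rec.get("chain1_cdr3_seq"), rec.get("chain2_cdr3_seq"))
    ]


print(suggest_links("AALIQGAQKLV", RECORDS))  # ['EXAMPLE_RECEPTOR:1']
```

Links produced this way are suggestions rather than observed pairings, which is why they may need to be kept apart from any field describing a measured receptor.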

from airr-standards.

bussec commented on July 17, 2024

My 2 cents:

  1. Biologically, the reactivity of a Receptor is determined by both chains (i.e., two Rearrangements) unless proven otherwise. There are examples of one chain dominating the reactivity, but these are exceptions and should not guide our data model. Therefore having a receptor_id in the Rearrangement object is a no-go. Note that this is different from our discussion of whether Cell and Receptor objects MAY be merged into a combined tabular structure for on-disk representation.
  2. I agree that we want interoperability with IEDB and in the short run this could be implemented via a direct reference to their database. But this should not stop us from developing a universal ID generation mechanism for Receptor objects.
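Point 1 means that pairing is established at the level of a cell, not of a single rearrangement. A minimal sketch of that idea, assuming rearrangements are dicts carrying the AIRR `cell_id` and `locus` fields and that an alpha/beta TCR counts as complete only with exactly one TRA and one TRB chain (dual-alpha cells and similar cases are deliberately left unpaired). `pair_chains` is a hypothetical helper, not part of any AIRR library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def pair_chains(rearrangements: List[Row]) -> Dict[str, Tuple[Row, Row]]:
    """Group rearrangements by cell_id and return {cell_id: (TRA, TRB)}.

    Cells lacking a chain, or with more than one chain of a locus, are skipped:
    without both chains there is no receptor to attach reactivity to.
    """
    by_cell: Dict[str, Dict[str, List[Row]]] = defaultdict(lambda: defaultdict(list))
    for rec in rearrangements:
        if rec.get("cell_id"):  # rearrangements without a cell cannot be paired
            by_cell[rec["cell_id"]][rec["locus"]].append(rec)

    pairs = {}
    for cell_id, loci in by_cell.items():
        tra, trb = loci.get("TRA", []), loci.get("TRB", [])
        if len(tra) == 1 and len(trb) == 1:
            pairs[cell_id] = (tra[0], trb[0])
    return pairs


rows = [
    {"sequence_id": "s1", "cell_id": "c1", "locus": "TRA"},
    {"sequence_id": "s2", "cell_id": "c1", "locus": "TRB"},
    {"sequence_id": "s3", "cell_id": "c2", "locus": "TRB"},  # single chain: unpaired
    {"sequence_id": "s4", "cell_id": "", "locus": "TRA"},    # no cell: unpaired
]
print(sorted(pair_chains(rows)))  # ['c1']
```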

bussec commented on July 17, 2024

We are probably talking about separate questions here:

What is in a reference?

(independent of the level of experimental/inferential support that the referenced object has)

What does a reference (OWL: Object Property) connecting a rearrangement with an external Receptor description signify?

  1. IS_COMPONENT_OF_ABSTRACT_RECEPTOR, i.e., this sequence would give rise to a component of the receptor described in the referenced record. This implies neither that the actual receptor has been observed in the current experiment nor that there is any evidence that the annotated rearrangement has actually been paired with the other component REQUIRED for the receptor.
  2. IS_COMPONENT_OF_OBSERVED_RECEPTOR, i.e., this sequence gives rise to a component of a receptor that has been observed (directly or indirectly) within the study. This implies that
    a. all other REQUIRED components MUST also have been observed in the study and
    b. these components MAY be grouped solely based on receptor_id.
  3. CONTAINS_RECEPTOR as merged reference from Cell, i.e., this sequence has been observed in a Cell together with all other REQUIRED components for the referenced receptor.
  4. nothing, i.e., such a reference does not exist in the Schema and we decide not to allow for merged objects in our data representations.
  • Ad 1: IMO there is nothing problematic in making this statement, but it will make datasets very noisy, as ANY match based on a single Receptor component MAY be annotated. Furthermore, it could be mistaken for a way to group rearrangements, which it explicitly is not. So in @schristley's example, the references would be a (weak) suggestion of what the rearrangement might be involved in. I am not convinced that this provides added value to the user.
  • Ad 2: This introduces a separate grouping mechanism into the Schema and is at variance with what we discussed in #410, i.e., that there is no link in the Schema between Rearrangement and Receptor, as Cell is the object responsible for observed and inferred pairing of receptor components.
  • Ad 3: This is closest to the current situation. It suffers from the fact that we have different representations, and it assumes that we allow merged objects.
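To make the differences between the options explicit, they can be restated as a small enumeration whose flags record what each reference would imply. This is a hypothetical encoding for discussion, not part of the AIRR Schema; the member names mirror the labels in the list, and Option (4), "no reference", is represented simply as `None`.

```python
from enum import Enum
from typing import Optional


class ReceptorReference(Enum):
    """Hypothetical encoding of reference options (1)-(3); option (4) is None.

    Each value is (receptor observed in the study, co-observation in one Cell implied).
    """

    IS_COMPONENT_OF_ABSTRACT_RECEPTOR = (False, False)
    IS_COMPONENT_OF_OBSERVED_RECEPTOR = (True, False)
    CONTAINS_RECEPTOR = (True, True)

    @property
    def observed_in_study(self) -> bool:
        return self.value[0]

    @property
    def same_cell_implied(self) -> bool:
        return self.value[1]


def may_group_by_receptor_id(ref: Optional[ReceptorReference]) -> bool:
    """Per (2b), only option (2) lets rearrangements be grouped solely by receptor_id."""
    return ref is ReceptorReference.IS_COMPONENT_OF_OBSERVED_RECEPTOR
```

Keeping option (4) outside the enum reflects that it is the absence of a reference rather than a fourth kind of link.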

What is in a receptor?

In terms of the Schema, it would not be a problem to also use the proposed Receptor object for inferred reactivities, as long as these reactivities are based on complete receptor sequences. Aggregated scores from, e.g., a tetramer matrix (or something similar) could therefore be attached as properties to a Receptor object that contains the full-length amino acid sequences of both chains, but the scores cannot exist in isolation. We would therefore need to discuss whether we want to link to hypothetical receptors derived from such predictions or find a semantic distinction for linking such information.
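The constraint that scores "cannot exist in isolation" can be sketched as a Receptor type that refuses to be built without both chains, so reactivity scores only ever hang off a complete receptor. The class, its field names and the `add_score` method are hypothetical illustrations, not the proposed AIRR Receptor schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Receptor:
    """Hypothetical Receptor: scores are only attachable to a complete receptor."""

    receptor_id: str
    chain1_aa: str  # full-length amino acid sequence of chain 1
    chain2_aa: str  # full-length amino acid sequence of chain 2
    reactivity_scores: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Scores cannot exist in isolation, so refuse to build a half receptor.
        if not self.chain1_aa or not self.chain2_aa:
            raise ValueError("Receptor needs full-length sequences for both chains")

    def add_score(self, target: str, score: float) -> None:
        """Attach an aggregated (e.g. tetramer-derived) score for one target."""
        self.reactivity_scores[target] = score
```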

How do queries for reactivity work?

I see mainly two scenarios:

  1. Alice wants to know whether there is any reactivity data associated with a rearrangement. Assuming Reference Option (4) (see "What is in a reference") this would mean:
    • query /rearrangement for some features of a rearrangement, retrieving the cell_ids
    • query /cell for the receptor_ids associated with the respective Cell
    • resolve the identifier and retrieve the Receptor description from an external repository
  2. Bob wants to find rearrangements that are associated with a given reactivity. Assuming Reference Option (4) this would mean:
    • query the external repository for the reactivity, retrieve a receptor_id (which MAY be a CURIE)
    • query /cell for the Cells that contain the receptor_ids. Note that the endpoint will return Cell objects that contain direct references to Rearrangements
    • retrieve the respective rearrangement_ids from /rearrangement.

Notes:

  • Reference Option (3) would allow skipping the /cell query if API queries are allowed to return merged objects.
  • "external repository" means external to the ADC; otherwise this would be a standardized query to the /receptor endpoint.
  • Assuming Reference Option (1), Alice could be swamped with receptor annotations that are difficult to validate.
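The two query paths under Reference Option (4) can be simulated with in-memory stand-ins for the /rearrangement and /cell endpoints. Everything here is hypothetical: the record layouts, the `receptor_ids` field on Cell, and the `EXTERNAL` dict standing in for an external repository such as IEDB. No real ADC API calls are made.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for /rearrangement, /cell and an external
# repository (e.g. IEDB); identifiers and the receptor_ids field are made up.
REARRANGEMENTS = [
    {"rearrangement_id": "r1", "cell_id": "c1", "cdr3_aa": "AALIQGAQKLV"},
    {"rearrangement_id": "r2", "cell_id": "c1", "cdr3_aa": "ASSIRSSYEQY"},
    {"rearrangement_id": "r3", "cell_id": "c2", "cdr3_aa": "ASSLG"},
]
CELLS = [
    {"cell_id": "c1", "rearrangements": ["r1", "r2"], "receptor_ids": ["EXAMPLE_RECEPTOR:1"]},
    {"cell_id": "c2", "rearrangements": ["r3"], "receptor_ids": []},
]
EXTERNAL = {"EXAMPLE_RECEPTOR:1": {"reactivity": "EPITOPE_A"}}


def alice(cdr3_aa: str) -> List[Dict[str, str]]:
    """Scenario 1: rearrangement feature -> cell_ids -> receptor_ids -> external records."""
    cell_ids = {r["cell_id"] for r in REARRANGEMENTS if r["cdr3_aa"] == cdr3_aa}
    receptor_ids = [rid for c in CELLS if c["cell_id"] in cell_ids for rid in c["receptor_ids"]]
    return [EXTERNAL[rid] for rid in receptor_ids if rid in EXTERNAL]


def bob(reactivity: str) -> List[str]:
    """Scenario 2: reactivity -> receptor_ids -> Cells -> rearrangement_ids."""
    receptor_ids = {rid for rid, rec in EXTERNAL.items() if rec["reactivity"] == reactivity}
    return [r for c in CELLS if receptor_ids & set(c["receptor_ids"]) for r in c["rearrangements"]]


print(alice("AALIQGAQKLV"))  # [{'reactivity': 'EPITOPE_A'}]
print(bob("EPITOPE_A"))      # ['r1', 'r2']
```

Under Option (3), the `CELLS` hop in `alice` would disappear if the rearrangement records carried merged receptor_ids directly.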

mikessh commented on July 17, 2024

Here is what we've arrived at with the VDJdb database: https://github.com/antigenomics/vdjdb-db

It lacks links to IEDB IDs, but that is a planned feature and is quite easy to add given a linear peptide sequence. Things with non-peptidic epitopes are far more complex. Moreover, the specificity can be inferred by other methods (e.g., expanded T-cell clones from culture).

schristley commented on July 17, 2024

text copied from #524

IEDB has released a beta query API for their database.

Of interest is that they have a curie_map endpoint, which returns a structure similar to our CURIEMap. I've noticed, however, that the IRI resolves to the human-readable HTML page rather than to the API, which returns machine-readable JSON. For example, IEDB_EPITOPE:7355 resolves to:

https://www.iedb.org/epitope/7355

versus

https://query-api.iedb.org/epitope_search?structure_id=eq.7355

The nice thing is that we should now be able to add a single field to Rearrangement if we want to link to an epitope.

That does, however, raise the question of whether we should put entries into our CURIEMap for resolving IEDB_EPITOPE or use IEDB's.
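To make the HTML-versus-API distinction concrete, here is a minimal sketch of a CURIE map that keeps both an HTML and an API IRI template per prefix and expands a CURIE into either. The map layout and the `expand_curie` helper are assumptions for illustration, not the structure of AIRR's CURIEMap or of IEDB's curie_map endpoint; the two URL patterns are the ones quoted in this comment.

```python
from typing import Dict

# Hypothetical CURIE map: one prefix, two IRI templates ({id} is the local part).
CURIE_MAP: Dict[str, Dict[str, str]] = {
    "IEDB_EPITOPE": {
        "html": "https://www.iedb.org/epitope/{id}",
        "api": "https://query-api.iedb.org/epitope_search?structure_id=eq.{id}",
    },
}


def expand_curie(curie: str, kind: str = "api") -> str:
    """Expand e.g. 'IEDB_EPITOPE:7355' into a URL using the chosen template."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id.strip():
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        template = CURIE_MAP[prefix][kind]
    except KeyError:
        raise KeyError(f"no {kind!r} template for prefix {prefix!r}") from None
    # strip() tolerates a stray space after the colon.
    return template.format(id=local_id.strip())


print(expand_curie("IEDB_EPITOPE:7355", "html"))  # https://www.iedb.org/epitope/7355
```

An unmapped prefix such as IEDB_RECEPTOR raises `KeyError` here, which mirrors the gap in IEDB's curie_map noted in this comment.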

We can also consider how we might link with receptors. The IEDB API has tcr_search and bcr_search endpoints:

https://query-api.iedb.org/tcr_search?limit=5
https://query-api.iedb.org/bcr_search?limit=5
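The query URLs in this thread follow a `column=eq.value` filter pattern plus a `limit` parameter, which looks like PostgREST-style filtering (an assumption worth checking against IEDB's own API documentation). The hypothetical `search_url` builder below reproduces these URLs with the standard library without sending any request.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://query-api.iedb.org"


def search_url(endpoint: str, limit: Optional[int] = None, **eq_filters: object) -> str:
    """Build an IEDB query-API URL; each keyword becomes a `column=eq.value` filter.

    Nothing is fetched: pass the result to curl or urllib.request.urlopen.
    """
    params = {column: f"eq.{value}" for column, value in eq_filters.items()}
    if limit is not None:
        params["limit"] = str(limit)
    query = urlencode(params)
    return f"{BASE_URL}/{endpoint}?{query}" if query else f"{BASE_URL}/{endpoint}"


print(search_url("tcr_search", limit=5))
# https://query-api.iedb.org/tcr_search?limit=5
```

For example, `search_url("tcr_search", chain2_cdr3_seq="ASSIRSSYEQY")` yields the CDR3 lookup URL used later in this thread.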

The receptor IDs use IEDB_RECEPTOR as their CURIE prefix, though oddly it is missing from the curie_map above. Either way, we might want to consider how we can link our Receptor to IEDB's.

bcorrie commented on July 17, 2024

So my question would be, which objects in the AIRR data model do we connect with epitope and specificity?

It sounds like linking to Receptor is the main one, but what about individual Rearrangements? If I have a Rearrangement that is a T-cell with a CDR3 of "AALIQGAQKLV", would you want the AIRR TSV Rearrangement file to be able to record that this CDR3 has a known antigen or specificity (that is, there is a TCR in IEDB with "chain1_cdr3_seq": "AALIQGAQKLV")?

Does it make sense to add a field to the Rearrangement that is "receptor_id":"IEDB_RECEPTOR:57"?

Or is that plain wrong...

bcorrie commented on July 17, 2024

@bussec just to be clear, I was suggesting receptor_id as an ID that points to external information, essentially another annotation as with tcrmatch, if I understand correctly.

I did not mean to imply that it was the ID of the Receptor object in the AIRR data model.

Not sure if that clarification helps or not, but my use of receptor_id was ambiguous...

schristley commented on July 17, 2024

Therefore having a receptor_id in the Rearrangement object is a no-go.

To make sure I understand: is this because we'd need to put arrays in the receptor_id field, which we are trying to avoid in the TSV format? Or is it because normally only one chain is sequenced, so identifying it with a specific receptor would be misleading? If you knew both chains, it seems reasonable to link them with receptor_id in the TSV.

TRA, receptor_id==abc
TRB, receptor_id==abc

I likely confused the matter by mixing a biologically measured receptor pair with inferences about similar receptors and similar specificities from tools like tcrmatch. Those inferred links will need their own fields rather than receptor_id, which should be reserved for the biological receptor pair, but the ID nomenclature may be the same for both.
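When both chains are known, the TRA/TRB example amounts to grouping TSV rows on a shared `receptor_id`. A minimal sketch with Python's standard `csv` module, assuming a hypothetical `receptor_id` column; inferred links from tools like tcrmatch would go in a separate column and are not modeled here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical Rearrangement TSV; receptor_id is the proposed column, not an
# existing AIRR field. seq3 is a lone chain and so carries no receptor_id.
TSV = (
    "sequence_id\tlocus\treceptor_id\n"
    "seq1\tTRA\tabc\n"
    "seq2\tTRB\tabc\n"
    "seq3\tTRB\t\n"
)


def group_by_receptor(tsv_text: str) -> Dict[str, List[str]]:
    """Map each non-empty receptor_id to the loci of the rows that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["receptor_id"]:
            groups[row["receptor_id"]].append(row["locus"])
    return dict(groups)


print(group_by_receptor(TSV))  # {'abc': ['TRA', 'TRB']}
```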

bcorrie commented on July 17, 2024

For the record, this query to IEDB looks up information for a TCR CDR3:

curl 'https://query-api.iedb.org/tcr_search?chain2_cdr3_seq=eq.ASSIRSSYEQY'

bussec commented on July 17, 2024

The main points originally raised by Bjoern are now included in #404.
