
go-annotation's Introduction

go-annotation

This repository hosts the tracker for issues pertaining to GO annotations. For issues with the EBI GOA tools such as QuickGO, Protein2GO and the GOA database and annotations please email [email protected].

simple-report-system

Overview

This is an experimental GH Actions-based report runner.

As it stands now (and this is still a bit of a mess), the workflow is:

  • when an issue is opened
  • multiple GH workflows are triggered
  • if a workflow matches the label on the issue, it continues (otherwise it is skipped)
  • a matching workflow
    • greps out GO terms from the issue body.
    • makes annotation TSVs for the matched terms
    • puts them into a reports/ directory for the opened issue number
    • commits the reports back into main
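The "grep out GO terms" and "put them into reports/" steps can be sketched in a few lines. This is a minimal sketch, not the workflow's actual script: it assumes GO IDs appear in the issue body as `GO:` followed by seven digits, and the `reports/<issue number>/GO_XXXXXXX.tsv` layout and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: GO IDs are written as "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"\bGO:\d{7}\b")


def extract_go_ids(issue_body: str) -> list[str]:
    """Return the unique GO IDs in the issue body, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in GO_ID.findall(issue_body):
        seen.setdefault(match, None)
    return list(seen)


def report_dir(issue_number: int, root: str = "reports") -> Path:
    """Hypothetical per-issue output directory, e.g. reports/4639."""
    return Path(root) / str(issue_number)


def tsv_path(issue_number: int, go_id: str) -> Path:
    """Hypothetical TSV file name; the colon is replaced to keep file names portable."""
    return report_dir(issue_number) / f"{go_id.replace(':', '_')}.tsv"


if __name__ == "__main__":
    body = "Please review GO:0016032 and GO:0046718 (and GO:0016032 again)."
    ids = extract_go_ids(body)
    print(ids)
    print(tsv_path(4639, ids[0]))
```

Deduplicating while preserving order keeps one TSV per requested term even when a curator repeats an ID in the ticket.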

Current reports by label

direct_ann_to_list_of_terms

A set of TSVs of the annotations made directly to the given GO terms. This report also includes direct mappings to all terms in the list.

reg_ann_to_list_of_terms

A set of TSVs of the direct and indirect annotations over the regulates closure for the given GO terms. This report also includes mappings (direct and indirect annotations over the regulates closure) to all terms in the list.
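The difference between the two reports is which relations are followed when deciding whether an annotation to a descendant term counts for a queried term. The toy sketch below illustrates this under stated assumptions: the ontology, term IDs (`GO:A`, `GO:B`, ...) and gene names are hypothetical, and the closure is a plain transitive walk over the union of relation types, which is a simplification. The real reports presumably rely on GOlr's precomputed closure fields rather than computing anything in-process.

```python
from collections import deque

# Hypothetical toy ontology: term -> list of (relation, parent term).
ONTOLOGY = {
    "GO:B": [("is_a", "GO:A")],
    "GO:C": [("part_of", "GO:B")],
    "GO:R": [("negatively_regulates", "GO:B")],
}

ISA_PARTOF = {"is_a", "part_of"}
REGULATES = ISA_PARTOF | {"regulates", "positively_regulates", "negatively_regulates"}


def closure(term, relations, ontology=ONTOLOGY):
    """Reflexive transitive closure of `term` over the given relation types."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for rel, parent in ontology.get(current, []):
            if rel in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def report(query_terms, annotations, relations=REGULATES):
    """Split (gene, term) annotations into direct and indirect hits per query term."""
    out = {}
    for q in query_terms:
        direct = [a for a in annotations if a[1] == q]
        indirect = [a for a in annotations
                    if a[1] != q and q in closure(a[1], relations)]
        out[q] = {"direct": direct, "indirect": indirect}
    return out


if __name__ == "__main__":
    anns = [("geneX", "GO:B"), ("geneY", "GO:C"), ("geneZ", "GO:R")]
    print(report(["GO:B"], anns))              # geneY and geneZ are indirect hits
    print(report(["GO:B"], anns, ISA_PARTOF))  # only geneY is an indirect hit
```

With the regulates closure, an annotation to a term that negatively regulates `GO:B` is pulled into the `GO:B` report; with is_a/part_of alone it is not.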

Things to ponder

  • all sorts of fun triggers and actions can be thought of here
  • cleaning/archiving could be ticket closing
  • maybe use gist API (pass secret)
    • allow for (easier-to-access?) raw TSVs
    • could append link comments to ticket once produced
  • act on locks instead of open
  • other APIs, not just cheaping out on GOlr
  • remote trigger to bigger machines
  • grebe
  • more structured runner / delegation
  • editing / deleting output with the GH "code" editors; curator workflow control

SOP for housekeeping of this repo

We use labels to organize the work in this repo. These are useful for housekeeping, i.e., checking the status of old tickets, closing tickets that have been done or are out-of-date, and reminding people of work that needs to be done.

label:annotation review

Query for tickets that were opened for ontology work:

  • is:open label:"annotation review" label:"direct_ann_to_list_of_terms","reg_ann_to_list_of_terms" (the comma is an OR).

Query for tickets that were opened for other reasons, possibly for ontology work a while ago, before the scripts existed:

  • is:open label:"annotation review" -label:"direct_ann_to_list_of_terms","reg_ann_to_list_of_terms" (the comma is an OR).
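The two housekeeping queries can be turned into bookmarkable search links. This is a small sketch assuming the tracker lives at `geneontology/go-annotation` on GitHub and that the standard `/issues?q=` search URL is used; the helper names are hypothetical, and the query strings are exactly the ones listed above.

```python
from urllib.parse import quote

# Assumption: repository path of this tracker.
REPO = "geneontology/go-annotation"
SCRIPT_LABELS = '"direct_ann_to_list_of_terms","reg_ann_to_list_of_terms"'


def review_query(scripted: bool) -> str:
    """Open 'annotation review' tickets with (scripted=True) or without a report label."""
    negate = "" if scripted else "-"
    return f'is:open label:"annotation review" {negate}label:{SCRIPT_LABELS}'


def search_url(query: str, repo: str = REPO) -> str:
    """Percent-encode the query into a GitHub issues search URL."""
    return f"https://github.com/{repo}/issues?q={quote(query)}"


if __name__ == "__main__":
    for scripted in (True, False):
        print(search_url(review_query(scripted)))
```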

Tickets will fall into a few broad categories:

A. The ticket requested a review for a term obsoletion:

  1. If the go-ontology ticket is closed:
  • Open the associated Google spreadsheet. If every annotation has been addressed, add DONE at the beginning of the spreadsheet name; otherwise, add CLOSED at the beginning of the spreadsheet name.
  • Add a comment to the ticket: This term was obsoleted; remaining annotations will appear in GORULES error reports.
  • Example: issue-4639
  2. If the go-ontology ticket is still open:
  • Check the corresponding Google spreadsheet to see whether all reviews have been done.
    • 2.1. If all annotations have been reviewed:
      • Change the title of the Google spreadsheet to add DONE at the beginning of the file name
      • Close the go-annotation ticket
      • Add a comment to the go-ontology ticket: All annotations have been fixed. Then add the label ready.
    • 2.2. If some annotations have not been reviewed:
      • 2.2.1. If the annotation review is > 6 months old: change the title of the Google spreadsheet to add CLOSED at the beginning of the file name, close the go-annotation ticket, and add a comment to the go-ontology ticket: Annotation Review is out-of-date and was closed.
      • 2.2.2. If the review is < 6 months old, ping the assignees that still need to review annotations. (People should not be pinged more than 2-3 times; after that, we assume that they will not get to the work.)
      • Example: issue-4639
  3. If the go-ontology ticket is not linked in the go-annotation ticket, search the go-ontology repo with the term ID or label. If there are no results, close the ticket with the comment: The corresponding go-ontology ticket does not exist.
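The category A decision tree can be encoded as a function, which makes it easy to check that every case leads to exactly one outcome. This is a hypothetical sketch: the `ObsoletionReview` fields, the 183-day approximation of "6 months", the ping limit of 3, and the action strings are illustrative assumptions, not part of any existing tooling.

```python
from dataclasses import dataclass

SIX_MONTHS_DAYS = 183  # assumption: "6 months" approximated in days
MAX_PINGS = 3          # SOP: people should not be pinged more than 2-3 times


@dataclass
class ObsoletionReview:
    ontology_ticket_found: bool   # linked, or found by searching go-ontology
    ontology_ticket_closed: bool
    all_reviewed: bool            # every spreadsheet row addressed
    age_days: int                 # age of the annotation review
    pings_sent: int = 0


def triage(r: ObsoletionReview) -> list[str]:
    """Return the housekeeping actions for a category A ticket."""
    if not r.ontology_ticket_found:                        # step 3
        return ["close: The corresponding go-ontology ticket does not exist."]
    if r.ontology_ticket_closed:                           # step 1
        prefix = "DONE" if r.all_reviewed else "CLOSED"
        return [f"rename spreadsheet: {prefix}",
                "comment: This term was obsoleted; remaining annotations "
                "will appear in GORULES error reports."]
    if r.all_reviewed:                                     # step 2.1
        return ["rename spreadsheet: DONE",
                "close go-annotation ticket",
                "comment on go-ontology ticket: All annotations have been fixed",
                "label go-ontology ticket: ready"]
    if r.age_days > SIX_MONTHS_DAYS:                       # step 2.2.1
        return ["rename spreadsheet: CLOSED",
                "close go-annotation ticket",
                "comment on go-ontology ticket: Annotation Review is "
                "out-of-date and was closed."]
    if r.pings_sent < MAX_PINGS:                           # step 2.2.2
        return ["ping assignees with pending reviews"]
    return ["stop pinging; assume the remaining reviews will not be done"]


if __name__ == "__main__":
    print(triage(ObsoletionReview(True, False, False, age_days=200)))
```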

B. The ticket requested a review, without requesting an obsoletion:

  • Check the corresponding Google spreadsheet to see whether all reviews have been done.
  1. If all annotations have been reviewed:
    • Change the title of the Google spreadsheet to add DONE at the beginning of the file name
    • Close the go-annotation ticket
  2. If some annotations have not been reviewed:
    • Ping the assignees that still need to review annotations. (People should not be pinged more than 2-3 times; after that, we assume that they will not get to the work.)
    • Ping the author of the ticket to see if the review is still valid.

C. The ticket doesn't formulate a clear request:

  • Ping the author of the ticket. This can be done 2-3 times, at intervals of more than 2-3 months. If there is no reply, close as not planned.
  • Consider removing the old labels: label:"annotation review", label:"direct_ann_to_list_of_terms" and/or label:"reg_ann_to_list_of_terms"

label:PAINT annotation and label:PAINT - waiting for primary annotation

  • Monitored by the PAINT annotation team.
  • If there has not been any comment on a ticket in > 1 year, ping the assignee(s) or close the ticket if it is out-of-date.
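The one-year rule for the PAINT labels amounts to a simple staleness filter. This sketch assumes a hypothetical ticket shape of `(issue number, label names, time of last comment)` with timezone-aware UTC timestamps and treats "1 year" as 365 days; fetching the tickets is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Sequence, Tuple

STALE_AFTER = timedelta(days=365)  # SOP: no comment in more than 1 year
PAINT_LABELS = {"PAINT annotation", "PAINT - waiting for primary annotation"}

# Hypothetical ticket shape: (issue number, label names, time of last comment).
Ticket = Tuple[int, Sequence[str], datetime]


def is_stale(last_comment_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if the last comment is more than STALE_AFTER before `now` (tz-aware UTC)."""
    now = now or datetime.now(timezone.utc)
    return now - last_comment_at > STALE_AFTER


def stale_paint_tickets(tickets: Iterable[Ticket],
                        now: Optional[datetime] = None) -> List[int]:
    """Numbers of PAINT-labelled tickets to ping the assignee(s) on, or close."""
    return [number for number, labels, last in tickets
            if PAINT_LABELS & set(labels) and is_stale(last, now)]
```

Only tickets carrying one of the two PAINT labels are considered, so other stale tickets are left to the annotation-review SOP.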

go-annotation's People

Contributors

antonialock, balhoff, cmungall, dougli1sqrd, edwong57, kltm, mcourtot, monicacecilia, pgaudet, raymond91125, suzialeksander, ukemi, valwood, vanaukenk


go-annotation's Issues

More Annotation Camp Topics from WormBase

Here are some more annotation camp topics from WormBase:

1) Phenotype Penetrance and Process Terms

Most of our Biological Process annotations come from
analysis of mutant phenotypes. For pleiotropic
mutations, there are often many defects of varying
penetrance. What is appropriate curation for defects
that are weakly penetrant, but observed nonetheless?
For example, in a paper on dauer formation at high
temperatures (PMID: 14504222), a number of singly and
doubly mutant animals display a dauer phenotype with
penetrance ranging from 0% to 34% (Table 7). Should
all mutants with dauer formation above 0% be annotated
to a dauer formation process term, even if the
penetrance is quite low?

2) Process Curation for Gene Products like RNA Pol II

This next issue is somewhat related as it also concerns
pleiotropies.
For gene products whose primary function is well
established, yet whose mutant phenotype is pleiotropic,
how far away from the primary process is good GO
curation? For example, AMA-1, the C. elegans large
subunit of RNA polymerase II, is annotated to the
process term, transcription from Pol II promoter. Loss
of AMA-1 function in the early embryo results in
embryonic lethality with defects in tissue
differentiation, cell division cycles, and gastrulation
movements (PMID: 8812143). We have annotated AMA-1 to
the process terms related to these embryonic defects,
but how many
other transcription-dependent processes should we add?
Worms require transcription for vulval development,
post-embryonic cell migrations, male tail formation,
etc., but at what point do you stop adding process
terms for gene products with generalized functions like
RNA polymerase II?

3) Using the NOT qualifier

What is appropriate use of the NOT qualifier? Is it
intended to capture only those experimental results
that are really unexpected, or is it intended to also
capture negative results? Two examples: UNC-129 is a
TGF-beta-like signalling molecule required for cell
migration, but it does not appear to interact with the
only known Type II TGF-beta receptor in worms, DAF-4
(PMID: 11018016). This suggests that UNC-129 functions
in some way other than through the known TGF-beta
signalling pathway in worms. So, would this be an
appropriate use of the NOT qualifier:

NOT MF:transforming growth factor receptor binding IGI
with DAF-4

RNAi of one of the three class I histone deacetylases
in C. elegans results in embryonic lethality, while
RNAi of the other two has no discernible effect (PMID:
9875852). HDA-1 has been annotated to embryonic
development, but should the other two be annotated to
NOT embryonic development?

4) How should we deal with “data not shown”?

In general, what is the appropriate evidence code to
use when an experimental result is described in a paper
as “data not shown”? If it is clear from the paper
that the authors used the same assay as one they
describe in the paper, but for which they didn’t
actually present the data for the gene product you are
annotating, is IDA okay? If the statement is made in
the discussion section of a paper and you’re not
absolutely certain how the information was obtained
would you then use NAS?

5) Expression patterns and IDA vs. IEP

To be certain, is cellular component information
generated from antibodies or reporters always annotated
using the IDA evidence code? Have other databases used
the IEP evidence code and if so, for what types of
experiments? We have been thinking of using microarray
data to annotate to a few select processes (aging,
dauer formation, spermatogenesis), but have not yet
done so. Is this the intended use of IEP?

6) IEA Annotation Maintenance

How much effort do other databases put into keeping
their IEA annotations up-to-date? WormBase is released
fortnightly and so theoretically (since gene models do
change with every release), we could update our IEA
annotations every two weeks. How often do other
databases update their IEAs? To what extent are IEA GO
annotations used by people who are doing informatics or
other types of analyses, and therefore, what is the
appropriate level of upkeep for IEAs?

7) Species- and Taxon-specific terms

One issue we’ve been grappling with at WormBase is how
to provide annotations of deep granularity without
having to introduce an abundance of species- or
nematode-specific terms. This point becomes especially
relevant when we think about using the very
well-defined anatomy of C. elegans. For example,
vulval development in C. elegans involves specification
of what is known as primary and secondary vulval cell
lineages. Different gene products are involved in each
of these separate specification processes, both of
which could be children of the existing GO term, vulval
development. But what is the cut-off for introducing
species-
or taxon-specific anatomy terms? While a term like
“specification of primary vulval cell lineages” does
not strike me as too species-specific, a term like
“specification of the embryonic
blastomere EMS” does. Can GO provide any more guidance
to annotators about species-specific terms, especially
with respect to anatomy? I know that cross-product
ontologies have been suggested as a solution to the
potentially infinite expansion of the gene ontology if
too many species-specific anatomy terms are introduced.
Are any of the MODs currently using a cross-product
ontology? WormBase has an existing anatomy ontology,
so would it be appropriate for us to start developing
and implementing a cross-product ontology for our
users? How would this ontology fit in with the rest of GO?

8) Appropriate use of sensu terms

Related to this, I need to annotate a C. elegans gene
product to the process term oogenesis. Two oogenesis
terms currently exist in GO: oogenesis (sensu Insecta)
and oogenesis (sensu Mammalia). Since I don’t have
nematode-specific child terms for oogenesis yet, should
I propose the general term, oogenesis, the more
specific term oogenesis (sensu Nematoda), or both?

9) Annotate to gene, protein, or transcript?

We would also like more guidance on the appropriate
selection for Column 12 of the gene association file,
DB_Object_Type. Some gene association files have “gene”
in this column, even when the annotation is clearly to
a protein (i.e., GO: CC: some intracellular complex,
Evidence code: IPI). We decided to place protein or
transcript in Column 12, but are now wondering if this
is correct. If a gene product is determined to have
kinase activity based upon sequence similarity, not
upon a direct biochemical assay, what is the correct
object type for column 12? Gene or protein?

Reported by: vanaukenk

Original Ticket: geneontology/annotation-issues/16

GOA: Interpro-GO citrate synthase

IPR002020 maps to
citrate (Si)-synthase activity (GO:0004108)
EC: 2.3.3.1

but the citrate synthase I have with this domain is
EC 2.3.3.8 ATP citrate synthase

so possibly needs to map to

(Amelia, I can’t find a suitable parent term for both,
could you check this out?)

cheers

Val

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/28

Incorrect annotations to viral terms.

We’ve been noticing that there are quite a few annotations of
viral terms e.g. ‘GO:0046718 : viral entry’ to host rather than
viral gene products.

I’m sure this is incorrect – by doing this we’re annotating
abnormal, pathological processes in the host.

For an example, see gps annotated directly to ‘GO:0016032 :
viral life cycle’ e.g. MKT1, SGD “Protein involved in propagation
of M2 dsRNA satellite of L-A virus” – surely this isn’t a normal
process for yeast? There are lots of other examples from many
different databases…

Reported by: jl242

Original Ticket: geneontology/annotation-issues/1

GO annotation camp issues from UniProt

1. Likely/potential functions:
Do other annotators assign GO terms to potential/likely
functions? If so, what code do you use? For example, in
pubmed 12077138, the protein has been shown
experimentally to bind to Gram-negative & Gram-positive
bacteria and authors suggest a function in defence
against bacterial infection. Should ‘defense response
to bacteria’ also be added?

2. Results from sequence analysis:
What about terms suggested by results from sequence
analysis programs? For example, if a protein is
predicted to be a type I membrane protein, should a
cellular_component term be assigned on this basis
alone?

3. Related terms:
Are there any easy ways of finding terms related to a
term you’re adding (apart from the ‘Often assigned
with’ feature in QuickGO)? When assigning a particular
term, it would be helpful to know if there are other
terms which should also be added with this term.

4. Protein complexes:
Are there any GO annotation practices for large protein
complexes?

5. Recombinant proteins:
Where a recombinant protein is purified and its
function assayed directly eg enzyme activity measured,
what evidence code should be used? Does it matter in
what system the protein is expressed eg this example is
a zebrafish protein expressed in bacteria – Pubmed
12755695.

Reported by: magrane

Original Ticket: geneontology/annotation-issues/19

interpro2go IPR000685->RuBisCo GO terms

I think the following interpro2go mappings may need to
be adjusted since they result in translations to
‘ribulose-bisphosphate carboxylase’ related GO terms in
Zebrafish genes as well as several other vertebrate
species. It is my understanding that none of these
species have a ‘RuBisCo’ complex or activity as this is
related to photosynthesis in some way…

InterPro:IPR000685 Ribulose bisphosphate carboxylase,
large chain > GO:ribulose-bisphosphate carboxylase
activity ; GO:0016984
InterPro:IPR000685 Ribulose bisphosphate carboxylase,
large chain > GO:carbon utilization by fixation of
carbon dioxide ; GO:0015977
InterPro:IPR000685 Ribulose bisphosphate carboxylase,
large chain > GO:ribulose bisphosphate carboxylase
complex ; GO:0009573

-Doug

Reported by: doughowe

Original Ticket: geneontology/annotation-issues/29

Human Calreticulin Annotations

[This item has been transferred from the curator
requests tracker (905470) and was submitted by Pascale
Gaudet (pgaudet)].

Hello all,

The calreticulin gene of human (CRTC_HUMAN),
SwissProt ID P27797, is annotated with regulation of
transcription, DNA dependent, and transcription
corepressor activity. Is this correct?

Pascale

Reported by: jl242

Original Ticket: geneontology/annotation-issues/8

nitric oxide binding, regulator activity, regulation of NOS

Hello,

I have a gene product that interacts with and inhibits
the activity of nitric oxide synthase.
PMID: 8864115 and references therein

GO has only nitric oxide regulator activity, regulation of
nitric-oxide biosynthesis, positive and negative.

I’m interested in having terms for NOS binding and
inhibition. For consistency reasons the following terms
are proposed:

GO:0030235 – nitric-oxide synthase regulator activity
GO:new – nitric-oxide activator activity
Synonym: nitric oxide activator activity
Def: Increases the activity of a nitric-oxide synthase
GO:new – nitric-oxide inhibitor activity
Synonym: nitric oxide inhibitor activity
Def: Stops, prevents or reduces the activity of a nitric-oxide synthase.

GO:0019899 – enzyme binding
GO:new – nitric-oxide binding
Synonym: nitric oxide binding
Def: Interacting selectively with a nitric-oxide synthase

GO:0050790 – regulation of enzyme activity
GO:new – regulation of nitric-oxide synthase activity
Synonym: regulation of nitric oxide synthase
Def: Any process that modulates the activity of a nitric-oxide synthase
GO:new – negative regulation of nitric-oxide synthase activity
Synonym: negative regulation of nitric synthase activity
Def: Any process that stops, prevents or reduces the
activity of a nitric-oxide synthase
GO:new – positive regulation of nitric-oxide synthase activity
Synonym: positive regulation of nitric oxide activity
Def: Any process that activates or increases the
activity of a nitric-oxide synthase

Reported by: vpetri

Original Ticket: geneontology/annotation-issues/7

IPR001800; Lipoprotein_6

This isn’t a mapping problem, but it is something GOA
may want to consider…

Prosite motif IPR001800 maps to defense response and
outer membrane (sensu Gram-negative Bacteria), which
are correct, but this motif gives a lot of false
positives in eukaryotic genomes.

Maybe you will need to apply taxon-specific filters to
some mappings?

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/23
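The taxon-specific filter suggested in the IPR001800 ticket could be expressed as a restriction attached to each mapping. This is one possible sketch, not how InterPro2GO or GOA actually implement it: the mapping rows, the restriction to "Bacteria", the `IPR_EXAMPLE` row and the lineage lists are all hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical mapping rows: (InterPro ID, GO term, taxa the mapping is restricted to).
# None means the mapping applies in every taxon.
Mapping = Tuple[str, str, Optional[Set[str]]]

MAPPINGS: List[Mapping] = [
    ("IPR001800", "defense response", {"Bacteria"}),
    ("IPR001800", "outer membrane (sensu Gram-negative Bacteria)", {"Bacteria"}),
    ("IPR_EXAMPLE", "hypothetical unrestricted term", None),
]


def filtered_go_terms(interpro_hits: Iterable[str], lineage: Iterable[str],
                      mappings: List[Mapping] = MAPPINGS) -> List[str]:
    """GO terms implied by a protein's InterPro hits, skipping any mapping whose
    taxon restriction does not intersect the protein's lineage."""
    hits, lineage_set = set(interpro_hits), set(lineage)
    return [go_term for ipr, go_term, allowed in mappings
            if ipr in hits and (allowed is None or allowed & lineage_set)]


if __name__ == "__main__":
    pombe = ["cellular organisms", "Eukaryota", "Fungi", "Schizosaccharomyces pombe"]
    print(filtered_go_terms(["IPR001800"], pombe))  # eukaryotic false positive dropped
```

Putting the restriction on the mapping row rather than on the protein keeps the filter local to the few mappings known to misfire outside their taxon.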

IPR002197/ Q9US09 FP

Q9US09 = S. pombe btn1, which is the homologue of the
battenin family protein.

It has a false positive Interpro mapping to IPR002197
which is Helix-turn-helix, Fis-type, and pulling in a
GO mapping to transcription factor.

Not sure why this mapping is made, because the ‘IPR003492
Batten’s disease protein Cln3’ covers the entire
protein?

cheers

Val

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/33

GO annotation camp issues from Alex/MGI.

This is just a short list of annotation issues of interest to me,
not necessarily to other curators at MGI, which I would like
discussed at the GO annotation camp.

1) Use of response to virus, bacteria, fungus terms for
immune responses, and detection of etc.

2) “extracellular space” for PM proteins versus
“extracellular” versus “external side of plasma membrane”
versus “extrinsic to plasma membrane” versus “cell surface”

3) IGI annotations and double knockout papers. PMID:
15066262 Neuron. 2004 Apr 8;42(1):23-36. PMID: 14738763
Immunity. 2004 Jan;20(1):37-46.

4) Proper annotation of signaling pathways.

Alex

Reported by: addiehl

Original Ticket: geneontology/annotation-issues/17

Three Annotation Issues for GO Camp from WormBase

Here are three annotation issues that I’ve encountered
while doing GO curation for the C. elegans UNC-31 protein.

Some Background:
When compared to yeast and mammals, the C. elegans
field still does relatively little biochemistry-type
experiments. The upshot of this is that many of our
molecular function, and to a lesser extent cellular
component, annotations are ISS annotations drawn from
direct assays performed in other species. The
following annotation issues come from reading a paper
(PMID: 11927595) about what I believe is the rat
UNC-31 ortholog, CAPS, in which the authors describe
results of direct assays regarding CAPS function and
subcellular localization. The rat protein has not yet
been annotated to any GO terms.

#1-CAPS is described as a cytosolic protein that also
associates with the plasma membrane, dense core
(secretory granule) membranes, and Golgi membranes. It
does not appear to be an integral component of these
membranes, but likely transiently associates with
distinct phospholipid components. What are the
appropriate annotations for this type of localization
which although transient, is critical to CAPS function?
Does annotating to plasma membrane, Golgi membrane,
and secretory granule membrane imply integral membrane
association? Would this be an example of when to use
the associates_with qualifier? Is an associates_with
qualifier going to be added to the accepted list of
qualifiers?

#2-CAPS is shown to exhibit nonspecific acidic
phospholipid binding and stereoselective PtdIns(4,5)P2
binding. In order to capture both functions, would
people assign both MF:phospholipid binding and
MF:phosphoinositide (4,5) bisphosphate binding even
though there would be redundancy in these annotations
since the latter term is a child of the former?

#3-When using the ISS evidence code, what is the
appropriate way to reference the annotation? Should I
list the original paper (in this case the rat paper) as
well as the C. elegans paper that describes the
sequence similarity? Just the rat paper? Just the C.
elegans paper?

Thanks,
Kimberly

Reported by: vanaukenk

Original Ticket: geneontology/annotation-issues/13

spkw sensory transduction-> sensory perception + editorial

I notice some Uniprot entries have a keyword
‘sensory transduction’. I suspect that from a pombe
perspective this really means signal transduction.

These currently map to GO:0007600 : sensory perception

this is incorrect as it has the parent
GO:0050877 : neurophysiological process

I suspect the definition of ‘sensory perception’ is
inadequate…

“The series of events required for an organism to
receive a sensory stimulus, convert it to a molecular
signal, and recognize and characterize the signal.”

because from this the annotations seem fine, but to me
‘perception’ implies a brain?

maybe this should go on both trackers….

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/34

GOA:spkw lectin

maps to
heterophilic cell adhesion
this needs removing

This isn’t true for S. pombe calnexin which is ER
membrane associated (involved in protein folding of
core-glycosylated trimmed ligands)

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/22

component annotations and transient association

from Val; originally arose by email, and crops up in curator request tracker item 856322:

Could we also use the new ‘associated with’ for component
assignments to indicate something is sometimes ‘associated with’ (but not
a member of a complex) in the same way that a can be ‘associated with’
an activity?

one comment (from yours truly):
I think we’ve been annotating both stably associated ‘members’ and
transiently associated things to component terms. Val’s suggestion would
let us distinguish between the two, without having to add a ‘complex
binding’ term.

examples:
- SPAC3A11.08 copurifies with signalosome (PMID:12695334)
I could annotate as associated with GO:0008180

has been called too confusing, but is worth another look because it would enable us to make a distinction that is currently lost

Reported by: mah11

Original Ticket: geneontology/annotation-issues/4

interconversion or synthesis?

Hello,

Lauren Brinkac, a TIGR microbial annotator, has raised
an annotation question that I would like to get some
clarification on.

There is a set of proteins involved in the conversion
of the various nucleotides between their mono-, di-,
and tri-phosphate forms. (Ex. adenylate kinase,
thymidylate kinase).

At TIGR we annotate these to “nucleobase, nucleoside
and nucleotide interconversion” GO:0015949 and its
children.
However, when I look at other groups’ annotations, I
find that no one else (with the exception of one FlyBase
gene which is not one of these kinases) is using these
terms.

I have seen other groups use the process term “ATP
biosynthesis” or “dTTP biosynthesis” (as appropriate
for the above proteins). I can see the logic behind
this assignment but it seems like the interconversion
terms are more precise.
In my Stryer text these reactions are described as
interconversions between the forms, so even though the
terms in GO don’t have definitions, it seems clear that
is what they were intended to be.

Should both the biosynthesis and interconversion terms
be assigned? Is there a reason other groups are
avoiding the interconversion terms?

Thanks for your time,
Michelle

Reported by: mlgwinn

Original Ticket: geneontology/annotation-issues/5

annotation query ATP-binding cassette (ABC) transporter acti

There is a gene in SGD (YBT1)
annotated to
GO:0004009
ATP-binding cassette (ABC) transporter activity
and
GO:0015125
bile acid transporter activity

I was wondering whether there were any guidelines when
annotating to 2 terms to represent one activity?

obviously
i) both annotations are correct
but
ii) both annotations are referring to the same
activity

This comment relates to the decomposition of compound
terms, but I am not sure if the guidelines are entirely
clear at present to enable annotators to make the
correct annotations in cases like these.

For instance…..should the
comment for GO:0004009 explicitly say
“consider annotating to the term specific for the
particular substrate”
(analogous to GO:0005085 guanine-nucleotide
transferase activity, which says
“consider also annotating to guanine nucleotide binding”)

OR…..
should the annotators be able to see the additional
relationships
(and therefore reduce the likelihood of incomplete
annotation) by creating child terms which link these 2
terms

For example:

ATP-binding cassette (ABC) transporter activity
--bile acid ATPase transporter activity
and
bile acid transporter activity
--bile acid ATPase transporter activity

other appropriate child terms for ATP-binding cassette
(ABC) transporter activity would be….

http://www.ebi.ac.uk/intenz/query?cmd=SearchEC&ec=3.6.3

also with their appropriate substrate terms as parents

I guess my question is what are the guidelines with
respect to creating child terms to link 2 terms
_within_an_ontology with respect to the ongoing
decomposition of terms

(b.t.w. I’m not keen on the name of the term for
ATP-binding cassette (ABC) transporter activity…I
suspect it has somehow been inherited from the Pfam
domain name
….wouldn’t something with ATPase in the term name be
better ? ….ABC transporter sounds like a product )

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/2

Hsp 27 binding

Hello,

I have a gene product that specifically interacts with
the heat shock protein 27 – hsp 27 – and exerts an
inhibitory effect upon its functioning.
PMID: 11546764 and references therein
Heat shock protein 27 participates in a number of
processes; its activity is modulated via interactions with
a range of proteins

Proposed terms
GO:0005515 – protein binding
GO:new – Hsp 27 binding
Def: Interacting selectively with Hsp 27, a member of
the light weight heat shock proteins.

GO:0030188 – chaperone regulator activity
GO:new – Hsp 27 regulator activity
Def: Modulates the activity of light weight chaperone
proteins.

GO:0030190 – chaperone inhibitor activity
GO:new – Hsp 27 inhibitor activity
Def: Stops, prevents or reduces the activity of light
weight chaperone proteins.

Thank you,

Victoria

Reported by: vpetri

Original Ticket: geneontology/annotation-issues/6

IPR003027 -> exonuclease (and SPKW)

I think this mapping should be removed. There is an
older paper which reports an exonuclease activity, but
according to Paul Russell nobody believes it, and it
hasn’t been repeated. I haven’t even tracked down the
original source of this, as the SGD annotation is from
an ISS…..does anybody know where it was from
originally?

Also not sure about DNA binding…it is loaded onto DNA
but I don’t think there is any covalent interaction,
though I’d have to check.

val

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/32

IPR003439 ->ATPase activity, coupled to transmembrane moveme

and membrane
and transport

These mappings need to be removed.

It would probably be best if this domain was renamed.
It is the ATP-binding region, and only should have the
mapping transport if it is accompanied by the
transmembrane regions.

This region is also present in a subunit of the CCR4-NOT
complex, an mRNA export factor and a translation
elongation factor.

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/48

GO annotation camp issues from RGD

GO Camp annotation issues from RGD.

The PDFs for the articles have been sent to Midori.
Topics address issues such as:
Delineating a cellular component and
once a component term is assigned
how to best describe it?
Delineating scaffold activity
How to best handle and represent glutamate
receptor(s)
How to deal with splice variants
One process, various directions
Using GFP (green fluorescent protein) and
evidence code

Delineating a cellular component: what are the
boundaries, and can they always be defined?
Generally a cellular component is a complex, an
organelle, a vesicle or a compartment. However,
sometimes there are just specialized structures located
at, near or associated with a cellular component. For
instance:

A characteristic feature of the glutamatergic synapse is
its asymmetry. Presynaptically, there is an electron-
dense specialization termed the active zone (AZ) where
synaptic vesicles are docked and fused with the plasma
membrane; postsynaptically there is an electron-dense
thickening of the membrane juxtaposed against AZ and
termed the postsynaptic density (PSD). Either is
constituted by specific proteins that aid in the
docking/fusion of synaptic vesicles and enable
glutamate release (AZ) or aid in the glutamate response
processes (PSD). A number of proteins are known to
be part of either specialization and PSD has been
estimated to contain over 300 proteins.

A notable feature of the proteins associated with either
specialization is their multi-domain structure and the
range of interactions they are involved in. The Shank
family of proteins (Src homology 3 domain and ankyrin
repeat-containing proteins, also known as proline-rich
synapse-associated protein/somatostatin receptor-interacting
protein), whose range of direct and indirect interactions
is too long to cite here, have been dubbed master
scaffolding proteins of the PSD. PSD assembly is essential
for the recruitment of AMPA and NMDA receptors to the
nascent synapse. Studies monitoring the assembly
characteristics of AZ and PSD have used GFP (green
fluorescent protein) fusion constructs.
PMID: 14960624
PMID: 11509555

(1) Active zone (AZ) and postsynaptic density (PSD)
should deserve to be thought of valid cellular
components.
(2) Perhaps the PSD-related scaffold activity terms should
be revisited. GKAP/Homer scaffold activity is the only
such term in GO, yet studies have shown that members of
the Shank family contact metabotropic glutamate
receptors via Homer/Ves1. What do others think?
We could discuss both the Shank and the Homer members,
the latter represented by three genes with multiple
splice variants.
(3) How should results from GFP-tagged molecules be
treated, evidence-code-wise? I think IMP is more
suitable than IDA, since the fusion construct may have
properties or behavior different from either constituent
protein.

Two of the four subunits of the AMPA receptor have
two splice variants (I have a rat paper on the two splice
variants: the flip variant is preferentially expressed
during prenatal development, the flop variant
postnatally). The AMPA receptor is one type of
ionotropic glutamate receptor. Splice variants are also
known for the GABA receptors (I have a rat paper on
GABA splice variants as well). GABA is the major
inhibitory neurotransmitter. Yet GABA receptors, along
with some olfactory receptors and the metabotropic
glutamate receptors, constitute one family of GPCRs.
PMID: 12379442
PMID: 14657159

(4) How do people handle splice variants? I know this
issue has been brought up before, but I think it is worth
discussing. Splice variants usually have a distinct
symbol (for example, subunit 1 has two splice variants,
1A and 1B). They tend to have similar functions but
differ in detail (for example, in desensitization
susceptibility and/or current potentials). Finally, as
the rat example suggests, they may differ in the
process they support.
(5) I also think we should revisit the glutamate receptor
term, which may also affect the GPCR term(s). I'll
(hopefully) have a diagram that we can discuss.

Clathrin-mediated endocytosis is believed to be the
major route for synaptic vesicle recycling. Studies have
recently shown that the metabotropic glutamate
receptor mGluR5, a member of the Group I metabotropic
glutamate receptors, is endocytosed via a
clathrin-independent route.
PMID: 14985443
PMID: 12529370

(6) How could we reconcile the two routes within GO?
There is no term for clathrin-mediated endocytosis in GO,
and I'll submit a request for one on SourceForge, since it
is the major endocytic route. But what about the other
route, perhaps non-clathrin endocytosis? What do others think?

SNAREs are membrane proteins that catalyze membrane
fusion; one of the best-known systems is involved in
membrane fusion at the synapse (synaptic SNAREs). The
proteins are defined by a characteristic ~70-residue
motif, known as the SNARE motif, that includes eight
heptad repeats. Four SNARE motifs (one each from
synaptobrevin and syntaxin, and two from SNAP-25)
form a tight complex that initiates membrane fusion.
PMID: 14529716

(7) The SNARE complex, and how it may bear on the
SNARE activity terms.

Reported by: vpetri

Original Ticket: geneontology/annotation-issues/18

question on GO:0030254

Hello, this is Candace Collmer (email:
[email protected]) from the PAMGO (Plant-Associated
Microbe Gene Ontology) interest group. We are
beginning the work of developing suggested GO terms
for the Biological Process ontology that could be used to
annotate gene products of different plant pathogens
(bacteria, fungi, oomycetes, nematodes, etc.). We are
currently working on an outline of higher order terms
(and definitions) that we hope will be applicable to both
plant and animal pathogens as well as to microbes that
enter into various types of symbiotic relationships with
hosts. We are also trying to find a way to integrate
these suggested new terms into existing categories in
the GO Biological Process ontology (e.g. host-pathogen
interactions, pathogenesis, etc.), and will soon send a
proposed scheme for your evaluation.
In the meantime, as we are working on this project,
we have a question about what gene products are
appropriate to annotate to an existing term in the GO
biological process ontology, GO:0030254 = type III
protein secretion system (definition: a bacterial
secretion system in which secretion occurs in a
continuous process without the distinct presence of
periplasmic intermediates; does not involve proteolytic
processing of secreted proteins). Given that this is a
biological process term, can we annotate to it both the
gene products that are transported via the type III
system and those that make up the injection apparatus
itself?

Reported by: ccollmer

Original Ticket: geneontology/annotation-issues/9

GO camp annotation issues from ZFIN

ZFIN

Areas for discussion at Cambridge GO camp:

PDFs of the 4 pubs cited in the following have been
sent to Mike Cherry.

1) How do we delimit a process? Which gene products
are involved in the process, and which are not? A general
discussion of this issue would be great.

Specific example (ZDB-PUB-040216-17 ; PMID:14757644)
Phillips BT et al. 2004. Development 131, 923-931.

Wnt8<—>fgf3/8 ->otic placode/vesicle development

An epistatic relationship exists between wnt8 and
fgf3/8. wnt8 regulates expression of fgf3 and fgf8
and vice versa. Expression of fgf3/8 is necessary and
sufficient to induce formation of otic placodes and
vesicles. Loss of fgf3/8 expression blocks otic
vesicle formation. Loss of wnt8 delays expression of
fgf3/8 and delays expression of preotic markers, but
otic vesicles form eventually.

a. Should wnt8, fgf3 and fgf8 all be annotated with
“otic vesicle formation”?
b. wnt8 increases expression of fgf3/8. Is this positive
regulation of the FGFR signaling pathway?
c. What if a new pub comes along and shows that fgf3/8
are inducing expression of some other gene that is
required for otic vesicle formation?

The decision to include a gene in a process seems
loosely defined and somewhat arbitrary, depending on what
a paper publishes, how involvement in the process is
demonstrated, when it is published, and what else we
know about the process at the time.

2) How are others handling the overlap between
phenotype annotation and biological process GO
annotation? For example, mutation of nic1
(muscle-specific nicotinic AChR) results in abnormal
muscle fiber organization and motor axon extension.
Phenotype annotations involve abnormal muscle and
axons, but should the GO annotations for nic1 cover only
muscle development/morphogenesis, since nic1 is not expressed
in the axon? Involvement of nic1 in axon extension is
likely to be indirect, but users may reasonably expect
to find nic1 in a list of genes involved in axon
extension because of their familiarity with that
phenotype…

Similar to #1: how do we delimit the process?
Specific Example (ZDB-PUB-040510-5 ; PMID 15128655)
Lefebvre et al. 2004 Development 131(11): 2605-2618.

3) How do we annotate when a gene product has an
activity by itself (a homomeric ion channel complex for
example), but has altered activity or characteristics
in the presence of other subunits (heteromeric ion
channel complex for example)? Does a protein that
forms a homomeric complex have a molecular function
itself or is it contributing to the function of a
complex?
Specific Example (ZDB-PUB-040225-4 ; PMID 14970195)
Paukert et al. 2004 JBC 279(18):18783-18791
page 18788

4) How do we annotate when a process is compromised by
alteration of gene A (IMP), but the severity or
penetrance of the phenotype is modified by alteration
of gene A and gene B (IGI)? This could be pairs of
morpholinos or mutants at multiple loci.
Specific Example (ZDB-PUB-030304-12 ; PMID 12591239)
Lekven et al. 2003. Dev. Biol. 254(2):172-187.
Page: 182

What if we were dealing with multiple alleles of a
single gene (IMP?).

5) New GO term creation: Should we create new terms
when a concept can be represented by several existing
terms? Creating new terms vs. waiting for slots
approach? Granularity of new terms? General
discussion of what would make a new term desirable or
undesirable in the big picture would be good.

6) If authors claim a protein is a transmembrane
protein based on a hydrophobicity plot, is that ISS or NAS?

Reported by: doughowe

Original Ticket: geneontology/annotation-issues/15

question about clathrin binding (GO:0030276)

Hello,

The definition of clathrin binding is:
“Interacting selectively with clathrin, a 180 kDa
protein that is the main component of the coat of
coated vesicles and coated pits, and which also occurs
in synaptic vesicles.”

From the limited reading I have done, it looks like
clathrin is composed of 3 heavy subunits of 180-190
kDa, and 3 light subunits of ~25 kDa. (see PMID 10966473)

I am a bit confused because the definition talks about
the 180 kDa subunit, so I am wondering whether I should
annotate the light subunit to clathrin binding, but
somehow I don’t think that’s what the definition means.

Should we remove the molecular weight part of the
definition?

Thanks, Pascale

Reported by: pgaudet

Original Ticket: geneontology/annotation-issues/30

IPR008011 -> respiratory chain complex I (sensu Eukarya)

IPR008011 maps to respiratory chain complex I (sensu
Eukarya) and to oxidoreductase activity, acting on NADH
or NADPH.

In addition, both this domain and IPR002023 map to
NADH dehydrogenase activity and respiratory chain
complex I (sensu Eukarya).

My summer student Lynda Groocock has pointed out that
this is a ~30-40-subunit complex, but these 2 subunits
are the only ones present in S. pombe, and only
IPR008011 is present in S. cerevisiae, which (as far as
we know) is reported not to have respiratory chain
complex I.

Therefore, we think that the associated mappings should
be refined or removed. Possibly the Swiss-Prot entries
should also be updated.

Both appear to be iron-sulphur binding proteins.

Consider the following from PMID: 1518044:

“No function has been ascribed to many of the subunits,
but some of the sequences indicate the presence of
hitherto unsuspected biochemical functions. Most
notably the identification of an acyl carrier protein
in both the bovine and Neurospora crassa complexes
provides evidence that part of the complex may play a
role in fatty acid biosynthesis in the organelle,
possibly in the formation of cardiolipin.”

…perhaps these 2 subunits are part of a subcomplex
which has other functions?

…also, deletion of the IPR008011-containing protein in
S. cerevisiae is inviable rather than petite, which
seems to suggest a function other than aerobic
respiration in this yeast.

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/31

IPR001708 -> signal transduction

I think this needs removing

It is present in a mitochondrial inner membrane protein
required for cytochrome oxidase assembly

Required for the insertion of integral membrane
proteins into the mitochondrial inner
membrane. Essential for the activity and assembly of
cytochrome c oxidase. Plays a central role in the
translocation and export of the N-terminal part of the
COX2 protein into the mitochondrial intermembrane
space.

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/49

IPR001926 -> ethylene biosynthesis

1-aminocyclopropane-1-carboxylate deaminase activity
3.5.99.7

from KEGG
http://www.genome.ad.jp/dbget-bin/show\_pathway?MAP00640+3.5.99.7

it looks like

amino acid derivative and/or
propanoate metabolism may be more suitable
(although I can’t find a GO term for this process
grouping?)

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/35

is antibody x-reaction an ISS?

I’d like to discuss the issue of whether or not people
think antibody cross-reactivity is good enough for ISS.
For example, I have been reading many papers that use
mouse, human and rat tissue or cell lines to study the
function, location and process of a gene product that
is detected by antibody cross-reactivity. How
far should we carry the cross-species annotations based
on this kind of evidence?

David

Reported by: ukemi

Original Ticket: geneontology/annotation-issues/11

genes associated with nerve growth factor binding

nerve growth factor receptor binding
Accession:GO:0005163
Synonyms:
nerve growth factor receptor ligand
neurotrophin
NGF receptor binding

Genes: fgf2 and fgf5

How come fibroblast growth factors are associated with
that term?

I suggest:

for the receptor side:

low affinity nerve growth factor receptor
Ensembl Gene: ENSG00000064300

tropomyosin-related kinase A
ENSG00000117029

tropomyosin-related kinase B
ENSG00000148053

tropomyosin-related kinase C
ENSG00000140538

for the ligand side:

nerve growth factor
ENSG00000134259

brain derived neurotrophic factor
ENSG00000176697

neurotrophin 3
ENSG00000185652

neurotrophin 4
ENSG00000167744

Reported by: lenov

Original Ticket: geneontology/annotation-issues/10

IPR005467 False positive?

S. pombe pyruvate dehydrogenase (lipoamide) kinase
Q9P6P9 is picking up an InterPro mapping to

two-component sensor molecule activity

and

signal transduction from IPR005467

It looks like this domain is a false positive (FP) for
Q9P6P9 (and the S. cerevisiae orthologs), or the name
of the family is too broad.

cheers

Val

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/41

use of NOT

I just started using NOT and realised I have used it
incorrectly.

I annotated a gene to
NOT Golgi to vacuole transport
based on the evidence of no mis-sorting of
carboxypeptidase Y in PMID: 14575697 suggesting that
the Pep12 homolog is not required for vacuolar protein
transport

I realised it was incorrect to annotate something to NOT
based on this kind of negative evidence, i.e. the gene
product is not required for the process but may still be
involved in it, and it is only shown to be not required
for this particular product (assay).

This may be an example where the annotation guidelines
could be extended, with examples, to make the use of NOT
explicit for first-time users.

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/3

IPR002208 -> protein secretion

This is the same, really means endocytosis in GOspeak,
but this is too granular….

This is a translocon subunit,
so the suggested mappings are:

GO:0045047 protein-ER targeting

GO:0005784 translocon

protein translocase activity is OK

(It may be worth checking mappings to secretion
generally. GO:0046903 is defined as “The regulated
release of a substance by a cell or group of cells.”)

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/25

spkw -> microsome

Ev, I wonder how generally useful GOA mappings to
artefactual cell fractions are.

Kati, there are 4 pombe proteins which are picking up a
mapping to microsome. I think in all cases the keyword
could probably be replaced; for example, one is a signal
peptidase subunit.

Reported by: ValWood

Original Ticket: geneontology/annotation-issues/27

Go camp annotation issues from FlyBase

Here are some topics FB curators would like discussed at
the GO annotation camp:

1. Process Assays In Non-Drosophila Cells:

In FBrf0105796 (PMID:9857177), the authors expressed
Drosophila grim in mouse 3T3 fibroblasts, and showed
that grim induces death in these cells.
Because the assay is not performed in Drosophila cells,
is it ok to annotate to ‘apoptosis ; GO:0006915’ and
its children?

2. Cellular component Assays in Non-Drosophila Cells:

In FBrf0105796 (PMID:9857177), the authors go on to
examine the subcellular localization of Grim in the
mammalian fibroblasts. They find the Grim protein in
the cytoplasm and the mitochondria. Because the cells
are not Drosophila cells, is this assay adequate to
assign cytoplasm ; GO:0005737 and mitochondrion ;
GO:0005739 component terms?

Similarly, in FBrf0087337 (PMID:8524796) in an antibody
cellular localization assay, the authors show that
Drosophila Tra and Tra2 are found in speckles in
mammalian cells.

3. For immuno-localization experiments, when a
GFP-fusion protein is made, is this an adequate assay
to assign GO cellular component information, since it
is not always known whether the GFP tag affects the
subcellular localization of the gene product?

Eg in FBrf0132409 (PMID:11158320) and FBrf0167463
(PMID:14651932)

4. What evidence code do MODs use when curating
conference abstracts?

Do other MODs use just NAS for curating abstracts, or
is using IMP, ISS, IGI, IPI etc. OK even though the
primary data are not shown?

E.g.:
http://fbserver.gen.cam.ac.uk:7081/.bin/fbidq.html?FBrf0145796&resultlist=fbrf21903.data
Loss-of-function studies of amnesiac indicate that the
encoded peptide participates in the innate immune
response. Because the mutant data itself is not
presented, can the evidence code IMP be used?

http://fbserver.gen.cam.ac.uk:7081/.bin/fbidq.html?FBrf0133865&resultlist=fbrf22533.data
Similarly, mutations in Unc-76 disrupt axon cargo
transport. Though the primary data is not presented in
an abstract, can IMP be used?

The authors of this abstract also say that Unc-76 binds
kinesin in pull-down assays. Can we annotate to
‘kinesin binding ; GO:0019894 | IDA’ ?

http://fbserver.gen.cam.ac.uk:7081/.bin/fbidq.html?FBrf0106385&resultlist=fbrf22754.data
*E FBrf0106385 == hb000218.e == Danos et al., 1999, A.
Dros. Res. Conf. 40: 362A

The authors say ‘The protein is similar to vertebrate
TAK1s throughout, with a well conserved kinase domain
showing 56% identity and 73% similarity in amino acid
sequence to the mouse TAK1.’ Can we annotate to ‘MAP
kinase kinase kinase activity ; GO:0004709 ; EC:2.7.1.-
| ISS’?

5. Do other MODs differentiate between the
function/process/location of an RNA and that of a
protein gene product?

For example, in flies a number of mRNAs are localized
during oogenesis (eg oskar and nanos mRNAs are
localized to the pole plasm). In our GO annotations we
have cellular component information relating to the
mRNA, and process information relating to the protein.
All are just attributed to the gene at the moment.

6. Similar to one of the WB issues: how to annotate
proteins that are on the surface of/associated-with a
cellular component.

In FBrf0110050 (PMID:10466937), mt:srRNA RNA is
found “enriched on the surface of polar granules”.
Does this count as being localized to a ‘polar granule
; GO:0018994’?

7. How do other MODs deal with conflicting GO data from
different sources?

Eg in FBrf0109981 (PMID:10385622) the authors claim
Acf1 gene product did not purify with components of
CHRAC. In FBrf0138380 (PMID:11447119), the authors claim
that Acf1 is a component of CHRAC.

And the following 2 papers disagree on whether MBD-like
has methyl-CpG binding function or not: FBrf0130046
(PMID:10982856) and FBrf0123223 (PMID:10581020).

I’ll forward pdfs of the references here to Midori/Mike
as requested.

Thanks,
Becky

Reported by: beckyfoulger

Original Ticket: geneontology/annotation-issues/14
