
modelseeddatabase's People

Contributors

cshenry, ctseto, fxe, jamesjeffryes, janakagithub, jeffkimbrel, mmundy42, qzzhang, samseaver

modelseeddatabase's Issues

Add ClassyFire compound classifications to biochemistry compounds

"ClassyFire is a web-based application for automated structural classification of chemical entities. This application uses a rule-based approach that relies on a comprehensible, comprehensive, and computable chemical taxonomy. ClassyFire provides a hierarchical chemical classification of chemical entities (mostly small molecules and short peptide sequences), as well as a structure-based textual description, based on a chemical taxonomy named ChemOnt, which covers 4825 chemical classes of organic and inorganic compounds. Moreover, ClassyFire allows for text-based search via its web interface. It can be accessed via the web interface or via the ClassyFire API."

Potentially very helpful for searching models and generating compound information
https://jcheminf.springeropen.com/articles/10.1186/s13321-016-0174-y

compartment indices don't match reaction DB

The compartment indices specified in compartments.master.tsv and compartments.default.tsv don't match those in reactions.master.tsv (or any reaction DB file, for that matter). The reaction DB files only contain 0, 1, and 2 for compartment indices in the equation, stoichiometry, and definition.

What the indices in the reaction DB files currently correspond to is unclear, since 1 is used interchangeably for extracellular and periplasm. 2 appears to be unique to Ton transport system reactions.
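
One way to audit this is to tally the compartment indices that actually occur in the stoichiometry column. The following is a minimal sketch, assuming each reagent is encoded as coefficient:compound:compartment_index:... and that reagents are joined with semicolons, as in the stoichiometry strings quoted in other issues on this page. The function name is hypothetical and is not part of the repository scripts.

```python
import re
from collections import Counter

# Assumed reagent layout: coefficient:cpdNNNNN:compartment_index:...
# Only the first three fields are matched, so the quoted compound
# name that follows is never parsed.
REAGENT = re.compile(r"(?:^|;)(-?\d+(?:\.\d+)?):(cpd\d+):(\d+):")


def compartment_index_counts(stoichiometries):
    """Count how often each numeric compartment index appears."""
    counts = Counter()
    for stoich in stoichiometries:
        for _coeff, _cpd, index in REAGENT.findall(stoich):
            counts[int(index)] += 1
    return counts


example = [
    '-1:cpd30677:0:0:"D-Glucosamine Hydrochloride";'
    '1:cpd30677:1:0:"D-Glucosamine Hydrochloride"',
]
print(dict(compartment_index_counts(example)))
```

Comparing the keys of the resulting counter with the indices declared in the compartments files would show which indices are used but never defined.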

List of issues surrounding merging of Complex Roles

With the changes made in this commit: 3e061cc one can generate a full JSON object that contains both the ModelSEED and newly generated KEGG complex roles. Again, the output is not added within the scope of this repository, as I guess its home should be in the ProbModelSEED repository.

There are several issues:

  1. Several role aliases matched a ModelSEED role that did not have the same name as the new role name; these are conflicts and have to be resolved.

  2. I did not include role and complex aliases yet

  3. We have to come up with a consistent approach to using identifiers for both complexes and roles in ProbModelSEED, as it's currently a mixture of formats.

  4. The file that Mike generated has several possible sources for the "type" field in a complexrole. As of now, I'm using the last type column that is labelled "type", but we will need to re-visit this once the specifications for the Mapping object in a PATRIC workspace have been updated.

Lots of reactions with status CPDFORMERROR

In the master reactions file there are 34,701 reactions, of which 6,350 have a status of CPDFORMERROR. The CPDFORMERROR status is set when at least one compound in the reaction does not have a formula or the formula is determined to be invalid. All of the reactions with status CPDFORMERROR are due to the missing-formula condition.

A quick check showed that some of the reactions are transport reactions for the compound. For example, rxn39240 has stoichiometry -1:cpd30677:0:0:"D-Glucosamine Hydrochloride";1:cpd30677:1:0:"D-Glucosamine Hydrochloride". It has status OK before running the Update_Reaction_Status.pl script but has status CPDFORMERROR after because D-Glucosamine Hydrochloride does not have a formula in the master compounds file.

In the master compounds file there are 27,692 compounds of which 5,068 do not have a formula.

Should there be a special case for transport reactions so that reactions that are transporting one compound between compartments have a status of OK even if there is no compound formula?

Otherwise the reactions aren't going to be usable for models.
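
A possible shape for such a special case is sketched below. It assumes the stoichiometry layout seen in the rxn39240 example (coefficient:compound:compartment_index:...) and treats a reaction as a pure transport when every compound's coefficients cancel across compartments, so the reaction is mass balanced whatever the unknown formula is. The function name is hypothetical and this is not how Update_Reaction_Status.pl currently works.

```python
import re
from collections import defaultdict

# Assumed reagent layout: coefficient:cpdNNNNN:compartment_index:...
REAGENT = re.compile(r"(?:^|;)(-?\d+(?:\.\d+)?):(cpd\d+):(\d+):")


def is_pure_transport(stoich):
    """True if each compound nets to zero and at least one changes compartment."""
    net = defaultdict(float)
    compartments = defaultdict(set)
    for coeff, cpd, index in REAGENT.findall(stoich):
        net[cpd] += float(coeff)
        compartments[cpd].add(index)
    if not net:
        return False
    balanced = all(abs(total) < 1e-9 for total in net.values())
    moved = any(len(seen) > 1 for seen in compartments.values())
    return balanced and moved


rxn39240 = (
    '-1:cpd30677:0:0:"D-Glucosamine Hydrochloride";'
    '1:cpd30677:1:0:"D-Glucosamine Hydrochloride"'
)
print(is_pure_transport(rxn39240))
```

A status check could call this before the formula test and skip the CPDFORMERROR assignment when it returns True.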

Curation of reversibility in database

This is a placeholder to remind us to discuss how we'd want to update and curate any reaction direction and thermodynamic reversibility in the biochemistry and in the templates. We came across a good example:

rxn00929 (1) NAD[0] + (1) L-Proline[0] > (1) NADH[0] + (2) H+[0] + (1) 1-Pyrroline-5-carboxylate[0]

In all the pathways I've observed, the reaction is the last step in proline biosynthesis, but the deltaG and thermodynamic heuristics list it as irreversible going from left to right. It needs to be at least reversible for proline biosynthesis to occur, at least in eukaryotes.

General SBML unit question about second multiplier

Hi

I have a general SBML question about unit tag and the value of multiplier for second.
Most models use mmol_per_gDW_per_hr, they define mole, gram, and second.
They all use exponent="-1" for second because this is "per hour".
But they do not use the same value for multiplier, to go from second to hour.

For BiGG models and some others, the multiplier value is 3600.
For SEED and BioCyc models, the multiplier value is 1/3600.

What is the right value for second in mmol_per_gDW_per_hr ?
The SBML documentation about that is poor.
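
The arithmetic behind the two conventions can be checked with a short sketch. It assumes the interpretation given in the SBML Level 3 Core specification, where a unit element stands for (multiplier * 10^scale * base unit)^exponent, so the multiplier sits inside the exponent. That reading should be confirmed against the SBML level and version a given model declares; the helper name is hypothetical.

```python
import math


def unit_factor(multiplier, scale=0, exponent=1):
    """Factor relative to the base unit, assuming the reading
    (multiplier * 10**scale * base_unit) ** exponent."""
    return (multiplier * 10 ** scale) ** exponent


# One "per hour" expressed in the base unit "per second".
per_hour = 1 / 3600

# multiplier=3600 with exponent=-1, the value reported for BiGG models
print(math.isclose(unit_factor(3600, exponent=-1), per_hour))

# multiplier=1/3600 with exponent=-1, the value reported for SEED and BioCyc
print(math.isclose(unit_factor(1 / 3600, exponent=-1), per_hour))
```

Under this assumption, 3600 with exponent -1 corresponds to "per hour", while 1/3600 with exponent -1 corresponds to 3600 per second.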

Malformed InChI

From @smoretti on October 14, 2016 13:40

We found about 1208 InChI strings that cannot be read by the software Chemaxon from the ModelSEED compound data.
E.g.

  • cpd00194 InChI=1S/C12H14N2.Cl2H2/c1-13-7-3-11(4-8-13)12-5-9-14(2)10-6-12;;/h3-10H,1-2H3;2*1H/q+2;;
  • cpd00271 InChI=1S/C6N6.Fe/c6*1-2;/q;;;;;;-3
  • cpd00418 InChI=1S/NO/c1-2/rad

Copied from original issue: ModelSEED/ModelSEED-UI#60

Wrong InChI for cpd00030

From @smoretti on May 10, 2017 10:57

According to its name (Mn2+) and charge (+2), InChI string of compound cpd00030 is wrong.

Should be InChI=1S/Mn/q+2 and not InChI=1S/Mn

Mn(III) alias should not be associated either.

Maybe it is Mn atom, without charge, because Manganese(2+) (cpd20863) and Manganese(3+) (cpd20864) already exist in SEED.

Copied from original issue: ModelSEED/ModelSEED-UI#64

Stable ObjectAPI codebase needed

Was recently unable to run Print_Master_Reactions_From_Files.pl, the error was innocuous and can easily be fixed, but stemmed from small differences in the ProbModelSEED and KBaseFBAModeling ObjectAPI.

This issue is a reminder that we need to make sure that the ObjectAPI code is identical between the two codebases, with respect to creating and loading biochemistry objects. This is something I can work on, but not just yet.

move database ids to separate fields

Creating this ticket as a reminder. It would be easier and probably more scalable to have KEGG, Metacyc, and AraCyc ids as their own fields, instead of being in aliases.

Templates/Core files have inconsistent compartment names

I'm using Build_Model_Template.py to create template files for the ProbModelSEED software. It has run successfully for GramNegative, GramPositive, and Microbial, but failed on Core:

Traceback (most recent call last):
  File "./Build_Model_Template.py", line 91, in <module>
    helper.readReactionsFile(reactionsFile, includeLinenum=False)
  File "/users/tfarrah/ModelSEEDDatabase/scripts/TemplateHelper.py", line 453, in readReactionsFile
    raise CompartmentNotFoundError('Compartment %s not found in current list' %(compartmentIds[cindex]))
TemplateHelper.CompartmentNotFoundError: Compartment c not found in current list

I saw that, for Core, Compartments.tsv lists compartments c0 and e0, but Reactions.tsv uses compartments c and e.

To fix, I replaced all instances of c0 with c in Compartments.tsv and BiomassCompounds.tsv

Is this a reasonable fix?
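
Before editing the files, a quick cross-check can show exactly which compartment ids are affected. This is a minimal sketch with hypothetical helper names; it assumes the ids have already been read from Compartments.tsv and Reactions.tsv into plain lists, and it does not use TemplateHelper.py.

```python
import re


def missing_compartments(defined, used):
    """Compartment ids used by reactions but absent from the defined list."""
    return sorted(set(used) - set(defined))


def strip_index(compartment_id):
    """Drop a trailing numeric index, e.g. 'c0' -> 'c'."""
    return re.sub(r"\d+$", "", compartment_id)


defined = ["c0", "e0"]  # as listed in Compartments.tsv for Core
used = ["c", "e"]       # as used in Reactions.tsv for Core

print(missing_compartments(defined, used))
print(missing_compartments([strip_index(c) for c in defined], used))
```

The second call mirrors the manual fix described above: once the index is stripped from the defined ids, nothing is reported as missing.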

Placeholder issue for decisions regarding use of reaction identifiers in ModelTemplate

I'm currently starting to re-build the Plant ModelTemplate. It's not so much that I'm changing the list of reactions, but I'm changing how compartmentalization is assigned to reactions, as there'll be some manual curation to that effect.

I'm doing this with the current version of the ModelTemplate, pending any changes made to the new version that'll be published with ProbModelSEED, but it serves as a way for me to organize the data in the meantime.

But the whole point of this issue is that I'm also bearing in mind that we're working towards a unified and compartment-free biochemistry, and as such, I'm trying to anticipate which reaction identifiers I should use.

In the case where two identical reactions are otherwise in different compartments, I will use the reaction id with the lowest number. In the ModelTemplate, I will create duplicate template reactions, but otherwise assign different compartments.

To an extent, this is obvious, but I wanted to be explicit about it.

Malformed EC strings in JSON

From @smoretti on October 7, 2016 14:47

Hi

There are lots of badly formatted EC numbers in the JSON returned by the API.
Some have spaces in front of the EC number; some have several EC numbers per entry in the EC JSON array, separated by

  • ** or **
  • ** / **
  • several white spaces

some EC strings end with

  • ,
  • ) characters

Copied from original issue: ModelSEED/ModelSEED-UI#56
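
The cases listed in this issue could be normalized with something like the sketch below. It is an illustration only, with a hypothetical function name: it splits one raw entry on " or ", "/", or runs of whitespace, strips stray commas and parentheses, and keeps tokens that look like four-part EC numbers (allowing "-" placeholders and a preliminary "n" serial). Tokens that do not match are silently dropped here; a real cleanup would probably want to log them.

```python
import re

# Four dot-separated parts; later parts may be "-", and the last part
# may carry the "n" prefix used for preliminary EC numbers.
EC_PATTERN = re.compile(r"^\d+\.(\d+|-)\.(\d+|-)\.(n?\d+|-)$")

# Separators reported in the issue: " or ", " / ", several white spaces.
SEPARATORS = re.compile(r"\s+or\s+|\s*/\s*|\s+")


def split_ec_entry(raw):
    """Return the well-formed EC numbers found in one raw JSON array entry."""
    cleaned = []
    for token in SEPARATORS.split(raw.strip()):
        token = token.strip(" ,()")
        if token and EC_PATTERN.match(token):
            cleaned.append(token)
    return cleaned


print(split_ec_entry(" 1.1.1.1 or 2.3.4.5,"))
print(split_ec_entry("1.1.1.1 / 2.7.1.-)"))
```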

Invalid value in deltag and deltagerr fields

There are 1088 compounds in the Biochemistry/compounds.default.tsv and Biochemistry/compounds.master.tsv files where the value of the deltag field is "deltaG" and the value of the deltagerr field is "deltaGErr". Should the values be reset to 10000000 to indicate the value is unknown?
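
If the answer is yes, the reset could be done while reading the files. A minimal sketch, assuming the fields arrive as strings and that 10000000 is the agreed marker for "unknown" (the value proposed above); the function name is hypothetical.

```python
import math

UNKNOWN_ENERGY = 10000000.0  # sentinel proposed in this issue for "unknown"


def parse_energy(value, unknown=UNKNOWN_ENERGY):
    """Convert a deltag/deltagerr field to float, or return the sentinel."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Covers header text such as "deltaG" as well as "null" or None.
        return unknown
    # float() also accepts "nan" and "inf", which are not usable energies.
    return number if math.isfinite(number) else unknown


print(parse_energy("deltaG"))
print(parse_energy("-5.391"))
```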

What data should be included in a compartment-free Biochemistry object?

Currently a Biochemistry object has the following data:

  • compounds
  • reactions
  • compartments
  • cues
  • compoundSets
  • reactionSets
  • compound_aliases
  • reaction_aliases

Questions:

  1. With reactions defined in a compartment-free notation, are compartments still needed in the Biochemistry or can they only be included in Model Templates?
  2. Do we need to define source files for cues, compoundSets, and reactionSets so they can be included in the Biochemistry?
  3. Does the Biochemistry object need to include compound_aliases and reaction_aliases or do the aliases only need to be available via SOLR?

How did status field get updated in reactions.master.tsv?

I'm trying to verify my changes to the Print_Master_Reactions_List.pl script and I'm not sure how the status field changed in the reactions.master.tsv file. Looking at rxn00001, in reactions.default.tsv the status is "OK":

rxn00001    PPA Pyrophosphate phosphohydrolase  (1) cpd00001[c] + (1) cpd00012[c] <=> (2) cpd00009[c]   -1:cpd00001:c:0:"H2O";-1:cpd00012:c:0:"PPi";2:cpd00009:c:0:"Phosphate";2:cpd00067:c:0:"H+"  0   (1) cpd00001[c] + (1) cpd00012[c] <=> (2) cpd00009[c] + (2) cpd00067[c] (1) H2O[c] + (1) PPi[c] <=> (2) H+[c] + (2) Phosphate[c]    >   =   null    null    null    null    -5.391  1.22474 cpd00067;cpd00001;cpd00009;cpd00012 OK

And in reactions.plantdefault.tsv the status is "OK":

rxn00001    R00004  diphosphate phosphohydrolase    (1) cpd00001[c] + (1) cpd00012[c] <=> (2) cpd00009[c]   -1:cpd00001:c:0:"H2O";-1:cpd00012:c:0:"Diphosphate";2:cpd00009:c:0:"Orthophosphate";1:cpd00067:c:0:"H+" 0   (1) cpd00001[c] + (1) cpd00012[c] <=> (2) cpd00009[c] + (1) cpd00067[c] (1) Diphosphate[c] + (1) H2O[c] <=> (1) H+[c] + (2) Orthophosphate[c]   =   =   null    null    null    null    4.14    1.22    cpd00012;cpd00009;cpd00001;cpd00067 OK

But in reactions.master.tsv the status is "OK|HB":

rxn00001    R00004  diphosphate phosphohydrolase    (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0]   -1:cpd00001:0:0:"H2O";-1:cpd00012:0:0:"PPi";2:cpd00009:0:0:"Phosphate";1:cpd00067:0:0:"H+"  0   (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0] (1) H2O[0] + (1) PPi[0] <=> (2) Phosphate[0] + (1) H+[0]    =   =   null    null    null    null    4.14    1.22    cpd00012;cpd00009;cpd00001;cpd00067 OK|HB

I'm guessing this is related to commit 51fb37e, but I'm not sure how the updated reactions.master.tsv in the commit was generated.

URL for linking to ModelSEED compounds and reactions?

Hi folks,

I would like to link MetaCyc reactions and compounds to their equivalents in ModelSEED. Is there a ModelSEED website that has a unique URL for each compound or reaction identifier? The URL for the Aliases file in Github could sort of work, but I was hoping for something a bit more permalink like.

If not, when do you anticipate there will be one?

Sincerely,

Jeremy

Are (S)-3-Hydroxyisobutyrate / (S)-3-Hydroxyisobutyric acid really what they are supposed to be?

Hi

I wonder if (S)-3-Hydroxyisobutyrate (cpd19043) and (S)-3-Hydroxyisobutyric acid (cpd23281) are really what they are supposed to be.

  • (S)-3-Hydroxyisobutyrate (cpd19043) and (S)-3-Hydroxyisobutyric acid (cpd23281) have different stereochemistry. They look more like S and R forms instead of acid/base forms.
  • (S)-3-Hydroxyisobutyrate (cpd19043) reference to KEGG:C06001 is for S form; reference to MetaCyc:CPD-12176 is for the R form.
  • (S)-3-Hydroxyisobutyric acid (cpd23281) reference to MetaCyc:CPD-12175 is for the S form but cpd23281 InChI is for the R form.

Regards

Build_Model_Template.py failing for Mycobacteria

I am using Build_Model_Template.py to create template files for the ProbModelSEED software. It failed when building a template for Mycobacteria:

Traceback (most recent call last):
  File "./Build_Model_Template.py", line 91, in <module>
    helper.readReactionsFile(reactionsFile, includeLinenum=False)
  File "/users/tfarrah/ModelSEEDDatabase/scripts/TemplateHelper.py", line 488, in readReactionsFile
    raise ReactionFormatError('Reaction %s on line %d has invalid stoichiometry "%s" in master reaction' %(reaction['id'], linenum, masterReaction['stoichiometry']))
TemplateHelper.ReactionFormatError: Reaction rxn34330_c0 on line 17573 has invalid stoichiometry

Digging in a little, I found that the failure was in the following line of code:
compCompound = self.addCompCompound(parts[1], compartmentIds[compartmentIndex])

... and that it was failing for the following reaction part from Biochemistry/compounds.master.tsv:
-1:cpd30510:2:0:"photon (380 to 750 nm, visible spectrum)

Here, the value of compartmentIndex is 2 -- an unusual value for this field.

I stopped investigating at this point. Hoping you can take it from here and fix whatever needs to be fixed in order for me to build this template file.
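
To narrow down a failure like this, each reagent part can be checked on its own before it reaches addCompCompound. The sketch below is a hypothetical helper, not code from TemplateHelper.py. It assumes a five-field layout of coefficient:compound:compartment_index:field:"name" and a list of compartment ids indexed by position; the example lists are made up. Note that the reaction part quoted above has no closing quote as shown, and it is unclear whether that is in the file or was lost when pasting.

```python
import re

# Assumed layout of one reagent part, with a fully quoted name at the end.
PART = re.compile(r'^(-?\d+(?:\.\d+)?):(cpd\d+):(\d+):(\d+):"(.*)"$')


def check_reagent_part(part, compartment_ids):
    """Return a list of problems found in one reagent part (empty if none)."""
    match = PART.match(part)
    if match is None:
        return ["does not match the expected five-field layout"]
    index = int(match.group(3))
    if index >= len(compartment_ids):
        return ["compartment index %d has no entry in %r" % (index, compartment_ids)]
    return []


as_quoted = '-1:cpd30510:2:0:"photon (380 to 750 nm, visible spectrum)'
print(check_reagent_part(as_quoted, ["c", "e"]))
print(check_reagent_part(as_quoted + '"', ["c", "e"]))
```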

Merge all alias files

Biochemistry/Aliases
We want to keep only aliases for main databases, clean current published models and keep only BIGG
Rxn_aliases.tsv: modelseed_id, ext_id, source
Cpd_aliases.tsv: modelseed_id, ext_id, source
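
A minimal sketch of writing the proposed three-column layout follows. The column names come from the file descriptions above; everything else is an assumption for illustration, including the function names, the idea that input rows are already (modelseed_id, ext_id, source) tuples, and the made-up "ExampleModel" source. The R00004 alias for rxn00001 is taken from the reactions.master.tsv row quoted in another issue on this page.

```python
import csv
import io

COLUMNS = ["modelseed_id", "ext_id", "source"]


def merge_aliases(rows, keep_sources):
    """Drop duplicate rows and keep only aliases from the selected sources."""
    kept = {tuple(row) for row in rows if row[2] in keep_sources}
    return sorted(kept)


def to_tsv(rows):
    """Serialize rows with a header line in the proposed column order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("rxn00001", "R00004", "KEGG"),
    ("rxn00001", "R00004", "KEGG"),                  # duplicate from a second file
    ("rxn00001", "example_rxn_id", "ExampleModel"),  # hypothetical model alias
]
merged = merge_aliases(rows, keep_sources={"KEGG", "MetaCyc", "BiGG"})
print(to_tsv(merged))
```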

inconsistent orders in equation/definition

I'm trying to make compounds in equations linkable as in the following screenshot, but there's no easy way to do this since the order of ids is different from the order of names.

Here's an example from https://www.patricbrc.org/api/model_reaction/?http_accept=application/solr+json&eq(id,rxn00017):

(2) cpd00003[c] + (2) cpd00165[c] <=> (2) cpd00004[c] + (2) cpd00067[c] + (1) cpd01252[c]

vs

(2) Hydroxylamine[c] + (2) NAD[c] <=> (2) H+[c] + (1) Hyponitrite[c] + (2) NADH[c]

Any way we can make these match? @mmundy42, this may be related to the get_model method.

(screenshot, 2015-07-15)
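
One way to guarantee matching order is to derive the name-based definition from the id-based equation instead of storing the two separately. A minimal sketch with a hypothetical function name; the id-to-name mapping is inferred from the rxn00017 example above and would normally come from the compounds table.

```python
import re


def definition_from_equation(equation, names):
    """Replace each compound id with its name, preserving reagent order."""
    return re.sub(r"cpd\d+", lambda match: names[match.group(0)], equation)


# Mapping inferred from the rxn00017 example; treat it as illustrative.
names = {
    "cpd00003": "NAD",
    "cpd00165": "Hydroxylamine",
    "cpd00004": "NADH",
    "cpd00067": "H+",
    "cpd01252": "Hyponitrite",
}
equation = (
    "(2) cpd00003[c] + (2) cpd00165[c] <=> "
    "(2) cpd00004[c] + (2) cpd00067[c] + (1) cpd01252[c]"
)
print(definition_from_equation(equation, names))
```

Because the substitution happens in place, each name ends up at the position of its id, which makes per-compound links straightforward.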

Curate/fix template mappings

It's been reported that there are mismatches in the template mappings between a reaction's EC number and the EC number of the functional role it's linked to in the template complexes.

Here's an example:

rxn02282 (EC 3.5.3.5; http://rest.kegg.jp/get/R03188) --> cpx34518 --> ftr34193 --> allantoate deiminase (EC 3.5.3.9)
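
Mismatches of this kind could be listed automatically wherever the role name carries its EC number in parentheses, as in the example above. A minimal sketch, assuming that naming convention holds and that the reaction's EC numbers are available as a list; the function name is hypothetical.

```python
import re

# Matches the "(EC x.x.x.x)" suffix seen in the functional role name above.
EC_IN_NAME = re.compile(r"\(EC\s+([0-9n.\-]+)\)")


def ec_mismatches(reaction_ecs, role_name):
    """EC numbers in the role name that the reaction itself does not list."""
    known = set(reaction_ecs)
    return [ec for ec in EC_IN_NAME.findall(role_name) if ec not in known]


print(ec_mismatches(["3.5.3.5"], "allantoate deiminase (EC 3.5.3.9)"))
```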

Place-marker for MetaCyc reaction RXN-1623 (rxn37218)

Firstly, I'm not going to attempt to fix this reaction directly right now, though I've now come to understand what happened, as it's only available in the plant template.

This reaction was brought to my attention because it's the last of the reactions in the ModelTemplate which was previously balanced, and is now mass-imbalanced. It's one of the reactions in a BioCyc database where I attempted to "expand" from the generic compound classes into a series of balanced reactions with individual compounds that have a defined structure (and successfully so in many cases).

But what happened in this particular case is that the compounds in question, being the most commonly addressed in metabolism (for want of a better phrase, i.e. 1-oleyl-2-lyso-phosphatidate is one of the most common components of the cellular membrane), were also assigned the generic names, such as L-1-lysophosphatidate, and were thus somehow merged with the generic compounds with no defined structure.

My problem really is that the compound that is represented here:

http://metacyc.org/META/NEW-IMAGE?type=NIL&object=L-1-LYSOPHOSPHATIDATE

has been completely subsumed somehow, and I'll have to add a whole new compound object back into the compounds list.

Question on compounds and reactions master files

@samseaver, I'm confused on the compounds.master.tsv and reactions.master.tsv files in the Biochemistry directory. I ran the Print_Master_Reactions.pl script and it generated a Master_Reaction_List.tsv file. When I compare that file with reactions.master.tsv they are not the same.

It looks like the script is getting the default and plantdefault biochemistry by downloading from a KBase workspace and also processing the reactions.default.tsv and reactions.plantdefault.tsv files and applying the modifications from reactions.master.mods to generate Master_Reaction_List.tsv.

I haven't tried the Print_Master_Compounds.pl script but it looks similar.

Can you explain the roles of the various files for me?

Glutathione tautomerization

A collaborator of mine noticed that glutathione (cpd00042) is represented with enols, rather than with standard keto peptide bonds. I could update the TSV files and submit a pull request, but I'm realizing that this might cause changes to calculated properties, etc. What's the best way to go about updating these compounds?

PlaceHolder for discussing issues in building ModelTemplates

In ccf6164 I submitted a script for building the ModelTemplates from the data downloaded from the workspace. It's not finished.

Major points:

  1. I believe I have every field in every sub-object defined here, so even if it's empty (or set to a default value in some cases), it should be present in the resulting JSON object.
  2. I've not yet made an effort to define references and identifiers, and I'll likely add a section to the ReadMe to document/discuss this. So for now, the object's biochemistry is largely unconnected. For the same reason, the compcompartments list is undefined, as it would be full of duplicates.
  3. The compartments are defined in the ModelTemplate, but, by reading in the master reactions file, I'm reading generalized reactions, so I will need to expand the script to read in the reagent compartmentalization from the default files. This will have to serve as a starting point for the compartmentalization but ultimately, each ModelTemplate will be taken in its own direction, and the default files will be made redundant.
  4. Both Aliases (needs coding in the default files) and Pathways have not been done yet.

Untrimmed names/aliases and duplicated aliases

From @smoretti on December 16, 2016 8:34

Some reactions have untrimmed (space characters in front or at the end of string) names and aliases.

E.g. rxn13772 or rxn12186

This produces duplicated aliases (e.g. rxn12186): one name: entry with the name prefixed with a blank character, and one name: entry with the correct name without the blank prefix.

Copied from original issue: ModelSEED/ModelSEED-UI#62
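
Trimming before deduplicating would remove both symptoms at once. A minimal sketch with a hypothetical function name and made-up example strings:

```python
def clean_names(names):
    """Trim surrounding whitespace and drop duplicates, keeping first-seen order."""
    trimmed = (name.strip() for name in names)
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(name for name in trimmed if name))


print(clean_names([" example name", "example name", "other name ", ""]))
```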

potential cpd issue

User submitted...

In the meantime I have found a compound entry with some possible errors: cpd11684

This compound looks to be the same as CHEBI:16042 (http://www.ebi.ac.uk/chebi/searchId.do?chebiId=16042) or http://metacyc.org/META/NEW-IMAGE?type=COMPOUND&object=Halide-Anions

So it can be bromide, chloride, fluoride or iodide.
Consequently, its mass should be unknown, because it is the mass of one of those four. Not 1 for sure.
And its charge should be -1 according to ChEBI and MetaCyc.

List of reaction issues

Strange violations of mass-balance
rxn05842 rxn05845 rxn05844 rxn07227 rxn05843
Each reaction is involved in the interconversion of "Phosphate" into either ADP, CDP, GDP, NDP, UDP.

The full reaction on KEGG is: RNA + Orthophosphate <=> RNA + X, where X is ADP, CDP, GDP, NDP.

Unusual "either-or" cases

rxn: 023270 and 3717[23] all endo-fenchol dehydrogenase. 232720 has an unusual metabolite cpd27638:0:0:"NAD-P-OR-NOP" with mate cpd27640:0:0:"NADH-P-OR-NOP", which seems to be the equivalent of rxn37172 and rxn37173.
rxn04701: NADPH/NADP
rxn17098: :cpd27640:0:0:"NADH-P-OR-NOP";
rxn35621: NADH

rxn19954 :cpd27640:0:0:"NADH-P-OR-NOP";
rxn19955 :cpd27640:0:0:"NADH-P-OR-NOP";
rxn19956
rxn19957
rxn26478 :cpd27640:0:0:"NADH-P-OR-NOP";
rxn26479 :cpd27640:0:0:"NADH-P-OR-NOP";


Duplicates because of ambiguous metabolites
rxn02336 and rxn19950 duplicates, differ by S&P cpd00826:0:0:"(-)-trans-Isopiperitenol"...cpd01540:0:0:"Isopiperitenone" and cpd00826:0:0:"(-)-trans-Isopiperitenol" and cpd19023:0:0:"(-)-Isopiperitenone"

rxn16471: forms cpd21001:0:0:"Patchoulol"
rxn20955: forms cpd06598:0:0:"Patchouli alcohol"
(Both are likely the same alcohol)

rxn02344: forms cpd00874:0:0:"(3S,5S)-3,5-Diaminocaproate"
rxn19233: forms cpd22771:0:0:"(3S,5S)-3,5-diaminocaproate"

rxn01981: -1:cpd00506:0:0:"gamma-Glutamylcysteine";1:cpd00084:0:0:"L-Cysteine";1:cpd01293:0:0:"5-Oxoproline"
rxn06472: -1:cpd12381:0:0:"(5-L-Glutamyl)-L-amino acid";1:cpd01293:0:0:"5-Oxoproline";1:cpd11623:0:0:"L-Amino acid"
(rxn01981 appears to be a more specific case of rxn06472, which is generic amino acid)

Identical chemistry (column E), different names (Column B)
rxn02111 & rxn15382: Duplicates of borneol:NAD+ oxidoreductase
rxn07233 rxn14240 (07233 uses free hydronium as the proton donor, 14240 uses cpd11844:0:0:"HX" as the proton donor)
rxn07231 rxn14116 (same as above)
rxn02954 rxn02955 (one uses PPi, the other uses Phosphate)
rxn25055 rxn06940 (-2:cpd27005:0:0:"ETF-Oxidized";1:cpd06710:0:0:"E-Phenylitaconyl-CoA";2:cpd27006:0:0:"ETF-Reduced" vs -2:cpd12505:0:0:"Electron-transferring flavoprotein";1:cpd06710:0:0:"E-Phenylitaconyl-CoA";2:cpd12576:0:0:"Reduced electron-transferring flavoprotein", identical biochemically)

Assessing rxn04820, rxn15643 and rxn15644, they differ in stereochemistry. Drawing structures, it would appear the diols originate on sp3 hybridized carbons and must be cis or trans, and rxn04820 is a little unclear as to the nature of the product. The cis form of rxn15643 (above and below the plane) and the trans form of rxn15644 (two forms, hydroxyl group proximal to benzyl ring above or below plane) each have two possible products. These are (1R,2S)-, (1S,2R)-, (1R,2R)-, and (1S,2S)-dihydronaphthalene-1,2-diol.

The first paper that mentions this particular diol involves naphthalene products added to rat and rabbit diets, making this a potentially relevant product for human metabolic models. Young et al.'s 1947 paper reports optical activity of the isolated compound (possibly compounds, in the absence of chromatography) as -154 degrees in ethanol and -328 in water. I'm having some difficulty determining the stereochemistry of the experimentally isolated compound, so it may make sense to account for all four stereoisomers of cpd03712.

The metabolites implicated above also appear in the following reaction set:
rxn04821 rxn15645 rxn15646 rxn32737

Possible bug with reported deltaG using NADPH

I've just discovered two pairs of reactions that use NADH and NADPH but deltaG is wrong for the second (NADPH) one:

seaver@twig:~/Projects/ModelSEEDDatabase/Biochemistry$ cat reactions.tsv | cut -f1,8,9,10,15,16 | grep rxn04113
rxn04113        (1) NADPH[0] + (1) H+[0] + (1) 1-Hydroxy-2-methyl-2-butenyl 4-diphosphate[0] <=> (1) H2O[0] + (1) NADP[0] + (1) Isopentenyldiphosphate[0]   >
        >       -16.86  1.23
seaver@twig:~/Projects/ModelSEEDDatabase/Biochemistry$ cat reactions.tsv | cut -f1,8,9,10,15,16 | grep rxn08756
rxn08756        (1) NADH[0] + (1) H+[0] + (1) 1-Hydroxy-2-methyl-2-butenyl 4-diphosphate[0] <=> (1) H2O[0] + (1) NAD[0] + (1) Isopentenyldiphosphate[0] <   <
        16.86   1.23

I now believe there might be a bug in my original code for heuristically handling phosphorylated compounds. I haven't got the time at the moment to follow up on this, but we will be migrating that code into Python, in this repository, and re-running it to double-check for any problems such as this one.

Plan for biochemistry database

This issue will serve to be a place-holder for the list of things I need to check in the Biochemistry database, to ensure its quality.

Generally speaking, there's several goals for the biochemistry:

  1. All conditional/universal reactions in the ModelTemplates have an "OK" status.
  2. All gapfilling reactions in ModelTemplates have an "OK" status.
  3. All reactions from various sources are appropriately merged.
  4. As many reactions as possible have an "OK" status.

The first two goals are the highest priority for testing and release in ProbModelSEED.

However, it then follows that there's a series of sub-tasks that arise from trying to achieve these goals:

  1. Editing compound charge and formula
  2. Merging compounds
  3. Splitting compounds (tricky)
  4. Checking compound structures
  5. Checking compound aliases

In case #3, there are compounds where there was an incorrect merge between database sources (usually discovered via aliases) and "splitting" them may result in the generation of a new compound object (and identifier), and consequently in new reactions (and identifiers). Where possible, I'll attempt this last.

Integrating Pathway Data

This is from an old email of Chris' that I left open, I'm initiating some of this via a commit coming today, but posting Chris' email as a placeholder of the discussion:

"I would like us to deal far more thoroughly with pathways, including such data from KEGG, MetaCyc, and BIGG. In playing with BIGG recently, I realize they now offer easy download of virtually all their data, and we should exploit this.

I propose we have an entire primary table dedicated just to pathways, with the following columns:

Name: primary human readable name for the pathways
ID: we should assign our own consistent IDs to pathways (e.g. path.1, path.2 etc)
Source: where the pathway came from (e.g. KEGG, MetaCyc, BIGG)
Source_ID: what is the ID of the pathway in the database it came from
Aliases: list of aliases for the pathway
Reactions: list of reactions in the pathway

Pathways are useful in a 1000 ways and not handling them thoroughly has been a real impediment for us.

I can think of many pathway sources we could integrate: KEGG, MetaCyc, BIGG, Subsystems, Scenarios. And we should do them all.

But a key question? Should we be working towards ultimately reconciling this data to maintain our own pathway ontology??? Something to think about. The issue with integrated pathway data is such data will never be consistently applied across the board.

I generally don't want to sign on for excessive curation commitments, but if we could come up with a computational rule we could automatically apply to maintain our own pathway ontology, I would favor that..."
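
The proposed table maps naturally onto a small record type. The sketch below only mirrors the six columns named in the email; the class name, the tab-separated serialization with semicolon-joined lists, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pathway:
    name: str       # primary human-readable name
    id: str         # our own consistent id, e.g. "path.1"
    source: str     # e.g. KEGG, MetaCyc, BIGG
    source_id: str  # id of the pathway in the source database
    aliases: List[str] = field(default_factory=list)
    reactions: List[str] = field(default_factory=list)

    def to_row(self):
        """Serialize as one tab-separated row; list columns are joined with ';'."""
        return "\t".join([
            self.name, self.id, self.source, self.source_id,
            ";".join(self.aliases), ";".join(self.reactions),
        ])


# Placeholder values, not real pathway data.
example = Pathway(
    name="Example pathway",
    id="path.1",
    source="KEGG",
    source_id="example_source_id",
    reactions=["rxn00001"],
)
print(example.to_row())
```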

Load published models

  • Transfer scripts, models, and knowledge to Lena
  • Create scripts to translate between SBML and TSV formats

Remove duplicate reactions

There are 4319 duplicated reactions currently. Addressing this will likely require updating template files.
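
How the count above was obtained is not stated here. One plausible definition, sketched below, treats two reactions as duplicates when they have the same compounds, compartment indices, and coefficients, ignoring compound names, reagent order, and the direction in which the reaction is written. It assumes the coefficient:compound:compartment_index:... stoichiometry layout seen elsewhere on this page; the function names and reaction ids are hypothetical.

```python
import re
from collections import defaultdict

# Assumed reagent layout: coefficient:cpdNNNNN:compartment_index:...
REAGENT = re.compile(r"(?:^|;)(-?\d+(?:\.\d+)?):(cpd\d+):(\d+):")


def reaction_key(stoich):
    """Canonical key that ignores names, reagent order, and direction."""
    terms = sorted(
        (cpd, int(index), float(coeff))
        for coeff, cpd, index in REAGENT.findall(stoich)
    )
    flipped = sorted((cpd, index, -coeff) for cpd, index, coeff in terms)
    # Pick one of the two orientations deterministically.
    return min(tuple(terms), tuple(flipped))


def find_duplicates(reactions):
    """Group reaction ids that share a canonical key."""
    groups = defaultdict(list)
    for rxn_id, stoich in reactions.items():
        groups[reaction_key(stoich)].append(rxn_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


reactions = {
    "rxnA": '-1:cpd00001:0:0:"H2O";-1:cpd00012:0:0:"PPi";2:cpd00009:0:0:"Phosphate";1:cpd00067:0:0:"H+"',
    "rxnB": '2:cpd00009:0:0:"Orthophosphate";-1:cpd00012:0:0:"Diphosphate";-1:cpd00001:0:0:"H2O";1:cpd00067:0:0:"H+"',
    "rxnC": '1:cpd00001:0:0:"H2O";1:cpd00012:0:0:"PPi";-2:cpd00009:0:0:"Phosphate";-1:cpd00067:0:0:"H+"',
    "rxnD": '-1:cpd30677:0:0:"D-Glucosamine Hydrochloride";1:cpd30677:1:0:"D-Glucosamine Hydrochloride"',
}
print(find_duplicates(reactions))
```

Whether reversed reactions should count as duplicates is a curation decision; dropping the flipped comparison makes the check direction-sensitive.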

Placeholder issue for reactions with charged protein reagents

This issue concerns reactions where one or more reagents are proteins, typically with a side-group/chain that's involved in the reaction, and the status indicates a charge imbalance.

A recent commit shows that, rather than attempt to balance the reaction by editing the charge itself, which may have undesired consequences in many other reactions, the individual reactions are "cleared" to be OK by manually appending the "OK|CK" keyword where "CK" means "Checked":

052d83b

I'm not altogether comfortable with this solution, but I don't have the time right now to find an appropriate balance, or to determine if it's at all possible. As one colleague describes it, "it's like pushing down bumps in a newly laid carpet, new bumps arise in other places". Nevertheless, this issue is placed here to remind us we have such an issue, and we should take a closer look at it at some point.
