knowledge-graph-hub / kg-microbe
Home Page: https://knowledge-graph-hub.github.io/kg-microbe/index.html
License: BSD 3-Clause "New" or "Revised" License
From this resource and paper:
https://github.com/bacteria-archaea-traits/bacteria-archaea-traits
https://www.nature.com/articles/s41597-020-0497-4
Specifically the output file with NCBI taxon ids:
https://github.com/bacteria-archaea-traits/bacteria-archaea-traits/blob/master/output/condensed_traits_NCBI.csv
Split from #2.
1 trithionate_oxidation
1 carbonmonoxide_oxidation
1 tetrathionate_oxidation, iron_reduction
1 pyrrhotite_oxidation
1 galena_oxidation
1 thiocyanate_oxidation
1 carbonylsulfide_oxidation
...
182 sulfur_reduction
186 thiosulfate_reduction
353 aerobic_chemo_heterotrophy
366 fermentation
371 denitrification
400 nitrite_reduction
983 NA
1420 nitrate_reduction
These should all map to GO
include infectious disease branch of mondo, plus associated taxa
you may want to use robot here
note; we may want to remap. I would do this with a SPARQL CONSTRUCT
mondo has
we want to transform these into organism-organism edges, and annotate the edges with the disease
e.g.
<<sars-cov-2 interacts-with homo-sapiens>> causes: mondo:COVID-19
in mondo you can assume the organism is human unless it has an in-taxon edge
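A minimal sketch of the edge shape described above, assuming each MONDO infectious disease record has already been reduced to a disease ID, a pathogen taxon ID, and an optional in-taxon value. The function name, dictionary keys, and predicate string are hypothetical; only the human default follows from the note above.

```python
from typing import Dict, Optional

HUMAN_TAXON = "NCBITaxon:9606"  # Homo sapiens; used when no in-taxon edge is present


def disease_to_organism_edge(
    disease_id: str,
    pathogen_taxon: str,
    in_taxon: Optional[str] = None,
) -> Dict[str, str]:
    """Build one organism-organism edge annotated with the disease (hypothetical shape)."""
    return {
        "subject": pathogen_taxon,
        "predicate": "interacts-with",
        # Assume a human host unless the disease carries an in-taxon edge.
        "object": in_taxon if in_taxon else HUMAN_TAXON,
        "causes": disease_id,
    }


# Placeholder identifiers taken from the example above, not real CURIEs.
edge = disease_to_organism_edge("mondo:COVID-19", "sars-cov-2")
```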
we may actually want to push the generation of this kgx file into the mondo pipeline cc @matentzn
we may want to include these edges in kg-covid-19 cc @justaddcoffee @realmarcin
Build currently fails at the transform step as follows:
13:20:08 + python3.8 run.py transform
13:20:11 org.semanticweb.owlapi.io.OWLOntologyInputSourceException: java.io.FileNotFoundException: data/raw/chebi.owl (No such file or directory)
13:20:11 Use the -vvv option to show the stack trace.
13:20:11 Use the --help option to see usage information.
13:20:14 Traceback (most recent call last):
13:20:14 File "run.py", line 121, in <module>
13:20:14 cli()
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
13:20:14 return self.main(*args, **kwargs)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
13:20:14 rv = self.invoke(ctx)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
13:20:14 return _process_result(sub_ctx.command.invoke(sub_ctx))
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
13:20:14 return ctx.invoke(self.callback, **ctx.params)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
13:20:14 return __callback(*args, **kwargs)
13:20:14 File "run.py", line 69, in transform
13:20:14 kg_transform(*args, **kwargs)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform.py", line 43, in transform
13:20:14 t.run()
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform_utils/traits/traits.py", line 131, in run
13:20:14 create_termlist(self.input_base_dir, "chebi")
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/utils/nlp_utils.py", line 109, in create_termlist
13:20:14 transform(
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 560, in transform
13:20:14 transform_source(
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 869, in transform_source
13:20:14 transformer.transform(input_args, output_args)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/transformer.py", line 236, in transform
13:20:14 self.process(source_generator, intermediate_sink)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/transformer.py", line 332, in process
13:20:14 for rec in source:
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/source/obograph_source.py", line 62, in parse
13:20:14 yield from chain(n, e)
13:20:14 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/source/obograph_source.py", line 84, in read_nodes
13:20:14 FH = open(filename, "rb")
13:20:14 FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/chebi.json'
chebi.json is missing because ROBOT doesn't complete its conversion, and it doesn't do that because it can't find chebi.owl.
chebi.owl.gz is successfully downloaded, but the issue here is likely with the TraitsTransform: the CHEBI transform knows it has to work with a compressed version, but the TraitsTransform expects to have chebi.json ready to use.
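One possible workaround is to decompress the download before ROBOT is called. This is a sketch only, using the standard library; the helper name is hypothetical, and where such a step would be called from in TraitsTransform is an assumption.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path: Path) -> Path:
    """Write a decompressed copy next to the .gz file and return its path."""
    # Dropping the last suffix turns data/raw/chebi.owl.gz into data/raw/chebi.owl.
    out_path = gz_path.with_suffix("")
    if not out_path.exists():
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

Calling `decompress_gz(Path("data/raw/chebi.owl.gz"))` before the ROBOT conversion would leave `data/raw/chebi.owl` in place for it to read.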
github obophenotype
Unfortunately the RIKEN media data seems to have some bad license:
https://www.jcm.riken.jp/cgi-bin/jcm/jcm_grmd?GRMD=422
Copyright © 2023 Microbe Division (JCM) - All Rights Reserved
Most likely this will involve running this tool on select genomes/MAGs/metagenomes and then modeling the source data and inferences and knowledge.
https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-021-01213-8
The data used for figure 3 is publicly available and the source is listed in the figure legend. It was taken from Table S3 and S5 from Oh et al. 2014.
Source data is found here:
Table S3. Relative Abundance of Propionibacterium acnes Strains, Related to Supplemental Experimental Procedures
https://www.cell.com/cms/10.1016/j.cell.2016.04.008/attachment/755072a6-35e2-4b3e-8f9e-eb0b9ec09274/mmc4.xlsx
Table S5. Relative Abundance of Staphylococcus epidermidis Strains and Presence/Absence of Gene Clusters, Related to Supplemental Experimental Procedures
https://www.cell.com/cms/10.1016/j.cell.2016.04.008/attachment/a80c1d13-73b7-4bd3-90bc-c2e42b3ec6b3/mmc6.xlsx
Source paper link:
https://www.cell.com/fulltext/S0092-8674(16)30399-3#supplementaryMaterial
Secondary referenced article with figure:
Living in Your Skin: Microbes, Molecules, and Mechanisms
https://journals.asm.org/doi/10.1128/IAI.00695-20?utm_source=informz&utm_medium=email&utm_campaign=sign-up-journals&utm_term=20220127&utm_content=nn-hmb-mktg&utm_source=Informz&utm_medium=Email&utm_campaign=Campaign&utm_content=Message_Name&_zs=6bprl&_zl=CBs62
The gutMEGA resource provides a few useful files for KG-Microbe.
http://gutmega.omicsbio.info/download.php
The first set of ingests covers different microbial taxonomy resources, including NCBITaxonomy, which is already present in KG-Microbe. Looking at these text files, some alignment between the different taxonomies may be necessary, ideally as an NER task. There will be some disagreements in taxonomy structure, and we may even need to pick clique leaders. Note that the three non-NCBI taxonomies are all specific to microbes (unlike NCBI, which is why we had to trim it).
NCBI taxonomy | Reformatted NCBI taxonomy information, including different ranks of NCBI taxa |
Greengenes taxonomy | Reformatted Greengenes taxonomy information, including different ranks of Greengenes taxa |
RDP taxonomy | Reformatted RDP taxonomy information, including different ranks of RDP taxa |
SILVA taxonomy | Reformatted SILVA taxonomy information, including different ranks of SILVA taxa |
Quantitative data table ingest: this will be valuable and, to start, can be an NER task (but we need to identify the reference ontology set). These will be taxa -> condition -> relative abundance, where the 'condition' is a free-text short description like a sample title.
gutMEGA data table | All quantification events provided in gutMEGA |
This dataset provides the literature provenance for the quantitative data in the data table (above):
Literature information summary | Related information about the curated literature
Download full csv across ~1200 taxa here (CSV button):
https://services.bromberglab.org/fusiondb/explore
This is the list of unique chemicals in the carbon substrate column:
https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/schemas/distinct_carbon_substrates.txt
The goal is to create an SSSOM mapping file to encode the results of the above chemical matching, as here for pathways:
https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/pathways.sssom.tsv
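A minimal sketch of writing such a mapping table, assuming a small subset of standard SSSOM columns. The subject prefix, the placeholder ChEBI ID, and the helper name are hypothetical, and the commented metadata header that SSSOM files normally carry is omitted.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

SSSOM_COLUMNS = [
    "subject_id",
    "subject_label",
    "predicate_id",
    "object_id",
    "object_label",
    "mapping_justification",
]


def write_sssom_rows(rows: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Write mapping rows as tab-separated values with an SSSOM-style header row."""
    writer = csv.DictWriter(
        handle, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buffer = io.StringIO()
write_sssom_rows(
    [
        {
            "subject_id": "carbon_substrate:glucose",  # hypothetical local prefix
            "subject_label": "glucose",
            "predicate_id": "skos:exactMatch",
            "object_id": "CHEBI:XXXXX",  # placeholder, to be filled by the matching step
            "object_label": "glucose",
            "mapping_justification": "semapv:LexicalMatching",
        }
    ],
    buffer,
)
```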
Build currently fails when calling transforms:
11:51:07 + python3.8 run.py transform
11:51:09 Traceback (most recent call last):
11:51:09 File "run.py", line 121, in <module>
11:51:09 cli()
11:51:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
11:51:09 return self.main(*args, **kwargs)
11:51:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
11:51:09 rv = self.invoke(ctx)
11:51:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
11:51:09 return _process_result(sub_ctx.command.invoke(sub_ctx))
11:51:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
11:51:09 return ctx.invoke(self.callback, **ctx.params)
11:51:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
11:51:09 return __callback(*args, **kwargs)
11:51:09 File "run.py", line 69, in transform
11:51:09 kg_transform(*args, **kwargs)
11:51:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform.py", line 43, in transform
11:51:09 t.run()
11:51:09 TypeError: run() missing 1 required positional argument: 'data_file'
If the function is reaching t.run() without including any sources, then some sources exist but aren't being recognized by name.
Could be related to specifying
def transform(input_dir: str, output_dir: str, sources: Optional[List[str]] = None) -> None:
instead of
def transform(input_dir: str, output_dir: str, sources: List[str] = None) -> None:
to avoid the no-implicit-optional thing.
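As a quick check on that hypothesis: Python does not enforce annotations at run time, so the two signatures should behave identically, and the difference only matters to type checkers. A minimal sketch with hypothetical function bodies (the real transform does more):

```python
from typing import List, Optional


def transform_explicit(sources: Optional[List[str]] = None) -> List[str]:
    """Explicit Optional: accepted by checkers that forbid implicit Optional."""
    return list(sources) if sources else ["<all sources>"]


def transform_implicit(sources: List[str] = None) -> List[str]:
    """Implicit Optional: flagged by strict checkers, same run-time behaviour."""
    return list(sources) if sources else ["<all sources>"]


# Both forms select the same sources, with or without an explicit list.
same_default = transform_explicit() == transform_implicit()
same_named = transform_explicit(["traits"]) == transform_implicit(["traits"])
```

If that holds in the real code as well, the missing 'data_file' argument more likely comes from how an individual transform's run() is defined or called, which would be worth checking separately.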
Similar to KG-COVID-19, set up Jenkins to run build on build.berkeleybop.io, and push to S3, probably http://kg-hub.berkeleybop.io/kg-microbe, per convo with @cmungall.
kg-covid-19 Jenkinsfile for reference
General info here:
http://www.webofmicrobes.org/about
Download SQLite via 'download' button.
This is the primary relevant data:
"The Web" links the actions of microorganisms on metabolites from within a single environment.
And these are the two basic assertions:
There are two types of assertions that are made on the WoM:
Assertions of 'present in environment': The metabolite to be annotated as detected in >2/3 replicates. These are indicated by tan table cells or filled in circles on the web.
Assertions of 'increase' or 'decrease' by an organism: Metabolites that were significantly different from the control environment versus the transformed environment are asserted as increased (red cells on tables and red lines on The Web) or decreased (blue cells on tables and blue lines on The Web) with darker shading indicating a greater fold change.
19 traits for 19,455 bacterial strains
https://www.sciencedirect.com/science/article/pii/S1470160X21007123
Possible complete download here:
https://www.sciencedirect.com/science/article/pii/S1470160X21007123
Website:
https://bacdive.dsmz.de/
No direct download on website, there is an API:
https://api.bacdive.dsmz.de
Disbiome database is a collection of microbiome compositional effects related to human disease
https://disbiome.ugent.be/home
Here is an example record for a disbiome entry:
https://disbiome.ugent.be/experimentdetail/11342
The pertinent pieces of info for KG ingestion are:
All of the data is here (84G total):
http://genomics.lbl.gov/supplemental/bigfit/
The numerical relative growth data would have to be converted to growth vs. no growth, e.g. via thresholding.
Just taking the first organism as an example:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/
On the organism page, under 'Genes' the 'Specific phenotypes' link gives a table of most significant phenotype per gene for this KO dataset:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/specific_phenotypes
and this file can serve as the primary data source.
These columns provide the following data:
sysName | gene name |
desc | description |
name | internal name |
lrn | log ratio normalized |
t | t-statistic |
Group | condition group |
Condition_1 | condition name |
Concentration_1 | concentration |
Units_1 | unit |
For reference under 'Genes' the 'Gene fitness' link gives a full table of relative fitness values:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_logratios_good.tab
The y-axis labels are 'locusId' which are gene ids and the x-axis labels are condition (sample) ids including a text description.
There is additional data on each condition on the organism page under 'Tables' then 'Experiments' then 'Detailed metadata for experiments':
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/expsUsed
A basic ingest of this data would model these as mutant alleles or as a gene-condition relation indicating that gene X is essential for growth in condition Y. As key supporting data the gene annotations should also be ingested:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_genes.tab
with the caveat that these are 'free text' annotations so may require standardization.
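A minimal sketch of the thresholding step, assuming the specific_phenotypes file is tab-delimited with the sysName, lrn, t and Condition_1 columns. The cutoff values and the predicate name are illustrative assumptions rather than recommended settings, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

LRN_THRESHOLD = -1.0  # assumed cutoff on the normalized log ratio (lrn)
T_THRESHOLD = 4.0     # assumed cutoff on the absolute t-statistic (t)


def growth_defect_edges(
    handle: TextIO,
    lrn_threshold: float = LRN_THRESHOLD,
    t_threshold: float = T_THRESHOLD,
) -> Iterator[Dict[str, str]]:
    """Yield gene-condition edges for rows with a strong, significant fitness defect."""
    for row in csv.DictReader(handle, delimiter="\t"):
        lrn, t = float(row["lrn"]), float(row["t"])
        if lrn <= lrn_threshold and abs(t) >= t_threshold:
            yield {
                "subject": row["sysName"],
                "predicate": "required_for_growth_in",  # hypothetical predicate
                "object": row["Condition_1"],
            }


# Made-up rows for illustration only, not real fitness data.
sample = io.StringIO(
    "sysName\tdesc\tname\tlrn\tt\tGroup\tCondition_1\tConcentration_1\tUnits_1\n"
    "gene_A\texample protein\tn1\t-2.5\t-6.0\tcarbon source\texample condition\t10\tmM\n"
    "gene_B\texample protein\tn2\t-0.2\t-0.5\tcarbon source\texample condition\t10\tmM\n"
)
edges = list(growth_defect_edges(sample))
```

Only gene_A passes both cutoffs here; any real ingest would need to choose the cutoffs from the data.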
Further ingests could include:
Split from #2
1 spirochete
1 ring
1 triangular
1 cell_shape
2 tailed
3 branced
3 spindle
3 star
4 irregular
5 flask
7 square
11 disc
12 fusiform
60 pleomorphic
289 vibrio
328 filament
400 spiral
402 coccobacillus
1563 coccus
3794 NA
4035 bacillus
I think the majority should map to PATO. We may want to consider making OBA terms for some
Provides three additional properties for 3637 DSMZ media:
Is Complex | Is Aerobic | Is SubMedium
For a subset of DSMZ, here:
https://komodo.modelseed.org/servlet/KomodoTomcatServerSideUtilitiesModelSeed?MediaList
https://docs.google.com/spreadsheets/d/19q0re9z7ngOs1cKF9q-GWJRDXZmsoOMwktctBrNW0PY/edit#gid=0
Paper:
https://www.nature.com/articles/ncomms9493
https://komodo.modelseed.org
Context: Semantic ETL manifesto
We should make this repo an exemplar of a KG ETL repo. We should formally describe upstream sources in a lightweight way and make it very transparent how things map.
We have done most of the work already here:
https://github.com/Knowledge-Graph-Hub/kg-microbe/tree/master/schemas
This describes the core trait table that is ingested. We need to switch this to use linkml first (#21), but we may add more.
We can generate markdown for this using linkml, but we need to think how this integrates with https://knowledge-graph-hub.github.io/kg-microbe/index.html
I like how dipper includes a sources section: https://dipper.readthedocs.io/en/latest/sources.html
Would it make sense to
Alternatively we could use the generated python from the schema
ProGenomes3 has over 900k annotated prokaryotic genomes.
The download provides:
https://knowledge-graph-hub.github.io/kg-microbe/modules.html
but it looks like we have rst for it?
A curated resource of information about 80,000 microbes. This includes viruses which we can deprioritize for now.
The source data tables from The Microbe Directory can be built using 'make' in this repository:
Data:
https://github.com/dcdanko/MD2
Table 1: Inventory parameters and descriptions
Parameter
Definition and notes
Optimal pH
The optimal pH at which this species grows. If the species was not widely studied, the American Type Culture Collection (ATCC) was used to determine the optimal pH for storage. If two far ranges of pH were determined, the average was taken.
Optimal temperature
The optimal temperature at which this species grows. If the species was not widely studied, the ATCC was used to determine the optimal temperature for storage. If two far ranges of temperatures were determined, the average was taken.
COGEM pathogenicity rating
COGEM released a comprehensive database of pathogenicity assessment of around 2575 bacterial species in 2011. The database ranks the pathogenicity of species on a scale of 1 to 4 - 1 being not belonging to a recognized group of disease-invoking agents in humans or animals and having an extended history of safe usage and 4 being a species that can cause a very serious human disease, for which no prophylaxis is known.
Antimicrobial susceptibility
Are there any known antibiotics that this species is sensitive to? No = 0, Yes = 1
Spore-formation
Is the species spore-forming? No = 0, Yes = 1
Biofilm-formation
Is the species biofilm-forming? No = 0, Yes = 1
Extremophile
Extremophiles are organisms that live in extreme environments, as opposed to organisms that live in moderate (mesophilic) environments. This category includes acidophiles, thermophiles, osmophiles, halophiles, oligotrophs, and others. Mesophiles = 0, Extremophile = 1
Gram-stain
Negative = 0, Positive = 1, Indeterminate = 2
Found in human microbiome
Microbes that live anywhere in the human body and are not pathogenic to humans (i.e. capable of causing human disease) No=0, Yes=1
Plant pathogen
Does the species cause disease in plants? No = 0, Yes = 1
Animal pathogen
Does the species cause disease in animals? No = 0, Yes = 1
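The integer codes in Table 1 could be decoded during ingest roughly as follows. This is a sketch: the helper names are hypothetical, and how missing values appear in the actual tables is an assumption.

```python
from typing import Optional

# Gram-stain codes as given in Table 1.
GRAM_STAIN = {0: "negative", 1: "positive", 2: "indeterminate"}


def decode_gram_stain(code: Optional[int]) -> Optional[str]:
    """Map a Gram-stain code to its label; unknown or missing codes give None."""
    return GRAM_STAIN.get(code)


def decode_flag(code: Optional[int]) -> Optional[bool]:
    """Map a No = 0 / Yes = 1 field (e.g. spore-formation) to a boolean."""
    if code is None:
        return None
    if code not in (0, 1):
        raise ValueError(f"unexpected flag value: {code!r}")
    return bool(code)
```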
From GitHub: Details for available fields, across domains of life
Virus
Genetic material: Viruses have either RNA or DNA as their genetic material
Strand: The nucleic acid may be single (ss) or double stranded (ds).
Capsid symmetry: The way in which the capsid units are arranged.
Helical
Icosahedral
Complex
Envelope: The outer layer of a virus that protects the nucleic acid. Viruses without an envelope are called naked.
Is it a pathogen? If yes, which is its host.
Human
Animal
Plant
Bacteria
Fungi
Bacteria and Archaea Only
Gram stain: Used to distinguish and classify bacterial species into two large groups: Gram-positive and Gram-negative.
Antimicrobial resistance (AMR): Antimicrobial resistance occurs naturally over time, usually through genetic changes. However, the misuse and overuse of antimicrobials is accelerating this process.
Type of metabolisms: the nutrition mode of microbes according to the sources of energy and carbon needed for living, growth and reproduction. All sorts of combinations may exist in nature.
Primary source of energy:
Phototrophs: Light is absorbed in photo receptors and transformed into chemical energy
Chemotrophs: Bond energy is released from a chemical compound.
Primary sources of reducing equivalents:
Organotrophs: Organic compounds are used as electron donor.
Lithotrophs: Inorganic compounds are used as electron donor.
Primary sources of carbon
Heterotrophs: Organic compounds are metabolized to get carbon for growth and development.
Autotrophs: Carbon dioxide (CO2) is used as source of carbon.
Bacteria, Archaea and Eukarya
Biofilm forming: Biofilms are multicellular communities held together by a self-produced extracellular matrix. Biofilms impact humans in many ways as they can form in natural, medical, and industrial settings.
Spore forming: Also referred to as endospores, are the dormant form of vegetative microbes and are highly resistant to physical and chemical influences.
Microbiome: Host or environment where microbes are usually found.
Host: Microbes might be commensal or pathogenic to their host. Commensal microbes are found to be crucial to the survival of their hosts.
Sponges
Corals
Fungi
Plant
Animal
Human: Body sites of Human Microbiome Project
Soil: Microbes are essential for soils. They are main drivers of nutrient cycles in soils, decompose organic matter, promote plant growth and control pests and diseases.
Tundra
Grassland
Croplands
Forest
Tropical
Temperate
Boreal
Extreme: Microbes that live in habitats considered hard to survive in due to its extreme conditions such as temperature, accessibility to different energy sources or under high pressure.
Desert
Polar
Deep ocean
Space
Water: Water can support the growth of many types of microorganisms. Microbes are main drivers of biogeochemical processes and nutrient cycling.
Ocean
Fresh
Mangrove
Sediments
Is it a pathogen? if Yes, which is its host:
Fungi
Plant
Animal
Human: Body sites of Human Microbiome Project
Extremophile: a microbe that thrives in physically or geochemically extreme conditions that are detrimental to most life on Earth. Microbes that can only live under optimal conditions are called Mesophiles.
If extremophile, which type.
Acidophile: Microbes that live in acidic systems with pH -0.06 to 4.0.
Alkaliphile: Microbes capable of survival in alkaline environments with pH 8.5–11
Halophile: Microbes that thrive in high salt concentrations.
Metallotolerant: Microbes that survive in environments with a high concentration of dissolved heavy metals in solution
Barophile: Also called piezophile, are microbes which thrive at high pressures such as deep seas.
Psychrophile: Also called cryophiles, are microbes capable of growth in low temperatures, ranging from −20°C to 10°C.
Radioresistant: Microbes capable of withstanding high levels of ionizing radiation.
Thermophile: Microbes that live at high temperatures between 41°C and 122°C.
Xerophile: Microbes that grow and reproduce in conditions with a low availability of water.
Hypolith: Organisms that live underneath rocks in cold deserts.
Oligotroph: Microbes capable of growth in nutritionally limited environments.
The data is from this paper:
Exploring the functional composition of the human microbiome using a hand-curated microbial trait database
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04216-2
The dataset itself is Additional File 1:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04216-2#MOESM1
To start we could perform NER with the same dictionaries as Madin et al - so NCBI Taxonomy, ENVO, ECOCORE, ChEBI.
There are additional numerical columns of interest here beyond what Madin et al provided:
"Towards the biogeography of prokaryotic genes"
https://www.nature.com/articles/s41586-021-04233-4
Downloads here:
https://gmgc.embl.de/download.cgi
More info soon ...
Create categories for recognized entities based on comparison between TokenizedTerm and PreferredTerm
On the most recent Jenkins build, the graph merge appears to complete as expected, but the KGX stats file isn't found:
17:57:46 [KGX][cli_utils.py][ merge] INFO: Merged graph has 2715138 nodes and 2979399 edges
18:00:07 [KGX][cli_utils.py][ merge] INFO: Writing merged graph to merged-kg-tsv
[Pipeline] sh
18:09:44 + cp merged_graph_stats.yaml merged_graph_stats_20230106.yaml
18:09:44 cp: cannot stat 'merged_graph_stats.yaml': No such file or directory
So either the stats file isn't created or the Jenkinsfile is set to look in the wrong place for it.
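To tell the two cases apart, the copy step could first search the workspace for the stats file. A sketch under the assumption that the file, if written at all, sits somewhere below the working directory; the helper name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def archive_stats(
    workdir: Path, date_tag: str, name: str = "merged_graph_stats.yaml"
) -> Optional[Path]:
    """Copy the stats file to a dated name next to it, or return None if it is absent."""
    matches = sorted(workdir.rglob(name))
    if not matches:
        return None  # the merge step did not write a stats file anywhere below workdir
    src = matches[0]
    dst = src.with_name(f"{src.stem}_{date_tag}{src.suffix}")
    shutil.copy(src, dst)
    return dst
```

A None result would point at the merge configuration, while a hit in an unexpected subdirectory would point at the path used in the Jenkinsfile.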
These are the two main public resources with this data:
https://www.frontiersin.org/articles/10.3389/fcimb.2018.00424/full
Split from #2
1 strictly anaerobic
342 obligate anaerobic
544 microaerophilic
1035 obligate aerobic
2328 anaerobic
2655 facultative
3108 aerobic
4250 NA
I think these should map to ecocore, cc @diatomsRcool
from OBO owl, after ROBOT pruning
The graph currently fails to build on Jenkins (see error below).
08:23:38 Collecting pandas (from kg-microbe==1.0.0)
08:23:38 Using cached https://files.pythonhosted.org/packages/99/f0/f99700ef327e51d291efdf4a6de29e685c4d198cbf8531541fc84d169e0e/pandas-1.3.5.tar.gz
08:23:39 Complete output from command python setup.py egg_info:
08:23:39 Traceback (most recent call last):
08:23:39 File "<string>", line 1, in <module>
08:23:39 File "/tmp/pip-build-2_cqjoc1/pandas/setup.py", line 18, in <module>
08:23:39 import numpy
08:23:39 ModuleNotFoundError: No module named 'numpy'
08:23:39
08:23:39 ----------------------------------------
08:23:39 Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2_cqjoc1/pandas/
A full refresh of the Jenkins build may require a new Docker image, as currently used for KG-IDG or KG-COVID-19.
We had various tickets open, don't have time to go back to them all, but the SOP is to record the mapping of the categorical values to the ontology term id in the enums section:
https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/schemas/trait_condensed.yaml
But these still have automated junk
The graph build process could use a refresh overall, primarily to better align with other KG-Hub graphs.
These should be updated:
Build currently encounters this problem:
14:33:09 Traceback (most recent call last):
14:33:09 File "run.py", line 121, in <module>
14:33:09 cli()
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
14:33:09 return self.main(*args, **kwargs)
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
14:33:09 rv = self.invoke(ctx)
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
14:33:09 return _process_result(sub_ctx.command.invoke(sub_ctx))
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
14:33:09 return ctx.invoke(self.callback, **ctx.params)
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
14:33:09 return __callback(*args, **kwargs)
14:33:09 File "run.py", line 69, in transform
14:33:09 kg_transform(*args, **kwargs)
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform.py", line 43, in transform
14:33:09 t.run()
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform_utils/traits/traits.py", line 500, in run
14:33:09 write_node_edge_item(
14:33:09 File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/utils/transform_utils.py", line 83, in write_node_edge_item
14:33:09 fh.write(sep.join(data) + "\n")
14:33:09 TypeError: sequence item 2: expected str instance, Series found
This function attempts to write a single tab-delimited list of strings to a file, but in this case, it can't join the data because it's passed a pandas Series.
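One defensive option is to coerce every item to a string before joining. The sketch below uses hypothetical helper names and is not the project's write_node_edge_item. A pandas Series is iterable over its values, so it would take the second branch. Note that a Series in place of a scalar may also indicate duplicated labels upstream, which this would hide rather than fix.

```python
from collections.abc import Iterable
from typing import Any, Sequence


def to_cell(value: Any, multi_sep: str = "|") -> str:
    """Return a string for one output cell, flattening list-like values."""
    if isinstance(value, str):
        return value
    if isinstance(value, Iterable):
        # Lists, tuples and pandas Series all iterate over their values.
        return multi_sep.join(str(v) for v in value)
    return str(value)


def format_row(data: Sequence[Any], sep: str = "\t") -> str:
    """Build one delimited line from mixed scalar and list-like items."""
    return sep.join(to_cell(v) for v in data) + "\n"
```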
Jenkins build encounters this error when setting up index pages:
13:34:04 + multi_indexer -v --directory kg-microbe --prefix https://kg-hub.berkeleybop.io/kg-microbe/ -x -u
13:34:04 /var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo@tmp/durable-51f8b30b/script.sh: 1: multi_indexer: not found
The Docker image in use may not have multi_indexer installed.
Have the Jenkins run install it.
PREGO is a text-mined resource for information about microbes linked with additional information.
Paper:
https://www.mdpi.com/2076-2607/10/2/293/htm#app5-microorganisms-10-00293
Data sources for the PREGO association network are here:
https://www.mdpi.com/2076-2607/10/2/293/htm#table_body_display_microorganisms-10-00293-t0A2
@deepakunni3 @hrshdhgd @cmungall @ @realmarcin
If we are going to interact with other OBO ontologies (e.g., ENVO, PATO, COB), I think it would be helpful if we could provide some clarity as to how the use of trait differs from other OBO terms such as quality and characteristic.
Getting clear on this may help us integrate better with OBO ontologies.
This task involves avoiding pinning package versions in the requirements.txt and setup.py files.