This Git repository holds all of the RDF converter scripts used to generate Bio2RDF linked data.
See the wiki for details.
Licensed under MIT License, see license page for details.
Scripts that Bio2RDF users have created to generate RDF versions of scientific datasets
Home Page: http://bio2rdf.org/
License: Other
I'm trying to create a graph/tree of the MESH thesaurus in order to do some semantic inferencing by performing graph traversal on semantic relationships.
The bio2rdf MESH dataset is well suited to loading into a graph database. However, I can't find the MeSH Concepts and Terms, or their relationships, in the three available datasets (descriptor_record.nt, qualifier_records.nt and supplementary_records.nt).
Do you know where I could find that information in the form of RDF files? If not, should we/I upgrade the extraction script to extract it?
Dear all,
So while processing chembl components, the parser throws the following error messages:
==Error messages==
Processing components... PHP Notice: Undefined variable: component_classes_row in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on
Notice: Undefined variable: component_classes_row in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 594
PHP Fatal error: Call to a member function fetch_assoc() on a non-object in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 5
Fatal error: Call to a member function fetch_assoc() on a non-object in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 594
The URIs used to refer to the same term across the three record files (descriptor, qualifier and supplementary) are not normalized. See: ftp://nlmpubs.nlm.nih.gov/online/mesh/.asciimesh/
We suggest providing the following information for references to datasets,
such that for ns:id:
:a bio2rdf_vocabulary:x-ns ns:id .
ns:id a ns_vocabulary:resource;
dc:identifier "ns:id".
When I try to search bio2rdf for
NCT00299741
I get no results.
But the trial http://clinicaltrials.gov/ct2/show/NCT00299741?resultsxml=true
truly exists:
http://clinicaltrials.gov/ct2/show/NCT00299741
Why is it not in bio2rdf?
I am using this access:
http://clinicaltrials.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fclinicaltrials:NCT00299741
or even a plain text string.
(It is in the linkedct.org data,
http://linkedct.org/resource/trial/NCT00299741
but even there it does not list the result reference correctly. It has 1 and the RDF data say 0.)
Vojtech Huser
Hi,
I would like at least one person other than the submitter to comment on each pull request (either reporting no problems or raising an issue) and to be responsible for merging and closing it. We could either go with whoever gets to the request first, or assign a reviewer for each committer.
What do you think?
When looking at diseases in Pharmgkb there are encoding issues with labels with special characters:
For example:
http://bio2rdf.org/pharmgkb:PA165108196
What comes out is (http://cu.pharmgkb.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb%3APA165108196&graph=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb_resource%3Abio2rdf.dataset.pharmgkb.R3):
rdfs:label Münchausen's syndrome [pharmgkb:PA165108196]
dcterms:title Münchausen's syndrome
The actual name from UMLS (via its cui) is:
Münchausen's syndrome (SY)
or
Munchausen's syndrome (PT)
This seems to happen in at least two browsers (Firefox and Chrome) and with both the bio2rdf Pharmgkb endpoint as well as a local copy I have. This happens for a few diseases in Pharmgkb.
Here is my original query. I am trying to get all diseases in Pharmgkb and their CUIs (if they have one):
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX pharm: <http://bio2rdf.org/pharmgkb_vocabulary:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?pgkb_id ?labl ?CUI WHERE {
?pgkb_id a pharm:Disease .
?pgkb_id rdfs:label ?labl .
OPTIONAL {?pgkb_id pharm:x-UMLS ?CUI .}
} ORDER BY ?pgkb_id
Two incorrect URIs are being generated by the current version of the ncbi_gene parser, i.e.:
http://bio2rdf.org/geneid:vocabulary:Gene
http://bio2rdf.org/geneid:vocabulary:protein-coding-gene
These should read, respectively:
http://bio2rdf.org/geneid_vocabulary:Gene
http://bio2rdf.org/geneid_vocabulary:protein-coding-gene
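A post-hoc check for this kind of slip could be sketched as follows. It assumes, based only on the two examples above, that the malformed form is `<ns>:vocabulary:<term>` and the intended form is `<ns>_vocabulary:<term>`. The helper name `fix_vocabulary_uri` is hypothetical and is not part of php-lib or the parser itself.

```python
import re

# Assumption: the bad pattern is http://bio2rdf.org/<ns>:vocabulary:<term>
# and the intended pattern is http://bio2rdf.org/<ns>_vocabulary:<term>.
_BAD_VOCAB = re.compile(r"^(http://bio2rdf\.org/[A-Za-z0-9_.-]+):vocabulary:(.+)$")


def fix_vocabulary_uri(uri: str) -> str:
    """Return the URI with ':vocabulary:' rewritten to '_vocabulary:' (hypothetical helper)."""
    match = _BAD_VOCAB.match(uri)
    if match is None:
        # Already well formed, or not a vocabulary URI at all: leave untouched.
        return uri
    return f"{match.group(1)}_vocabulary:{match.group(2)}"


if __name__ == "__main__":
    for uri in (
        "http://bio2rdf.org/geneid:vocabulary:Gene",
        "http://bio2rdf.org/geneid:vocabulary:protein-coding-gene",
    ):
        print(fix_vocabulary_uri(uri))
```

This only repairs already generated output; the real fix belongs in the parser where the qname is built.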
Only one synonym is being extracted,
e.g. only "2-Acetoxybenzenecarboxylic acid" in http://bio2rdf.semanticscience.org:12050/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fdrugbank%3ADB00945
whereas DrugBank lists about two dozen:
http://www.drugbank.ca/drugs/DB00945
The predicate used to identify the clearance does not expand to its full URI:
drugbank_vocabulary:clearance
should be
For the resource http://bio2rdf.org/geneid:100309489
I found that the following is being generated:
http://bio2rdf.org/geneid:100309489 http://bio2rdf.org/geneid_vocabulary:has_dbxref 810
Suggest checking the output from the ncbi script.
The PHP-LIB supports setting a single version for all source data, which is useful for datasets like uniprot or refseq. In GO Annotations, however, each source dataset has a different version, so the current approach is not sufficient and per-source versioning would be required.
Hello,
Which branches are compatible between php-lib and bio2rdf-scripts?
Thanks,
Paul
Running the OMIM parser is generating these warnings:
Warning Invalid namespace:img for qname:img:Epicanthus-small.jpg. Using http://bio2rdf.org/img:Epicanthus-small.jpg
Is this expected behaviour?
parse action labels and assert more info
Has anyone encountered the following issue while loading the iproclass dataset into virtuoso (version 7.1.0)? There seems to be a malformed quad containing multiple invalid uniprot ids.
==Error message excerpts below==
Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> http://purl.org/dc/terms/identifier 
"iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937"^^http://www.w3.org/2001/XMLSchema#string 
http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .
Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> 
http://bio2rdf.org/bio2rdf_vocabulary:namespace "iproclass"^^http://www.w3.org/2001/XMLSchema#string http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .
Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> 
http://bio2rdf.org/bio2rdf_vocabulary:identifier "uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937"^^http://www.w3.org/2001/XMLSchema#string 
http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .
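The messages above suggest that a whole semicolon-separated list of ids ended up inside a single URI. One way to handle such a field is to split it before building identifiers, sketched below. This is only an illustration under that assumption: the function name `split_xref_ids` and the idea that each part should become its own `uniprot:<id>` qname are not taken from the iproclass parser.

```python
from typing import List


def split_xref_ids(namespace: str, raw_field: str) -> List[str]:
    """Turn 'a; b; c' into ['ns:a', 'ns:b', 'ns:c'], skipping empty parts.

    Hypothetical helper: assumes ';' is the only separator used in the field.
    """
    parts = (part.strip() for part in raw_field.split(";"))
    return [f"{namespace}:{identifier}" for identifier in parts if identifier]


if __name__ == "__main__":
    # First three ids from the error message above.
    for qname in split_xref_ids("uniprot", "75070738; 440648385; 116242084"):
        print(qname)
```

Each resulting qname contains no spaces or semicolons, so the URI built from it would no longer be rejected for that reason.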
Hi gang,
So I've encountered this error while attempting to process Chembl. The error message below occurs several hundred times. The parser is still running and has been for several days now.
==Error message snippet==
Notice: Undefined index: molformula in /home/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 697 [60/9422]
#0 error_handler(256, $literal is not a literal, /home/bio2rdf/php-lib/rdfapi.php, 139, Array ([s_uri] => http://bio2rdf.org/chembl:CHEMBL25706,[p_uri] => http://bio2rdf.org/chembl_vocabulary:compound-key,[literal] => ,[lang] => ,[lt_uri] => http://www.w3.org/2001/XMLSchema#string,[g_uri] => http://bio2rdf.org/bio2rdf.dataset:bio2rdf-chembl-20130930))
#1 trigger_error($literal is not a literal, 256) called at [/home/bio2rdf/php-lib/rdfapi.php:139]
#2 RDFFactory->QuadL(http://bio2rdf.org/chembl:CHEMBL25706, http://bio2rdf.org/chembl_vocabulary:compound-key, , , http://www.w3.org/2001/XMLSchema#string,http://bio2rdf.org/bio2rdf.dataset:bio2rdf-chembl-20130930) called at [/home/bio2rdf/php-lib/rdfapi.php:191]
#3 RDFFactory->QQuadL(chembl:CHEMBL25706, chembl_vocabulary:compound-key, , , xsd:string) called at [/home/bio2rdf/php-lib/bio2rdfapi.php:512]
#4 Bio2RDFizer->triplifyString(chembl:CHEMBL25706, chembl_vocabulary:compound-key, ) called at [/home/bio2rdf/bio2rdf-scripts/chembl/chembl.php:714]
#5 ChemblParser->compounds(mysqli Object ([affected_rows] => ,[client_info] => ,[client_version] => ,[connect_errno] => ,[connect_error] => ,[errno] => ,[error] => ,[error_list] => ,[field_count] => ,[host_info] => ,[info] => ,[insert_id] => ,[server_info] => ,[server_version] => ,[stat] => ,[sqlstate] => ,[protocol_version] => ,[thread_id] => ,[warning_count] => )) called at [/home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php:137]
#6 ChemblParser->process() called at [/home/bio2rdf/bio2rdf-scripts/chembl/chembl.php:58]
#7 ChemblParser->run() called at [/home/bio2rdf/bio2rdf-scripts/runparser.php:52]
#8 Bio2RDFApp->__construct(Array ([0] => runparser.php,[1] => parser=chembl,[2] => files=all,[3] => download=false,[4] => db_name=Bio2Rdf,[5] => db_user=pediatrics,[6] => db_pass=pediatrics,[7] => indir=/home/bio2rdf/download/chembl/2/,[8] => outdir=/home/bio2rdf/rdf/chembl/2/,[9] => db_ip=db-igb.ics.uci.edu,[10] => db_port=5000,[11] => registry_dir=/home/bio2rdf/download/,[12] => process=true)) called at [/home/bio2rdf/bio2rdf-scripts/runparser.php:82]
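The trace shows `triplifyString` being called with an empty literal, and the notice points at a missing `molformula` index. A guard that skips missing or empty values before emitting a statement could look roughly like the sketch below. It is written in Python purely for illustration, and the names `literal_or_none` and `string_triples`, as well as the row layout, are assumptions rather than anything in chembl.php or php-lib.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def literal_or_none(row: Mapping[str, object], column: str) -> Optional[str]:
    """Return a non-empty string for the column, or None if it is missing or blank."""
    value = row.get(column)  # .get avoids the "undefined index" situation
    if value is None:
        return None
    text = str(value).strip()
    return text if text else None


def string_triples(
    subject: str, vocabulary: str, row: Mapping[str, object], columns: Sequence[str]
) -> List[Tuple[str, str, str]]:
    """Build (subject, predicate, literal) tuples, skipping empty literals."""
    triples = []
    for column in columns:
        literal = literal_or_none(row, column)
        if literal is not None:
            triples.append((subject, f"{vocabulary}:{column}", literal))
    return triples


if __name__ == "__main__":
    # Hypothetical row: empty compound_key, no molformula column at all.
    row = {"compound_key": "", "molweight": " 180.16 "}
    print(string_triples("chembl:CHEMBL25706", "chembl_vocabulary", row,
                         ["compound_key", "molformula", "molweight"]))
```

Whether empty values should be skipped silently or logged is a design choice for the parser; the sketch simply skips them.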
We have a question about how to create URIs for components of the LinkedSPLs resource. The "core" of the resources would represent all SPL sections and have predicates for active moieties and other items. Some of these will point to graphs that we develop as part of the project. We think the main graph URI should perhaps be:
http://linkedSPLs.bio2rdf.org#
And the other graph URIs like:
http://linkedSPLs.bio2rdf.org/activeMoiety#
http://linkedSPLs.bio2rdf.org/adverseEvents#
etc.
Does this seem in line with current practice? We could also see using 'http://linkedSPLs.bio2rdf.org/core#' for the 'core' data so that all parts are named as distinct sub-graphs.
Looking for thoughts on how we can appropriately test generated data from scripts. For example, script output must pass through x parser (e.g. rapper, sesame console, jena) before being accepted.
comments?
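As a starting point for discussion, a very lightweight pre-check could be run before handing files to a real parser. The sketch below is a simplified, regex-based approximation of the N-Quads line structure, not a full implementation of the grammar (for example, it does not validate escape sequences or language tags strictly), so a parser such as the ones named above would remain the actual acceptance test. All names in it are hypothetical.

```python
import re

# Simplified building blocks; assumptions, not the complete N-Quads grammar.
IRI = r'<[^\x00-\x20<>"{}|^`\\]*>'
BNODE = r"_:[A-Za-z0-9][A-Za-z0-9_.-]*"
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI + r"|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?"
TERM = "(?:" + IRI + "|" + BNODE + ")"

# subject predicate object [graph] .
STATEMENT = re.compile(
    r"^\s*" + TERM + r"\s+" + IRI
    + r"\s+(?:" + IRI + "|" + BNODE + "|" + LITERAL + ")"
    + r"(?:\s+" + TERM + r")?\s*\.\s*$"
)


def invalid_lines(lines):
    """Return 1-based numbers of lines that do not look like a statement."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        if STATEMENT.match(line) is None:
            bad.append(number)
    return bad


if __name__ == "__main__":
    sample = [
        '<http://bio2rdf.org/geneid:1> <http://purl.org/dc/terms/identifier> '
        '"geneid:1"^^<http://www.w3.org/2001/XMLSchema#string> <http://bio2rdf.org/g> .',
        '<http://bio2rdf.org/iproclass:uniprot:75070738; 440648385> '
        '<http://purl.org/dc/terms/identifier> "x" .',
    ]
    print(invalid_lines(sample))
```

A check like this catches structural problems such as spaces inside an IRI early, but it cannot replace validation by an actual RDF parser.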
Some studies post summary results to CTG and others don't.
For example, this study has results posted:
http://clinicaltrials.gov/show/NCT00320788
I can see the results in the RDF, but there is no simple flag indicating whether results are available (yes or no).
How can I access it?
Consider querying studies for wet macular degeneration like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials:>
PREFIX ctv: <http://bio2rdf.org/clinicaltrials_vocabulary:>
SELECT *
WHERE {
?x ctv:condition-mesh <http://bio2rdf.org/clinicaltrials_resource:4938444ea4887b824f41147e31329e6c> .
?x rdfs:label ?label .
?x ctv:phase [rdfs:label ?phase]
}
and I want to get to a "ctv:has_result" flag.
the resulting PubMed RDF does not contain (year) dates and page ranges.
There is strange output when running the HGNC parser
php hgnc.php indir=/tmp/ outdir=/tmp/
The following is generated:
**Warning** Invalid namespace: for qname: $('.gene-search-form #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname: $('.gene-search-form #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname: $('.gene-search-form #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname: }
. Using http://bio2rdf.org/:}
**Warning** Invalid namespace: for qname: }
. Using http://bio2rdf.org/:}
**Warning** Invalid namespace: for qname: }
DrugBank 4.0 was recently released. The dataset is now available in XML at http://www.drugbank.ca/downloads/. It would be helpful to run the bio2rdf scripts to synchronize with the current version of DrugBank.
Thanks!
Serkan Ayvaz
Need to use safe literal on the generated label.
Hi Al,
When trying to run the homologene parser without parameters:
php homologene.php
the application runs with the defaults without printing the list of parameters. It looks like there are no required parameters. Is this on purpose, or should there be at least one required parameter?
Dana
update parser using the new php-lib.
Currently, we query the RDF graph from the Virtuoso endpoint to get DailyMed setids by rxcui. This causes the linkedSPLs core graph to be dumped and loaded twice during an update. I suggest that we switch to SQL against the MySQL endpoint, because all the mappings are stored before the dump process. This would speed up the whole update process.
It turns out that some SPLs list incorrect data in the active moiety. Fortunately, the FDA provides a validation file:
http://www.fda.gov/downloads/ForIndustry/DataStandards/StructuredProductLabeling/UCM362965.zip
This needs to be used to ensure that the active moiety associations for LinkedSPLs are accurate.
Would like to get the TCM data into Bio2RDF.
The data is now available at: http://www.open-biomed.org.uk/tcmdata/
and the SPARQL endpoint at : http://www.open-biomed.org.uk/rdf-tcm/
will ask whether the data is one time curated, or whether it can be configured to draw on some primary source.
The entrez gene parser does not seem to allow downloading and parsing of files.
php entrez_gene.php download=true files=all indir=/home/dankla/bio2rdf-dataspaces/downloads/gene/ outdir=/home/dankla/bio2rdf-dataspaces/data/gene/
PHP Warning: gzopen(/home/dankla/bio2rdf-dataspaces/downloads/gene/GENE_INFO/All_Data.gene_info.gz): failed to open stream: No such file or directory in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/gene/entrez_gene.php on line 100
Could not open file GENE_INFO/All_Data.gene_info.gz!
The indir and outdir directories are successfully created, but no files are downloaded despite the download flag being set to true. Am I missing something, or is this functionality not yet included in the script?
just testing
Finally, we should create another sub-program in linkedSPLs for quality checks, including validating the correctness of the mappings and running competency queries.
Trying to run the irefindex script generates the following error:
All.mitab.10182011.txt.zip copied to /home/dankla/bio2rdf-dataspaces/downloads/irefindex/
Processing all ...PHP Warning: fgets(): Zip stream error: Premature EOF in /home/dankla/bio2rdf-dataspaces/php-lib/fileapi.php on line 80
PHP Notice: Undefined offset: 13 in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php on line 158
#0 error_handler(256, Invalid namespace for :, /home/dankla/bio2rdf-dataspaces/php-lib/ns.php, 477, Array ([qname] => :,[delimiter] => :,[ns] => ,[id] => ))
#1 trigger_error(Invalid namespace for :, 256) called at [/home/dankla/bio2rdf-dataspaces/php-lib/ns.php:477]
#2 CNamespace->MapQName(:) called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:162]
#3 iREFINDEXParser->Parse() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:115]
#4 iREFINDEXParser->Run() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:357]
#0 error_handler(256, Invalid qname for , /home/dankla/bio2rdf-dataspaces/php-lib/ns.php, 494, Array ([qname] => ,[ns] => ,[id] => ))
#1 trigger_error(Invalid qname for , 256) called at [/home/dankla/bio2rdf-dataspaces/php-lib/ns.php:494]
#2 CNamespace->getFQURI() called at [/home/dankla/bio2rdf-dataspaces/php-lib/rdfapi.php:161]
#3 RDFFactory->QQuad(, void:inDataset, bio2rdf_dataset:bio2rdf-irefindex-20120906) called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:164]
#4 iREFINDEXParser->Parse() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:115]
#5 iREFINDEXParser->Run() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:357]
Running the homologene parser
PHP Notice: Undefined variable: rdir in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/homologene/homologene.php on line 88
downloading homologene.data ... PHP Warning: file_get_contents(homologene.data): failed to open stream: No such file or directory in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/homologene/homologene.php on line 90
http://geneontology.org/external2go/ has the most complete listing
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/ seems to be a subset
As part of OHDSI (http://ohdsi.org/), an open collaborative, we are
looking at various data sources that would be helpful for identifying
drugs without adverse events. ClinicalTrials.gov is potentially a
very useful source. We explain how to get adverse events
from the XML of a CT.gov entry here: http://bit.ly/1iPhq9W
I tried to find the same information in bio2rdf ClinicalTrials.gov but was unable to.
There does not seem to be an "event" resource anywhere in the graph, though there are resources that represent outcomes (primary and secondary) and note whether
the outcomes are safety related. When you can spare a minute,
would you please check whether the scripts you use to create
the resource from the XML might not be properly loading "event" resources?
Thank you,
-Rich Boyce
I modified the parser to generate safe literals for a few entries where quotes were still intact.
here's the loaded data:
http://bio2rdf.semanticscience.org:8019/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fhgnc%3A5171
two issues
TODOs for documentation of the linkedSPLs project:
We should systematically go through each parser and assert the proper datatype for each literal value. The datatypes should be selected from the OWL datatype map for compatibility:
http://www.w3.org/TR/owl2-syntax/#Datatype_Maps
I suspect the following are the most valuable:
For all of the ctd tables that have been processed, the virtuoso server complains about a syntax error. I believe the error lies with the URLs having one forward slash instead of two after the scheme.
eg, <http:/bio2rdf.org/ctd_vocabulary:abu> <http:/purl.org/dc/terms/identifier>
Below is one example for ctd_chem_gene_ixns.nq.
ctd_chem_gene_ixn_types.nq.gz
ctd_chem_go_enriched.nq.gz
ctd_chemicals_diseases.nq.gz
ctd_chemicals.nq.gz
ctd_chem_pathways_enriched.nq.gz
ctd_diseases.nq.gz
ctd_diseases_pathways.nq.gz
ctd_genes_diseases.nq.gz
ctd_genes_pathways.nq.gz
ctd_pathways.nq.gz
Connected to OpenLink Virtuoso
Driver: 07.00.3203 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
*** Error 22007: [Virtuoso Driver][Virtuoso Server]XM033: XML parser detected an error:
ERROR : Syntax error in the attribute list (no whitespace)
at line 1 column 9 of source text
<http:/bio2rdf.org/ctd_resource:C1122973784> <http:/purl.org/dc/terms/identi
------^
at line 0 of Top-Level:
DB.DBA.RDF_LOAD_RDFXML_MT(file_to_string_output ('/home/bio2rdf/rdf/ctd/2/ctd_chem_gene_ixns.nq'), '', 'http:/bio2rdf.org/ctd_chem_gene_ixns', 784, 16)
Done. -- 4977 msec.
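To check the single-slash hypothesis across all of the listed files, the output could be scanned for schemes followed by exactly one slash. The sketch below assumes plain-text lines (the .gz files would need decompressing first) and uses hypothetical helper names. It addresses only the suspicion described above and does not rule out other causes of the load error.

```python
import re

# Matches '<http:/', '<https:/' or '<ftp:/' when NOT followed by a second slash.
SINGLE_SLASH = re.compile(r"<(https?|ftp):/(?!/)")


def lines_with_single_slash(lines):
    """Return 1-based line numbers containing a scheme with only one slash."""
    return [n for n, line in enumerate(lines, start=1) if SINGLE_SLASH.search(line)]


def repair_single_slash(line: str) -> str:
    """Rewrite '<http:/host' to '<http://host' (hypothetical post-hoc repair)."""
    return SINGLE_SLASH.sub(r"<\1://", line)


if __name__ == "__main__":
    sample = [
        '<http:/bio2rdf.org/ctd_vocabulary:abu> <http:/purl.org/dc/terms/identifier> "x" .',
        '<http://bio2rdf.org/ctd_vocabulary:abu> <http://purl.org/dc/terms/identifier> "x" .',
    ]
    print(lines_with_single_slash(sample))
    print(repair_single_slash(sample[0]))
```

Repairing generated files is a stopgap; if the scan confirms the problem, the URI construction in the parser or library is the place to fix it.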
Undefined offset: 40 in /home/dankla/openbiocloud-scripts/dataspaces/biomolecular_space/hgnc/hgnc.php on line 173
working on a new interpro parser.
Need to specify a unique key for each entry, as there is a clash with just chemical + gene:
http://ctd.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fctd_resource%3AD0029452
While processing one of the input files, the parser barfs with the following error:
Generating dataset description... Processing meddra_freq_parsed... PHP Fatal error: String size overflow in /home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/php-lib/rdfap
Fatal error: String size overflow in /home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/php-lib/rdfapi.php on line 63
Has anyone encountered this?
As noted here: https://groups.google.com/forum/#!topic/bio2rdf/eUMK67M_ia8
the parsing of descriptor records does not fully capture all "PRINTENTRY" and "ENTRY" terms.
culprit:
private function makeDescriptorRecord($desc_record_arr){...} in https://github.com/bio2rdf/bio2rdf-scripts/blob/master/mesh/mesh_parser.php#L54
Hi All,
I would like to start a discussion about how the Bio2RDF repo is managed by core contributors. Personally, I find that the current model of everybody forking the project makes it difficult to manage and resolve assigned issues. I'd like to come up with a common set of best practices for working on the bio2rdf project. This would translate into a wiki post with the specific git commands for contributing and resolving issues. This will help existing and future contributors become more familiar with git and the bio2rdf project.
I would suggest that core contributors all work off one repo (bio2rdf/bio2rdf-scripts). This repo would have two main branches, master and develop. Issues would be resolved by branching from develop to an issue-specific branch, committing, and issuing a merge request back to develop for code review. Once merged by the assigned person, the branch could be deleted. No further changes would be made to the issue branch once the merge request has been issued.
Core contributors take out a social contract that if they break the master (e.g. poor code review prior to merge with master) they will lose coffee privileges for the day :)
Ideas? Thoughts?
Cheers,
Dana Klassen
please update the parser to use the framework.
please rewrite to match that in gene.gene2go(); the idea is that we want to assert function,process,location directly and to indirectly link evidence through a gene-go-association.