
bio2rdf-scripts's Introduction

Run Bio2RDF PHP scripts

Bio2RDF-scripts

This Git repository holds all of the RDF converter scripts used to generate Bio2RDF linked data.

Requirements

See the wiki for details.


Licensed under MIT License, see license page for details.

bio2rdf-scripts's People

Contributors

alisoncallahan, amarillion, ansell, boycer, dependabot[bot], dklassen, gabivulcu, jctoledo, jlleitschuh, juancifuentes, katrinleinweber, manuels, maulikkamdar, micheldumontier, ningyifan, theoryno3, vemonet, zorino


bio2rdf-scripts's Issues

[MESH] Extraction of Concepts, Terms and their semantic relation

I'm trying to create a graph/tree of the MESH thesaurus in order to do some semantic inferencing by performing graph traversal on semantic relationships.

The Bio2RDF MeSH dataset is well suited to loading into a graph database. However, I can't find the MeSH Concepts and Terms, or their relationships, in the three available files (descriptor_record.nt, qualifier_records.nt and supplementary_records.nt).

Do you know where I could find that information as RDF files? If not, should we (or I) upgrade the extraction script to extract it?

chembl parser, release3 branch, fatal error

Dear all,

So while processing chembl components, the parser throws the following error messages:

==Error messages==

Processing components... PHP Notice: Undefined variable: component_classes_row in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on

Notice: Undefined variable: component_classes_row in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 594
PHP Fatal error: Call to a member function fetch_assoc() on a non-object in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 5

Fatal error: Call to a member function fetch_assoc() on a non-object in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 594

Clinicaltrials.gov: a missing record

When I search Bio2RDF for
NCT00299741

I get no results.
But the trial does exist:
http://clinicaltrials.gov/ct2/show/NCT00299741?resultsxml=true
http://clinicaltrials.gov/ct2/show/NCT00299741

Why is it not in Bio2RDF?

I am using this access URL:
http://clinicaltrials.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fclinicaltrials:NCT00299741
or even a plain text string.

(It is in the linkedct.org data:
http://linkedct.org/resource/trial/NCT00299741
but even there the result reference is not listed correctly: the trial has 1 and the RDF data say 0.)

Vojtech Huser

Merge on bio2rdf-scripts

Hi,
I would like at least one person other than the submitter to comment on each pull request (either confirming there are no problems or raising an issue) and to be responsible for merging and closing it. We could either let whoever gets to the request first do this, or assign a reviewer to each committer.

What do you think?

Character encoding issues in Pharmgkb

Disease labels in Pharmgkb that contain special characters show encoding issues.

For example:
http://bio2rdf.org/pharmgkb:PA165108196

What comes out is (http://cu.pharmgkb.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb%3APA165108196&graph=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb_resource%3Abio2rdf.dataset.pharmgkb.R3):

rdfs:label Münchausen's syndrome [pharmgkb:PA165108196]
dcterms:title Münchausen's syndrome

The actual name from UMLS (via its cui) is:
Münchausen's syndrome (SY)
or
Munchausen's syndrome (PT)

This seems to happen in at least two browsers (Firefox and Chrome) and with both the bio2rdf Pharmgkb endpoint as well as a local copy I have. This happens for a few diseases in Pharmgkb.

Here is my original query. I am trying to get all diseases in Pharmgkb and their CUIs (if they have one):

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX pharm: <http://bio2rdf.org/pharmgkb_vocabulary:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pgkb_id ?labl ?CUI WHERE {
  ?pgkb_id a pharm:Disease .
  ?pgkb_id rdfs:label ?labl .
  OPTIONAL { ?pgkb_id pharm:x-UMLS ?CUI . }
}
ORDER BY ?pgkb_id
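
On the garbled labels reported in this issue: one common cause of this kind of corruption is UTF-8 bytes being decoded as Latin-1 somewhere between the source file and the endpoint. Whether that is what happens in the Pharmgkb pipeline is an assumption, not something this report confirms. The Python sketch below only reproduces the effect and shows how such a round trip can be reversed; the function names `garble` and `repair` are hypothetical.

```python
def garble(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the round trip; return the input unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a Latin-1 view of valid UTF-8, so leave the text alone.
        return text


label = "Münchausen's syndrome"
broken = garble(label)       # the two-byte UTF-8 form of "ü" shows up as two characters
restored = repair(broken)    # equals the original label again
```

If a label survives `repair` unchanged, it was probably not damaged this way and the cause lies elsewhere (for example in the HTTP response headers or the page rendering).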

dataset versioning

The PHP-LIB supports setting a single version for all source data, which is useful for datasets like uniprot or refseq. In GO Annotations, each source file has a different version, so the current approach is not sufficient and per-source versioning would be required.
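
As a rough illustration of what per-source versioning could look like, here is a minimal Python sketch. It is not the PHP-LIB API: the class `DatasetVersions`, its methods, and the source names in the usage lines are all hypothetical. The idea is simply a per-source lookup that falls back to the dataset-wide version.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DatasetVersions:
    """Dataset-wide version with optional per-source overrides (hypothetical)."""
    default: Optional[str] = None
    per_source: Dict[str, str] = field(default_factory=dict)

    def set_source(self, source: str, version: str) -> None:
        # Record the version of one source file.
        self.per_source[source] = version

    def version_for(self, source: str) -> Optional[str]:
        # Prefer the per-source version; otherwise use the dataset-wide one.
        return self.per_source.get(source, self.default)


versions = DatasetVersions(default="2013-12-01")
versions.set_source("source_a", "2013-11-05")   # hypothetical source name
```

With this shape, datasets such as uniprot or refseq keep using only `default`, while a dataset like GO Annotations registers one entry per source.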

iproclass dataset, loading error

Has anyone encountered the following issue while loading the iproclass dataset into Virtuoso (version 7.1.0)? There seems to be a malformed quad whose subject combines multiple invalid uniprot IDs.

==Error message excerpts below==

Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> http://purl.org/dc/terms/identifier 
"iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937"^^http://www.w3.org/2001/XMLSchema#string 
http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .

Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> 
http://bio2rdf.org/bio2rdf_vocabulary:namespace "iproclass"^^http://www.w3.org/2001/XMLSchema#string http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .

Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> 
http://bio2rdf.org/bio2rdf_vocabulary:identifier "uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937"^^http://www.w3.org/2001/XMLSchema#string 
http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .
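
The excerpts above suggest that a field holding many `; `-separated identifiers was used as a single identifier, which puts spaces inside the IRI. A minimal Python sketch of one way to guard against that is shown below. The separator, the IRI pattern `http://bio2rdf.org/<namespace>:<id>`, and the function names are assumptions for illustration, not the parser's actual code.

```python
from __future__ import annotations

import re

BASE = "http://bio2rdf.org/"  # base IRI as it appears in the error excerpts

# Characters that may not appear unescaped inside an N-Triples/N-Quads IRI.
_ILLEGAL = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def split_identifiers(field: str, sep: str = ";") -> list[str]:
    """Split a multi-valued field and drop empty pieces."""
    return [part.strip() for part in field.split(sep) if part.strip()]


def to_iris(namespace: str, field: str) -> list[str]:
    """Build one IRI per identifier, rejecting any that is still malformed."""
    iris = []
    for ident in split_identifiers(field):
        iri = f"{BASE}{namespace}:{ident}"
        if _ILLEGAL.search(iri):
            raise ValueError(f"illegal character in IRI: {iri!r}")
        iris.append(iri)
    return iris


iris = to_iris("uniprot", "75070738; 440648385; 116242084")
```

Each identifier then yields its own subject, so a single malformed value is reported instead of producing a quad that the loader skips.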

chembl parser, release3 branch, Undefined index error

Hi gang,

So I've encountered this error while attempting to process Chembl. The error message below occurs several hundred times. The parser is still running and has been for several days now.

==Error message snippet==
Notice: Undefined index: molformula in /home/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 697 [60/9422]
#0 error_handler(256, $literal is not a literal, /home/bio2rdf/php-lib/rdfapi.php, 139, Array ([s_uri] => http://bio2rdf.org/chembl:CHEMBL25706,[p_uri] => http://bio2rdf.org/chembl_vocabulary:compound-key,[literal] => ,[lang] => ,[lt_uri] => http://www.w3.org/2001/XMLSchema#string,[g_uri] => http://bio2rdf.org/bio2rdf.dataset:bio2rdf-chembl-20130930))
#1 trigger_error($literal is not a literal, 256) called at [/home/bio2rdf/php-lib/rdfapi.php:139]
#2 RDFFactory->QuadL(http://bio2rdf.org/chembl:CHEMBL25706, http://bio2rdf.org/chembl_vocabulary:compound-key, , , http://www.w3.org/2001/XMLSchema#string,http://bio2rdf.org/bio2rdf.dataset:bio2rdf-chembl-20130930) called at [/home/bio2rdf/php-lib/rdfapi.php:191]
#3 RDFFactory->QQuadL(chembl:CHEMBL25706, chembl_vocabulary:compound-key, , , xsd:string) called at [/home/bio2rdf/php-lib/bio2rdfapi.php:512]
#4 Bio2RDFizer->triplifyString(chembl:CHEMBL25706, chembl_vocabulary:compound-key, ) called at [/home/bio2rdf/bio2rdf-scripts/chembl/chembl.php:714]
#5 ChemblParser->compounds(mysqli Object ([affected_rows] => ,[client_info] => ,[client_version] => ,[connect_errno] => ,[connect_error] => ,[errno] => ,[error] => ,[error_list] => ,[field_count] => ,[host_info] => ,[info] => ,[insert_id] => ,[server_info] => ,[server_version] => ,[stat] => ,[sqlstate] => ,[protocol_version] => ,[thread_id] => ,[warning_count] => )) called at [/home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php:137]
#6 ChemblParser->process() called at [/home/bio2rdf/bio2rdf-scripts/chembl/chembl.php:58]
#7 ChemblParser->run() called at [/home/bio2rdf/bio2rdf-scripts/runparser.php:52]
#8 Bio2RDFApp->__construct(Array ([0] => runparser.php,[1] => parser=chembl,[2] => files=all,[3] => download=false,[4] => db_name=Bio2Rdf,[5] => db_user=pediatrics,[6] => db_pass=pediatrics,[7] => indir=/home/bio2rdf/download/chembl/2/,[8] => outdir=/home/bio2rdf/rdf/chembl/2/,[9] => db_ip=db-igb.ics.uci.edu,[10] => db_port=5000,[11] => registry_dir=/home/bio2rdf/download/,[12] => process=true)) called at [/home/bio2rdf/bio2rdf-scripts/runparser.php:82]
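
The trace shows an empty value reaching the literal writer after a missing `molformula` index. One way to avoid both messages is to skip absent or empty fields before emitting anything. The Python sketch below illustrates that guard; it is not the parser's code, and the function names and the key-to-predicate mapping are hypothetical.

```python
from typing import Iterable, Iterator, Mapping, Optional, Tuple


def literal_or_none(row: Mapping[str, object], key: str) -> Optional[str]:
    """Return a non-empty string for row[key], or None if missing or blank."""
    value = row.get(key)  # .get avoids the "undefined index" case
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def compound_triples(
    compound_id: str, row: Mapping[str, object], fields: Iterable[str]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, literal) only for fields that have a value."""
    for key in fields:
        value = literal_or_none(row, key)
        if value is None:
            continue  # nothing to assert, so emit no triple
        yield (f"chembl:{compound_id}", f"chembl_vocabulary:{key}", value)


row = {"molformula": "", "compound_key": "X"}
triples = list(compound_triples("CHEMBL25706", row, ["molformula", "compound_key", "missing"]))
```

Only the `compound_key` triple is produced here; the blank and the missing field are silently skipped, which also avoids paying for an error trace on every affected row.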

Question: the URI for the "core" LinkedSPL graph

We have a question about how to create URIs for components of the LinkedSPLs resource. The "core" of the resource would represent all SPL sections and have predicates for active moieties and other items. Some of these will point to graphs that we develop as part of the project. We think that the main graph URI should perhaps be:
http://linkedSPLs.bio2rdf.org#

And the other graph URIs like:
http://linkedSPLs.bio2rdf.org/activeMoiety#
http://linkedSPLs.bio2rdf.org/adverseEvents#
etc.

Does this seem in line with current practice? We could also see using 'http://linkedSPLs.bio2rdf.org/core#' for the 'core' data so that all parts are named as distinct sub-graphs.

Data validation

Looking for thoughts on how we can appropriately test generated data from scripts. For example, script output must pass through x parser (e.g. rapper, sesame console, jena) before being accepted.

comments?
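
As a starting point for discussion, a cheap line-level pre-check could run before the output is handed to a real parser such as rapper or Jena. The Python sketch below is an approximation of the N-Quads line grammar written for illustration only (it ignores details such as `\u` escapes inside IRIs), so it complements a full parser and does not replace one. The function name `invalid_lines` is hypothetical.

```python
import re

IRI = r'<[^\x00-\x20<>"{}|^`\\]*>'
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.-]*'
LITERAL = (
    r'"(?:[^"\\\n\r]|\\.)*"'
    r'(?:\^\^' + IRI + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'
)

SUBJECT = "(?:" + IRI + "|" + BNODE + ")"
OBJECT = "(?:" + IRI + "|" + BNODE + "|" + LITERAL + ")"

# subject predicate object [graph] .
QUAD = re.compile(
    r"^\s*" + SUBJECT + r"\s+" + IRI + r"\s+" + OBJECT
    + r"(?:\s+" + SUBJECT + r")?\s*\.\s*$"
)


def invalid_lines(lines):
    """Return (line_number, line) for every statement that fails the pattern."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if not QUAD.match(stripped):
            bad.append((number, line))
    return bad
```

A script's output would be accepted only if `invalid_lines` returns an empty list and the chosen full parser also reads the file without errors.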

how to access results posted Yes/No

Some studies post summary results to ClinicalTrials.gov (CTG) and others don't.

For example this study has results posted
http://clinicaltrials.gov/show/NCT00320788

I can see the results in the RDF, but there is no simple flag indicating whether results are posted (yes or no).
How can I access it?

Consider a query for studies of wet macular degeneration, like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials:>
PREFIX ctv: <http://bio2rdf.org/clinicaltrials_vocabulary:>

SELECT *
WHERE {
  ?x ctv:condition-mesh <http://bio2rdf.org/clinicaltrials_resource:4938444ea4887b824f41147e31329e6c> .
  ?x rdfs:label ?label .
  ?x ctv:phase [ rdfs:label ?phase ]
}

From there I want to get at something like a "ctv:has_result" flag.

HGNC download page moved and has broken script

Running the HGNC parser produces strange output:

php hgnc.php indir=/tmp/ outdir=/tmp/

The following is generated:

**Warning** Invalid namespace: for qname:                 $('.gene-search-form  #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname:                 $('.gene-search-form    #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname:                 $('.gene-search-form     #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname:            }
. Using http://bio2rdf.org/:}
**Warning** Invalid namespace: for qname:            }
. Using http://bio2rdf.org/:}
**Warning** Invalid namespace: for qname:            }
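
The warnings above contain JavaScript fragments, which is consistent with the parser reading an HTML page instead of the expected data file after the download page moved. A parser could check for that before processing. The Python sketch below is one hedged way to do it; the function names and the expected header prefix passed in the usage line are assumptions, not taken from the HGNC script.

```python
def looks_like_html(payload: str) -> bool:
    """Heuristic: does the start of the download look like a web page?"""
    head = payload.lstrip()[:1024].lower()
    return head.startswith(("<!doctype html", "<html")) or "<script" in head


def check_download(payload: str, expected_header_prefix: str) -> None:
    """Raise ValueError if the payload is HTML or lacks the expected header."""
    if looks_like_html(payload):
        raise ValueError("download returned an HTML page, not a data file")
    lines = payload.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith(expected_header_prefix):
        raise ValueError(f"unexpected header line: {first_line[:60]!r}")


# Hypothetical header prefix for a tab-delimited file; adjust to the real one.
check_download("HGNC ID\tApproved Symbol\n", "HGNC ID")
```

Failing early with a clear message is easier to diagnose than thousands of "Invalid namespace" warnings.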

homologene parameters

Hi Al,

when trying to run the homologene parser without parameters:

php homologene.php

the application runs with the default settings without printing the list of parameters. It looks like there are no required parameters. Is this on purpose, or should there be at least one required parameter?

Dana

linkedSPLs: improve the way of getting FDA pharmgx mappings

Currently, we query the RDF graph from the Virtuoso endpoint to get DailyMed setids by RxCUI. This forces the linkedSPLs core graph to be dumped and loaded twice during an update. I suggest that we switch to SQL against the MySQL endpoint, because all the mappings are stored there before the dump process. This would speed up the whole update process.

entrez gene parser files download

The entrez gene parser does not seem to allow downloading and parsing of files.

php entrez_gene.php download=true files=all indir=/home/dankla/bio2rdf-dataspaces/downloads/gene/ outdir=/home/dankla/bio2rdf-dataspaces/data/gene/
PHP Warning: gzopen(/home/dankla/bio2rdf-dataspaces/downloads/gene/GENE_INFO/All_Data.gene_info.gz): failed to open stream: No such file or directory in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/gene/entrez_gene.php on line 100
Could not open file GENE_INFO/All_Data.gene_info.gz!

The indir and outdir directories are successfully created, but no files are downloaded despite the download flag being set to true. Am I missing something, or is this functionality not yet included in the script?

irefindex

Trying to run the irefindex script generates the following error:

All.mitab.10182011.txt.zip copied to /home/dankla/bio2rdf-dataspaces/downloads/irefindex/
Processing all ...PHP Warning: fgets(): Zip stream error: Premature EOF in /home/dankla/bio2rdf-dataspaces/php-lib/fileapi.php on line 80
PHP Notice: Undefined offset: 13 in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php on line 158
#0 error_handler(256, Invalid namespace for :, /home/dankla/bio2rdf-dataspaces/php-lib/ns.php, 477, Array ([qname] => :,[delimiter] => :,[ns] => ,[id] => ))
#1 trigger_error(Invalid namespace for :, 256) called at [/home/dankla/bio2rdf-dataspaces/php-lib/ns.php:477]
#2 CNamespace->MapQName(:) called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:162]
#3 iREFINDEXParser->Parse() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:115]
#4 iREFINDEXParser->Run() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:357]
#0 error_handler(256, Invalid qname for , /home/dankla/bio2rdf-dataspaces/php-lib/ns.php, 494, Array ([qname] => ,[ns] => ,[id] => ))
#1 trigger_error(Invalid qname for , 256) called at [/home/dankla/bio2rdf-dataspaces/php-lib/ns.php:494]
#2 CNamespace->getFQURI() called at [/home/dankla/bio2rdf-dataspaces/php-lib/rdfapi.php:161]
#3 RDFFactory->QQuad(, void:inDataset, bio2rdf_dataset:bio2rdf-irefindex-20120906) called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:164]
#4 iREFINDEXParser->Parse() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:115]
#5 iREFINDEXParser->Run() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:357]

homologene parser

Running the homologene parser

PHP Notice: Undefined variable: rdir in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/homologene/homologene.php on line 88
downloading homologene.data ... PHP Warning: file_get_contents(homologene.data): failed to open stream: No such file or directory in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/homologene/homologene.php on line 90

ClinicalTrials.gov - missing adverse event results

As part of OHDSI (http://ohdsi.org/), an open collaborative, we are
looking at various data sources that would be helpful for identifying
drugs without adverse events. ClinicalTrials.gov is potentially a
very useful source. We explain how to get adverse events
from the XML of a CT.gov entry here: http://bit.ly/1iPhq9W

I tried to find the same information in the Bio2RDF ClinicalTrials.gov data but was unable to.
There does not seem to be an "event" resource anywhere in the graph, though there are resources that represent outcomes (primary and secondary) and note whether
the outcomes are safety related. When you can spare a minute,
would you please check whether the scripts you use to create
the resource from the XML might not be properly loading "event" resources?

Thank you,
-Rich Boyce

linkedSPLs - documentation todos

TODOs for documentation of the linkedSPLs project:

  • The top-level README is stubbed out into sections; complete the documentation. Be sure to explain how the data is loaded from DailyMed, UMLS RxNORM, DRON, FDA UNII, FDA PharmGX, and drug-drug interactions. Refer to the bio2rdf conventions and to how mappings are generated for drug entities and pharmgx.
  • Complete the README documentation for each sub-graph.

typing literals

We should systematically go through each parser and assert the proper datatype for each literal value. The datatypes should be selected from the OWL datatype map for compatibility:
http://www.w3.org/TR/owl2-syntax/#Datatype_Maps

I suspect the following are the most valuable:

  • xsd:string
  • xsd:int
  • xsd:float (and xsd:double only when necessary)
  • xsd:boolean
  • xsd:dateTime (set hours, minutes and seconds to 0 if they are not specified; if a timezone is specified, use Z time wherever possible; leave out Z if no timezone is specified)
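
The datatype choices in the list above can be sketched as a single formatting helper. This is a Python illustration of the rules, not code from any parser; the function name `typed_literal` is hypothetical, and it does not check that integers fit the xsd:int range or handle special float values.

```python
from datetime import date, datetime, timezone

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value) -> str:
    """Format a Python value as an N-Triples typed literal."""
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        lexical, dtype = ("true" if value else "false"), "boolean"
    elif isinstance(value, int):
        lexical, dtype = str(value), "int"
    elif isinstance(value, float):
        lexical, dtype = repr(value), "float"
    elif isinstance(value, datetime):  # before date: datetime is a date subclass
        if value.tzinfo is not None:
            # Timezone given: normalise to UTC and mark it with Z.
            lexical = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        else:
            # No timezone given: leave out Z.
            lexical = value.strftime("%Y-%m-%dT%H:%M:%S")
        dtype = "dateTime"
    elif isinstance(value, date):
        # Date only: hours, minutes and seconds are set to 0.
        lexical, dtype = value.strftime("%Y-%m-%dT00:00:00"), "dateTime"
    else:
        text = str(value)
        for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
            text = text.replace(raw, escaped)
        lexical, dtype = text, "string"
    return f'"{lexical}"^^<{XSD}{dtype}>'


example = typed_literal(date(2013, 12, 13))
```

Centralising the mapping in one place would make it easier to audit each parser against the OWL datatype map.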

ctd, release3 branch, error during import

For all of the ctd tables that have been processed, the Virtuoso server complains about a syntax error. I believe the error lies in the URLs having one forward slash instead of two,

e.g. <http:/bio2rdf.org/ctd_vocabulary:abu> <http:/purl.org/dc/terms/identifier>

Below is one example for ctd_chem_gene_ixns.nq.

ctd_chem_gene_ixn_types.nq.gz
ctd_chem_go_enriched.nq.gz
ctd_chemicals_diseases.nq.gz
ctd_chemicals.nq.gz
ctd_chem_pathways_enriched.nq.gz
ctd_diseases.nq.gz
ctd_diseases_pathways.nq.gz
ctd_genes_diseases.nq.gz
ctd_genes_pathways.nq.gz
ctd_pathways.nq.gz

Connected to OpenLink Virtuoso
Driver: 07.00.3203 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
*** Error 22007: [Virtuoso Driver][Virtuoso Server]XM033: XML parser detected an error:
    ERROR  : Syntax error in the attribute list (no whitespace)
at line 1 column 9 of source text
<http:/bio2rdf.org/ctd_resource:C1122973784> <http:/purl.org/dc/terms/identi
  ------^
at line 0 of Top-Level:
DB.DBA.RDF_LOAD_RDFXML_MT(file_to_string_output ('/home/bio2rdf/rdf/ctd/2/ctd_chem_gene_ixns.nq'), '', 'http:/bio2rdf.org/ctd_chem_gene_ixns', 784, 16)
Done. -- 4977 msec.
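
IRIs of the `http:/...` form reported in this issue can be detected before loading, because a URL parser finds a scheme but no host in them. The Python sketch below uses the standard library's `urllib.parse.urlsplit` for that check; the function name `malformed_http_iris` is hypothetical and the sketch does not attempt to repair the files.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Anything between angle brackets that contains no whitespace.
IRI_IN_BRACKETS = re.compile(r"<([^<>\s]+)>")


def malformed_http_iris(line: str) -> list[str]:
    """Return http(s) IRIs on the line that have a scheme but no host."""
    bad = []
    for iri in IRI_IN_BRACKETS.findall(line):
        parts = urlsplit(iri)
        if parts.scheme in ("http", "https") and not parts.netloc:
            bad.append(iri)
    return bad


line = "<http:/bio2rdf.org/ctd_vocabulary:abu> <http:/purl.org/dc/terms/identifier>"
found = malformed_http_iris(line)
```

Running such a check over the first few lines of each generated file would catch this class of error without waiting for the import to fail.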

sider parser (release 3 branch): fatal php error, string size overflow

While processing one of the input files, the parser barfs with the following error:

Generating dataset description... Processing meddra_freq_parsed... PHP Fatal error: String size overflow in /home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/php-lib/rdfap

Fatal error: String size overflow in /home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/php-lib/rdfapi.php on line 63

Has anyone encountered this?

Repo commit and merge procedure

Hi All,

I would like to start a discussion about how the Bio2RDF repo is managed by core contributors. Personally, I find that the current model, in which everybody forks the project, makes it difficult to manage and resolve assigned issues. I'd like to come up with a common set of best practices for working on the bio2rdf project. This would translate into a wiki post with the specific git commands for contributing and resolving issues. This will help existing and future contributors become more familiar with git and the bio2rdf project.

I would suggest that all core contributors work off one repo (bio2rdf/bio2rdf-scripts). This repo would have two main branches, master and develop. Issues would be resolved by branching from develop to an issue-specific branch, committing, and issuing a merge request back to develop for code review. Once the merge request is issued, no more changes would be made to the feature branch before it is merged into develop. Once merged by the assigned person, the branch could be deleted.

Core contributors take out a social contract that if they break the master (e.g. through poor code review prior to a merge with master) they will lose coffee privileges for the day :)

Ideas? Thoughts?

Cheers,

Dana Klassen

goa parser

Please update the parser to use the framework.

Please rewrite it to match the approach in gene.gene2go(); the idea is that we want to assert function, process and location directly, and to link evidence indirectly through a gene-go-association.
