bio2rdf / bio2rdf-scripts

Scripts that Bio2RDF users have created to generate RDF versions of scientific datasets

Home Page: http://bio2rdf.org/

License: Other

PHP 11.99% Perl 0.18% Python 1.09% Shell 1.07% HTML 5.92% Java 77.30% XSLT 0.06% Ruby 2.39% Batchfile 0.01%

bio2rdf-scripts's Introduction

Bio2RDF-scripts

This Git repository holds all of the RDF converter scripts used to generate Bio2RDF linked data.

Requirements

See the wiki for details.

Licensed under MIT License, see license page for details.

bio2rdf-scripts's People

alisoncallahan micheldumontier dklassen manuels ar5ham jeremycarroll juancifuentes gabivulcu amarillion htanya theoryno3 keii27 jctoledo sgtp ompandey iuidsl seandavi georgevd hansmeets anyuanay zorino julie-sullivan ngopina88 xiaoqiangcs jvsoest pedrohserrano ningyifan amalic yingkaizhang ontario-datalake monfrost kingfish777 lucian-whu guettnerbianca awesomedeepai rpatil524 vemonet invincibled wuyunhaowuyunhao ldk0122 daiqh1 kqingcan danolez1 axelrolov xiaomingaaa 15153462274

bio2rdf-scripts's Issues

how to access results posted Yes/No

Some studies post summary results to CTG and others don't.

For example this study has results posted
http://clinicaltrials.gov/show/NCT00320788

I can see the results in the RDF, but there is no simple flag indicating whether results are posted (yes or no).
How can I access it?

Consider a query for studies of wet macular degeneration, like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials:>
PREFIX ctv: <http://bio2rdf.org/clinicaltrials_vocabulary:>

SELECT *
WHERE {
?x ctv:condition-mesh <http://bio2rdf.org/clinicaltrials_resource:4938444ea4887b824f41147e31329e6c> .
?x rdfs:label ?label .
?x ctv:phase [rdfs:label ?phase]
}

From there I would like to reach something like a "ctv:has_result" flag.

hgnc parser

I modified the parser to generate safe literals for a few entries where quotes were still intact.

here's the loaded data:

http://bio2rdf.semanticscience.org:8019/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fhgnc%3A5171

two issues

x-uniprot is not pointing to a uniprot entry
x-mgi points to http://bio2rdf.org/mgi:MGI:105112, so the mgi prefix needs to be removed
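
The x-mgi problem above can be illustrated with a minimal sketch, assuming the fix is simply to drop a source prefix that repeats the namespace. The function name `normalize_xref` is hypothetical and is not part of the hgnc parser or php-lib.

```python
def normalize_xref(namespace: str, identifier: str) -> str:
    """Build a Bio2RDF-style URI, dropping a prefix that repeats the namespace.

    Hypothetical helper for illustration only; the real parser may handle
    this differently.
    """
    prefix = namespace + ":"
    # Compare case-insensitively so "MGI:105112" matches the "mgi" namespace.
    if identifier.lower().startswith(prefix.lower()):
        identifier = identifier[len(prefix):]
    return f"http://bio2rdf.org/{namespace.lower()}:{identifier}"


print(normalize_xref("mgi", "MGI:105112"))
```

With this approach the identifier `MGI:105112` and the bare `105112` both map to the same URI, so links from different parsers would agree.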

We should systematically go through each parser and assert the proper datatype for each literal value. The datatypes should be selected from the OWL datatype map for compatibility:
http://www.w3.org/TR/owl2-syntax/#Datatype_Maps

I suspect the following are the most valuable:

xsd:string
xsd:int
xsd:float (and xsd:double only when necessary)
xsd:boolean
xsd:dateTime (set hours, minutes and seconds to 0 if they are not specified; use Z (UTC) time wherever possible if a timezone is specified, and leave out Z if no timezone is specified)
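
The xsd:dateTime rule in the list above could be sketched as follows. This is only an illustration under stated assumptions: the accepted input shapes (ISO-like dates with an optional time and an optional `Z`, `UTC`, or numeric offset) and the function name `to_xsd_datetime` are hypothetical, and real source data may need more formats.

```python
import re
from datetime import datetime, timedelta

# Assumed input shape: YYYY-MM-DD, optional HH:MM[:SS], optional zone.
_PATTERN = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"
    r"(?:[T ](\d{2}):(\d{2})(?::(\d{2}))?)?"
    r"\s*(Z|UTC|[+-]\d{2}:?\d{2})?$"
)


def to_xsd_datetime(value: str) -> str:
    """Return an xsd:dateTime lexical form following the rule above."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised date/time: {value!r}")
    year, month, day, hh, mi, ss, zone = match.groups()
    # Missing hours, minutes and seconds default to 0.
    moment = datetime(int(year), int(month), int(day),
                      int(hh or 0), int(mi or 0), int(ss or 0))
    if zone is None:
        # No timezone given: leave out the Z.
        return moment.strftime("%Y-%m-%dT%H:%M:%S")
    if zone not in ("Z", "UTC"):
        # Numeric offset: shift to UTC so the value can carry Z.
        sign = 1 if zone[0] == "+" else -1
        digits = zone[1:].replace(":", "")
        offset = timedelta(hours=int(digits[:2]), minutes=int(digits[2:]))
        moment = moment - sign * offset
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + "Z"


print(to_xsd_datetime("2013-09-30"))
print(to_xsd_datetime("2013-09-30 14:05:09+02:00"))
```

Converting offsets to UTC is one reading of "using Z time wherever possible"; a parser could equally keep the original offset if that information matters downstream.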

linkedSPLs-add unit test program for checking updated mappings

Finally, we should create another sub-program in linkedSPLs for quality checks, including validating the correctness of the mappings and running the competency queries.

LinkedSPLs - Ensure active moieties used are present in the FDA's validation file

It turns out that some SPLs list incorrect data in the active moiety. Fortunately, the FDA provides a validation file:
http://www.fda.gov/downloads/ForIndustry/DataStandards/StructuredProductLabeling/UCM362965.zip

This needs to be used to ensure that the active moiety associations for LinkedSPLs are accurate.

consistent branches between php-lib and bio2rdf-scripts?

Hello,

Which branches are compatible between php-lib and bio2rdf-scripts?

Thanks,
Paul

Enhance DrugBank by parsing actions.

parse action labels and assert more info

https://github.com/jimmccusker/melagrid/blob/master/data/source/data-melagrid-org/drugbank/version/drugbank.xsl

ClinicalTrials.gov - missing adverse event results

As part of OHDSI (http://ohdsi.org/), an open collaborative, we are
looking at various data sources that would be helpful for identifying
drugs without adverse events. ClinicalTrials.gov is actually a
potentially very useful source. We explain how to get adverse events
from the XML of a CT.gov entry here: http://bit.ly/1iPhq9W

I tried to find the same information in bio2rdf ClinicalTrials.gov but was unable to.
There does not seem to be an "event" resource anywhere in the graph, though there are
resources that represent outcomes (primary and secondary) and note whether
the outcomes are safety related. When you can spare a minute,
would you please check whether the scripts you use to create
the resource from the XML might not be loading "event" resources properly?

Thank you,
-Rich Boyce

problem with drugbank synonyms

only one synonym is being extracted

e.g. only "2-Acetoxybenzenecarboxylic acid" in http://bio2rdf.semanticscience.org:12050/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fdrugbank%3ADB00945

where drugbank has 2 dozen
http://www.drugbank.ca/drugs/DB00945

DrugBank 4.0 update

DrugBank 4.0 was recently released. The dataset is now available in XML at http://www.drugbank.ca/downloads/. It would be helpful to run the bio2rdf scripts to synchronize with the current version of DrugBank.
Thanks!
Serkan Ayvaz

Merge on bio2rdf-scripts

Hi,
I would like at least one person other than the submitter to comment on each pull request (either to confirm there are no problems or to raise an issue) and to be responsible for merging and closing it. We could either let whoever gets to the request first do this, or assign a reviewer to each committer.

what do you think?

put up endpoints

homologene parser

Running the homologene parser

PHP Notice: Undefined variable: rdir in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/homologene/homologene.php on line 88
downloading homologene.data ... PHP Warning: file_get_contents(homologene.data): failed to open stream: No such file or directory in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/homologene/homologene.php on line 90

sider parser (release 3 branch): fatal php error, string size overflow

While processing one of the input files, the parser barfs with the following error:

Generating dataset description... Processing meddra_freq_parsed... PHP Fatal error: String size overflow in /home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/php-lib/rdfap

Fatal error: String size overflow in /home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/php-lib/rdfapi.php on line 63

Has anyone encountered this?

problem with drugbank:clearance

The predicate used to identify the clearance does not expand to its full URI

drugbank_vocabulary:clearance

should be

http://bio2rdf.org/drugbank_vocabulary:clearance
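
As a rough illustration of the expected expansion, here is a minimal sketch that turns a Bio2RDF-style qname into a full URI. It assumes every prefix lives under `http://bio2rdf.org/`, which is a simplification; the name `expand_qname` is hypothetical and does not reflect how php-lib resolves namespaces.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"  # assumed base for all prefixes


def expand_qname(term: str) -> str:
    """Expand 'prefix:local' to a full URI; pass full URIs through unchanged."""
    if term.startswith(("http://", "https://")):
        return term
    prefix, separator, local = term.partition(":")
    if not separator or not prefix or not local:
        raise ValueError(f"not a qname: {term!r}")
    return BIO2RDF_BASE + term


print(expand_qname("drugbank_vocabulary:clearance"))
```

Raising on a malformed qname, rather than emitting it as-is, would surface problems like this one at parse time instead of after loading.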

Data validation

Looking for thoughts on how we can appropriately test generated data from scripts. For example, script output must pass through x parser (e.g. rapper, sesame console, jena) before being accepted.

comments?
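
One lightweight option, before handing output to a full parser such as rapper or Jena, is a line-level syntax check. The sketch below is an assumption about what such a gate could look like, not an existing Bio2RDF tool: it uses simplified regular expressions that approximate N-Triples/N-Quads terms and reports the line numbers that do not look like a statement. It is not a substitute for a real RDF parser.

```python
import re

# Simplified term patterns (approximations of the N-Quads grammar).
IRI = r'<[^<>"{}|^`\\\s]*>'
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.-]*'
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI +
           r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?')
TERM = "(?:" + IRI + "|" + BNODE + ")"

STATEMENT = re.compile(
    r"^\s*" + TERM + r"\s+" + IRI + r"\s+(?:" + IRI + "|" + BNODE + "|" +
    LITERAL + r")(?:\s+" + TERM + r")?\s*\.\s*$"
)


def invalid_lines(lines):
    """Return 1-based numbers of lines that are not plausible statements."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if STATEMENT.match(line) is None:
            bad.append(number)
    return bad


sample = [
    '<http://bio2rdf.org/drugbank:DB00945> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Aspirin" .',
    '<http://bio2rdf.org/x y> <http://purl.org/dc/terms/identifier> "x" .',
]
print(invalid_lines(sample))
```

A check like this is cheap enough to run on every parser's output, with the slower full-parser pass reserved for release builds.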

chembl parser, release3 branch, fatal error

Dear all,

So while processing chembl components, the parser throws the following error messages:

==Error messages==

Processing components... PHP Notice: Undefined variable: component_classes_row in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on

Notice: Undefined variable: component_classes_row in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 594
PHP Fatal error: Call to a member function fetch_assoc() on a non-object in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 5

Fatal error: Call to a member function fetch_assoc() on a non-object in /home/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 594

issue in ncbi gene

For the resource http://bio2rdf.org/geneid:100309489

I found that the following is being generated:

http://bio2rdf.org/geneid:100309489 http://bio2rdf.org/geneid_vocabulary:has_dbxref 810

suggest checking output from ncbi script.

TCM dataset

Would like to get the TCM data into Bio2RDF.

The data is now available at: http://www.open-biomed.org.uk/tcmdata/
and the SPARQL endpoint at : http://www.open-biomed.org.uk/rdf-tcm/

will ask whether the data is one time curated, or whether it can be configured to draw on some primary source.

irefindex

trying to run the irefindex script generates the following error :

All.mitab.10182011.txt.zip copied to /home/dankla/bio2rdf-dataspaces/downloads/irefindex/
Processing all ...PHP Warning: fgets(): Zip stream error: Premature EOF in /home/dankla/bio2rdf-dataspaces/php-lib/fileapi.php on line 80
PHP Notice: Undefined offset: 13 in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php on line 158
#0 error_handler(256, Invalid namespace for :, /home/dankla/bio2rdf-dataspaces/php-lib/ns.php, 477, Array ([qname] => :,[delimiter] => :,[ns] => ,[id] => ))
#1 trigger_error(Invalid namespace for :, 256) called at [/home/dankla/bio2rdf-dataspaces/php-lib/ns.php:477]
#2 CNamespace->MapQName(:) called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:162]
#3 iREFINDEXParser->Parse() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:115]
#4 iREFINDEXParser->Run() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:357]
#0 error_handler(256, Invalid qname for , /home/dankla/bio2rdf-dataspaces/php-lib/ns.php, 494, Array ([qname] => ,[ns] => ,[id] => ))
#1 trigger_error(Invalid qname for , 256) called at [/home/dankla/bio2rdf-dataspaces/php-lib/ns.php:494]
#2 CNamespace->getFQURI() called at [/home/dankla/bio2rdf-dataspaces/php-lib/rdfapi.php:161]
#3 RDFFactory->QQuad(, void:inDataset, bio2rdf_dataset:bio2rdf-irefindex-20120906) called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:164]
#4 iREFINDEXParser->Parse() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:115]
#5 iREFINDEXParser->Run() called at [/home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/irefindex/irefindex.php:357]

linkedSPLs-improve the way of getting FDA pharmgx mappings

Currently, we query the RDF graph from the Virtuoso endpoint to get the DailyMed setid by rxcuis. This causes the linkedSPLs core graph to be dumped and loaded twice during an update. I suggest that we switch to SQL against the MySQL endpoint, because all the mappings are stored there before the dump process. This will speed up the whole update process.

HGNC download page moved and has broken script

There is strange output when running the HGNC parser

php hgnc.php indir=/tmp/ outdir=/tmp/

The following is generated:

**Warning** Invalid namespace: for qname:                 $('.gene-search-form  #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname:                 $('.gene-search-form    #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname:                 $('.gene-search-form     #gntext').val('').css('color', '#000');
. Using http://bio2rdf.org/:$('.gene-search-form #gntext').val('').css('color', '#000');
**Warning** Invalid namespace: for qname:            }
. Using http://bio2rdf.org/:}
**Warning** Invalid namespace: for qname:            }
. Using http://bio2rdf.org/:}
**Warning** Invalid namespace: for qname:            }

chembl parser, release3 branch, Undefined index error

Hi gang,

So I've encountered this error while attempting to process Chembl. The error message below occurs several hundred times. The parser is still running and has been for several days now.

==Error message snippet==
Notice: Undefined index: molformula in /home/bio2rdf/bio2rdf-scripts/chembl/chembl.php on line 697 [60/9422]
#0 error_handler(256, $literal is not a literal, /home/bio2rdf/php-lib/rdfapi.php, 139, Array ([s_uri] => http://bio2rdf.org/chembl:CHEMBL25706,[p_uri] => http://bio2rdf.org/chembl_vocabulary:compound-key,[literal] => ,[lang] => ,[lt_uri] => http://www.w3.org/2001/XMLSchema#string,[g_uri] => http://bio2rdf.org/bio2rdf.dataset:bio2rdf-chembl-20130930))
#1 trigger_error($literal is not a literal, 256) called at [/home/bio2rdf/php-lib/rdfapi.php:139]
#2 RDFFactory->QuadL(http://bio2rdf.org/chembl:CHEMBL25706, http://bio2rdf.org/chembl_vocabulary:compound-key, , , http://www.w3.org/2001/XMLSchema#string,http://bio2rdf.org/bio2rdf.dataset:bio2rdf-chembl-20130930) called at [/home/bio2rdf/php-lib/rdfapi.php:191]
#3 RDFFactory->QQuadL(chembl:CHEMBL25706, chembl_vocabulary:compound-key, , , xsd:string) called at [/home/bio2rdf/php-lib/bio2rdfapi.php:512]
#4 Bio2RDFizer->triplifyString(chembl:CHEMBL25706, chembl_vocabulary:compound-key, ) called at [/home/bio2rdf/bio2rdf-scripts/chembl/chembl.php:714]
#5 ChemblParser->compounds(mysqli Object ([affected_rows] => ,[client_info] => ,[client_version] => ,[connect_errno] => ,[connect_error] => ,[errno] => ,[error] => ,[error_list] => ,[field_count] => ,[host_info] => ,[info] => ,[insert_id] => ,[server_info] => ,[server_version] => ,[stat] => ,[sqlstate] => ,[protocol_version] => ,[thread_id] => ,[warning_count] => )) called at [/home/prigor/projects/gradschool/genomics/newcrick/pediatrics/semanticweb/bio2rdf/bio2rdf-scripts/chembl/chembl.php:137]
#6 ChemblParser->process() called at [/home/bio2rdf/bio2rdf-scripts/chembl/chembl.php:58]
#7 ChemblParser->run() called at [/home/bio2rdf/bio2rdf-scripts/runparser.php:52]
#8 Bio2RDFApp->__construct(Array ([0] => runparser.php,[1] => parser=chembl,[2] => files=all,[3] => download=false,[4] => db_name=Bio2Rdf,[5] => db_user=pediatrics,[6] => db_pass=pediatrics,[7] => indir=/home/bio2rdf/download/chembl/2/,[8] => outdir=/home/bio2rdf/rdf/chembl/2/,[9] => db_ip=db-igb.ics.uci.edu,[10] => db_port=5000,[11] => registry_dir=/home/bio2rdf/download/,[12] => process=true)) called at [/home/bio2rdf/bio2rdf-scripts/runparser.php:82]

ctd, release3 branch, error during import

For all of the ctd tables that have been processed, the virtuoso server complains about a syntax error. I believe the error lies with the URLs having one forward slash instead of two.

eg, <http:/bio2rdf.org/ctd_vocabulary:abu> <http:/purl.org/dc/terms/identifier>

Below is one example for ctd_chem_gene_ixns.nq.

ctd_chem_gene_ixn_types.nq.gz
ctd_chem_go_enriched.nq.gz
ctd_chemicals_diseases.nq.gz
ctd_chemicals.nq.gz
ctd_chem_pathways_enriched.nq.gz
ctd_diseases.nq.gz
ctd_diseases_pathways.nq.gz
ctd_genes_diseases.nq.gz
ctd_genes_pathways.nq.gz
ctd_pathways.nq.gz

Connected to OpenLink Virtuoso
Driver: 07.00.3203 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
*** Error 22007: [Virtuoso Driver][Virtuoso Server]XM033: XML parser detected an error:
    ERROR  : Syntax error in the attribute list (no whitespace)
at line 1 column 9 of source text
<http:/bio2rdf.org/ctd_resource:C1122973784> <http:/purl.org/dc/terms/identi
  ------^
at line 0 of Top-Level:
DB.DBA.RDF_LOAD_RDFXML_MT(file_to_string_output ('/home/bio2rdf/rdf/ctd/2/ctd_chem_gene_ixns.nq'), '', 'http:/bio2rdf.org/ctd_chem_gene_ixns', 784, 16)
Done. -- 4977 msec.
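
If the single-slash URIs are indeed the cause, a stop-gap repair of already-generated files could look like the sketch below. This is an assumption for illustration: the proper fix belongs in the parser, and the function name `fix_scheme_slashes` is hypothetical.

```python
import re

# Match "<http:/x", "<https:/x" or "<ftp:/x" where only one slash follows.
_SINGLE_SLASH = re.compile(r"<(https?|ftp):/(?!/)")


def fix_scheme_slashes(line: str) -> str:
    """Rewrite '<http:/host' to '<http://host'; correct URIs are untouched."""
    return _SINGLE_SLASH.sub(r"<\1://", line)


broken = "<http:/bio2rdf.org/ctd_vocabulary:abu> <http:/purl.org/dc/terms/identifier>"
print(fix_scheme_slashes(broken))
```

The negative lookahead keeps the substitution from touching URIs that already have two slashes, so the function is safe to run over a mixed file.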

NCBI Gene parser URI error

2 incorrect URIs are being generated by the current version of the ncbi_gene parser i.e:

http://bio2rdf.org/geneid:vocabulary:Gene
http://bio2rdf.org/geneid:vocabulary:protein-coding-gene

Each of these should respectively read:

http://bio2rdf.org/geneid_vocabulary:Gene
http://bio2rdf.org/geneid_vocabulary:protein-coding-gene

[MESH] Extraction of Concepts, Terms and their semantic relation

I'm trying to create a graph/tree of the MESH thesaurus in order to do some semantic inferencing by performing graph traversal on semantic relationships.

The bio2rdf MeSH dataset is very well suited to being loaded into a graph database. However, I can't find the MeSH Concepts and Terms, or their relationships, in the three available files (descriptor_record.nt, qualifier_records.nt and supplementary_records.nt).

Do you know where I could find that information in the form of RDF files? If not, should we/I upgrade the extraction script to extract such information?

affymetrix release 3 update

update parser using the new php-lib.

dataset versioning

The PHP-LIB offers support for setting a version for all source data, which is useful for datasets like uniprot or refseq. In GO Annotations, each dataset has a different version, so the current approach is not sufficient and would require a per-source versioning.

Character encoding issues in Pharmgkb

When looking at diseases in Pharmgkb there are encoding issues with labels with special characters:

For example:
http://bio2rdf.org/pharmgkb:PA165108196

What comes out is (http://cu.pharmgkb.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb%3APA165108196&graph=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb_resource%3Abio2rdf.dataset.pharmgkb.R3):

rdfs:label MÃ¼nchausen's syndrome [pharmgkb:PA165108196]
dcterms:title MÃ¼nchausen's syndrome

The actual name from UMLS (via its cui) is:
Münchausen's syndrome (SY)
or
Munchausen's syndrome (PT)

This seems to happen in at least two browsers (Firefox and Chrome) and with both the bio2rdf Pharmgkb endpoint as well as a local copy I have. This happens for a few diseases in Pharmgkb.
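
The garbled label is consistent with UTF-8 bytes being decoded as Latin-1 somewhere in the pipeline, although where that happens is not established here. The sketch below reproduces the symptom and shows a possible repair; `repair_mojibake` is a hypothetical helper, not part of the Pharmgkb parser.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 round trip when that is what happened.

    Text that cannot be reinterpreted this way is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "Münchausen's syndrome"
# Reproduce the symptom: UTF-8 bytes interpreted as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)
print(repair_mojibake(garbled))
```

Repairing output after the fact is fragile; reading the source file with an explicit UTF-8 encoding in the parser would avoid the problem at its origin.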

Here is my original query - I am trying to get all diseases in Pharmgkb and their CUI's (if they have one):

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX pharm: <http://bio2rdf.org/pharmgkb_vocabulary:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pgkb_id ?labl ?CUI WHERE {
?pgkb_id a pharm:Disease .
?pgkb_id rdfs:label ?labl .
OPTIONAL {?pgkb_id pharm:x-UMLS ?CUI .}

} order by ?pgkb_id

MeSH makeDescriptorRecord incomplete capturing of PRINT ENTRY and ENTRY terms

As noted here: https://groups.google.com/forum/#!topic/bio2rdf/eUMK67M_ia8
the parsing of descriptor records does not fully capture all "PRINTENTRY" and "ENTRY" terms.

culprit:
private function makeDescriptorRecord($desc_record_arr){...} in https://github.com/bio2rdf/bio2rdf-scripts/blob/master/mesh/mesh_parser.php#L54

interpro parser

working on a new interpro parser.

Inconsistent URI usage between MeSH descriptor files

URIs used to refer to the same term are not normalized across the three record files (descriptor, qualifier and supplementary). See: ftp://nlmpubs.nlm.nih.gov/online/mesh/.asciimesh/

xrefs type and labeling

We suggest to provide the following information for references to datasets:

add a dc:identifier "namespace:identifier"
add a type as http://bio2rdf.org/namespace_vocabulary:Resource

such that for ns:id,
:a bio2rdf_vocabulary:x-ns ns:id .
ns:id a ns_vocabulary:Resource;
dc:identifier "ns:id".
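
The proposal above can be sketched as a small generator of N-Triples lines. Everything here is illustrative: the function name `xref_triples` is hypothetical, and the choice of `http://purl.org/dc/elements/1.1/` for the dc prefix is an assumption based on the prefix declarations used in the queries on this page.

```python
BASE = "http://bio2rdf.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"  # assumed dc: prefix


def xref_triples(subject_uri: str, ns: str, identifier: str) -> list:
    """Return the three statements proposed for one cross-reference."""
    target = f"<{BASE}{ns}:{identifier}>"
    return [
        # :a bio2rdf_vocabulary:x-ns ns:id .
        f"<{subject_uri}> <{BASE}bio2rdf_vocabulary:x-{ns}> {target} .",
        # ns:id a ns_vocabulary:Resource
        f"{target} <{RDF_TYPE}> <{BASE}{ns}_vocabulary:Resource> .",
        # ns:id dc:identifier "ns:id"
        f'{target} <{DC_IDENTIFIER}> "{ns}:{identifier}" .',
    ]


for line in xref_triples("http://bio2rdf.org/hgnc:5171", "mgi", "105112"):
    print(line)
```

Emitting the type and identifier alongside every xref means a consumer can label and classify the target even when the target dataset itself is not loaded.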

Question: the URI for the "core" LinkedSPL graph

We have a question about how to create URIs for components of the LinkedSPLs resource. The "core" of the resources would represent all SPL sections and have predicates for active moieties and other items. Some of these will point to graphs that we develop as part of the project. We think that the main graph URI should perhaps be:
http://linkedSPLs.bio2rdf.org#

And the other graph URIs like:
http://linkedSPLs.bio2rdf.org/activeMoiety#
http://linkedSPLs.bio2rdf.org/adverseEvents#
etc.

Does this seem in line with current practice? We could also see using 'http://linkedSPLs.bio2rdf.org/core#' for the 'core' data so that all parts are named as distinct sub-graphs.

Clinicaltrials.gov: a missing record

When I try to search bio2rdf for
NCT00299741

I get no results.
But the trial http://clinicaltrials.gov/ct2/show/NCT00299741?resultsxml=true
truly exists:
http://clinicaltrials.gov/ct2/show/NCT00299741

Why is it not in bio2rdf?

I am using this access:
http://clinicaltrials.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fclinicaltrials:NCT00299741
or even pure text string.

(It is in the linkedct.org data:
http://linkedct.org/resource/trial/NCT00299741
but even there the result reference is not listed correctly: it has 1 and the RDF data say 0.)

Vojtech Huser

unique name in CTD interactions

Need to specify a unique key for each entry, as there is a clash with just chemical + gene:

http://ctd.bio2rdf.org/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fctd_resource%3AD0029452

http://ctdbase.org/detail.go?type=chem&acc=D002945&view=ixn

Repo commit and merge procedure

Hi All,

I would like to start a discussion about how the Bio2RDF repo is managed by core contributors. Personally, I find the current model of everybody forking the project difficult to manage and resolve assigned issues. I'd like to come up with a common set of best practices for working on the bio2rdf project. This would translate into a wiki post with the specific git commands for contributing and resolving issues. This will help existing and future contributors become more familiar with git and the bio2rdf project.

I would suggest that all core contributors work off one repo (bio2rdf/bio2rdf-scripts). This repo would have two main branches, master and develop. Issues would be resolved by branching from develop to an issue-specific branch, committing, and issuing a merge request back to develop for code review. Once merged by the assigned person, the branch could be deleted. No further changes would be made to the feature branch before it is merged with develop.

Core contributors take out a social contract that if they break the master (e.g. poor code review prior to merge with master) they will lose coffee privileges for the day :)

Ideas? Thoughts?

Cheers,

Dana Klassen

Omim warnings about invalid namespace

Running the OMIM parser is generating these warnings:

Warning Invalid namespace:img for qname:img:Epicanthus-small.jpg. Using http://bio2rdf.org/img:Epicanthus-small.jpg

Is this expected behaviour?

goa parser

please update the parser to use the framework.

please rewrite to match that in gene.gene2go(); the idea is that we want to assert function,process,location directly and to indirectly link evidence through a gene-go-association.

implement site maps

homologene parameters

Hi Al,

when trying to run the homologene parser without parameters:

php homologene.php

the application runs with the defaults without printing the list of parameters. It looks like there are no required parameters. Is this on purpose or should there be at least one required parameter?

Dana

pubmed parser: no date or pages

the resulting PubMed RDF does not contain (year) dates and page ranges.

Add GO mappings

http://geneontology.org/external2go/ has the most complete listing

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/ seems to be a subset

drugbank DDI label includes unescaped newline characters

Need to use safe literal on the generated label.
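
For illustration, escaping of this kind could look like the sketch below. It follows the escape sequences that N-Triples and N-Quads define for string literals; the function name `escape_literal` is hypothetical, and the behaviour of the php-lib "safe literal" helper mentioned above is not shown on this page.

```python
# Characters that must be escaped inside an N-Triples/N-Quads string literal.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for use in a literal."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


label = 'DDI between "drug A"\nand drug B'
print('"' + escape_literal(label) + '"')
```

Escaping character by character avoids double-escaping, which can happen when several string replacements are chained in the wrong order.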

iproclass dataset, loading error

Has anyone encountered the following issue while loading the iproclass dataset into virtuoso (version 7.1.0)? There seems to be a malformed quad containing multiple invalid uniprot ids.

==Error message excerpts below==

Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> http://purl.org/dc/terms/identifier 
"iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937"^^http://www.w3.org/2001/XMLSchema#string 
http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .

Loading /var/preserve/bio2rdf/mirror/release/3/iproclass/t2.txt into http://bio2rdf.org/iproclass ...
Skipping line:1 ... Problem in: <http://bio2rdf.org/iproclass:uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937> 
http://bio2rdf.org/bio2rdf_vocabulary:identifier "uniprot:75070738; 440648385; 116242084; 381277118; 223014445; 262343945; 393660255; 545765794; 326200806; 171675572; 122938060; 313267586; 381269698; 381235637; 375070038; 223011463; 168252164; 306976300; 71012654; 290544683; 552099586; 71015856; 223009293; 298367672; 381220965; 381263594; 118122797; 71017146; 283047974; 513791906; 410062122; 356476889; 126038553; 381220321; 545755337; 133712399; 119221318; 32892886; 150023363; 545773508; 71011533; 119035042; 71016641; 342840601; 410061422; 375067798; 545760936; 440652669; 94980847; 396580364; 302376750; 305655998; 545772752; 40848968; 349501937; 440653985; 302375924; 513789232; 381237709; 306976986; 385257755; 51894941; 213990019; 194580074; 290758436; 375066580; 57904139; 381242007; 218454552; 350282678; 381250715; 151334971; 381254929; 378744438; 528078892; 187960896; 224036663; 126038572; 156077456; 545760754; 393190388; 151334146; 496528315; 32348620; 381265624; 381267696; 381244919; 315021963; 359468606; 513790394; 334738648; 309752115; 381239417; 381231017; 381248909; 61652721; 189179453; 171675936; 401710617; 375068736; 381220237; 381225039; 51450507; 545770162; 301505924; 223008761; 305654864; 71017217; 171674984; 156455294; 440653369; 310776622; 381246333; 530848012; 393660591; 83266850; 223010427; 150023474; 213493011; 344267169; 150023044; 381270944; 545750325; 310776398; 381257225; 69065120; 145967460; 82492497; 444328782; 545765990; 32894300; 381251835; 251765638; 71008964; 381275452; 381226551; 240252114; 187961162; 223008985; 32894958; 381259913; 381236995; 61393516; 215883812; 171674718; 545752705; 381219971; 381243953; 83266934; 381249791; 347809526; 410064684; 385254773; 440647993; 116241692; 156455696; 381242301; 430728523; 302320751; 332379800; 381252255; 530926823; 513789526; 381273716; 381242483; 442565814; 381224437; 401711107; 375067490; 381230709; 381232823; 545753489; 223009489; 381248937"^^http://www.w3.org/2001/XMLSchema#string 
http://bio2rdf.org/bio2rdf.dataset:bio2rdf-iproclass-20131213 .
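
The log suggests that a semicolon-separated field was used as a single identifier. A minimal sketch of splitting such a field before minting URIs is shown below; the helper names are hypothetical, and whether these numeric values really belong in the uniprot namespace is left open.

```python
def split_identifiers(field: str, separator: str = ";") -> list:
    """Split a multi-valued field and drop empty or whitespace-only parts."""
    return [part.strip() for part in field.split(separator) if part.strip()]


def identifier_uris(namespace: str, field: str) -> list:
    """Mint one Bio2RDF-style URI per identifier instead of one for the field."""
    return [f"http://bio2rdf.org/{namespace}:{identifier}"
            for identifier in split_identifiers(field)]


for uri in identifier_uris("uniprot", "75070738; 440648385; 116242084"):
    print(uri)
```

Each resulting URI is free of spaces and semicolons, so the loader would no longer reject the statement for a malformed subject.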

linkedSPLs - documentation todos

TODOs for documentation of the LinkedSPLs project:

The top-level README is stubbed out into sections; complete the documentation. Be sure to explain how the data is loaded from DailyMed, UMLS RxNORM, DRON, FDA UNII, FDA PharmGX, and drug-drug interactions. Refer to the bio2rdf conventions and to how mappings are generated for drug entities and pharmgx.
Complete the README documentation for each sub-graph.

entrez gene parser files download

The entrez gene parser does not seem to allow downloading and parsing of files.

php entrez_gene.php download=true files=all indir=/home/dankla/bio2rdf-dataspaces/downloads/gene/ outdir=/home/dankla/bio2rdf-dataspaces/data/gene/
PHP Warning: gzopen(/home/dankla/bio2rdf-dataspaces/downloads/gene/GENE_INFO/All_Data.gene_info.gz): failed to open stream: No such file or directory in /home/dankla/bio2rdf-dataspaces/bio2rdf-scripts/gene/entrez_gene.php on line 100
Could not open file GENE_INFO/All_Data.gene_info.gz!

The indir and outdir directories are successfully created, but no files are downloaded despite the download flag being set to true. Am I missing something, or is this functionality not yet included in the script?