paleobiodb / data_service

The PBDB Data Service, API and table/system maintenance scripts

License: Artistic License 2.0

Perl 96.06% Shell 0.07% JavaScript 1.44% CSS 0.97% HTML 1.34% Roff 0.08% Dockerfile 0.03%

data_service's Issues

Large downloads produce 500 error.

From Steve Wang ([email protected]):

I'm trying to download phylum- or class-level datasets from @PaleoDB, but I keep getting '500 Server Error' using either the API or Navigator. It seems to work for smaller taxa (Bryozoa), but not for larger ones (Mollusca). Is there a way to fix this problem, or a workaround?

Example: the following gives me a ‘500 Server error’:

https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=MOLLUSCA&taxon_reso=lump_genus&show=class,acconly

but if I replace MOLLUSCA with BRYOZOA, it works fine. It seems to get stuck on larger taxa (with many genera).

I have been able to reproduce this problem.

Change permissions for adding type locality in Add/Edit Names

The "Type locality (PaleoDB number)" field on the Add/Edit Names page for any taxon seems to be locked so that users can't enter values into it. It seems to be filled in only "internally", when a "n. gen." and/or "n. sp." tag is entered in a biotic list. Because secondary users can't edit others' biotic lists to add the tags, there's really no way for most users to enter values in this field (without annoying the original collection enterer). Can you change the permission so that any user can add the relevant type locality (just as we can add holotype specimen numbers, authority information, etc.)?

When the Downloader uses a taxon name, and that name is a homonym, the Data Service should ALWAYS default to the version that is valid unless otherwise specified

intervals should sort by "lag" second

1.1 sorted by "eag" as specified in the order= argument, then by "lag" descending. 1.2 needs to do the same thing.
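
A minimal sketch of the ordering described above, assuming records shaped like the data service's compact JSON output ("eag" for early age, "lag" for late age). The function name sort_intervals and the sample records are hypothetical, and the direction of the primary sort is left as a parameter because it depends on the order= argument.

```python
def sort_intervals(records, eag_descending=False):
    """Sort by early age (eag), breaking ties by late age (lag) descending."""
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key; ties on eag keep the lag ordering.
    by_lag = sorted(records, key=lambda r: r["lag"], reverse=True)
    return sorted(by_lag, key=lambda r: r["eag"], reverse=eag_descending)


# Made-up records: A and B share the same early age.
intervals = [
    {"nam": "A", "eag": 66.0, "lag": 23.03},
    {"nam": "B", "eag": 66.0, "lag": 56.0},
    {"nam": "C", "eag": 23.03, "lag": 2.58},
]
print([r["nam"] for r in sort_intervals(intervals)])
```

With the primary sort ascending, C comes first and the tie between A and B is broken in favour of B, whose "lag" is larger.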

Add paleo coordinates filter

Specimens/Measurements download return needs to be modified

Currently, one can either download data for specimens OR measurements, but there needs to be a mashup of the two returns to make the measurement data useful. Basically, all of the columns in the Specimens return need to be added to the Measurements return to make it useful. The only way to link the two downloads currently is via the specimen_no parameter, which could be done after downloading each separately, either by hand or by script, but I'm sure that most users would like us to do that for them, so we should.

Perhaps we should allow users to do either one of the two returns we currently offer, as well as a mashup of the two. See the attached file for an example of a mashed up return.

PBDB Cetace OCBs.txt
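
Until a combined return exists, the client-side join mentioned above could look roughly like this. It is only a sketch: specimen_no is the linking column named in this issue, while the other column names, the sample rows, and the function name merge_on_specimen_no are made up for illustration.

```python
import csv
import io


def merge_on_specimen_no(specimens_csv, measurements_csv):
    """Attach each specimen's columns to every measurement row for it."""
    specimens = {
        row["specimen_no"]: row
        for row in csv.DictReader(io.StringIO(specimens_csv))
    }
    merged = []
    for meas in csv.DictReader(io.StringIO(measurements_csv)):
        specimen = specimens.get(meas["specimen_no"], {})
        # Measurement columns win if a name appears in both files.
        merged.append({**specimen, **meas})
    return merged


# Hypothetical downloads; only specimen_no comes from the issue text.
specimens_csv = "specimen_no,taxon_name\n1,Example genus\n2,Another genus\n"
measurements_csv = (
    "measurement_no,specimen_no,value\n"
    "10,1,4.2\n"
    "11,1,5.0\n"
    "12,2,7.5\n"
)
merged = merge_on_specimen_no(specimens_csv, measurements_csv)
for row in merged:
    print(row)
```

Each measurement row keeps its own columns and gains the columns of the specimen it belongs to, which is the "mashup" shape requested here.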

Allow for searching by taxonomic rank through /taxa route

Something like this...

/data1.2/taxa/list.json?rank=family
/data1.2/taxa/list.json?rank_no=25

Currently rank can be used to further filter a query for a specific taxonomic group. What I'm looking for is not a filter, but rather a way to pull a list of all families, genera, etc.

Gplates paleocoordinates are sometimes blank

The Gplates data service sometimes fails to return paleocoordinates. This is likely because the service doesn't have paleocoordinates for those modern coordinates.

I think our data service should do the following instead of returning nothing:

return the Scotese coordinates (appropriately labeled) instead of the Gplates coordinates
if neither Gplates nor Scotese have any paleocoordinates, we should return something like NaN
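
The fallback order proposed above can be sketched as follows. The function name choose_paleocoords, the tuple layout, and the source labels are assumptions for illustration, not the data service's actual interface.

```python
import math


def choose_paleocoords(gplates, scotese):
    """Return (source, lat, lng), preferring GPlates, then Scotese, then NaN.

    Each argument is a (lat, lng) tuple, or None when that model has no
    paleocoordinates for the point.
    """
    if gplates is not None:
        return ("gplates", gplates[0], gplates[1])
    if scotese is not None:
        # Label the source so users know these are not GPlates values.
        return ("scotese", scotese[0], scotese[1])
    return ("none", math.nan, math.nan)


# Made-up coordinates.
print(choose_paleocoords((10.5, -20.0), (11.0, -19.0)))
print(choose_paleocoords(None, (11.0, -19.0)))
print(choose_paleocoords(None, None))
```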

We need to expose State and County fields via the API

Using show=full in occs/list route gives warning

Example:

https://paleobiodb.org/data1.2/occs/list.json?base_name=Cetacea&interval=Miocene&show=full
warnings: [
"undefined output block '1.2:occs:subgenus'"
],

Downloader throws a 400 error

This request:
https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=Mammalia&cc=NOA&colls_authent_by=Alroy&private

Throws this error:
400 Bad Request
unknown parameter 'colls_authent_by'

API call to return a list of time scales only returns one of the time scales

An API call described on this page (https://paleobiodb.org/data1.2/intervals_doc.html) is supposed to return a list of time scales, but it only returns a single time scale. Part of the page in question is also found inside this Github repo at https://github.com/paleobiodb/pbdb-new/blob/master/doc/1.2/intervals_doc.tt

This is the API call: https://paleobiodb.org/data1.2/scales/list.json

http://fossilworks.org/bridge.pl?a=searchScale
is a page showing that more than one time scale is used in PBDB. Therefore, I expect the API call to return multiple time scales.

Country code field blank for Kazakhstan collections

Collections from the country of Kazakhstan have blank values in the cc field. Possibly related to the fact that the country was previously misspelled (as Kazakstan) in the drop-down (and is still misspelled in the fossilworks download form), but even newer collections from the country have blank country codes in API output.

Common name search

Autocomplete should be able to handle common name searches. This requires changes to both Navigator and the data service.

Missing lat/long description file in documentation

https://paleobiodb.org/data1.2/doc/basis_precision_doc.html 404s

Note: The templates to generate this do seem to be present.

Change the references output to bibjson

Can we have the /references json output format be specifically BibJSON?

Slow response on /occs/taxa.json

In Rockd we use the following query to produce a list of nearby taxa:

https://paleobiodb.org/data1.2/occs/taxa.json?lngmax=-89.92494618509072&latmin=42.69286975940797&lngmin=-88.8895713169537&latmax=43.4492378004429&interval=Cambrian&idreso=lump_genus&rank=genus&show=class,img,classext

For a long time it was very fast, but now the request usually times out before it can be completed. It also is worth noting that in the above query (which I'm assuming got cached by MariaDB) the API reported the "elapsed time" as 0.2 seconds, even though the response took around 7 seconds.

Any thoughts as to what might be causing this?

Ensure that meta-data can be unselected for all API routes

Even if you un-select the include metadata option in the download form, this path still returns the metadata header. Metadata should not be returned for any route if not selected as part of the data form.

https://paleobiodb.org/data1.2/occs/taxa.txt?rank=max_genus&interval=Phanerozoic&limit=100

Dead Link to API documentation in download form

The help section for bibliographic references, under the "reference type" selection menu of the download form has a dead link that returns a 404.

https://paleobiodb.org/data1.2/bibliographic_refs_doc.html#ref_type

Quick Search problem

Search for "Costa" in the quick search. No genus returned, but there is a genus record:

https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=24002

from Shanan

Is a DarwinCore Archive file available somewhere?

Would be great to have a DarwinCore rendering of the data. Hopefully it already exists somewhere?

Support for simultaneous response dump to local disk

accepted_attr field is blank in occurrences

The documentation says that show=attr is supposed to list the author/year of the accepted name, but that field (accepted_attr) is blank in the output.

For example, https://paleobiodb.org/data1.2/occs/list.txt?base_name=Transennatia&show=attr.

Break up downloads into 2 files: Data and metadata

There has been a request to add back to our system the recognition of who entered most of the data and a suggested citation for the data download.

Part of this may be taken care of with our new data archiving system, but it still would be a good idea to acknowledge data enterers.

I think the best place for this might be the metadata block in a data download. While the metadata is EXTREMELY important, the first thing I always do is cut it out and save it elsewhere, because it gets in the way of using the data as a spreadsheet.

I think the way to fix these two issues is to download 2 files with paired names. While we are messing with the downloader, let's force users to pick a name for the file(s), then download 2 files, data and metadata, each named with the given name plus the tag "data" or "metadata".

Data1.1 200 errors

A user (Jon Hill, University of York) reported via twitter that he got a 200 error message when using version 1.1 of the API (and later that he got a 502 error message). He's switched to 1.2 so it's not a pressing issue, but perhaps something to investigate? I was able to replicate it once by going to paleobiodb.org/data1.1 (got a 200 error via my browser), but then it worked fine the next time and subsequently is working for me.

Taxonomy problems

taxa/list, taxa/single do not properly filter on status

taxa/auto needs to:

indicate invalid names
provide containing taxon, i.e. class or phylum

taxa/list, taxa/single may need to report number of homonyms

children of invalid subgroups need to be made children of the valid parents

Dead link in documentation due to typo in URL

The link to the "quickdiv" page from the "diversity" page in the documentation goes instead to "quickdev", which of course does not exist.

Common name support in combined/auto

This is considered a VERY low priority. Originally entered on paleobiodb-changelog issue list on 2016-12-23.

Original issue was: Entering 'bivalve', 'ammonite', 'vertebrate' returns an error in classic download generator. The name 'vertebrate' did not match any name in the taxonomy table

Response from Andrew was: Those are common names, try bivalvia, ammonoidea, and vertebrata instead. Michael says he will add common name support eventually.

Pre-compute full diversity method for whole db

Now that the full diversity method is exposed in Navigator (probably going to go to production soon as no-one has yet found any bugs in it), it would be helpful to have a pre-computed result available to the API for that method on the whole database so that it doesn't have to recompute every time someone clicks through to that window without filtering.

warning but collections returned

https://paleobiodb.org/data1.2/colls/summary.json?lngmin=-99.11732179250912&lngmax=-73.66807321589076&latmin=38.52014589519641&latmax=45.67190505334318&level=3&lithology=sedimentary

Recent taxonomic names not in API

It appears that taxonomic names and opinions entered after late March (or thereabouts) aren't output by the API. For example, https://paleobiodb.org/classic/basicTaxonInfo?taxon_no=384026 vs. https://paleobiodb.org/data1.2/taxa/list.txt?id=384026. As best as I can tell, taxonomic names before that one are working fine, but ones after are not included.

Data service not finding a specific taxon during data retrieval

For some reason, the data service does not find the genus Urkudelphis from the Chattian of Ecuador in collection 190942 when using this query:

http://paleobiodb.org/data1.2/occs/taxa.csv?datainfo&rowcount&base_name=Cetacea&rank=genus&taxon_status=accepted&interval=Chattian&cc=SOA&private&show=full

And it should.

Extinction/origination rate calculator

Originally added to pbdb-changelog by @vjpsyverson on 2017-01-24

Choose and describe algorithms (@vjpsyverson)
Implement in API (@mmcclenn)
Make available via downloader (@vjpsyverson)
(optional) Add to Navigator (@vjpsyverson)

Return interval types in combined/auto

{"oid":"int:499","nam":"Crassicostatus","eag":171.60000,"lag":168.40000},
{"oid":"int:506","nam":"Crassicosta","eag":180.10000,"lag":175.60000},

Er...what? ,:-/

(data service call here)

Wrong image codes returned by occs/prevalence

As discussed here: Some of the image codes being returned by occs/prevalence are incorrect, specifically those for three of the subtaxa in the crinoid order Articulata. (They're showing pictures of articulate brachiopods instead, which is no longer an accepted name in the tree but probably still explains the error.) The problem is probably actually just incorrect entries in a table somewhere, but I don't know which one or how to fix it.

Can't access paleobiodb.org

Using either Chrome or Firefox on Windows got a security certificate error. From Firefox:

paleobiodb.org uses an invalid security certificate.

The certificate expired on 4/3/2016 4:59 PM. The current time is 4/3/2016 5:55 PM.

(Error code: sec_error_expired_certificate)

Create tables for specimen elements

Navigator Time Filters

time intervals are not showing up on navigator

Empty columns are not returned in json view of pbdb_occurrences

Hello,
Some time ago we received an issue for the R package (ropensci/paleobioDB#18) regarding the response to the json requests. By comparing with the txt or csv version, the json requests for pbdb_occurrences do not include the empty columns.
In the issue linked you may see it with
https://paleobiodb.org/data1.1/occs/list.txt?base_name=Dicellograptus&show=abund&limit=all and
https://paleobiodb.org/data1.1/occs/list.json?limit=all&base_name=Dicellograptus&show=abund&vocab=pbdb
where the json lacks the "reid_no","superceded","abund_value","abund_unit" contrary to the txt.

Is this a known / desired behavior? Although it is not too problematic, it could cause issues if, let's say, a second request expects (or does not expect) some columns that do not come (or do come) with a first request. It would be desirable to get even the empty columns, for consistency.

Thanks!
Javier
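
A possible client-side workaround, sketched under the assumption that the caller already knows the full column list (for example from the txt header). The four abundance and reidentification field names are the ones quoted in this issue; occurrence_no, the sample values, and the function name fill_missing_columns are made up for illustration.

```python
def fill_missing_columns(records, columns, empty=""):
    """Return records that all carry every listed column.

    Missing columns are filled with `empty`; columns not listed in
    `columns` are dropped.
    """
    return [{col: rec.get(col, empty) for col in columns} for rec in records]


columns = ["occurrence_no", "reid_no", "superceded", "abund_value", "abund_unit"]
# Made-up JSON records in which empty fields were omitted.
json_records = [
    {"occurrence_no": "1"},
    {"occurrence_no": "2", "abund_value": "3", "abund_unit": "specimens"},
]
filled = fill_missing_columns(json_records, columns)
for rec in filled:
    print(rec)
```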

Add GEOJSON support for collections and occurrences

Add GEOJSON as a downloadable format for collections and occurrences data.

Also, include some pointer on the basic download form to our paleocoordinate rotation service so users can make their own rotations.

Add response "cxi" back to /colls/summary in 1.2

The value "cxi" ("cx_int_no") in responses to /colls/summary is pretty vital for Navigator. Can you please add it back to 1.2?

discrepancy between returned early and late intervals if calling up occurrences vs taxon names

I have been using the PaleoDB to compile stratigraphic ranges for taxa and have discovered that it is possible for the early and late intervals returned for a taxon (when downloading taxonomic names) to comprise a different/longer amount of time than implied by the ages of the collections that taxon is listed in (when downloading occurrences). Here is an example:

http://www.paleobiodb.org/data1.2/occs/list.txt?base_name=Acidaspidina%20plana&show=class,time&idqual=certain

will return 4 records, all occurrences assigned to the Maduan, currently with max_ma of 501 and min_ma of 498.5 in the database.

In comparison:
http://www.paleobiodb.org/data1.2/taxa/list.txt?base_name=Acidaspidina%20plana&show=class,parent,app&rel=current

will return a record for the taxon with the expected max_ma (501) and min_ma (498.5) but with early and late intervals as Drumian and Guzhangian, respectively, presumably because the Drumian is 504.5 to 500.5 and Guzhangian is 500.5-497.0, and thus comprise the max and min ages from the occurrences.

But if I wanted to apply an updated/different age model to the returned early and late intervals, this would result in a longer stratigraphic range (essentially less precise) for this taxon than is known from the occurrences. In this case, the range would also be inaccurate as the Maduan is currently within the Paibian, so this taxon is actually younger than the Guzhangian (the age assignments in the PBDB for this regional stage are out-of-date, not a surprise since this is the Cambrian, but only compounds the problem and would be impossible to correct by someone downloading ranges via taxonomic names).
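
The widening described above can be checked with a short sketch that uses only the ages quoted in this issue. How the data service actually picks the early and late intervals is not confirmed here; the containment rule below (and which boundary is inclusive) is an assumption, and the function name spanning_intervals is hypothetical.

```python
# Stage boundaries (Ma) exactly as quoted in the issue text.
STAGES = [
    ("Drumian", 504.5, 500.5),
    ("Guzhangian", 500.5, 497.0),
]


def spanning_intervals(max_ma, min_ma, stages=STAGES):
    """Return (early, late): the stages containing max_ma and min_ma."""
    early = late = None
    for name, base, top in stages:
        # base is the older boundary of the stage, top the younger one.
        if early is None and top < max_ma <= base:
            early = name
        if late is None and top <= min_ma < base:
            late = name
    return early, late


# Occurrence ages for the Maduan collections, as given in the issue.
early, late = spanning_intervals(501.0, 498.5)
span_from_occurrences = 501.0 - 498.5
span_from_intervals = 504.5 - 497.0
print(early, late)
print(span_from_occurrences, span_from_intervals)
```

Under these assumptions the occurrences span 2.5 Myr, while the Drumian to Guzhangian range covers 7.5 Myr, which is the loss of precision this issue describes.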

Add Publisher and place of publication to the references return

These two pieces of information are not returned, and are essential for a full citation of books and book chapters.

interval argument to occs/diversity does not accept input

Example: here returns error message:

"Warning:","unknown interval id 'ARRAY(0x7fcf5f838610)'"

Changes for navigator 1.2

In order to get Navigator using 1.2, the following changes need to be made to the data service:

fix strata/auto with limit

500 Server Error with specific large data request

This request fails with a 500 server error:

https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&taxon_reso=genus&max_ma=1000&min_ma=0&show=class,coords,env

It is not clear why.

duplicated ref numbers

Hi,
queries have been returning duplicated reference numbers.
best,
Sara

range through diversity metrics are all problematic

All of the diversity metrics fail to take into account whether or not taxa are extant. Even if there are no occurrences in the database, the calculation should assume there are Holocene occurrences if the taxon is marked as extant. The metrics don't appear to do this.
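
A rough sketch of the requested behavior, not the data service's implementation. The record fields (first_ma, last_ma, extant), the sample taxa, and the function name range_through_count are hypothetical; the only rule taken from this issue is that a taxon marked as extant is treated as ranging to the present.

```python
def range_through_count(taxa, bin_max_ma, bin_min_ma):
    """Count taxa whose range overlaps the bin from bin_max_ma to bin_min_ma."""
    count = 0
    for taxon in taxa:
        first = taxon["first_ma"]
        # Treat extant taxa as ranging to the present (0 Ma), whatever
        # their youngest occurrence in the database is.
        last = 0.0 if taxon["extant"] else taxon["last_ma"]
        if first > bin_min_ma and last < bin_max_ma:
            count += 1
    return count


# Made-up taxa with identical occurrence ranges; ages in Ma.
taxa = [
    {"name": "A", "first_ma": 50.0, "last_ma": 30.0, "extant": True},
    {"name": "B", "first_ma": 50.0, "last_ma": 30.0, "extant": False},
]
print(range_through_count(taxa, 10.0, 5.0))
print(range_through_count(taxa, 40.0, 35.0))
```

In a bin from 10 to 5 Ma only the extant taxon A is counted, even though neither taxon has an occurrence younger than 30 Ma.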

Autocomplete search by reference

It's desirable to be able to display only those collections cited in a particular reference.

Ideally, searching by author + year?

Recently-entered collections missing

It appears that collections entered since June 14th are not being output. Collection 194102 is the last to be included in the API, and 194103 and up don't get output, aren't shown on Navigator, etc.
