paleobiodb / data_service
The PBDB Data Service, API and table/system maintenance scripts
License: Artistic License 2.0
From Steve Wang:
I'm trying to download phylum- or class-level datasets from @PaleoDB, but I keep getting '500 Server Error' using either the API or Navigator. It seems to work for smaller taxa (Bryozoa), but not for larger ones (Mollusca). Is there a way to fix this problem, or a workaround?
Example: the following gives me a ‘500 Server error’:
but if I replace MOLLUSCA with BRYOZOA, it works fine. It seems to get stuck on larger taxa (with many genera).
I have been able to reproduce this problem.
The "Type locality (PaleoDB number)" field in the Add/Edit Names page for any taxon seems to be locked so that users can't enter values in the field. It seems to be populated only "internally" when a "n. gen." and/or "n. sp." tag is entered in a biotic list. Because secondary users can't edit others' biotic lists to add the tags, there's really no way for most users to enter values in this field (without annoying the original collection enterer). Can you change the permission so that any user can add the relevant type locality (just as we can add holotype specimen numbers, authority information, etc.)?
1.1 sorted by "eag" as specified in the order= argument, then by "lag" descending. 1.2 needs to do the same thing.
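A minimal sketch of the two-key ordering described above, written as client-side Python rather than as the data service's actual implementation. The sample records and the `eag_descending` flag are assumptions for illustration; only the field names "eag" and "lag" come from the report.

```python
def sort_intervals(records, eag_descending=False):
    """Sort by "eag" (direction chosen by the caller), breaking ties by "lag" descending."""
    # Python's sort is stable, so sort on the secondary key first,
    # then on the primary key; ties on "eag" keep the "lag" order.
    by_lag = sorted(records, key=lambda r: r["lag"], reverse=True)
    return sorted(by_lag, key=lambda r: r["eag"], reverse=eag_descending)


# Hypothetical records, not taken from the database.
sample = [
    {"oid": "int:1", "eag": 180.1, "lag": 170.0},
    {"oid": "int:2", "eag": 171.6, "lag": 168.4},
    {"oid": "int:3", "eag": 180.1, "lag": 175.6},
]

ascending = [r["oid"] for r in sort_intervals(sample)]
descending = [r["oid"] for r in sort_intervals(sample, eag_descending=True)]
```

In both directions the two records that share an "eag" of 180.1 come out with the larger "lag" first.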
Currently, one can either download data for specimens OR measurements, but there needs to be a mashup of the two returns to make the measurement data useful. Basically, all of the columns in the Specimens return need to be added to the Measurements return to make it useful. The only way to link the two downloads currently is via the specimen_no parameter, which could be done after downloading each separately, either by hand or by script, but I'm sure that most users would like us to do that for them, so we should.
Perhaps we should allow users to do either one of the two returns we currently offer, as well as a mashup of the two. See the attached file for an example of a mashed up return.
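Until a combined return exists, the join on specimen_no can be done by script after downloading both files. The following is a rough sketch under stated assumptions: both downloads have already been parsed into lists of dicts, and every column name other than `specimen_no` (`taxon_name`, `measurement_type`, `average`) is hypothetical.

```python
def merge_measurements_with_specimens(specimens, measurements):
    """Attach each specimen's columns to every measurement row that references it."""
    specimens_by_no = {row["specimen_no"]: row for row in specimens}
    merged = []
    for m in measurements:
        # Measurements with no matching specimen are kept, with no extra columns.
        specimen = specimens_by_no.get(m["specimen_no"], {})
        # Specimen columns first; measurement columns win on a name clash.
        merged.append({**specimen, **m})
    return merged


# Hypothetical rows standing in for the two downloads.
specimens = [{"specimen_no": "1", "taxon_name": "Example taxon"}]
measurements = [
    {"specimen_no": "1", "measurement_type": "length", "average": "12.5"},
    {"specimen_no": "1", "measurement_type": "width", "average": "4.0"},
]

merged = merge_measurements_with_specimens(specimens, measurements)
```

Each measurement row in `merged` now carries the specimen columns alongside its own, which is the shape the mashup request describes.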
Something like this...
/data1.2/taxa/list.json?rank=family
/data1.2/taxa/list.json?rank_no=25
Currently rank can be used to further filter a query for a specific taxonomic group. What I'm looking for is not a filter, but rather a way to pull a list of all families, genera, etc.
The Gplates data service sometimes fails to return paleocoordinates. This is likely because the service doesn't have paleocoordinates for those modern coordinates.
I think our data service should do the following instead of returning nothing:
Example:
https://paleobiodb.org/data1.2/occs/list.json?base_name=Cetacea&interval=Miocene&show=full
warnings: [
"undefined output block '1.2:occs:subgenus'"
],
This request:
https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=Mammalia&cc=NOA&colls_authent_by=Alroy&private
Throws this error:
400 Bad Request
unknown parameter 'colls_authent_by'
An API call described on this page (https://paleobiodb.org/data1.2/intervals_doc.html) is supposed to return a list of time scales, but it only returns a single time scale. Part of the page in question is also found inside this Github repo at https://github.com/paleobiodb/pbdb-new/blob/master/doc/1.2/intervals_doc.tt
This is the API call = https://paleobiodb.org/data1.2/scales/list.json
http://fossilworks.org/bridge.pl?a=searchScale
shows that more than one time scale is used in PBDB, so I expect the API call to return multiple time scales.
Collections from the country of Kazakhstan have blank values in the cc field. Possibly related to the fact that the country was previously misspelled (as Kazakstan) in the drop-down (and is still misspelled in the fossilworks download form), but even newer collections from the country have blank country codes in API output.
Autocomplete should be able to handle common name searches. This requires changes to both Navigator and the data service.
https://paleobiodb.org/data1.2/doc/basis_precision_doc.html 404s
Note: The templates to generate this do seem to be present.
Can we have the /references json output format be specifically BibJSON?
In Rockd we use the following query to produce a list of nearby taxa:
For a long time it was very fast, but now the request usually times out before it can be completed. It also is worth noting that in the above query (which I'm assuming got cached by MariaDB) the API reported the "elapsed time" as 0.2 seconds, even though the response took around 7 seconds.
Any thoughts as to what might be causing this?
Even if you un-select the include metadata option in the download form, this path still returns the metadata header. Metadata should not be returned for any route if not selected as part of the data form.
https://paleobiodb.org/data1.2/occs/taxa.txt?rank=max_genus&interval=Phanerozoic&limit=100
The help section for bibliographic references, under the "reference type" selection menu of the download form has a dead link that returns a 404.
https://paleobiodb.org/data1.2/bibliographic_refs_doc.html#ref_type
Search for "Costa" in the quick search. No genus returned, but there is a genus record:
https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=24002
from Shanan
Would be great to have a DarwinCore rendering of the data. Hopefully it already exists somewhere?
The documentation says that show=attr is supposed to list the author/year of the accepted name, but that field (accepted_attr) is blank in the output.
For example, https://paleobiodb.org/data1.2/occs/list.txt?base_name=Transennatia&show=attr.
There has been a request to add back to our system the recognition of who entered most of the data and a suggested citation for the data download.
Part of this may be taken care of with our new data archiving system, but it still would be a good idea to acknowledge data enterers.
I think the best place for this might be the metadata block in a data download. While the metadata is EXTREMELY important, the first thing I always do is cut it out and save it elsewhere, because it gets in the way of using the data as a spreadsheet.
I think the way to fix these two issues is to download 2 files, with paired names. While we are messing with the downloader, let's force users to pick a name for the file(s), then download 2 files data and metadata with the given name plus the tags data and metadata.
A user (Jon Hill, University of York) reported via twitter that he got a 200 error message when using version 1.1 of the API (and later that he got a 502 error message). He's switched to 1.2 so it's not a pressing issue, but perhaps something to investigate? I was able to replicate it once by going to paleobiodb.org/data1.1 (got a 200 error via my browser), but then it worked fine the next time and subsequently is working for me.
taxa/list, taxa/single do not properly filter on status
taxa/auto needs to:
taxa/list, taxa/single may need to report number of homonyms
children of invalid subgroups need to be made children of the valid parents
The link to the "quickdiv" page from the "diversity" page in the documentation goes instead to "quickdev", which of course does not exist.
This is considered a VERY low priority. Originally entered on paleobiodb-changelog issue list on 2016-12-23.
Original issue was: Entering 'bivalve', 'ammonite', 'vertebrate' returns an error in classic download generator. The name 'vertebrate' did not match any name in the taxonomy table
Response from Andrew was: Those are common names, try bivalvia, ammonoidea, and vertebrata instead. Michael says he will add common name support eventually.
Now that the full diversity method is exposed in Navigator (probably going to go to production soon as no-one has yet found any bugs in it), it would be helpful to have a pre-computed result available to the API for that method on the whole database so that it doesn't have to recompute every time someone clicks through to that window without filtering.
It appears that taxonomic names and opinions entered after late March (or thereabouts) aren't output by the API. For example, https://paleobiodb.org/classic/basicTaxonInfo?taxon_no=384026 vs. https://paleobiodb.org/data1.2/taxa/list.txt?id=384026. As best as I can tell, taxonomic names before that one are working fine, but ones after are not included.
For some reason, the data service does not find the genus Urkudelphis from the Chattian of Ecuador in collection 190942 when using this query:
And it should.
Originally added to pbdb-changelog by @vjpsyverson on 2017-01-24
Choose and describe algorithms (@vjpsyverson)
Implement in API (@mmcclenn)
Make available via downloader (@vjpsyverson)
(optional) Add to Navigator (@vjpsyverson)
{"oid":"int:499","nam":"Crassicostatus","eag":171.60000,"lag":168.40000},
{"oid":"int:506","nam":"Crassicosta","eag":180.10000,"lag":175.60000},
Er...what? ,:-/
(data service call here)
As discussed here: Some of the image codes being returned by occs/prevalence are incorrect, specifically those for three of the subtaxa in the crinoid order Articulata. (They're showing pictures of articulate brachiopods instead, which is no longer an accepted name in the tree but probably still explains the error.) The problem is probably actually just incorrect entries in a table somewhere, but I don't know which one or how to fix it.
Using either Chrome or Firefox on Windows, I got a security certificate error. From Firefox:
paleobiodb.org uses an invalid security certificate.
The certificate expired on 4/3/2016 4:59 PM. The current time is 4/3/2016 5:55 PM.
(Error code: sec_error_expired_certificate)
time intervals are not showing up on navigator
Hello,
Some time ago we received an issue for the R package (ropensci/paleobioDB#18) regarding the response to json requests. Compared with the txt or csv version, the json requests for pbdb_occurrences do not include the empty columns.
In the linked issue you can see this with
https://paleobiodb.org/data1.1/occs/list.txt?base_name=Dicellograptus&show=abund&limit=all
and
https://paleobiodb.org/data1.1/occs/list.json?limit=all&base_name=Dicellograptus&show=abund&vocab=pbdb
where the json response lacks "reid_no", "superceded", "abund_value" and "abund_unit", unlike the txt response.
Is this a known / desired behavior? Although it is not too problematic, it could cause issues if, say, a second request expects (or does not expect) columns that are missing from (or present in) a first request. It would be desirable to get even the empty columns, for consistency.
Thanks!
Javier
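As a possible client-side workaround for the missing empty columns, the json records can be padded to a fixed column set after parsing. This is a sketch only, not part of the data service or the R package; the four column names are the ones named in the report, while the sample record and the choice of an empty string as the fill value are assumptions.

```python
EXPECTED_COLUMNS = ["reid_no", "superceded", "abund_value", "abund_unit"]


def fill_missing_columns(records, expected_columns=EXPECTED_COLUMNS, fill_value=""):
    """Return copies of the records in which every expected column is present."""
    filled = []
    for rec in records:
        # Start from an all-empty row, then overlay whatever the record has.
        row = {col: fill_value for col in expected_columns}
        row.update(rec)
        filled.append(row)
    return filled


# Hypothetical parsed json record with the empty columns omitted.
records = [{"occurrence_no": 1001, "abund_value": "3"}]
padded = fill_missing_columns(records)
```

After padding, every record exposes the same keys, so code written against the txt or csv column set can read the json response too.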
Add GEOJSON as a downloadable format for collections and occurrences data.
Also, include some pointer on the basic download form to our paleocoordinate rotation service so users can make their own rotations.
The value "cxi" ("cx_int_no") in responses to /colls/summary is pretty vital for Navigator. Can you please add it back to 1.2?
I have been using the PaleoDB to compile stratigraphic ranges for taxa and have discovered that it is possible for the early and late intervals returned for a taxon (when downloading taxonomic names) to comprise a different/longer amount of time than implied by the ages of the collections that taxon is listed in (when downloading occurrences). Here is an example:
will return 4 records, all occurrences assigned to the Maduan, currently with max_ma of 501 and min_ma of 498.5 in the database.
In comparison:
http://www.paleobiodb.org/data1.2/taxa/list.txt?base_name=Acidaspidina%20plana&show=class,parent,app&rel=current
will return a record for the taxon with the expected max_ma (501) and min_ma (498.5) but with early and late intervals as Drumian and Guzhangian, respectively, presumably because the Drumian is 504.5 to 500.5 and Guzhangian is 500.5-497.0, and thus comprise the max and min ages from the occurrences.
But if I wanted to apply an updated/different age model to the returned early and late intervals, this would result in a longer stratigraphic range (essentially less precise) for this taxon than is known from the occurrences. In this case, the range would also be inaccurate as the Maduan is currently within the Paibian, so this taxon is actually younger than the Guzhangian (the age assignments in the PBDB for this regional stage are out-of-date, not a surprise since this is the Cambrian, but only compounds the problem and would be impossible to correct by someone downloading ranges via taxonomic names).
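The discrepancy can be worked through with the numbers quoted above. This sketch simply compares the range implied by the occurrences with the range implied by the early and late intervals; the function names and data layout are illustrative and are not how the data service computes anything.

```python
# Interval bounds in Ma as quoted in this report: (older bound, younger bound).
INTERVAL_BOUNDS = {
    "Drumian": (504.5, 500.5),
    "Guzhangian": (500.5, 497.0),
}


def range_from_occurrences(occurrences):
    """Oldest max_ma and youngest min_ma across all occurrences."""
    return (max(o["max_ma"] for o in occurrences),
            min(o["min_ma"] for o in occurrences))


def range_from_intervals(early, late, bounds=INTERVAL_BOUNDS):
    """Older bound of the early interval and younger bound of the late interval."""
    return bounds[early][0], bounds[late][1]


# The four Maduan occurrences, each with max_ma 501 and min_ma 498.5.
occurrences = [{"max_ma": 501.0, "min_ma": 498.5} for _ in range(4)]

occ_range = range_from_occurrences(occurrences)
int_range = range_from_intervals("Drumian", "Guzhangian")
occ_duration = occ_range[0] - occ_range[1]
int_duration = int_range[0] - int_range[1]
```

Here the occurrence-based range spans 2.5 Myr (501 to 498.5 Ma), while the interval-based range spans 7.5 Myr (504.5 to 497.0 Ma), which is the loss of precision described above.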
These two pieces of information are not returned, and are essential for a full citation of books and book chapters.
Example: here returns error message:
"Warning:","unknown interval id 'ARRAY(0x7fcf5f838610)'"
In order to get Navigator using 1.2, the following changes need to be made to the data service:
This request fails with a 500 server error:
It is not clear why.
Hi,
queries have been returning duplicated reference numbers.
best,
Sara
None of the diversity metrics take into account whether or not taxa are extant. Even if there are no Holocene occurrences in the database, the calculation should assume there are if the taxon is marked as extant. The metrics don't appear to do this.
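One way to picture the requested behavior is a range-through count in which extant taxa have their last appearance extended to the present. This is a rough sketch, not the data service's diversity code: the field names `first_ma`, `last_ma` and `is_extant` are hypothetical, and the Holocene is taken as 0.0117 to 0 Ma.

```python
def effective_range(taxon):
    """Return (first_ma, last_ma), extending extant taxa to the present (0 Ma)."""
    last = 0.0 if taxon.get("is_extant") else taxon["last_ma"]
    return taxon["first_ma"], last


def range_through_count(taxa, bin_early_ma, bin_late_ma):
    """Count taxa whose effective range overlaps the bin (ages in Ma, larger is older)."""
    count = 0
    for taxon in taxa:
        first, last = effective_range(taxon)
        # Overlap: the taxon appeared before the bin ended
        # and had not disappeared before the bin began.
        if first > bin_late_ma and last < bin_early_ma:
            count += 1
    return count


# Two hypothetical taxa with identical fossil ranges; only one is marked extant.
taxa = [
    {"name": "A", "first_ma": 20.0, "last_ma": 5.0, "is_extant": True},
    {"name": "B", "first_ma": 20.0, "last_ma": 5.0, "is_extant": False},
]

holocene_count = range_through_count(taxa, 0.0117, 0.0)
older_bin_count = range_through_count(taxa, 10.0, 4.0)
```

With the extension applied, taxon A is counted in the Holocene despite having no occurrences younger than 5 Ma, while taxon B is not.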
It's desirable to be able to display only those collections cited in a particular reference.
Ideally, searching by author + year?
It appears that collections entered since June 14th are not being output. Collection 194102 is the last to be included in the API, and 194103 and up don't get output, aren't shown on Navigator, etc.