data_service's People

Contributors

johnalroy, cambro, mmcclenn, jczaplew

data_service's Issues

Large downloads produce 500 error.

From Steve Wang ([email protected]):

I'm trying to download phylum- or class-level datasets from @PaleoDB, but I keep getting '500 Server Error' using either the API or Navigator. It seems to work for smaller taxa (Bryozoa), but not for larger ones (Mollusca). Is there a way to fix this problem, or a workaround?

Example: the following gives me a ‘500 Server error’:

https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=MOLLUSCA&taxon_reso=lump_genus&show=class,acconly

but if I replace MOLLUSCA with BRYOZOA, it works fine. It seems to get stuck on larger taxa (with many genera).

I have been able to reproduce this problem.
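One possible workaround, sketched below, is to request the result in pages rather than in one response. This is untested against the failing query: `limit` appears in other data service URLs, but the `offset` parameter, the page size, and the total row count used here are assumptions, and paging may not help if the server fails before it produces any rows.

```python
from urllib.parse import urlsplit, urlunsplit

URL = ("https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount"
       "&base_name=MOLLUSCA&taxon_reso=lump_genus&show=class,acconly")

def paged_urls(url: str, page_size: int, total: int) -> list[str]:
    """Build one URL per page by appending limit/offset to the existing query."""
    parts = urlsplit(url)
    urls = []
    for offset in range(0, total, page_size):
        # The original query string is kept verbatim so flags such as
        # "datainfo" and "rowcount" are not re-encoded.
        query = f"{parts.query}&limit={page_size}&offset={offset}"
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls

# Hypothetical numbers: 120000 rows fetched 50000 at a time.
pages = paged_urls(URL, page_size=50000, total=120000)
```

Each URL in `pages` would then be fetched separately and the results concatenated.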

Change permissions for adding type locality in Add/Edit Names

The "Type locality (PaleoDB number)" field on the Add/Edit Names page for any taxon seems to be locked so that users can't enter values in it. It seems to be filled in only "internally" when an "n. gen." and/or "n. sp." tag is entered in a biotic list. Because secondary users can't edit others' biotic lists to add the tags, there's really no way for most users to enter values in this field (without annoying the original collection enterer). Can you change the permission so that any user can add the relevant type locality (just as we can add holotype specimen numbers, authority information, etc.)?

Specimens/Measurements download return needs to be modified

Currently, one can either download data for specimens OR measurements, but there needs to be a mashup of the two returns to make the measurement data useful. Basically, all of the columns in the Specimens return need to be added to the Measurements return to make it useful. The only way to link the two downloads currently is via the specimen_no parameter, which could be done after downloading each separately, either by hand or by script, but I'm sure that most users would like us to do that for them, so we should.

Perhaps we should allow users to do either one of the two returns we currently offer, as well as a mashup of the two. See the attached file for an example of a mashed up return.

PBDB Cetace OCBs.txt
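Until a combined return exists, the join on specimen_no can be scripted after downloading both files. The sketch below is only an illustration: every column name other than `specimen_no`, and all of the sample rows, are made up rather than taken from the actual returns.

```python
import csv
import io

def merge_on_specimen_no(specimens_csv: str, measurements_csv: str) -> list[dict]:
    """Attach the specimen columns to every measurement row with the same specimen_no."""
    specimens = {row["specimen_no"]: row
                 for row in csv.DictReader(io.StringIO(specimens_csv))}
    merged = []
    for meas in csv.DictReader(io.StringIO(measurements_csv)):
        spec = specimens.get(meas["specimen_no"], {})
        # Specimen columns come first; measurement values win on a name clash.
        merged.append({**spec, **meas})
    return merged

# Made-up sample data standing in for the two downloads.
specimens_csv = (
    "specimen_no,taxon_name,specimen_id\n"
    "1,Basilosaurus,USNM 1\n"
    "2,Dorudon,UM 2\n"
)
measurements_csv = (
    "measurement_no,specimen_no,measurement_type,average\n"
    "10,1,length,12.5\n"
    "11,1,width,3.0\n"
    "12,2,length,8.0\n"
)
rows = merge_on_specimen_no(specimens_csv, measurements_csv)
```

The result has one row per measurement, each carrying the columns of its specimen, which is the shape of the mashup described above.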

Allow for searching by taxonomic rank through /taxa route

Something like this...

/data1.2/taxa/list.json?rank=family
/data1.2/taxa/list.json?rank_no=25

Currently rank can be used to further filter a query for a specific taxonomic group. What I'm looking for is not a filter, but rather a way to pull a list of all families, genera, etc.

Gplates paleocoordinates are sometimes blank

The Gplates data service sometimes fails to return paleocoordinates. This is likely because the service doesn't have paleocoordinates for those modern coordinates.

I think our data service should do the following instead of returning nothing:

  1. return the Scotese coordinates (appropriately labeled) instead of the Gplates coordinates
  2. if neither Gplates nor Scotese have any paleocoordinates, we should return something like NaN
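The fallback order proposed above could look roughly like this. It is a sketch, not the data service's code: the function name and the output keys (`paleolng`, `paleolat`, `paleomodel`) are assumptions chosen for illustration.

```python
import math
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)

def pick_paleocoords(gplates: Optional[Coord], scotese: Optional[Coord]) -> dict:
    """Prefer Gplates, fall back to Scotese, and return NaN when neither is available."""
    if gplates is not None:
        return {"paleolng": gplates[0], "paleolat": gplates[1], "paleomodel": "gplates"}
    if scotese is not None:
        # Label the source so users can tell the two models apart.
        return {"paleolng": scotese[0], "paleolat": scotese[1], "paleomodel": "scotese"}
    return {"paleolng": math.nan, "paleolat": math.nan, "paleomodel": None}

# Made-up coordinates for a record that Gplates could not rotate.
fallback = pick_paleocoords(None, (-45.2, 12.9))
missing = pick_paleocoords(None, None)
```

Carrying the model name alongside the coordinates keeps the Scotese values "appropriately labeled" without adding a second pair of columns.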

API call to return a list of time scales only returns one of the time scales

An API call described on this page (https://paleobiodb.org/data1.2/intervals_doc.html) is supposed to return a list of time scales, but it only returns a single time scale. Part of the page in question is also found inside this GitHub repo at https://github.com/paleobiodb/pbdb-new/blob/master/doc/1.2/intervals_doc.tt

This is the API call: https://paleobiodb.org/data1.2/scales/list.json

The page at http://fossilworks.org/bridge.pl?a=searchScale
shows that more than one time scale is used in PBDB. Therefore, I expect the API call to return multiple time scales.

Country code field blank for Kazakhstan collections

Collections from the country of Kazakhstan have blank values in the cc field. Possibly related to the fact that the country was previously misspelled (as Kazakstan) in the drop-down (and is still misspelled in the fossilworks download form), but even newer collections from the country have blank country codes in API output.

Common name search

Autocomplete should be able to handle common name searches. This requires changes to both Navigator and the data service.

Slow response on /occs/taxa.json

In Rockd we use the following query to produce a list of nearby taxa:

https://paleobiodb.org/data1.2/occs/taxa.json?lngmax=-89.92494618509072&latmin=42.69286975940797&lngmin=-88.8895713169537&latmax=43.4492378004429&interval=Cambrian&idreso=lump_genus&rank=genus&show=class,img,classext

For a long time it was very fast, but now the request usually times out before it can be completed. It also is worth noting that in the above query (which I'm assuming got cached by MariaDB) the API reported the "elapsed time" as 0.2 seconds, even though the response took around 7 seconds.

Any thoughts as to what might be causing this?

Break up downloads into 2 files: Data and metadata

There has been a request to add back to our system the recognition of who entered most of the data and a suggested citation for the data download.

Part of this may be taken care of with our new data archiving system, but it still would be a good idea to acknowledge data enterers.

I think the best place for this might be the metadata block in a data download. While the metadata is EXTREMELY important, the first thing I always do is cut it out and save it elsewhere, because it gets in the way of using the data as a spreadsheet.

I think the way to fix these two issues is to download 2 files with paired names. While we are messing with the downloader, let's force users to pick a name for the file(s), then download 2 files, data and metadata, each named with the given name plus the tag "data" or "metadata".
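As a small illustration of the paired naming, here is one way the two names could be derived from the name the user picks. The underscore separator, the default extension, and the function name are all assumptions.

```python
def paired_filenames(base: str, ext: str = "csv") -> tuple[str, str]:
    """Return (data file name, metadata file name) for a user-chosen base name."""
    base = base.strip()
    if not base:
        # The proposal is to force users to pick a name, so reject an empty one.
        raise ValueError("a file name is required")
    return f"{base}_data.{ext}", f"{base}_metadata.{ext}"

data_name, metadata_name = paired_filenames("mollusca_genera")
```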

Data1.1 200 errors

A user (Jon Hill, University of York) reported via Twitter that he got a 200 error message when using version 1.1 of the API (and later that he got a 502 error message). He's switched to 1.2 so it's not a pressing issue, but perhaps something to investigate? I was able to replicate it once by going to paleobiodb.org/data1.1 (got a 200 error via my browser), but then it worked fine the next time and has been working for me since.

Taxonomy problems

taxa/list, taxa/single do not properly filter on status

taxa/auto needs to:

  • indicate invalid names
  • provide containing taxon, i.e. class or phylum

taxa/list, taxa/single may need to report number of homonyms

children of invalid subgroups need to be made children of the valid parents

Common name support in combined/auto

This is considered a VERY low priority. Originally entered on paleobiodb-changelog issue list on 2016-12-23.

Original issue was: Entering 'bivalve', 'ammonite', 'vertebrate' returns an error in classic download generator. The name 'vertebrate' did not match any name in the taxonomy table

Response from Andrew was: Those are common names, try bivalvia, ammonoidea, and vertebrata instead. Michael says he will add common name support eventually.

Pre-compute full diversity method for whole db

Now that the full diversity method is exposed in Navigator (probably going to go to production soon as no-one has yet found any bugs in it), it would be helpful to have a pre-computed result available to the API for that method on the whole database so that it doesn't have to recompute every time someone clicks through to that window without filtering.

Return interval types in combined/auto

{"oid":"int:499","nam":"Crassicostatus","eag":171.60000,"lag":168.40000},
{"oid":"int:506","nam":"Crassicosta","eag":180.10000,"lag":175.60000},

Er...what? ,:-/

(data service call here)

Wrong image codes returned by occs/prevalence

As discussed here: Some of the image codes being returned by occs/prevalence are incorrect, specifically those for three of the subtaxa in the crinoid order Articulata. (They're showing pictures of articulate brachiopods instead, which is no longer an accepted name in the tree but probably still explains the error.) The problem is probably actually just incorrect entries in a table somewhere, but I don't know which one or how to fix it.

Can't access paleobiodb.org

Using either Chrome or Firefox on Windows got a security certificate error. From Firefox:

paleobiodb.org uses an invalid security certificate.

The certificate expired on 4/3/2016 4:59 PM. The current time is 4/3/2016 5:55 PM.

(Error code: sec_error_expired_certificate)

Empty columns are not returned in json view of pbdb_occurrences

Hello,
Some time ago we received an issue for the R package (ropensci/paleobioDB#18) regarding the response to the json requests. By comparing with the txt or csv version, the json requests for pbdb_occurrences do not include the empty columns.
In the issue linked you may see it with
https://paleobiodb.org/data1.1/occs/list.txt?base_name=Dicellograptus&show=abund&limit=all and
https://paleobiodb.org/data1.1/occs/list.json?limit=all&base_name=Dicellograptus&show=abund&vocab=pbdb
where the json lacks the "reid_no","superceded","abund_value","abund_unit" contrary to the txt.

Is this a known / desired behavior? Although it is not too problematic, it could potentially cause issues if, let's say, a second request expects (or does not expect) some columns that do not come (or do come) with a first request. It would be desirable to get even the empty columns, for consistency.

Thanks!
Javier
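On the client side, the gap can be papered over by filling in the columns a request is expected to return. The sketch below assumes the caller already knows the expected column list; the sample records are made up, and only the four missing column names come from the report above.

```python
def fill_missing_columns(records: list[dict], expected: list[str], empty: str = "") -> list[dict]:
    """Give every record all expected columns, using `empty` where the JSON omitted one."""
    filled = []
    for rec in records:
        row = {col: rec.get(col, empty) for col in expected}
        # Keep any extra keys the service did return, after the expected ones.
        row.update({k: v for k, v in rec.items() if k not in row})
        filled.append(row)
    return filled

# Made-up records: the second one lacks the empty columns, as in the JSON output.
records = [
    {"occurrence_no": "1", "abund_value": "3", "abund_unit": "specimens"},
    {"occurrence_no": "2"},
]
expected = ["occurrence_no", "reid_no", "superceded", "abund_value", "abund_unit"]
rows = fill_missing_columns(records, expected)
```

After this step every row has the same keys in the same order, so the JSON and txt results can be handled by the same downstream code.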

Add GEOJSON support for collections and occurrences

Add GEOJSON as a downloadable format for collections and occurrences data.

Also, include some pointer on the basic download form to our paleocoordinate rotation service so users can make their own rotations.

discrepancy between returned early and late intervals if calling up occurrences vs taxon names

I have been using the PaleoDB to compile stratigraphic ranges for taxa and have discovered that it is possible for the early and late intervals returned for a taxon (when downloading taxonomic names) to comprise a different/longer amount of time than implied by the ages of the collections that taxon is listed in (when downloading occurrences). Here is an example:

http://www.paleobiodb.org/data1.2/occs/list.txt?base_name=Acidaspidina%20plana&show=class,time&idqual=certain

will return 4 records, all occurrences assigned to the Maduan, currently with max_ma of 501 and min_ma of 498.5 in the database.

In comparison:
http://www.paleobiodb.org/data1.2/taxa/list.txt?base_name=Acidaspidina%20plana&show=class,parent,app&rel=current

will return a record for the taxon with the expected max_ma (501) and min_ma (498.5) but with early and late intervals as Drumian and Guzhangian, respectively, presumably because the Drumian is 504.5 to 500.5 and Guzhangian is 500.5-497.0, and thus comprise the max and min ages from the occurrences.

But if I wanted to apply an updated/different age model to the returned early and late intervals, this would result in a longer stratigraphic range (essentially less precise) for this taxon than is known from the occurrences. In this case, the range would also be inaccurate as the Maduan is currently within the Paibian, so this taxon is actually younger than the Guzhangian (the age assignments in the PBDB for this regional stage are out-of-date, not a surprise since this is the Cambrian, but only compounds the problem and would be impossible to correct by someone downloading ranges via taxonomic names).
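The numbers in this example can be checked with a short script. It assumes, as guessed above, that the early and late intervals are chosen as the intervals overlapping the max_ma to min_ma range; the interval bounds are the ones quoted in this report, and the function name is made up.

```python
# Interval bounds in Ma as quoted above: (older bound, younger bound).
INTERVALS = {"Drumian": (504.5, 500.5), "Guzhangian": (500.5, 497.0)}

def spanning_intervals(max_ma: float, min_ma: float, intervals: dict) -> list[str]:
    """Return the names of intervals overlapping [min_ma, max_ma], oldest first."""
    hits = [(older, name) for name, (older, younger) in intervals.items()
            if older > min_ma and younger < max_ma]
    return [name for _, name in sorted(hits, reverse=True)]

names = spanning_intervals(501.0, 498.5, INTERVALS)

# Range implied by the occurrences versus range implied by the interval names.
occurrence_span = 501.0 - 498.5
interval_span = INTERVALS[names[0]][0] - INTERVALS[names[-1]][1]
```

Here `names` is `["Drumian", "Guzhangian"]`, `occurrence_span` is 2.5 Myr, and `interval_span` is 7.5 Myr, so anyone working from the interval names alone gets a range three times longer than the occurrences support.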

Changes for navigator 1.2

In order to get Navigator using 1.2, the following changes need to be made to the data service:

  • fix strata/auto with limit

range through diversity metrics are all problematic

All of the diversity metrics fail to take into account whether or not taxa are extant. Even if there are no occurrences in the database, the calculation should assume there are Holocene occurrences if the taxon is marked as Extant. The metrics don't appear to do this.
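A minimal sketch of the requested behavior: when a taxon is flagged extant, its last appearance is pulled forward to the present before counting range-through diversity. The data layout, the function name, and the bin boundaries are illustrative assumptions, not the data service's implementation.

```python
def range_through_counts(taxa, bins):
    """Count taxa whose range overlaps each bin.

    taxa: list of (first_ma, last_ma, extant) tuples
    bins: list of (name, older_ma, younger_ma) tuples
    """
    counts = {}
    for name, older, younger in bins:
        n = 0
        for first_ma, last_ma, extant in taxa:
            if extant:
                # Treat an extant taxon as ranging to the present (0 Ma),
                # even when it has no Holocene occurrences in the database.
                last_ma = 0.0
            if first_ma > younger and last_ma < older:
                n += 1
        counts[name] = n
    return counts

# Illustrative bins with approximate ages, and two made-up taxa that share the
# same fossil range (20 to 10 Ma) but differ in the extant flag.
bins = [("Miocene", 23.03, 5.333), ("Pliocene", 5.333, 2.58),
        ("Pleistocene", 2.58, 0.0117), ("Holocene", 0.0117, 0.0)]
taxa = [(20.0, 10.0, True), (20.0, 10.0, False)]
counts = range_through_counts(taxa, bins)
```

Both taxa are counted in the Miocene, but only the extant one is counted in the Pliocene, Pleistocene, and Holocene bins.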

Autocomplete search by reference

It's desirable to be able to display only those collections cited in a particular reference.

Ideally, searching by author + year?

Recently-entered collections missing

It appears that collections entered since June 14th are not being output. Collection 194102 is the last to be included in the API, and 194103 and up don't get output, aren't shown on Navigator, etc.
