
koop-socrata's Issues

Koop-Socrata handles 1mm row datasets

Concept

As a user I can access a performant feature service that is proxying: http://data.seattle.gov/resource/3k2p-39jp.json

Details

  • Requests required to move data from Socrata should not exceed the hourly limit of 1,000 queries
  • Requests for 10,000 or more rows at once from Socrata should not clog the Node process
  • Query performance should be < 2 seconds when the data is already cached
    • Count Only
    • Single page
    • OutStatistics
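A rough way to check the first requirement is to count how many page requests a dataset needs before fetching it. This is a sketch only: it assumes 0-based $offset paging, planPages and fitsHourlyBudget are hypothetical helper names, and the 50,000-row page size simply matches the $limit seen in a later log in this tracker.

```javascript
// Hypothetical helpers, not koop-socrata's code.
// Build one {$limit, $offset} entry per Socrata page request (0-based offsets assumed).
function planPages(rowCount, pageSize) {
  const pages = [];
  for (let offset = 0; offset < rowCount; offset += pageSize) {
    pages.push({ $limit: pageSize, $offset: offset });
  }
  return pages;
}

// True if fetching the whole dataset stays within the hourly request limit.
function fitsHourlyBudget(rowCount, pageSize, hourlyLimit = 1000) {
  return planPages(rowCount, pageSize).length <= hourlyLimit;
}

// 1mm rows at 50,000 rows per page: 20 requests
console.log(planPages(1000000, 50000).length); // 20
// 1mm rows at 500 rows per page: 2,000 requests, over the limit
console.log(fitsHourlyBudget(1000000, 500)); // false
```

A larger page size keeps the request count low, but each response is bigger for the Node process to parse, which is the tension with the second requirement.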

cc @astauffer

displayField should not be hard-coded to 'name'

See this service, which doesn't have a field called name. If there is no name field, should it default to the first string field?

{
"currentVersion": 10.21,
"id": 0,
"name": "psp3-bvzw",
"type": "Feature Layer",
"displayField": "name",
"description": "",
"copyrightText": "",
"defaultVisibility": true,
"relationships": [
...
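One way to implement the suggested fallback, sketched against Esri-style field definitions ({ name, type } objects with types such as esriFieldTypeString). chooseDisplayField is a hypothetical name; the case-insensitive match and the final fallback to the first field of any type are extra assumptions.

```javascript
// Hypothetical helper: pick a displayField instead of hard-coding 'name'.
function chooseDisplayField(fields) {
  // Prefer an existing 'name' field (matched case-insensitively).
  const named = fields.find(f => f.name.toLowerCase() === 'name');
  if (named) return named.name;
  // Otherwise fall back to the first string field, as proposed in the issue.
  const firstString = fields.find(f => f.type === 'esriFieldTypeString');
  if (firstString) return firstString.name;
  // Last resort: the first field of any type, or undefined if there are none.
  return fields.length > 0 ? fields[0].name : undefined;
}

console.log(chooseDisplayField([
  { name: 'OBJECTID', type: 'esriFieldTypeOID' },
  { name: 'address', type: 'esriFieldTypeString' }
])); // address
```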

optimize socrata paging

We are seeing a massive slowdown in Socrata services that require paging through lots of data. One reason is that the current paging routine does not seem to throttle its requests: it builds a list of pages to GET and immediately requests all of them. This is probably causing trouble on the server when we make 100 to 200 requests at once.

The solution might be to scale it back a little and request pages more slowly.
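A minimal sketch of throttled paging: instead of firing every page request at once, keep at most limit requests in flight. runWithConcurrency is a hypothetical helper, and each task is assumed to be a function returning a Promise for one page. A library such as async (queue or eachLimit) would serve the same purpose.

```javascript
// Hypothetical helper: run promise-returning tasks with at most `limit`
// in flight, preserving result order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unstarted task until none remain.
  // Claiming via next++ is safe because this code runs on a single thread.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}
```

Usage would look like runWithConcurrency(pages.map(p => () => fetchPage(p)), limit), where fetchPage is a hypothetical per-page request function. With a small limit, 200 pages are fetched in waves rather than all at once; the right value is a guess that would need tuning against Socrata.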

Adding Socrata token

In the documentation it says,

"2. Edit the default.json in your koop-app config to add

{
  "socrata": {
    "token": "your-app-token"
  }
}"

Is this just appended to the default.json file? What exactly is the formatting? I get errors when I run npm start if I add this anywhere in default.json, even though JSLint says the formatting is valid.

My current config, with the token added, is:

{
  "server": {
    "port": 1337
  },
  "socrata": {
    "token": "my_token"
  },
  "data_dir": "/usr/local/koop/",
  "db": {
    "conn": "connection"
  }
}
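Going by the documentation quoted above, the socrata block appears to be a top-level key alongside server and db, which is what this config does. The sketch below (hypothetical addSocrataToken helper) merges that key in programmatically and relies on JSON.parse to reject invalid JSON. It only confirms the file's structure; it does not explain the npm start error, which may come from elsewhere.

```javascript
// Hypothetical helper: add a top-level "socrata" block to the text of a
// Koop default.json, keeping any existing keys.
function addSocrataToken(configText, token) {
  const config = JSON.parse(configText); // throws a SyntaxError on invalid JSON
  config.socrata = Object.assign({}, config.socrata, { token: token });
  return JSON.stringify(config, null, 2);
}

const before = '{ "server": { "port": 1337 }, "data_dir": "/usr/local/koop/", "db": { "conn": "connection" } }';
console.log(addSocrataToken(before, 'my_token'));
```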

tests leave leftover logging/error messages

A successful run of the tests leaves two files in the root of the repository:

..year-month-day and .error.year-month-day.

2015-08-24T19:49:10.641Z error Could not get rowCount. count::https://data.seattle.gov/resource/missing.json?$select=count(*)::404

2015-08-24T19:49:10.642Z error Could not get metadata. meta::https://data.seattle.gov/views/missing.json::404

2015-08-24T19:49:11.229Z error Could not get rowCount. Could not parse count JSON

2015-08-24T19:49:11.737Z error Could not get rowCount. count::https://data.seattle.gov/resource/filtered.json?$select=count(*)::500

2015-08-24T19:49:12.238Z error Could not get first row. first::https://data.seattle.gov/resource/countFail.json?$order=:id&$limit=1::500

A quick review suggests we are logging behavior that the tests provoke intentionally, so maybe it would be best to just delete these files afterward?
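If the logged errors are expected, one option is to remove the files in the test suite's teardown, for example in a mocha after() hook. This sketch deletes files whose names start with a dot and end in a YYYY-MM-DD date; that pattern is an assumption inferred from the file names listed above, and removeLeftoverLogs is a hypothetical helper.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed pattern: a leading dot and a trailing YYYY-MM-DD date,
// matching names like "..2015-08-24" and ".error.2015-08-24".
const LEFTOVER_LOG_PATTERN = /^\..*\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: delete matching files in `dir` and return their names.
function removeLeftoverLogs(dir, pattern = LEFTOVER_LOG_PATTERN) {
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    if (pattern.test(name)) {
      fs.unlinkSync(path.join(dir, name));
      removed.push(name);
    }
  }
  return removed;
}

// Possible mocha usage: after(() => removeLeftoverLogs(process.cwd()));
```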

Version 10 of node.js has been released

Version 10 of Node.js (code name Dubnium) has been released! 🎊

To see what happens to your code in Node.js 10, Greenkeeper has created a branch with the following changes:

  • Added the new Node.js version to your .travis.yml

If you’re interested in upgrading this repo to Node.js 10, you can open a PR with these changes. Please note that this issue is just intended as a friendly reminder and the PR as a possible starting point for getting your code running on Node.js 10.

More information on this issue

Greenkeeper has checked the engines key in any package.json file, the .nvmrc file, and the .travis.yml file, if present.

  • engines was only updated if it defined a single version, not a range.
  • .nvmrc was updated to Node.js 10
  • .travis.yml was only changed if there was a root-level node_js that didn’t already include Node.js 10, such as node or lts/*. In this case, the new version was appended to the list. We didn’t touch job or matrix configurations because these tend to be quite specific and complex, and it’s difficult to infer what the intentions were.

For many simpler .travis.yml configurations, this PR should suffice as-is, but depending on what you’re doing it may require additional work or may not be applicable at all. We’re also aware that you may have good reasons to not update to Node.js 10, which is why this was sent as an issue and not a pull request. Feel free to delete it without comment, I’m a humble robot and won’t feel rejected 🤖


FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Certain datasets from Socrata have incorrect spatial reference

Here is an example:
http://koop.dc.esri.com/socrata/wastate/f9h8-rtz6/FeatureServer

Notice:
spatialReference: {
  wkid: 4326,
  latestWkid: 4326
},
initialExtent: {
  xmin: -13610908.050271627,
  ymin: 6061419.6999203265,
  xmax: -13605556.778112192,
  ymax: 6066753.658368595,
  spatialReference: {
    wkid: 4326,
    latestWkid: 4326
  }
},

It appears these coordinates (xmin, xmax, etc.) are Web Mercator coordinates (wkid 3857), but Koop reports them as lat/long by setting the wkid to 4326. We need a check that detects the proper coordinate system.
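A simple heuristic for that check: geographic coordinates must fall within ±180 longitude and ±90 latitude, while Web Mercator values range up to about ±20,037,508.34 m. The sketch below (hypothetical guessWkid) applies those bounds to an extent. It cannot distinguish a Web Mercator extent lying very close to (0, 0), so it is a heuristic only.

```javascript
// Web Mercator (EPSG:3857) world bounds: pi * 6378137 metres.
const WEB_MERCATOR_MAX = Math.PI * 6378137;

// Hypothetical heuristic: guess the wkid of an extent from its coordinate ranges.
function guessWkid(extent) {
  const { xmin, ymin, xmax, ymax } = extent;
  const xs = [xmin, xmax];
  const ys = [ymin, ymax];
  // Geographic coordinates: longitude within +/-180, latitude within +/-90.
  if (xs.every(x => Math.abs(x) <= 180) && ys.every(y => Math.abs(y) <= 90)) {
    return 4326;
  }
  // Anything larger but within the Web Mercator world bounds is likely 3857.
  if (xs.concat(ys).every(v => Math.abs(v) <= WEB_MERCATOR_MAX)) {
    return 3857;
  }
  return null; // neither range fits; needs manual inspection
}

// Extent reported by the f9h8-rtz6 service above
console.log(guessWkid({
  xmin: -13610908.050271627, ymin: 6061419.6999203265,
  xmax: -13605556.778112192, ymax: 6066753.658368595
})); // 3857
```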

Koop Not Handling Bad Values

FeatureServices in Koop stop responding when bad or out-of-range values are present in the Socrata table.

Example: http://detroitkoop-268609380.us-east-1.elb.amazonaws.com/socrata/detroitmi/uhnf-v2zs

  • Several rows in the location column have values like:
{"coordinates":[999998.9998,999999],"type":"Point",

The resulting featureService refuses to draw in a map: http://www.arcgis.com/home/webmap/viewer.html?url=http://detroitkoop-268609380.us-east-1.elb.amazonaws.com/socrata/detroitmi/uhnf-v2zs/featureserver/0
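One possible mitigation is to validate Point coordinates before building features and null out geometries that fall outside valid longitude/latitude ranges; GeoJSON allows a null geometry for unlocated features. sanitizeGeometry and sanitizeFeatures are hypothetical names, and whether to keep such rows with a null geometry or drop them entirely is a design choice.

```javascript
// Hypothetical helper: return the geometry unchanged unless it is a Point
// whose coordinates fall outside valid longitude/latitude ranges.
function sanitizeGeometry(geometry) {
  if (!geometry) return null;
  if (geometry.type !== 'Point') return geometry; // only Points are checked here
  const [lon, lat] = Array.isArray(geometry.coordinates) ? geometry.coordinates : [];
  const valid = Number.isFinite(lon) && Number.isFinite(lat) &&
    Math.abs(lon) <= 180 && Math.abs(lat) <= 90;
  return valid ? geometry : null; // a null geometry keeps the row's attributes
}

// Apply the check to every feature without mutating the input.
function sanitizeFeatures(features) {
  return features.map(f => Object.assign({}, f, { geometry: sanitizeGeometry(f.geometry) }));
}
```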

needs updating

This module still includes a default config in YAML, and the tests require mocha and koop-server, neither of which is listed in dependencies or devDependencies. Even with mocha, koop-server, and yaml installed, I get a segmentation fault when running the tests. It needs to be brought up to date with the latest version of Koop and made usable.

Koop-Socrata Cache Timer Not Expiring

The Koop Cache timer is not expiring.

Environment:

  • Ubuntu 14.04 EC2
  • PostgreSQL 9.3.9 (w/PostGIS)
  • Most recent build of koop-sample-app
  • Most recent install of koop-socrata

STR:

@dmfenton

Socrata Cache not Expiring

Source data: http://data.detroitmi.gov/resource/encd-2smf.json

  • Last-Modified header: "Mon, 13 Apr 2015 21:11:05 PDT"
  • Date header: "Tue, 14 Apr 2015 14:12:31 GMT"

Data already exists in the Koop cache (PostgreSQL database)

  • Table: Socrata:encd-2smf:0
  • Entry in kooptimers table: id: Socrata:encd-2smf:timer; expires: '1428951734558' (Mon, 13 Apr 2015, 19:02:14 GMT)

New request initiated for: http://detroitkoop-268609380.us-east-1.elb.amazonaws.com/socrata/detroitmi/encd-2smf/featureserver/0

  • Koop does not drop expired cache, reports "old" data
  • kooptimer remains set as expired 1428951734558 value

The only workaround I have found is to drop the corresponding Koop table, delete the corresponding row from kooptimers, and make a new request.
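The reported values do indicate an expired timer. This sketch (hypothetical isExpired helper, not Koop's timer code) compares the kooptimers expiry, stored in epoch milliseconds, with the Date header of the request.

```javascript
// Hypothetical check: a cache entry is stale once the current time
// reaches the expiry stored in kooptimers (epoch milliseconds).
function isExpired(expiresMs, nowMs = Date.now()) {
  return nowMs >= expiresMs;
}

const timerExpires = 1428951734558; // Socrata:encd-2smf:timer
const requestTime = Date.parse('Tue, 14 Apr 2015 14:12:31 GMT'); // Date header

console.log(new Date(timerExpires).toISOString()); // 2015-04-13T19:02:14.558Z
console.log(isExpired(timerExpires, requestTime)); // true: the cache should be refreshed
```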

Unable to install on Windows

Perhaps because all files under test/fixtures have names that are invalid under Windows file-system rules (the colon ':' is not permitted).
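If that is the cause, a check like the one below could flag fixture names that Windows rejects and suggest a safe replacement. The reserved character set (< > : " / \ | ? *) comes from Windows naming rules; reserved device names such as CON or NUL and control characters are not handled. Both helper names and the example file name are hypothetical.

```javascript
// Characters Windows does not allow in file names.
const WINDOWS_RESERVED_CHARS = /[<>:"\/\\|?*]/;

// Hypothetical helpers for checking and renaming test fixtures.
function isValidWindowsFileName(name) {
  return !WINDOWS_RESERVED_CHARS.test(name);
}

function toWindowsSafeName(name, replacement = '_') {
  // Build a global copy of the pattern so every reserved character is replaced.
  return name.replace(new RegExp(WINDOWS_RESERVED_CHARS.source, 'g'), replacement);
}

console.log(isValidWindowsFileName('Socrata:abc1-2345:0.json')); // false
console.log(toWindowsSafeName('Socrata:abc1-2345:0.json')); // Socrata_abc1-2345_0.json
```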

An in-range update of request is breaking the build 🚨

☝️ Greenkeeper’s updated Terms of Service will come into effect on April 6th, 2018.

Version 2.84.0 of request was just published.

  • Branch: build failing 🚨
  • Dependency: request
  • Current version: 2.83.0
  • Type: dependency

This version is covered by your current version range and after updating it in your project the build failed.

request is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed

Commits

The new version differs by 6 commits.

  • d77c839 Update changelog
  • 4b46a13 2.84.0
  • 0b807c6 Merge pull request #2793 from dvishniakov/2792-oauth_body_hash
  • cfd2307 Update hawk to 7.0.7 (#2880)
  • efeaf00 Fixed calculation of oauth_body_hash, issue #2792
  • 253c5e5 2.83.1

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Multiple Location Column Designator not Working

When working with a dataset with multiple location columns (https://data.detroitmi.gov/Property-Parcels/BSEED-Permits-Issued/n7kb-xdcs) the syntax /socrata/provider/socrataId!columnName no longer appears to work.

Environment:

  • Most recent build of koop-sample-app and koop-socrata in Ubuntu 14.04 EC2
  • PostGIS Cache

Calling the path /socrata/detroitmi/n7kb-xdcs!site_location produces the following log in Node:

info: CREATE TABLE "Socrata:n7kb-xdcs:0" (id SERIAL PRIMARY KEY,feature JSON,geom Geometry(POINT, 4326),geohash varchar(10))
debug: Updating info Socrata:n7kb-xdcs:0 processing
info: Processing: https://data.detroitmi.gov/resource/n7kb-xdcs.json?$order=:id&$limit=50000&$offset=1
error: insert partial ERROR error: relation "Socrata:n7kb-xdcs!site_location:0" does not exist, Socrata:n7kb-xdcs!site_location
error: Failed while inserting a page of Socrata:n7kb-xdcs!site_location:0. error: relation "Socrata:n7kb-xdcs!site_location:0" does not exist
error: Could not get info of Socrata:n7kb-xdcs!site_location:0 Key Not Found Socrata:n7kb-xdcs!site_location:0
debug: Updating info Socrata:n7kb-xdcs!site_location:0 undefined
info: Finished paging Socrata:n7kb-xdcs!site_location:0
express deprecated res.send(body, status): Use res.status(status).send(body) instead node_modules/koop-socrata/controller/index.js:28:13

The table written to PostGIS is Socrata:n7kb-xdcs:0, but the page inserts target Socrata:n7kb-xdcs!site_location:0, which does not exist.
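The log suggests the !site_location suffix is dropped when the table is created but kept when pages are inserted. A minimal sketch of deriving every key from one parsed value so both steps agree; parseResourceId, tableKey, and resourceUrl are hypothetical names, and keeping the column in the table key (so each location column gets its own table) is an assumption.

```javascript
// Hypothetical helper: split "socrataId!columnName" once.
function parseResourceId(raw) {
  const bang = raw.indexOf('!');
  if (bang === -1) return { id: raw, locationColumn: null };
  return { id: raw.slice(0, bang), locationColumn: raw.slice(bang + 1) };
}

// Build the cache table key from the parsed value; using this for both
// CREATE TABLE and the page inserts keeps them pointed at the same table.
function tableKey(resource, layer = 0) {
  const suffix = resource.locationColumn ? `!${resource.locationColumn}` : '';
  return `Socrata:${resource.id}${suffix}:${layer}`;
}

// The Socrata request itself only needs the bare dataset id.
function resourceUrl(host, resource) {
  return `${host}/resource/${resource.id}.json`;
}

const parsed = parseResourceId('n7kb-xdcs!site_location');
console.log(tableKey(parsed)); // Socrata:n7kb-xdcs!site_location:0
console.log(resourceUrl('https://data.detroitmi.gov', parsed)); // https://data.detroitmi.gov/resource/n7kb-xdcs.json
```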

Error in the log file

Occasionally, I get the following message in the log:

error: Error querying {"name":"error","length":210,"severity":"ERROR","code":"42883","hint":"No function matches the given name and argument types. You might need to add explicit type casts.","position":"72","file":"parse_func.c","line":"523","routine":"ParseFuncOrColumn","msg":"function st_geomfromgeojson(text) does not exist"}

This happens when making the following call (or similar):

http://localhost:1337/socrata/nola/hpm5-48nj

It returns null in the response. What does this error mean, and how should I deal with it?

Failing to register hosts

Summary:

I've cloned the sample app and I'm trying to pull down some Socrata data, but I'm unable to register a host. It seems like this is because Koop is trying to query a table that doesn't exist. When I use the local cache, I can add a host, but I can't get a dataset from that host.

Details:

When I perform a github/ query like http://localhost:1337/github/chelm/grunt-geo/forks, it seems to work fine: a new table is created in my local Postgres database and the expected GeoJSON is returned.

When I query /socrata, I get the helpful suggestion to POST a host. But when I use the one-liner example in the README (curl --data "host=https://data.nola.gov&id=nola" localhost:1337/socrata) the following error is printed:

{"name":"error","length":107,"severity":"ERROR","code":"42P01","position":"31","file":"parse_relation.c","line":"986","routine":"parserOpenTable"}

To make sure there wasn't something weird going on with my shell, I made a little Python script to make the request. Same outcome.

Some Googling shows that 42P01 is a Postgres error meaning "Table does not exist".

Then I tried disabling PGCache in the sample app's server.js so I could default to the in-memory cache. Now I can successfully add a host! (I added New Orleans as shown in the README.) Then I tried using the sample query on the README:

http://localhost:1337/socrata/nola/fwm6-d78i

This stack trace got printed to the console:

TypeError: Cannot read property 'info' of undefined
    at Object.module.exports.getInfo (/Users/willengler/Sandbox/koop-sample-app/node_modules/koop/lib/Local.js:62:35)
    at Cache.getInfo (/Users/willengler/Sandbox/koop-sample-app/node_modules/koop/lib/Cache.js:178:13)
    at /Users/willengler/Sandbox/koop-sample-app/node_modules/koop-socrata/models/Socrata.js:167:20
    at /Users/willengler/Sandbox/koop-sample-app/node_modules/koop/lib/Cache.js:133:23
    at Object.module.exports.select (/Users/willengler/Sandbox/koop-sample-app/node_modules/koop/lib/Local.js:34:7)
    at Cache.get (/Users/willengler/Sandbox/koop-sample-app/node_modules/koop/lib/Cache.js:132:13)
    at Object.socrata.getResource (/Users/willengler/Sandbox/koop-sample-app/node_modules/koop-socrata/models/Socrata.js:165:16)
    at /Users/willengler/Sandbox/koop-sample-app/node_modules/koop-socrata/controller/index.js:69:17
    at /Users/willengler/Sandbox/koop-sample-app/node_modules/koop-socrata/models/Socrata.js:49:9
    at Object.module.exports.serviceGet (/Users/willengler/Sandbox/koop-sample-app/node_modules/koop/lib/Local.js:159:7)

But then there was this hopeful message:

info: Processing: https://data.nola.gov/resource/fwm6-d78i.json?$order=:id&$limit=10000&$offset=1
info: Processing: https://data.nola.gov/resource/fwm6-d78i.json?$order=:id&$limit=10000&$offset=1
info: Beginning to page through https://data.nola.gov/resourcefwm6-d78i 1 Pages.
info: Beginning to page through https://data.nola.gov/resourcefwm6-d78i 1 Pages.
info: Finished paging Socrata:fwm6-d78i:0

Subsequent GETs to http://localhost:1337/socrata/nola/fwm6-d78i give me something like {"checked_at":"2015-09-28T17:38:39.725Z"} instead of the data I'm expecting.

Am I doing something wrong?

Config:

For reference, here's the contents of my default.json file:

{
  "server": {
    "port": 1337
  },
  "data_dir": "/usr/local/koop/",
  "db": {
    "conn": "koop://localhost/koop_test"
  }
}

I've deleted the https config file.
