rufuspollock-okfn / data.okfn.org-new

Simple data-package-based data portal (and the original site for the Frictionless Data effort)

Home Page: http://data.okfn.org/

JavaScript 33.53% CSS 11.06% HTML 55.42%

data.okfn.org-new's People

gregoryrhysevans zelima mikanebu ashepherd stephenabbott mr-vara

data.okfn.org-new's Issues

Rework site to reflect becoming just a registry

Move /data/ to / (?)
Get rid of everything not related to the registry
...

MapQuest developer tiles

When rendered here: http://data.okfn.org/tools/view?url=https%3A%2F%2Fgithub.com%2Fdatasets%2Fex-geojson#resource-example
the tiles do not show up due to restrictions in the MapQuest developer API. Screenshot attached.

Display field constraints in field table

Add a column to the field information table to display field constraints to provide additional human-readable metadata.
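A minimal sketch of what the new column could contain, assuming each field carries a Table Schema `constraints` object (keys such as `required`, `unique`, `minimum`, `maximum`, `minLength`, `maxLength`, `pattern`, `enum`). The function name `describeConstraints` and the output wording are hypothetical; unknown keys are shown verbatim rather than dropped.

```javascript
// Sketch: render a Table Schema field's `constraints` as short,
// human-readable text for an extra "Constraints" column.
function describeConstraints(field) {
  const constraints = (field && field.constraints) || {};
  const parts = [];
  for (const [key, value] of Object.entries(constraints)) {
    switch (key) {
      case "required":
      case "unique":
        if (value) parts.push(key); // only show flags that are true
        break;
      case "minimum":
        parts.push(`>= ${value}`);
        break;
      case "maximum":
        parts.push(`<= ${value}`);
        break;
      case "minLength":
        parts.push(`length >= ${value}`);
        break;
      case "maxLength":
        parts.push(`length <= ${value}`);
        break;
      case "pattern":
        parts.push(`matches /${value}/`);
        break;
      case "enum":
        parts.push(`one of: ${[].concat(value).join(", ")}`);
        break;
      default:
        // Unknown constraint: show it as-is rather than hiding it.
        parts.push(`${key}: ${JSON.stringify(value)}`);
    }
  }
  return parts.length > 0 ? parts.join("; ") : "none";
}

console.log(describeConstraints({
  name: "year",
  type: "integer",
  constraints: { required: true, minimum: 1900, maximum: 2100 },
}));
// prints: required; >= 1900; <= 2100
```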

Recline graph specs to vega view spec code

I think we should focus on doing this on the backend with Node.js before the code goes to the frontend
Should be very simple to test
Support all recline graph options (see the recline flot graph documentation) - line graph, bar chart (both ways)
- Multiple series (multiple lines)

How do we handle inlining the "resource" data - do we want to use datapackage-render-js?
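One possible answer to the inlining question, sketched under assumptions: a resource with inline `data` (assumed to be an array of row objects) is handed to Vega-Lite as `values`, and otherwise its `path` is resolved against the datapackage.json URL and handed over as `url`. This does not use datapackage-render-js; the name `vegaLiteData` is hypothetical, and multi-part (array) paths are not handled.

```javascript
// Sketch: pick the Vega-Lite `data` property for a Data Package resource.
function vegaLiteData(resource, datapackageUrl) {
  // Inline data (assumed to be an array of row objects) goes in as-is.
  if (Array.isArray(resource.data)) {
    return { values: resource.data };
  }
  if (typeof resource.path !== "string") {
    throw new Error("resource has neither inline data nor a single path");
  }
  // Resolve the relative path against where datapackage.json lives.
  const url = new URL(resource.path, datapackageUrl);
  const ext = url.pathname.split(".").pop().toLowerCase();
  const format = (resource.format || ext).toLowerCase();
  return format === "csv"
    ? { url: url.href, format: { type: "csv" } }
    : { url: url.href };
}
```

Passing a URL keeps specs small and lets the browser fetch the file; inlining only makes sense for small resources that already ship their rows in the descriptor.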

Track downloads

From @rgrp on September 13, 2014 12:47

Copied from original issue: frictionlessdata/frictionlessdata.io#129

Deprecate core-list.txt to the benefit of core-list.csv

See datasets/awesome-data#159

From: frictionlessdata/frictionlessdata.io#208

Two parts:

Support CSV as the source list rather than TXT
Switch the default link
Then ultimately we could drop the txt file in the datasets/registry repo
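A minimal sketch of the first part, accepting either list format during the deprecation period. The layouts are assumptions, not the registry's documented format: the TXT file is read as one data package URL per line (with `#` comment lines), and the CSV file as a header row plus a column named by the hypothetical `column` parameter (default `url`), split on plain commas with no quoted fields.

```javascript
// Sketch: read the registry's source list from core-list.txt or core-list.csv.
// Assumed layouts (hypothetical):
//   .txt: one data package URL per line; lines starting with "#" are comments
//   .csv: header row with a column named `column` (default "url");
//         plain comma splitting, no quoted fields.
function parseSourceList(text, filename, column = "url") {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
  if (!filename.toLowerCase().endsWith(".csv")) {
    return lines;
  }
  if (lines.length === 0) return [];
  const header = lines[0].split(",").map((h) => h.trim());
  const idx = header.indexOf(column);
  if (idx === -1) {
    throw new Error(`column "${column}" not found in ${filename}`);
  }
  return lines.slice(1).map((row) => (row.split(",")[idx] || "").trim());
}
```

Keeping both branches lets the default link switch to the CSV file while the TXT file is still accepted.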

Fix issue with API where data is not being served (breaking graphs etc)

e.g. this errors: http://data.okfn.org/data/core/finance-vix/r/vix-daily.csv

@pwalsh needs access to the Heroku backend to debug and fix ...

Fix anchor jump

@julmot:

In this commit we're referencing the following URL:

http://data.okfn.org/data/core/language-codes#resource-ietf-language-tags

As you can see it contains an anchor that should normally jump directly to the specified section. Unfortunately, when opening this URL, it shows a different section.

Display "last commit" date for each package

From @bluechi on April 26, 2015 20:40

Originally posted here
datasets/awesome-data#82

It would be helpful if the "last commit" date was displayed somewhere on the package view page and/or the main "Find Data Packages" page so that users can see when this package was last updated. Maybe even sort the packages according to the most recent commit.
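A minimal sketch of the display and sorting logic, assuming the last commit of each package repo has already been fetched from the GitHub REST API (`GET https://api.github.com/repos/{owner}/{repo}/commits?per_page=1`, whose items carry `commit.committer.date`). The `lastCommit` field on each package and both function names are hypothetical.

```javascript
// Sketch: extract the newest commit date from a GitHub commits API response
// (an array of commit objects, newest first).
function lastCommitDate(commitsResponse) {
  const first = commitsResponse[0];
  return first ? first.commit.committer.date : null;
}

// Sort packages so the most recently updated come first; packages with no
// known date go last. `lastCommit` is a hypothetical ISO 8601 string field.
function sortByLastCommit(packages) {
  return [...packages].sort((a, b) => {
    if (!a.lastCommit) return b.lastCommit ? 1 : 0;
    if (!b.lastCommit) return -1;
    return Date.parse(b.lastCommit) - Date.parse(a.lastCommit);
  });
}
```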

Copied from original issue: frictionlessdata/frictionlessdata.io#175

Data package viewer link to metadata points to the wrong location

From @mihi-tr on October 10, 2013 12:51

The data package viewer link to metadata is hard-wired to point to http://data.okfn.org rather than to where the JSON actually comes from.
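A minimal sketch of a fix, assuming the viewer knows the URL it loaded the package from (called `sourceUrl` here, a hypothetical name) and builds the link from that with the standard `URL` class instead of a fixed host.

```javascript
// Sketch: derive the metadata link from the URL the descriptor was
// actually loaded from. `sourceUrl` may be the package directory or the
// datapackage.json URL itself.
function metadataLink(sourceUrl) {
  const url = new URL(sourceUrl);
  if (url.pathname.endsWith("/datapackage.json")) {
    return url.href;
  }
  if (!url.pathname.endsWith("/")) {
    url.pathname += "/"; // treat the URL as a directory
  }
  return new URL("datapackage.json", url).href;
}
```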

Copied from original issue: frictionlessdata/frictionlessdata.io#79

"Data" page layout is messed up

Under "Country and Regional Analyses (CRA) - UK Government Finances" and "Hotline SOS Démocratie 2013" in the Community Datasets list, there are two gigantic lists of sources that should be shortened for the sake of design and page layout.

Looking more closely, it seems both packages have been poorly packaged. Even though this is the community section, I think we have to do something about it, as it makes the website unreadable.

http://prntscr.com/b884pm

http://prntscr.com/b884zk
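One way to keep such packages from breaking the layout, sketched as an assumption rather than a decided design: render only the first few entries of the package's `sources` list and summarise the rest. The cut-off of 5, the function name `summarizeSources` and the fallback to `path`/`web` when a source has no `title` are hypothetical choices.

```javascript
// Sketch: show at most `max` sources and summarise the remainder.
function summarizeSources(sources, max = 5) {
  const list = Array.isArray(sources) ? sources : [];
  const shown = list
    .slice(0, max)
    .map((s) => s.title || s.path || s.web || "untitled source");
  const hidden = list.length - shown.length;
  return { shown, more: hidden > 0 ? `and ${hidden} more` : null };
}
```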

Nice Owner/User page in data section

From @rgrp on June 7, 2014 14:50

Pages like: /data/{username}

Suggest

Pull their gravatar and name (from github API)
- Do we want to cache this?
List their data packages (as per normal list)
- Bonus for allowing quick filtering a la the main listing page

Note that the "core" user will be special ... (we will need to set their gravatar specially)
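A minimal sketch of the two pieces suggested above, under assumptions: profile lookups (for example backed by `https://api.github.com/users/{username}`, which returns `name` and `avatar_url`) are memoized with a time-to-live, which is one possible answer to "Do we want to cache this?", and the listing filters packages on a hypothetical `owner` field. The one-hour TTL and both function names are illustrative.

```javascript
// Sketch: memoize owner profile lookups for /data/{username} pages.
// `fetchProfile(username)` returns a profile or a Promise of one.
function cachedProfileLookup(fetchProfile, ttlMs = 60 * 60 * 1000, now = Date.now) {
  const cache = new Map(); // username -> { promise, expires }
  return function lookup(username) {
    const hit = cache.get(username);
    if (hit && hit.expires > now()) {
      return hit.promise;
    }
    const promise = Promise.resolve(fetchProfile(username));
    // Do not keep failed lookups around; retry on the next request.
    promise.catch(() => cache.delete(username));
    cache.set(username, { promise, expires: now() + ttlMs });
    return promise;
  };
}

// All packages owned by `owner`, sorted by name (no pagination for now).
function packagesForOwner(packages, owner) {
  return packages
    .filter((pkg) => pkg.owner === owner)
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

The special "core" user could be handled by wrapping `fetchProfile` so it returns a fixed profile for that name.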

User Stories

As a Data User I want to see all the Data Packages produced by a particular organization or user so that I can find new ones that are relevant to me

Especially, useful for core datasets
List all datasets (no pagination for now!)

As a Data User I want to see all the Data Packages produced by a particular organization or user so that I can get a sense of the quality of their work

Copied from original issue: frictionlessdata/frictionlessdata.io#111

Normalize licenses and license names and display in dataset view

From @rgrp on July 3, 2013 9:06

At the moment it is not clear exactly what is required for licenses: some of the time we just have ids and other times names and URLs. We want to ensure that, given an id, we always have a name and URL; we could look these up from licenses.opendefinition.org ...

In terms of the interface we want to also handle the unknown case (should that ever happen!!)

This would be part of the tools datapackage normalize code.
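A minimal sketch of such a normalize step, assuming licenses arrive as a bare id string, an older `{ id, url }` object, or a `{ name, path, title }` object. `LICENSES` is a tiny illustrative table (not the opendefinition.org data, whose file layout is not assumed here), and `normalizeLicense` is a hypothetical name. The unknown case always yields a displayable title.

```javascript
// Sketch: always produce { id, title, url } for a license.
// LICENSES is illustrative; in practice it could be filled from
// licenses.opendefinition.org.
const LICENSES = {
  "CC-BY-4.0": {
    title: "Creative Commons Attribution 4.0",
    url: "https://creativecommons.org/licenses/by/4.0/",
  },
  "CC0-1.0": {
    title: "CC0 1.0",
    url: "https://creativecommons.org/publicdomain/zero/1.0/",
  },
};

// Accepts a bare id string, { id, url }, or { name, path, title }.
function normalizeLicense(license) {
  if (!license) {
    return { id: null, title: "Unknown license", url: null };
  }
  const obj = typeof license === "string" ? { id: license } : license;
  const id = obj.id || obj.name || null;
  const known = id ? LICENSES[id] : undefined;
  return {
    id,
    // Prefer what the package says, then the lookup table, then the id.
    title: obj.title || (known && known.title) || id || "Unknown license",
    url: obj.url || obj.path || (known && known.url) || null,
  };
}
```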

Copied from original issue: frictionlessdata/frictionlessdata.io#55

Datasets Data URLs and API generally

From @rgrp on February 24, 2013 18:26

This issue is about the URL / API structure for accessing data (and metadata) from the data packages.

Current Situation

For stuff under /data/: /data/{dataset}/datapackage.json and /data/{dataset}.csv
For other stuff either at /tools/view/ or /community/ via: http://data.okfn.org/tools/dataproxy/?url={path-to-csv} (though this is not much different from datapipes.okfnlabs.org/csv/raw/?url=.... and leaves much to be desired)

Proposal

/data/ + /community/ data packages

For /data/ and /community/ data packages:

/.../{dataset}/datapackage.json     # the datapackage.json file

## data urls
/.../{dataset}/r/{resource-name-or-order}.{format}  

so e.g.

/.../gdp/r/annual.csv   # resource name
/.../gdp/r/0.csv           # resource by index

Formats that we should support would be:

{format} = csv | json | html | raw (by default)
{resource-name} = the name as given in the resources entry. (Also allow order, e.g. 1 for the first resource, 2 for the second resource, etc.)
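A minimal routing sketch for the proposed `/{dataset}/r/{resource-name-or-order}.{format}` URLs. Assumptions: a name match takes precedence over an index; indexes are 0-based, following the `/gdp/r/0.csv` and `/api/data/0.csv` examples (the `{resource-name}` note mentions 1-based order, so this still needs settling); a missing or unrecognised extension means `raw`. `parseResourcePath` and `findResource` are hypothetical names.

```javascript
// Sketch of the proposed resource URL routing.
const FORMATS = new Set(["csv", "json", "html", "raw"]);

// "/data/core/gdp/r/annual.csv" -> { dataset, resource, format }
function parseResourcePath(pathname) {
  const m = pathname.match(/\/([^/]+)\/r\/([^/]+)$/);
  if (!m) return null;
  const [, dataset, last] = m;
  const dot = last.lastIndexOf(".");
  const ext = dot === -1 ? "" : last.slice(dot + 1);
  if (FORMATS.has(ext)) {
    return { dataset, resource: last.slice(0, dot), format: ext };
  }
  return { dataset, resource: last, format: "raw" }; // raw by default
}

// Resolve by resource name first, then by 0-based index.
function findResource(resources, nameOrIndex) {
  const byName = resources.find((r) => r.name === nameOrIndex);
  if (byName) return byName;
  if (/^\d+$/.test(nameOrIndex)) {
    return resources[Number(nameOrIndex)] || null;
  }
  return null;
}
```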

Addressing individual elements

Longer-term we could support addressing individual elements, e.g. rows or cells in a dataset:

.../gdp/r/annual/5/        # row 5 of this dataset, rendered as HTML by default
.../gdp/r/annual/5.csv  # in CSV format
.../gdp/r/annual/5/year/  # cell in row 5, field year (in HTML form by default)

.../{dataset}/r/{resource-name-or-index}/{row-index-or-primary-key}[.html | .csv | .json]
.../{dataset}/r/{resource-name-or-index}/{row-index}/{field-name-or-index}[.html | .csv | .json]

Questions:

How do we distinguish a row index from a primary key when both are numeric (which takes precedence?) - I'd argue the PK should take precedence, with an explicit index form such as i:{number}
- That said, an index is always possible whereas a primary key may be absent ...
Support for ranges - see approach to this in datapipes
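A minimal sketch of one answer to the precedence question, assuming rows are arrays of cell values, `primaryKey` is a single field name (Table Schema also allows a list, not handled here) and indexes are 0-based: a bare value is looked up as a primary key, `i:{number}` is always a row index, and a bare number is read as an index only when there is no primary key. `selectRow` is a hypothetical name; ranges are not covered.

```javascript
// Sketch: resolve a row selector with primary key precedence.
function selectRow(rows, schema, selector) {
  // Explicit index form always works, PK or not.
  const idx = selector.match(/^i:(\d+)$/);
  if (idx) return rows[Number(idx[1])] ?? null;

  const pk = schema.primaryKey;
  if (typeof pk === "string") {
    const col = schema.fields.findIndex((f) => f.name === pk);
    const hit = rows.find((row) => String(row[col]) === selector);
    if (hit) return hit;
    return null; // with a PK, a bare value never falls back to an index
  }
  // No primary key: a bare number is a row index.
  if (/^\d+$/.test(selector)) return rows[Number(selector)] ?? null;
  return null;
}
```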

Data packages somewhere online

We follow a scheme similar to the one above, but instead of putting the data package name in the URL path we move the data package URL into the query string:

/api/datapackage.json?url={datapackage-url}
/api/data/{resource-name-or-index}.{format}?url={datapackage-url}

# e.g. this returns the first resource as CSV
/api/data/0.csv?url=https://raw.github.com/datasets/browser-stats/master/datapackage.json
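A minimal sketch of building these URLs on the client side. The host default and the name `apiDataUrl` are assumptions; `URLSearchParams` handles escaping of the data package URL, so the `url` parameter decodes back to exactly what was passed in.

```javascript
// Sketch: build /api/data/{resource-name-or-index}.{format}?url={datapackage-url}
function apiDataUrl(datapackageUrl, resource = 0, format = "csv", host = "http://data.okfn.org") {
  const url = new URL(`/api/data/${encodeURIComponent(resource)}.${format}`, host);
  url.searchParams.set("url", datapackageUrl);
  return url.href;
}

console.log(apiDataUrl("https://raw.github.com/datasets/browser-stats/master/datapackage.json"));
```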

Discussion

data.json is the serialization done in the most obvious way, i.e. converted to a hash
- alternatively, provide this in a results-style format (and include the schema)
Should we use the download attribute to set the filename ...?
- Not needed in above
~~(Now supported) How do we handle multiple data resources / files?~~
- ~~worry about that in the future - so only support first resource for the moment (this is good as it privileges single resource data packages ...)~~
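The two JSON shapes from the first discussion point can be sketched as follows, starting from parsed CSV rows with the header row first. Reading "convert to a hash" as "each row becomes an object keyed by field name" is an interpretation, and the `{ schema, data }` layout for the results style is an assumption, not a settled format.

```javascript
// Sketch: plain serialization, one object per row keyed by header name.
function rowsToObjects(table) {
  const [header, ...rows] = table;
  return rows.map((row) =>
    Object.fromEntries(header.map((name, i) => [name, row[i]]))
  );
}

// Sketch: "results style", the same rows plus the resource's schema.
function resultsStyle(table, schema) {
  return { schema, data: rowsToObjects(table) };
}
```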

Appendix

Alternatives

Alternatively, the URLs could be:

{dataset}/{filename}.csv
{dataset}/{filename}.json (CORS enabled ...)

{dataset}/data.csv

I think the former is better ...

Copied from original issue: frictionlessdata/frictionlessdata.io#19

Show contributors / maintainers / publisher etc in sidebar

From @rgrp on May 18, 2014 18:52

Copied from original issue: frictionlessdata/frictionlessdata.io#105

Absolutely positioned "Improve This Page"

https://twitter.com/BobData/status/781573451588964352

Start on some tests (focus on data package view page)

Test that a view shows up
Test there is some content
Test that tables show up

Include Issues in dataset page

From @rgrp on January 13, 2013 14:27

The simplest option is to link to GitHub issues

Improvements:

Show number of issues
Link for a new issue (which states which file we are on, for datasets with multiple files ...)
JavaScript popup of the issues ...
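A minimal sketch of the "link for new issue" improvement, relying on GitHub's `/issues/new` page accepting `title` and `body` query parameters to pre-fill the form. The function name and the wording of the pre-filled text are hypothetical.

```javascript
// Sketch: link to GitHub's new-issue form, pre-filled with the file
// currently being viewed (useful for datasets with multiple files).
function newIssueUrl(owner, repo, filePath) {
  const url = new URL(`https://github.com/${owner}/${repo}/issues/new`);
  url.searchParams.set("title", `Problem with ${filePath}`);
  url.searchParams.set("body", `File: ${filePath}\n\nDescribe the problem here.`);
  return url.href;
}
```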

Copied from original issue: frictionlessdata/frictionlessdata.io#10

Show recent changes to dataset (data package) on data view page

From @rgrp on July 28, 2013 8:18

As a Visitor I want to see what changes there have been to this data package recently so that I get a sense of its status, up-to-dateness, etc.

Cost: 4h

Implementation

Pull info from github changesets and display
- do we do this with JS or ...?
Show in RHS sidebar (?)
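A minimal sketch of the display step, assuming `commits` is the JSON array returned by the GitHub REST API (`GET https://api.github.com/repos/{owner}/{repo}/commits`), whose items carry `sha`, `html_url`, `commit.message` and `commit.author`. Whether this runs in JS or on the server is left open, as in the issue; `recentChanges` and the limit of 5 are hypothetical.

```javascript
// Sketch: turn a GitHub commits API response into sidebar entries.
// Only the first line of each commit message is kept.
function recentChanges(commits, limit = 5) {
  return commits.slice(0, limit).map((c) => ({
    sha: c.sha.slice(0, 7),
    date: c.commit.author.date.slice(0, 10), // YYYY-MM-DD
    author: c.commit.author.name,
    summary: c.commit.message.split("\n")[0],
    link: c.html_url,
  }));
}
```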

Copied from original issue: frictionlessdata/frictionlessdata.io#57

JSON data link for non-core datasets is broken

From @rgrp on February 8, 2015 12:53

e.g. http://data.okfn.org/data/core/occupations/r/occupations.csv.json

Copied from original issue: frictionlessdata/frictionlessdata.io#150

Fixes for vega time series views ...

@zelima to fill this in ...

I suggest working on this, perhaps even in a separate small repo, with some data and some hand-constructed Vega views ...

[docs] Add examples of "core" data that needs cleaning up to slide decks

From @rgrp on August 16, 2014 11:38

Add these to the front page and core datasets slide decks (http://data.okfn.org/roadmap/core-datasets)

Examples

Data needs cleaning - CO2 data datasets/awesome-data#56
Lack of type info - GDP data (?)

Copied from original issue: frictionlessdata/frictionlessdata.io#121

Replace Recline with Vega views

vega => html working
recline graph spec => vega(-lite)
- Keep recline
Replace "recline" bootup code in https://github.com/okfn/data.okfn.org-new/blob/master/public/js/catalog.js#L27 with new vega bootup code
- TODO: what about the slickgrid table (???)
All done
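A minimal sketch of what the new bootup code could look like; the existing catalog.js code is not reproduced here. It walks the package's `views`, renders `vega-lite` views through an injected `embed` function (in the browser this would be vega-embed's `vegaEmbed(el, spec, options)`), and hands every other view type, such as legacy recline `Graph` views, to a fallback. `bootViews`, `containerFor` and `fallback` are hypothetical names.

```javascript
// Sketch: boot all views of a data package.
// embed(el, spec, options) -> Promise   (e.g. vegaEmbed in the browser)
// containerFor(view)       -> element or CSS selector for that view
function bootViews(datapackage, embed, containerFor, fallback = () => {}) {
  const jobs = [];
  for (const view of datapackage.views || []) {
    if (view.type === "vega-lite") {
      jobs.push(embed(containerFor(view), view.spec, { actions: false }));
    } else {
      fallback(view); // e.g. keep recline for now
    }
  }
  return Promise.all(jobs);
}
```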

Recline spec to Vega

      "id": "Graph",
      "type": "Graph",
      "state": {
        "graphType": "lines",
        "group": "Date",
        "series": [ "VIXClose" ]
      }

=> Vega(-lite)

      "id": "Graph",
      "type": "vega-lite",
      "spec": {
         // appropriate vega spec here ... e.g. for example above a line graph ...
      }
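A minimal sketch of the conversion shown above, producing a view of the same `{ id, type: "vega-lite", spec }` shape. Assumptions: recline `lines`, `points` and `lines-and-points` map to line/point marks, `columns` are vertical bars and `bars` horizontal bars; every series is folded into `(series, value)` pairs with Vega-Lite's `fold` transform, coloured by series only when there is more than one; `data` is a Vega-Lite data object and `fields` is the resource's Table Schema field list, used to pick the axis type of the group field. Function names are hypothetical.

```javascript
// Sketch: recline Graph view -> Vega-Lite view.
function fieldType(fields, name) {
  const field = (fields || []).find((f) => f.name === name);
  const type = field ? field.type : undefined;
  if (["date", "datetime", "year"].includes(type)) return "temporal";
  if (["number", "integer"].includes(type)) return "quantitative";
  return "ordinal";
}

const MARKS = {
  lines: "line",
  points: "point",
  "lines-and-points": { type: "line", point: true },
  bars: "bar",    // assumed horizontal in recline
  columns: "bar", // assumed vertical in recline
};

function reclineToVegaLite(view, data, fields) {
  const { graphType, group, series } = view.state;
  const groupEnc = { field: group, type: fieldType(fields, group) };
  const valueEnc = { field: "value", type: "quantitative" };
  const horizontal = graphType === "bars"; // swap axes for horizontal bars
  const encoding = {
    x: horizontal ? valueEnc : groupEnc,
    y: horizontal ? groupEnc : valueEnc,
  };
  if (series.length > 1) {
    encoding.color = { field: "series", type: "nominal" };
  }
  return {
    id: view.id,
    type: "vega-lite",
    spec: {
      data,
      // fold turns each series column into (series, value) rows
      transform: [{ fold: series, as: ["series", "value"] }],
      mark: MARKS[graphType] || "line",
      encoding,
    },
  };
}
```

Folding even a single series keeps one code path for the "multiple lines" case; the resulting spec can be rendered directly with vega-embed.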

Website down

Hello! :) Is the tool expected to be down? http://data.okfn.org/tools/view

I was linking to this tool in a few places, and just curious whether it's coming back. Thanks!

Site supports vega views

Given a data package with a Vega view, the site renders it correctly

Data version API

From @trickvi on June 20, 2013 12:37

I would like to be able to use data.okfn.org as an intermediary between my software and the data packages it uses and be able to quickly check whether there's a new version available of the data (e.g. if I've cached the package on a local machine).

There are ways to do it with the current setup:

Download the datapackage.json descriptor file, parse it, get the version and check it against my local version. Problems:
- This solution relies on humans updating the version, and there might not be any consistency in it, since the data package standard describes the version attribute as: "a version string conforming to the Semantic Versioning requirement"
- I have to fetch the whole datapackage.json (it's not big, I know, but why download all that extra data I might not even want?)
Go around data.okfn.org and look directly at the github repository. Problems:
- I have to find out where the repo is, use git and do a lot of extra stuff (I don't care how the data packages are stored, I just want a simple interface to fetch them)
- What would be the point of data.okfn.org/data? In my mind it collects data packages and provides a consistent interface to get them irrespective of how they're stored.

I propose that data.okfn.org provide an internal mechanism that lets users quickly check whether a new version might have been released. This does not have to be an API. We could leverage HTTP's caching mechanism using an ETag header containing some hash value. This hash value could, for example, be the SHA of the head ref object served via the GitHub API:

https://api.github.com/repos/datasets/cpi/git/refs/heads/master

Software that works with data packages could then implement a caching strategy: send a GET request for datapackage.json with an If-None-Match header, and either receive a new version of the descriptor (and look at the version in that file) or, if nothing has changed, serve the data from its cache.
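A minimal server-side sketch of the proposal, assuming the ETag is the quoted SHA of the dataset repo's master ref (the `object.sha` field in the response from `https://api.github.com/repos/datasets/cpi/git/refs/heads/master`). `conditionalResponse` is a hypothetical name and returns a plain object rather than writing to a real HTTP response. Per HTTP conditional-request rules, `If-None-Match` may list several tags (compared weakly, so a `W/` prefix is ignored) or be `*`.

```javascript
// Sketch: answer a GET for datapackage.json with ETag support.
// Returns 304 (no body) when the client already has the current version.
function conditionalResponse(ifNoneMatch, sha, body) {
  const etag = `"${sha}"`;
  const candidates = (ifNoneMatch || "")
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, "")); // weak comparison
  if (candidates.includes("*") || candidates.includes(etag)) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body };
}
```

A client stores the `ETag` from its last 200 response and sends it back as `If-None-Match`; a 304 means its cached copy is still current.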

Copied from original issue: frictionlessdata/frictionlessdata.io#51

View not loading

The view is not loading. "Loading View" spins forever.
Please check here

I would like to fix this issue, but there is little information on how to contribute. Also, the default configuration is broken, as these URLs are not found: https://raw.githubusercontent.com/datasets/registry/master/catalog-list.txt and https://raw.githubusercontent.com/datasets/registry/master/datapackage-list.txt

Create dataset groups (tagging)

I'd like to have all datasets relating to a topic grouped together (e.g. all datasets related to maritime transportation).
This could be done by adding tags as keywords in the datapackage.json and having pages on the website that list datasets by keyword.
Example: http://data.okfn.org/data/keywords/container would load all datasets related to containers (container codes, IMDG code, and so on)
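A minimal sketch of the lookup behind such a page, assuming each package's datapackage.json carries a `keywords` array (a standard Data Package property) and that matching should be case-insensitive. Both function names are hypothetical.

```javascript
// Sketch: packages for a /data/keywords/{keyword} page.
function packagesWithKeyword(packages, keyword) {
  const wanted = keyword.toLowerCase();
  return packages.filter((pkg) =>
    (pkg.keywords || []).some((kw) => String(kw).toLowerCase() === wanted)
  );
}

// Sketch: keyword -> package names, e.g. for an index of all keywords.
function keywordIndex(packages) {
  const index = new Map();
  for (const pkg of packages) {
    for (const kw of pkg.keywords || []) {
      const key = String(kw).toLowerCase();
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(pkg.name);
    }
  }
  return index;
}
```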

Replace (recline) slickgrid grids with handsontable grids

Can we test this code in some way - sinon and/or mocha ...