okfn / dataportals.org

Open Data Portals and Sites around the world

Home Page: http://dataportals.org/

JavaScript 31.26% Python 7.13% CSS 9.73% HTML 48.86% Procfile 0.04% Dockerfile 2.98%

open-data open-datasets open-knowledge-international metadata csv json

dataportals.org's People

mhoneyman defvol hlainchb multimac59 pretty00butt wphackedhelp jgkim stephen-gates jrovegno sniderna harry-wood okfnscot stevage marks pieterjanpauwels papajo mmdolbow azolnai linkfar dongoginho conocimientoabierto todrobbins camillem diemesleno claudielarouche jessecrocker jbothma arkabide braicauc kordovero slowe margot-tb anuveyatsu nikeshbalami vcgato29 alexfriant augusto-herrmann chrisgorgo lhfei gaybro8777 warezaddict-com hoenie-ams data-equity appdulrahman jdukpa timwis dalavancloud stalbertgis roll stephenabbott fagan2888 mdheller goo160047co vvying forschung jwyg restuccia mahrd7 guilhermeiablo twitchsnitchdotcom brierjon arjun-gautam opentechcommunity afeld foxshakeitoff adriens ali-hasrat mikanebu fanirani101 0charliecat charlie-forked bcpi-data-and-insight pax luismossburger jv-ai mattiasaxell rodrigozaroni

dataportals.org's Issues

Switch to NodeJS

Going for Node over pure browser JS because:

- Ability to mimic existing site URLs precisely (no #! hash-bang URLs)
- SEO
- Performance, fewer cross-browser worries, etc.

Admin workflow for adding/updating catalogs?

I'd like to add/update a few Irish catalogs. As an admin of the old CKAN-based site, I could have done that easily, so what's the workflow now? Should I still go through the Google Form, or submit a pull request with changes to /data/catalogs.csv, or…?

Support for reloading the database

At the moment, reloading the database after a change requires running heroku restart. Provide a URL such as admin/reload/ that reloads the spreadsheet DB on demand.
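A minimal sketch of an on-demand reload, written in Python for illustration even though the site itself runs on Node. It assumes the DB can be fetched as CSV text by some loader function; the names CatalogStore and handle_admin_reload are hypothetical, not part of the codebase.

```python
import csv
import io
import threading
from typing import Callable, Dict, List


class CatalogStore:
    """Hypothetical in-memory catalog DB that can be reloaded without a restart."""

    def __init__(self, fetch_csv: Callable[[], str]) -> None:
        self._fetch_csv = fetch_csv  # e.g. a function that downloads the published sheet as CSV
        self._lock = threading.Lock()
        self._records: List[Dict[str, str]] = []
        self.reload()

    def reload(self) -> int:
        """Fetch and parse the CSV outside the lock, then swap the records in one step."""
        rows = list(csv.DictReader(io.StringIO(self._fetch_csv())))
        with self._lock:
            self._records = rows  # readers see the old list or the new one, never a partial load
        return len(rows)

    def records(self) -> List[Dict[str, str]]:
        with self._lock:
            return list(self._records)


def handle_admin_reload(store: CatalogStore) -> Dict[str, int]:
    """Body of a hypothetical admin/reload/ endpoint: reload and report the row count."""
    return {"reloaded": store.reload()}
```

Parsing happens before the lock is taken, so a slow download does not block requests that are reading the current records.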

Decide metadata structure for new version of database (as of 2014)

We have been migrating to a new DB (a Google spreadsheet!).

The question is: what fields should we have?

Suggested fields are outlined in this spreadsheet.

Questions

New? Add maintainer and maintainer_url fields for the organization or person maintaining the portal.
Modify? license field: should we normalize it in some way? Do we want a link to the license page, or to evidence on the source website?
Delete? Remove metadata_modified and metadata_created? The latter is easy to support, but the former is hard with a Google spreadsheet DB.
New? open: a boolean field indicating whether the data in the portal is open or not
- could also have an open_percent field for mixed portals
New? date_launched: the date the portal launched
New/Change? A field like metadata_contributor to credit people contributing info (currently very spotty and not systematic).
Change? Merge groups and tags; there is no reason to keep them separate.
New? generator: name of the software that powers the portal

Make search box accessible from all pages

The search box is not available on:
- http://datacatalogs.org/search (the 'browse' page)
- http://datacatalogs.org/catalog/[name] (individual record pages)

To reach the search box, a user must visit the application root, http://datacatalogs.org/

Consider moving the search box into the header (navbar navbar-fixed-top) for a consistent user experience and easier access to search.

I will attempt a PR if this is agreed to be worthwhile.

Bulk download instructions on the About page

Geocode all existing catalogs in the DB

Some have been done, but more remain.

This can generally be done by first cleaning up the place field and then using ImportXML plus Nominatim; see http://schoolofdata.org/2013/02/19/geocoding-part-ii-geocoding-data-in-a-google-docs-spreadsheet/
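The ImportXML approach runs inside the sheet. As an alternative for batch runs, the same Nominatim search service can be called from a script. A sketch, assuming the cleaned-up place field is passed in as free text; the function names and the user-agent value are placeholders. Nominatim's public server asks for an identifying User-Agent and at most one request per second.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional, Tuple

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_query_url(place: str) -> str:
    """Free-form Nominatim search for one place, best match only, JSON output."""
    params = {"q": place, "format": "json", "limit": 1}
    return NOMINATIM_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_result(body: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a Nominatim JSON response, or None if nothing matched."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns lat/lon as strings
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(place: str, user_agent: str) -> Optional[Tuple[float, float]]:
    """Look up one place; user_agent should identify your script and contact details."""
    req = urllib.request.Request(build_query_url(place), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    time.sleep(1)  # stay under the public server's limit of one request per second
    return parse_result(body)
```

Splitting URL building and response parsing from the network call keeps those parts testable offline.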

Fields do not match input Google Sheet for one record

The values displayed on the front end for this record, http://datacatalogs.org/catalog/alabama, do not map correctly for the Issued and License fields:
Issued = Active, License = US/en

I couldn't find any other record with a similar issue.

Nice Map on Front Page

- Leaflet / Recline map on the front page
- Simple search

About page

copy over

No indication when there are no search results

When you type a search query that has no results there's no explicit indication that there are no results.

Perhaps this issue could be included as part of #16?

Deploy on Heroku and set up DNS

Follow https://github.com/okfn/datasets.okfnlabs.org#deployment-to-heroku

Deploy at http://new.datacatalogs.org for the present

Get a current export of the datacatalogs.org data and put it in a spreadsheet

Consider rename from catalogs to portals

'Open data portals' is perhaps a better and more appropriate name than 'open data catalogs'.

The site URL would change from datacatalogs.org to dataportals.org.

Metadata v2.1

Update the database with the new fields (the field changes are listed in the spreadsheet here):
- author -> renamed to publisher
- groups -> removed; tags covers this
- license -> renamed to license_id
- metadata_modified -> removed; too hard to maintain
- New fields:
  - license_url
  - issued
  - publisher_classification
  - generator
  - api_endpoint
  - api_type
  - full_metadata_download
Update the submission form with the relevant fields
Update the web app if appropriate
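The field changes above can be expressed as a small per-record migration. A sketch, assuming each record is a dict of column name to string and that tags and groups are stored as space-separated strings (an assumption about the sheet layout). license_open is left out because it is still under discussion.

```python
from typing import Dict, List

# Field changes from the Metadata v2.1 list
RENAMED = {"author": "publisher", "license": "license_id"}
DROPPED = {"groups", "metadata_modified"}
NEW_FIELDS = [
    "license_url", "issued", "publisher_classification", "generator",
    "api_endpoint", "api_type", "full_metadata_download",
]


def split_terms(value: str) -> List[str]:
    # Assumption: tags and groups are space-separated strings
    return value.split()


def migrate_record(old: Dict[str, str]) -> Dict[str, str]:
    """Apply the v2.1 renames and removals, and add the new (empty) fields."""
    new: Dict[str, str] = {}
    for key, value in old.items():
        if key not in DROPPED:
            new[RENAMED.get(key, key)] = value
    # groups disappear: fold them into tags, keeping first-seen order without repeats
    merged = split_terms(old.get("tags", "")) + split_terms(old.get("groups", ""))
    new["tags"] = " ".join(dict.fromkeys(merged))
    # new fields start empty so every record has the same column set
    for field in NEW_FIELDS:
        new.setdefault(field, "")
    return new
```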

Note: the field 'license_open' is still up for discussion. We will now have scope to put a page explaining the situation for mixed licenses under license_url if 'license_id' cannot be filled in or is filled in with something ambiguous. The question is whether catalogs that don't have a single license ought to have license_open set to FALSE or empty.

A further discussion topic is whether we want to generate stats on the catalogs (e.g. number of datasets, or when the catalog was last updated) and store them in some way.

Consider adding API Endpoint to footer.html

Consider making the catalogue's API easier for others to discover by placing a hyperlink to http://datacatalogs.org/api/data.json in the footer.

Incorrect Portals Total

The Google sheet (curated tab) has 426 rows. As one row is a header, there should be 425 portals, but the website shows 424.
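One way to track down the missing portal is to count the data rows in a CSV export of the curated tab and look for rows a loader might silently drop or merge. A sketch, assuming portals are keyed by a name column; blank or duplicate keys are only one plausible cause of the off-by-one, not a confirmed diagnosis.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Union


def audit_rows(csv_text: str, key: str = "name") -> Dict[str, Union[int, List[str]]]:
    """Count data rows and flag rows a loader might drop or merge."""
    # DictReader consumes the header line and silently skips completely blank lines
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    keys = [(r.get(key) or "").strip() for r in rows]
    counts = Counter(k for k in keys if k)
    return {
        "data_rows": len(rows),
        "blank_keys": sum(1 for k in keys if not k),
        "duplicate_keys": sorted(k for k, n in counts.items() if n > 1),
    }
```

If data_rows comes out at 425 with no blank or duplicate keys, the discrepancy is more likely in the site's own loading or counting code.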

Move database to CSV

It could be a CSV published from Google, a Data Package, or anything else.
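If the Data Package route is chosen, the CSV needs a datapackage.json descriptor alongside it. A sketch that derives a minimal descriptor from the CSV header, assuming the file lives at data/catalogs.csv (the path mentioned in the admin-workflow issue) and typing every column as a string; the package and resource names are placeholders.

```python
import csv
import io
import json
from typing import Dict


def build_descriptor(csv_text: str, path: str = "data/catalogs.csv") -> Dict[str, object]:
    """Minimal Data Package descriptor for a single CSV resource."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "name": "dataportals",  # placeholder package name
        "resources": [
            {
                "name": "catalogs",  # placeholder resource name
                "path": path,
                "format": "csv",
                "schema": {
                    # every column typed as string; refine (date, boolean, ...) later
                    "fields": [{"name": col, "type": "string"} for col in header]
                },
            }
        ],
    }


if __name__ == "__main__":
    sample = "name,title,url\nexample,Example Portal,http://example.org\n"
    print(json.dumps(build_descriptor(sample), indent=2))
```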

Use of Inactive Tag?

A portal was reported as no longer being available. I changed its value in the Google sheet from active to inactive, thinking this would hide the portal on DataPortals.org. This only changed the Status; the portal still appears on the website.

How should a portal be removed or hidden?

- Delete the row in the Google sheet
- Change the query to not select inactive sites
- Some other trick?
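The second option, changing the query to skip inactive portals, could look like the sketch below. It assumes records are dicts with a status column holding values such as active or inactive, and compares case-insensitively because the sheet is edited by hand.

```python
from typing import Dict, Iterable, List


def visible_portals(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep every record whose status is not 'inactive' (case and surrounding spaces ignored)."""
    return [
        r for r in records
        if (r.get("status") or "").strip().lower() != "inactive"
    ]
```

Records with an empty status stay visible, and filtering rather than deleting keeps the row (and its history) in the sheet.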

Create a map on the site

We can simply copy the one at http://census.okfn.org
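Whichever map code is reused, the portals need to reach it as points. A sketch that converts records into a GeoJSON FeatureCollection, which Leaflet can display with its GeoJSON layer; the lat and lon column names are assumptions, and records without valid coordinates are skipped.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def coords(record: Dict[str, str]) -> Optional[Tuple[float, float]]:
    """Parse (lat, lon) from a record, or None if missing or out of range."""
    try:
        lat = float(record["lat"])
        lon = float(record["lon"])
    except (KeyError, TypeError, ValueError):
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None  # also rejects NaN, since comparisons with NaN are false
    return lat, lon


def to_geojson(records: Iterable[Dict[str, str]]) -> Dict[str, object]:
    features: List[Dict[str, object]] = []
    for r in records:
        c = coords(r)
        if c is None:
            continue  # not geocoded yet
        lat, lon = c
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": r.get("name", ""), "title": r.get("title", "")},
        })
    return {"type": "FeatureCollection", "features": features}
```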

Add prominent LOD2 logo / reference

Migrate from current ckan instance to new setup (2014)

#3 - shut down submissions to the old instance
- As this is not easy, we may just hope no one submits and then move the old instance to e.g. old.datacatalogs.org
Pull the data and import it into the Google spreadsheet (repeat the process from last time)
- #24 - metadata structure for the new DB
- #25 - do the data migration
Switch sites (DNS etc.)
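For the "pull the data" step, CKAN's action API can page through every dataset with package_search. A sketch, assuming the old instance exposes the API v3 action endpoint (older CKAN versions may only offer earlier API versions) and flattening just a handful of core fields into space-separated tags; the column choice is illustrative, not the agreed structure.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterator, List

COLUMNS = ["name", "title", "url", "notes", "tags"]


def search_url(base: str, start: int, rows: int = 100) -> str:
    query = urllib.parse.urlencode({"start": start, "rows": rows})
    return f"{base.rstrip('/')}/api/3/action/package_search?{query}"


def fetch_all(base: str, rows: int = 100) -> Iterator[Dict]:
    """Yield every dataset dict, one page of `rows` at a time."""
    start = 0
    while True:
        with urllib.request.urlopen(search_url(base, start, rows), timeout=30) as resp:
            result = json.load(resp)["result"]
        yield from result["results"]
        start += rows
        if start >= result["count"]:
            break


def flatten(pkg: Dict) -> Dict[str, str]:
    row = {col: pkg.get(col) or "" for col in COLUMNS if col != "tags"}
    row["tags"] = " ".join(t["name"] for t in pkg.get("tags", []))
    return row


def to_csv(packages: List[Dict]) -> str:
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for pkg in packages:
        writer.writerow(flatten(pkg))
    return out.getvalue()
```

Usage would be to_csv(list(fetch_all("http://datacatalogs.org"))), then importing the resulting CSV into the spreadsheet.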

Shut down new submissions to current datacatalogs.org

Remove repetitive, uninformative wrench

I'm not sure what the wrench next to every catalog is for. I'd remove it and put the list of tags in its place, which would shorten the page.

Nice info boxes?

Do we want to do some pruning/structuring of the metadata inside the infoboxes on the map?

Agree and document workflow for moderation

Suggest:

- Add moderated and moderatorcomments columns to the DB
- moderated = 1 means the item will not be shown on the site
- moderatorcomments holds the moderator's comments on why the item was moderated
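The proposed rule (moderated = 1 hides an item) could be applied at load time by splitting records into what the site shows and a queue for editors. A sketch, assuming the two column names suggested above and treating an empty moderated cell as not moderated.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_moderated(records: List[Record]) -> Tuple[List[Record], List[Tuple[str, str]]]:
    """Return (records shown on the site, [(name, moderator comment)] for hidden ones)."""
    shown: List[Record] = []
    hidden: List[Tuple[str, str]] = []
    for r in records:
        if (r.get("moderated") or "").strip() == "1":
            hidden.append((r.get("name", ""), r.get("moderatorcomments", "")))
        else:
            shown.append(r)
    return shown, hidden
```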

Example: http://datacatalogs.org/catalog/eduinfo may be spam ...

Bulk import

How would one bulk submit data portals to this list?
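One possible answer, if the DB becomes a CSV file that accepts pull requests: append the new rows after checking them. A sketch, assuming rows are keyed by a name column and must have at least name and url; both the key and the required columns are assumptions.

```python
import csv
import io
from typing import List, Tuple

REQUIRED = ("name", "url")  # assumed minimum columns


def merge_submissions(existing_csv: str, new_csv: str) -> Tuple[str, List[str]]:
    """Append new rows whose name is not already listed; return (merged CSV, problems)."""
    reader = csv.DictReader(io.StringIO(existing_csv))
    rows = list(reader)
    fieldnames = reader.fieldnames or list(REQUIRED)
    seen = {(r.get("name") or "").strip() for r in rows}
    problems: List[str] = []
    for i, r in enumerate(csv.DictReader(io.StringIO(new_csv)), start=1):
        missing = [c for c in REQUIRED if not (r.get(c) or "").strip()]
        name = (r.get("name") or "").strip()
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
        elif name in seen:
            problems.append(f"row {i}: {name} already listed")
        else:
            seen.add(name)
            # columns not in the existing file are dropped
            rows.append({k: r.get(k) or "" for k in fieldnames})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), problems
```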

Implement group list page

This URL doesn’t work:

http://new.datacatalogs.org/group

but this one does:

http://new.datacatalogs.org/group/canada
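A /group page needs a list of all groups, which can be derived from the records themselves. A sketch, assuming groups are stored as a space-separated string per record; each portal is counted once per group.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def group_index(records: Iterable[Dict[str, str]]) -> List[Tuple[str, int]]:
    """(group, number of portals) pairs, most populated first, ties alphabetical."""
    counts: Counter = Counter()
    for r in records:
        # set() so a group repeated within one record is counted once
        counts.update(set((r.get("groups") or "").split()))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```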

Model.js - internal catalog that loads info from the spreadsheet via the API

Search (and browse) page

Actual search (maybe using lunr.js)
Redirect (??) from /dataset to /search
Browse (just list)

Deploy new staging site at new.datacatalogs.org

Front page map improvements

Decent faceting (on what basis?)
Improved filter/search

Enable notifications when new portal has been submitted

Enable notifications for Editors on dataportals.org

Create a basic site using JS (cf. datacatalogs.js)

Migrate data from current ckan instance to new DB (spreadsheet) - Aug 2014

Document submission and moderation process

Clarify the data license

Although the website footer currently features an Open Data badge and the submission form specifies that all entries are licensed under CCZero and/or PDDL, the license is not clearly evident for people interested in using the raw data. This could be taken care of together with #33.

Create GDocs spreadsheet and agree fields

Also create a form for submitting new items

New data portal submission form

I suggest we use Google Forms.

Editors can enter data directly into the spreadsheet DB.

Remove duplicates from tags in DB

After the merge of groups + tags (done quickly with a GSheets formula), there are duplicates. This is probably more easily dealt with by a quick Python script once we move to CSV (Issue #35). Alternatively, we could add it to the model code and do it at (re)load.
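A sketch of such a script, assuming the tags column holds space-separated tags. It drops repeats case-insensitively, keeping the first spelling and the original order, and leaves the file unchanged if there is no tags column.

```python
import csv
import io
from typing import List, Set


def dedupe_tags(value: str) -> str:
    """Drop repeated tags, ignoring case, keeping the first spelling and the order."""
    seen: Set[str] = set()
    kept: List[str] = []
    for tag in value.split():  # assumption: tags are space-separated
        key = tag.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(tag)
    return " ".join(kept)


def dedupe_csv(csv_text: str, column: str = "tags") -> str:
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames or column not in reader.fieldnames:
        return csv_text  # nothing to clean
    rows = list(reader)
    for r in rows:
        r[column] = dedupe_tags(r.get(column) or "")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```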

Upgrade Recline

The version in use bundles an old Leaflet. It would be great if you could upgrade to a commit after datopian/datahub#431.

Becoming an Editor (was: Adding new open data portals)

Hi there,
Is anybody updating the map and adding the new open data portals submitted via the Google Form on dataportals.org?

Cheers!

Curation workflow

I notice that some of the info for the new metadata (Issue #34) was already being collected on the form (e.g. launch date and publisher type), whereas some of the old metadata (tags) was not. What is the current workflow? Do we want a second sheet that draws on the responses sheet to at least reshuffle the form responses so it's easier to copy/paste? Or would we rather encourage 90% of contributors to edit the DB directly and accept that the other 10% need a bit of manual work?
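A scripted version of the "second sheet" idea: map each form question onto the DB column it feeds and emit rows in DB column order, ready to paste. All column and question names below are hypothetical placeholders; the real form and sheet headers would need to be substituted.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mapping: form question -> DB column
FORM_TO_DB = {
    "Portal name": "title",
    "Portal URL": "url",
    "Launch date": "issued",
    "Publisher type": "publisher_classification",
}
DB_COLUMNS: List[str] = ["name", "title", "url", "issued", "publisher_classification", "tags"]


def reshuffle(responses_csv: str) -> str:
    """Rewrite form responses as DB-ordered rows; unmapped columns are left blank."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=DB_COLUMNS)
    writer.writeheader()
    for resp in csv.DictReader(io.StringIO(responses_csv)):
        # columns the form does not collect (e.g. name, tags) stay blank for an editor
        row: Dict[str, str] = {col: "" for col in DB_COLUMNS}
        for question, column in FORM_TO_DB.items():
            row[column] = (resp.get(question) or "").strip()
        writer.writerow(row)
    return out.getvalue()
```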

How do you edit an entry?

Brisbane City Council in Australia has updated its open data portal to https://www.data.brisbane.qld.gov.au. How do you edit an existing entry?

Search cannot find records when terms include a special character

Where a record term (e.g. its title) includes a special character (e.g. a comma, exclamation mark, or period), the search does not return the record.

Test cases include:

- Searching for Ottawa returns no results, because 'Ottawa' is always followed by a comma in the record. The record can be found with other terms (e.g. searching for ontario city feedback).
- Searching for falls returns no results, because 'Falls' is always followed by a comma in the record. The record can be found with other terms (e.g. searching for Niagara).
- Searching for EDINA or for geo does not return Go-Geo! University of Edinburgh, because EDINA is always followed by a comma and Geo by an exclamation mark.
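The usual fix is to tokenize both the records and the query the same way, splitting on anything that is not a word character, so trailing punctuation never becomes part of a term. A sketch in Python, assuming records are dicts and that title and description are the searched fields; the real site's search code may be structured differently.

```python
import re
from typing import Dict, Iterable, List, Sequence

WORD_RE = re.compile(r"\w+")  # Unicode-aware: keeps letters such as é, splits on , ! . -


def tokenize(text: str) -> List[str]:
    """Case-fold the text and split it into words, discarding punctuation."""
    return WORD_RE.findall(text.casefold())


def search(records: Iterable[Dict[str, str]], query: str,
           fields: Sequence[str] = ("title", "description")) -> List[Dict[str, str]]:
    """Return records that contain every query word in at least one of the fields."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = []
    for rec in records:
        words = set()
        for field in fields:
            words.update(tokenize(rec.get(field) or ""))
        if all(t in words for t in terms):
            hits.append(rec)
    return hits
```

An empty result list is also a natural point for the page to show an explicit "no results" message.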

Information box not processing markdown

I'm not sure whether this is an error in the spreadsheet data or in the information box.

Some cells in the spreadsheet's description column contain Markdown, but the information box does not render it.

Could be added to #37

Catalog item page

Group pages

Redo front page (nodejs)

Custom Map Pin?

As discussed on the OK Discussion Forum, I'm exploring using a custom map pin on the DataPortals.org map, and using the pin image as a favicon and Apple touch icon. A custom map pin seems feasible; do you think we should make this change?

If so, I'm happy to work to sort out the design and provide the image files.

I will follow these requirements unless told otherwise: