dataportals.org's Issues

Switch to NodeJS

Going for Node over pure browser JS because:

  • Ability to mimic existing site URLs precisely (no #! stuff ...)
  • SEO
  • Performance, no cross-browser worries, etc.

Admin workflow for adding/updating catalogs?

I'd like to add/update a few Irish catalogs. As an admin of the old CKAN-based site, I could have done that easily, so what's the workflow now? Should I still go through the Google Form, or submit a pull request with changes to /data/catalogs.csv, or…?

Support for reloading the database

At the moment, reloading the database after a change requires running heroku restart. Provide a URL like admin/reload/ that reloads the spreadsheet DB on demand.
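The idea behind admin/reload/ can be sketched as an in-memory store whose load step is also exposed as a reload method. This is a minimal Python sketch for illustration only (the site itself runs on Node); `CatalogDB`, `handle_admin_reload` and the `fetch` callable standing in for the Google spreadsheet export are all hypothetical names.

```python
import csv
import io
from typing import Callable, Dict, List


class CatalogDB:
    """In-memory catalog store that can be reloaded on demand (sketch)."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        # fetch is a hypothetical callable returning the sheet as CSV text,
        # standing in for the spreadsheet export the site loads from
        self._fetch = fetch
        self.rows: List[Dict[str, str]] = []
        self.reload()

    def reload(self) -> int:
        # Build the new list before swapping it in, so requests served during
        # a reload see either the old rows or the new rows, never a partial load
        fresh = list(csv.DictReader(io.StringIO(self._fetch())))
        self.rows = fresh
        return len(fresh)


def handle_admin_reload(db: CatalogDB) -> str:
    # Hypothetical body of an admin/reload/ route: re-read the sheet, report the count
    count = db.reload()
    return f"Reloaded {count} catalogs"
```

Because the route triggers a fresh fetch of the whole sheet, it would probably be worth restricting it to admins rather than leaving it public.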

Decide metadata structure for new version of database (as of 2014)

We have been migrating to a new DB (a Google spreadsheet!)

The question is: what fields should we have?

Suggested fields are outlined in this spreadsheet template sheet.

Questions

  • New? Add maintainer and maintainer_url fields for the organization or person maintaining the portal.
  • Modify? license field: shall we normalize it in some way? Do we want a link to the license page or to evidence on the source website?
  • Delete? Remove metadata_modified and metadata_created? The latter is easy to support, but the former is hard with a Google spreadsheet DB.
  • New? open: a boolean field indicating whether the data in the portal is open or not
    • could have an open_percent for mixed portals
  • New? date_launched: the date the portal launched
  • New/Change? A field like metadata_contributor to credit folks contributing info (very spotty in current data and not very systematic).
  • Change? Merge groups and tags: no reason to keep them separate ...
  • New? generator: name of the software that powers the portal

Metadata v2.1

  • update database with new fields - field changes are listed in the spreadsheet here
    • author -> Renamed to publisher
    • groups -> Disappears, tags covers this
    • license -> Renamed to license_id
    • metadatamodified -> Disappears, too hard to maintain
    • New fields:
      • license_url
      • issued
      • publisher_classification
      • generator
      • api_endpoint
      • api_type
      • full_metadata_download
  • update submission form with relevant fields
  • update webapp if appropriate
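The field changes listed above amount to a per-row rename/drop/add step. A minimal Python sketch, assuming each catalog row is a dict keyed by the old column names; `migrate_row` and the constants are hypothetical, and merging the old groups values into tags is left to a separate step.

```python
from typing import Dict

# Renames and removals taken from the Metadata v2.1 list
RENAMES = {"author": "publisher", "license": "license_id"}
DROPPED = {"groups", "metadatamodified"}
NEW_FIELDS = [
    "license_url", "issued", "publisher_classification", "generator",
    "api_endpoint", "api_type", "full_metadata_download",
]


def migrate_row(row: Dict[str, str]) -> Dict[str, str]:
    out: Dict[str, str] = {}
    for key, value in row.items():
        if key in DROPPED:
            continue  # field disappears in v2.1
        out[RENAMES.get(key, key)] = value  # rename if listed, else keep as-is
    for field in NEW_FIELDS:
        out.setdefault(field, "")  # new fields start empty until curated
    return out
```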

Note: the field 'license_open' is still up for discussion. If 'license_id' cannot be filled in, or would only hold something ambiguous, license_url now gives us scope to point to a page explaining the mixed-license situation. The question is whether catalogs that don't have a single license ought to have license_open set to FALSE or left empty.

A further discussion topic is whether we want to generate stats on the catalogs (e.g. number of datasets, or when the catalog was last updated) and store them in some way.

Incorrect Portals Total

The Google sheet (curated tab) has 426 rows. As one row is the header, there should be 425 portals, but the website shows 424.
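426 rows minus 1 header gives 425 data rows, so exactly one row is going missing. One guess, not confirmed by the issue, is that the site keys portals by an identifier column, in which case a duplicate or blank identifier would silently drop a row. A minimal Python sketch of an audit over a CSV export, assuming a hypothetical `id` column:

```python
import csv
import io
from collections import Counter
from typing import Dict


def audit_portal_rows(csv_text: str, key: str = "id") -> Dict[str, object]:
    # DictReader consumes the header row, so len(rows) counts data rows only
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    keys = [(r.get(key) or "").strip() for r in rows]
    counts = Counter(k for k in keys if k)
    return {
        "data_rows": len(rows),
        "blank_keys": sum(1 for k in keys if not k),
        "duplicate_keys": sorted(k for k, n in counts.items() if n > 1),
        "unique_keys": len(counts),
    }
```

If data_rows is 425 but unique_keys is 424, the duplicate_keys or blank_keys entries point at the row that disappears.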

Use of Inactive Tag?

A portal was reported as no longer being available. I changed its value in the Google sheet from active to inactive, thinking this would hide the portal on DataPortals.org. This only changed the Status field; the portal still appears on the website.

How should a portal be removed/hidden?

  • delete the row in the Google sheet
  • change the query to not select inactive sites
  • some other trick?
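The second option (changing the query to skip inactive sites) keeps the row in the sheet for the record while hiding it on the site. A minimal Python sketch, assuming the sheet column is called `status` (hypothetical name) and holds active/inactive:

```python
from typing import Dict, List


def visible_portals(rows: List[Dict[str, str]], status_field: str = "status") -> List[Dict[str, str]]:
    # Hide rows explicitly marked inactive (case-insensitive);
    # rows with a blank or missing status are still shown
    return [
        r for r in rows
        if (r.get(status_field) or "").strip().lower() != "inactive"
    ]
```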

[screenshot 2015-08-17 19:59:04]

Migrate from current ckan instance to new setup (2014)

  • #3 - shut down submissions to old instance
    • As this is not easy, we may just hope no one submits and then move the old instance to e.g. old.datacatalogs.org
  • Pull data and import into gdoc (repeat process from last time)
    • #24 - metadata structure for new DB
    • #25 - do the data migration
  • Switch sites - DNS etc

Remove repetitive, uninformative wrench

I'm not sure what the wrench next to every catalog is for. I'd remove it and move the list of tags into its place, which would shorten the page.

Nice info boxes?

Do we want to do some pruning/structuring of the metadata inside the infoboxes on the map?

Bulk import

How would one bulk submit data portals to this list?
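One possible route, building on the pull-request idea for /data/catalogs.csv, is to merge a submitted CSV into the existing file by identifier. A minimal Python sketch, assuming both files share a hypothetical `id` column; `merge_catalogs` is not part of the project, and columns not already in the existing file are ignored.

```python
import csv
import io


def merge_catalogs(existing_csv: str, incoming_csv: str, key: str = "id") -> str:
    base = csv.DictReader(io.StringIO(existing_csv))
    fieldnames = base.fieldnames  # keep the existing column order
    merged = {row[key]: row for row in base}  # dicts preserve insertion order
    for row in csv.DictReader(io.StringIO(incoming_csv)):
        if row[key] in merged:
            # Update an existing portal, overwriting only non-empty cells
            merged[row[key]].update({k: v for k, v in row.items() if v})
        else:
            merged[row[key]] = row  # new portal goes at the end
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged.values())
    return out.getvalue()
```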

Clarify the data license

Although the website footer currently features an Open Data badge and the submission form specifies that all entries are licensed under CCZero and/or PDDL, the license is not clearly stated for people who want to use the raw data. This could be taken care of together with #33.

Remove duplicates from tags in DB

After merging groups and tags (done quickly with a Google Sheets formula), the tags contain duplicates. This is probably easiest to deal with using a quick Python script once we move to CSV (Issue #35). Alternatively, we could add deduplication to the model code and run it at (re)load.
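The deduplication itself is small enough to run either as a one-off script or at (re)load. A minimal Python sketch, assuming tags are stored as a comma-separated string (the real separator may differ); duplicates are matched case-insensitively and the first spelling seen is kept.

```python
from typing import List, Set


def dedupe_tags(tags: str, sep: str = ",") -> str:
    seen: Set[str] = set()
    result: List[str] = []
    for tag in tags.split(sep):
        tag = tag.strip()
        norm = tag.lower()  # "Finance" and "finance" count as the same tag
        if tag and norm not in seen:  # also drops empty entries like ",,"
            seen.add(norm)
            result.append(tag)
    return sep.join(result)
```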

Curation workflow

I notice that some of the info for the new metadata (Issue #34) was already being collected on the form (e.g. launch date and publisher type), whereas some of the old metadata (tags) was not. What is the current workflow? Do we want a second sheet that draws on the responses sheet to at least reshuffle the form responses so it's easier to copy/paste? Or would we rather encourage 90% of contributors to edit the DB directly and handle the other 10% with a bit of manual work?

Search cannot find records when a term includes a special character

When a term in a record (e.g. in its Title) is followed by a special character (e.g. comma, exclamation mark, period), searching for that term does not return the record.

Test cases include:

  1. search for Ottawa returns no results, because 'Ottawa' is always followed by a comma in the record

     the record can be found with other terms (e.g. search for ontario city feedback)

  2. search for falls returns no results, because 'Falls' is always followed by a comma in the record

     the record can be found with other terms (e.g. search for Niagara)

  3. searches for EDINA and for geo both fail to return Go-Geo! University of Edinburgh, because EDINA is always followed by a comma and Geo by an exclamation mark
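All three cases would pass if both the query and the record text were split into word tokens with punctuation stripped before matching. A minimal Python sketch of that tokenizing approach, not the site's actual search code; `tokenize` and `matches` are hypothetical names, and every query term is required to appear in the record.

```python
import re
from typing import Set

# \w+ keeps runs of letters/digits/underscore (Unicode-aware in Python 3),
# so trailing commas, periods and exclamation marks are dropped
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> Set[str]:
    return set(TOKEN_RE.findall(text.lower()))


def matches(query: str, record_text: str) -> bool:
    terms = tokenize(query)
    # Require at least one term, and all terms present in the record
    return bool(terms) and terms <= tokenize(record_text)
```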

Information box not processing markdown

Not sure if this is an error in the spreadsheet data or in the information box.

Some cells in the spreadsheet's description column contain markdown, but the information box does not render it.
[screenshot 2015-08-21 20:58:14]

Could be added to #37