culturehack / data-tool

A collection of cultural data sets and sources & a website to browse them.

License: MIT License

Ruby 17.23% CSS 22.10% HTML 60.67%

data-tool's People

fionaroberto dracos emchateau george08 r4isstatic cooperhewitt micahwalter imclab skraphog barrynorton mpetyx jan-martinek carwash mashtheweb

data-tool's Issues

Status of copy

Obviously it's possible/likely that the copy describing each bit of data will be a bit of a moveable feast, but looking at the site now, I'm not sure which bits might have been placeholder text put in by Frankie and which are ones put in by you. This one is a case in point: http://data.culturehack.org.uk/dataset/37251027-Pepys-Diary All for a jolly tone of voice, but not sure about the "It's great" and the typo.

Redirect culturehack.org.uk/data to data.culturehack.org.uk

Currently http://www.culturehack.org.uk/data resolves to http://culturehack.org.uk/2012/06/16/data-sets-by-type/

It should probably be changed to resolve to
http://data.culturehack.org.uk

Assigning to @jamesjefferies for whenever he has a moment; this is low priority.

Rogue entry

http://data.culturehack.org.uk/dataset/99999999-Template-Entry

Choice of displayed licences

Creative Commons licences as mentioned on /about are not really suitable for data; they are licences for content. See http://opendatacommons.org/licenses/ for some suitable data licences, and http://opendatacommons.org/faq/licenses/#Why_Not_Use_a_Creative_Commons_or_FreeOpen_Source_Software_License_for_Databases for the explanation on CC. Or see OSM's move from CC to ODbL: http://www.osmfoundation.org/wiki/License/We_Are_Changing_The_License#Why_are_we_changing_the_license.3F

Question - 50 entries v 200 entries

How many data set entries do we want / need?

Question for Rachel, really: would she prefer 50 really rich well described ones, or 200 less well described ones?

Caper strategy document mentions 200...

QUESTION: How to integrate editorial with data tool pages?

How do we cross reference between data tool and editorial on main CH site?

In order to pull WP posts through / evidence hacks, how do we best combine this with the data tool entry points?

Categories: setting them in code, extending them

Currently, our categories are (I think) defined in site.rb lines 6-14

  CATEGORIES = [
    'Art', 
    'Literature',
    'Music', 
    'Performance', 
    'Fashion', 
    'Media', 
    'History'
  ]

They're also then listed out in _prose.yml, from line 18 onwards

      - name: "categories"
        field:
          element: "multiselect"
          label: "Categories"
          options:
            - name: "Art"
              value: "Art"

QUESTIONS:

To add a new category, or rename an existing category, is it just a case of editing them in those two places?
Can Categories take spaces? If so, do they need to be surrounded by quotes in site.rb and any of the source.md files?
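On the quoting question, here is a minimal sketch, assuming the source files carry YAML frontmatter that gets read with Ruby's standard YAML library. The category list, the 'Visual Art' name and the frontmatter are all made up for illustration. In Ruby every string literal is quoted anyway, so a space changes nothing in site.rb. In YAML a plain list item may contain spaces, so quotes are optional there; they only become necessary when a value contains a colon followed by a space or starts with a character YAML treats specially.

```ruby
require 'yaml'

# Hypothetical stand-in for the CATEGORIES list in site.rb;
# 'Visual Art' is invented to show a name containing a space.
categories = [
  'Art',
  'Visual Art',
  'History'
]

# Hypothetical frontmatter, as it might appear in a source .md file.
frontmatter = <<~YAML
  title: Example entry
  categories:
    - Visual Art
    - "History"
YAML

entry = YAML.safe_load(frontmatter)

# Both the unquoted and the quoted form load as plain Ruby strings.
unknown = entry['categories'] - categories
puts "Unknown categories: #{unknown.inspect}"
```

A check like the `unknown` line could also catch a category that was renamed in one place but not the other.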

Give more context to Licensing labels

Is it possible to make the licensing info a click through to something? I'm not sure what PD means, for instance, on this page http://data.culturehack.org.uk/dataset/37251027-Pepys-Diary and there's no way of finding out.

Analytics

I forgot to add any analytics tracking to the site.

Probably best to use the same tracking account as the main Culture Hack site, I’d guess? (that way journeys between the two bits of the site could be tracked).

Website produced by Kim and Frankie

Can you take this out of the footer and add it to the About page pls?

FR iterating the UX sketches based on the documents sent on Friday

Initial work done, needs proper data adding.

Copy/Paste Questionnaire text from _connect and send to KP / FR

The HTML file that was sent:

requires RC to be signed in to the _connect site
when opened as a file remotely, just shows a load of JS errors... no content.

Need to copy-paste the actual page contents!

rewrite instructions and templates to be clearer about which bits are automatically pulled in etc

See the SOCH merge notes.

Question - Post Doc Researcher?

we're looking for a post-doc researcher to work with

YAML frontmatter: clarify media

In one file there is a media: data pair in the YAML frontmatter (it also appears as media: text in 37251018-British-Museum-object-catalog.md)

media: doesn't seem to be defined in _prose.yml

Q: What was media:? what were we going to do with it? Did we define a list of options?

ALSO

Am I correct in saying: YAML is flexible and doesn't mind if you add additional values in there? So we could just make up fields on the fly?
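A small sketch of the YAML side of that question, assuming the frontmatter is loaded with Ruby's standard YAML library. The "mood" key and the list of known keys are invented for illustration. YAML itself has no schema, so an extra key loads like any other. Whether the editing interface configured by _prose.yml shows such a field is a separate question this does not answer.

```ruby
require 'yaml'

# Hypothetical frontmatter with a made-up key ("mood") that nothing declares.
frontmatter = <<~YAML
  title: Example entry
  media: text
  mood: jolly
YAML

entry = YAML.safe_load(frontmatter)

# Unknown keys are loaded like any other; the parser does not reject them.
puts entry.keys.inspect

# Hypothetical list of fields the site code reads; anything else is simply unused.
known_keys = %w[title media]
extras = entry.keys - known_keys
puts "Undeclared fields: #{extras.join(', ')}"
```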

Ability to search datasets by title / description

This would be pretty useful.

Currently unsure on approach.

We could import all the entries into Postgres at launch and use its full-text search feature. That has the advantage of built-in features like stemming (tolerating misspellings would additionally need something like the pg_trgm extension). Disadvantages: another dependency, makes the site more complicated to install, etc.

Alternatively, we could implement some basic in-memory text searching. It wouldn't be too tricky to simply return matching results, but it wouldn't be as sophisticated.
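A rough sketch of what the in-memory option might look like. The records and their descriptions are made up; the real entries would come from the source files. It matches whole query words as case-insensitive substrings of the title and description, with no stemming or ranking.

```ruby
# Hypothetical in-memory records standing in for the loaded dataset entries.
datasets = [
  { title: 'Pepys Diary', description: 'Daily diary entries by Samuel Pepys' },
  { title: 'British Museum object catalog', description: 'Catalogue records for museum objects' }
]

# Returns the datasets whose title or description contains every word
# of the query, ignoring case. An empty query returns no results.
def search(datasets, query)
  terms = query.downcase.split
  return [] if terms.empty?

  datasets.select do |d|
    haystack = "#{d[:title]} #{d[:description]}".downcase
    terms.all? { |term| haystack.include?(term) }
  end
end

puts search(datasets, 'museum catalog').map { |d| d[:title] }
```

Requiring every term to match keeps results tight for a small collection; switching `all?` to `any?` would make it more forgiving.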

Ability to include 'sample data'

It'd be useful if you could include links to sample data files (eg CSV, JSON) on the dataset pages.

Could be hosted externally, or within the project.

Schedule meeting - strategy

Schedule a meeting with the four of us soon (and James, if you're planning to be in London at any point?), as it would be useful to discuss some of these as we consider next steps and develop the strategy.

Find out what's needed for the TSB Supplier's Report

Rachel needs to tell KP and FR what the requirements for the TSB report are

Research - single text files data descriptors?

Are there any existing methods of describing data / datasets etc with such a simple YAML format? Can we point to anything?

Related - how do we then integrate with other data sources in future - CKAN interoperability?

Add first published / update frequency information

Would be useful to have this on the dataset pages...

Smithsonian 3D showcase

https://twitter.com/GuWa/status/400647398302547969

http://3d.si.edu/

Not really the right place to put this, but it's pretty cool

Caper: check and sign SOW, provide terms/PO

Feature - text file creation button with Artisanal Integer call

Manually creating a text file is a bit of a PITA

We know that files will follow a standard template

The process would be roughly

visit artisanal integer site, get number
create text file with this filename
copy paste in template layout
do all data entry
append text file name with slug?

is this possible to script? It would make sense.
It might be possible to create a new file in github via the API...
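The local half of those steps could be scripted roughly like this. It is a sketch under assumptions: the template fields are hypothetical, and fetching the number is left out because the integer service's API isn't described here, so the ID is passed in as an argument. The filename pattern follows existing entries such as 37251027-Pepys-Diary.

```ruby
require 'tmpdir'

# Hypothetical template; the real field list would come from the project's own template.
# Note: a title containing a double quote would need escaping first.
def entry_template(title)
  <<~MARKDOWN
    ---
    title: "#{title}"
    categories: []
    ---
  MARKDOWN
end

# Builds a filename such as "37251027-Pepys-Diary.md" from an ID and a title.
def filename_for(id, title)
  slug = title.strip.gsub(/[^A-Za-z0-9]+/, '-').gsub(/\A-|-\z/, '')
  "#{id}-#{slug}.md"
end

# Writes the filled-in template into dir and returns the new file's path.
# Refuses to overwrite an existing file.
def create_entry(dir, id, title)
  path = File.join(dir, filename_for(id, title))
  raise "#{path} already exists" if File.exist?(path)

  File.write(path, entry_template(title))
  path
end

# Demonstration in a throwaway directory rather than the real repo.
Dir.mktmpdir do |dir|
  path = create_entry(dir, 37251027, "Pepys' Diary")
  puts File.basename(path)
  puts File.read(path)
end
```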

Ability to filter by licence

Would be easy to add. Not sure how much of a priority this is though.

Left-hand category filter

The filtering is a bit confusing; my instinct is always to click through the top category list, and each time I find it a bit weird that more categories, rather than different categories, are being displayed. Could you change this to toggle through the list please, rather than add each category to the display? And then keep the small/medium/large as a filter on each category.

Build Failing - `parse': (<unknown>): found unexpected document indicator while scanning a quoted scalar at line 19 column 22 (Psych::SyntaxError)

Hello

Travis is helpfully telling me the most recent builds are failing, but not giving me tremendously useful feedback about why.

Build #62 was broken.    31 seconds
Kim Plowright   2140777 Changeset →
Add DigitalNZ, edit other Sources
    Cooper Hewiitt and Open Library with more data. Digital NZ apis added
    https://travis-ci.org/culturehack/data-tool/builds/13781983

    Build #63 is still failing.  33 seconds
Kim Plowright   08a08e5 Changeset →
correct empty yaml value
    Attempting to fix the Travis Error being thrown. Unclear *which* file is making it barf, other than that it's a line 19. This doc has an empty value at line 19 and is in the commit that made it barf. perhaps this is the culprit. (NB build is not failing for me locally)
    https://travis-ci.org/culturehack/data-tool/builds/13800862

Seems to be a parse error in site.rb -

`parse': (<unknown>): found unexpected document indicator while scanning a quoted scalar at line 19 column 22 (Psych::SyntaxError)

Other possibility - the test file is named 9999999999-whatereveritwas; is the number causing the problem?

Any ideas? LMK what the solution is so I can fix for myself in future too!
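One way to narrow this down locally is to parse each file on its own and print the ones that fail. This is a sketch only: the helper names and the glob pattern are hypothetical, how site.rb actually loads the files isn't shown here, and if a file has a Markdown body after its frontmatter the frontmatter would need splitting off first. That particular message usually means a quote was opened and never closed, so the parser ran on until it hit a `---` line.

```ruby
require 'yaml'

# Returns nil if the text is syntactically valid YAML, otherwise a short
# description of where the parser gave up.
def yaml_problem(text)
  YAML.parse(text)
  nil
rescue Psych::SyntaxError => e
  "line #{e.line}, column #{e.column}: #{e.problem}"
end

# Checks every file matching the glob, prints the ones that fail to parse,
# and returns the list of failing paths.
def report_broken_yaml(glob)
  Dir.glob(glob).sort.select do |path|
    problem = yaml_problem(File.read(path))
    puts "#{path}: #{problem}" if problem
    problem
  end
end

# An unclosed quote followed by a document marker, similar to the failing case.
broken = "title: \"Unclosed quote\n---\n"
puts yaml_problem(broken)
```

Running `report_broken_yaml` over the directory of source files (whatever its path is) should name the culprit directly instead of reporting only a line number.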

Add cache headers

Should add some cache headers so that the pages can be stored in public proxy caches, for even speedier loading.

Suggest expiry of 10 mins?
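For reference, a 10 minute expiry comes down to headers along these lines. This is a framework-neutral sketch with a hypothetical helper name; how the headers get attached depends on what site.rb is built on. If it turns out to be a Sinatra app, Sinatra's own `cache_control` and `expires` helpers would do the same job.

```ruby
require 'time'

# Builds response headers that let shared (proxy) caches keep a page
# for the given number of seconds.
def cache_headers(max_age_seconds, now = Time.now)
  {
    'Cache-Control' => "public, max-age=#{max_age_seconds}",
    'Expires' => (now + max_age_seconds).httpdate
  }
end

# 10 minutes = 600 seconds.
cache_headers(600).each { |name, value| puts "#{name}: #{value}" }
```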

Create initial proof-of-concept views

Need to write some code that takes the data sources and creates webpages for each of them.

Add automated indicator in copy for How Many Items in the Database?

In the copy box at the top, it would be useful to indicate how many bits of data are in the database at any one time. How much of a faff would it be to automate this?

Empty categories

Obviously I don't expect every category to be populated, but filtering "art" by "small" and "medium" returns 0 entries, which might be seen to look a bit bad at launch, as it's the first set of filters. Is it possible to pop something in here for cosmetic purposes pls?!

Explore open data about arts and culture, and the creative things people have done with it. Find out more →

Previous version you were checking with Katy:

Culture Hack Data is a simple way to explore open data about arts and culture, and the creative things people do with it. To get started, search or filter our list of data sources using the categories to the left.
Find Out More →

Suggest we can reword that slightly

Culture Hack Data is a simple way to explore open data about arts and culture, and the creative things people do with it. Search or filter our list of XX data sources, or contribute a new entry
Find Out More →