
openrefine-socialsci's Introduction


OpenRefine for Social Sciences

This is a Data Carpentry lesson on OpenRefine for social scientists. Please see https://datacarpentry.org/openrefine-socialsci/ for a rendered version of this lesson.

This is an introduction to OpenRefine designed for participants with no previous experience. This lesson can be taught in ~ 2 hours, excluding setup. The episodes in this lesson cover introductory topics related to using OpenRefine.

The Instructor View shows the lesson contents with extra information that is useful when teaching this lesson.

Contributing

We welcome all contributions to improve the lesson! The maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on using formatting, ways to render the lesson locally, and even how to write new episodes.

Please see the current list of issues for ideas for contributing to this lesson. For making your contribution, we use the GitHub flow. Look for the tag good_first_issue. This indicates that the maintainers will welcome a pull request fixing this issue.

Making changes to the contents

Please read Contributing before starting the work. This section and the next are only a very brief introduction to providing changes.

This lesson website is built from Markdown files using The Workbench, a set of tools that check and convert the source files into a good-looking website. The episodes that make up this lesson are in the episodes directory.

Learn how to update lesson contents in The Workbench documentation.

If you want to create a pull request (PR) with changes in any of the episodes or other Markdown files, it helps if you can preview the results of your changes before you submit the PR. This is explained in the next section. Previewing is not required. If you submit your PR, automated workflows will run and a bot will inform you about the results.

Previewing the lesson on your computer

This is helpful for submitting a pull request, but not required.

Previewing the lesson on your computer requires that you install The Workbench tools. Please see the instructions for setting up The Workbench on your computer.

After setting up, see Previewing Your New Lesson to learn how to preview your changes.

Maintainers

The current maintainers of this lesson are:

They can usually be reached in our Slack channel and through issues in the GitHub repository.

openrefine-socialsci's People

Contributors

angela-li, annajiat, antonyni, bencomp, bkmgit, brownsarahm, cforgaci, erinbecker, evanwill, fmichonneau, froggleston, gtlaflair, jas58, jcohen02, k05lowi, karenword, lachlandeer, lucia-michielin, m-macaskill, maneesha, mariadelmarq, mattforshaw, mlandryacenet, petersmyth12, rabeamue, saross, shilowil, tobyhodges, tracykteal, zkamvar


openrefine-socialsci's Issues

Consistent use of 'facet' and 'filter'; script exercise missing; overlap in introduction and other resources

Hi folks 👋

I'm submitting a few suggestions for this lesson, as part of the checkout process:

Filtering & sorting episode:

  • Filtering section, step 1 - refers to a filter as a facet ("A respondent_roof_type facet"). I think learners would find this confusing as filters & facets are treated as separate things within the OpenRefine UI, even though they appear in the same panel. (Please ignore if I am wrong and filters are officially just a type of facet - but there's nothing in the lesson to suggest this)
  • Sort section, exercise - the field name is spelled incorrectly within the solution ("gps:Altitude" should be "gps_Altitude")

Examining numbers episode:

  • Numbers section, paragraph 2 - again, refers to a filter as a facet ("be sure to remove any text filter facets").
  • Numbers section, exercise - column name is listed as "no_members" when it is actually "no_membrs"

Scripts episode:

  • Overview - it says "exercises 10 min" but there are no exercises indicated for this lesson.
  • Saving your work as a script section - bullet "1" appears twice (before and after the screenshot)

Other resources episode:

  • There is conceptually some similarity between this episode and the material in the Introduction episode (Getting help for OpenRefine section). I think the Other resources episode would benefit from including the information about the user community that can provide support and answer questions. This could be accomplished either by adding more info to the Other resources episode or by linking back to the Introduction episode. But it seems incomplete to refer to other learning resources without also mentioning the community resources.

Hope some of these are helpful. Thanks!

Appearance of 'Create Project' window

When running this lesson today using Open Refine v3.1 we found that the appearance of the 'Create Project' screen is slightly different to the one pictured in the notes under 'Creating a New Open Refine Project'.

The option at the bottom right for 'Quotation marks are used to enclose cells containing column separators' no longer appears (see screenshot from v3.1). A learner noticed this and was distracted by it, so updating the image could reduce this distraction.
(Screenshot: OpenRefine v3.1)

Happy to lodge a PR, but I'm new to OR so not sure if this option is missing due to a setting or version or something else.

Rearrange lesson on exporting and saving

Hi, while teaching the workshop, I realized that the exporting and saving lesson covers exporting the project first and then saving the changed file. I found it useful to teach saving the file first and then exporting the project. This order helped with the lesson flow. This might be useful to others who are teaching OpenRefine ✌️

Discussion: Rearrange content across modules for balance and flow

This issue is a follow-up to #90, whose other points have been addressed.

The submitter (not the issue opener) said:

Personally, I think the class may be reorganized. Currently, there is a lot of information in the "working with OpenRefine" section. For example, Transforming data using GREL could be moved to the "examine numbers" section. That part is also doing data transformation. I would start by using the common transforms, then introduce GREL. GREL might be a little bit tricky for beginners, so I would not introduce it in the earlier part of the class.

I would like to try to rearrange various sections within their module or across modules. Other issues mention the order of content too, like #84 and #95.
This issue will be updated with more specific ideas.

OpenRefine doesn't appear to work in Internet Explorer

For several learners, OpenRefine opened in Internet Explorer, but they saw only a spinning wheel and nothing more.
Copying and pasting the URL into Chrome worked for everyone.

Perhaps there could be a note (extras/instructor notes?) about this?

Typos and organization issues

As part of my checkout process, I had a look at the "OpenRefine for Social Science Data" course (https://datacarpentry.org/openrefine-socialsci/). In the following you find some small comments and suggestions:

  1. Wrong file name in 'Working with OpenRefine'
    In the fifth paragraph of 'Creating a new OpenRefine project', the file name of the example dataset to work with in the course is given: '2. Click "Choose Files" and select the file "SAFI_messy_openrefine.csv"'.
    The file name is not correct. The file to be used is called 'SAFI_openrefine.csv'.
  2. Add a few words about the layout in the 'Working with Open Refine' chapter
    I am aware that the instructions on what to contribute as part of the checkout process say we are not supposed to introduce new concepts, as the courses are already quite packed with content. However, in the chapter 'Working with Open Refine', I find the transition from the subchapter on 'Creating a new OpenRefine project' to 'Using Facets' a little abrupt. I think it would be helpful to add a little information about the basic structure of the layout of OR. This would not have to be very extensive, and maybe existing content from the Library Carpentry course on OR could be used. In the chapter on 'Layout of OpenRefine, Rows vs Records' (https://librarycarpentry.org/lc-open-refine/03-working-with-data/index.html), there are two subchapters that cover this quite well, in my opinion, and are not very extensive: 'The layout of OpenRefine' and 'Working with data in OpenRefine'. So my suggestion would be to add a small section on 'The layout of OpenRefine' between 'Creating a new OpenRefine project' and 'Using Facets', based on the content of these two Library Carpentry subchapters. I think this content could be covered in 2-3 minutes of teaching time and would be particularly helpful for participants who don't have any previous experience with OR at all.
  3. Typo in "Filtering and Sorting with OpenRefine"
    In the exercise in the "Sorting by multiple columns section", there's a typo in 2.: "gps_Lattitude" (instead of "gps_Latitude").
  4. "unanswered" question in 'Examining numbers in OpenRefine'?
    The 'Overview' section of this chapter contains the question 'How can we visualize relationships among columns?', but I don't think that this question is really answered by the current content.

Better sorting order for fixing village "49"

In the lesson "Filtering and Sorting" in the section "Sorting by multiple columns." the suggested order of sorting steps

  1. Sort on gps_Longitude as a number with the largest first.
  2. Add a sort on gps_Lattitude as a number with the largest first.

result in an unclear grouping of entries for the purpose of determining the correct value for the village "49". See also this screenshot.

I suggest two improvements here:

  1. reverse the order by first sorting on gps_Latitude, then on Longitude: This will clearly put the "49" entry between other "Choridzo" entries
  2. use "smallest first" as sorting order, so that the "49" entry will appear among the first 50 entries, which will prevent the users from having to browse through several pages of the dataset for finding that entry.

resulting display
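
The suggested sort order can be sketched outside OpenRefine as follows. This is a minimal illustration in Python, not part of the lesson: the rows and their coordinates are invented for the example, and only the column names gps_Latitude and gps_Longitude come from the issue text above.

```python
# Hypothetical rows; the coordinates are made up for illustration
# and are not taken from the SAFI dataset.
rows = [
    {"village": "Chirodzo", "gps_Latitude": -19.11, "gps_Longitude": 33.48},
    {"village": "God", "gps_Latitude": -19.05, "gps_Longitude": 33.40},
    {"village": "49", "gps_Latitude": -19.11, "gps_Longitude": 33.47},
    {"village": "Chirodzo", "gps_Latitude": -19.11, "gps_Longitude": 33.46},
]

# Sort on latitude first, then on longitude, smallest first for both.
ordered = sorted(rows, key=lambda r: (r["gps_Latitude"], r["gps_Longitude"]))

for r in ordered:
    print(r["village"], r["gps_Latitude"], r["gps_Longitude"])
```

With these invented values, the "49" row ends up between two "Chirodzo" rows near the top of the output, which is the effect the two suggestions aim for.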

Introduce menu for column reordering

This suggested deletion of content line 229 is part of the instructor training checkout process. I know the lesson is already packed, but as the dataset is so big please consider introducing the drop down menu under 'All' and introduce how to reorder the rows. I'll happily help with this addition.

Originally posted by @K05lowi in #82 (comment)

Add more rows to data set; discuss reconciliation, GREL, extensions/packages

The Social Sciences CAC ([email protected]) met June 15th and 19th to discuss the full Social Sciences curriculum and provide recommendations to the Maintainers about work for these lessons between now and their publication (September 2018). Their specific action items for this lesson are as follows:

  • Getting more rows of the SAFI data to use in the lessons
  • Incorporating data reconciliation, packages, and more GREL into the OpenRefine lesson

Please see the meeting minutes for more details.

remove mention of google-refine

There is a note about "google-refine" in episode 2. It should probably be removed, as no one in our workshops will have it instead of OpenRefine. It was deprecated a long time ago.

Dataset file needs to be renamed on Episode #2

In the Setup Instructions there are two options for learners to download the data file that will be used throughout the workshop. In both cases the file name is SAFI_openrefine.csv. In the second episode, Working with OpenRefine, however, the instructions for creating a new project tell learners to select the file "SAFI_messy_openrefine.csv". I suggest renaming the dataset in the second episode to match the dataset learners will have on their computers.

add missing content

The following is a list of missing content for this lesson. Please check off items when they have been addressed:

Episode 2

  • keypoints
  • time estimates (teaching and exercises)

Episode 6

  • time estimates (teaching and exercises)

Reference

  • glossary

Discussion

  • this page is empty, can be removed

Figures

  • this page is empty, can be removed

Setup

  • Software box is empty

Instructors guide

  • this page is missing

Transition to standardized GitHub labels

The lesson infrastructure committee unanimously approved the proposal of using the same set of labels across all our repositories during its last meeting on May 23rd, 2018.

This repository has now been converted to use the standard set of labels.

If this repository used the previous set of recommended labels by Software Carpentry, they have been converted to the new one using the following rules:

| SWC legacy labels  | New 'The Carpentries' labels |
|--------------------|------------------------------|
| bug                | type:bug                     |
| discussion         | type:discussion              |
| enhancement        | type:enhancement             |
| help-wanted        | help wanted                  |
| newcomer-friendly  | good first issue             |
| template-and-tools | type:template and tools      |
| work-in-progress   | status:in progress           |

The label instructor-training was removed as it is not used in the workflow of certifying new instructors anymore. The label question was left as is when it was in use, and removed otherwise. If your repository used custom labels (and issues were flagged with these labels), they were left as is.

The lesson infrastructure committee hopes the standard set of labels will make it easier for you to manage the issues you receive on the repositories you manage.

The lesson infrastructure committee will evaluate how the labels are being used in the next few months and we will solicit your feedback at this stage. In the meantime, if you have any questions or concerns, please leave a comment on this issue.

-- The Lesson Infrastructure subcommittee

PS: we will close this issue in 30 days if there is no activity.

regular expression lesson

Regular expressions are very valuable and it can be a good introduction to them in OpenRefine. However, regular expressions have a very high cognitive load, as they're a distinctly new concept. So, we would want to spend enough time with them for it to be useful, and that would add to the estimated time of this module. Given where OpenRefine is in the workshop, and that it's an overall start to working with data, my initial thought is that it would be more confusing than powerful as a concept at this stage. So, it would be best not to include it, at least in the original release of these lessons.

Are there thoughts on the regular expression lesson and whether or not to include it in this release?

Trailing and leading white spaces missing? (Submitted by email for Instructor Training)

I am submitting this issue on behalf of a contributor who submitted it through email. See text below:

In the Trim Leading and Trailing Whitespace part of the Working with OpenRefine episode, the data already appear to be cleaned. I am only getting four choices in the text facet without any transforms. I tried to find another column in the dataset that could be substituted for this example, but couldn't find one with inconsistent white spaces. Perhaps the teaching dataset needs to be made messy again?
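
To illustrate what a "messy" column would look like in a text facet, here is a minimal Python sketch. The values and their stray spaces are invented for the example (they are not read from the teaching dataset), and `Counter` only stands in for the facet's list of choices.

```python
from collections import Counter

# Hypothetical column values; the stray spaces are invented for illustration.
values = ["grass", "grass ", " mabatisloping", "mabatisloping", "mabatipitched"]

# A text facet lists each distinct value with its count.
raw_facet = Counter(values)

# Trimming leading and trailing whitespace merges the near-duplicates.
trimmed_facet = Counter(v.strip() for v in values)

print(len(raw_facet), "choices before trimming")
print(len(trimmed_facet), "choices after trimming")
```

If the number of choices is the same before and after trimming, as the contributor observed, the column contains no inconsistent whitespace to clean.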

Explain difference with spreadsheet editor; add info on running OR on server

Morning all,

A few suggestions for the Introduction to OR lesson (https://datacarpentry.org/openrefine-socialsci/01-introduction/index.html):

  • In both workshops where I have been involved, either as participant or helper, students were unsure about when to use a spreadsheet editor such as Excel and when to use OpenRefine instead. This may be clarified in the "motivation" section by changing the first bullet point:

OpenRefine is not a spreadsheet editor like Excel or LibreOffice and it is not used during the initial data collection stage, but rather a platform to help you see patterns or categories in your data and to clear up any errors in your data. OpenRefine provides a set of tools to allow you to identify and amend messy data.
(Perhaps one or two direct comparisons of things you would do using the two different tools as examples or an exercise to show the difference would also be useful.)

  • There is a full stop in the header "Getting Help for OpenRefine" which needs to be removed.
Also after "Using undo and redo" (https://datacarpentry.org/openrefine-socialsci/02-working-with-openrefine/index.html)
and "Sorting by multiple columns" (https://datacarpentry.org/openrefine-socialsci/03-filter-sort/index.html)

  • A full stop is missing in the last bullet under "Key Points": "OpenRefine will automatically track any steps allowing you to backtrack as needed and providing a record of all work done."

  • I would also suggest removing some of the columns from the SAFI dataset for use with this lesson. There is a lot of scrolling back and forth to find, for instance, the column with roof types in workshops :-)

  • A last suggestion is to add some info on running OpenRefine from a server, as a considerable number of participants seemed to prefer this option during our last workshop, especially those who do not have admin rights to their laptops and have difficulties installing new software.

Let me know if you need any other info.

Open Refine defaulting to internet explorer on some devices

I use different devices to access OpenRefine, but I have had a recurring issue trying to launch it on one institution-issued device.

On this particular device, every time I try to create a project, OpenRefine automatically launches in Internet Explorer and the project is not created. I finally copied the URL into Chrome and the project was created. I don't know if this issue is unique to this device, but that's what I had to do to create projects.

Thanks.

Explain if/when data leaves computer in Introduction episode

It is probably worth pre-emptively answering, in the Introduction, a question researchers often ask ... Is my data safe or is it being copied or uploaded? This is almost answered (indirectly) in the "Before we get started" section, but I'd highlight it as a "Feature".

This will remind novice users to be aware of where their data is going (which can be confusing in these days of xAAS), while reassuring the 'aware' user that their data doesn't need to go anywhere and can remain local.

Explain when (not) to use OpenRefine

I have taught the OpenRefine lesson a few times; most recently today. Even though I always try to explain when you could choose OpenRefine for a problem, and how to compare OpenRefine to spreadsheets and writing a script, students keep asking for more explanation and comparisons.
In our workshop the OpenRefine lesson is between Data organisation in spreadsheets and Introduction to R and that is also how I tried to frame OpenRefine: it shows your data like a spreadsheet application, but it has powers like a programming environment.

Seeing how I keep struggling to explain it well, even with years of experience with OR, we should probably improve the lesson materials.

It was suggested by helpers that referring back to my situating OR between spreadsheets and programming in the introduction later in the lesson might help, but the introduction episode should provide more context first.

Many columns in the lesson

There are a lot of columns in the data set, but only a subset are used in this lesson. Scrolling left and right to get to the right one often distracts learners; it's even slower for an instructor who has their screen zoomed in to be readable.

lesson release checklist

Lesson Release checklist

For each lesson release, copy this checklist to an issue and check off
during preparation for release

Scheduled Freeze Date: 2018-04-27
Scheduled Release Date: 2018-04-30

Checklist of tasks to complete before release:

  • check that the learning objectives reflect the content of the lessons
  • check that learning objectives are phrased as statements using action words
  • check for typos
  • check that the live coding examples work as expected
  • if example code generates warnings, explain in narrative and instructor notes
  • check that challenges and their solutions work as expected
  • check that the challenges test skills that have been seen
  • check that the setup instructions are up to date (e.g., update version numbers)
  • check that data is available and mentions of the data in the lessons are accurate
  • check that the instructor guide is up to date with the content of the lessons
  • check that all the links within the lessons work (this should be automated)
  • check that the cheat sheets included in lessons are up to date (e.g., RStudio updates them regularly)
  • check that language is clear and free of idioms and colloquialisms
  • make sure formatting of the code in the lesson looks good (e.g. line breaks)
  • check for clarity and flow of narrative
  • update README as needed
  • fill out “overview” for each module - minutes needed for teaching and exercises, questions and learning objectives
  • check that contributor guidelines are clear and consistent
  • clean up files (e.g. delete deprecated files, ensure filenames are consistent)
  • update the release notes (NEWS)
  • tag release on GitHub

remove scatterplot facet section

Episode 5 has a section on scatterplot facets. When teaching the same material from the Ecology lesson, I've always skipped this section as it doesn't seem useful to the learners. Should it be removed from this lesson (especially given the addition of the regular expressions episode)?

June 2019 Lesson Release checklist

If your Maintainer team has decided not to participate in the June 2019 lesson release, please close this issue.

To have this lesson included in the 18 June 2019 release, please confirm that the following items are true:

  • Example code chunks run as expected
  • Challenges / exercises run as expected
  • Challenge / exercise solutions are correct
  • Call out boxes (exercises, discussions, tips, etc) render correctly
  • A schedule appears on the lesson homepage (e.g. not “00:00”)
  • Each episode includes learning objectives
  • Each episode includes questions
  • Each episode includes key points
  • Setup instructions are up-to-date, correct, clear, and complete
  • File structure is clean (e.g. delete deprecated files, ensure filenames are consistent)
  • Some Instructor notes are provided
  • Lesson links work as expected

When all checkboxes above are completed, this lesson will be added to the 18 June lesson release. Please leave a comment on carpentries/lesson-infrastructure#26 or contact Erin Becker with questions ([email protected]).

Adding content to cover OR's Extract/Apply functionality


It seems like it would be really useful to cover OpenRefine's Extract/Apply functionality in this lesson. This functionality is visible in the Undo/Redo area. Extract creates a precise representation of all changes made in JSON, which can be copied and saved; Apply allows users to paste that in, and perform all the operations on a new data set.

I can see this being useful for both social science and natural science data, in particular. Is this something that would be useful for me to write up and add? I don't think it would take a particularly long time to include; possibly within 10 minutes or less.
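
As a rough illustration of what an extracted history might contain, the Python sketch below parses a small JSON list of operations and prints each step. The JSON is hand-written for this example: the operation names and field names are assumptions about the general shape of the output, not a copy of what OpenRefine produces, and the column names are only borrowed from other issues on this page.

```python
import json

# Hypothetical operation history, shaped like what Extract might produce.
# The operation names and fields here are assumptions for illustration.
extracted = """
[
  {"op": "core/text-transform", "columnName": "village",
   "expression": "value.trim()",
   "description": "Trim whitespace in column village"},
  {"op": "core/column-rename", "oldColumnName": "no_membrs",
   "newColumnName": "no_members",
   "description": "Rename column no_membrs to no_members"}
]
"""

# The history is plain JSON, so it can be saved, inspected, and reused.
operations = json.loads(extracted)
for step, operation in enumerate(operations, start=1):
    print(step, operation["op"], "-", operation["description"])
```

Because the history is ordinary text, it can be kept alongside the data as a record of the cleaning steps and pasted into Apply for a new data set with the same columns.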

Successful clustering methods for village column

In the second lesson "Working with Open Refine" (https://datacarpentry.org/openrefine-socialsci/02-working-with-openrefine),
under "Using clustering to detect possible typing errors" in step 6, it is stated that

You should find no more clusters are found. None of the available methods offered to cluster Ruaca-Nhamuenda with Ruaca or Chirdozo with Chirodzo.

This is incorrect. For example, with the nearest neighbor method with PPM selected, Radius set to "4" and Block Chars set to "1", these clusters are correctly identified, which is also shown on this screenshot.
There are also other clustering settings that lead to this result.

Since directly below the steps it is mentioned that using different or more clustering and merging methods here will result in different results later on, I am not sure what a change here would affect, which is why I am submitting this as an issue.
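
To give a feel for how a distance-based method can group these values, here is a minimal Python sketch. Note that it uses plain Levenshtein edit distance rather than the PPM distance mentioned above, and it is not OpenRefine's implementation; it only shows how far apart the two pairs of village names are under one simple distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("Chirdozo", "Chirodzo"))
print(levenshtein("Ruaca-Nhamuenda", "Ruaca"))
```

Under this distance the two spellings of Chirodzo are close together, while Ruaca-Nhamuenda and Ruaca are much further apart, which suggests why the choice of method and radius decides whether each pair is clustered.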

Update 'Find village "49"' exercise to use only interview date, because GPS isn't distinct

This lesson uses GPS and interview date to try to correct a mislabeled village. However, the GPS locations in the 3 villages are not distinct. With a scatter plot, we can see that the GPS locations are in 3 clusters, but each of those clusters has responses from multiple villages.

Is this a bad copy of the data? or is the actual GPS data bad?

If the GPS data is actually bad, maybe we should change the last exercise of episode 3 to only rely on interview date?

Edit: update the link to the exercise.


Conclusion by @bencomp from discussion below: let's update the exercise to only rely on interview date.
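
The observation that each GPS cluster holds responses from several villages could be checked with a sketch like the one below. The records are invented for illustration (they are not SAFI values), and rounding coordinates to one decimal place is only a crude stand-in for reading clusters off a scatter plot.

```python
from collections import defaultdict

# Hypothetical records (village, latitude, longitude); values are invented.
records = [
    ("God", -19.11, 33.48),
    ("Chirodzo", -19.11, 33.48),
    ("Ruaca", -19.04, 33.40),
    ("God", -19.04, 33.40),
    ("Chirodzo", -19.11, 33.47),
]

# Group villages by coordinates rounded to one decimal place (a crude cluster).
villages_by_cluster = defaultdict(set)
for village, latitude, longitude in records:
    villages_by_cluster[(round(latitude, 1), round(longitude, 1))].add(village)

# Keep only the clusters that contain more than one village.
mixed = {c: v for c, v in villages_by_cluster.items() if len(v) > 1}

for cluster, villages in mixed.items():
    print(cluster, sorted(villages))
```

If most clusters turn out to be mixed like this, location alone cannot identify the village, which supports relying on interview date instead.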

Text in Chapter 2 links to Ecology lessons instead of Social Sciences lesson

In Episode 2 of the OR for Social Sciences Lesson under More on Facets there is a link for further reading about Numeric and Scatterplot facets. This link refers to episode Examining Numbers in OR.
The link address is http://www.datacarpentry.org/OpenRefine-ecology-lesson/03-numbers/

So, at this point one is leaving the Social Sciences Lesson and switches to the Ecology lesson. If then following the lesson flow (e.g. forward/backward arrows), one is in the wrong lesson, which is not easy to notice, as structure of lesson and episode names are identical between Social Sciences and Ecology.

Suggestion: change this link to the identical episode Examining Numbers in the Social Sciences lesson https://datacarpentry.org/openrefine-socialsci/04-numbers/index.html

However, that episode only contains details on numeric facets and does not cover scatterplot facets. Therefore, either the text needs to be altered as well, or the additional part about scatterplots should be added to the episode for Social Sciences.
If the mix of lessons was intentional, maybe the text could make clear, that the link is outside the lesson and not a reference to content that will be covered later in the lesson.

Further hint: in Social Sciences the episode number for Examining Numbers in text and link is "04". In Ecology the episode numbers in the text are the same, but in URLs are all -1.

Align Objectives with content in introduction

I'm a member of The Carpentries staff and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.

As part of the check-out procedure, I have a suggestion for an improvement. In the lesson "OpenRefine for Social Science Data" > episode "Introduction" (https://datacarpentry.org/openrefine-socialsci/01-introduction/index.html), two of the Objectives could be changed as they are not covered in the episode:

  • Differentiate data cleaning from data organization.

  • Experiment with OpenRefine's user interface.

Instead features of the application are described, e.g.:

  • Understanding that OpenRefine runs locally on a computer and not in the cloud.

Is "Describe uses and applications" a good Learning Objective; update GREL meaning

Hi there,

I have been working with the OpenRefine for Social Science Data lesson and have found a couple of issues. Though I realise you would prefer feedback on the OpenRefine for Ecology Data lesson, the first issue also relates to that particular lesson:

  1. (This is likely a minor issue) In the Objectives for the Introduction to OpenRefine for Social Science Data, you have 'Describe OpenRefine's uses and applications' - this is not so much a learning objective as a note for the instructor. Could this simply be removed?
  2. Where GREL is introduced in the 'Transforming Data' section of '2. Working with OpenRefine', GREL is still referred to as 'Google Refine Expression Language' in the text and in the screenshot. This might seem a minor point, but for a newbie (like I was a few weeks ago), when they see 'General Refine Expression Language' on their screen this might cause confusion.

Thank you for your time! :)

Antje

Mention (no) admin rights, adjust learning objectives in introduction, open resources in tabs

I'm a member of The Carpentries staff and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.


In the setup page, you may mention that installing OpenRefine does not require administrative rights. People using work computers without admin rights may think they can't install it like other software, but indeed, you can install OpenRefine without admin rights.

In the introduction part, 1. the learning objectives don't match what is described on that page. The page does not cover Differentiate data cleaning from data organization and Experiment with OpenRefine's user interface. Suggest to change to "Identify features of OpenRefine". 2. When explaining create project, currently the sample data looks fine, but if users use their own data, they may get unrecognized characters. Maybe mention "In that case, select your text encoding method to US-ASCII"
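
The "unrecognized characters" problem can be reproduced in a few lines. This Python sketch is only an illustration of what happens when text is read with a different encoding than it was written in; the accented village name is invented, and it says nothing about which encoding a particular user's file needs.

```python
# A hypothetical village name with an accented character, stored as UTF-8.
raw_bytes = "Chirodz\u00f3".encode("utf-8")

# Reading the bytes with the encoding they were written in works as expected.
as_utf8 = raw_bytes.decode("utf-8")

# Reading the same bytes as Latin-1 yields garbled characters instead.
as_latin1 = raw_bytes.decode("latin-1")

print(as_utf8)
print(as_latin1)
```

The bytes in the file never change; only the encoding chosen at import decides whether the preview shows the intended characters.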

Under working with OpenRefine, this sentence is confusing: "Note that at step 1, you could upload data in a standard form from a web address by selecting Get data from Web Addresses (URLs). However, this won't work for all URLs". It needs an explanation of which URLs work and which do not.

In the Other resources in OpenRefine section, I would suggest changing the links to other resources so that they open in another tab. Currently, if users start to explore a link, it is hard for them to get back to the lesson.

Overall suggestions:

  • The sample data has a lot of columns that are not used. It's hard to find a column to follow. Maybe delete some columns and leave only the ones used in class?
  • Personally, I think the class may be reorganized. Currently, there is a lot of information in the "working with OpenRefine" section. For example, Transforming data using GREL could be moved to the "examine numbers" section. That part is also doing data transformation. I would start by using the common transforms, then introduce GREL. GREL might be a little bit tricky for beginners, so I would not introduce it in the earlier part of the class.

Clarify Enter vs Return; demonstrate edit functionality in text facet

I'm a member of The Carpentries staff and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.


I would like to make a few suggestions to an existing lesson as part of my checkout process. The suggestions are rather small but something to consider potentially.

  1. Replace "hit return" with either "hit enter" or "hit return/enter". The name of this key varies between Mac and PC, and "return" could confuse users who do not have a return key.

  2. Facet edit https://datacarpentry.org/openrefine-socialsci/02-working-with-openrefine/index.html

Here we will use faceting to look for potential errors in data entry in the village column.

  1. Scroll over to the village column.
  2. Click the down arrow and choose Facet > Text facet.
  3. In the left panel, you'll now see a box containing every unique value in the village column along with a number representing how many times that value occurs in the column.
  4. Try sorting this facet by name and by count. Do you notice any problems with the data? What are they?
  5. Hover the mouse over one of the names in the Facet list. You should see that you have an edit function available.
  6. You could use this to fix an error immediately, and OpenRefine will ask whether you want to make the same correction to every value it finds like that one. But OpenRefine offers even better ways to find and fix these errors, which we'll use instead. We'll learn about these when we talk about clustering.

My suggestion would be to show learners how to make an edit here as well as "the even better way". Simply correcting one of the spelling errors in the variable list would add maybe 10 seconds to the lesson and provide additional context.

Explain goals of the lesson more clearly in the Introduction

Hi,

I'd like to suggest a new/modified objective for the Data Cleaning module - Introduction, more focused on learning outcomes.

Participants will be able to process data in order to ensure that it is correct, consistent, and useable by detecting any errors in the data, correcting or removing corruptions in the data, or manually modifying or deleting the coarse data as needed.

Hope this is useful.

Cheers,
Deena Yanofsky

Suggestion: Update the screenshot

In the working with data session, the first screenshot under "Creating a new OpenRefine project" is out of date. It looks different from what I am viewing, so I suggest replacing it with a more recent screenshot.

When teaching this part, it is useful for students to see what the preview looks like by clicking through the different options. I suggest adding a note to the session pointing out that the preview window shows a different view of the data when the file format changes.

Check for links to OpenRefine documentation

The OpenRefine docs are being changed soon, so we should be prepared to update any pointers to them.

This heads-up is just to flag that if you have any links to OpenRefine documentation in your lessons you may need to update the links once the new documentation is published. We’re still working on finalising the documentation and it is always going to be a work in progress - so we’re always open to improving it - but you can see the updated documentation as it currently stands at https://docs.openrefine.org/

Facets: two grammatical changes

Under Facets, two suggested edits bolded and italicized below ("help" and "allows"):

Facets are one of the most useful features of OpenRefine and can help both get an overview of the data in a project as well as help you bring more consistency to the data. OpenRefine supports faceted browsing as a mechanism for seeing a big picture of your data, and filtering down to just the subset of rows that you want to change in bulk.
A ‘Facet’ groups all the like values that appear in a column, and then allows you to filter the data by these values and edit values across many records at the same time.
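
The behaviour described above (group like values, count them, then edit one value across many records) can be sketched outside OpenRefine in a few lines of Python. This is only an illustration of the idea, not how OpenRefine implements facets; the sample values and the helper names `text_facet` and `edit_facet_value` are hypothetical.

```python
from collections import Counter

# Hypothetical sample of a "village" column; the values are illustrative only.
villages = ["Chirodzo", "Ruaca", "Chirdozo", "Ruaca", "God", "Chirodzo"]


def text_facet(values):
    """Return (value, count) pairs, most frequent first, like a text facet box."""
    return Counter(values).most_common()


def edit_facet_value(values, old, new):
    """Replace every occurrence of `old` with `new`, like a bulk edit in a facet."""
    return [new if value == old else value for value in values]


# Overview of the column before any edit.
facet = text_facet(villages)

# Fix one misspelling everywhere it occurs, then rebuild the facet.
fixed = edit_facet_value(villages, "Chirdozo", "Chirodzo")
fixed_facet = text_facet(fixed)
```

Sorting by count, as `most_common()` does here, tends to push rare misspellings to the end of the list, which is roughly what sorting a facet by count shows in the lesson.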
