Code Monkey home page Code Monkey logo

sql-novice-survey's Introduction

GitHub release Create a Slack Account with us Slack Status DOI

sql-novice-survey

An introduction to databases and SQL using Antarctic survey data. Please see https://swcarpentry.github.io/sql-novice-survey/ for a rendered version of this material, the lesson template documentation for instructions on formatting, building, and submitting material, or run make in this directory for a list of helpful commands.

Authors

A list of contributors to the lesson can be found in AUTHORS

License

Instructional material from this lesson is made available under the Creative Commons Attribution (CC BY 4.0) license. Except where otherwise noted, example programs and software included as part of this lesson are made available under the MIT license. For more information, see LICENSE.md.

Citation

To cite this lesson, please consult with CITATION To use a particular version's DOI, refer to all Zenodo released versions

Maintainer(s):

sql-novice-survey's People

Contributors

aaren avatar abbycabs avatar abought avatar benwaugh avatar danmichaelo avatar dwinston avatar erinbecker avatar ethanwhite avatar fmichonneau avatar gcapes avatar guyer avatar henrykironde avatar jamesscottbrown avatar jcszamosi avatar kyrretl avatar matty-jones avatar mckays630 avatar morgantaschuk avatar orchid00 avatar pbarmby avatar pgmccann avatar pli888 avatar r4space avatar remram44 avatar rgaiacs avatar simonw avatar tbekolay avatar timtomch avatar wking avatar zkamvar avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sql-novice-survey's Issues

Errata in Filtering Lesson of Using Databases and SQL

In the Filtering lesson of Using Databases and SQL I came across the following in the following:

<<<<<<< HEAD (The parentheses around the individual tests aren't strictly required, but they help make the query easier to read.)
Date types
======= >>>>>> 06563c2

It occurs right after the SQL statement:

SELECT * FROM Visited WHERE site="DR-1" AND dated<="1930-00-00";

Looking at the files in the repository, I did not see it in 03-filter.md, but it is in the page 03-filter.html.

Replace wget

Sounds that wget don't work on Windows. We should replace it with instructions to download the file.

Improve semantic

We made a commitment to avoid HTML in our Markdown because (1) doing it is prone to error and (2) raise the bar for new contributions. However I want that we replace

> **Person**: people who took readings.
>
> |ident   |personal |family
> |--------|---------|----------
> |dyer    |William  |Dyer
> |pb      |Frank    |Pabodie
> |lake    |Anderson |Lake
> |roe     |Valentina|Roerich
> |danforth|Frank    |Danforth

with

<figure>
<figcaption>**Person**: people who took readings.</figcaption>
|ident   |personal |family
|--------|---------|----------
|dyer    |William  |Dyer
|pb      |Frank    |Pabodie
|lake    |Anderson |Lake
|roe     |Valentina|Roerich
|danforth|Frank    |Danforth
</figure>

screen shot 2015-07-26 at 10 20 34

Make this change will (1) improve our lessons with HTML recommendations and (2) make it more accessible.

Posponed a little "Changing database managers" callout box

At 01-select.md, what about

(...) The database manager does whatever lookups and calculations the query specifies, returning the results in a tabular form that we can then use as a starting point for further queries.

Queries are written in a language called SQL, which stands for “Structured Query Language”. SQL provides hundreds of different ways to analyze and recombine data. We will only look at a handful of queries, but that handful accounts for most of what scientists do.

Changing database managers

Every database manager — Oracle, IBM DB2, PostgreSQL, MySQL, Microsoft Access, and SQLite — understand SQL but stores data in a different way, so a database created with one cannot be used directly by another. However, every database manager can import and export data in a variety of formats (including SQL), so it is possible to move information from one to another.

The tables below show the database we will use in our examples:

wget not in macs

wget does not come with macs we should provide a manual download to the survey.db

Creating survey.db database from csv files doesn't observe null values

Empty values in a csv file get written as empty strings not as 'null' (see http://sqlite.1065341.n5.nabble.com/import-null-values-td3405.html).

The survey.db database creation script imports data from csv, which means the empty values in the Survey and Visited tables (in person and dated cols respectively) import as empty strings rather than nulls.

These can be corrected using:

UPDATE Visited SET dated = null WHERE dated = '';
UPDATE Survey SET person = null WHERE person = '';

Streamlining

tl;dr - Asking about two changes to the setup page before spending time on the pull request.

Right now the current student-facing setup lesson page does not include anything about how to quit SQLite. Novices encountering SQLite for the first time may become frustrated by not knowing how to quit the program back to the command line. Worse, they might assume the only way to quit is by exiting and restarting the shell. Is there a reason for this omission? If not, I'll put up a pull request to add the following snippet from the instructor page:

To exit SQLite and return to the Bash shell, use .quit:

sqlite> .quit
$

Also, the setup page includes Python specific commands such as opening up an ipython notebook and running python commands. Do we require that participants\learners need Python installed and need to know Python concepts in order to do SQL? I understand that we normally teach Python and SQL in the same workshops. As we have more R workshops, we may need to teach SQL without teaching Python.
I propose that we should move the Python non sequitur to either a separate page in the SQL repo or to a page within the Python repo.
Since there may be back history that I don't know, I wanted to ask for input before making the pull request. Either way, I'm happy to prepare the markdown files.

Issues at instructors.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Heading at line 58 should be level 2
  • Heading at line 64 should be level 2
  • Heading at line 91 should be level 2
  • Heading at line 127 should be level 2
  • Heading at line 171 should be level 2
  • Heading at line 180 should be level 2
  • The document links to 01-one.html, but could not find the expected markdown file /tmp/sql/01-one.md
  • The document links to 02-two.html, but could not find the expected markdown file /tmp/sql/02-two.md

showing null values

Is there any reason to not use Date as a type for dates?

We use TEXT as the type for dated in the Visited table, but SQLite has a DATE type which I think is more appropriated here.

I'd prefer DATE because it works the same, but also is able to handle different formats or explain how dates can be ordered.

Is there any reason not to use it?

SQLite database not available

$ find . -name '*.db'
$ find . -name '*.sql'
./code/gen-survey-database.sql

and

$ ls -1 data
person.csv
site.csv
survey.csv
visited.csv

no sqlite prompt

I'm trying to work through the sql lesson.

Windows 10, working through git bash.

I run

sqlite3 survey.db

and I do not get the prompt that is outlined in the instructor.md lesson. I just got a cursor.

At first I assumed something was wrong, but no, it works just fine from that cursor, there is just never a prompt

sqlite3 --version

3.8.4.3 2014-04-03 16:53:12 a611fa96c4a848614efe899130359c9f6fb889c3

0th lesson with db install and .schema/.tables?

I am thinking of adding a 0th lesson which would do the following:

  • Show explicitly where to get the db file (I had to go digging) and how to get it into sqlite.
  • Show how to figure out which tables you've got (.tables) and what kinds of things are in them (.schema). My reason for wanting to add these are to help them see which tables we've got and which column names they have. I always tend to forget what they are, hence my idea of showing how to see what they are.

I am especially unsure about .schema. I myself am reasonably comfortable with sql, so I am not confused by the resulting CREATE TABLE thing that appears, but I do see the potential confusion for the student.

Any comments/suggestions on this?

Programing with databases in R

Is it possible to mention that databases can be used from R as well as in the python example in lesson 10?
the function sqldf() in R works perfect for database queries and I think should be at least mentioned somewhere in this page.

SQLite installation issue on Windows

From one workshop

A few windows laptops had an issue with SQLite installation -
SWC installation package (which includes nano, sqlite etc) didn't works because learners have
no permission to install, and single file binary from SQLite website
didn't do the trick either (for some unknown reason).
In the end, SQLite manager
(Firefox plugin) was used as the last resort after 10~15 minutes trying to make SQLite work.

Very useful visualisation of code for joins

Perhaps share link with students? It's just so useful for getting the right code for the different types of joins (as well as knowing what an "outer" or "inner" or "right" joins are, which are not intuitive for non software devs, but super-easy to understand once you get them).

http://sql-joins.leopard.in.ua/

/sorry if this is the wrong place to leave a comment/

01-select.md talks about three entries in tables being red

There are no colors on the three data tables (anymore), so the three entries with -null- are not currently visually highlighted.

Either the formatting needs reapplying, or the text edited:

Notice that three entries --- one in the `Visited` table,
and two in the `Survey` table --- are shown in red
because they don't contain any actual data:
we'll return to these missing values [later](05-null.html).

What are fixmes?

Are fixmes things that should be fixed, i.e. something is wrong, or are they lesson challenges? I get confused by the naming - some of them at least seem to be potential lesson challenges.

Should we generate the database from a script?

At a workshop today people were confused by the survey.db file and I wonder whether it would be better to start off sourcing survey.sql instead.

I think what confused people was the fact that the data seemed to appear from nowhere. Maybe a better start should be calling sqlite3 with no arguments, showing nothing is loaded, and then source survey.sql to create the data.

We could save the file in the next step arguing that a typical database would be much bigger and the process of loading the data much longer.

The advantage that I see is that people could get an idea on which and how the different tables are created (specially instructive for people who are thinking in creating their own database to substitute spreadsheets).

Solutions to SQL challenges

The guidelines here mention including solutions for all challenges, but I don't see a place to do this in the SQL episodes. Please advise :)

Update 2 challenges (or adding null values)?

I suggest to update the following 2 challenges:

In the Filter lesson
"Suppose we want to select all sites that lie more than 30 degrees from the poles. Our first query is:
SELECT * FROM Site WHERE (lat > -60) OR (lat < 60);
Explain why this is wrong, and rewrite the query so that it is correct."

The query in question and the correct answer both give the same results because there were only 6 data records, and all of them satisfy both conditions. We may want to use another example or change some values in the data.

In the aggregation lesson:
"The average of a set of values is the sum of the values divided by the number of values. Does this mean that the avg function returns 2.0 or 3.0 when given the values 1.0, null, and 5.0?"

Without a NULL in an integer/real number field, we cannot show an example how SQL takes averages while ignoring NULL. If the data were not real, I'd suggest to replace some of the readings with NULL so that we can demonstrate it.

Back next buttons from lessons

Would be very helpful to have a back and a next button at the end (maybe also at the begining) of each of the lessons. Instead of going to the first lesson every time to jump to the next one. Or is there another way to do this?

Installation issues

We recently gave the SQL lesson for the first time, and I wanted to share some of the installation issues (which took an hour to fix) we had.

  • Windows, learner had installed Anaconda, but sqlite was not installed. Solution: open Anaconda Prompt, write ‘conda install sqlite=3’
  • same problem, but above solution didn’t work as learner did not have permission to install in the Anaconda3 folder (even though they were able to install Anaconda)
  • Running Windows installer was blocked by antivirus software; solutions: variable, ideally learner can temporarily disable the antivirus software or create an exception (or designated trusted folder for downloads); alternatively provide installer on USB sticks
  • Windows 10 (maybe also8 and 7), learner using SWC windows installer: sqlite3 is not added to PATH-variable; solution: manually add

Query definition in the glossary

The query definition is not consistent with what is said on the creating and modifying data

Records can be inserted, updated, or deleted using "queries".
Whereas query definition says: A database operation that reads values but does not modify anything.

I think queries not only retrieve information, but Insert, Update, Drop statements can also be called queries?
Either one of those two should be fixed.

_episodes/01-select Objectives suggestions

As part of instructor checkout, I'm offering suggestions to make the objectives for 01-select more specific and demonstrable. In support of objectives 1 and 2 I have suggested activities that could be incorporated into instructor notes.

Objectives
Learners will:

  • Identify individual elements of a database in order to demonstrate knowledge of the differences among tables, records (rows), and columns (fields).
  • Differentiate between example managers and content in order to understand the difference between a database manager and a database.
  • Practice writing SQL queries that select all values for specific columns (fields) from a single table in order to begin manipulating data outputs from a relational database.

Suggested activities for instructor notes:
Objective 1:
Instructor types a heading into the workshop etherpad and asks learners to type in an example of the correct element, with all examples taken from the lesson dataset, e.g.:
Type one table name:
Type one column name:
Type one observation from a row in the table named Person:

Objective 2:
Instructor types into the workshop etherpad a mixed list of managers and content drawn from everyday life and then asks learners to identify which is which using the red and green sticky notes. For example, a mixed list might include Outlook, Gmail, Email, Contacts. The instructor asks, "Show your red sticky if you think Outlook is a manager. Show your green sticky if you think Email is a manager." The idea of the exercise is to fulfill the objective by relating the unfamilar concept of a database manager and a database to more familiar managers and the content or data they manage.

setup

the page http://swcarpentry.github.io/sql-novice-survey/guide/ section SQLite Setup clones the git repo, which seems unnecessary if this page is intended for learners to setup, and in any event breaks down for me at the line sqlite3 survey.sqlite .read bin/create-db.sql

Am I missing something here?

Shift some focus from complex query construction to schema design

I layed out my reasoning and past experience here, but I'm splitting this out into a separate issue because it's partially independent of the SQLite support for INSTR (which is the focus of #13). I initially responded there because I don't think we should be talking about INSTR in a novice SQL lesson at all, and I'd rather we spent that time getting students comfortable mapping between the real world an a database schema.

SQLite & Python working together on OSX

The python script at the top of the programming lesson fails for me, with the complaint

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    cursor.execute("select site.lat, site.long from site;")
sqlite3.DatabaseError: file is encrypted or is not a database

A little googling suggests that python 2.7 uses some ancient sqlite version, that doesn't play nice with more recent ones; there are some suggestions on how to fix this on windows here and here, and something about coping with this in a virtualenv here, but this wasn't helpful to the casual OSX-using pythonista.

Does anyone know how to make Python 2.7 and SQL play nice together on OSX 10.9.5?

$ python --version
Python 2.7.8 :: Anaconda 2.1.0 (x86_64)
$ sqlite3 --version
3.7.13 2012-07-17 17:46:21 65035912264e3acbced5a3e16793327f0a2f17bb

css formatting

I have some suggestions on the template

Padding:
*I see that most if not all tables need a bit more of bottom padding, the letters in the next paragraph are too close.
*Challenges on the other side are very spacious in between, and probably the question title should be a bit smaller.
Colors:
*Table headers, I don't know if it is just me, but that blue is too dark to be combined with black... just saying

First exercise in the Filtering Lesson

At a workshop i taught a couple of weeks ago, there was a lot of confusion about the first exercise in the filtering lesson:

Suppose we want to select all sites that lie more than 30 degrees from the poles. Our first query is:

Fix this query:
SELECT * FROM Site WHERE (lat > -60) OR (lat < 60);
Explain why this is wrong, and rewrite the query so that it is correct.

I think people understood that the solution is to replace OR with AND, but they were trying to use this query on survey.db to check their work. Because none of the sites in survey.db are less than 30 degrees from the poles, you get the same output from the query using AND/OR. This confused a lot of people. Could we change the exercise so that it excludes some sites from the dataset, or change the dataset to include sites nearer to the poles?

  • Tobin

Absolute link to "/raw/gh-pages" breaks in lesson release

The index page have the following links

 [survey.db]({{ site.github.repository_url }}/raw/gh-pages/files/survey.db)

This explicitly links to the currently latest version of db file, which can be a good thing (propagate fixes more rapidly) or a not-so-good thing (possible incoherence between the lesson people are reading and the db file they might be downloading).

This semi-absolute link also causes problems (by default) in the releases, as in http://swcarpentry.github.io/swc-releases/2017.02/sql-novice-survey/

I have partial information on this and there might be other reasons for this link, but my recommendation would be to switch to a relative link like "files/survey.db" or "./files/survey.db".

Compare to Excel early on

From swcarpentry/DEPRECATED-bc#750:

A learner writes: "I want to know in the first part of the page what I can do in SQL that I cannot do in Excel. Begin with a short example of a data manipulation (changing a person’s name when they get married, changing a sequence accession for a set of bioinformatics results) that is easy to do with SQL, but much more challenging (or error prone) with spreadsheets."

instr function not available in SQLite 3.7.13 and older

One of the challenge questions in SQL lesson 04-calc requires students to use the instr function. This function wasn't introduced until 3.7.15 and it's likely students using the pre-installed version of SQLite on Mac OS X and Linux will have older versions. Perhaps add a note in the lesson? Or prompt students to check their version of SQLite in the installation instructions and upgrade?

idea: use JavaScript-based SQL tool for teaching

When teaching the SQL lesson this week, I was not so happy with SQLiteManager.

  • the GUI seems a bit cluttered, it's not intuitive what the different panes are for. I felt the need to explain where we are in every step.
  • only one query can be displayed at a time: there is no progression visible for learners

There might be an alternative: Today I discovered sql.js, which provides a JavaScript based SQL interpreter. There's even a GUI already available which might solve both problems mentioned above.

opinions?

PS: I already started a more general, but related thread on the discussion-list.

change extension: survey.db --> survey.sqlite

When teaching this lesson using the Firefox add-on, the add-on expects the file to end in sqlite. Renaming the file would remove one step when explaining how to import the data.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.