data-8 / connector-instructors

Resources for Connector Instructors

Home Page: http://data8.org/connector-instructors/

Jupyter Notebook 71.90% Python 27.10% HTML 1.00%

connector-instructors's Introduction

This documentation is for a collection of code that aims to be useful for connector instructors.

If you have questions about the class, finding students, building your course, etc., check out the wiki at https://github.com/data-8/connector-instructors/wiki.

If you're looking for the GitHub page for the repository, try https://github.com/data-8/connector-instructors.

Overview

Currently, this repository contains three main folders:

/connectortools: A Python module with some useful functions for using Jupyter Notebooks and JupyterHub in your classes. It can be installed with the setup.py script.
/examples: A collection of examples for how to use connectortools.
/tutorials: A collection of general tutorials that cover topics in programming, computation, etc. Connector instructors may find these useful in teaching basic concepts, and are encouraged to copy/modify them however they'd like.

Interactive notebooks

Using Data8

We've created an interact link for this repository here. Click it, and it will copy everything in this repo into your own datahub instance. You can then run the tutorials and examples from there.

Using Binder

Alternatively, you can use mybinder to demo this code. This is a free service put together by the Freeman lab. If you're interested in running the tutorials or examples within Binder, click the image below to enter an interactive browsing session.

If you have a suggestion for something to add, open an issue in this repository and we can try to do so!

connector-instructors's Issues

`interact` mechanism should support private repositories

The idea here is based on discussion with @SamLau95 in gitter.

It would be nice to have a mechanism by which connector instructors can distribute notebooks & data to the Jupyter servers without it being public.

The current workflow outlined in this repo to use the interact URLs assumes that the repository is public, and also assumes that it should clone the branch named gh-pages, even though this isn't explicitly documented. Seems like assuming the branch is gh-pages creates another point of failure? A connector instructor just trying to distribute content in this way has no need of the HTML functions of gh-pages, and @anthonysuen 's team is already putting website information on all the gh-pages branches.

Sam suggested it would be possible to make sure the server has the credentials required to import from private repositories on data-8.org.

Related to this issue: on the mailing lists, @pattyf asked for recommendations on how connector instructors can manage both public content for students and private content like answer keys. @papajohn suggested having two separate repositories on data-8 to address this: one for public copies and one for private copies.

https://data8.berkeley.edu/ returns a 403 error

I'm following the directions for connector instructors for creating a new Jupyter notebook, and the URL https://data8.berkeley.edu/ returns a 403 error page.

Add explanation of the jupyterhub setup

This could go in the tech infrastructure section of the wiki. It would describe the hardware setup we use (e.g., azure cluster, n_machines etc), and the software that we use to handle the container system (e.g., kubernetes, docker, jupyterhub, etc). Similar to Sam's scipy 2016 talk.

add a postgresql python module (psycopg2?) to data8 server?

Could we get a postgresql module added to the servers? psycopg2 seems to be what comes up when I google but I'm open to suggestions (still thrilled with @ryanlovett 's suggestion to use requests instead of urllib/urllib2)

It's not essential as I'm not actually teaching SQL queries and can always extract the data into csvs beforehand for students, but I prefer to at least illustrate where data originally comes from to try and demystify this a bit. Thanks.

Update contact information

See here: https://github.com/data-8/connector-instructors/wiki/Contacts

Where do we provide overall info about connector resources

Currently in
https://docs.google.com/spreadsheets/d/1gcJ_zkvkbVzOu1z1HFtIKRAfczcA5QaELNBkvcTwRLM/edit?ts=587e91b6#gid=0

Students getting 503 errors when accessing notebooks

This was reported by @mahoneymw, who has said that a number of students are getting 503: Proxy target missing errors. Perhaps @ryanlovett has an idea what might be going wrong?

Error with uploading csv table

Dear all;

I encountered the following error while trying to import a CSV file that is the basis for a notebook. The material is posted below; thank you for any suggestions.

Andrej

/1/ I got to my tree: https://data8.berkeley.edu/user/andrej/notebooks/history-connector/Hello-Andrej.ipynb

/2/ and typed in the code, as before:

from datascience import Table
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
%matplotlib inline

/3/ and got a very, very different error, which I see as an improvement, and a suggestion for a next step would be wonderful!:

slavery_data = 'https://data8.berkeley.edu/user/andrej/edit/history-connector/07423-0001-Data.csv'
full_table = Table.read_table(slavery_data)
full_table

CParserError Traceback (most recent call last)
in ()
1 slavery_data = 'https://data8.berkeley.edu/user/andrej/edit/history-connector/07423-0001-Data.csv'
----> 2 full_table = Table.read_table(slavery_data)
3 full_table

/opt/conda/lib/python3.4/site-packages/datascience/tables.py in read_table(cls, filepath_or_buffer, *args, **vargs)
323 except AttributeError:
324 pass
--> 325 df = pandas.read_table(filepath_or_buffer, *args, **vargs)
326 return Table.from_df(df)
327

/opt/conda/lib/python3.4/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
496 skip_blank_lines=skip_blank_lines)
497
--> 498 return _read(filepath_or_buffer, kwds)
499
500 parser_f.__name__ = name

/opt/conda/lib/python3.4/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
283 return parser
284
--> 285 return parser.read()
286
287 _parser_defaults = {

/opt/conda/lib/python3.4/site-packages/pandas/io/parsers.py in read(self, nrows)
745 raise ValueError('skip_footer not supported for iteration')
746
--> 747 ret = self._engine.read(nrows)
748
749 if self.options.get('as_recarray'):

/opt/conda/lib/python3.4/site-packages/pandas/io/parsers.py in read(self, nrows)
1195 def read(self, nrows=None):
1196 try:
-> 1197 data = self._reader.read(nrows)
1198 except StopIteration:
1199 if self._first_chunk:

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:7988)()

pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)()

pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)()

pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)()

pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:22649)()

CParserError: Error tokenizing data. C error: Expected 1 fields in line 12, saw 2
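
One possible reading of this traceback (an assumption, not confirmed in the thread): the `/edit/` URL points at the notebook server's file-editor page behind a login rather than at the raw CSV, so pandas ends up tokenizing HTML. Since the notebook and the CSV sit in the same `history-connector` folder, a relative path such as `Table.read_table('07423-0001-Data.csv')` may be enough. The sketch below uses only the standard library; `looks_like_html` is a hypothetical helper for checking what a URL or file actually returned before handing it to a parser.

```python
def looks_like_html(text):
    """Return True if the text starts like an HTML page rather than CSV data.

    Hypothetical helper for illustration; not part of datascience or pandas.
    """
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# A login or editor page starts with markup, which a CSV parser cannot tokenize.
html_sample = "<!DOCTYPE html>\n<html><head><title>Editor</title></head></html>"
# Made-up rows, for illustration only.
csv_sample = "a,b\n1,2\n3,4\n"

print(looks_like_html(html_sample))
print(looks_like_html(csv_sample))
```

If the check reports HTML, the URL is probably serving a web page, and reading the file by its path inside the hub is the simpler route.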

More guidance re: how to create student notebooks

Hi @choldgraf , following your example notebook create_student_notebooks.ipynb I tried to create a student version of a notebook with an answer key on datahub.berkeley.edu but was unsuccessful. It cleared the output but didn't remove the code between ### SOLUTION BEGIN and ### SOLUTION END. I think my problem stems from this line import connectortools as ct. Neither nbgrader nor connectortools are installed on datahub.berkeley.edu. So my hack-around was to (1) !pip install nbgrader (not sure if that was wise) and (2) cut and paste the content of grading.py into a cell & run it. That created some errors that I attempted to fix. I got the code to run but the output wasn't as I expected. Any tips on how to run your connectortools in jupyter hub appreciated.
thanks, Patty
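
For anyone in the same spot, here is a minimal sketch of the idea using only the standard library, so neither nbgrader nor connectortools needs to be installed. It is not the connectortools implementation; `strip_solutions` and `make_student_notebook` are hypothetical names, and the only things taken from this thread are the `### SOLUTION BEGIN` and `### SOLUTION END` markers.

```python
import json

BEGIN = "### SOLUTION BEGIN"
END = "### SOLUTION END"


def strip_solutions(source):
    """Drop every line between the BEGIN and END markers (markers included)."""
    kept, inside = [], False
    for line in source.splitlines(keepends=True):
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    return "".join(kept)


def make_student_notebook(notebook):
    """Return a copy of a notebook dict with solutions and outputs removed."""
    student = json.loads(json.dumps(notebook))  # deep copy via JSON round trip
    for cell in student.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # the notebook format also allows a list of lines
            source = "".join(source)
        cell["source"] = strip_solutions(source)
        cell["outputs"] = []
        cell["execution_count"] = None
    return student


# Tiny in-memory notebook, made up for illustration.
notebook = {
    "cells": [{
        "cell_type": "code",
        "source": "x = 1\n### SOLUTION BEGIN\ny = x + 1\n### SOLUTION END\nprint(x)\n",
        "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}],
        "execution_count": 3,
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}
student = make_student_notebook(notebook)
print(student["cells"][0]["source"])
```

To use it on a real file, load the `.ipynb` with `json.load`, pass the dict through `make_student_notebook`, and write the result back with `json.dump` under a new filename.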

Add `nbgrader` as a default package in the cluster

hey @ryanlovett can we add nbgrader to the docker image for the cluster? That way connector instructors can use the connectortools module on datahub.

Submit button missing on Jupyter notebook

I have done my assignment, but there is no "submit" button on top of the toolbar. I have tried to reload the lab many times, the "submit" button is still not there.

Can anyone help?

edits needed on wiki pages

These are mostly for @choldgraf and @ryanlovett.

Home

What exactly is "the connectortools module"? It's mentioned, but I don't think it's explained.

Is it just the code in https://github.com/data-8/connector-instructors? Or the particular "connectortools" folder in there?
Should we make a list of what's available in "the connectortools module"? Would that go under Software Resources? Or somewhere else?

FAQ

Refers to tables demo, looks like it should be linked, and I don't know where that is.

Hardware Resources

I'm getting confused by the terms used throughout the wiki. Could we standardize and/or explain:

Does the cluster = our jupyterhub = datahub?
Are all of these identical to datahub.berkeley.edu (as a URL)?
Which of these will persist if we're using different resources from Data 8 (that's datahub this semester)?
Is "the datahub team" Ryan? Or Ryan plus others?
Under "Storing data and I/O" was there some talk about Box or Drive or something?

Software Resources

Under "Installing libraries and packages," can we list what libraries are installed for everyone?
Is this where we'd provide info about the datascience package?
Is there a place here to describe what's in the connectortools module?
I would sort of have thought that GitHub and Interact would show up under Technical Infrastructure.

Technical Infrastructure

Will this eventually show up in the right-hand menu, once there's more to it?
What are the boundaries between Hardware Resources, Software Resources, and Technical Infrastructure?

Workflow

I don't know how this works, so can we ask a more experienced instructor to review it?

Overall

Wherever it refers to "the main course," replace by "Data 8" or "the Foundations course." I've done this everywhere I've caught it.

Private vs Public repos for connector instructors?

@anthonysuen, @papajohn : Do you know if the dsten organization has the capacity to offer private repositories for connector instructors? Following the strategy of the core course, private repositories seem better suited for making exercises and the like available to other DS8 instructors without tempting students not to do their own work.

As an intermediate solution, instructors could host their own private repos for that purpose, but it seems more streamlined to have them here, particularly if the goal is to share with instructors while not being public to students. Again the core course seems to have a very nice set-up where student-facing material goes on the gh-pages branch and becomes public, while assignments remain private on the repo master branch. Of course, gh-pages might not be amenable to everyone.

Connector instructors: I'd be curious to hear thoughts of other connector instructors on how you have or plan to manage this aspect. Would a private repository for the connector be helpful?

Rewrite parts of the wiki to make it more intuitive for users.

Three Steps

Upload Notebook on Github -> Interact Link -> Distribute via Bcourses, website, etc.

Write helper to convert repo and folder paths to an interact link

Basically, we'd like to put a little helper on http://data8.org/connector-instructors so that instructors can input their repo + path(s) and get back a link.

This will help avoid spelling errors and other common mistakes.
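
A rough sketch of what such a helper could look like in Python. The base URL is copied from the interact links quoted elsewhere in these issues and should be treated as an assumption, since the hub address has changed over time; `make_interact_link` is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from interact links quoted in other issues here.
INTERACT_BASE = "https://data8.berkeley.edu/hub/interact"


def make_interact_link(repo, paths, base=INTERACT_BASE):
    """Build an interact URL from a repo name and one or more paths."""
    if isinstance(paths, str):
        paths = [paths]
    repo = repo.strip()
    # Trim stray whitespace and leading/trailing slashes, drop empty entries.
    paths = [p.strip().strip("/") for p in paths if p.strip()]
    if not repo or not paths:
        raise ValueError("need a repo name and at least one path")
    query = [("repo", repo)] + [("path", p) for p in paths]
    # safe="/" keeps slashes readable instead of encoding them as %2F.
    return base + "?" + urlencode(query, safe="/")


print(make_interact_link("data8assets", "labs/lab01"))
```

Building the query string with `urlencode` rather than by hand is what guards against the spelling and escaping mistakes mentioned above.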

Integrate some grading infrastructure

E.g. nbgrader or okpy

Need to figure out what kind of barrier to entry there is for connector instructors.

Please install Shapely and Pyproj libraries for geospatial connector

@ryanlovett Please install the shapely and pyproj libraries. See attached notebooks as examples of use. I would like to use these in 2 weeks.

shapely_pyproj_notebooks.zip

Thanks much.

Update technical information in this repository + moving stuff to a wiki

Before the new semester starts we need to do these things:

General

Update all technical information (links, instructions, etc) so that they're correct with the current setup of jupyterhub etc.
Throughout all of the wiki, change data8 to datahub
Migrate content to a wiki so that the "code" and "conceptual" components of this repository are nicely separated.
Update the FAQ, modularize it & give sub-headings
Add binder links where relevant so people can interactively click on them

Hardware Resources

Mention that nodes are shared between people. You should only use one core and if you need more, talk to the infrastructure people.
Memory restrictions - mention the thing that says how much memory you have left in the top-right
Say what happens when you hit too much memory in more detail, explain the error message, restarting kernel, etc
Warn against using multiple CPUs or n_jobs > 1
IO - mention 10 GB limit
Multiple options: download the files within the cluster (show the connectortools download module, demo creating your own download file), or talk to the tech team about setting up shared disk space

Software

Updating libraries - show that you can either ask Ryan etc to update which will take a while, or you can use the connectortools function to update it under the hood. Write a binder demo to show this.
Move interactive notebooks section so that it's in "home", explain binder, and sprinkle binder links throughout
Edit "setting up a course" to be more about setting up an account since that's really what it covers. Note that it doesn't have to have a github
Mention that if you want to create interact links then you need a github account.
Add a section on creating interact link.

Tech infrastructure

Add a quick explanation of each major component of the jupyterhub process
Links for each one to go towards more in-depth information
Quick guide on using docker locally to get the jupyterhub deployment on your computer

Workflow

There's overlap in information w/ "setting up your repo" elsewhere...remove redundancies.
Change the "add datasets" section to point them to the adding data section and add the "upload" button there...keep the information there though, so it's still a full workflow

I can do no. 2, @ryanlovett do you think we can chat briefly about no. 1 ?

Students with 503 Proxy Target Missing/500 Internal Server Error

Hi,

A few students in the smart cities connector class got errors when trying to use data8.org. Their usernames are:

503 Proxy Target Missing:
wangandai

500 Internal Server Error:
satish.vinay
igormartire

Can you look into these for me and see if you can reset the accounts? In the future is there a way I can fix this myself?

Best,
Maddie (Smart cities connector GSI)

Clicking interact buttons sometimes clones the whole repository

A few people mentioned that clicking interact links was copying over whole repositories (or at least whole directory structures).

I just tried this myself and wasn't able to replicate the problem (clicking interact with a single file in the root directory, a single file embedded in a folder, and a single folder). So I'm not really sure whether this is a bug or not, but maybe I missed something.

I'm opening this issue in case the problem comes up again and in case someone else can shed more light on this. Maybe @deculler or @pattyf can share their experience?

Instructor test accounts

@deculler and @cboettig (in #3) have both expressed an interest in being able to have non-admin accounts on JupyterHub for testing the student experience. Here are a couple of ways that might find their way into documentation:

A SPA.
- Instructor needs to manually delete the account after the semester
A CalNet guest account
- Intentionally transitory
- Really meant for non-campus people

In either case, the instructor would either personally add the account to JupyterHub via their own account with admin rights, or send the name of the account to an admin and have them do it.

Students who are in a connector course but not in Data8

A few instructors are trying to figure out what to do with students who aren't in data8, but who want to take the connector courses. Right now, this seems to fall into 3 categories of people:

People who took data8 before, but want to take another connector course
People who have previous computational skills (e.g., 61A) but who don't want to take data8
People who are interested in the concepts of the connector courses but don't have computational background

Right now we've been telling instructors that it's up to them, but maybe we should have a plan for these situations.

Outreach for Connectors

Ensure connector courses are getting filled.

Add more instructions about creating notebooks to Getting Started as a connector instructor doc.

I don't think the instructions for notebooks address the following:

how to upload data with a new notebook
this workflow: clone notebook > edit repo locally > merge changes to master > merge changes to gh-pages. When I tried this I get the following:

dlab-patty:geospatial_connector2 patty$ git push origin gh-pages
ERROR: Permission to data-8/geospatial-connector.git denied to pattyf.
fatal: Could not read from remote repository.

Update the README / docs

How to install docker (see https://github.com/data-8/data8-notebook)
Point instructors to generalist lectures / tutorials in repo
Short "best practices in connector class structure" guide
Memory management / reminding instructors and students to restart the kernel or their whole jupyterhub session
Case studies in connector courses?

Suggestion for README & FAQ

A few suggestions that might improve readability / organization. Consider:

Adding a (linked) table of contents to the README.md
Breaking the FAQ out into a separate FAQ.md doc in the repo, and just linking it from the README?

ALL connectors - Getting in touch with Chris Holdgraf

"resetting" modified notebooks

Right now it's not simple for people to go back to square one on their notebooks. Several instructors mentioned that it would be useful to be able to pull a fresh copy of a notebook / directory / etc in case they mess something up or want to start anew.

Right now, things behave weirdly when you delete / rename / etc a file and then click on the "interact" links again. I think we should come to a consensus for how to handle this situation as simply as possible.

Right now the working idea is to automatically pull backups whenever someone clicks an "interact" link. That way people can always just copy over the original notebook if they mess something up, without doing any extra pulling / deleting / etc. These backups might be in-line w/ the regular files, or stored in a backups folder that comes with the files you just pulled.
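
To make the "backups folder" variant concrete, here is a minimal sketch. It is not how the interact service works today; `pull_with_backup` and the `backups` folder name are assumptions for illustration. Each pull refreshes a pristine copy in the backups folder and leaves an existing working copy alone.

```python
import shutil
from pathlib import Path


def pull_with_backup(fresh, target, backups_dirname="backups"):
    """Store a pristine copy of `fresh` next to `target`; never overwrite `target`.

    Returns the path of the pristine copy.
    """
    fresh, target = Path(fresh), Path(target)
    backup_dir = target.parent / backups_dirname
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / target.name
    shutil.copy2(fresh, backup_path)   # pristine copy, refreshed on every pull
    if not target.exists():
        shutil.copy2(fresh, target)    # first pull: create the working copy
    return backup_path
```

With this layout, going back to square one is a matter of copying the file from the backups folder over the working copy, with no deleting or re-pulling.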

Redirect directly to specified path when pulling from Github

Right now, hitting a url like:

https://data8.berkeley.edu/hub/interact?repo=data8assets&path=labs/lab01

Results in the student being redirected to the root of the repo: data8assets.

Instead, it'd be more convenient if the student were redirected to the path given in the URL: data8assets/labs/lab01.

Since you can specify multiple values for path in the URL, eg.

https://data8.berkeley.edu/hub/interact?repo=data8assets&path=labs/lab01&path=labs/lab01/lab01.ipynb

The last path will be chosen to redirect to. In the above example, the student will be sent directly to the notebook.
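
The "last path wins" rule can be sketched with the standard library as follows. This illustrates the proposed behavior only, not the interact service's actual code; `redirect_target` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def redirect_target(interact_url):
    """Return 'repo/last_path' for an interact URL, or just 'repo' if no path is given."""
    query = parse_qs(urlsplit(interact_url).query)
    repo = query["repo"][0]
    paths = query.get("path", [])  # repeated keys keep their order in the URL
    return f"{repo}/{paths[-1]}" if paths else repo


url = ("https://data8.berkeley.edu/hub/interact"
       "?repo=data8assets&path=labs/lab01&path=labs/lab01/lab01.ipynb")
print(redirect_target(url))  # data8assets/labs/lab01/lab01.ipynb
```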

See the discussion in #15 and data-8/interact#3 .

Grading resources

Where do we describe resources for grading?

@yuvipanda wrote in an email

I am also in contact with the wonderful folks who build and maintain the ok server (which data8 now uses for grading, tests, and other course support) about writing an OK authenticator. This would make it easy for other connectors to use the same tools that John uses for his courses, and also to limit access based on rosters. Thoughts welcome on okpy/ok#1039

Add links section to Wiki

I would find it helpful to have a links section in the Wiki side menu. There are lots of URLs within the Wiki pages, but it would be good to have a quick reference.

Also, I would add a reference to the datascience package documentation (http://data8.org/datascience/) in the Course Software wiki.

interact links on datahub not working for me

Hi @choldgraf,
I've been reading the Workflow Wiki, which shows how to create an interact link. However, I cannot get interact links for my connector repo to work on datahub. The following gives me a "this site can't be reached" error:
https://datahub.berkeley.edu/user-redirect/interact?repo=geospatial-connector&path=drafts1/geoparsing

But the url syntax from last year still works:
https://data8.berkeley.edu/hub/interact?repo=geospatial-connector&path=drafts1/geoparsing

That said, I can log in directly to datahub.berkeley.edu. Also, the mybinder link to create a new interact link (http://mybinder.org/repo/choldgraf/connector-instructors/ntbk/url_to_interact.ipynb) is not working for me. It times out.

FYI, I have my own workflow for creating a new notebook which is described here:

Please let me know how to format my interact links. Thanks,
-Patty

403 errors when connector instructors push to their repositories

Some instructors are running into errors when they try to push to their repositories. After they add their username and password, it's returning a 403 permission denied error. I know this has happened for at least:
username | connector name | notes
@mahoneymw | stat89A |
@annalauren | ethics-connector | also needs to be added to this repo

Will append new names as they pop up. This will probably be the case for anyone who created a github username within the last few weeks unless they've already been added.

The instructions say to contact you, @SamLau95...can you update their permissions?

Creating domain-general tutorial notebooks

These could be used by various connectors who want a quick tutorial on things like debugging, dealing with packages, reading error messages, etc.

Instructions for adding and testing new python modules needed

If I submit a new notebook for my connector and it has one or more modules that are not used by the foundation course I need to know if/when these get installed so that I can test my notebook on the remote hub. I cannot use the jupyter notebook hub environment if I cannot get these modules installed a week before I distribute an assignment so that I can test it. What is the process for ensuring this? Should I upload the notebook and then add a new issue to install any needed modules? If you could add some info on this to the documentation that would be great.
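
Until a process is documented, one way to test this from a notebook on the hub is a first cell that reports which modules are not available there. A minimal sketch, assuming top-level module names only; `missing_modules` is a hypothetical helper, and the names in the example call are just the libraries requested in other issues here.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Result depends on what is installed where this runs.
print(missing_modules(["json", "shapely", "pyproj", "geopy"]))
```

An empty list means everything requested is importable; anything else is what to list in an install request.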

Add a short guide to docker

From Patty's suggestion:

It would be great if the docker instructions could cover:

how to install docker (see https://github.com/data-8/data8-notebook)
how to update docker (it asks me every day to update, should I?)
how to update the data8 files in docker (how to keep in sync with any changes to the repo?)
how to stop and restart docker, or just run it in the background
who to contact with questions/problems and how - email? slack? github issue?

Permissions errors in repos

Michael Mahoney (not currently a user in this repo so I can't link him) ran into problems after he tried some renaming and moving of files in order to pull fresh copies (see #11). Now he's getting permissions errors and is unable to delete some files in his repository.

@SamLau95 any ideas?

Make data8.org/connector-instructors redirect to the wiki

Hey all - since the wiki is the main source of information and landing page for the connectors, can we make:

http://data8.org/connector-instructors/

redirect to:

https://github.com/data-8/connector-instructors/wiki

This will also allow instructors to get the repository by using an interact link.

Links to locations of demo notebooks

Could we add to the instructions for connector instructors the links to example notebooks that we can look to for reference? I know they are in github somewhere and I apologize for not remembering. If there are several good directories to look in then several links please!

Removing website materials from connector-instructors master branch

I see there is also a gh-pages branch in the connector-instructors repo. I assume that's hosting the website content. @anthonysuen is it OK if I delete all the CSS, index.html etc. pages from the master branch?

Please install pysal library

Hi @ryanlovett,
Pysal (python spatial analysis library) is now available for python 3. Could you please install it on data8/ds8 jupyter servers? We likely won't use it for a class exercise but I would like to demo it. If you can't get to it that is fine, I can plan for a future semester. Thanks! Patty

`error: Request Entity Too Large`

Some of our students were getting this error when their notebook was trying to auto-save:

error: Request Entity Too Large

Is this because the notebooks themselves are too large? There doesn't seem to be anything gigantic in terms of plots etc. Any ideas?

@ryanlovett
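
Whether notebook size is actually the cause is not established here, but it is easy to check. Stored cell outputs (for example embedded images) are one thing that can make a notebook file grow, so the sketch below ranks code cells by the size of their serialized outputs. `output_sizes` is a hypothetical helper, and the in-memory notebook is made up for illustration.

```python
import json


def output_sizes(notebook):
    """Return (cell_index, bytes) for each code cell's outputs, largest first."""
    sizes = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") == "code":
            size = len(json.dumps(cell.get("outputs", [])).encode("utf-8"))
            sizes.append((index, size))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Made-up notebook: one cell with no output, one with a long text output.
demo = {
    "cells": [
        {"cell_type": "code", "source": "x = 1", "outputs": []},
        {"cell_type": "code", "source": "print('x' * 1000)",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "x" * 1000}]},
    ]
}
print(output_sizes(demo)[0][0])  # 1, the index of the cell with the largest outputs
```

For a real notebook, load the `.ipynb` file with `json.load` and pass the resulting dict to `output_sizes`; clearing the outputs of the largest cells is then a quick experiment.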

Creating an instructor FAQ

This is a place to start compiling a list of Frequently Asked Questions for current and future instructors. Feel free to submit ideas and I'll update the main list in this first post.

Getting info about what data8 will cover week to week

A number of connector instructors have noted that they need more information about the main course / the materials covered in order to design their own connector materials. I asked the GSI team about this, and here is their response:

I can talk about what was covered last semester, but for the material that's changing, I don't know what's going to happen. If it's like last semester, those decisions will be made by John close to real time. I think Function Optimization is the only wholly new topic, but the course is being reordered to put classification / regression earlier and testing / confidence intervals later, so some of the coverage may end up being different for that reason.

That said, I think the best bet is to look at the outline from the syllabus and then find the corresponding sections in the textbook (http://www.inferentialthinking.com) to see how the material is covered. The textbook is basically the lecture notes from last semester, so it's better than anything I could write up.

Then I can answer specific questions if something isn't clear. In particular, it may not always be clear from the textbook what things students will be asked to code up, and I can answer those kinds of questions.

potentially of interest for @AndrejGitHub and @pattyf, and maybe @cboettig

Enable recursive folder removal

One use case is that I might want to remove the lab folder, rather than rename it, and then re-download it. That is, if people mess up a notebook, they might want to remove it and then click the link on the main class page again to get a new copy. But right now you can't remove a non-empty folder from the interface. If I go two levels down into the directory hierarchy, remove files, and then work my way back up, removing empty folders, I can remove the main folder. But that is pretty inconvenient. So that may be okay, but it will make things difficult for some people.

(In Unix, this corresponds to "rmdir directory.name" rather than "rm -r directory.name", so it's probably an easy fix, but some people might get anxious about that much power. I personally would prefer that change, and I think that students in most of the technical connectors would. I don't know whether the less technical connectors would like the added insurance of not removing non-empty folders, so it might be worth asking them that.)
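
The same distinction exists in Python's standard library, which may be a useful way to show it to students: `os.rmdir` behaves like `rmdir`, and `shutil.rmtree` behaves like `rm -r`. The folder names below are made up, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

# Build a small non-empty folder tree in a temporary location.
root = tempfile.mkdtemp()
lab = os.path.join(root, "lab01")
os.makedirs(os.path.join(lab, "data"))
with open(os.path.join(lab, "data", "notes.txt"), "w") as handle:
    handle.write("scratch")

try:
    os.rmdir(lab)           # like `rmdir`: refuses a non-empty folder
except OSError as error:
    print("rmdir failed:", error.strerror)

shutil.rmtree(lab)          # like `rm -r`: removes the folder and its contents
print(os.path.exists(lab))  # False
shutil.rmtree(root)         # clean up the temporary directory
```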

Thanks,

How do students submit (jupyter notebook) assignments?

What is the proposed workflow for students to submit Jupyter Notebooks? (It's not actually clear to me how to download a notebook from https://ds8.berkeley.edu and email it; so far I've relied on git to move work off https://ds8.berkeley.edu).

I think the ideal might be for instructors to be able to log in to https://ds8.berkeley.edu as an instructor and thus see their working directory populated with subdirectories for each student in the course. (Currently it appears that instructor logins are functionally the same as student logins; we can see our own work but not that of our students.)

Please install osgeo and geopy libraries for the geospatial connector

@ryan and team, please install the following libraries for the geospatial connector. I would like to distribute an exercise based on these libraries in February (2/22). Attaching a notebook with dataset to show how they will be used. Thanks!

The first cell in the notebook shows the libraries that I am using:
from geopy.distance import vincenty
from osgeo import ogr, osr, gdal
import json

dist_area_crs_libs.zip
