first-python-notebook's Introduction

first-python-notebook's People

Contributors

asuozzo, davidbradway, dependabot[bot], gabriellelamarrlemee, gordonje, hs4man21, kat-alo, meli-lewis, morrisluke, mrshu, palewire, petrinkae, rabdill, ryanpitts, zstumgoren

first-python-notebook's Issues

The keyword arguments note needs to move

We are dropping it in here following the use of rename. The problem: our rename call no longer uses a keyword argument, so we need to move the note elsewhere. I think it should come after our first use of a kwarg, which may not be the merge method in the next chapter.

403 Forbidden error when opening CSV

pd.read_csv("http://www.firstpythonnotebook.org/_static/committees.csv")
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-4-3ea4d8833327> in <module>()
----> 1 pd.read_csv("http://www.firstpythonnotebook.org/_static/committees.csv")

~/.local/share/virtualenvs/first-python-notebook-DfG0-Xvh/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

~/.local/share/virtualenvs/first-python-notebook-DfG0-Xvh/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    422     compression = _infer_compression(filepath_or_buffer, compression)
    423     filepath_or_buffer, _, compression, should_close = get_filepath_or_buffer(
--> 424         filepath_or_buffer, encoding, compression)
    425     kwds['compression'] = compression
    426 

~/.local/share/virtualenvs/first-python-notebook-DfG0-Xvh/lib/python3.6/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    193 
    194     if _is_url(filepath_or_buffer):
--> 195         req = _urlopen(filepath_or_buffer)
    196         content_encoding = req.headers.get('Content-Encoding', None)
    197         if content_encoding == 'gzip':

~/.pyenv/versions/3.6.4/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224 
    225 def install_opener(opener):

~/.pyenv/versions/3.6.4/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
    530         for processor in self.process_response.get(protocol, []):
    531             meth = getattr(processor, meth_name)
--> 532             response = meth(req, response)
    533 
    534         return response

~/.pyenv/versions/3.6.4/lib/python3.6/urllib/request.py in http_response(self, request, response)
    640         if not (200 <= code < 300):
    641             response = self.parent.error(
--> 642                 'http', request, response, code, msg, hdrs)
    643 
    644         return response

~/.pyenv/versions/3.6.4/lib/python3.6/urllib/request.py in error(self, proto, *args)
    568         if http_err:
    569             args = (dict, 'default', 'http_error_default') + orig_args
--> 570             return self._call_chain(*args)
    571 
    572 # XXX probably also want an abstract factory that knows when it makes

~/.pyenv/versions/3.6.4/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    502         for handler in handlers:
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:
    506                 return result

~/.pyenv/versions/3.6.4/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    648 class HTTPDefaultErrorHandler(BaseHandler):
    649     def http_error_default(self, req, fp, code, msg, hdrs):
--> 650         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    651 
    652 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

Fix for committees read 403

  • Summary
  • Background
  • Original stacktrace
  • urlopen headers

Summary

This issue was previously discussed in #26.

I'm seeing 403s when attempting to read committees.csv using pandas with Python 3.9 and pandas==1.2.2 (see Original stacktrace below).

Using a GitHub raw URL resolves the issue.

Background

This issue appears to stem from RTD issuing a 403 response to HTTP requests made with the default Python user agent.

pandas.read_csv uses urllib.request.urlopen under the hood to make the web request, by default setting the User-agent header to Python-urllib/3.9 (see urlopen headers below).
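For anyone who wants to confirm the default header on their own machine, here is a minimal sketch using only the standard library. It makes no network request; it only inspects the headers a freshly built opener would send. The variable names are illustrative.

```python
import urllib.request

# build_opener() creates an OpenerDirector configured with urllib's default headers.
opener = urllib.request.build_opener()

# addheaders is a list of (name, value) tuples; turn it into a dict for easy lookup.
default_headers = dict(opener.addheaders)

# On Python 3.9 this is expected to show a User-agent of 'Python-urllib/3.9'.
print(default_headers)
```

The version suffix follows the running interpreter, so other Python versions will report a different number.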

## This fails

>>> import urllib.request
>>> url = "https://first-python-notebook.readthedocs.io/_static/committees.csv"
>>> urllib.request.urlopen(url)
Traceback (most recent call last):
<<< snipped >>>
HTTPError: Forbidden

Setting a realistic User-agent header fixes the issue:

>>> headers = {
...     'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0'
... }
>>> req = urllib.request.Request(url=url, headers=headers)
>>> resp = urllib.request.urlopen(req)
>>> resp.read().decode('utf-8')[0:50]
'ocd_prop_id,calaccess_prop_id,ccdc_prop_id,prop_na'

Unfortunately, there doesn't appear to be a way to configure request headers via the pandas.read_csv interface (at least none jumped out at me from a quick review of function parameters).
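One possible workaround, sketched under the assumption that the server only objects to the default user agent: make the request yourself with urllib, then hand pandas the downloaded text through an in-memory buffer. The helper names `fetch_csv_text` and `read_csv_with_user_agent` are hypothetical, and the "Mozilla/5.0" default is a placeholder value, not something confirmed to satisfy RTD.

```python
import io
import urllib.request


def fetch_csv_text(url, user_agent="Mozilla/5.0"):
    """Download a URL with an explicit User-Agent header and return its text."""
    req = urllib.request.Request(url=url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def read_csv_with_user_agent(url, user_agent="Mozilla/5.0"):
    """Fetch the CSV ourselves, then let pandas parse it from a file-like buffer."""
    import pandas as pd  # imported lazily so fetch_csv_text works without pandas

    return pd.read_csv(io.StringIO(fetch_csv_text(url, user_agent)))
```

With this in place, `read_csv_with_user_agent(url)` would stand in for `pd.read_csv(url)`. It keeps the tutorial URL at the cost of extra code, which may be too much for beginners compared with simply switching URLs.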

Using an alternative URL such as the raw GH URL sidesteps the issue:

>>> gh_url = "https://raw.githubusercontent.com/california-civic-data-coalition/first-python-notebook/master/docs/_static/committees.csv"

# urlopen version works
>>> resp = urllib.request.urlopen(gh_url)
>>> resp.read().decode('utf-8')[0:50]
'ocd_prop_id,calaccess_prop_id,ccdc_prop_id,prop_na'

# pandas.read_csv version works
>>> response = pd.read_csv(gh_url)
>>> response
                                          ocd_prop_id  ...  committee_position
0    ocd-contest/b51dc64d-3562-4913-a190-69f5088c22a6  ...             SUPPORT
1    ocd-contest/b51dc64d-3562-4913-a190-69f5088c22a6  ...             SUPPORT
2    ocd-contest/b51dc64d-3562-4913-a190-69f5088c22a6  ...             SUPPORT
3    ocd-contest/b51dc64d-3562-4913-a190-69f5088c22a6  ...              OPPOSE
4    ocd-contest/85990193-9d6f-4600-b8e7-bf1317841d82  ...             SUPPORT
..                                                ...  ...                 ...
97   ocd-contest/7495cdbe-1aa7-4c26-9a55-aa4130347b95  ...             SUPPORT
98   ocd-contest/7495cdbe-1aa7-4c26-9a55-aa4130347b95  ...             SUPPORT
99   ocd-contest/7495cdbe-1aa7-4c26-9a55-aa4130347b95  ...             SUPPORT
100  ocd-contest/7495cdbe-1aa7-4c26-9a55-aa4130347b95  ...             SUPPORT
101  ocd-contest/7495cdbe-1aa7-4c26-9a55-aa4130347b95  ...             SUPPORT

Original stacktrace

import pandas as pd
committee_list = pd.read_csv("https://first-python-notebook.readthedocs.io/_static/committees.csv")


---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-35-62ab1780e12d> in <module>
----> 1 committee_list = pd.read_csv("https://first-python-notebook.readthedocs.io/_static/committees.csv")

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    608     kwds.update(kwds_defaults)
    609 
--> 610     return _read(filepath_or_buffer, kwds)
    611 
    612 

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    460 
    461     # Create the parser.
--> 462     parser = TextFileReader(filepath_or_buffer, **kwds)
    463 
    464     if chunksize or iterator:

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    817             self.options["has_index_names"] = kwds["has_index_names"]
    818 
--> 819         self._engine = self._make_engine(self.engine)
    820 
    821     def close(self):

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1048             )
   1049         # error: Too many arguments for "ParserBase"
-> 1050         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1051 
   1052     def _failover_to_python(self):

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1865 
   1866         # open handles
-> 1867         self._open_handles(src, kwds)
   1868         assert self.handles is not None
   1869         for key in ("storage_options", "encoding", "memory_map", "compression"):

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/parsers.py in _open_handles(self, src, kwds)
   1360         Let the readers open IOHanldes after they are done with their potential raises.
   1361         """
-> 1362         self.handles = get_handle(
   1363             src,
   1364             "r",

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    556 
    557     # open URLs
--> 558     ioargs = _get_filepath_or_buffer(
    559         path_or_buf,
    560         encoding=encoding,

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/common.py in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    287                 "storage_options passed with file object or non-fsspec file path"
    288             )
--> 289         req = urlopen(filepath_or_buffer)
    290         content_encoding = req.headers.get("Content-Encoding", None)
    291         if content_encoding == "gzip":

~/.local/share/virtualenvs/first-python-notebook-QxiypQOy/lib/python3.9/site-packages/pandas/io/common.py in urlopen(*args, **kwargs)
    193     import urllib.request
    194 
--> 195     return urllib.request.urlopen(*args, **kwargs)
    196 
    197 

/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    212     else:
    213         opener = _opener
--> 214     return opener.open(url, data, timeout)
    215 
    216 def install_opener(opener):

/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py in open(self, fullurl, data, timeout)
    521         for processor in self.process_response.get(protocol, []):
    522             meth = getattr(processor, meth_name)
--> 523             response = meth(req, response)
    524 
    525         return response

/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py in http_response(self, request, response)
    630         # request was successfully received, understood, and accepted.
    631         if not (200 <= code < 300):
--> 632             response = self.parent.error(
    633                 'http', request, response, code, msg, hdrs)
    634 

/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py in error(self, proto, *args)
    559         if http_err:
    560             args = (dict, 'default', 'http_error_default') + orig_args
--> 561             return self._call_chain(*args)
    562 
    563 # XXX probably also want an abstract factory that knows when it makes

/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    492         for handler in handlers:
    493             func = getattr(handler, meth_name)
--> 494             result = func(*args)
    495             if result is not None:
    496                 return result

/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    639 class HTTPDefaultErrorHandler(BaseHandler):
    640     def http_error_default(self, req, fp, code, msg, hdrs):
--> 641         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    642 
    643 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

urlopen headers

Below is a dump of the headers passed by the underlying urlopen call in urllib/request.py:

{'_full_url': 'https://first-python-notebook.readthedocs.io/_static/committees.csv', 'fragment': None, 'type': 'https', 'host': 'first-python-notebook.readthedocs.io', 'selector': '/_static/committees.csv', 'headers': {}, 'unredirected_hdrs': {'Host': 'first-python-notebook.readthedocs.io', 'User-agent': 'Python-urllib/3.9'}, '_data': None, '_tunnel_host': None, 'origin_req_host': 'first-python-notebook.readthedocs.io', 'unverifiable': False, 'timeout': <object object at 0x114e47a60>}

jupyter lab install

You may get this error:
Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/jupyterlab_launcher-0.11.2.dist-info'
Consider using the --user option or check the permissions.

That’s because of the permissions you have on your computer. (Note: this error will likely occur if you are on a Mac.)
jupyterlab/jupyterlab#3913

Try using this command: pip install --user jupyterlab
Q: Should we just suggest this for everyone?
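To see where a --user install would actually put packages, a short standard-library check may help. This is only a sketch for inspection; it installs nothing, and the variable names are illustrative.

```python
import site

# Per-user site-packages directory for the running interpreter;
# this is where `pip install --user` places importable packages.
user_site = site.getusersitepackages()

# Base directory of the per-user install scheme.
user_base = site.getuserbase()

print("user site-packages:", user_site)
print("user base:", user_base)
```

Both paths sit under the user's home directory rather than the system /Library/Python location named in the error, which is why the permission problem goes away. One caveat for the "everyone" question: pip refuses --user installs inside a virtualenv where user site-packages are not visible, so the suggestion would not fit readers who are working in one.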

Explicitly encourage and teach Python 3

https://pythonclock.org/ says Python 2.7 will retire in 1 Month, 10 Days, 4 Hours, 56 Minutes and 49 Seconds

More to the point: this is one of the best tutorials on getting a Python development environment set up I've ever seen, but I want people to get going with Python 3 so they can use my Datasette project!

I imagine updating it to Python 3 is not an insignificant amount of work, due to the need to re-record the installation videos. But I'm optimistically filing a bug report anyway!

Tip: Make sure your virtualenv pip isn't _really_ old

Hi!
Using OS X 10.11.6 El Capitan here. For whatever reason, my virtualenv had a really old version of pip (1.4.1) and when I tried

$ pip install jupyter

I ended up with this error:

 ... [snip] ... 

Downloading/unpacking ipython (from jupyter-console->jupyter)
  Downloading ipython-5.2.2.tar.gz (4.9MB): 4.9MB downloaded
  Running setup.py egg_info for package ipython
    error in ipython setup command: Invalid environment marker: sys_platform == "win32" and python_version < "3.6"
    Complete output from command python setup.py egg_info:
    error in ipython setup command: Invalid environment marker: sys_platform == "win32" and python_version < "3.6"

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /Users/ ... 

... [snip] ... 

So following the jupyter install advice, I upgraded my pip

$ pip install --upgrade pip

I went from 1.4.1 to 9.0.1 (!) and then pip install jupyter worked like a charm!
So if at NICAR you see an install bomb like the above, try upgrading the virtualenv pip.
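A quick way to flag an old pip before installing might look like the sketch below. The helper names are hypothetical, and the 9.0 threshold is an assumption chosen only because 9.0.1 worked in the report above; the true minimum for environment markers may differ.

```python
def parse_version(text):
    """Turn a dotted version string such as '9.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def pip_looks_old(version_text, minimum=(9, 0)):
    """Return True when the version is below the assumed minimum."""
    return parse_version(version_text) < minimum


print(pip_looks_old("1.4.1"))  # the version that failed in this report
print(pip_looks_old("9.0.1"))  # the version that worked after upgrading
```

The version string to pass in can be read from `pip --version` on the command line, or from `pip.__version__` after `import pip`.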

latest jupyter didn't default to 'code' edit mode

When running through the tutorial on my own, in the "Write Python in the notebook" section of Hello Notebook, I had to switch the new notebook's editing mode to Code in a dropdown before math worked.

I went to make edits and rebuild them from source, but yolk was confused because both yolk and yolk3k were installed into a python3 pipenv.

I have fixes for both of these, but I'm not sure how you'd prefer to handle these, and I'm not a professional writer targeting new folks, so the tone of my PR is maybe not what you would prefer.

Add a data cleaning chapter

Describe how you would standardize the name columns
Show how you would group and sum on the new standardized column
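As a starting point for discussion, the two steps could be sketched as follows. The sample rows and the column names `contributor_name`, `amount`, and `name_clean` are made up for illustration and are not taken from the tutorial's data; the cleaning rules (trim, uppercase, drop a trailing period) are just one plausible choice.

```python
import pandas as pd

# Hypothetical sample data with inconsistent spellings of the same name.
contribs = pd.DataFrame(
    {
        "contributor_name": ["Acme Corp", "ACME CORP.", " acme corp ", "Jane Doe"],
        "amount": [100.0, 250.0, 50.0, 75.0],
    }
)

# Step 1: standardize the name column into a new column.
contribs["name_clean"] = (
    contribs["contributor_name"]
    .str.strip()       # remove leading and trailing whitespace
    .str.upper()       # ignore differences in capitalization
    .str.rstrip(".")   # drop a trailing period
)

# Step 2: group on the standardized column and sum the amounts.
totals = contribs.groupby("name_clean")["amount"].sum()
print(totals)
```

Here the three Acme variants collapse into a single group, which is the effect the chapter would need to demonstrate. Real names would likely need more rules than these three.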

Add disclaimer about Python3?

Hey Ben,
This is great stuff. I just ran through the tutorial to see how it would fare on Python 3, and happy to report that every single command worked! Wondering if it's worth posting a little heads up in the Python prerequisites section noting as much, so folks know they can use either 2.x or 3.x (note I tested on Python 3.6.1). Totally understand if you don't want to tie yourself to maintaining both versions, btw, so no stress if you want to keep the focus on 2.7.

Cheers!

Serdar
