
nbval's Introduction

Py.test plugin for validating Jupyter notebooks


The plugin adds functionality to py.test to recognise and collect Jupyter notebooks. The intended purpose of the tests is to determine whether execution of the stored inputs matches the stored outputs of the .ipynb file, whilst also ensuring that the notebooks are running without errors.

The tests were designed to ensure that Jupyter notebooks (especially those for reference and documentation) execute consistently.

Each cell is taken as a test; a cell that doesn't reproduce the expected output will fail.

See docs/source/index.ipynb for the full documentation.

Installation

Available on PyPI:

pip install nbval

or install the latest version by cloning the repository and running:

pip install .

from the main directory. To uninstall:

pip uninstall nbval

How it works

The extension looks through every code cell in an IPython notebook, and the py.test system compares the outputs stored in the notebook with the outputs of the cells when they are executed. Thus, the notebook itself is used as a testing function. The output lines produced when executing the notebook can be sanitized by passing an extra option and file when calling the py.test command. This file is an ordinary configuration file for the ConfigParser library.

Regarding the execution: roughly, the script initiates an IPython kernel with a shell socket and an iopub socket. The shell is needed to execute the cells in the notebook (it sends requests to the kernel), and iopub provides an interface to get the messages from the outputs. The contents of the messages obtained from the kernel are organised in dictionaries with different information, such as time stamps of executions, cell data types, cell types, the status of the kernel, the username, etc.

In general, the functionality of the IPython notebook system is quite complex, but a detailed explanation of the messages and how the system works can be found here:

https://jupyter-client.readthedocs.io/en/latest/messaging.html#messaging
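
To illustrate the shell/iopub round trip described above, here is a minimal sketch using the public jupyter_client API (an illustration only, not nbval's own code):

# Start a kernel, send an execute request over the shell channel,
# and read the resulting messages from the iopub channel.
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="python3")
try:
    msg_id = kc.execute("print(1 + 1)")       # request goes via the shell channel
    while True:
        msg = kc.get_iopub_msg(timeout=10)    # outputs arrive on the iopub channel
        if msg["parent_header"].get("msg_id") != msg_id:
            continue                          # message belongs to another request
        if msg["msg_type"] == "stream":
            print("stream output:", msg["content"]["text"])
        elif (msg["msg_type"] == "status"
              and msg["content"]["execution_state"] == "idle"):
            break                             # the cell has finished executing
finally:
    kc.stop_channels()
    km.shutdown_kernel()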

Execution

To execute this plugin, you need to run py.test with the --nbval flag to differentiate the notebook testing from the usual Python files:

py.test --nbval

You can also specify --nbval-lax, which runs notebooks and checks for errors, but only compares the outputs of cells with a #NBVAL_CHECK_OUTPUT marker comment.

py.test --nbval-lax
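
For example, a notebook cell like the following sketch keeps its output comparison even under --nbval-lax:

# NBVAL_CHECK_OUTPUT
print(2 + 2)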

The commands above will execute all the .ipynb files and ordinary pytest tests in the current folder. Specify -p no:python if you would like to execute notebooks only. Alternatively, you can execute a specific notebook:

py.test --nbval my_notebook.ipynb

By default, each .ipynb file will be executed using the kernel specified in its metadata. You can override this behavior by passing either --nbval-kernel-name mykernel to run all the notebooks using mykernel, or --current-env to use a kernel in the same environment in which pytest itself was launched.
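
For example:

py.test --nbval --nbval-kernel-name python3 my_notebook.ipynb
py.test --nbval --current-env my_notebook.ipynb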

If the output lines are to be sanitized, an extra flag, --nbval-sanitize-with, together with the path to a configuration file with regular expressions, must be passed, e.g.

py.test --nbval my_notebook.ipynb --nbval-sanitize-with path/to/my_sanitize_file

where my_sanitize_file has the following structure:

[Section1]
regex: [a-z]*
replace: abcd

[Section2]
regex: [1-9]*
replace: 0000

[Section3]
regex: foo
replace: bar

The regex option contains the expression that will be matched in the outputs, and replace is the string that will replace the regex match. Each section holds exactly one regex/replace pair (ConfigParser rejects duplicate option names within a section). Currently, the section names have no special meaning in the testing system; nbval takes all the sections and applies the corresponding replacements.
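
Conceptually, the sanitization step amounts to something like the following sketch (an illustration, not nbval's actual implementation):

import configparser
import re

def sanitize(text, config_path):
    """Apply every section's regex/replace pair, in order, to an output string."""
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    for section in cfg.sections():
        pattern = cfg.get(section, "regex")
        replacement = cfg.get(section, "replace")
        text = re.sub(pattern, replacement, text)
    return text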

Coverage

To use notebooks to generate coverage for imported code, use the pytest-cov plugin. nbval should automatically detect the relevant options and configure itself accordingly.
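
For example (mypackage is a placeholder for the package you want coverage reported on):

py.test --nbval --cov=mypackage my_notebook.ipynb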

Parallel execution

nbval is compatible with the pytest-xdist plugin for running tests in parallel. It does, however, require the --dist loadscope flag to ensure that all cells of one notebook are run on the same kernel.
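
For example, to spread the notebooks across four worker processes:

py.test --nbval -n 4 --dist loadscope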

Documentation

The narrative documentation for nbval can be found at https://nbval.readthedocs.io.

Help

The py.test help can be obtained with py.test -h, which will show all the flags that can be passed to the command, such as the verbose -v option. nbval's options can be found under the 'Jupyter Notebook validation' section.

Acknowledgements

This plugin was inspired by Andrea Zonca's py.test plugin for collecting unit tests in the IPython notebooks (https://github.com/zonca/pytest-ipynb).

The original prototype was based on the template in https://gist.github.com/timo/2621679 and the code of a testing system for notebooks, https://gist.github.com/minrk/2620735, which we integrated and mixed with the py.test system.

We acknowledge financial support from

  • OpenDreamKit Horizon 2020 European Research Infrastructures project (#676541), http://opendreamkit.org

  • EPSRC's Centre for Doctoral Training in Next Generation Computational Modelling, http://ngcm.soton.ac.uk (#EP/L015382/1), and EPSRC's Doctoral Training Centre in Complex System Simulation (EP/G03690X/1),

  • The Gordon and Betty Moore Foundation through Grant GBMF #4856, by the Alfred P. Sloan Foundation and by the Helmsley Trust.

Authors

2014 - 2017 David Cortes-Ortuno, Oliver Laslett, T. Kluyver, Vidar Fauske, Maximilian Albert, MinRK, Ondrej Hovorka, Hans Fangohr


nbval's Issues

Should we allow non-execution of cells for nbval?

We have a keyword #NBVAL_IGNORE_OUTPUT (see issue #16) that stops nbval from comparing the computed output against the stored output for the input cell in which the string #NBVAL_IGNORE_OUTPUT is found. However, the input cell code is still executed. This is normally useful as later cells may depend on it.
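
For example, a cell like this sketch executes normally but has its output comparison skipped:

# NBVAL_IGNORE_OUTPUT
import time
print(time.time())  # varies between runs, so comparing it would always fail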

Would it be useful to also support another feature, maybe using the keyword #NBVAL_IGNORE, which means that the input cell in which this is found will not be executed at all? This could be used in cases where we know that the execution causes side effects on the host that executes the command, which we may not want or need as part of the nbval testing.

certain notebooks don't work at all

when executing nbval on one of my notebooks, I get weird py.test --nbval reports like the following:

<<<<<<<<<<<< Reference output from ipynb file:
stream
============ disagrees with newly computed (test) output:  
streamstreamstreamstreamstreamstreamstreamstreamstreamstreamstreamstreamstreamstreamstream
>>>>>>>>>>>>
<<<<<<<<<<<< Reference output from ipynb file:
stream
============ disagrees with newly computed (test) output:  
streamstreamstream
>>>>>>>>>>>>
<<<<<<<<<<<< Reference output from ipynb file:
stream
============ disagrees with newly computed (test) output:  
streamst...<snip base64, md5=1d90ad9225fdb1ef...>
>>>>>>>>>>>>

I was not able to make a minimal working example out of it yet, however maybe someone has seen this before and can help me.

[Interestingly, when downloading the example documentation.ipynb from the nbval repository, everything works perfectly]

Interaction with cell magics

Cell magic syntax requires that the cell magic is on the first line. This causes problems when that cell needs to be ignored by nbval, as nbval doesn't pick up the '#PYTEST_VALIDATE_IGNORE_OUTPUT' unless it's on the first line.

I'd suggest the way to do this is to check whether the first line is a cell magic, and if so, check whether the following line in the cell is an nbval statement.
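
A rough sketch of that suggestion (a hypothetical helper, not nbval's actual code):

def find_nbval_marker(cell_source):
    """Return the nbval marker comment, looking past a leading cell magic."""
    lines = [line.strip() for line in cell_source.splitlines() if line.strip()]
    if lines and lines[0].startswith("%%"):  # cell magics must sit on line 1,
        lines = lines[1:]                    # so inspect the line right after
    if lines and lines[0].startswith("#") and "IGNORE_OUTPUT" in lines[0]:
        return lines[0]
    return None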

nbval error in docker (ubuntu 16.04 and python 3.5.2)

With Dockerfile:

FROM ubuntu:16.04

RUN apt-get update -y && \
    apt-get install -y git python3-pip curl && \
    python3 -m pip install --upgrade pip pytest-cov codecov \
      matplotlib tornado ipython ipykernel \
      git+git://github.com/computationalmodelling/nbval.git

WORKDIR /usr/local/

RUN git clone https://github.com/joommf/discretisedfield.git

When run:

docker build -t dockertestimage .
docker run -ti dockertestimage /bin/bash -c "cd discretisedfield; python3 -m pytest --nbval docs/ipynb/*.ipynb"

The error is:

============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /usr/local/discretisedfield, inifile: 
plugins: nbval-0.3, cov-2.3.1
collected 77 items / 1 errors 

==================================== ERRORS ====================================
_____________ ERROR collecting docs/ipynb/creating_fd_field.ipynb ______________
../lib/python3.5/dist-packages/nbval/plugin.py:194: in collect
    self.nb = reads(f.read(), 4)
/usr/lib/python3.5/encodings/ascii.py:26: in decode
    return codecs.ascii_decode(input, self.errors)[0]
E   UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3202:
ordinal not in range(128)
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
=========================== 1 error in 0.59 seconds ============================

Any thoughts?

Thanks

Link to documentation in readme?

There is no current link to the documentation as a static website in the readme… nbviewer should be fine for this, I'll make a PR.

Should not assume that matplotlib is available

We currently pass --matplotlib=inline when starting the kernel, to force figures to appear inline. However, this means that if matplotlib is not installed in the environment where the kernel runs (which may not be the same as the one nbval is running in), the kernel will fail to start.

From comments on #6.

Should nbval be testing all cells?

I have some %timeit cells, as well as some code, markdown and pytest cells in a notebook, and when running nbval, all the %timeit cells create test failures, as the time taken to run the code differs between the nbval execution and the number stored in the notebook. Is this to be expected?

No TDD possible with nbval

So, I really like the simplicity of nbval, but was wondering:
Sometimes it's really nice to clear your mind about what you actually want from a new module by doing TDD, meaning writing a few tests for the non-existing API or module and then developing it, reducing the failing tests one by one.
Do you envision something like that to be possible with nbval?

Support pytest.skip()

This is the standard way to skip in pytest and, in my opinion, superior to # NBVAL_SKIP.
For example, if I only want to skip the cell in CI I could write:

import os
import pytest

if 'TRAVIS' in os.environ:
    pytest.skip()

I could work on a PR if there is interest in pulling this in.

Release 0.5

We've made quite a few changes since the last release, and we should try to make a new release.

@vidartf @fangohr is there anything high priority which we should fix before releasing 0.5?

pytest --nbval comparing inline images from extension

Hi,
I'm trying to use pytest --nbval to test output from cairo-jupyter.

We use display(surface), which outputs png data into the notebook.

When I try to run it inside pytest --nbval, it acts as though the extension is not loaded, although the cell before it has %reload_ext cairo_jupyter.

Any idea what's going on?

extension (info here on how to install)
https://github.com/fomightez/cairo-jupyter

notebook - you can see the output here; if you view it in raw mode you can see the cell output is a data URL with a PNG:
https://github.com/fomightez/cairo-jupyter/blob/master/demos/cairo_jupyter_extension.ipynb

output

$ py.test --nbval demos/cairo_jupyter_extension.ipynb 
===================================================================================== test session starts =====================================================================================
platform linux -- Python 3.6.5, pytest-3.7.1, py-1.5.4, pluggy-0.7.1
rootdir: /home/stu/projects/mine/cairo-jupyter, inifile:
plugins: nbval-0.9.1
collected 3 items                                                                                                                                                                             

demos/cairo_jupyter_extension.ipynb .F.                                                                                                                                                 [100%]

========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________ demos/cairo_jupyter_extension.ipynb::Cell 1 _________________________________________________________________________
Notebook cell execution failed
Cell 1: Cell outputs differ

Input:
%load_ext cairo_jupyter

# Using display with cairo surfaces
import cairocffi as cairo

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 300, 300)
    
cr = cairo.Context(surface)

for i in range(16):
    cr.save()
    cr.translate(150, 150)
    cr.set_source_rgba(0.7, 0.9, 0.0, 0.25)
    cr.rotate(i * 0.1)
    cr.rectangle(-50, -50, 100, 100)
    cr.stroke()
    cr.restore()

display(surface)

Traceback:
 mismatch 'text/plain'

 assert reference_output == test_output failed:

  '<cairocffi.s...7eff8065dc88>' == '<cairocffi.su...7f3ec81aaa90>'
  - <cairocffi.surfaces.ImageSurface at 0x7eff8065dc88>
  ?                                         -------  ^
  + <cairocffi.surfaces.ImageSurface at 0x7f3ec81aaa90>
  ?                                        ++   ^^^^^^


============================================================================= 1 failed, 2 passed in 2.13 seconds ==============================================================================

run loops for just one iteration?

Is there a way to have nbval-lax run slightly different code than Jupyter would? In the Jupyter notebook, I have an algorithm that runs for a long time -- too long for unit tests. I'd like nbval-lax to run that loop for just 1 iteration. Is there a way to do that?
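
One common workaround is to gate the iteration count on an environment variable that the test job exports before running pytest; the sketch below assumes a variable called NBVAL_TEST, a convention you define yourself, not something nbval sets:

import os

# The CI job exports NBVAL_TEST=1 before running py.test --nbval-lax.
n_iterations = 1 if os.environ.get("NBVAL_TEST") else 100_000

for step in range(n_iterations):
    pass  # the long-running algorithm goes here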

Failing on a cell that returns no output normally

I have a notebook in which a code cell is used to run a shell command that, if all goes well, does not return anything. If the shell command breaks, an error message is returned and displayed as code cell output.

This error does not seem to result in a failed test.

To test this, I created a notebook with 3 code cells, print('test') in each, ran the notebook, cleared the output of the second code cell, and then ran a test. It passed, whereas I would have expected it to fail, since cell 2 generated output where there was none originally.

Exploit nbdime to show diffs of outputs

The current workflow for nbval is that a text-based diff of the computed and stored output is presented. Here is an example for failing cells 6 and 17:

bin:nbval fangohr$ py.test -v --nbval documentation.ipynb 
============================================ test session starts ============================================
platform darwin -- Python 3.5.2, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /Users/fangohr/anaconda3/bin/python
cachedir: .cache
rootdir: /Users/fangohr/git/nbval, inifile: 
plugins: nbval-0.3.6, cov-2.3.1
collected 10 items 

documentation.ipynb::Cell 6 FAILED
documentation.ipynb::Cell 13 PASSED
documentation.ipynb::Cell 15 PASSED
documentation.ipynb::Cell 17 FAILED
documentation.ipynb::Cell 19 FAILED
documentation.ipynb::Cell 21 FAILED
documentation.ipynb::Cell 24 PASSED
documentation.ipynb::Cell 26 PASSED
documentation.ipynb::Cell 28 FAILED
documentation.ipynb::Cell 29 PASSED

================================================= FAILURES ==================================================
__________________________________________________ cell 6 ___________________________________________________
Notebook cell execution failed
Cell 6: Error with cell

Input:
%%writefile doc_sanitize.cfg
[regex1]
regex: \d{1,2}/\d{1,2}/\d{2,4}
replace: DATE-STAMP

[regex2]
regex: \d{2}:\d{2}:\d{2}
replace: TIME-STAMP

Traceback: mismatch 'text'
<<<<<<<<<<<< Reference output from ipynb file:
Writing doc_sanitize.cfg

============ disagrees with newly computed (test) output:  
Overwriting doc_sanitize.cfg

>>>>>>>>>>>>
__________________________________________________ cell 17 __________________________________________________
Notebook cell execution failed
Cell 17: Error with cell

Input:
print([np.random.rand() for i in range(4)])
print([np.random.rand() for i in range(4)])

Traceback: mismatch 'text'
<<<<<<<<<<<< Reference output from ipynb file:
[0.5484774726012661, 0.6435546033932157, 0.10826743499682889, 0.5748413548528436]
[0.7290940674500538, 0.6663117586823235, 0.7182293584340027, 0.22383996412490337]

============ disagrees with newly computed (test) output:  
[0.4660913775186526, 0.31178898136714595, 0.25151291353922156, 0.5269909473164134]
[0.4953052207494243, 0.04899424112512207, 0.14481404964135336, 0.5252754048311362]

>>>>>>>>>>>>

Would it be possible to use nbdime to display those diffs in a nicer way (in the browser)?

If so, then two possible user interfaces come to mind:

  1. creation of a diff-document in browser which shows the changes in nbdime style for each failing test. (The changes here are the changes between the saved output cell, and the computed output data. The input data will be the same as it comes from the saved notebook for both outputs.)

  2. the option to display the diff output for just a particular cell in the saved document, for example cell 6 in the example above, and to produce the nbdime diff output just for that cell. This might be useful to focus on a particular failing cell. This would require a new flag to py.test to indicate which cell we want to focus on. We would probably still have to execute all notebook cells up to that cell to make sure the commands in cell 6 can reasonably execute.

nbval and the %load mechanic

Internally we run a lot of courses with Jupyter, and we use the %load magic to separate answers from the Jupyter notebooks. We can keep some of the Python files on our side while handing out the notebooks that contain exercises. This works very well for us.

Still, we also like to run tests against our notebooks. We like to know that none of the notebooks contains an error, so we run pytest --nbval-lax as a CI step.

The problem is that nbval seems to ignore the %load. Would you be receptive to a pull request to fix this? Currently we've made a small command line tool, asekuro, that does this, but I wouldn't mind committing it back to this project if you think it is a good idea.

@takluyver, any thoughts?

Rename #PYTEST_VALIDATE_IGNORE_OUTPUT

The keyword #PYTEST_VALIDATE_IGNORE_OUTPUT used to ignore comparison of computed and saved output originates from a time when the tool was called 'pytest-validate'. As the name now is 'nbval' (standing for NoteBook VALidation), it seems sensible to rename that keyword to #NBVAL_IGNORE_OUTPUT.

We should still allow #PYTEST_VALIDATE_IGNORE_OUTPUT to be used for backwards compatibility.

Must update documentation accordingly.

super() is invalid in Python 2.7

I'm getting errors using nbval 0.4.0 with Python 2.7. Running

py.test --nbval documentation.ipynb

produces the following output:

============================= test session starts ==============================
platform darwin -- Python 2.7.10, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/Mike/nbval, inifile:
plugins: nbval-0.4.0
collected 11 items

documentation.ipynb EEEEEEEEEEEE

==================================== ERRORS ====================================
___________________________ ERROR at setup of cell 6 ___________________________

self = <CallInfo when='setup' exception: super() takes at least 1 argument (0 given)>
func = <function <lambda> at 0x10f261cf8>, when = 'setup'

    def __init__(self, func, when):
        #: context of invocation: one of "setup", "call",
        #: "teardown", "memocollect"
        self.when = when
        self.start = time()
        try:
>           self.result = func()

../Library/Python/2.7/lib/python/site-packages/_pytest/runner.py:163:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   return CallInfo(lambda: ihook(item=item, **kwds), when=when)

../Library/Python/2.7/lib/python/site-packages/_pytest/runner.py:151:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_HookCaller 'pytest_runtest_setup'>
kwargs = {'__multicall__': <_MultiCall 0 results, 1 meths, kwargs={'item': <IPyNbCell 'Cell 6'>, '__multicall__': <_MultiCall 0 results, 1 meths, kwargs={...}>}>, 'item': <IPyNbCell 'Cell 6'>}

    def __call__(self, **kwargs):
        assert not self.is_historic()
>       return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)

../Library/Python/2.7/lib/python/site-packages/_pytest/vendored_packages/pluggy.py:724:

# ... [snip] ....

/Library/Python/2.7/site-packages/jupyter_client/manager.py:230:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <jupyter_client.manager.KernelManager object at 0x10f282ad0>
extra_arguments = []

    def format_kernel_cmd(self, extra_arguments=None):
        """replace templated args (e.g. {connection_file})"""
        extra_arguments = extra_arguments or []
        if self.kernel_cmd:
            cmd = self.kernel_cmd + extra_arguments
        else:
>           cmd = self.kernel_spec.argv + extra_arguments

/Library/Python/2.7/site-packages/jupyter_client/manager.py:170:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <jupyter_client.manager.KernelManager object at 0x10f282ad0>

    @property
    def kernel_spec(self):
        if self._kernel_spec is None:
>           self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)

/Library/Python/2.7/site-packages/jupyter_client/manager.py:82:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbval.kernel.NbvalKernelspecManager object at 0x10f282b50>
kernel_name = 'python3'

>   ???
E   TypeError: super() takes at least 1 argument (0 given)

build/bdist.macosx-10.10-intel/egg/nbval/kernel.py:33: TypeError
__________________________ ERROR at setup of cell 13 __________________________
# ... etc.

I don't get this with v0.3.6. With that version it fails in the places it is supposed to fail (as documented, that is).

add `ipykernel` to requirements

It seems we need ipykernel as one of the requirements for nbval.

However, our tests don't show that so far. I suggest we create a docker container as part of the testing process, and try to run py.test --version or a quick test on something inside the container to check the requirements.

This is based on finding that the requirement was missing for tests using --nbval in joommf/oommfc (joommf/oommfc@613c477)

`--nbval-lax` cannot find notebook to test

While working on the deliverable report, I have just found this behaviour:

(nbval) iota:examples fangohr$ py.test -v --nbval demo-with-fail.ipynb 
======================================= test session starts ========================================
platform darwin -- Python 3.6.0, pytest-3.0.6, py-1.4.32, pluggy-0.4.0 -- /Users/fangohr/anaconda/envs/nbval/bin/python
cachedir: .cache
rootdir: /Users/fangohr/git/OpenDreamKit-D48/WP4/D4.8/examples, inifile: 
plugins: nbval-0.4.1
collected 5 items 

demo-with-fail.ipynb::Cell 1 PASSED
demo-with-fail.ipynb::Cell 2 PASSED
demo-with-fail.ipynb::Cell 3 PASSED
demo-with-fail.ipynb::Cell 5 PASSED
demo-with-fail.ipynb::Cell 7 FAILED

============================================= FAILURES =============================================
______________________________________________ cell 7 ______________________________________________
Notebook cell execution failed
Cell 7: Cell outputs differ

Input:
datetime.datetime.now()

Traceback: mismatch 'text/plain'
<<<<<<<<<<<< Reference output from ipynb file:
datetime.datetime(2017, 2, 13, 10, 16, 15, 240125)
============ disagrees with newly computed (test) output:  
datetime.datetime(2017, 2, 13, 10, 43, 31, 39014)
>>>>>>>>>>>>
================================ 1 failed, 4 passed in 2.05 seconds ================================

The above is okay and works as desired. Interestingly, a very minor change to the command gives the following:

(nbval) iota:examples fangohr$ py.test -v --nbval-lax demo-with-fail.ipynb 
======================================= test session starts ========================================
platform darwin -- Python 3.6.0, pytest-3.0.6, py-1.4.32, pluggy-0.4.0 -- /Users/fangohr/anaconda/envs/nbval/bin/python
cachedir: .cache
rootdir: /Users/fangohr/git/OpenDreamKit-D48/WP4/D4.8/examples, inifile: 
plugins: nbval-0.4.1
collecting 0 items
=================================== no tests ran in 0.00 seconds ===================================
ERROR: not found: /Users/fangohr/git/OpenDreamKit-D48/WP4/D4.8/examples/demo-with-fail.ipynb
(no name '/Users/fangohr/git/OpenDreamKit-D48/WP4/D4.8/examples/demo-with-fail.ipynb' in any of [])

Looks like a bug?

Unable to sanitize output_type

I'm getting a discrepancy in my output_types with nbval 0.5 that doesn't show up in nbval 0.4:

______________________________________________ cell 4 ______________________________________________
Notebook cell execution failed
Cell 4: Cell outputs differ

Input:
treecorr.corr2(config)

Traceback: mismatch 'output_type'
<<<<<<<<<<<< Reference output from ipynb file:
stream
============ disagrees with newly computed (test) output:  
streamst...<snip base64, md5=1d90ad9225fdb1ef...>
>>>>>>>>>>>>

The output is from a logger object with logging.StreamHandler(stream=sys.stdout) for its handler. My Jupyter installation (4.1.0) and nbval 0.4.0 both give the output_type as simply stream. But nbval 0.5 has the above weird streamst thing.

Anyway, I thought I'd be able to ignore this using my sanitize file. I added

[output_type]
regex: stream.*
replace: stream

but it didn't work. It turns out that sanitization is not applied to the output_type field.

So what is my workaround to deal with this? Thanks for any help you can provide.

Improve documentation

We need an overview of what nbval can do, including a screenshot of typical test output, to explain quickly why people may (or may not) want to read further about the tool. This might be an opportunity to move the documentation to Read the Docs, to be more in line with what potential users expect.

Option to select only notebook tests

pytest --nbval picks up all the tests it finds. It would be useful to have an option to select only the notebooks. I found some ways to do this via a .cfg file, but a command line option would be better.
Ideas?

Disable colors for junit reports

We use pytest and nbval to generate junit-xml files. Unfortunately there is no option to disable the coloring of output messages. Here is an example junit xml file:

<testsuite errors="0" failures="1" name="pytest" skips="0" tests="3" time="3.044">
<testcase classname="notebooks.tests.test_error.ipynb" file="notebooks/tests/test_error.ipynb" line="0" name="Cell 0" time="1.4594926834106445">
<failure message="#x1B[91mNotebook cell execution failed#x1B[0m #x1B[94mCell 0: Cell execution caused an exception Input: #x1B[0mfoo = "Hase" assert foo == "Igel" #x1B[94mTraceback:#x1B[0m #x1B[0;31m---------------------------------------------------------------------------#x1B[0m #x1B[0;31mAssertionError#x1B[0m Traceback (most recent call last) #x1B[0;32m<ipython-input-1-24658342da6f>#x1B[0m in #x1B[0;36m<module>#x1B[0;34m()#x1B[0m #x1B[1;32m 1#x1B[0m #x1B[0mfoo#x1B[0m #x1B[0;34m=#x1B[0m #x1B[0;34m"Hase"#x1B[0m#x1B[0;34m#x1B[0m#x1B[0m #x1B[1;32m 2#x1B[0m #x1B[0;34m#x1B[0m#x1B[0m #x1B[0;32m----> 3#x1B[0;31m #x1B[0;32massert#x1B[0m #x1B[0mfoo#x1B[0m #x1B[0;34m==#x1B[0m #x1B[0;34m"Igel"#x1B[0m#x1B[0;34m#x1B[0m#x1B[0m #x1B[0m #x1B[0;31mAssertionError#x1B[0m: ">
#x1B[91mNotebook cell execution failed#x1B[0m #x1B[94mCell 0: Cell execution caused an exception Input: #x1B[0mfoo = "Hase" assert foo == "Igel" #x1B[94mTraceback:#x1B[0m #x1B[0;31m---------------------------------------------------------------------------#x1B[0m #x1B[0;31mAssertionError#x1B[0m Traceback (most recent call last) #x1B[0;32m<ipython-input-1-24658342da6f>#x1B[0m in #x1B[0;36m<module>#x1B[0;34m()#x1B[0m #x1B[1;32m 1#x1B[0m #x1B[0mfoo#x1B[0m #x1B[0;34m=#x1B[0m #x1B[0;34m"Hase"#x1B[0m#x1B[0;34m#x1B[0m#x1B[0m #x1B[1;32m 2#x1B[0m #x1B[0;34m#x1B[0m#x1B[0m #x1B[0;32m----> 3#x1B[0;31m #x1B[0;32massert#x1B[0m #x1B[0mfoo#x1B[0m #x1B[0;34m==#x1B[0m #x1B[0;34m"Igel"#x1B[0m#x1B[0;34m#x1B[0m#x1B[0m #x1B[0m #x1B[0;31mAssertionError#x1B[0m:
</failure>
</testcase>
<testcase classname="notebooks.tests.test_error.ipynb" file="notebooks/tests/test_error.ipynb" line="0" name="Cell 1" time="1.2278406620025635"/>
<testcase classname="notebooks.tests.test_error.ipynb" file="notebooks/tests/test_error.ipynb" line="0" name="Cell 2" time="0.34316015243530273"/>
</testsuite>

Pattern matching for images

Is there a way to pattern match the actual HTML tags for images? HoloViews returns a binary blob, and I've not been able to get the pattern to match with this:

[holoviews]
regex: <img*>
replace: HOLOVIEWS
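
For reference, <img*> matches a literal "<im" followed by zero or more "g" characters, not a whole tag. A pattern along the lines of the following sketch (untested against HoloViews output) should match a full img element:

[holoviews]
regex: <img[^>]*>
replace: HOLOVIEWS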

Show filename of notebook with failing cell

Example files in https://github.com/computationalmodelling/nbval/tree/master/issues/67

The issue is that, when running

py.test --nbval notebook1.ipynb notebook2.ipynb

the output is:

======================================= test session starts ===============
platform darwin -- Python 3.6.0, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /Users/fangohr/git/nbval, inifile:
plugins: nbval-0.6
collected 2 items

notebook1.ipynb .
notebook2.ipynb F

============================================ FAILURES =====================
_____________________________________________ cell 0 ______________________
Notebook cell execution failed
Cell 0: Cell outputs differ

Input:
import time

time.time()

Traceback: mismatch 'text/plain'
<<<<<<<<<<<< Reference output from ipynb file:
1498639847.528011
============ disagrees with newly computed (test) output:
1498640219.550588
>>>>>>>>>>>>
=============================== 1 failed, 1 passed in 2.07 seconds ========

This is not ideal: we know there is a failure in Cell 0, but we don't
know in which file. (Here it is notebook2.ipynb.)

Suggest that the name of the file in which the reported failures take
place is displayed somehow.

How to determine if a Notebook has cell errors

I am writing a New Relic Synthetic (Javascript) which opens and runs a Notebook. I block until Jupyter.notebook.kernel_busy is false. Now I need to know if the notebook completed successfully with NO errors in any cells. Is there such a boolean or status or event or ??? I can monitor to determine success?
Basically I need a programmatic way to know if a Notebook has completed with errors or success.

Smarter picking of kernel to start

Currently, nbval always starts the default Python kernel, though this is not necessarily running in the same Python environment as nbval itself (see discussion on #6).

  1. In the default case, we should probably pick the kernel to start based on notebook metadata, like nbconvert does for --execute. This would allow validating notebooks in other languages.
  2. We may also want an option which forces it to start a Python kernel in the same Python environment as nbval itself, so we know that test dependencies affect what's available in the kernel. Thoughts on what this should be called?

inhibit comparison of matplotlib plots output as pdf

Hey guys, nice package. However, I have matplotlib outputting PDF format, so I am getting a lot of:

Traceback: mismatch 'application/pdf'
<<<<<<<<<<<< Reference output from ipynb file:
JVBERi0x...<snip base64, md5=69c76c0e76f1e696...>
============ disagrees with newly computed (test) output:
JVBERi0x...<snip base64, md5=4f47c0c9cd063f84...>
>>>>>>>>>>>>

I guess the simplest fix would be to have the option to add "application/pdf" to self.skip_compare

Error in KernelSpecManager when using Travis CI

The following error is occurring when using Nbval with Travis CI.

self = <jupyter_client.kernelspec.KernelSpecManager object at 0x7fa7fc2c1c88>
kernel_name = 'python3'
    def get_kernel_spec(self, kernel_name):
        """Returns a :class:`KernelSpec` instance for the given kernel_name.
    
            Raises :exc:`NoSuchKernel` if the given kernel name is not found.
            """
        d = self.find_kernel_specs()
        try:
>           resource_dir = d[kernel_name.lower()]
E           KeyError: 'python3'
../../../miniconda/envs/test-environment/lib/python3.5/site-packages/jupyter_client/kernelspec.py:173: KeyError
During handling of the above exception, another exception occurred:
self = <CallInfo when='setup' exception: No such kernel named python3>
func = <function call_runtest_hook.<locals>.<lambda> at 0x7fa8025b4378>
when = 'setup'
    def __init__(self, func, when):
        #: context of invocation: one of "setup", "call",
        #: "teardown", "memocollect"
        self.when = when
        self.start = time()
        try:
>           self.result = func()
../../../miniconda/envs/test-environment/lib/python3.5/site-packages/_pytest/runner.py:163: 

See https://travis-ci.org/wd15/nbval-bug/builds/174248175

The .travis.yml file is https://github.com/wd15/nbval-bug/blob/master/.travis.yml and the test, https://github.com/wd15/nbval-bug/blob/master/test.ipynb

The test seems to work fine on my local workstation outside of Travis CI.

Configure timeouts

The user should be given the option to specify how long to wait for cell execution timeout. The current timeout is set to ~half an hour (2000 s).

One option is to piggy-back on the timeout option added by the timeout plugin. If not, we should ensure that we choose an option name that won't conflict, like nbval-cell-timeout (or maybe a shorter version).

New feature: Debugging Jupyter cells

Please support debugging the code in Jupyter cells when they raise an Exception with:

pytest --nbval --pdb

Probably not easy to do, as I just heard.

Getting error for timestamps, matplotlib figures and pandas DataFrames

This is a great plugin. I'm getting unexpected errors for the following types of cells:

  • timestamps
  • matplotlib figures
  • pandas DataFrames

[cmd.exe screenshots of each failing cell omitted]

Are there workarounds for all the latter cell types, such as an "ignore flag", and are there potential code integrations for any of the above types in the future?

Many thanks.

Flaky test failures: 'stream' != 'streamstream'

I'm seeing unexplainable test failures for cells with print statements, for example

=================================== FAILURES ===================================
____________________________________ cell 8 ____________________________________
Notebook cell execution failed
Cell 8: Cell outputs differ

Input:
trace(system_test, 'unit_i', 10)

Traceback: mismatch 'output_type'
<<<<<<<<<<<< Reference output from ipynb file:
stream
============ disagrees with newly computed (test) output:
streamstream
>>>>>>>>>>>>

The cell in question runs a function that print()s a few lines then returns None

trace(system_test, 'unit_i', 10)

I have not been able to make a minimal reproducing example yet, but you can see the failing notebook at
https://github.com/fritzo/pomagma/blob/47eecf8/src/examples/definable_systems.ipynb
where I've guarded the final cell with if 0 to work around this issue.

Here's my system (running Python 2.7):

$ pip freeze
altgraph==0.10.2
apipkg==1.4
appnope==0.1.0
appscript==1.0.1
argh==0.15.1
autoflake==0.6.6
autopep8==1.2.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.4.0.2
bdist-mpkg==0.5.0
BeautifulSoup==3.2.1
bleach==1.5.0
bonjour-py==0.3
boto==2.43.0
bottle==0.11.6
cdiff==0.9.6
certifi==2015.9.6.2
configparser==3.5.0
contextlib2==0.5.4
decorator==4.0.10
docformatter==0.8
entrypoints==0.2.2
enum34==1.1.6
execnet==1.4.1
flake8==3.2.1
funcsigs==1.0.2
functools32==3.2.3.post2
goftests==0.2.0
gprof2dot==2015.12.1
html5lib==0.9999999
hypothesis==3.6.1
ipykernel==4.5.2
ipython==5.1.0
ipython-genutils==0.1.0
ipywidgets==5.2.2
isodate==0.4.8
isort==4.2.5
Jinja2==2.8.1
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.4.0
jupyter-console==5.0.0
jupyter-core==4.2.1
jupyterthemes==0.13.9
lesscpy==0.12.0
logic==0.1.10
macholib==1.5.1
Markdown==2.2.0
MarkupSafe==0.23
matplotlib==1.3.1
mccabe==0.5.3
mistune==0.7.3
mock==2.0.0
modulegraph==0.10.4
multipledispatch==0.4.9
nbconvert==5.0.0
nbformat==4.2.0
nbval==0.5
nose==1.3.7
notebook==4.3.1
numpy==1.6.2
pandocfilters==1.4.1
parsable==0.2.2
pathlib2==2.1.0
pbr==1.10.0
pep8==1.7.0
pexpect==4.2.1
pickleshare==0.7.4
ply==3.9
-e git+git@github.com:fritzo/pomagma.git@47eecf89dda27a81f7e7900597c2ab5f0f8fbce3#egg=pomagma
prompt-toolkit==1.0.9
protobuf==2.6.1
psutil==4.4.2
ptyprocess==0.5.1
py==1.4.31
py2app==0.7.3
pycodestyle==2.2.0
pyflakes==1.3.0
pyformat==0.6
Pygments==1.5
pyobjc-core==2.5.1
pyobjc-framework-Accounts==2.5.1
pyobjc-framework-AddressBook==2.5.1
pyobjc-framework-AppleScriptKit==2.5.1
pyobjc-framework-AppleScriptObjC==2.5.1
pyobjc-framework-Automator==2.5.1
pyobjc-framework-CFNetwork==2.5.1
pyobjc-framework-Cocoa==2.5.1
pyobjc-framework-Collaboration==2.5.1
pyobjc-framework-CoreData==2.5.1
pyobjc-framework-CoreLocation==2.5.1
pyobjc-framework-CoreText==2.5.1
pyobjc-framework-DictionaryServices==2.5.1
pyobjc-framework-EventKit==2.5.1
pyobjc-framework-ExceptionHandling==2.5.1
pyobjc-framework-FSEvents==2.5.1
pyobjc-framework-InputMethodKit==2.5.1
pyobjc-framework-InstallerPlugins==2.5.1
pyobjc-framework-InstantMessage==2.5.1
pyobjc-framework-LatentSemanticMapping==2.5.1
pyobjc-framework-LaunchServices==2.5.1
pyobjc-framework-Message==2.5.1
pyobjc-framework-OpenDirectory==2.5.1
pyobjc-framework-PreferencePanes==2.5.1
pyobjc-framework-PubSub==2.5.1
pyobjc-framework-QTKit==2.5.1
pyobjc-framework-Quartz==2.5.1
pyobjc-framework-ScreenSaver==2.5.1
pyobjc-framework-ScriptingBridge==2.5.1
pyobjc-framework-SearchKit==2.5.1
pyobjc-framework-ServiceManagement==2.5.1
pyobjc-framework-Social==2.5.1
pyobjc-framework-SyncServices==2.5.1
pyobjc-framework-SystemConfiguration==2.5.1
pyobjc-framework-WebKit==2.5.1
pyOpenSSL==0.13.1
pyparsing==2.0.1
PySMT==0.6.1
pytest==3.0.3
pytest-profiling==1.1.1
pytest-timeout==1.0.0
pytest-xdist==1.15.0
python-dateutil==1.5
python-magic==0.4.6
pytz==2013.7
PyYAML==3.10
pyzmq==16.0.0
qtconsole==4.2.1
requests==2.12.4
RunSnakeRun==2.0.4
s3cmd==1.5.2
scipy==0.13.0b1
selenium==2.33.0
simplegeneric==0.8.1
simplejson==3.10.0
six==1.10.0
snakeviz==0.4.0
splinter==0.5.4
SquareMap==1.0.4
stevedore==1.18.0
terminado==0.6
testpath==0.3
toolz==0.8.0
tornado==4.2.1
traitlets==4.3.1
unification==0.2.2
unify==0.2
untokenize==0.1.1
vboxapi==1.0
veritable===0.9.8preBUILD-NUMBER
virtualenv==15.0.3
virtualenv-clone==0.2.6
virtualenvwrapper==4.7.2
wcwidth==0.1.7
widgetsnbextension==1.2.6
xattr==0.6.4
yapf==0.14.0
zope.interface==4.1.1

Check links (URLs) in notebook

A possible extension of the notebook testing: can we check URLs (in the markdown presumably) to see if those are accessible (i.e. URL linting)?

This could be activated by an additional switch (maybe --url-lint or so) and report failures if a URL is not accessible. (To run the test, Internet access will be required.)

This was proposed at the Jupyter Workshop January 2017 in Edinburgh (http://opendreamkit.org/meetings/2017-01-16-ICMS/programme/).

Kernel hooks to check output robustly

This is an ambitious idea, and I'm not yet entirely sure of the details.

Checking output data is often tricky: small changes can be meaningful, while large changes can easily be unimportant (e.g. in a PNG plot). It would be nice to have some hooks to communicate with kernels for more intelligent checks (e.g. using plotchecker on matplotlib plots).

How might this work? Could we define a special output mimetype which kernels would produce (e.g. application/x-nbval-output-data), and a way for the test framework to call a check function in the kernel like check_output('saved output data', 'computed output data')?
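
A minimal sketch of what such a kernel-side check could look like (the function and its semantics are hypothetical, per the idea above):

import json

def check_output(saved, computed):
    """Hypothetical in-kernel hook: decide whether two outputs are equivalent."""
    try:
        # Compare structured payloads semantically rather than byte-for-byte.
        return json.loads(saved) == json.loads(computed)
    except (ValueError, TypeError):
        return saved == computed  # fall back to an exact comparison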

Count only code cells for cell numbers?

nbval currently names tests as e.g. 'Cell 3', counting all the cells of the notebook including markdown cells. Jupyter does not expose any visible numbering of cells, but if you start the kernel and 'run all', you will effectively get a numbering of only code cells. So nbval's 'Cell 3' could have an In [1]: prompt by it.

I think it would be better for the names in nbval to look like 'Code cell 1', so it's easy to match up to a notebook that has been executed straight-through.

@vidartf I assume the nbdime reporting doesn't rely on these names?

BUG: Problems when using tqdm

Dear nbval-Team,

Unfortunately, nbval is not working with tqdm (this is the minimal working example for #55).

While a cell with output_type="execute_result" seems to always work, e.g.

s = "foo"
s

the success of a cell with output_type="stream" depends on whether tqdm was imported or not. For instance

import tqdm
print("goo")

will throw the following error when tested with py.test --nbval path/to/file.ipynb:

py.test --nbval .\test_nbval.ipynb
============================= test session starts =============================
platform win32 -- Python 3.6.0, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
rootdir: D:\Projects\telefonica-callcenter-online, inifile:
plugins: nbval-0.5
collected 3 items 

test_nbval.ipynb ..F

================================== FAILURES ===================================
___________________________________ cell 2 ____________________________________
Notebook cell execution failed
Cell 2: Cell outputs differ

Input:
import tqdm
print("goo")

Traceback: mismatch 'output_type'
<<<<<<<<<<<< Reference output from ipynb file:
stream
============ disagrees with newly computed (test) output:  
streamstream
>>>>>>>>>>>>
===================== 1 failed, 2 passed in 2.13 seconds ======================

This behaviour does not depend at all on where tqdm is imported, only on whether tqdm gets imported. Without the import, everything runs through as expected.

I created a small gist with a respective example notebook, you can find it here https://gist.github.com/anonymous/15a17512fb72fee2c8a8abbe46316eff

Any help is highly appreciated, as I love this pytest extension and would really like to use it more.

Kernel should be interrupted on cell timeout

Currently, if a cell reaches timeout, it seems that it simply continues to the next cell, without interrupting the kernel. So e.g. if a cell enters an infinite loop, all following cells will try to execute, but the kernel will still be running the previous cell.

Option to only check output on marked cells

We currently check output on all cells not explicitly marked with the IGNORE_OUTPUT comment.

Because many outputs can be variable, I'd like an alternative way to use nbval which runs the whole notebook, but only compares the output of cells marked with something like # NBVAL_CHECK_OUTPUT.

Add LICENSE file to release version

The current release source package on PyPI doesn't contain a license file. It would be nice if you could include one in the next release.

Option to ignore error on cell

It's possible to ignore a cell's output but not its errors. Sometimes we might expect a cell to fail and not want that failure tested.
