mwouts / jupytext
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
Home Page: https://jupytext.readthedocs.io
License: MIT License
See also here for a pointer to R notebooks as scripts documentation
The mixed R+python notebook Fourier with ioslides.Rmd from
is not represented correctly as a jupyter notebook.
Two issues encountered while working on #19 are
All previous steps work fine until the data is saved in Python format. Jupyter Notebook then refuses to open the updated IPython output, with the message below:
An unknown error occurred while loading this notebook. This version can load notebook formats v4 or earlier. See the server log for details.
Currently the criterion for the end of a Python cell is 'two blank lines'.
For non-PEP8 files, this can break a Python function into two cells.
We could use indentation to detect when a cell actually ends, and make sure that f
in the example below is parsed as a single cell:
def f(a):
    return a+1
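The indentation idea can be sketched as follows. This is only an illustration, not the nbrmd implementation: the function name `split_cells` is hypothetical, and the sketch ignores multi-line strings and bracketed continuations, which a real parser would have to handle.

```python
def split_cells(lines):
    """Split script lines into cells on two (or more) blank lines,
    unless the next non-blank line is indented, i.e. still inside a block."""
    cells, current, blanks = [], [], 0
    for line in lines:
        if not line.strip():
            blanks += 1  # remember pending blank lines, decide later
            continue
        starts_new_cell = blanks >= 2 and not line[0].isspace()
        if current and starts_new_cell:
            cells.append(current)
            current = []
        elif current:
            # Blank lines inside a cell are kept as part of that cell
            current.extend([''] * blanks)
        blanks = 0
        current.append(line)
    if current:
        cells.append(current)
    return cells


source = ['def f(a):', '', '', '    return a+1', '', '', 'print(f(1))']
for cell in split_cells(source):
    print(cell)
```

With this rule the function body stays attached to its `def` line, and only the unindented `print` call opens a second cell.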
In my jupyter config I set:
c.NotebookApp.contents_manager_class = 'nbrmd.RmdFileContentsManager'
c.ContentsManager.default_nbrmd_formats = ['.ipynb', '.Rmd']
I created a minimal .Rmd file test.Rmd
# My RMD notebook
```{python}
print("Hello World!2")
```
Some more markdown text.
and open it in jupyter. When I re-save the notebook, the following files are created:
test.Rmd
test.ipynb
test.py
Given my configuration, I think the test.py should not be there.
When I investigate test.Rmd in a text editor, the following metadata has been added:
nbrmd_formats:
- .ipynb
- .Rmd
- .py
nbrmd_sourceonly_format: .Rmd
I am using the 0.3.0 release from PyPI.
Cf. #40. Format would be similar to current R markdown module:
As pointed out at #40 , jupyter magics don't need to be escaped there.
R markdown and Jupyter notebooks are two formats for notebooks that target similar functionality. Unfortunately, in practice they don't overlap much, as
Imagine we could take the best from both worlds...
The objective here is to collect feedback from the RStudio and Jupyter projects on whether they deem it valuable to improve the compatibility between the two environments.
jupytext is installed and the paired notebooks are working great. (As I update the notebook, the associated .py file is also updated automatically. Cool!) But running any of the commands, such as jupytext notebook.ipynb --to md --output -, throws a NameError.
Here is the stacktrace:
Traceback (most recent call last):
  File "/home/binu.jasim/miniconda3/envs/pytorch/bin/jupytext", line 7, in <module>
    from jupytext.cli import jupytext
  File "/home/binu.jasim/miniconda3/envs/pytorch/lib/python3.6/site-packages/jupytext/__init__.py", line 26, in <module>
    from .contentsmanager import TextFileContentsManager
  File "/home/binu.jasim/miniconda3/envs/pytorch/lib/python3.6/site-packages/jupytext/contentsmanager.py", line 11, in <module>
    from notebook.services.contents.filemanager import FileContentsManager
  File "/home/binu.jasim/miniconda3/envs/pytorch/lib/python3.6/site-packages/notebook/services/contents/filemanager.py", line 22, in <module>
    from .manager import ContentsManager
  File "/home/binu.jasim/miniconda3/envs/pytorch/lib/python3.6/site-packages/notebook/services/contents/manager.py", line 39, in <module>
    class ContentsManager(LoggingConfigurable):
  File "/home/binu.jasim/miniconda3/envs/pytorch/lib/python3.6/site-packages/notebook/services/contents/manager.py", line 73, in ContentsManager
    untitled_notebook = Unicode(_("Untitled"), config=True,
NameError: name '_' is not defined
conda list shows:
# packages in environment at /home/binu.jasim/miniconda3/envs/pytorch:
_nb_ext_conf 0.4.0 py36_1
alabaster 0.7.10 py36h306e16b_0
anaconda-client 1.6.14 py36_0
asn1crypto 0.24.0 py36_0
babel 2.5.3 py36_0
backcall 0.1.0 py36_0
bleach 2.1.3 py36_0
boto 2.49.0 <pip>
boto3 1.9.0 <pip>
botocore 1.12.0 <pip>
bz2file 0.98 <pip>
bzip2 1.0.6 h9a117a8_4
ca-certificates 2018.03.07 0
certifi 2018.4.16 py36_0
certifi 2018.8.24 <pip>
cffi 1.11.5 py36h9745a5d_0
chardet 3.0.4 py36h0f667ec_1
clyent 1.2.2 py36h7e57e65_1
cmake 3.9.4 h142f0e9_0
colorama 0.3.9 <pip>
cryptography 2.2.2 py36h14c3975_0
curl 7.59.0 h84994c4_0
cycler 0.10.0 py36h93f1223_0
cymem 1.31.2 <pip>
cytoolz 0.9.0.1 <pip>
dbus 1.13.2 h714fa37_1
decorator 4.3.0 py36_0
dill 0.2.8.2 <pip>
docutils 0.14 py36hb0f60f5_0
en-core-web-sm 2.0.0 <pip>
entrypoints 0.2.3 py36h1aec115_2
expat 2.2.5 he0dffb1_0
fontconfig 2.12.6 h49f89f6_0
fr-core-news-sm 2.0.0 <pip>
freetype 2.8 hab7d2ae_1
gensim 3.5.0 <pip>
gitdb2 2.0.4 <pip>
GitPython 2.1.11 <pip>
glib 2.56.1 h000015b_0
gmp 6.1.2 h6c8ec71_1
googletrans 2.3.0 <pip>
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
html5lib 1.0.1 py36h2f9c1c0_0
icu 58.2 h9c2bf20_1
idna 2.7 <pip>
idna 2.6 py36h82fb2a8_1
imagesize 1.0.0 py36_0
intel-openmp 2018.0.0 8
ipykernel 4.8.2 py36_0
ipython 6.3.1 py36_0
ipython_genutils 0.2.0 py36hb52b0d5_0
ipywidgets 7.2.1 py36_0
jedi 0.12.0 py36_0
jinja2 2.10 py36ha16c418_0
jmespath 0.9.3 <pip>
jpeg 9b h024ee3a_2
jsonschema 2.6.0 py36h006f8b5_0
jupyter-tensorboard 0.1.7 <pip>
jupyter_client 5.2.3 py36_0
jupyter_core 4.4.0 py36h7c827e3_0
jupytext 0.6.4 <pip>
kiwisolver 1.0.1 py36h764f252_0
libcurl 7.59.0 h1ad7b7a_0
libedit 3.1 heed3624_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 7.2.0 hdf63c60_3
libgfortran-ng 7.2.0 hdf63c60_3
libpng 1.6.34 hb9fc6fc_0
libsodium 1.0.16 h1bed415_0
libssh2 1.8.0 h9cfc8f7_4
libstdcxx-ng 7.2.0 hdf63c60_3
libuv 1.20.0 h14c3975_0
libxcb 1.13 h1bed415_1
libxml2 2.9.8 hf84eae3_0
magma-cuda90 2.3.0 1 pytorch
Markdown 2.6.11 <pip>
markupsafe 1.0 py36hd9260cd_1
matplotlib 2.2.2 py36h0e671d2_1
mistune 0.8.3 py36h14c3975_1
mkl 2018.0.2 1
mkl-include 2018.0.2 1
mkl_fft 1.0.1 py36h3010b51_0
mkl_random 1.0.1 py36h629b387_0
mock 2.0.0 <pip>
msgpack 0.5.6 <pip>
msgpack-numpy 0.4.3.1 <pip>
murmurhash 0.28.0 <pip>
nb_anacondacloud 1.4.0 py36_0
nb_conda 2.2.1 py36h8118bb2_0
nb_conda_kernels 2.1.0 py36_0
nbconvert 5.3.1 py36hb41ffb7_0
nbdime 1.0.2 <pip>
nbformat 4.4.0 py36h31c9010_0
nbpresent 3.0.2 py36h5f95a39_1
ncurses 6.0 h9df7e31_2
notebook 5.4.1 py36_0
numpy 1.14.2 py36hdbf6ddf_1
numpy 1.15.1 <pip>
numpydoc 0.8.0 py36_0
openssl 1.0.2o h14c3975_1
packaging 17.1 py36_0
pandas 0.23.3 py36h04863e7_0
pandoc 1.19.2.1 hea2e7c5_1
pandocfilters 1.4.2 py36ha6701b7_1
parso 0.2.0 py36_0
pbr 4.2.0 <pip>
pcre 8.42 h439df22_0
pexpect 4.5.0 py36_0
pickleshare 0.7.4 py36h63277f8_0
Pillow 5.1.0 <pip>
pip 9.0.3 py36_0
plac 0.9.6 <pip>
preshed 1.0.1 <pip>
prompt_toolkit 1.0.15 py36h17d85b1_0
protobuf 3.6.0 <pip>
ptyprocess 0.5.2 py36h69acd42_0
pycparser 2.18 py36hf9f622e_1
pygments 2.2.0 py36h0d3125c_0
pyopenssl 17.5.0 py36h20ba746_0
pyparsing 2.2.0 py36hee85983_1
pyqt 5.9.2 py36h751905a_0
pysocks 1.6.8 py36_0
python 3.6.5 hc3d631a_0
python-dateutil 2.7.3 <pip>
python-dateutil 2.7.2 py36_0
pytz 2018.4 py36_0
PyYAML 3.13 <pip>
pyyaml 3.12 py36hafb9ca4_1
pyzmq 17.0.0 py36h14c3975_0
qt 5.9.5 h7e424d6_0
readline 7.0 ha6073c6_4
regex 2017.4.5 <pip>
requests 2.18.4 py36he2e5f8d_1
requests 2.19.1 <pip>
rhash 1.3.5 hbf7ad62_1
s3transfer 0.1.13 <pip>
scipy 1.1.0 <pip>
sconce 0.0.2 <pip>
send2trash 1.5.0 py36_0
sentencepiece 0.1.3 <pip>
setuptools 39.0.1 py36_0
simplegeneric 0.8.1 py36_2
sip 4.19.8 py36hf484d3e_0
six 1.11.0 py36h372c433_1
smart-open 1.6.0 <pip>
smmap2 2.0.4 <pip>
snowballstemmer 1.2.1 py36h6febd40_0
spacy 2.0.12 <pip>
sphinx 1.7.3 py36_0
sphinxcontrib 1.0 py36h6d0f590_1
sphinxcontrib-websupport 1.0.1 py36hb5cb234_1
sqlite 3.23.1 he433501_0
tensorboard 1.10.0 <pip>
tensorboardX 1.2 <pip>
terminado 0.8.1 py36_1
testfixtures 6.3.0 <pip>
testpath 0.3.1 py36h8cadb63_0
thinc 6.10.3 <pip>
tk 8.6.7 hc745277_3
toolz 0.9.0 <pip>
torch 0.5.0a0+21e0fc8 <pip>
torchfile 0.1.0 <pip>
torchtext 0.3.0 <pip>
tornado 5.0.2 py36_0
tqdm 4.23.0 py36_0
traitlets 4.3.2 py36h674d592_0
typing 3.6.4 py36_0
ujson 1.35 <pip>
Unidecode 1.0.22 <pip>
urllib3 1.23 <pip>
urllib3 1.22 py36hbe7ace6_0
visdom 0.1.8.4 <pip>
wcwidth 0.1.7 py36hdf4376a_0
webencodings 0.5.1 py36h800622e_1
websocket-client 0.48.0 <pip>
Werkzeug 0.14.1 <pip>
wheel 0.31.0 py36_0
widgetsnbextension 3.2.1 py36_0
wrapt 1.10.11 <pip>
xz 5.2.3 h55aa19d_2
yaml 0.1.7 had09818_2
zeromq 4.2.5 h439df22_0
zlib 1.2.11 ha838bed_2
I really like the idea of having a synchronized copy of .Rmd and .ipynb. That's basically the Jupyter way of having a nb.html file and gives us the best of two worlds: a text-editable, version-controllable Rmd file and an easily sharable, output-preserving ipynb file.
I would suggest pulling the output out of the corresponding .ipynb file, even when opening the .Rmd. That solves the issue that one could accidentally overwrite changes made to the .Rmd file in an external editor when opening the corresponding .ipynb file because one would like to retrieve contents.
I suggest the following procedure:
Many editors recognize code cells in Python scripts as blocks that start with # %%, including
I found no specific marker for markdown or raw cells yet. So let's be inventive and imagine a few specifications for that format: any cell starts with the above prefix. An optional cell type (among: code, markdown, raw) can be specified. An optional cell name can be stated (it cannot be any of code, markdown, raw). Then, if required, we add the cell metadata, in JSON format, like in the following:
# %% optional_cell_type optional_cell_name {"metadata_key1": "value1", "key2": "value2"}
import pandas as pd
pd.DataFrame({'A':[5]}).plot(kind='bar')
# %% markdown
# This is a markdown cell
# %%
# # This is a commented code cell
# import pandas
Hydrogen and VS code execute python code with Jupyter kernels, so there's no need to escape Jupyter magics here.
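To make the proposed header line concrete, here is a minimal parsing sketch. It is one possible reading of the proposal, not an existing jupytext API: the names `parse_cell_header`, `HEADER` and `CELL_TYPES` are hypothetical, and the sketch assumes the cell name never contains a `{` character.

```python
import json
import re

CELL_TYPES = ('code', 'markdown', 'raw')
# "# %%", then optional words (cell type and name), then optional JSON metadata
HEADER = re.compile(r'^# %%\s*(?P<words>[^{]*)(?P<metadata>\{.*\})?\s*$')


def parse_cell_header(line):
    """Return (cell_type, name, metadata), or None if the line is not a header."""
    match = HEADER.match(line)
    if match is None:
        return None
    words = match.group('words').split()
    cell_type = 'code'  # default when no type is given
    if words and words[0] in CELL_TYPES:
        cell_type = words.pop(0)
    name = ' '.join(words) or None
    metadata = json.loads(match.group('metadata') or '{}')
    return cell_type, name, metadata


print(parse_cell_header('# %% markdown'))
print(parse_cell_header('# %% plot {"key2": "value2"}'))
print(parse_cell_header('# import pandas'))
```

Because the first word is tested against the three cell types before anything else, a cell name cannot be `code`, `markdown` or `raw`, which matches the restriction stated above.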
Attached minimal Jupyter notebook (strip ".txt"), given the following two commands, applied in series, results in the following error:
$ jupytext --to py error\ example2.ipynb
$ jupytext --test --update --to ipynb error\ example2.py
Traceback (most recent call last):
  File "/home/user/testing/venv/bin/jupytext", line 11, in <module>
    sys.exit(jupytext())
  File "/home/user/testing/venv/lib/python3.5/site-packages/jupytext/cli.py", line 129, in jupytext
    preserve_outputs=args.update)
  File "/home/user/testing/venv/lib/python3.5/site-packages/jupytext/cli.py", line 38, in convert_notebook_files
    test_round_trip_conversion(notebook, ext, preserve_outputs)
  File "/home/user/testing/venv/lib/python3.5/site-packages/jupytext/compare.py", line 88, in test_round_trip_conversion
    test_outputs=True)
  File "/home/user/testing/venv/lib/python3.5/site-packages/jupytext/compare.py", line 65, in compare_notebooks
    ref_cell = filtered_cell(ref_cell, preserve_outputs=test_outputs)
  File "/home/user/testing/venv/lib/python3.5/site-packages/jupytext/compare.py", line 16, in filtered_cell
    filtered['execution_count'] = cell['execution_count']
KeyError: 'execution_count'
Jupytext 0.6.5
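The KeyError shows that a cell without an execution_count key reached the comparison step. The root cause is not established in this report. As a hedged illustration only, a more tolerant lookup could look like the sketch below; `filter_cell_sketch` is a hypothetical helper, not the `filtered_cell` function from jupytext, and the choice of fields is an assumption.

```python
def filter_cell_sketch(cell, preserve_outputs=True):
    """Keep only the fields used for comparison (hypothetical helper)."""
    filtered = {'cell_type': cell['cell_type'], 'source': cell['source']}
    if preserve_outputs and cell['cell_type'] == 'code':
        # Markdown and raw cells carry no execution_count, and a code cell
        # rebuilt from a text file may lack one too, so use defaults
        filtered['execution_count'] = cell.get('execution_count')
        filtered['outputs'] = cell.get('outputs', [])
    return filtered


print(filter_cell_sketch({'cell_type': 'markdown', 'source': '# Title'}))
print(filter_cell_sketch({'cell_type': 'code', 'source': '1 + 1'}))
```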
With c.NotebookApp.notebook_dir other than the default value, jupyter notebook issues errors on the command line when saving the Rmd form of the notebook.
I added some suggestions in #50.
Besides
Python cells that contain two consecutive blank lines are converted into separate cells in the jupyter/python/jupyter conversion. It would be more convenient to preserve the original cell structure.
Possibly one can use an explicit marker for determining the end of non-trivial python cells.
nbsrc EOL2.py -i produces R Markdown EOL2.py being converted to Jupyter notebook EOL2.ipynb. This should be Python code EOL2.py….
I have a .ipynb notebook whose nbrmd-generated .py contains things like # + {"scrolled": true} or # + {"endofcell": "-"}. Are they necessary? I can imagine people being confused by these if they work on shared code, as they were not written by the user. Are they useful?
Add Rmd, R and py versions of a notebook with code, results and images, plus its ipynb rendering, so that people can open and test them in binder #19 .
Document a few manipulations:
.ipynb notebook and reopen source notebook (inputs disappear)
We're currently using # + {} as start of cell marker. But possibly # + (when not following a commented line) would be just enough. And also, Hydrogen seems to use # %%name of cell. Can we design a common, simple and clear pattern?
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting jupytext
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/1d/de/ae5c3a0b1a07a581db9cfe3ee74466f53cad19182a8d58b4e819de6dd783/jupytext-0.6.3.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ZZCE57~1\AppData\Local\Temp\pip-install-2gipwivl\jupytext\setup.py", line 6, in <module>
    long_description = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x94 in position 2178: illegal multibyte sequence
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\ZZCE57~1\AppData\Local\Temp\pip-install-2gipwivl\jupytext\
And I can't find that File...
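The traceback shows f.read() in setup.py decoding the file with the platform default codec (gbk here). A common remedy, sketched below as an assumption about how setup.py could be written rather than as the project's actual fix, is to open the file with an explicit UTF-8 encoding. The helper name `read_long_description` and the file name `README.md` are hypothetical.

```python
import io


def read_long_description(path):
    """Read a text file as UTF-8, whatever the platform default encoding is."""
    # io.open accepts the `encoding` argument on both Python 2 and Python 3
    with io.open(path, encoding='utf-8') as f:
        return f.read()


# In setup.py one would then write, for instance (file name is an assumption):
# long_description = read_long_description('README.md')
```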
I guess it has to be added to requirements.txt?
Currently, only specific raw cells are preserved by RTC:
Raw cells with no 'active' flags are currently converted to markdown. It would be more convenient to have them preserved in raw format.
The demo folder should contain only simple notebooks, with clear names.
setup.py only requires it for tests, but actually it's required for normal operations, resulting in:
$ jupytext mynoteboook.ipynb --to md --output -
Traceback (most recent call last):
  File "/home/seb/.local/share/virtualenvs/jupyter/bin/jupytext", line 7, in <module>
    from jupytext.cli import jupytext
  File "/home/seb/.local/share/virtualenvs/jupyter/local/lib/python3.6/site-packages/jupytext/cli.py", line 9, in <module>
    from .compare import test_round_trip_conversion
  File "/home/seb/.local/share/virtualenvs/jupyter/local/lib/python3.6/site-packages/jupytext/compare.py", line 3, in <module>
    from testfixtures import compare
ModuleNotFoundError: No module named 'testfixtures'
Some of my notebooks use
%load_ext autoreload
%autoreload 2
In the text representation of notebooks, the second line is not escaped, as autoreload is not a standard Jupyter magic.
The list of magics to be escaped should be configurable, and include by default the most standard magics, so that I don't have to add #escape to the second line (current workaround).
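A configurable list of magics could work roughly as in the sketch below. Everything here is illustrative: the names `DEFAULT_MAGICS`, `escape_magics` and `unescape_magics` are hypothetical, the default set is a small sample rather than jupytext's actual list, and the "# " prefix is just one possible escape convention.

```python
import re

# Illustrative sample only; a real default list would be longer
DEFAULT_MAGICS = frozenset(['load_ext', 'autoreload', 'matplotlib', 'time', 'timeit'])


def is_magic(line, magics=DEFAULT_MAGICS):
    """True when the line starts with %name or %%name for a known magic."""
    match = re.match(r'^%%?(\w+)', line)
    return bool(match) and match.group(1) in magics


def escape_magics(lines, magics=DEFAULT_MAGICS):
    """Comment out magic commands so that the script remains valid Python."""
    return ['# ' + line if is_magic(line, magics) else line for line in lines]


def unescape_magics(lines, magics=DEFAULT_MAGICS):
    """Reverse escape_magics when reading the script back into a notebook."""
    return [line[2:] if line.startswith('# ') and is_magic(line[2:], magics) else line
            for line in lines]


cell = ['%load_ext autoreload', '%autoreload 2', 'x = 1']
print(escape_magics(cell))
print(unescape_magics(escape_magics(cell)) == cell)
```

One caveat of this convention: a genuine comment that happens to read like an escaped magic (for instance `# %time ...`) would be turned into a magic on the way back, so a real implementation needs a way to tell the two apart.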
I have a notebook cell which is only made of comments:
# …
# …
# …
When converting into .py and then back to .ipynb, the cell is transformed into pure Markdown, which is problematic (because the comments represent code that may have to be uncommented and run, sometimes).
In case this matters: the cell above contains Markdown, and the one below contains code.
Test round trip conversion on notebooks with empty code cells (1, 2, 3), both at the end of the notebook and in the middle of it.
Consecutive markdown cells could be exported with two blank lines between them. Conversely, markdown text with paragraphs separated by two blank lines could go to two separate cells.
Cf. #69: the mirror/jupyter_again.py representation of jupyter_again.ipynb has an incorrect syntax - a line that starts with ?.
We should escape the help calls just like we escape Jupyter magics (when cell language is Python or Julia, not for R).
Great tool! Would be great for me and future generations to document the steps needed to add a new language.
Consider Scheme. There is a link to a live kernel here:
Details:
.ss
; (often times we use two or more)
Anything else one needs?
I would find it useful to have documentation of the .py/.R script versions.
Emacs' org mode format (extension .org) has a well documented syntax for code blocks:
#+NAME: <name>
#+BEGIN_SRC <language> <switches> <header arguments>
<body>
#+END_SRC
We could implement a notebook to/from org converter.
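As a rough idea of the export direction, the sketch below turns one code cell into an org source block following the syntax quoted above. The function name `cell_to_org` is hypothetical, and switches and header arguments are left out for brevity.

```python
def cell_to_org(source, language, name=None):
    """Render the source of a code cell as an org mode source block."""
    lines = []
    if name:
        lines.append('#+NAME: ' + name)
    lines.append('#+BEGIN_SRC ' + language)
    lines.extend(source.splitlines())
    lines.append('#+END_SRC')
    return '\n'.join(lines)


print(cell_to_org('import pandas as pd\npd.DataFrame()', 'python', name='load'))
```

The reverse direction would scan for the BEGIN_SRC and END_SRC lines and treat everything outside them as text cells.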
Further notes:
NAME option seems optional.
I wanted to use jupytext to have a paired md+ipynb notebook that I also use for presentations with the RISE plugin https://github.com/damianavila/RISE
Unfortunately, every time I reload the browser all the metadata information about whether a cell is a slide, fragment, or should be skipped, gets lost.
Reformatting the Python script representation of notebooks with PyCharm may change the cell structure of the notebook.
sphinx-gallery has a practice of representing notebooks (reStructuredText files) as Python scripts. See for instance the documentation. Can we provide a Jupyter notebook converter for that format?
The current produces too many small cells, and that breaks some plots. Cf.
matplotlib/matplotlib#12116
scikit-learn/scikit-learn#12075
The options for an R chunk: {r active="Rmd", include=FALSE} are turned into {r active="Rmd"} when opened/saved in Jupyter.
Command line usage should be more similar to jupyter nbconvert.
An additional argument --test-round-trip for testing round trip conversion would be useful as well.
In the README we describe how to use Jupytext with Jupyter notebook.
For Jupyter Lab we should document
@grst , would you like to answer these questions? Thanks
The RmdFileContentsManager is active in Jupyter Lab (and saves the edited notebook to the desired alternative extensions), but .Rmd files still open as text.
Cf. also jupyterlab/jupyterlab#3896
The jupytext command can run a round trip test. That's very new, and should still be improved.
Hi,
This seems to be a very useful tool! Thanks for developing it! How much effort would it be to get it working for Julia as well? I wouldn't mind trying to add support if it isn't a huge amount of work.
Cheers,
Durand
Python cells in R Markdown files and plain Python files cannot contain Jupyter magic commands. These should be escaped in a reversible way, so that the reconstructed Jupyter notebook remains unchanged.
Possibly not a very frequent pattern... but that's a pattern found in nbrmd tests, like at test_read_simple_python.py
Hey @mwouts,
Was reminded of your tool thanks to your comment here: rstudio/rmarkdown#1020 (comment)
I was wondering if you have heard of https://mybinder.org/ and, if so, if you had considered using the service to render .Rmd notebooks within Jupyter in an interactive way on Binder without converting to .ipynb. You can link to specific .ipynb within Binder and perhaps it can also work with .Rmd notebooks using nbrmd? I could give it a go sometime this weekend, but I would appreciate your thoughts in general.
Cheers,
A.
Cf. Hari's question on Medium: is it safe to uninstall jupytext?
Obviously uninstalling is safe. But reinstalling after a while may not be. As text representations have priority over the ipynb file, a user who has worked on the ipynb version without jupytext will get the out-of-date version from the text representation when they decide to reinstall.
To avoid such a situation (which can be solved by: closing the notebook without saving, and deleting the text file), we should check file timestamps, and refuse to load inputs from text when text seems to be out of date compared to the ipynb file.
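The timestamp check could be as simple as the sketch below, which compares modification times on disk. The function name `text_is_outdated` is hypothetical, and a real implementation might want a small tolerance, since paired files are written a few moments apart.

```python
import os


def text_is_outdated(text_path, ipynb_path):
    """True when the ipynb file was modified after its text representation."""
    return os.path.getmtime(ipynb_path) > os.path.getmtime(text_path)
```

A contents manager could call such a check before loading inputs from the text file, and refuse to do so (or ask the user) when it returns True.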
Extract of jupyter notebook logs:
Pre-save hook failed on notebook.Rmd
Traceback (most recent call last):
(...)
File "nbrmd/nbrmd.py", line 381, in writef
nbformat.write(nb, fp)
File "nbformat/__init__.py", line 169, in write
fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd7' in position 14653: ordinal not in range(128)
Any notebook; the following commands:
$ jupytext --to py example.ipynb
$ diff example.py <(jupytext --to py -o- example.ipynb)
28a29
>
diff exits non-cleanly as -o- adds an additional newline. I believe end='' or sys.stdout.write()* here might fix it.
Line 42 in 960a962
* Print function won't work without importing from __future__.
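The difference between the two approaches can be seen in a short sketch. The helper name `emit` is hypothetical; the point is only that stream.write() outputs the text unchanged, whereas print(text) appends a newline unless end='' is passed.

```python
import sys


def emit(text, stream=None):
    """Write text as-is; unlike print(text), no trailing newline is appended."""
    stream = sys.stdout if stream is None else stream
    stream.write(text)


emit('x = 1\n')  # writes exactly one newline, the one contained in the text
```

Using write() also avoids the need for the __future__ import mentioned in the footnote, since it behaves the same on Python 2 and Python 3.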
Text only notebook formats have a priority over ipynb notebooks. This may cause users to have their notebooks changed due to a file format update in nbrmd.
When both py and ipynb files are present, nbrmd should refuse to load py files that were not generated with the current version format, and instead, invite the reader to remove the py file (and get it regenerated in the new version, from the ipynb file).
When I create a new blank notebook,
the following default metadata are added to the notebook
"nbrmd_formats": [
".ipynb",
".Rmd",
".py"
],
"nbrmd_sourceonly_format": ".Rmd",
There are two issues with that
c.NotebookApp.contents_manager_class = 'nbrmd.RmdFileContentsManager'
c.ContentsManager.default_nbrmd_formats = ['.ipynb', '.Rmd']
.ipynb file, and the other only .Rmd.
Implementation should allow round trip conversion of cells that are not active in scripts. For instance cells with jupyter magic (another workaround for that case being #29).
Jupyter magic commands (like %%time) currently appear commented out in Markdown. The problem is that the rendered version (say in HTML in a web browser) looks different from the notebook, which is surprising.
I can understand that it is useful to comment these magic commands in the Python (.py) version, but is it really necessary to comment them out in the Markdown output? Can't the Markdown output be less surprising?