jbn / nbmerge

A tool to merge / concatenate Jupyter (IPython) notebooks

License: MIT License

Python 85.03% Shell 14.97%

jupyter-notebook jupyter nbconvert ipython ipython-notebook

nbmerge's Introduction

`nbmerge`: merge / concatenate Jupyter notebooks

Installation

pip install nbmerge

Usage

For the usage as originally specified by @fperez's gist,

nbmerge file_1.ipynb file_2.ipynb file_3.ipynb > merged.ipynb

Alternatively, nbmerge can recursively collect all notebooks in the current directory and below. After collection, it sorts them lexicographically. You can use a regular expression as a file name predicate. All .ipynb_checkpoints are automatically ignored. You can also use the -i option to ignore any notebook prefixed with an underscore (think pseudo-private in Python).

For example, the following command collects all notebooks in your project that have the word intro in the file name and saves the result to a merged file named _merged.ipynb,

nbmerge --recursive -i -p ".*intro.*" -o _merged.ipynb
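
The collection step described above can be sketched roughly as follows. This is not nbmerge's actual code: `collect_notebooks` and its parameters are hypothetical names, and it assumes the regular expression is tested against the file name only (not the full path) with `re.search`.

```python
import os
import re


def collect_notebooks(base_dir, pattern=None, ignore_underscored=False):
    """Return notebook paths under base_dir, sorted lexicographically."""
    predicate = re.compile(pattern) if pattern else None
    found = []
    for root, dirs, files in os.walk(base_dir):
        # Prune checkpoint directories so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d != ".ipynb_checkpoints"]
        for name in files:
            if not name.endswith(".ipynb"):
                continue
            # Mirrors the -i option: skip pseudo-private notebooks.
            if ignore_underscored and name.startswith("_"):
                continue
            # Assumption: the pattern applies to the file name only.
            if predicate and not predicate.search(name):
                continue
            found.append(os.path.join(root, name))
    return sorted(found)
```

Calling `collect_notebooks(".", pattern=".*intro.*", ignore_underscored=True)` would approximate the file selection of the command above, under those assumptions.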

Finally, you can also instruct the script to demarcate the boundary between each original file with the -b / -boundary [BOUNDARY] flag. The src_nb value in the metadata for the first cell in each original notebook will then contain the path of the original notebook, relative to the cwd at the point of script execution.
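
As a rough illustration of how a downstream tool might use that boundary metadata, the sketch below regroups the cells of a merged notebook by original file. It works on plain dictionaries shaped like notebook JSON; `split_by_boundary` is a hypothetical helper, and `src_nb` is assumed to be the boundary key that was passed to the flag.

```python
def split_by_boundary(cells, boundary_key="src_nb"):
    """Group merged cells into (source_path, cells) runs.

    A new group starts at every cell whose metadata carries boundary_key.
    Cells that appear before the first boundary land in a group whose
    source is None.
    """
    groups = []
    for cell in cells:
        source = cell.get("metadata", {}).get(boundary_key)
        if source is not None or not groups:
            groups.append((source, []))
        groups[-1][1].append(cell)
    return groups
```

A preprocessor could then, for example, emit a section title each time a new group begins.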

More details

Read the docs: here.

nbmerge's Issues

Boundary key source path annotation fails on Windows when using notebooks from different drive

    annotate_source_path(nb, base_dir, path, boundary_key)
  File "C:\Users\pete\AppData\Local\Continuum\anaconda3\lib\site-packages\nbmerge\__init__.py", line 37, in annotate_source_path
    cells[0].metadata[boundary_key] = os.path.relpath(nb_path, base_dir)
  File "C:\Users\pete\AppData\Local\Continuum\anaconda3\lib\ntpath.py", line 584, in relpath
    path_drive, start_drive))
ValueError: path is on mount 'c:', start on mount 'D:'
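
The traceback shows `os.path.relpath` raising `ValueError` because no relative path exists between two Windows drives. One possible workaround, sketched here, is to fall back to the absolute path in that case. `safe_relpath` is a hypothetical helper, not the fix adopted in nbmerge, and the `pathmod` parameter exists only so the Windows behavior can be exercised on any platform via `ntpath`.

```python
import ntpath
import os


def safe_relpath(nb_path, base_dir, pathmod=os.path):
    """Return nb_path relative to base_dir, or its absolute path if
    no relative path exists (e.g. different Windows drives)."""
    try:
        return pathmod.relpath(nb_path, base_dir)
    except ValueError:
        # Assumption: an absolute path is an acceptable boundary value.
        return pathmod.abspath(nb_path)


# Reproduce the cross-drive case with the Windows path rules.
cross_drive = safe_relpath(r"c:\nb\a.ipynb", r"D:\proj", pathmod=ntpath)
same_drive = safe_relpath(r"D:\proj\sub\a.ipynb", r"D:\proj", pathmod=ntpath)
```

The trade-off is that boundary values are then no longer uniformly relative, so consumers of the metadata would need to handle both forms.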

Add ignore for hidden directories

Add merge level metadata to indicate depth relative to base directory

Move all imports to the top

Just wondering if it would be nice to add a feature where it moves all imports to the top and deduplicates them?
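
A minimal sketch of what such a feature could look like, working on cells as plain dictionaries whose `source` is a single string. `hoist_imports` is a hypothetical name, and the line-based matching is a deliberate simplification: it only moves unindented, single-line import statements and leaves parenthesized or backslash-continued imports where they are.

```python
import re

# Unindented "import x" / "from x import y" lines only.
IMPORT_RE = re.compile(r"^(?:import|from)\s+\S")


def hoist_imports(cells):
    """Move simple top-level imports into one leading code cell,
    keeping the first occurrence of each distinct statement."""
    seen = []
    remaining = []
    for cell in cells:
        if cell.get("cell_type") != "code":
            remaining.append(cell)
            continue
        kept = []
        for line in cell["source"].splitlines():
            simple = "(" not in line and not line.rstrip().endswith("\\")
            if IMPORT_RE.match(line) and simple:
                stmt = line.rstrip()
                if stmt not in seen:
                    seen.append(stmt)
            else:
                kept.append(line)
        remaining.append(dict(cell, source="\n".join(kept)))
    if not seen:
        return remaining
    import_cell = {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": "\n".join(seen),
    }
    return [import_cell] + remaining
```

Hoisting changes execution order, so imports with side effects or imports placed after configuration code may behave differently once moved.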

Emit usage if no files were specified and the recursive flag was unset
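
One way this could be done with argparse is sketched below. The parser is a stand-in, not nbmerge's real one: the program name is made up, and only the `files` positional and `--recursive` flag mentioned in this document are modeled.

```python
import argparse


def parse_or_usage(argv):
    """Parse argv; print usage and exit if there is nothing to merge."""
    parser = argparse.ArgumentParser(prog="nbmerge-sketch")
    parser.add_argument("files", nargs="*")
    parser.add_argument("--recursive", action="store_true")
    args = parser.parse_args(argv)
    if not args.files and not args.recursive:
        # parser.error prints the usage line to stderr and exits with 2.
        parser.error("no notebooks given and --recursive not set")
    return args
```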

FIX argparse bug

When I used a pattern I don't often use,

nbmerge f1.ipynb f2.ipynb > output.ipynb

the program failed. For some reason, the full path to the nbmerge script is being included in the array argparse aggregates for the files parameter. I quickly fixed it with a kludge (see: 03dcbec). But, I need to figure out why argparse is behaving this way, then fix it correctly.
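
One common way to get this symptom, offered here only as a possibility and not as a diagnosis of nbmerge itself, is passing the whole of `sys.argv` to `parse_args`. The first element of `sys.argv` is the script path, so a positional `files` argument swallows it. The parser below is a stand-in with a made-up program name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="nbmerge-sketch")
    parser.add_argument("files", nargs="*")
    return parser


# What sys.argv might look like for the failing invocation.
argv = ["/usr/local/bin/nbmerge", "f1.ipynb", "f2.ipynb"]

# Passing the full list treats the script path as the first file.
with_script = build_parser().parse_args(argv).files

# Dropping argv[0] (or passing nothing, which defaults to
# sys.argv[1:]) yields only the notebooks.
without_script = build_parser().parse_args(argv[1:]).files
```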

Add notebook demarcation to cell metadata

Generally, you use multiple notebooks because there are semantic differences between them. The merger exists mostly for serialization and -- for me -- ease of LaTeX document preparation (bibliographies!). Dropping the demarcation completely is an information loss. Preprocessors and templates should be able to use the original source information (e.g. for header rewriting, implicit titles, etc).

Rename filter-re command line argument to accept-re

nbformat.reader.NotJSONError

Tried to merge a couple jupyterlab notebooks, but got an error:

nbmerge a.ipynb b.ipynb > a.ipynb
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/nbformat/reader.py", line 14, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/nbmerge", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/nbmerge/__init__.py", line 194, in main
    plan['boundary_key'])
  File "/usr/local/lib/python3.7/site-packages/nbmerge/__init__.py", line 72, in merge_notebooks
    nb = read_notebook(fp, as_version=4)
  File "/usr/local/lib/python3.7/site-packages/nbformat/__init__.py", line 143, in read
    return reads(buf, as_version, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/nbformat/__init__.py", line 73, in reads
    nb = reader.reads(s, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/nbformat/reader.py", line 58, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/nbformat/reader.py", line 17, in parse_json
    raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...") from e
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: ''...

I can't vouch for these notebooks being valid JSON, but they are valid jupyter notebooks.
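
The empty string in the final error message is consistent with the shell truncating a.ipynb for the `>` redirection before nbmerge opens it for reading, since the command uses a.ipynb as both an input and the output. Writing to a different file name avoids that. If overwriting an input is really wanted, a tool could write to a temporary file first and swap it in afterwards, as in this sketch. `write_atomically` is a hypothetical helper, not part of nbmerge.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text to a temp file next to path, then replace path.

    The original file stays intact until the new content is complete.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fp:
            fp.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```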

Add recursive updating of metadata dictionaries
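
A minimal sketch of recursive dictionary updating follows. `deep_update` is a hypothetical name, and the policy that values from the later notebook win on conflict is an assumption, not something stated in this issue.

```python
def deep_update(base, extra):
    """Return a new dict with extra merged into base, recursing into
    nested dicts instead of replacing them wholesale."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            # Assumption: the later value wins for non-dict conflicts.
            merged[key] = value
    return merged
```

With a plain `dict.update`, a nested dictionary in the later metadata would replace the earlier one entirely, dropping any keys only the earlier one had.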

Add recursive descent merging with private notebook filtering

That is, something like,

nbmerge --recursive --exclude-private > merged.ipynb

BookBook also uses <number>-<name>.ipynb semantics for sorting. Specifically, it's a glob over -.ipynb (latex.py:143), lexicographically sorted. My convention also uses lexicographic sorting. However, instead of that glob, it aggregates all files ending with .ipynb that do not begin with _. This lets me name files for inclusion as ###_Title_of_Notebook.ipynb. I think this is convenient because:

My eyes don't see the underscores;
You can produce formal titles by name.replace('_', ' '), stripping (optional) number prefix;
There are no spaces to worry about when performing shell actions.
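
The title rule in the second point could look like the sketch below. `title_from_filename` is a hypothetical helper, and it assumes the optional number prefix is a run of digits followed by a single underscore.

```python
import os
import re


def title_from_filename(path):
    """Derive a formal title from a ###_Title_of_Notebook.ipynb name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Strip the optional numeric prefix used only for ordering.
    stem = re.sub(r"^\d+_", "", stem)
    return stem.replace("_", " ")
```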

Fix code to allow for formats other than version 4

Output is not saved as UTF-8: UnicodeDecodeError('utf-8',

If a notebook contains characters beyond ASCII, the merged output is not saved as UTF-8.
For example, with Chinese text in *.ipynb, the output is actually saved as GBK, which then causes UnicodeDecodeError('utf-8',
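
A plausible explanation is that the output was written without an explicit encoding, in which case Python falls back to a platform-dependent default (GBK on a Chinese-locale Windows system). The sketch below fixes the encoding explicitly when writing and reading. It uses the standard `json` module on a plain dictionary rather than nbformat, and both function names are hypothetical.

```python
import json


def write_notebook_utf8(nb_dict, path):
    """Serialize a notebook-shaped dict as UTF-8, regardless of locale."""
    with open(path, "w", encoding="utf-8", newline="\n") as fp:
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        json.dump(nb_dict, fp, ensure_ascii=False, indent=1)
        fp.write("\n")


def read_notebook_utf8(path):
    """Read the file back with the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)
```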