A much faster version of nbstripout with a slightly different set of features, plus a local git configuration that auto-trusts notebooks
This is a rewrite of nbstripout. It's much faster because it doesn't load the very heavy nbformat library and operates on the JSON file directly.
It's used in all fastai projects, and this repo was created to make it easy to re-use it in other projects, so all files are in one place.
This tool implements only a subset of nbstripout's features (most of them) and makes no attempt to be identical to nbstripout or to stay in sync with it. It implements the parts that fastai needed.
This repository's purpose is not to maintain a different implementation of nbstripout, but to make it easy to integrate the existing functionality into other fastai projects, since it involves quite a few files.
Therefore, please don't submit PRs or Issues unless you have found a bug in the current implementation.
If you'd like to create and maintain a faster version of nbstripout with all the features it provides, please feel free to fork this implementation and build upon it.
- The strip out tool itself is: tools/fastai-nbstripout
- The helper tool tools/trust-origin-git-config autogenerates .gitconfig, which in turn configures the action of the git filters, and tells git to trust this local config file (git config --local include.path ../.gitconfig).
Here are the usage instructions:
tools/trust-origin-git-config -h
usage: trust-origin-git-config [-h] [-e] [-d] [-t]
optional arguments:
  -h, --help     show this help message and exit
  -e, --enable   Trust repo-wide .gitconfig (default action)
  -d, --disable  Distrust repo-wide .gitconfig
  -t, --test     Validate repo-wide .gitconfig config
Therefore, to enable the setup, you run:
tools/trust-origin-git-config -e
and to disable:
tools/trust-origin-git-config -d
You will want to add .gitconfig to .gitignore, since it's autogenerated (the .gitignore included in this repo already does that).
Now, all that remains is to configure which directories with jupyter notebooks are to be stripped out. This is done via a .gitattributes file placed in each desired directory.
Currently, fastai-nbstripout supports two stripout configurations:
- This .gitattributes will strip out all the unnecessary bits and keep the outputs:
*.ipynb filter=fastai-nbstripout-code
*.ipynb diff=ipynb-code
You can see this setup and its effects under the with-outputs directory.
- This .gitattributes will strip out all the unnecessary bits, including the outputs:
*.ipynb filter=fastai-nbstripout-docs
*.ipynb diff=ipynb-docs
You can see this setup and its effects under the without-outputs directory.
These settings apply recursively to all sub-dirs.
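To make the two configurations concrete, here is a minimal sketch of stripping a notebook by working on its JSON directly, using only Python's standard json module. It is not the code of tools/fastai-nbstripout: the function name strip_notebook, the keep_outputs flag, and the exact set of fields being reset are assumptions made for illustration.

```python
import json


def strip_notebook(nb_text, keep_outputs):
    """Strip volatile fields from a notebook given as raw JSON text.

    Hypothetical helper: the real tool may reset more (or different) fields.
    """
    nb = json.loads(nb_text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # The execution counter changes on every run, so always reset it.
        cell["execution_count"] = None
        if keep_outputs:
            # Keep the outputs, but reset the counters stored inside them.
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
        else:
            # Drop the outputs entirely.
            cell["outputs"] = []
    # One-space indentation keeps the result close to what Jupyter writes.
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
```

Calling strip_notebook(text, keep_outputs=True) corresponds to the "keep the outputs" configuration, and keep_outputs=False to the one that removes them. Because only json.loads and json.dumps are involved, nothing from nbformat has to be imported.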
You will need to git add all these files to your git repo before activating the setup.
Normally, notebooks modified outside of the jupyter environment lose their "trusted" state, so you won't be able to run them automatically and will need to manually set them to be trusted. The following setup does it automatically for you on git pull.
- Configure which directories you want to be "auto-trusted" by editing tools/trust-doc-nbs. E.g. in this sample project you will find:
trust_nbs('without-outputs')
trust_nbs('with-outputs')
tools/trust-doc-nbs is a hook that runs automatically on git pull, and tools/trust-doc-nb-install-hooks is the tool that configures that hook. So you need to run it once after git clone.
Who's going to remember all these tools to be run after git clone? That's why there is a wrapper, tools/run-after-git-clone, that runs all the other tools that configure git to do the right thing. And it's easy to remember. Of course, you don't have to use it and you can run each tool separately.
Every user who needs to have these configured has to run the configuration scripts manually once upon the first clone of the repository. Which sucks, because some users will forget and commit unstripped notebooks.
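The auto-trust step described above can be pictured with the following rough sketch. It only borrows the name trust_nbs from tools/trust-doc-nbs; the body is an assumption, not the actual hook. It assumes that notebooks are found recursively, that Jupyter checkpoint copies should be skipped, and that trusting is delegated to the jupyter trust command. The helper find_notebooks is hypothetical.

```python
import subprocess
from pathlib import Path


def find_notebooks(directory):
    """Return all notebooks under `directory`, skipping Jupyter checkpoint copies."""
    return sorted(
        path
        for path in Path(directory).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def trust_nbs(directory):
    """Mark every notebook under `directory` as trusted via `jupyter trust`.

    Assumed behaviour for illustration; requires the `jupyter` command on PATH.
    """
    notebooks = find_notebooks(directory)
    if notebooks:
        # `jupyter trust` accepts one or more notebook paths.
        subprocess.run(["jupyter", "trust", *map(str, notebooks)], check=True)
    return notebooks
```

With such a function in place, the two lines trust_nbs('without-outputs') and trust_nbs('with-outputs') would re-sign every notebook in those directories each time the hook runs.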
Unfortunately, due to the way git security is set up, there is no other way to go about it. The only way to catch a committed unstripped notebook is to have server-side git hooks, but GitHub doesn't allow those.
Change the instructions for your projects to include the local git setup. For example, for this project it'd become:
git clone https://github.com/fastai/fastai-nbstripout
cd fastai-nbstripout
tools/run-after-git-clone
The way I personally deal with forgetting to run the local git setup is to always run git diff before git commit. It takes a few seconds and I catch my forgetfulness this way. If the output contains an execution_count that is not null:
 {
  "cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
that means I don't have it configured.
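The same check can be automated. The sketch below is an illustration only and not part of this repo; the function name unstripped_cells is hypothetical. It reports which code cells still carry a non-null execution_count. Note that it should be fed the committed version of a notebook (for example the text printed by git show HEAD:path/to/notebook.ipynb), because the copy in the working directory keeps its counters even when the filter is configured correctly.

```python
import json


def unstripped_cells(nb_text):
    """Return the indices of code cells whose execution_count is not null."""
    nb = json.loads(nb_text)
    return [
        index
        for index, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and cell.get("execution_count") is not None
    ]
```

An empty list means the notebook went through the strip out filter; any index in the list points at a cell that was committed unstripped.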