Comments (8)
One little gotcha: if we use commas to separate content selection statements (as in CSS), then it would probably be best to avoid commas in the name of a section (e.g. "#Dictionaries, Lists & Stuff"). There's probably a way to deal with this, but I just thought I'd flag it.
from geopyter.
Another thing we need to keep in mind is that headers require a space between the octothorp(s) and the heading text.
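A minimal sketch of that rule in Python, assuming we only care about ATX-style headers of one to six octothorps (leading indentation and closing '#' sequences, which CommonMark also allows, are ignored here). `parse_header` is a hypothetical helper name:

```python
import re
from typing import Optional, Tuple

# One to six octothorps, then at least one space or tab, then the heading text.
HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(\S.*?)\s*$")


def parse_header(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, text) for a header line, or None if it is not one."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return len(m.group(1)), m.group(2)


print(parse_header("# Dictionaries, Lists & Stuff"))  # (1, 'Dictionaries, Lists & Stuff')
print(parse_header("#Dictionaries, Lists & Stuff"))   # None: no space after '#'
```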
Realised that there's no reason we have to stick with the CSS-style selector format: if we use a semicolon instead of a comma as the delimiter between 'statements', then our problem should pretty much disappear and headers could contain commas. For example, to include the sections headed "Let's not do it, she said" and "His reply":
@import {
    h1.Let's not do it, she said; h1.His reply
}
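To make the semicolon idea concrete, here is a hypothetical parser for the body of such a block. It assumes each statement is a tag, a '.', and then the header text. Splitting on ';' keeps "Let's not do it, she said" intact, although a header containing a semicolon would now be the problem case:

```python
from typing import List, Tuple


def parse_include_spec(spec: str) -> List[Tuple[str, str]]:
    """Split an include body into (tag, header text) pairs.

    Statements are separated by ';' rather than ',', so header text may
    contain commas. A header containing ';' would still break this.
    """
    selectors = []
    for statement in spec.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # tolerate a trailing ';'
        # Split on the first '.' only, so the header text may contain dots.
        tag, _, text = statement.partition(".")
        selectors.append((tag.strip(), text.strip()))
    return selectors


print(parse_include_spec("h1.Let's not do it, she said; h1.His reply"))
# [('h1', "Let's not do it, she said"), ('h1', 'His reply')]
```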
I was also thinking about the fact that an octothorp could indicate a comment in a code block, and the parser wouldn't know the difference. I think we can handle this as follows:
jupyter nbconvert --to markdown <FILE>.ipynb
egrep --colour=auto -n '^(#|```)' <FILE>.md
The colour option just makes the matches easier to see; the main thing is that this simultaneously extracts the headers and the code fences (e.g. ```python) within which comments might appear. The parser can then take the output of egrep, discard the comments within Python blocks (and the Python blocks themselves), and be left with a structure for the document that includes the line numbers on which all the headers appear.
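A sketch of that parser step in Python, assuming the input is the `LINE:TEXT` output of the egrep command above (with `--colour=auto`, GNU grep emits no colour codes when its output goes to a pipe rather than a terminal) and that code fences start at column 0, as the pattern requires. `headers_from_grep` is a hypothetical name:

```python
import re
from typing import Iterable, List, Tuple

HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(\S.*?)\s*$")


def headers_from_grep(grep_lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Turn `egrep -n` output into (line number, level, text) header tuples."""
    headers = []
    in_code = False
    for raw in grep_lines:
        lineno, _, text = raw.rstrip("\n").partition(":")
        if text.startswith("```"):
            in_code = not in_code  # an opening or closing fence
            continue
        if in_code:
            continue  # a '#' here is a code comment, not a header
        m = HEADER_RE.match(text)
        if m:
            headers.append((int(lineno), len(m.group(1)), m.group(2)))
    return headers


sample = [
    "1:# Title",
    "5:## Section",
    "9:```python",
    "10:# This is a comment",
    "12:```",
    "15:### Subsection",
]
print(headers_from_grep(sample))
# [(1, 1, 'Title'), (5, 2, 'Section'), (15, 3, 'Subsection')]
```

In practice the lines could come from `subprocess.run([...], capture_output=True, text=True).stdout.splitlines()` wrapped around the same egrep call.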
We could use this to build a kind of index, which would speed up including and recompiling notebooks substantially. 'All' (famous last words) you'd need to do is store some metadata that lets you check whether the index needs to be rebuilt for one or more files, and you'd be away: no change (e.g. the file's last-modified timestamp hasn't changed) means you can go directly to the index without grepping the file, while a change means you rebuild the index, but only for the file that has changed.
Think that'd work?
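One way the rebuild-only-what-changed idea could look, sketched with an in-memory dict keyed by path and using `os.path.getmtime` as the change check. `HeaderIndex` is hypothetical, and persisting the index between runs (e.g. as JSON) is left out:

```python
import os
from typing import Callable, Dict, List, Tuple

Header = Tuple[int, int, str]  # (line number, level, text)


class HeaderIndex:
    """Hypothetical per-file header index, rebuilt only when a file changes."""

    def __init__(self, build: Callable[[str], List[Header]]) -> None:
        self._build = build  # e.g. run egrep on the file and parse its output
        self._cache: Dict[str, Tuple[float, List[Header]]] = {}

    def headers(self, path: str) -> List[Header]:
        mtime = os.path.getmtime(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # no change: go straight to the index
        headers = self._build(path)  # new or changed: rebuild this file only
        self._cache[path] = (mtime, headers)
        return headers
```

Comparing modification times is cheap but can miss edits made within the filesystem's timestamp resolution; hashing the file contents would be more robust, at the cost of reading every file.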
I think we can rely on a couple of things to handle this: 1) the cell type is known in the notebook (see cell [26] here); 2) for the header levels (H1, H2, ...), the octothorp(s) should be the first non-whitespace characters in the Markdown cell.
I've made some more progress on the parser exploration: it can now extract the TOC structure along the lines you suggest above, using an assumed structure for the @include. Once we settle on the latter, I can update the parser notebook and push it up.
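A sketch of the cell-type approach: only Markdown cells are scanned, so a '#' in a code cell can never be mistaken for a header. It assumes a notebook object shaped like the `NotebookNode` that `nbformat.read(path, as_version=4)` returns (a `cells` list whose items have `cell_type` and a string `source`); `notebook_headers` is a hypothetical name. It checks every line of a cell rather than only the first, and does not yet skip fenced code inside Markdown cells:

```python
import re
from typing import Any, List, Tuple

HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(\S.*?)\s*$")


def notebook_headers(nb: Any) -> List[Tuple[int, int, str]]:
    """Return (cell index, level, text) for header lines in Markdown cells."""
    headers = []
    for i, cell in enumerate(nb.cells):
        if cell.cell_type != "markdown":
            continue  # '#' in a code cell is a Python comment, never a header
        for line in cell.source.splitlines():
            m = HEADER_RE.match(line)
            if m:
                headers.append((i, len(m.group(1)), m.group(2)))
    return headers
```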
For the structure of the notebook, one question that comes to mind is whether we use
# Title
# Section
## Subsection
### Subsubsection
# Section
## Subsection
vs.
# Title
## Section
### Subsection
#### Subsubsection
## Section
### Subsection
or other?
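Whichever convention we pick, the parser has to turn the flat list of headers into a tree. A hypothetical `build_toc` sketch using a stack, which works unchanged for both schemes (levels are assumed to be 1 to 6):

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def build_toc(headers: List[Tuple[int, str]]) -> List[Node]:
    """Nest flat (level, text) headers into a tree of {'level', 'text', 'children'}."""
    root: Node = {"level": 0, "text": None, "children": []}
    stack = [root]
    for level, text in headers:
        node: Node = {"level": level, "text": text, "children": []}
        # Pop back to the nearest shallower header, which becomes the parent.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

Running it on the two examples shows the practical difference: with the first scheme the title is just a sibling of the sections, whereas with the second everything hangs off the single "# Title" node, so selecting that one node pulls in the whole notebook.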
Responding to both comments:
- Does the cell-type idea account for the fact that you can have a Markdown cell containing non-executing code (e.g. an embedded ```python block) which itself contains header-like lines (e.g. "# This is a comment")?
- I find the second one more elegant, and I guess it makes including entire notebooks easier because you're essentially saying "grab everything" with "# Title"; however, formatting-wise I've never liked how small the fourth-level headers get, or the limits that places on the formatting of the notebooks.
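A sketch of how the fenced-code case might be handled inside a single Markdown cell: track whether we are inside a ``` or ~~~ fence and ignore header-like lines until it closes. This is simplified relative to CommonMark (fence length and info-string rules are ignored), and `markdown_headers` is a hypothetical name:

```python
import re
from typing import List, Optional, Tuple

HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(\S.*?)\s*$")
FENCE_RE = re.compile(r"^ {0,3}(```|~~~)")


def markdown_headers(source: str) -> List[Tuple[int, str]]:
    """Return (level, text) headers in a Markdown cell, skipping fenced code."""
    headers = []
    fence: Optional[str] = None  # the fence marker we are currently inside
    for line in source.splitlines():
        f = FENCE_RE.match(line)
        if f:
            if fence is None:
                fence = f.group(1)  # opening fence
            elif f.group(1) == fence:
                fence = None  # matching closing fence
            continue
        if fence is not None:
            continue  # e.g. '# This is a comment' inside ```python
        h = HEADER_RE.match(line)
        if h:
            headers.append((len(h.group(1)), h.group(2)))
    return headers
```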
In researching this a bit further with an eye towards extracting metadata from the notebook, such as author and title information, there are a few new pieces to consider.
First, in the JSON representation of the notebook itself there is a key in the metadata for the author, and nbformat makes it easy to access. The issue is giving the author of the notebook a way to add that metadata.
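A sketch of reading and setting that metadata. As far as I know, nbformat's v4 schema defines an optional notebook-level `authors` entry holding a list of objects with a `name` key; that layout is an assumption here, and both helper names are hypothetical. `nb` would come from `nbformat.read(path, as_version=4)` and could be saved back with `nbformat.write(nb, path)`:

```python
from typing import Any, List


def notebook_authors(nb: Any) -> List[str]:
    """Return author names from notebook-level metadata (empty list if absent)."""
    # Assumed layout: {"authors": [{"name": "..."}, ...]}
    return [a["name"] for a in nb.metadata.get("authors", []) if "name" in a]


def set_notebook_authors(nb: Any, names: List[str]) -> None:
    """Overwrite the notebook-level author list with the given names."""
    nb.metadata["authors"] = [{"name": n} for n in names]
```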
It seems that there is no set standard yet for extending the notebook metadata within the nbformat schema (see discussions here and here).
It also seems that a common approach is to impose some structure for such information on the content of the notebook itself. As long as we have a consistent structure, we can extract the metadata for subsequent processing (e.g. building indexes, recombining notebooks, etc.).
So one suggested approach would be to require two fields: title and author.
# Title
Author: Jon Reades and Serge Rey
# Section 1
content...
# Section 2
content...
We could add optional fields for organization, email, etc.
For any of these we should probably adopt an existing convention (e.g. something like LaTeX's handling of multiple authors: http://tex.stackexchange.com/questions/4805/whats-the-correct-use-of-author-when-multiple-authors).
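A sketch of extracting the two required fields from the content convention above, assuming the title is the first '# ' header and the authors sit on the first line starting with 'Author:'. Splitting names on ' and ' mirrors BibTeX's author-field convention; `front_matter` is a hypothetical name:

```python
import re
from typing import Dict, List, Optional, Union

TITLE_RE = re.compile(r"^#[ \t]+(\S.*?)\s*$")
AUTHOR_RE = re.compile(r"^Author:[ \t]*(\S.*?)\s*$")


def front_matter(markdown: str) -> Dict[str, Union[str, List[str]]]:
    """Extract the required title and author fields from notebook text."""
    title: Optional[str] = None
    authors: List[str] = []
    for line in markdown.splitlines():
        if title is None:
            m = TITLE_RE.match(line)
            if m:
                title = m.group(1)  # first level-1 header is the title
                continue
        m = AUTHOR_RE.match(line)
        if m:
            # BibTeX-style: names separated by ' and '
            authors = [a.strip() for a in re.split(r"\s+and\s+", m.group(1))]
            break
    if title is None or not authors:
        raise ValueError("expected a '# Title' header and an 'Author:' line")
    return {"title": title, "authors": authors}
```

Optional fields such as organization or email could be added as further `Key:` patterns in the same loop.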