
md2py's Introduction

Markdown2Python (md2py)

md2py converts markdown into a Python parse tree. This allows you to navigate a markdown file as a document structure.

See tex2py for a LaTeX parse tree.

Usage

Markdown2Python offers a single function, md2py, which generates a Python object from markdown text. This object is a navigable "Tree of Contents" abstraction of the markdown file.

Take, for example, the following markdown file.

chikin.md

# Chikin Tales

## Chapter 1 : Chikin Fly

Chickens don't fly. They do only the following:

- waddle
- plop

### Waddling

## Chapter 2 : Chikin Scream

### Plopping

Plopping involves three steps:

1. squawk
2. plop
3. repeat, unless ordered to squat

### I Scream

Akin to a navigation bar, the TreeOfContents object allows you to expand a markdown file one level at a time. Running md2py on the above markdown file generates a tree with the structure below.

          Chikin Tales
          /           \
    Chapter 1       Chapter 2
      /               /     \
  Waddling      Plopping    I Scream
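
The nesting rule behind this tree can be sketched with the standard library alone. This is a minimal illustration, not md2py's implementation (which is built on BeautifulSoup, per the notes in this README). The name `build_outline` and the dict layout are hypothetical, chosen only for this example, and the sketch does not account for fenced code blocks.

```python
import re

def build_outline(markdown):
    """Nest ATX headings ('#', '##', ...) by level into a tree of dicts."""
    root = {"title": None, "level": 0, "children": []}
    stack = [root]  # path from the root to the most recent heading
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if not match:
            continue  # body text, list items, blank lines
        level = len(match.group(1))
        node = {"title": match.group(2), "level": level, "children": []}
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root

chikin = """# Chikin Tales

## Chapter 1 : Chikin Fly

### Waddling

## Chapter 2 : Chikin Scream

### Plopping

### I Scream
"""

outline = build_outline(chikin)
title = outline["children"][0]
chapters = [child["title"] for child in title["children"]]
print(title["title"], chapters)
```

Each heading attaches to the nearest preceding heading of a shallower level, which is why Waddling lands under Chapter 1 while Plopping and I Scream land under Chapter 2.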

At the global level, we can access the title.

>>> toc = md2py(markdown)
>>> toc.h1
Chikin Tales
>>> str(toc.h1)
'Chikin Tales'

Notice that there are no h2s at this level, because both h2s are nested beneath the h1.

>>> list(toc.h2s)
[]

The main h1 has two h2s beneath it. We can access both.

>>> list(toc.h1.h2s)
[Chapter 1 : Chikin Fly, Chapter 2 : Chikin Scream]
>>> toc.h1.h2
Chapter 1 : Chikin Fly

In total, there are 3 h3s in this document. However, only one h3 is nested within 'Chapter 1 : Chikin Fly' (accessible via toc.h1.h2). As a result, toc.h1.h2.h3s yields only one h3.

>>> list(toc.h1.h2.h3s)
[Waddling]

The TreeOfContents class also defines a few more conveniences, including support for indexing. To access the ith child of an <element>, use <element>[i] instead of <element>.branches[i].

See below for example usage.

>>> toc.h1.branches[0] == toc.h1[0] == toc.h1.h2
True
>>> list(toc.h1.h2s)[1] == toc.h1[1]
True
>>> toc.h1[1]
Chapter 2 : Chikin Scream
>>> list(toc.h1[1].h3s)
[Plopping, I Scream]
>>> list(map(str, toc.h1[1].h3s))
['Plopping', 'I Scream']
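
For readers curious how this kind of navigation can be wired up, here is a rough sketch using Python's `__getitem__` and `__getattr__` hooks. The `Node` class is a hypothetical stand-in written for this example. It is not md2py's TreeOfContents class, and it returns plain lists where md2py's `h2s` appears to return an iterable that is wrapped in `list()` above.

```python
import re

class Node:
    """Hypothetical tree node; illustrates the access pattern only."""

    def __init__(self, title, level, branches=()):
        self.title = title
        self.level = level              # 1 for h1, 2 for h2, ...
        self.branches = list(branches)  # direct children only

    def __getitem__(self, i):
        # node[i] is shorthand for node.branches[i]
        return self.branches[i]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # 'h2s' -> all direct children at level 2; 'h2' -> the first of them.
        match = re.fullmatch(r"h([1-6])(s?)", name)
        if match is None:
            raise AttributeError(name)
        level = int(match.group(1))
        found = [b for b in self.branches if b.level == level]
        if match.group(2):
            return found
        return found[0] if found else None

    def __repr__(self):
        return self.title

chapter1 = Node("Chapter 1 : Chikin Fly", 2, [Node("Waddling", 3)])
chapter2 = Node("Chapter 2 : Chikin Scream", 2,
                [Node("Plopping", 3), Node("I Scream", 3)])
h1 = Node("Chikin Tales", 1, [chapter1, chapter2])

same_node = h1.branches[0] is h1[0] is h1.h2
second_chapter_h3s = [str(n) for n in h1[1].h3s]
print(same_node, second_chapter_h3s)
```

Because the lookup only inspects direct children, `h1.h3s` is empty in this sketch, mirroring how `toc.h2s` is empty at the top level in the session above.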

Installation

Install via pip.

pip install md2py

Additional Notes

  • Behind the scenes, md2py uses BeautifulSoup. All md2py objects have a source attribute containing a BeautifulSoup object.

md2py's People

Contributors

alvinwan

md2py's Issues

How to access the markdown file?

Sorry, I am new to Python. I was trying out the function with the example given, but I always end up with a "'module' object is not callable" error. Where did I go wrong?

In [29]: path='/Users/satibodhi/Desktop/chikin.md'

In [30]: md_file=open(path,'r')

In [31]: md2py(md_file)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-5e87db4ab0f6> in <module>()
----> 1 md2py(md_file)

TypeError: 'module' object is not callable

In [32]: md2py(markdown)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-ce63b0909e59> in <module>()
----> 1 md2py(markdown)

TypeError: 'module' object is not callable
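
A likely cause, assuming the session began with `import md2py`: the name md2py then refers to the module, not to the function of the same name inside it. The sketch below reproduces the error with a hypothetical stand-in module built from the standard library, so it runs without md2py installed. `fake_module` and its lambda are illustrative only.

```python
import types

# Hypothetical stand-in: a module named "md2py" holding a function of the
# same name, mimicking the layout implied by `from md2py import md2py`.
fake_module = types.ModuleType("md2py")
fake_module.md2py = lambda text: text.splitlines()[0].lstrip("# ")

# Calling the module itself reproduces the error from the traceback.
try:
    fake_module("# Chikin Tales")
    error = None
except TypeError as exc:
    error = str(exc)

# Calling the function inside the module works.
first_heading = fake_module.md2py("# Chikin Tales")
print(error)
print(first_heading)
```

With the real library, the other reports on this page use `from md2py import md2py` and pass the file's text rather than the file object, for example the result of `open(path).read()`.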

links not supported / cause paragraphs to be skipped

from md2py import md2py
m = md2py('''# my h1
first paragraph
## first h2
yay
''')
m.h1.descendants
[<p>first paragraph</p>, <h2>first h2</h2>, <p>yay</p>]

Note that the first paragraph, <p>first paragraph</p>, is present, as expected.

Adding a link causes it to disappear:

from md2py import md2py
m = md2py('''# my h1
first paragraph [w link](https://google.com)
## first h2
yay
''')
m.h1.descendants
[<h2>first h2</h2>, <p>yay</p>]

I can't find any trace of it in the parse tree in this case.

Likely relates to #3.

interprets comments from within code blocks as headers

Expected behavior

  • Code blocks are seen entirely as one element.

  • Lines from within code blocks are not parsed as markdown.

Actual Behavior

Lines from within code blocks are handled as if they were normal markdown. Python comments, for example, show up as h1 elements.

Steps to reproduce

markdown file, md:

# headerA
bodyA

# headerB
body B with code block
```
# code comment
code
```

parser file:

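from md2py import md2py

# Read the markdown file named "md" and print every top-level header.
with open("md") as f:
    md = f.read()
toc = md2py(md)
print("\n".join([str(header) for header in toc.h1s]))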

Output:

headerA
headerB
code comment

Expected output:

headerA
headerB
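
The expected behavior can be pinned down with a small stand-alone check. This is a sketch, not a patch to md2py: `headings_outside_fences` is a hypothetical helper that scans raw markdown, and it assumes backtick fences only (tilde fences and indented code blocks are not handled).

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example fence-safe

def headings_outside_fences(markdown, level=1):
    """Return heading texts at `level`, skipping lines inside fenced code blocks."""
    headings = []
    in_fence = False
    prefix = "#" * level + " "
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggles on both opening and closing fences
            continue
        if not in_fence and line.startswith(prefix):
            headings.append(line[len(prefix):].strip())
    return headings

sample = "\n".join([
    "# headerA",
    "bodyA",
    "",
    "# headerB",
    "body B with code block",
    FENCE,
    "# code comment",
    "code",
    FENCE,
])

print(headings_outside_fences(sample))
```

Comparing this list against the headers md2py reports is one way to detect when a code comment has been misread as a header.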
