markdown-testsuite's Introduction

Colophon - October 1st, 2014

This project was initiated to provide a test suite for Markdown markup, and eventually create a specification from these test results. A part of the community has started a new endeavor, CommonMark, which seems to be gaining traction. We are therefore closing this project and encourage you to contribute to CommonMark.

The most interesting part of this project would not have been possible without the dedication of Ciro Santilli @************. So a big round of applause and thank you to him.

The rest is kept around for archival and reference purposes.

Markdown Test Suite

Inspired by questions on the W3C Markdown Community Group.

Pull requests are welcome. See the CONTRIBUTING guidelines.

Design goals

  • Comprehensive.
  • Small modularized tests.
  • Easy to run tests using any programming language. In particular, data representations must have an implementation on all major languages.
  • Develop a consensus-based markdown specification at markdown-spec.html. Visualize it here.

Test Scripts

Markdown Test Suite already includes tests for many important markdown engines.

To see what the scripts do, run:

./cat-all.py -h
./run-tests.py -h

To configure the scripts, do:

cp config_local.py.example config_local.py

and edit config_local.py. It is already gitignored.

A Vagrantfile is provided with a provision script that installs all installable engines.

Sample output from run-tests.py:

blackfriday   |......FFF..............FF...F....................................F....................................|   0.93s  102    7   6%
gfm           |FF.....F.....F...FFFF..F....F........................FFFF...FF...............FF...F.......F...........| 262.88s  102   20  19%
hoedown       |..............................F.................................................F.....................|   0.36s  102    2   1%
kramdown      |......FFF.....FF.......FF.......FF.FFFFFFFFFFFFF.......................F..............................|  30.69s  102   23  22%
lunamark      |FFFFFF.F.FFFFFFFFFFFF.F.....FFFF..FF............FFFFF..FFFFFFFFFF..............F...FFFFFFFFFFF........|   1.58s  103   53  51%
markdown_pl   |.....................FFFF...............................F...............F.............................|   2.56s  102    6   5%
markdown2     |......................................................................................................|   5.39s  103    0   0%
marked        |..............F.................FFFFFFFFFFFFFFFF......................F.F.............................|   6.22s  102   19  18%
maruku        |......FFF......F.......F.FFF...............................................FF.........................|  37.02s  102   10   9%
md2html       |.........................FFF...F............................................FF........................|   7.42s  102    6   5%
multimarkdown |......FFF....FF...F............FFF.FFFFFFFFFFFFF.....FFFF.........F...................................|   0.58s  102   27  26%
pandoc        |FF...........F.F.FFFF..FFFFF....FF.FFFFFFFFFFFFF.....FFFF.....FF............FFF.FFF..................F|   1.11s  102   41  40%
peg_markdown  |.................................................................F....................................|   0.46s  102    1   0%
rdiscount     |.......F.........................................................F....................................|  25.19s  102    3   2%
redcarpet     |......................................................................................................|  21.49s  102    0   0%
showdown      |.......................................................................F..............................|   6.85s  102    1   0%

Extensions:

blackfriday   |F..|  0.04s    3    1  33%
gfm           |F.|   2.37s    2    1  50%
hoedown       |.|    0.00s    1    0   0%
lunamark      |F.F|  0.05s    3    2  66%
kramdown      |..|   0.62s    2    0   0%
markdown_pl   ||     0.00s    0    0   0%
markdown2     |F.|   0.14s    2    1  50%
marked        |F.|   0.13s    2    1  50%
maruku        |F..|  1.06s    3    1  33%
md2html       |F.|   0.14s    2    0   0%
multimarkdown |F.|   0.01s    2    1  50%
pandoc        |F.|   0.02s    2    1  50%
peg_markdown  |...|  0.02s    3    0   0%
rdiscount     |F..|   0.74s    3    1  33%
redcarpet     |..|   0.42s    2    0   0%
showdown      |F..|  0.20s    3    1  33%

Where F indicates a failing test.

Other Noticeable Test Suites

We were not the first test suite effort. Some projects have maintained their own test suites for a long time. Hopefully we can reach a state where people agree on what a good test suite for all developers should be.

In addition, we should note the wonderful work of John MacFarlane. His web service outputs the differences between the various markdown implementations, which helps a lot when searching for the most common output.

markdown-testsuite's People

Contributors

cirosantilli, djwf, gettalong, karlcow, mildsunrise


markdown-testsuite's Issues

Same tests, different output

unordered-list-items.md and unordered-list-items-leading-1space.md are exactly the same (both of them have one leading space), yet their outputs differ: the second result has an additional trailing newline.

Should we test the generated DOM rather than generated HTML?

Nice work on the test suite. Very impressive.

I was wondering whether the test harness should compare the generated DOM instead, built both from the markdown output and from the reference HTML blocks.

Likewise for the spec, really - I don't know if it should specify exactly what HTML source is output, but rather what DOM it creates, letting implementations serialise the HTML in whatever style they favour.
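A minimal sketch of what DOM-level comparison could look like in the suite's Python harness (the Normalizer class and dom_equal helper are hypothetical, not part of run-tests.py):

from html.parser import HTMLParser

class Normalizer(HTMLParser):
    """Reduce an HTML fragment to a canonical event stream so that
    attribute order and serialization style no longer matter."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag, sorted(attrs)))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(('text', data.strip()))

def dom_equal(a, b):
    """Compare two HTML fragments structurally rather than textually."""
    pa, pb = Normalizer(), Normalizer()
    pa.feed(a)
    pb.feed(b)
    return pa.events == pb.events

With this, <p><a href="x" title="t">a</a></p> and <p><a title="t" href="x">a</a></p> would compare equal, which would also make the trailing-newline discrepancy above moot.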

Create a CONTRIBUTING.md file

With more people participating (very cool) it's time to create a few rules on how to properly contribute to the repo.

[discussion] markdown-js

I just started to add the testsuite to a markdown-js fork, but I don’t know yet if the author will be interested. Anyway, I think it could be useful for the markdown-testsuite.

markdown-js supports multiple dialects; I chose the default one, “Gruber”, which targets the Gruber documentation, like the specification does.

Good news: markdown-js already passes 51 of the 102 tests [1]! Most of the failing tests I observed are related to spaces, line breaks, or attribute order.

I posted all the results here (found is markdown-js, wanted is the markdown-testsuite): https://gist.github.com/bpierre/4991393

Do you think there are things to fix in the suite, based on these results?

What do you think about integrating the testsuite into the popular markdown parsers? It could be a great way to push the Markdown Specification project while improving interoperability among the existing tools, but maybe it's too soon?

[1] Instructions to launch the testsuite with markdown-js:

$ git clone https://github.com/bpierre/markdown-js && cd markdown-js
$ git checkout -b markdown-testsuite origin/markdown-testsuite
$ git submodule update --init
$ npm install --dev
$ node test/markdown-testsuite.t.js

Extension name of test results

Pull request #19 was trying to rename all output files to .html instead of .out. We are still wondering what we should compare to. Technically speaking, the files are just fragments of HTML, not complete HTML files.

Math extension.

Will we have it or not? If yes, with what syntax? What should the output be? What are the current implementation statuses?

The sanest choice is probably to follow LaTeX: \(\) and \[\]. TeX ($$ $$ and $ $) looks better IMHO, but a single dollar may generate conflicts; a cruel doubt here.

Possible outputs:

  • MathML (ideal)
  • MathJax.

Perhaps the best is to just leave the exact output unspecified for now: only say that the input will be treated specially.
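For illustration only (since the proposal above is to leave the exact output unspecified), a \(\)-delimited input and one conceivable MathJax-style rendering, both hypothetical:

Euler's identity: \(e^{i\pi} + 1 = 0\).

could become:

<p>Euler's identity: <span class="math">\(e^{i\pi} + 1 = 0\)</span>.</p>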

Related on scholarly markdown: https://github.com/scholmd/scholmd/wiki/Math

It would also be great to be able to refer to formulas analogously to LaTeX \ref \label.

How to treat the random mailto test.

Spawned from #6.

Recap from Karl's points:

  1. Testing that mailto gets converted to a link, basically ignoring the value of href.

  2. Testing that the generated string is really a random string of hexadecimal and decimal entities, i.e. verifying that the value of href matches certain criteria.

The only solution I can see: metadata.

  1. a boolean regex metadata field does it: if true, the output is a regex to match rather than a literal string

For 2), it depends on the criteria. If a regex suffices, it reduces to 1). If not, we need an arbitrary function, so... metadata containing a Python function?
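A rough sketch of how that regex metadata could work in the Python harness (the field and helper names are hypothetical):

import re

def output_matches(expected, actual, metadata):
    # If the test carries regex metadata, treat the expected output as
    # a pattern instead of a literal string.
    if metadata.get('regex'):
        return re.fullmatch(expected, actual.strip()) is not None
    return expected == actual

# For the mailto test, the pattern would accept any mix of decimal
# entities, hex entities and raw characters inside the link:
ENTITY_OR_RAW = r'(?:&#[0-9]+;|&#[xX][0-9a-fA-F]+;|[^"<>&])'
MAILTO_PATTERN = (r'<p><a href="' + ENTITY_OR_RAW + r'+">'
                  + ENTITY_OR_RAW + r'+</a></p>')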

Group extension tests by equal output.

When running run-tests.py, we should be able to see, for a given extension:

  • which engines support the extension
  • which engines, for a given extension, produce the same output

since there is no clear right or wrong for the extension tests.

The output could be something like:

fenced-code-block:
    code: kramdown, multimarkdown (2/11 engines)
    pre: pandoc                                   (1/11 engines)

This would help decide how extensions should be standardized, if at all.
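A minimal sketch of the grouping step (a hypothetical helper, not part of run-tests.py today):

from collections import defaultdict

def group_by_output(results):
    """results maps engine name to its output for one extension test;
    returns output -> [engines], so identical outputs cluster."""
    groups = defaultdict(list)
    for engine, output in sorted(results.items()):
        groups[output].append(engine)
    return groups

# group_by_output({'kramdown': '<code>', 'multimarkdown': '<code>',
#                  'pandoc': '<pre>'})
# -> {'<code>': ['kramdown', 'multimarkdown'], '<pre>': ['pandoc']}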

Markdown extensions

@karlcow Can we add markdown extension tests?

We could confine them to test/extension, and then use the most common extension name across compilers such as fenced-code-block.md.

If yes, I suggest adding it to the readme.

If no... must fork =(

Fix a single syntax for representing each element and deprecate the others.

There should only be a single way of doing each thing. Currently there are several ways of doing many things, which means that:

  • tool writers have a much harder job, and have to consider tons of possibilities
  • very different styles are possible, and copy-pasting things is a mess. This prompted me to write a style guide: http://www.************.com/markdown-styleguide/

Things for which there are multiple syntaxes:

  • unordered lists: hyphen -, asterisk *, or plus sign +. I favor hyphen because:

    • it is more visually distinct from italics * for the reader
    • no one uses the plus sign +
  • ordered lists: real order (1., 2., ...), which forces you to renumber the entire list whenever the first item changes, vs. repeating 1. for every item, which is saner, like HTML. I favor 1..

  • list indent and spaces after the marker: indent by 2 or 4?

  • italics: *a* or _a_

  • boldface **a** or __a__

  • headers: Setext vs Atx. IMHO it should be only Atx, which covers h1 through h6, while Setext covers only h1 and h2.

    • inside Atx: trailing hashes or not (# a # vs # a). Trailing hashes should be deprecated.
  • link title quoting: [a](http://a.com "title") vs [a](http://a.com 'title'). Parentheses are even supported in link definitions:

    [foo]: http://example.com/  (Optional Title Here)
    

Extensions:

  • angle brackets vs. automatic links without them. If the extension is adopted, I think angle brackets should not exist at all, because the two forms are strictly equivalent: they only work with strings that start with http or contain an @. Relative links cannot be distinguished from HTML tags (e.g. what if you want to link to <script>?), and must be done with the []() link syntax

  • fenced code blocks vs indented. Fenced blocks were probably introduced to allow specifying the language of the block. IMHO indented looks better and cleaner, but we would need to fix a way to set the language for them. Maybe the Kramdown attribute extension, like:

    {.python}
        code
    
  • in CommonMark: triple backtick fences vs triple tilde fences

  • citations. Possible in both Multimarkdown and Pandoc, but with different syntaxes.

File to support engine implementation.

I'm going to add support for parsedown; however, it looks like I will need something like the showdown-stdin.js file used in #45. Would it be interesting to have a folder for files like this one? That way the main directory is not polluted.

By the way, I'm going to need this file because I found out that on Windows, *.bat and *.cmd files are not executed without the extension; only *.exe files are. Would it be possible to implement some logic to handle this? Like the possibility to specify commands depending on the OS (see the sketch below).
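One possible way to handle this, sketched in Python (the engine name and file paths are hypothetical):

import os

# Per-engine commands keyed by os.name, so Windows can fall back to an
# explicit interpreter instead of relying on extensionless executable
# lookup, which only works for *.exe files there.
ENGINE_COMMANDS = {
    'parsedown': {
        'posix': ['./parsedown-stdin.php'],
        'nt':    ['php', 'parsedown-stdin.php'],
    },
}

def engine_command(engine):
    commands = ENGINE_COMMANDS[engine]
    return commands.get(os.name, commands['posix'])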

Should single quote link titles work?

Reference style links

Input:

[a][b]

[b]: http://a.com 'title'

Output according to the original docs:

<p><a href="http://a.com" title="title">a</a></p>

Quote:

optionally followed by a title attribute for the link, enclosed in double or single quotes, or enclosed in parentheses.

But the actual output of Markdown.pl, marked and showdown is different!

<p>[a][b]</p>
<p><a href="http://a.com" title="title">a</a></p>

How do we decide on this case, considering that the original implementation does not correspond to its docs?

Explicit links

The implementation situation is the same.

The original docs say:

along with an optional title for the link, surrounded in quotes

which is ambiguous, since it does not say either double or single. Since:

  • it says just quotes
  • it says single quotes work for reference style

my interpretation is that for the docs, single quotes should work too.

My view

If I could choose without considering history / implementation status, I'd say: make it not work. It's better to have a single way of doing things.

Considering implementation status, I don't know what is best.
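For concreteness, a rough sketch of a link-definition matcher that accepts all three title styles the original docs mention (double quotes, single quotes, parentheses); the pattern names are hypothetical, and whether single quotes should be accepted is exactly what is being debated:

import re

TITLE = r'''(?:"([^"]*)"|'([^']*)'|\(([^)]*)\))'''
LINK_DEF = re.compile(r'^\s*\[([^\]]+)\]:\s*(\S+)(?:\s+' + TITLE + r')?\s*$')

m = LINK_DEF.match("[b]: http://a.com 'title'")
# m.group(1) == 'b', m.group(2) == 'http://a.com', and the
# single-quoted title lands in m.group(4)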

Link to other testsuites.

So far I have found the following interesting test suites:

How about clarifying the relationship between this testsuite and those in the README?

Are we compatible? What is the advantage of ours (larger, more modularized, better normalization, easy to run via multiple languages like Python, Ruby, etc.)?

The following is not an actual testsuite, but a very good way to see outputs without installing anything, and we really should point to it:

It probably covers every significant engine in existence. There are also interesting design remarks at http://johnmacfarlane.net/babelmark2/faq.html.

Add language specific branches

What are your thoughts on having language specific branches?

I'm writing a multi-flavour markdown parser in Ruby, and having programmatic access to the testsuite makes life much easier. I've wrapped this repo in a Ruby gem to do just that (https://rubygems.org/gems/markdown-testsuite), but there is obviously Ruby-specific code there.

For better encapsulation and usability, it would be helpful to have the markdown specs separate from any implementation of them - which would live on language-specific branches. The idea would be:

  • master: main branch containing only the markdown spec - md contributions and engines go here.
  • python: the spec + pypi code
  • ruby: the ruby gem fork
  • ....

See https://github.com/davekinkead/markdown-testsuite for an example. I had to move some code around, but the idea should be clear.

Thoughts?

[link-automatic-email.md] almost always fails

The mailto: link will be generated randomly.

Example, from Markdown.pl:

sub _EncodeEmailAddress {
#
#   Input: an email address, e.g. "foo@example.com"
#
#   Output: the email address as a mailto link, with each character
#       of the address encoded as either a decimal or hex entity, in
#       the hopes of foiling most address harvesting spam bots. E.g.:
#
#     <a href="&#x6D;&#97;&#105;&#108;&#x74;&#111;:&#102;&#111;&#111;&#64;&#101;
#       x&#x61;&#109;&#x70;&#108;&#x65;&#x2E;&#99;&#111;&#109;">&#102;&#111;&#111;
#       &#64;&#101;x&#x61;&#109;&#x70;&#108;&#x65;&#x2E;&#99;&#111;&#109;</a>
#
#   Based on a filter by Matthew Wickline, posted to the BBEdit-Talk
#   mailing list: <http://tinyurl.com/yu7ue>
#

    my $addr = shift;

    srand;
    my @encode = (
        sub { '&#' .                 ord(shift)   . ';' },
        sub { '&#x' . sprintf( "%X", ord(shift) ) . ';' },
        sub {                            shift          },
    );

    $addr = "mailto:" . $addr;

    $addr =~ s{(.)}{
        my $char = $1;
        if ( $char eq '@' ) {
            # this *must* be encoded. I insist.
            $char = $encode[int rand 1]->($char);
        } elsif ( $char ne ':' ) {
            # leave ':' alone (to spot mailto: later)
            my $r = rand;
            # roughly 10% raw, 45% hex, 45% dec
            $char = (
                $r > .9   ?  $encode[2]->($char)  :
                $r < .45  ?  $encode[1]->($char)  :
                             $encode[0]->($char)
            );
        }
        $char;
    }gex;

    $addr = qq{<a href="$addr">$addr</a>};
    $addr =~ s{">.+?:}{">}; # strip the mailto: from the visible part

    return $addr;
}

Note:
My fork of markdown passed 101 of the 102 tests.
Thank you for your great work!

[discussion] Convert all test files to a list of triple-quoted Python strings in a single .py file.

Currently:

  • tests are not DRY because you have to repeat the ID twice:

    • ID.md
    • ID.out

    As a consequence it is harder to create new tests.

  • tests are slower because we have to read from 200+ files

  • it is hard to define test metadata. For example, pandoc only interprets auto links like http://a without the <> if a command line option is passed.

    This option would need to be passed on a per-test basis, and the only solution I see is creating yet another file like autolink.metadata, increasing the duplication even further.

I have already converted all the data to a list of Python strings using a simple adaptation of the cat-all script. If other people agree, I'll make the necessary adaptations to the testsuite.

New tests would be really easy to write:

IO('test-id',
"""multiline
input""",
"""multiline,
output""",
)

If people want to have individual IO files for some reason, we can just make another script that generates those files from the Python strings.
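A minimal sketch of the IO helper implied above (hypothetical; the real shape would be decided during the conversion):

class IO(object):
    """One test case: an id, a markdown input and the expected output.
    Instances self-register so the harness can iterate over them."""
    registry = []

    def __init__(self, test_id, input_md, expected_out, **metadata):
        self.test_id = test_id
        self.input_md = input_md
        self.expected_out = expected_out
        self.metadata = metadata  # e.g. per-test engine options
        IO.registry.append(self)

The **metadata catch-all would also solve the per-test option problem mentioned above (e.g. pandoc's autolink flag).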

Typo

2-paragraphs-line.md.out should be renamed 2-paragraphs-line.out.

Citation / footnote extension.

Will we have it or not? If yes, with what syntax? What should the output be? What are the current implementation statuses?

Implementations:

  • Multimarkdown: # based:

    This is a statement that should be attributed to
    its source[p. 23][#Doe:2006].
    
    [#Doe:2006]: John Doe. *Some Big Fancy Book*.  Vanity Press, 2006.
    
  • Pandoc: YAML frontmatter / bib parsing + @ based: http://johnmacfarlane.net/pandoc/README.html#citations [@smith04; @doe99].

  • Kramdown: ^ based

    This is some text.[^1]
    
    [^1]: Some *crazy* footnote definition.
    

Three different ones so far =)

Add Python-Markdown Implementation

Not sure why Markdown2 was included here but not Python-Markdown. Sure, Markdown2 seems to have more stars (694) on GitHub than Python-Markdown (528) - a metric I see has been used for adding other projects. But on PyPI, as of today, Markdown2 has only 11.8k downloads in the last month, while Python-Markdown has over twice that (24.4k) in the last week -- 101k in the past month.

I'm not bitter (sorry if I sound that way -- I may be a little biased -- I am Python-Markdown's primary developer), but if the packages chosen are based on popularity, it seems you missed Python's most popular. The alleged benefits of Markdown2 cited in their README also appear to be way off (not confirmed by me).

As an aside, I hadn't been aware of this project (markdown-testsuite) until today and find it interesting, as Python-Markdown has a pretty extensive test suite itself - also implemented in Python. One question: when an implementation offers multiple parsing options, how do you handle that so that each variation is tested? That can be very complex with Python-Markdown's many extensions. Our approach was to include a config file in the test suite directory that defines the defaults for that directory, with subsections for each individual file, as documented (see an example; a rough sketch follows below). Note that that config can include "any keyword argument accepted by the Markdown class" for total flexibility.
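A rough Python sketch of reading that style of per-directory config (the section and option names here are hypothetical, not Python-Markdown's actual schema):

import configparser

cfg = configparser.ConfigParser()
cfg.read_string("""
[DEFAULT]
safe_mode = off

[fenced-code.txt]
extensions = fenced_code
""")

# Per-file options, with the directory-wide defaults merged in:
opts = dict(cfg['fenced-code.txt'])
# {'safe_mode': 'off', 'extensions': 'fenced_code'}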

Section, image, table and formula references extension like LaTeX ref label.

Will we have it or not? If yes, with what syntax? What should the output be? What are the current implementation statuses?

Image references are especially important in PDF output, since the image may float away from the referencing point even if they are adjacent in the input.

But all are necessary when you want to refer to one of those elements from another place.

Also, for the system to work for images and tables, a legend is necessary, so that the number is visible at the target. For headers and formulas the number can be shown in obvious ways.

Related on SO for images: http://stackoverflow.com/questions/9434536/how-do-i-make-a-reference-to-a-figure-in-markdown-using-pandoc

Should the spec / tests consider PDF / ePUB output?

Even if the spec does not specify it entirely, it might influence design decisions or deserve non-normative remarks: being able to output PDFs is a great strength of Markdown's simplicity.

Any HTML-to-PDF converter allows that, but HTML-to-PDF conversion is highly non-trivial, and since Markdown only needs to generate a subset of HTML, there may be simpler approaches.

Many tools like Kramdown and Pandoc support PDF through LaTeX. This is probably the simplest approach.

Other than that, there are experimental tools that implement pagination directly in a Markdown renderer, like Kramdown with Prawn: gettalong/kramdown#78

There are also Softcover and GitBook through Calibre's ebook-convert, but I have to check what they are using on the backend.

I have also asked whether it is possible to do it without LaTeX at http://softwarerecs.stackexchange.com/questions/3588/convert-markdown-to-pdf-without-latex, but it seems not.

Group standard test output by test instead of by engine on the left of the table.

Instead of the current:

engine 1 |...F...|
engine 2 |F......|

It would be more useful to see:

1. kramdown
2. multimarkdown

 12
|.F| feature 1
|.F| feature 2

1. kramdown:        262.88s  102   20  19%
2. multimarkdown:     0.36s  102    2   1%

Because:

  • it fits better on a screen, which has limited width
  • it lets you immediately see the name of the failing test

We could have an option to toggle between both outputs, or replace the current one completely.
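A rough sketch of the transposed report (hypothetical helper):

def rows_by_test(results):
    """results maps engine -> {test_id: passed}; yields one '.'/'F' row
    per test, with engines in a fixed, numbered order."""
    engines = sorted(results)
    tests = sorted({t for r in results.values() for t in r})
    for test_id in tests:
        row = ''.join('.' if results[e].get(test_id, True) else 'F'
                      for e in engines)
        yield test_id, row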
