github / cmark-gfm

This project forked from commonmark/cmark

GitHub's fork of cmark, a CommonMark parsing and rendering library and program in C

License: Other

cmark-gfm's Introduction

cmark-gfm

cmark-gfm is an extended version of the C reference implementation of CommonMark, a rationalized version of Markdown syntax with a spec. This repository adds GitHub Flavored Markdown extensions to the upstream implementation, as defined in the spec.

The rest of the README is preserved as-is from the upstream source. Note that the library and binaries produced by this fork are suffixed with -gfm in order to distinguish them from the upstream.

It provides a shared library (libcmark) with functions for parsing CommonMark documents to an abstract syntax tree (AST), manipulating the AST, and rendering the document to HTML, groff man, LaTeX, CommonMark, or an XML representation of the AST. It also provides a command-line program (cmark) for parsing and rendering CommonMark documents.

Advantages of this library:

Portable. The library and program are written in standard C99 and have no external dependencies. They have been tested with MSVC, gcc, tcc, and clang.
Fast. cmark can render a Markdown version of War and Peace in the blink of an eye (127 milliseconds on a ten year old laptop, vs. 100-400 milliseconds for an eye blink). In our benchmarks, cmark is 10,000 times faster than the original Markdown.pl, and on par with the very fastest available Markdown processors.
Accurate. The library passes all CommonMark conformance tests.
Standardized. The library can be expected to parse CommonMark the same way as any other conforming parser. So, for example, you can use commonmark.js on the client to preview content that will be rendered on the server using cmark.
Robust. The library has been extensively fuzz-tested using american fuzzy lop. The test suite includes pathological cases that bring many other Markdown parsers to a crawl (for example, thousands-deep nested bracketed text or block quotes).
Flexible. CommonMark input is parsed to an AST which can be manipulated programmatically prior to rendering.
Multiple renderers. Output in HTML, groff man, LaTeX, CommonMark, and a custom XML format is supported. And it is easy to write new renderers to support other formats.
Free. BSD2-licensed.

It is easy to use libcmark in python, lua, ruby, and other dynamic languages: see the wrappers/ subdirectory for some simple examples.

There are also libraries that wrap libcmark for Go, Haskell, Ruby, Lua, Perl, Python, R, Tcl, Scala and Node.js.

Installing

Building the C program (cmark) and shared library (libcmark) requires cmake. If you modify scanners.re, then you will also need re2c (>= 0.14.2), which is used to generate scanners.c from scanners.re. We have included a pre-generated scanners.c in the repository to reduce build dependencies.

If you have GNU make, you can simply make, make test, and make install. This calls cmake to create a Makefile in the build directory, then uses that Makefile to create the executable and library. The binaries can be found in build/src. The default installation prefix is /usr/local. To change the installation prefix, pass the INSTALL_PREFIX variable when you run make for the first time: make INSTALL_PREFIX=path.

For a more portable method, you can use cmake manually. cmake knows how to create build environments for many build systems. For example, on FreeBSD:

mkdir build
cd build
cmake ..  # optionally: -DCMAKE_INSTALL_PREFIX=path
make      # executable will be created as build/src/cmark
make test
make install

Or, to create Xcode project files on OSX:

mkdir build
cd build
cmake -G Xcode ..
open cmark.xcodeproj

The GNU Makefile also provides a few other targets for developers. To run a benchmark:

make bench

For more detailed benchmarks:

make newbench

To run a test for memory leaks using valgrind:

make leakcheck

To reformat source code using clang-format:

make format

To run a "fuzz test" against ten long randomly generated inputs:

make fuzztest

To do a more systematic fuzz test with american fuzzy lop:

AFL_PATH=/path/to/afl_directory make afl

Fuzzing with libFuzzer is also supported but, because libFuzzer is still under active development, may not work with your system-installed version of clang. Assuming LLVM has been built in $HOME/src/llvm/build the fuzzer can be run with:

CC="$HOME/src/llvm/build/bin/clang" LIB_FUZZER_PATH="$HOME/src/llvm/lib/Fuzzer/libFuzzer.a" make libFuzzer

To make a release tarball and zip archive:

make archive

Installing (Windows)

To compile with MSVC and NMAKE:

nmake

You can cross-compile a Windows binary and DLL on Linux if you have the mingw32 compiler:

make mingw

The binaries will be in build-mingw/windows/bin.

Usage

Instructions for the use of the command line program and library can be found in the man pages in the man subdirectory.

Security

By default, the library will scrub raw HTML and potentially dangerous links (javascript:, vbscript:, data:, file:).

To allow these, use the option CMARK_OPT_UNSAFE (or --unsafe with the command line program). If doing so, we recommend you use an HTML sanitizer specific to your needs to protect against XSS attacks.

Contributing

There is a forum for discussing CommonMark; you should use it instead of github issues for questions and possibly open-ended discussions. Use the github issue tracker only for simple, clear, actionable issues.

Authors

John MacFarlane wrote the original library and program. The block parsing algorithm was worked out together with David Greenspan. Vicent Marti optimized the C implementation for performance, increasing its speed tenfold. Kārlis Gaņģis helped work out a better parsing algorithm for links and emphasis, eliminating several worst-case performance issues. Nick Wellnhofer contributed many improvements, including most of the C library's API and its test harness.

cmark-gfm's Issues

Strikethrough extension is underspecified

As far as Markdown-enabled textboxes on GitHub are concerned, the Strikethrough extension is underspecified. In general, it seems to have some of the same left-flanking/right-flanking rules. I haven't looked at the code yet, but here are some examples. According to the spec:

Strikethrough text is any text wrapped in tildes (~).

However, we can find a number of counterexamples:

A "right-flanking" `~` cannot open text, nor can a "left-flanking" `~` close strikethrough

Markdown input	as rendered by GitHub
`~ text~`	~ text~
`~text ~`	~text ~

The "multiple of 3" combined delimiter rule seems to apply

From the last sentence of rule #9.

Markdown input	as rendered by GitHub
`~foo~~bar`	~foo~~bar
`~~foo~bar`	~~foo~bar

The shorter span rule seems to apply

From rule #16

Markdown input	as rendered by GitHub
`~~foo ~~bar~~`	~~foo ~~bar~~

The "links group more tightly" rule seems to apply

From rule #17

Markdown input	as rendered by GitHub
`~[foo~](bar)`	~foo~

I'd be happy to send a PR.

Example 248: unclear why there can be 4 leading spaces for a list item

Simplified example (corresponding to d and e in the example):

   - foo with 3 leading spaces
    - bar with 4 leading spaces

cmark-gfm produces:

<ul>
<li>foo with 3 leading spaces</li>
<li>bar with 4 leading spaces</li>
</ul>

However, the specification says there may only be 0-3 leading spaces for list items, so, as I understand it, the second line is not an item. As indented code blocks can't interrupt paragraphs, it seems like it should be parsed as:

<ul>
<li>foo with 3 leading spaces - bar with 4 leading spaces</li>
</ul>

Allow markdown inside a table-<td>

Is there a chance you will extend your Markdown flavor to allow Markdown inside a table?

Example

My Markdown [Link](#Link)

<table><tr><td>
**My Markdown** [Link](#Link)
</td></tr></table>

online cmark-gfm for babelmark?

Hey,

I maintain the service http://babelmark.github.io and I was wondering if GitHub could provide an HTTP handler for cmark-gfm so that we can add cmark-gfm to the comparison.

The registry for the URL is in the repository https://github.com/babelmark/babelmark-registry. If you would like to store an encrypted URL, you can email me the URL and I will send you back an encrypted version.

It would be of a great help to track compliance with GFM for all the other implementations!

autolink not following spec?

www.example.com

https://github.github.com/gfm/#autolinks-extension- defines delimiting characters for an autolink:

Autolinks can also be constructed without requiring the use of < and to > to delimit them, although they will be recognized under a smaller set of circumstances. All such recognized autolinks can only come after whitespace, or any of the delimiting characters *, _, ~, (, and [.

However, as you can see (live repro)

[http://www.example.com

[www.example.com

(http://www.example.com

(www.example.com

are not treated in the same way ([ appears to be ignored among valid delimiters).

Additionally, at the very top of this document an autolink has been rendered. However it has no such characters as a delimiter (because there are zero characters before it). Is w a delimiter in this case? Or perhaps the start of the string is a delimiter?

(zero characters do not count as whitespace per: https://github.github.com/gfm/#whitespace so the gfm spec should be clarified for this case).

(I'm asking as someone trying to write an implementation based on the GFM spec. I would gladly take these latter concerns to a spec repository if GitHub had a fork of the original CommonMark spec https://github.com/jgm/CommonMark 😉)

Problem in latest tag

It's not possible to rely on the libdir provided in the pkgconfig file in the latest tag.

Though it is fixed in afc9a17.
I'll just use the code in 'master' for my language binding.

(yes, of course I'm using your fork of cmark. Everyone loves GFM more than vanilla.)

cmark_extension_api.h contains include for uninstalled header files

There are six header files installed:

cmark-gfm /usr/include/cmark.h
cmark-gfm /usr/include/cmark_export.h
cmark-gfm /usr/include/cmark_extension_api.h
cmark-gfm /usr/include/cmark_version.h
cmark-gfm /usr/include/cmarkextensions_export.h
cmark-gfm /usr/include/core-extensions.h

But in the file cmark_extension_api.h, two includes of other header files are present:

#include <render.h>
#include <buffer.h>

This makes the file unusable when building as a shared library, along with core-extensions.h which includes it.

Cut a release for packaging

I learned about this fork from the Engineering blog post, and would love to add this to Homebrew (as well as use the extensions myself). A tag/release would be necessary for Homebrew though.

Would you please consider cutting a release? Thanks.

Specs for GFM auto-identifiers for headers

Hi,
In the markdig Markdown parser, I have a mode for generating auto-identifiers for headers, similar to GitHub, but I'm currently using a rule closer to what pandoc does.

I would also like to add support for GitHub auto-identifiers for headers. I found that autolinks in GFM have a spec, but I can't find one for auto-identifiers.

Have you implemented this in cmark? Where is the code that is handling this part so that I could replicate the behavior? (or if it is not in cmark, can you point me where it was previously implemented?)

Thanks!

"www extended autolink" and "valid domain" definitions need rewording.

The autolink extension specification contains this:

An extended www autolink will be recognized when a valid domain is found. A valid domain consists of the text www., followed by alphanumeric characters, underscores (_), hyphens (-) and periods (.). There must be at least one period, and no underscores may be present in the last two segments of the domain.

And later this:

An extended url autolink will be recognised when one of the schemes http://, https://, or ftp://, followed by a valid domain, then zero or more non-space non-< characters according to extended autolink path validation.

I.e., according to these definitions, an extended url autolink should always have www. after the scheme, but the examples show otherwise. The definition of valid domain should be changed to not enforce www., and that requirement should be moved to the definition of the www autolink.

Ugly header hashes/fragments

## a & b

more or less becomes

<h2 id="a--b"><a href="#a--b">a & b</a></h2>

where #a--b is definitely ugly.

Triple asterisk with other punctuation in tables

The following table header does not render correctly:

|***(a)***|
|---|

The triple asterisks should be parsed as emphasis nested within strong, but because the table extension parses the entire line for inlines at once, the surrounding pipes cause the flanking rules to become confused.

A potential fix: parse each line for inlines to work out where the real pipes are, then split up the original line and reparse inlines one cell at a time so that ***a*** inside the cell parses as anticipated.

Ampersand in extended WWW autolink

According to one example in the spec, this:

www.google.com/search?q=commonmark&hl=en

should be transformed to this:

<p><a href="http://www.google.com/search?q=commonmark&amp;hl=en">www.google.com/search?q=commonmark&amp;hl=en</a></p>

IMHO, escaping the ampersand inside the href attribute is not a good idea, and it should be this instead:

<p><a href="http://www.google.com/search?q=commonmark&hl=en">www.google.com/search?q=commonmark&amp;hl=en</a></p>

GFM Version check macro

Please add something like CMARK_GFM_VERSION to detect whether you use this fork.

Autolink will misfire

via @extensions.achook.staticConfig.domain@. This

will convert to

<p>via @<a href="mailto:extensions.achook.staticConfig.domain@">extensions.achook.staticConfig.domain@</a>. This</p>

I don't expect this result.

Smaller example.

n@.  b

will convert to

<p><a href="mailto:n@">n@</a>. b</p>\n

n@ is not a valid email address.
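A stricter domain check along the lines of the GFM spec's email autolink rules would reject both inputs above. The sketch below is only an illustration: email_domain_length is a hypothetical helper, not the function cmark-gfm uses, and it assumes the rules are "alphanumerics, - and _ separated by periods, at least one period, last character not - or _", with trailing periods treated as sentence punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not cmark-gfm's actual code): given the text that
 * follows '@', return the length of a plausible email domain, or 0 if the
 * text does not start with one. */
size_t email_domain_length(const char *s) {
  assert(s != NULL);
  size_t len = 0;
  size_t periods = 0;

  /* Consume characters that may appear in a domain. */
  while (isalnum((unsigned char)s[len]) || s[len] == '-' || s[len] == '_' ||
         s[len] == '.')
    len++;

  /* Trailing periods are treated as sentence punctuation, not domain text. */
  while (len > 0 && s[len - 1] == '.')
    len--;

  /* Reject empty domains, a leading '.', and a trailing '-' or '_'. */
  if (len == 0 || s[0] == '.' || s[len - 1] == '-' || s[len - 1] == '_')
    return 0;

  /* A domain needs at least one interior period. */
  for (size_t i = 0; i < len; i++)
    if (s[i] == '.')
      periods++;

  return periods > 0 ? len : 0;
}
```

With this check, the text after the @ in n@.  b yields 0, so no mailto: link would be produced, while example.com followed by a sentence-ending period still yields the 11 characters of the domain.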

Exclamation before a footnote gets eaten

Check this out:

~/github/cmark master$ build/src/cmark-gfm --footnotes <<EOT
Hi![^hi]
>
> [^hi]: OK.
> EOT
<p>Hi<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup></p>
<section class="footnotes">
<ol>
<li id="fn1">
<p>OK. <a href="#fnref1" class="footnote-backref">↩</a></p>
</li>
</ol>
</section>

It's eaten up by the ! reader and not replaced by footnotes.

Should libcmark-gfmextensions be listed in the pkgconfig file?

I see that I need to pass -lcmark-gfmextensions if I want to be able to call core_extensions_ensure_registered() (and, of course, the extensions themselves).

Should that be listed in the pkgconfig file?

Spec incorrectly claims strikethroughs cannot be nested

The spec states:

Any number of tildes may be used on either side of the text; they do not need to match, and they cannot be nested.

This is false (or the impl has a bug):

$ bin/cmark-gfm --extension strikethrough
~~foo ~~bar~~ baz~~
^D
<p><del>foo <del>bar</del> baz</del></p>

The example illustrating that strikethroughs "cannot be nested" is actually demonstrating a sense of left-flanking and right-flanking delimiter runs:

$ out/bin/cmark-gfm --extension strikethrough
This ~text~~~~ is ~~~~curious~.

This ~text ~~~~is~~~~ curious~.
^D
<p>This <del>text</del> is <del>curious</del>.</p>
<p>This <del>text <del>is</del> curious</del>.</p>

Issues with pipes inside of backticks inside a table

Actual Result:

This used to work as expected.

Parameter	Type	Description
`id`	`integer	null`
`description`	`string`	The description for the `approval` resource, if available.

Expected Result:

Parameter	Type	Description
`id`	`integer\|null`	The identifier for the `approval` resource, if available.
`description`	`string`	The description for the `approval` resource, if available.

Installing extensions on linux (using `make install`)

I found that make install does not install libcmark-gfmextensions.so, and there's no way to install the library. It would be good if there's a way to install it (like make install_extensions)!

Footnotes

Supported by, e.g. pulldown-cmark.

You broke my badges (which used shortcut reference syntax)

In the old github-flavored markdown, this used to do the Right Thing:

[![Build Status][build-status-badge]][build-status-link]
[build-status-badge]: https://travis-ci.org/myorg/myrepo.svg?branch=master
[build-status-link]: https://travis-ci.org/myorg/myrepo

Now the markup appears verbatim in rendered output (sans newlines).

Can I have it back? Or at least, what should I do?

.editorconfig & tabsize no longer respected?

Hey guys,

I believe .editorconfig support broke recently. Is this a result of switching to the cmark stack? Tab size doesn't seem to be respected any more, causing rendering issues within code snippets that worked several days ago: https://github.com/leeoniya/domvm/blob/2.x-dev/README.md (for example, EOL comments are now misaligned).

Is this something that'll come back or permanently gone?

Documentation of GFM extensions to CM

commonmark/commonmark-spec#520

At first, I considered creating a pull request for /test/extensions.txt here, but I figured it would be better kept in the CommonMark spec repo. Anyway, have a look at the file; some of the test cases may be useful for your documentation as well.

Trim extra spaces in table cells

With the following input

| hi | lo |
| -- | -- |
| 5  |  7 |

cmark -t xml -e table yields

  <table>
    <table_header>
      <table_cell>
        <text> hi </text>
      </table_cell>
      <table_cell>
        <text> lo </text>
      </table_cell>
    </table_header>
    <table_row>
      <table_cell>
        <text> 5  </text>
      </table_cell>
      <table_cell>
        <text>  7 </text>
      </table_cell>
    </table_row>
  </table>

This renders fine, but it would be better, I think, to trim the leading and trailing spaces in the table cells before parsing as inlines. (This doesn't cause any real problems, because in HTML the leading and trailing spaces in td elements are ignored anyway. But, conceptually, the leading and trailing spaces are not part of the content; they're just there to fill out the cell, and often just to help things line up on the page.)
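The trimming suggested here could look roughly like the following. This is a minimal sketch, not the table extension's code: trim_cell is a hypothetical helper that narrows a (start, length) range within the raw row text, and it only strips the space character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: narrow the range [*start, *start + *len) within `row`
 * so that it excludes the leading and trailing spaces of a table cell. */
void trim_cell(const char *row, size_t *start, size_t *len) {
  assert(row != NULL && start != NULL && len != NULL);

  /* Drop leading spaces by advancing the start of the range. */
  while (*len > 0 && row[*start] == ' ') {
    (*start)++;
    (*len)--;
  }

  /* Drop trailing spaces by shrinking the range. */
  while (*len > 0 && row[*start + *len - 1] == ' ')
    (*len)--;
}
```

For the row | 5  |  7 |, the first cell's range (offset 1, length 4) narrows to offset 2, length 1, so only 5 would be passed on to inline parsing.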

Class & ID

The spec now shows that HTML with classes and IDs is allowed (see example 120), but it appears that both are still being stripped away:

The following example would have red text, if this were allowed:

<div class="text-red">test</div>

test

Problem rendering list items with links followed by a single word

- [Foo]: Description.

[Foo]: http://example.com/

Renders as:

<ul>
<li></li>
</ul>

Live repro (currently repros on this issue on May 12th, 2017):

The workaround is to use more than one word after the link:

- [Foo]: Longer description.

[Foo]: http://example.com/

Renders as:

<ul>
<li><a href="Description">Foo</a>: Longer description.</li>
</ul>

Live demo:

Foo: Longer description.

Originally discovered while investigating: wincent/masochist#101

Autoidentifier in cmark-gfm are not backward compatible

This is a follow-up to #65. It is not strictly a real issue, since @kivikakk mentioned that identifiers are currently generated at the HTML level on GitHub, but cmark-gfm seems to output its own identifiers (even if they get rewritten later). Maybe they should still be consistent?

The auto-identifiers that cmark-gfm generates for headings differ from those of https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/toc_filter.rb: cmark-gfm strips _, whereas the \w regexp used there keeps the underscore.

See pull request here that was looking to modify this based on cmark-gfm results: xoofx/markdig#173
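For comparison, the html-pipeline behaviour referred to above can be approximated in a few lines of C. This is a sketch under assumptions: it takes the rule to be "lowercase, drop everything except word characters, hyphens and spaces, then turn each space into a hyphen", it handles ASCII only, and it omits the suffixes used to disambiguate duplicate headings. heading_slug is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: write an ASCII-only heading identifier into `out`
 * (at most out_size - 1 characters plus a terminating NUL) and return the
 * number of characters written. */
size_t heading_slug(const char *heading, char *out, size_t out_size) {
  assert(heading != NULL && out != NULL && out_size > 0);
  size_t n = 0;

  for (const char *p = heading; *p != '\0' && n + 1 < out_size; p++) {
    unsigned char c = (unsigned char)*p;
    if (c == ' ')
      out[n++] = '-'; /* each space becomes a hyphen */
    else if (isalnum(c) || c == '_' || c == '-')
      out[n++] = (char)tolower(c); /* word characters and hyphens are kept */
    /* anything else (punctuation such as '&') is dropped */
  }

  out[n] = '\0';
  return n;
}
```

Under these assumptions the underscore survives (foo_bar stays foo_bar), and a heading such as a & b becomes a--b because the ampersand is dropped while both surrounding spaces are kept as hyphens.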

Table row alignment not available in the cmark AST

I am using cmark's AST to write a CommonMark renderer that renders with Skia (directcmr). I am rendering tables with this fork, but currently there is no way to access the alignments field of the node_table struct. I'd suggest adding a getter for the row alignments.

Links with stars are misparsed

https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* is a valid url that leads to a webpage, but the parser used in issues leaves the * behind as demonstrated here:
https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*

The tagfilter extension filters only lowercase tags

Given that e.g. <IFRAME> is as problematic as <iframe>, the comparisons in extensions/tagfilter.c should be case-insensitive.
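A case-insensitive comparison could be sketched as follows. This is not the code in extensions/tagfilter.c: is_filtered_tag_name is a hypothetical helper, and the tag list is the one given in the GFM spec's tagfilter section.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tag names disallowed by the tagfilter extension, as listed in the GFM spec. */
static const char *const filtered_tags[] = {
    "title",   "textarea", "style",  "xmp",       "iframe",
    "noembed", "noframes", "script", "plaintext",
};

/* Hypothetical helper: return 1 if the `len` bytes at `name` spell one of
 * the filtered tag names, ignoring ASCII case, and 0 otherwise. */
int is_filtered_tag_name(const char *name, size_t len) {
  assert(name != NULL);
  size_t count = sizeof(filtered_tags) / sizeof(filtered_tags[0]);

  for (size_t i = 0; i < count; i++) {
    const char *tag = filtered_tags[i];
    if (strlen(tag) != len)
      continue;

    /* Compare byte by byte after lowercasing the input. */
    size_t j = 0;
    while (j < len && tolower((unsigned char)name[j]) == tag[j])
      j++;
    if (j == len)
      return 1;
  }
  return 0;
}
```

With this comparison, IFRAME, iframe and ScRiPt are all treated as filtered, while div and iframes are not.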

nodejs wrapper

I've implemented a Node.js wrapper for cmark.

https://github.com/killa123/node-cmark

I hope it can be added to the README.

Quoted terms are not detected as emphasis, in Japanese text

Description

Generally I can emphasize a quoted term in English like this:

$ echo 'before *"phrase"* after'  | build/src/cmark-gfm 
<p>before <em>&quot;phrase&quot;</em> after</p>

On the other hand, its Japanese translation is not emphasized in the same way:

$ echo '前*「フレーズ」*後'  | build/src/cmark-gfm 
<p>前*「フレーズ」*後</p>

(Note:

前 means before
「 (`\u300c`) is an open quote
フレーズ means phrase
」 (`\u300d`) is a close quote
後 means after

)

Both English version and Japanese version examples should be parsed in the same way.

Steps to reproduce

Clone the repository: git clone https://github.com/github/cmark.git
CD to the repository: cd cmark
Build the command line tool: make
Try to parse emphasis around a quoted term: echo '前*「フレーズ」*後' | build/src/cmark-gfm

Expected result

<p>前<em>「フレーズ」</em>後</p>

Actual result

<p>前*「フレーズ」*後</p>

Details

It seems to be parsed based on the rule described at https://github.com/github/cmark/blob/master/test/spec.txt#L6346 :

This is not emphasis, because the opening `*` is preceded
by an alphanumeric and followed by punctuation, and hence
not part of a [left-flanking delimiter run]:

```````````````````````````````` example
a*"foo"*
.
<p>a*&quot;foo&quot;*</p>
````````````````````````````````

However, in Japanese (and some other languages), terms are not separated by white space, and emphasized quoted terms are generally written as in the example above.
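The left-flanking rule quoted above can be written down compactly, which makes it easier to see why both inputs fail to open emphasis. The sketch below follows the CommonMark definition of a left-flanking delimiter run; the char_class enum and is_left_flanking are hypothetical names, and classifying the characters (for example, treating 「 as punctuation) is assumed to happen elsewhere.

```c
#include <assert.h>

/* Coarse classification of the characters next to a delimiter run. */
typedef enum { CH_WHITESPACE, CH_PUNCTUATION, CH_OTHER } char_class;

/* Hypothetical helper: return 1 if a delimiter run with the given
 * neighbours is left-flanking (and so may open emphasis), else 0. */
int is_left_flanking(char_class before, char_class after) {
  assert(before <= CH_OTHER && after <= CH_OTHER);

  /* A run followed by whitespace is never left-flanking. */
  if (after == CH_WHITESPACE)
    return 0;

  /* A run followed by a non-punctuation character is left-flanking. */
  if (after != CH_PUNCTUATION)
    return 1;

  /* Followed by punctuation: only left-flanking when it is also preceded
   * by whitespace or punctuation. */
  return before == CH_WHITESPACE || before == CH_PUNCTUATION;
}
```

In before *"phrase"* after, the opening * is preceded by a space, so the run is left-flanking. In 前*「フレーズ」*後, it is preceded by 前 (neither whitespace nor punctuation) and followed by 「 (punctuation), so it is not, which matches the actual result reported here.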

Support disabling inline HTML

This is a feature request to support not parsing HTML at all.

Emoji support not documented

You can use emoji on GitHub :tada: 🎉, yet no Emoji extension is documented :cry: 😢.

Spaces around info string?

The spec says:

The line with the opening code fence may optionally contain some text following the code fence; this is trimmed of leading and trailing spaces and called the info string.

I wonder what "spaces" actually means in this context: spaces only, or spaces and tabs?

I usually check my assumptions using GitHub Markdown API, but this time I got strange output!

Output:

<pre lang="hey"><code>print 'Hello world!'
</code></pre>

Input:

```·→hey
print 'Hello world!'
```

Can someone explain what just happened?

NOTE
· represents space
→ represents tab

Eaten backslash in a code span

Currently the spec.txt contains this example for table extension:

| f\|oo  |
| ------ |
| b `\|` az |
| b **\|** im |
.
<table>
<thead>
<tr>
<th>f|oo</th>
</tr>
</thead>
<tbody>
<tr>
<td>b <code>|</code> az</td>
</tr>
<tr>
<td>b <strong>|</strong> im</td>
</tr></tbody></table>

See section 6.1 about Backslash escapes where the spec says:

Backslash escapes do not work in code blocks, code spans, autolinks, or raw HTML

Numbered lists with sub-lists

When I have a number list like this:

1. First point
  1. First point, sub-point 1
  1. First point, sub-point 2
  1. First point, sub-point 3
2. Second point

I would expect it to render like this:

<ol>
  <li>
    <p>First point</p>
    <ol>
      <li>First point, sub-point 1</li>
      <li>First point, sub-point 2</li>
      <li>First point, sub-point 3</li>
    </ol>
  </li>
  <li><p>Second point</p></li>
</ol>

Instead it renders like this:

<ol>
  <li><p>First point</p></li>
  <li>First point, sub-point 1</li>
  <li>First point, sub-point 2</li>
  <li>First point, sub-point 3</li>
  <li><p>Second point</p></li>
</ol>

I see in the spec that you supported nested unordered lists, but I couldn't see anything for nested numbered lists like this. Maybe I didn't look hard enough?

I would attempt fixing this myself, but I am not that great at C.

Manpage naming confusion (e.g. `make install` does not work)

Currently, make install does not work, because the manpage in man/man3 has been renamed to cmark-gfm.3, but the Makefile still refers to cmark.3 (see patch below).

But the situation is messier than that: the binary created is called cmark-gfm, so naturally, cmark.1 should also be called cmark-gfm.1.

Then there is the content of the manpage itself. It still refers to cmark instead of cmark-gfm. Since I don't know what the intention and final naming goal is, I couldn't turn this into a pull request. Anyway, a first suggested patch is below, but all the other things above would need attention too.

--- a/man/CMakeLists.txt
+++ b/man/CMakeLists.txt
@@ -5,6 +5,6 @@ include(GNUInstallDirs)
   install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/man1/cmark.1
     DESTINATION ${CMAKE_INSTALL_MANDIR}/man1)
 
-  install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/man3/cmark.3
+  install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/man3/cmark-gfm.3
     DESTINATION ${CMAKE_INSTALL_MANDIR}/man3)
 endif(NOT MSVC)

Documentation for syntax extensions

Is there documentation for writing your own syntax extensions?

Use after free for text literal after extensive AST manipulation

I've been unable to find a small example that reproduces the error, so I'll try to describe it.

I have a syntax extension that does some heavy AST manipulation in the post process function. In particular, sometimes it copies all children of a paragraph into a new paragraph and cmark_node_frees the old one. In some circumstances, this leads to a heap use after free.

As far as I can tell, the issue is with text nodes whose chunks have alloc set to 0. If I understand the parsing code correctly, their memory is then owned by the parent container and is only lazily copied over. But when I add them to a new parent and destroy the old one, their chunks refer to freed memory.

I'm trying to get a small example that reproduces the error but maybe you understand the issue.

Compiler warnings on win64 with mingw-w64

I would really like to eliminate these. The problem is that we store an unsigned char in a void *, but on 64-bit Windows these have different sizes.

Found the following significant warnings:
  cmark/blocks.c:395:41: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  cmark/inlines.c:511:41: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  cmark/inlines.c:525:45: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
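One common way to silence this class of warning is to route the conversion through uintptr_t, which has the same width as a pointer on the platforms where it is defined. The sketch below assumes the warnings come from casting the void * directly to a narrower integer type; uchar_to_ptr and ptr_to_uchar are hypothetical names, not functions from the cmark sources.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: stash a small integer value in a pointer-sized slot. */
void *uchar_to_ptr(unsigned char c) {
  return (void *)(uintptr_t)c;
}

/* Hypothetical helper: recover the value. The pointer is first converted to
 * an integer of the same width, then narrowed, so no pointer-to-int cast of
 * a different size takes place. */
unsigned char ptr_to_uchar(void *p) {
  uintptr_t v = (uintptr_t)p;
  assert(v <= UCHAR_MAX); /* the slot must hold a value stored by uchar_to_ptr */
  return (unsigned char)v;
}
```

A value such as '*' then round-trips unchanged through uchar_to_ptr and ptr_to_uchar regardless of whether pointers are 32 or 64 bits wide.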

Equations extension

It would be very cool to support math in cmark and on Github markdown documents. I suppose on Github you would conditionally have to include mathjax in the html header.

/cc r-lib/roxygen2#520

Incorrect table alignment in latex output

Using this example table, which renders correctly to HTML:

Colons can be used to align columns.

| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      |   $12 |
| zebra stripes | are neat      |    $1 |

There must be at least 3 dashes separating each header cell.
The outer pipes (|) are optional, and you don't need to make the 
raw Markdown line up prettily. You can also use inline Markdown.

Markdown | Less | Pretty
--- | --- | ---
*Still* | `renders` | **nicely**
1 | 2 | 3

However, the LaTeX output fails to use the correct alignment. The {lll} part below should be {lcr}.

Colons can be used to align columns.

\begin{table}
\begin{tabular}{lll}
 Tables         &  Are            &  Cool   \\
 col 3 is       &  right-aligned  &  \$1600  \\
 col 2 is       &  centered       &    \$12  \\
 zebra stripes  &  are neat       &     \$1  \\
\end{tabular}
\end{table}
There must be at least 3 dashes separating each header cell.
The outer pipes (\textbar{}) are optional, and you don\textquotesingle{}t need to make the
raw Markdown line up prettily. You can also use inline Markdown.

\begin{table}
\begin{tabular}{lll}
Markdown  &  Less  &  Pretty \\
\emph{Still}  &  \texttt{renders}  &  \textbf{nicely} \\
1  &  2  &  3 \\
\end{tabular}
\end{table}

Where is the tasklist extension?

At the moment (0.27.1.gfm.0), the tasklist extension does not show up in --list-extensions; its test case in test/spec.txt is disabled; and it doesn't seem to be mentioned anywhere else in the code base. Am I right it's not included in the code base? Any plan to release it?

Strikethrough by double tildes

We want to migrate from redcarpet to github/cmark.
Unfortunately, we have two major blockers. One of them is strikethrough.

In redcarpet, double tildes start a strikethrough, while cmark does that with just one tilde.

From redcarpet's readme:

:strikethrough: parse strikethrough, PHP-Markdown style. Two ~ characters mark the start of a strikethrough, e.g. this is ~~good~~ bad.

Is there any chance to update cmark somehow to let it use ~~ instead of ~ (maybe as an option)?

Thank you! 🙂

/cc @kivikakk

Missing `cmark_syntax_extension_get_private()`

There is cmark_syntax_extension_set_private(), but it is kind of useless as there is no way to get the private data back.

How to deal with tabs?

The spec says:

Tabs in lines are not expanded to spaces. However, in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters.

Thus, for example, a tab can be used instead of four spaces in an indented code block. (Note, however, that internal tabs are passed through as literal tabs, not expanded to spaces.)

→foo→baz→→bim

<pre><code>foo→baz→→bim</code></pre>

Later the spec also says:

In the following case > is followed by a tab, which is treated as if it were expanded into three spaces. Since one of these spaces is considered part of the delimiter, foo is considered to be indented six spaces inside the block quote context, so we get an indented code block starting with two spaces.
.-. foo
...-. bar
→.-.baz

("." => space)

I would like to know exactly in which cases a tab counts as 4, 3, or 2 spaces. I'm building a parser, and I find that part of the spec confusing!
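One reading of the quoted passages that accounts for all of these numbers is that a tab always advances to the next column that is a multiple of 4, so its width depends on the column where it starts. The sketch below illustrates that reading; tab_width_at and leading_indent_columns are hypothetical helpers, and columns are counted from 0.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_STOP 4

/* Width of a tab that starts at `column`: the distance to the next tab stop. */
size_t tab_width_at(size_t column) {
  return TAB_STOP - (column % TAB_STOP);
}

/* Hypothetical helper: number of columns covered by the leading spaces and
 * tabs of `line`, when `line` itself begins at `start_column`. */
size_t leading_indent_columns(const char *line, size_t start_column) {
  assert(line != NULL);
  size_t column = start_column;

  for (; *line == ' ' || *line == '\t'; line++)
    column += (*line == '\t') ? tab_width_at(column) : 1;

  return column - start_column;
}
```

Under this reading, a tab at column 0 is worth 4 columns, while a tab right after a > at column 0 starts at column 1 and is worth 3. Two tabs after the > cover 7 columns in total; one belongs to the block quote delimiter, which leaves the six columns of indentation that the quoted passage describes.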

CM Autolinks in spec conflict with GFM Autolinks extension

CM Autolink examples 592 and 595 show that autolinks shouldn't match for raw links, such as < http://foo.bar > and http://example.com. When the GFM Autolinks extension is implemented to spec, those links get matched and transformed into links.

This conflict creates a problem for projects that use the examples listed here as tests for their Markdown renderers, such as DartLang's markdown package.

Documentation of table extensions is wrong

extensions.txt claims that these tables without a body are "not enough table", but they are successfully parsed in this very input form.

| Just enough table | to be considered table |
| ----------------- | ---------------------- |

|x|
|-|

| xyz |
| --- |

| a |
--- |

Just enough table	to be considered table

x

xyz

a