markedjs / marked

A markdown parser and compiler. Built for speed.

Home Page: https://marked.js.org

License: Other

JavaScript 34.80% HTML 30.99% TypeScript 34.21%

commonmark compiler gfm hacktoberfest markdown parser

marked's People

Contributors

oztc rahuldave jnordberg gigonaut ceejbot yuest mitry kwinsch dshaw rf thedekel doubleotoo aleray dherman ryanramage freewind zipang boomyjee edpaget chemzqm isaacs jorisroling nhq krijosoft hastebrot kangax kami ecarter omarsve dmartinpro erkoala tobie odbol briancavalier bbailes nduhamel coolcloud jmpressjs kinvey georgerogers42 philipvr psefxx demchenkoe tlvince afjlambert sharemyworld audreyt stoke roadlabs nijikokun mwasyou mereskin-zz akoumjian tyrchen hlb pierrerust keepcleargas ilkkah jhoffner arkadiuszputko kitsonk sophiebits csnw vladikoff tml a-sk levhita bjornblomqvist tqc spontaneouscms netconstructor leadsplus mapping lepture swaagie skopp philschatz marcusphillips marcinnajder htruong fengmk2 newsky johnmdonahue mko grimmkull kelleyvanevert krisnod frank-fan wuxq bruth ardalanaz web-ninja lancee olanb7 jerrysievert francoisgrolier robertthegrey loucypher gjtorikian kerbyfc

marked's Issues

Sublists

The markdown syntax document is pretty explicit that the marker doesn't matter:

% ../marked/bin/marked
* test
+ test
- test

<ul><li>test <ul><li>test <ul><li>test </li></ul></li></ul></li></ul>
% Markdown.pl 
* test
+ test
- test

<ul>
<li>test</li>
<li>test</li>
<li>test</li>
</ul>

Should not link hyperlinks found in anchor text

Test case:

> marked.parse('<p>Already linked: <a href="http://foo.com">http://foo.com</a>  </p>')
'<p>Already linked: <a href="http://foo.com"><a href="http://foo.com&lt;/a&gt;">http://foo.com&lt;/a&gt;</a>  </p>'

This is quite common when a bit of markdown contains some inline HTML.

A simple workaround that is probably not ideal or free of edge case failures:

result = result.replace(/<a href="([^"]+)&lt;\/a&gt;">\1&lt;\/a&gt;/, '$1');
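The workaround above only repairs the first occurrence in the string. A minimal sketch of the same idea as a reusable post-processing step follows; `unnestAutolinks` is a hypothetical helper name, not part of marked, and it assumes the double-linked output has exactly the shape shown in the test case.

```javascript
// Hypothetical helper (not part of marked): undo the double-linking shown above.
// The `g` flag repairs every occurrence, not just the first one.
function unnestAutolinks(html) {
  return html.replace(/<a href="([^"]+)&lt;\/a&gt;">\1&lt;\/a&gt;/g, '$1');
}

const broken =
  '<p>Already linked: <a href="http://foo.com">' +
  '<a href="http://foo.com&lt;/a&gt;">http://foo.com&lt;/a&gt;</a>  </p>';
console.log(unnestAutolinks(broken));
```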

Multiple underscores in words

The handling of multiple underscores in words is not consistent. github_flavored_markdown handles this differently from the original Markdown syntax, and so does marked. Try this:

perform_complicated_task
do_this_and_do_that_and_another_thing

This bug affects prose/prose#146 see there for how to reproduce it.
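One way to get the GFM-like result, sketched here as a pre-processing step rather than a parser change: escape underscores that sit between two alphanumeric characters before the text reaches the parser. `escapeIntrawordUnderscores` is a hypothetical helper, and it assumes the parser honors `\_` as a literal underscore.

```javascript
// Hypothetical pre-processing step: escape underscores that sit between two
// alphanumeric characters so a Markdown parser treats them as literal text.
// Caveat: this also touches underscores inside code spans and URLs.
function escapeIntrawordUnderscores(src) {
  return src.replace(/([A-Za-z0-9])_(?=[A-Za-z0-9])/g, '$1\\_');
}

console.log(escapeIntrawordUnderscores('perform_complicated_task'));
console.log(escapeIntrawordUnderscores('_emphasis_ stays as it is'));
```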

Need advice on extending marked with custom markup

We need to extend marked with a [[reference]] syntax that would work the same way highlight does (via an external function call):

https://github.com/nodeca/ndoc/blob/master/syntax.md#short-links

Could you explain, how to monkeypatch marked, to add this inline tag?
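For readers on a recent marked release, monkeypatching should not be necessary: custom inline syntax can be registered through `marked.use({ extensions: [...] })`. The sketch below only defines such an extension object and exercises it directly, so it runs without marked installed. `shortLink` and `resolveReference` are hypothetical names, and the URL scheme is an assumption for illustration.

```javascript
// Hypothetical resolver: maps a reference name to a URL (an assumption for
// illustration; a real project would supply its own lookup).
function resolveReference(name) {
  return '#' + encodeURIComponent(name);
}

// Extension object in the shape accepted by marked.use({ extensions: [...] }).
const shortLink = {
  name: 'shortLink',
  level: 'inline',
  // Tell the inline lexer where a candidate might start.
  start(src) {
    const i = src.indexOf('[[');
    return i === -1 ? undefined : i;
  },
  // Consume "[[reference]]" only when it is at the very start of `src`.
  tokenizer(src) {
    const match = /^\[\[([^\[\]]+)\]\]/.exec(src);
    if (match) {
      return { type: 'shortLink', raw: match[0], target: match[1].trim() };
    }
    return undefined;
  },
  // Note: `target` is not HTML-escaped in this sketch.
  renderer(token) {
    return '<a href="' + resolveReference(token.target) + '">' + token.target + '</a>';
  },
};

// With marked installed: marked.use({ extensions: [shortLink] });
```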

async "highlight" function support?

It would be nice to be able to have the "highlight" callback function return asynchronously, as it's the only way to integrate with something like pygments. I realize that this means that marked itself would have to have a new API that returns asynchronously, but that's probably ok. Thoughts?

Markdown references are case-insensitive

As specified in the syntax document, and demonstrated thus:

% Markdown.pl 
[hi]

[HI]: /url
<p><a href="/url">hi</a></p>
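A parser can get this behavior by normalizing labels both when a definition is stored and when a reference is looked up. The sketch below uses a hypothetical `links` map and helper names; it is not marked's internal data structure.

```javascript
// Normalize a reference label: case-insensitive, whitespace collapsed.
function normalizeLabel(label) {
  return label.trim().replace(/\s+/g, ' ').toLowerCase();
}

// `links` is a hypothetical label -> definition map built while lexing.
const links = new Map();
links.set(normalizeLabel('HI'), { href: '/url' });

function lookup(label) {
  return links.get(normalizeLabel(label));
}

console.log(lookup('hi')); // the definition stored under [HI]
```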

Marked doesn't work with literal newlines

Re: OscarGodson/EpicEditor#75

I think I figured it out. Marked doesn't know what to do with a literal \n like this:

marked('hello  \
world');

I think this is in fact a bug with Marked, since in this example native JSON.parse will convert that, and to get it to work with Marked we'd have to go and add \n (the text, not the literal) in place of it. What do you think?
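It may help to check what string the example actually passes in. In JavaScript, a backslash at the end of a source line is a line continuation, so the literal above contains no newline at all by the time any function sees it. The sketch below shows only that language behavior; whether this is the whole story for the EpicEditor report is an assumption.

```javascript
// A backslash at the end of a source line continues the string literal;
// neither the backslash nor the line break ends up in the string.
const continued = 'hello  \
world';

// The escape sequence "\n" does put a line-feed character into the string.
const withNewline = 'hello  \nworld';

console.log(JSON.stringify(continued));   // "hello  world"
console.log(JSON.stringify(withNewline)); // "hello  \nworld"
```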

A link containing # in a header

I'm not sure whether this is a bug in marked or a symbol I need to escape, but parsing:

### Something [blabla](#/bla) is *great*.

results in a new paragraph starting and the header ending on the # in #/bla and doesn't give me a link (to #/bla) in the header.

Thanks for marked, I'm using it in coffeekup in a zappa project and it's awesome to parse markdown so easily.

rel=nofollow

Would it be possible to have the option of adding no follow for all output anchor tags?

If this is relatively trivial to add, it would be nice.

Otherwise, it's not that important and I could just hack it directly onto the links.
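Until such an option exists, the "hack it onto the links" route can be done on the rendered HTML. A minimal sketch, assuming the simple, well-formed anchors a Markdown renderer emits; `addNofollow` is a hypothetical helper, not a marked option.

```javascript
// Hypothetical post-processing helper: add rel="nofollow" to every <a> tag
// that does not already carry a rel attribute. Regex-based, so it is not a
// general HTML rewriter.
function addNofollow(html) {
  return html.replace(/<a\b([^>]*)>/gi, function (tag, attrs) {
    if (/(^|\s)rel\s*=/i.test(attrs)) return tag; // keep an existing rel
    return '<a rel="nofollow"' + attrs + '>';
  });
}

console.log(addNofollow('<p><a href="/x">x</a></p>'));
```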

Implement markdown extra features

Thanks for the excellent md parser!

There are various extensions to Markdown, e.g.:
http://freewisdom.org/projects/python-markdown/Extra
http://michelf.com/projects/php-markdown/extra/
http://maruku.rubyforge.org/proposal.html

While not all are exactly critical, things like super/subscript, tables, definition lists, abbreviations can go a long way to make writing md documents easier.

Simple transformations (like those in Textile, http://rpc.textpattern.com/help/?item=intro) can also make the text much more readable.
For example:
... -> …
' ' -> ‘ ’
" " -> “ ”
-- -> –
(c) -> ©
(r) -> ®
and so on.

Ideally the goal should be to never write pure html for any typographical feature.
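The substitutions listed above could be sketched as a plain-text filter like the one below. `typographer` is a hypothetical name, and the quote handling is a simple heuristic (a quote at the start of the text or after whitespace or an opening bracket is treated as opening), not a complete implementation.

```javascript
// Sketch of Textile-style typographic replacements on plain text.
// Assumes the input holds no code spans or HTML that must be left alone.
function typographer(text) {
  return text
    .replace(/\.\.\./g, '\u2026')            // ... -> ellipsis
    .replace(/--/g, '\u2013')                // --  -> en dash
    .replace(/\(c\)/gi, '\u00a9')            // (c) -> copyright sign
    .replace(/\(r\)/gi, '\u00ae')            // (r) -> registered sign
    .replace(/(^|[\s(\[{])"/g, '$1\u201c')   // opening double quote
    .replace(/"/g, '\u201d')                 // remaining double quotes close
    .replace(/(^|[\s(\[{])'/g, '$1\u2018')   // opening single quote
    .replace(/'/g, '\u2019');                // closing quote or apostrophe
}

console.log(typographer('Wait... (c) 2012 -- "ok"'));
```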

Allow options.highlight to specify custom classes

I'm using marked together with highlight.js, but there's a bit of a snag: highlight.js (and/or its styles) requires the plain name of the language (like "python") as a class on the <code> tag, but marked uses class names that look like "lang-python". I hacked around this in my copy of marked.js by adding the class name -- you can see the change here -- but it'd be nicer if marked let me specify additional/alternative class names.

I suggest that options.highlight() be allowed (optionally -- use a type check) to return an object that has two properties: the code string (i.e., what is returned now) and an array of class names to be applied. I can certainly code it and submit a pull request, if desired.
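To make the proposal concrete, here is a sketch of how a code renderer could accept either return type. Everything in it (`renderCodeBlock`, the `{ code, classes }` shape) is hypothetical and only illustrates the suggested contract; it is not marked's current behavior.

```javascript
// Sketch of the proposed contract: `highlight` may return either a string
// (current behavior) or { code, classes }. All names here are hypothetical.
// HTML-escaping of unhighlighted code is omitted for brevity.
function renderCodeBlock(code, lang, highlight) {
  const result = highlight ? highlight(code, lang) : code;
  const isObject = result !== null && typeof result === 'object';
  const html = isObject ? result.code : result;
  // Fall back to the "lang-" prefixed class when no classes are supplied.
  const classes = isObject && Array.isArray(result.classes)
    ? result.classes
    : (lang ? ['lang-' + lang] : []);
  const attr = classes.length ? ' class="' + classes.join(' ') + '"' : '';
  return '<pre><code' + attr + '>' + html + '</code></pre>';
}

console.log(renderCodeBlock('x = 1', 'python', function (code) {
  return { code: code, classes: ['python'] };
}));
```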

(For completeness: I could alternatively ask highlight.js to make the class names it expects be configurable. But it's not clear how to do that in any sane way -- the classes are explicit in the CSS. Making the change in marked feels like the correct approach anyway.)

Thanks for the great tool. It was utterly essential in the development of my Markdown-email browser extension.

"Loose" Lists

When "loose" lists, as they're called (lists with their items separated by 2 line feeds), get implemented, it introduces some ambiguity to parsing. This is a bit of a problem with the markdown grammar itself.

* List 1
  Text
* List 1
  Text

* Loose List 1
  Text

* Loose List 1
  Text

What is that? Is it 2 separate lists? One tight, one loose? Is it three lists? One with 2 items, and 2 with one item? Is it a single list with the items spaced differently? The only solution I see, is to actually always separate consecutive lists by 3 line feeds:

* List 1
  Text
* List 1
  Text


* Loose List 1
  Text

* Loose List 1
  Text

Which would be rendered as 2 lists: one tight, one loose. This is probably what will happen once I merge the experimental branch.
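The rule proposed above can be sketched in a few lines. This is only an illustration of the idea (three or more line feeds end a list, two make it loose), with a hypothetical `splitLists` helper; it is not the code from the experimental branch.

```javascript
// Sketch of the rule proposed above: three or more line feeds end a list,
// while two line feeds between items only make that list "loose".
function splitLists(src) {
  return src.split(/\n{3,}/).map(function (block) {
    return { loose: /\n\n/.test(block), text: block };
  });
}

const src = [
  '* List 1', '  Text', '* List 1', '  Text', '', '',
  '* Loose List 1', '  Text', '', '* Loose List 1', '  Text',
].join('\n');
console.log(splitLists(src).map(function (list) { return list.loose; }));
```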

Support contenteditable better with \u00a0 characters

As you know from my ticket, contenteditables will convert spaces to \u00a0. From: http://www.fileformat.info/info/unicode/char/A0/index.htm

Its decomposition is actually a \u0020, i.e. a space. It'd be nice if Marked checked for both of those characters since they mean almost the same thing. I can't imagine someone wanting to keep a no-break space but convert spaces.

Thoughts?
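In the meantime, a caller can normalize the input before handing it to the parser. A minimal sketch; `normalizeSpaces` is a hypothetical helper, not part of marked.

```javascript
// Hypothetical pre-processing step: treat no-break spaces (U+00A0) like
// ordinary spaces (U+0020) before the text reaches the Markdown parser.
function normalizeSpaces(src) {
  return src.replace(/\u00a0/g, ' ');
}

console.log(JSON.stringify(normalizeSpaces('#\u00a0title')));
```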

Problem when converting headings

There appears to be an inconsistency in how marked handles single-line conversions of headings.

If I have the following test.js:

var marked = require('marked');
console.log(marked('# title'));

I would expect the output to be <h1>title</h1>, but I get:

$ node test.js 
<p># title</p>

If I however do the following:

$ marked -o test.html
# title
^D

I get:

$ cat test.html 
<h1>title</h1>

Which is correct.

If I change my test.js to the following:

var marked = require('marked');
console.log(marked('# title\n\ntest'));

It works properly:

$ node test.js
<h1>title</h1>
<p>test</p>

Arguably wrong treatment of code in link

% ../marked/bin/marked
[the `]` character](/url)
<p>[the <code>]</code> character](/url)
</p>

Markdown.pl's behavior is more intuitive:

% Markdown.pl 
[the `]` character](/url)
<p><a href="/url">the <code>]</code> character</a></p>

> should be escapable

% Markdown.pl 
\>
^ID
<p>></p>
% ../marked/bin/marked 
\>
<p>\&gt;
</p>

GitHub code fences should output pre

GitHub's code fences should output a pre element and not a code element. You can see that in the first lines of http://github.github.com/github-flavored-markdown/

Btw, thanks for an awesome module. It was a huge advantage to be able to use marked instead of other, more limited JavaScript Markdown parsers for my StyleDocco project.

Bad results on raw html from markdown test suite

% ../marked/bin/marked 
<div style=">"/>
^D
<p><div style=">&quot;/&gt;
</p>

empty tail on block quotes

If the text has 2 empty lines after a block quote, then an empty line is attached to the quoted text after rendering. That should not happen.

http://imm.io/cmoh

Crashes on input

% ../marked/bin/marked
\\[test](not a link)
^D
/Users/jgm/src/marked/lib/marked.js:386
        href: text[1],
                  ^
TypeError: Cannot read property '1' of null
    at Object.lexer (/Users/jgm/src/marked/lib/marked.js:386:19)
    at /Users/jgm/src/marked/lib/marked.js:578:18
    at /Users/jgm/src/marked/lib/marked.js:606:14
    at /Users/jgm/src/marked/lib/marked.js:652:10
    at write (/Users/jgm/src/marked/bin/marked:111:9)
    at ReadStream.<anonymous> (/Users/jgm/src/marked/bin/marked:101:7)
    at ReadStream.emit (events.js:61:17)
    at ReadStream._onReadable (net.js:652:51)
    at IOWatcher.onReadable [as callback] (net.js:177:10)

<pre> mis-parse?

it looks like a strong/em is incorrectly parsed out of the ascii flower <pre> element here:

https://github.com/mojombo/github-flavored-markdown/blob/gh-pages/_site/sample_content.md
vs raw:
https://raw.github.com/mojombo/github-flavored-markdown/gh-pages/_site/sample_content.md

what's interesting is that Github's own parser seems to trip up on the same thing. i don't believe this should be happening so maybe it's a bug?

php-markdown parses it correctly though.

thanks!

It would be nice to parse ```

GitHub understands ``` as fenced code blocks:

``` javascript
var a = 2;
console.log(a);
```

will produce

var a = 2;
console.log(a);

It would be nice to make marked understand such blocks. That matters when you want to reuse an existing readme in generated docs.
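As a rough illustration of what recognizing such blocks involves, the sketch below pulls fenced blocks and their optional language tag out of a source string. It is a standalone regex sketch with hypothetical names, not marked's lexer rule, and it ignores details such as empty or indented fences.

```javascript
// Build the fence marker programmatically so this example holds no literal fence.
const FENCE = '`'.repeat(3);

// ^FENCE [lang] \n body \n FENCE $   (multiline, lazy body)
const fenceRe = new RegExp(
  '^' + FENCE + '[ \\t]*([\\w-]*)[ \\t]*\\n([\\s\\S]*?)\\n' + FENCE + '[ \\t]*$',
  'gm'
);

// Return every fenced block with its language tag (or null when absent).
function extractFencedBlocks(src) {
  return Array.from(src.matchAll(fenceRe), function (m) {
    return { lang: m[1] || null, code: m[2] };
  });
}

const sample = FENCE + ' javascript\nvar a = 2;\nconsole.log(a);\n' + FENCE;
console.log(extractFencedBlocks(sample));
```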

Full syntax for code spans not implemented

Read the syntax document's explanation of code spans carefully - it's a bit more complicated than marked seems to assume.

% Markdown.pl 
````` hi ther `` ok ``` `````
<p><code>hi ther `` ok ```</code></p>

% ../marked/bin/marked
````` hi ther `` ok ``` `````
<p><code>`</code> hi ther <code> ok </code><code> </code>````
</p>
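The rule in the syntax document is that a code span opened by a run of N backticks is closed by the next run of the same length, so shorter runs inside are literal. A regex sketch of that rule follows; `firstCodeSpan` is a hypothetical helper, and the pattern is a simplification that does not cover every edge case.

```javascript
// A code span opens with a run of backticks and closes at the next run of the
// same length; surrounding whitespace is stripped from the content.
const codeSpanRe = /(`+)\s*([\s\S]*?[^`])\s*\1(?!`)/;

function firstCodeSpan(src) {
  const m = codeSpanRe.exec(src);
  return m ? m[2].trim() : null;
}

const five = '`'.repeat(5);
console.log(firstCodeSpan(five + ' hi ther `` ok ``` ' + five));
```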

Issue with lists

Input:

**v0.1.3** _(16 Apr 2012)_

* Removed `getFileName()` internal function.

**v0.1.2** _(15 Apr 2012)_

* The property value is now converted to a string before persisting.

**v0.1.1** _(15 Apr 2012)_

* Now it's possible to add comments for each key-value pair. A header comment can also be added.

**v0.1.0** _(15 Apr 2012)_

* First commit.

Output:

<p><strong>v0.1.3</strong> <em>(16 Apr 2012)</em>

</p>
<ul>
<li><p>Removed <code>getFileName()</code> internal function.</p>
<p><strong>v0.1.2</strong> <em>(15 Apr 2012)</em></p>
</li>
<li><p>The property value is now converted to a string before persisting.</p>
<p><strong>v0.1.1</strong> <em>(15 Apr 2012)</em></p>
</li>
<li><p>Now it&#39;s possible to add comments for each key-value pair. A header comment can also be added.</p>
<p><strong>v0.1.0</strong> <em>(15 Apr 2012)</em></p>
</li>
<li><p>First commit.</p>
</li>
</ul>

Expected:

<p><strong>v0.1.3</strong> <em>(16 Apr 2012)</em></p>
<ul>
<li><p>Removed <code>getFileName()</code> internal function.</p>
</li>
</ul>
<p><strong>v0.1.2</strong> <em>(15 Apr 2012)</em></p>
<ul>
<li><p>The property value is now converted to a string before persisting.</p>
</li>
</ul>
<p><strong>v0.1.1</strong> <em>(15 Apr 2012)</em></p>
<ul>
<li><p>Now it&#39;s possible to add comments for each key-value pair. A header comment can also be added.</p>
</li>
</ul>
<p><strong>v0.1.0</strong> <em>(15 Apr 2012)</em></p>
<ul>
<li><p>First commit.</p>
</li>
</ul>

Mark tables as "table" instead of "paragraph" at lexer?

Hi,

Would it be possible for you to mark tables using "table" instead of "paragraph" at the lexer? This would make my work a lot easier. I use your lexer to apply some custom transformations to the source and then output as HTML using your parser. Having tables marked properly would make my life a lot easier. :)

GFM code with syntax highlighting

GitHub flavored markdown supports syntax highlighted code like this:

/* comment */
var s = "string", i = 123;
function foo () {
   for (var x in xs) {
      console.log(x, xs[x]);
   }
}

This is not supported by marked. It should either state this or (better) implement it.

Speed?

Curious about how this compares to
https://github.com/visionmedia/node-discount

Either way, I like yours because I was able to install it, no C-compiling, yay!

Also, I like that GFM is on your roadmap :)
(Hopefully you can do the newer version that they use on the site, but if not that's cool too).

Cheers and thanks for working code!
D

"Lazy" form of block quotes not implemented

% Markdown.pl 
> hi there
bud

<blockquote>
  <p>hi there
bud</p>
</blockquote>

% ../marked/bin/marked
> hi there
bud

<blockquote><p> hi there</p></blockquote>
<p>bud
</p>

Links titles with parentheses are parsed wrongly

Example:

[Test](http://google.com "Google (Test)")

... should be:

Test

It doesn't work in GitHub's markdown parser (Redcarpet) either, but it works in other markdown parsers like Showdown (JS) and RDiscount (Ruby).
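One way to avoid ending the link at the first closing parenthesis is to match the quoted title as a unit. The pattern below is a sketch with hypothetical names, not marked's actual link rule, and it only handles double-quoted titles.

```javascript
// Inline link: [text](href "optional title"). The title is matched up to its
// closing double quote, so parentheses inside it do not end the link early.
const inlineLinkRe = /^\[([^\]]+)\]\(\s*(\S+?)(?:\s+"([^"]*)")?\s*\)/;

function parseInlineLink(src) {
  const m = inlineLinkRe.exec(src);
  return m ? { text: m[1], href: m[2], title: m[3] || null } : null;
}

console.log(parseInlineLink('[Test](http://google.com "Google (Test)")'));
```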

Marked in the wild

Just wanted to let you know that we're using marked for previewing markdown on http://prose.io. In case you want to link to in-the-wild-examples. :)

Here's an example doc, rendered by marked.

http://prose.io/#substance/text/blob/gh-pages/README.md

Thank you! :)

Add newline between tokens

I see on line 506, in parse(), you separate tokens with a space which makes viewing the rendered HTML source really hard to read. If you could change that single space to a newline then it makes all the difference and is no more or less efficient in bandwidth as the resulting filesize is the same.

- return out.join(' ');
+ return out.join('\n');

GFM line breaks

GFM line breaks are not supported:

Line 1
Line 2

Should be rendered as

Line 1<br/>Line 2

When the gfm flag is set to true.

sequence of <em>'s is mis-rendered in some cases.

by itself,

#### *expression<sub>1</sub>* *expression<sub>2</sub>* ... *expression<sub>n</sub>*

renders correctly as

<h4><em>expression<sub>1</sub></em> <em>expression<sub>2</sub></em> ... <em>expression<sub>n</sub></em></h4>

but in the context of https://raw.github.com/dmajda/pegjs/master/README.md renders as

<h4><em>expression<sub>1</sub></em> / <em>expression<sub>2</sub></em> / ... / <em>expression<sub>n</sub></em></h4>

maybe a parser issue?
thanks!

gfm

Part of GitHub Flavored Markdown is support for links to users, commit SHAs, repositories, and issues. This is my understanding from reading through showdown.js, which is included in 'github-flavored-markdown', available on NPM. Do you have any plans to add these features to the parser?

Issue with ordered list immediately following unordered list

I would expect the following input:

* list
* list
* list

1. list
2. list
3. list

To produce an unordered list followed by an ordered list, but instead it produces a single unordered list (with a bit of funky spacing). Inserting an extra line break between the two resolves the issue.

GitHub handles this case as expected.

list
list
list

list
list
list
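The expected grouping can be sketched as follows: classify each item line by its marker and start a new list whenever the kind changes. Treating "*", "+" and "-" as the same kind also matches the earlier "Sublists" report. The helper names are hypothetical, and nesting and continuation lines are deliberately left out.

```javascript
// Classify a line by its list marker. Per the syntax document, "*", "+" and
// "-" are interchangeable bullets; digits followed by "." mark ordered items.
function listKind(line) {
  if (/^ {0,3}[*+-][ \t]+/.test(line)) return 'ul';
  if (/^ {0,3}\d+\.[ \t]+/.test(line)) return 'ol';
  return null;
}

// Group item lines, starting a new list when the kind changes. Only item
// lines are considered; blank and other lines are skipped in this sketch.
function groupLists(lines) {
  const lists = [];
  for (const line of lines) {
    const kind = listKind(line);
    if (kind === null) continue;
    const last = lists[lists.length - 1];
    if (last && last.kind === kind) last.items.push(line);
    else lists.push({ kind: kind, items: [line] });
  }
  return lists;
}

console.log(groupLists(['* list', '* list', '* list', '', '1. list', '2. list', '3. list']));
```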

GFM Tables

I.e.

First Header | Second Header
------------ | -------------
Content Cell | Content Cell
Content Cell | Content Cell

becomes

First Header	Second Header
Content Cell	Content Cell
Content Cell	Content Cell

Provide some configurations for enabling/disabling some marked features

Sometimes you do not want to parse and convert all Markdown into HTML.
Someone may not want links, etc. to be parsed. node-discount does it this way.

Kind regards

node v0.6.11 test 11 and 16 are failing

missing <br> and <hr> respectively

Link titles in references can be in parentheses

The syntax doc is explicit on this. Many implementations don't notice.

% Markdown.pl
[hi]

[hi]: /url (there)
<p><a href="/url" title="there">hi</a></p>
% ../marked/bin/marked
[hi]

[hi]: /url (there)
<p>[hi]

</p>
<p>[hi]: /url (there)
</p>
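The syntax document allows a reference title in double quotes, single quotes, or parentheses. A regex sketch that accepts all three forms is shown below; `parseRefDef` is a hypothetical helper and the pattern only handles single-line definitions.

```javascript
// Reference definition: [label]: href "title" | 'title' | (title)
const refDefRe =
  /^ {0,3}\[([^\]]+)\]:[ \t]*<?(\S+?)>?(?:[ \t]+(?:"([^"]*)"|'([^']*)'|\(([^)]*)\)))?[ \t]*$/;

function parseRefDef(line) {
  const m = refDefRe.exec(line);
  if (!m) return null;
  // Exactly one of the three title groups can be set.
  const title = m[3] !== undefined ? m[3] : m[4] !== undefined ? m[4] : m[5];
  return {
    label: m[1].toLowerCase(), // labels are case-insensitive
    href: m[2],
    title: title === undefined ? null : title,
  };
}

console.log(parseRefDef('[hi]: /url (there)'));
```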

optional escape

Hi!

If output from marked is fed further to a syntax highlighter (say, [https://github.com/andris9/highlight]), the resulting output contains HTML entity codes instead of symbols.

I wonder if you could make the call to escape() optional or, better, make escape configurable through an outside function.

Update: marked.escape would fit perfectly, I believe.

TIA,
--Vladimir

Markdown in html tokens is processed

The Markdown formatting syntax should not be processed within block-level HTML tags, according to http://daringfireball.net/projects/markdown/syntax

html with line breaks

I have run across a slight problem with marked when trying to render html with line breaks between the attributes:

<input type="text"
  name="email"
  value=""
  id="email-input"
  class="etc" />

I know this is not a normal use case, but as far as I know it is valid HTML. I am willing to submit a patch if I can get some pointers on where and how to modify this behavior.

links parsed twice + nested if href == content?

[http://www.facebook.com/developers/](http://www.facebook.com/developers/)

gives

<p><a href="http://www.facebook.com/developers/"><a href="http://www.facebook.com/developers/">http://www.facebook.com/developers/</a></a></p>

should be

<p><a href="http://www.facebook.com/developers/">http://www.facebook.com/developers/</a></p>

Marked crash Firefox (Windows XP)

I've been using marked for a while, but today I was doing cross-platform tests and I noticed that Firefox crashed when marked tried to parse my *.md file. Firefox just crashes. I tested it in Chrome (still on Windows XP) and it works fine. I also tested it on Firefox (Windows 7) and it works fine. I don't know what the main problem is here.

ps. Firefox v6.0

Improper parsing of nested strong/emph

% ../marked/bin/marked
*test **test***

<p><em>test *</em>test<em>*</em>

</p>

Of course it is arguable what counts as proper, since the standard is so vague. But I would expect (and good markdown implementations provide):

<p><em>test <strong>test</strong></em></p>

Accessing the inline lexer

It would be great if you somehow could access the inline lexer.

One example where this would be useful is to transform relative urls to absolute.

for token in tokens
  if token.type is 'link' and is_relative(token.href)
    token.href = base + token.href
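The same idea in JavaScript, as a runnable sketch. The flat `tokens` array with `type` and `href` fields is a hypothetical shape for illustration (marked did not expose inline tokens when this was requested), and `isRelative` and `absolutizeLinks` are made-up helper names.

```javascript
// A href is treated as relative when it has no scheme and is not
// protocol-relative ("//host/...") or a bare fragment ("#id").
function isRelative(href) {
  return !/^(?:[a-z][a-z0-9+.-]*:|\/\/|#)/i.test(href);
}

// `tokens` is a hypothetical flat array of inline tokens ({ type, href, ... }).
function absolutizeLinks(tokens, base) {
  for (const token of tokens) {
    if (token.type === 'link' && isRelative(token.href)) {
      // The WHATWG URL constructor resolves `href` against `base`.
      token.href = new URL(token.href, base).href;
    }
  }
  return tokens;
}

console.log(absolutizeLinks(
  [{ type: 'link', href: 'docs/a.md' }, { type: 'link', href: 'http://x.com/' }],
  'http://example.com/wiki/'
));
```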

Support more syntax variants

The popularity of Markdown has led to several syntax variants extending the original Markdown syntax. Marked currently supports github_flavored_markdown. In addition, there are at least Pandoc's Markdown and StackExchange Markdown. The latter should be easy to implement, to support [tag:foo] links. Pandoc Markdown may be more difficult but more useful too. See gitit for a use case. Maybe additional syntax variants are something for a 1.0 release?

Support Strikethrough?

Any chance you could add support for strikethrough? I know it's not in the official Markdown spec but it's something I need for my application.
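For illustration, a strikethrough rule could look like the sketch below, assuming the `~~text~~` convention (the request does not name a syntax, so that choice is an assumption). It works on plain inline text and is not marked's implementation.

```javascript
// Hypothetical inline rule: ~~text~~ becomes <del>text</del>. The text must
// start and end with a non-space character.
function strikethrough(text) {
  return text.replace(/~~(?=\S)([\s\S]*?\S)~~/g, '<del>$1</del>');
}

console.log(strikethrough('a ~~b c~~ d'));
```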

Option to filter unsafe tags

I'm really surprised to see that marked will not filter out the <script> tag, and there's not an option to do it. If a website uses marked on its comment system, then this can be a very severe security issue. And the unfiltered style attribute for html tags can also be used by malicious visitors to make the site dysfunctional.

So, I wonder if there's a plan to add options for safe parsing?

And another problem: the lexer fails for the following case (the type should be html, not paragraph IMO)

> m.lexer('<script>document.write(\'Oops\')</script><br>')
[ { type: 'paragraph',
    text: '<script>document.write(\'Oops\')</script><br>' },
  links: {} ]
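One conservative form a "safe" option could take is to escape raw HTML rather than try to filter individual tags. The sketch below shows that approach; the `safe` flag and `renderHtmlToken` hook are hypothetical names for illustration, and it assumes the lexer has already classified the input as an html token.

```javascript
// Escape the characters that are significant in HTML.
function escapeHtml(html) {
  return html
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical renderer hook: html tokens are escaped only when `safe` is set.
function renderHtmlToken(token, options) {
  return options && options.safe ? escapeHtml(token.text) : token.text;
}

const scriptToken = { type: 'html', text: "<script>document.write('Oops')</script><br>" };
console.log(renderHtmlToken(scriptToken, { safe: true }));
```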

Chrome Renderer crashes

I've been doing a few tests with marked, and I've found that if I feed it large documents with embedded HTML (or plain HTML alone), it will crash Chrome Renderer.

I've been able to reproduce this reliably by force-feeding it raw HTML - notably the contents of the container div at http://oss.sapo.pt/ (which I'm re-doing using marked, BTW - you'll notice the site currently fetches Markdown and parses it on the fly using showdown...).

Further testing reveals that Safari and Firefox are immune to this, BTW - I can toss in the same documents and they come out fine.