
markdownfmt's Issues

Support Front Matter of Markdown

Currently, front matter like this:

---
weight: 3
title: "撰写"
bookToc: false
---

Will be ruined like this:

---

## weight: 3
title: "撰写"
bookToc: false
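
One possible workaround, sketched here under assumptions, is to split the front matter off before formatting and re-attach it unchanged afterwards. `splitFrontMatter` is a hypothetical helper, not part of markdownfmt, and it only recognizes a `---` delimited block at the very start of the input:

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontMatter is a hypothetical helper (not part of markdownfmt).
// It returns the leading front matter block, including both "---" delimiter
// lines, and the remaining Markdown body. If the input does not start with a
// complete front matter block, frontMatter is empty and body is the input.
func splitFrontMatter(src string) (frontMatter, body string) {
	const delim = "---\n"
	if !strings.HasPrefix(src, delim) {
		return "", src
	}
	rest := src[len(delim):]
	if strings.HasPrefix(rest, delim) {
		// Empty front matter block: two delimiter lines in a row.
		return src[:2*len(delim)], src[2*len(delim):]
	}
	// Look for the closing delimiter on a line of its own.
	end := strings.Index(rest, "\n"+delim)
	if end < 0 {
		return "", src // no closing delimiter, treat everything as body
	}
	cut := len(delim) + end + 1 + len(delim)
	return src[:cut], src[cut:]
}

func main() {
	src := "---\nweight: 3\nbookToc: false\n---\n\n# Heading\n"
	fm, body := splitFrontMatter(src)
	fmt.Printf("front matter (kept verbatim):\n%s", fm)
	fmt.Printf("body (passed to the formatter):\n%s", body)
}
```

Only `body` would be handed to the formatter; `fm` would be written back in front of the formatted output.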

Make hash headers optional and default

Hash and underline headers result in the same parsed structure, so we should be able to manage and generate both.

Other parsers will default to hash headers, so I feel okay defaulting to the same (merged in #13).
This leaves underline headers as an optional generation target. An option flag fits here as a solution to the bike-shedding problem.
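
As a rough sketch of what such an option could control (the `HeadingStyle` type and `renderHeading` function are hypothetical names, not the project's API), the two styles differ only at generation time. Underline (setext) headings exist only for levels 1 and 2, so deeper levels fall back to hash headings here:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// HeadingStyle is a hypothetical option value selecting the output style.
type HeadingStyle int

const (
	HashHeadings      HeadingStyle = iota // "# Title"
	UnderlineHeadings                     // "Title\n====="
)

// renderHeading renders one heading. Underline headings are only defined for
// levels 1 ("=") and 2 ("-"), so other levels always use hash headings.
func renderHeading(style HeadingStyle, level int, text string) string {
	if style == UnderlineHeadings && level <= 2 {
		ch := "="
		if level == 2 {
			ch = "-"
		}
		// Underline as wide as the text (counted in runes, for looks only).
		return text + "\n" + strings.Repeat(ch, utf8.RuneCountInString(text)) + "\n"
	}
	return strings.Repeat("#", level) + " " + text + "\n"
}

func main() {
	fmt.Print(renderHeading(HashHeadings, 1, "Title"))
	fmt.Print(renderHeading(UnderlineHeadings, 1, "Title"))
	fmt.Print(renderHeading(UnderlineHeadings, 3, "Title"))
}
```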

Possibility of a v3 release

This issue is to discuss a v3 release.

First, why a v3 release?
We recently landed #56 which is technically a breaking change
in that it changes a behavioral contract quite significantly.
There are ways to make that change that are more backwards compatible,
like adding an alternative constructor (instead of NewRenderer) that doesn't format by default,
and that's still an option to consider if there's strong opposition to a v3.

However, if we do make a v3, that's an opportunity to make some other breaking changes.
Some other ideas:

  • Turn markdown.Option into an interface that complies with goldmark/renderer.Option. It does not have to use goldmark's renderer.Config struct to be used. Instead, AddOptions will just type-match and search for markdown.Option-compliant values in the list of renderer.Option values. Hiding the Option implementation is generally a good idea.
  • Reorganize package structure a tad bit: make root directory the 'markdownfmt' package, and add a cmd/markdownfmt subpackage for the CLI. Right now, trying to run go build inside root tries to generate a 'markdownfmt' binary which conflicts with the existing markdownfmt/ package directory.

Feature: hard wrap lines to specified length

There was an issue for this, if I'm reading it correctly, but it was closed because the issue filer now uses an IDE to do this.

I'd like a flag like -wrap <int> to set a maximum line length, with markdownfmt reflowing paragraphs automatically to fit the limit.

Something like the following logic should be appropriate:

  • Never reflow literal environments (code blocks, tables, etc).
  • Reflow quote environments to maintain the indentation of the first line at the current depth, and prefix continuing lines with "> ".
  • Reflow bulleted / numbered environments by reflowing each item separately, maintaining the indentation of the opening text of the item on succeeding lines.
  • Reflow all other environments, inserting a newline at a word boundary when adding the next word would exceed the character limit.
    • Always insert at least one word (or URL, image, or other entity) on each new line, even if that would cause the line to exceed the character limit.
    • Count characters using grapheme clustering, eg. with uniseg.
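
The core reflow rule for ordinary paragraphs could look roughly like the sketch below. `wrapParagraph` is a hypothetical function, not existing markdownfmt code. It breaks only at whitespace, always places at least one word on a line, and counts runes as a simplification; counting grapheme clusters, as suggested above, would need a library such as uniseg. Quotes, list items and literal environments are not handled:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrapParagraph greedily reflows text so that lines stay within limit
// characters where possible. A word longer than limit gets a line of its own
// and is allowed to exceed the limit.
func wrapParagraph(text string, limit int) string {
	var b strings.Builder
	lineLen := 0 // length of the current output line, in runes
	for _, w := range strings.Fields(text) {
		wLen := utf8.RuneCountInString(w)
		// Start a new line if this word (plus a space) would not fit.
		if lineLen > 0 && lineLen+1+wLen > limit {
			b.WriteByte('\n')
			lineLen = 0
		}
		if lineLen > 0 {
			b.WriteByte(' ')
			lineLen++
		}
		b.WriteString(w)
		lineLen += wLen
	}
	return b.String()
}

func main() {
	fmt.Println(wrapParagraph("foo bar baz qux", 7))
}
```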

Cleanup Renderer implementation

There are some TODOs still in the code:

	// TODO: Clean these up.
	headers      []string
	columnAligns []extAST.Alignment
	columnWidths []int
	cells        []string

As well as minor inconsistencies.

  1. For example, the usage of buf is ambiguous. We hold several buffers at once, and I am not sure there is a strong reason for it (e.g. why can't we use just resBuf instead of mr.buf?). The problem with mr.buf is that it is used differently depending on the context, which can easily lead to side effects. It would be nice to scope buf to mr.RenderSingle so that we can, for example, render things concurrently at some point.
func (mr *Renderer) renderChildren(source []byte, node ast.Node) []byte {
	oldBuf := mr.buf
	mr.buf = bytes.NewBuffer(nil)
	mr.normalTextMarker = map[*bytes.Buffer]int{}
	resBuf := bytes.NewBuffer(nil)
	for n := node.FirstChild(); n != nil; n = n.NextSibling() {
		ast.Walk(n, func(n ast.Node, entering bool) (ast.WalkStatus, error) {
			return mr.RenderSingle(resBuf, source, n, entering), nil
		})
	}
	resBuf.Write(mr.buf.Bytes())
	mr.buf = oldBuf
	return resBuf.Bytes()
}
  2. Is there any reason RenderSingle is public?

Do not delete soft lines (optionally?)

Currently we join lines that are separated by soft line breaks. That is,

foo bar
baz baz

will be formatted as

foo bar baz baz

This is basically correct.

A conforming parser may render a soft line break in HTML either as a line break or as a space.

https://spec.commonmark.org/0.29/#soft-line-breaks

However, as the spec also says,

A renderer may also provide an option to render soft line breaks as hard line breaks.

I have now seen an issue in one project where those soft line breaks were actually interpreted as hard line breaks, and this option was "on by default", so the soft line breaks were actually important. (Related to nhn/tui.editor#1347)

We can look to goldmark for inspiration: its HTML renderer has a WithHardWraps option, which we can copy almost directly.
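
To make the proposal concrete, here is a minimal sketch of how such a switch might behave when a paragraph is written back out. `SoftBreakMode` and `renderParagraph` are hypothetical names used for illustration only; the real renderer works on goldmark's AST rather than on a slice of lines:

```go
package main

import (
	"fmt"
	"strings"
)

// SoftBreakMode is a hypothetical option controlling soft line breaks.
type SoftBreakMode int

const (
	JoinSoftBreaks SoftBreakMode = iota // current behavior: join with a space
	KeepSoftBreaks                      // proposed: keep the original newlines
)

// renderParagraph joins the source lines of one paragraph according to mode.
func renderParagraph(lines []string, mode SoftBreakMode) string {
	sep := " "
	if mode == KeepSoftBreaks {
		sep = "\n"
	}
	return strings.Join(lines, sep)
}

func main() {
	lines := []string{"foo bar", "baz baz"}
	fmt.Println(renderParagraph(lines, JoinSoftBreaks))
	fmt.Println(renderParagraph(lines, KeepSoftBreaks))
}
```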

heading classes/attributes not preserved

Here’s a minimum reproduction example:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/Kunde21/markdownfmt/v2/markdown"
	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/parser"
	"github.com/yuin/goldmark/text"
)

func repro() error {
	p := goldmark.DefaultParser()
	p.AddOptions(parser.WithAttribute())
	source := []byte(`# document heading {#bleh .since_417 translated=yeh} 
hello world`)
	fmt.Printf("--- input ---\n")
	fmt.Println(string(source))
	doc := p.Parse(text.NewReader(source))

	fmt.Printf("\n--- goldmark render to HTML ---\n")
	rnd := goldmark.DefaultRenderer()
	rnd.Render(os.Stdout, source, doc)

	fmt.Printf("\n--- markdownfmt render to markdown ---\n")
	mdrnd := markdown.NewRenderer()
	mdrnd.Render(os.Stdout, source, doc)

	return nil
}

func main() {
	if err := repro(); err != nil {
		log.Fatal(err)
	}
}

prints:

--- input ---
# document heading {#bleh .since_417 translated=yeh} 
hello world

--- goldmark render to HTML ---
<h1 id="bleh" class="since_417">document heading</h1>
<p>hello world</p>

--- markdownfmt render to markdown ---
# document heading {#bleh}

hello world

Detach the fork

Seeing as this variant of markdownfmt is maintained independently of the original blackfriday-based version and as far as I can tell, there are no plans to merge this work back, we should consider detaching the fork. This will get rid of the "forked from" message at the top of the page, and the following message in the tree browser:

This branch is 43 commits ahead, 13 commits behind shurcooL:master.

The process for doing this is slightly manual; it's documented at https://ralphjsmit.com/unfork-github-repo.

Disable formatting of code blocks by default

Creating this with regard to this comment originally posted by @karelbilek in #38 (comment):


To quickly explain - the original usage for this was mainly for go samples in code documentation.

I think the original behaviour doesn't need to be enabled by default... I think default behaviour should be to just not do any reformatting.

(I am surprised that Uber is using our stuff, but also happy! :D )

Btw maybe some added tests would be nice...


So this issue is to discuss/track whether the auto-formatting of code blocks should be disabled by default.

My opinion: I agree with @karelbilek. By default code blocks should be left alone, reproduced exactly as-is in the final output.

add original authors to license

I think we should add the original authors to the copyright (C) notice somewhere in the license.

The original license was MIT, which requires us to credit them.

error when formatting list

Revision: 3cd4054

Example input, test.md:

# h1

this is a list

* item 1

* item 2

Command: cat test.md | ./markdownfmt

Expected behavior: the output is identical to the input.

Actual behavior: output

# h1

this is a list

*
item 1
EOL

*
item 2
EOL

Nested list elements get indented by two spaces instead of four, which breaks certain renderers

I'm having an issue with v2.1.0 (can't switch to v3.0.0 as I have to support Go < 1.18):

Given this list (with the spaces replaced with Unicode Open Box characters for illustration purposes):

- An outer list element

␣␣␣␣- An inner list element.
␣␣␣␣- Another inner list element.

- The second outer list element.

processing with markdownfmt yields

- An outer list element

␣␣- An inner list element.
␣␣- Another inner list element.

- The second outer list element.

which makes certain compliant renderers, namely flexmark-java (https://github.com/vsch/flexmark-java), produce a single list of four elements instead of an outer list of two elements, in which the first one contains a nested list.

I have found this section in the CommonMark 0.30 spec which says something about indentation in this mode, but honestly, I failed to grasp a clear idea from it, and anyway I'm not sure whether markdownfmt tries to follow CommonMark in this particular case or I've spotted a bug.

If it's an attempt to implement a spec diverging from the original installment then the question is: is it possible to implement a renderer option controlling this behavior?

Honestly, once I've read the original Markdown spec, I've always used a TAB or four spaces for indentation purposes, so to me, the described behavior looks like a bug, but given that CommonMark's gobbledygook, I'm not so sure 🤷
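
For context, my reading of CommonMark is that a nested list only has to be indented to the content column of its parent item, which is two columns for a "- " marker, so two spaces should be compliant and four is a renderer-specific expectation. A renderer option for this could be as small as the sketch below; `listItem`, `renderList` and the `indentWidth` parameter are hypothetical, and blank lines between items are left out for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

// listItem is a hypothetical, simplified list node.
type listItem struct {
	text     string
	children []listItem
}

// renderList renders a bulleted list, indenting each nesting level by
// indentWidth spaces (2 matches the "- " marker width; 4 suits renderers
// that follow the original Markdown convention).
func renderList(items []listItem, depth, indentWidth int) string {
	var b strings.Builder
	for _, it := range items {
		b.WriteString(strings.Repeat(" ", depth*indentWidth))
		b.WriteString("- " + it.text + "\n")
		b.WriteString(renderList(it.children, depth+1, indentWidth))
	}
	return b.String()
}

func main() {
	list := []listItem{
		{text: "An outer list element", children: []listItem{
			{text: "An inner list element."},
			{text: "Another inner list element."},
		}},
		{text: "The second outer list element."},
	}
	fmt.Print(renderList(list, 0, 4))
}
```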

Maintenance.

Sorry for the spam, but not sure if you saw this.

As mentioned in a PR, @Kunde21 @karelbilek, I am happy to help maintain this project if needed. 🤗 I'm not sure my perception is right, but it looked like a fun experiment; we actually plan to use it "in production", if you can call the CI of a few of the bigger CNCF projects production.

markdownfmt removing all emphasis

Revision: 429996b

Example input, test.md:

_transaction_

*transaction*

**transaction**

Command: cat test.md | ./markdownfmt

Expected behavior: the output is identical to the input.

Actual behavior: output

transaction

transaction

transaction

Drop use of pkg/errors

The Markdown renderer uses github.com/pkg/errors to wrap errors.
This is unnecessary with Go 1.13's error wrapping support: fmt.Errorf("foo: %w", err) gives the same ability.
In addition, pkg/errors' Wrap function always captures the stack trace (which is expensive).
Further, pkg/errors is now frozen and archived so it's unlikely to see any improvements in the future.
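
For illustration, a standard-library-only version of the wrapping pattern might look like this. `errNotFound`, `load` and `isNotFound` are made-up names for the example, not code from the renderer:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error used only in this example.
var errNotFound = errors.New("not found")

// load wraps the underlying error with context using %w, which keeps the
// original error reachable through errors.Is and errors.As.
func load(name string) error {
	return fmt.Errorf("load %q: %w", name, errNotFound)
}

// isNotFound reports whether err wraps errNotFound anywhere in its chain.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	err := load("test.md")
	fmt.Println(err)
	fmt.Println(isNotFound(err))
}
```

Unlike pkg/errors' Wrap, this does not capture a stack trace, so wrapping stays cheap.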

Project Goal/Roadmap? Potential Collaboration or known fork

Hi @Kunde21 and @bwplotka I've forked this project, since it seems our goals may be different. If they aren't though, I'd rather work with others instead of alone. It seems your current goal is a formatter matching goldmark's feature set? Do you plan to add other extensions? I'm adding all goldmark extensions, plus these:

Example of different goals (I'm sure there are more) from /markdownfmt/testfiles/example1.input.md

Paragraphs will be also concatenated for clean view.
However, it might be not easy to edit it via editors, so you can specify text line width to be ensured. It also makes sure words are together,so it will ensure wanted line length as you wish.

I want to keep that soft linebreak in the markdown after formatting, it is especially important in the case of quotes.

How to change Italics character

Is it possible to change the delimiter used for italics to _ instead of *?

While both are allowed by the spec, I find using _ personally more comfortable. Is there an option to change what delimiter is used for italics?
