gmnhg's Introduction

Hugo-to-Gemini converter

This repo holds a converter of Hugo Markdown posts to text/gemini (also named Gemtext in this README). The converter is meant to make it somewhat easier for people using Hugo to enter Project Gemini, the alternate web.

The renderer uses the gomarkdown library for parsing Markdown. gomarkdown has a few quirks at this time, the most notable one being its inability to parse links/images inside other links.

At this time, gmnhg can convert these Markdown elements to Gemtext:

  • paragraphs, converting them to soft wrap as per Gemini spec p. 5.4.1;
  • inline text formatting (bold, emphasis, strikethrough, code, subscript, superscript), which stays in the text to preserve stylistic context;
  • headings;
  • blockquotes;
  • preformatted blocks;
  • tables, displayed as ASCII preformatted blocks;
  • lists (as Gemini doesn't allow lists of level >= 2, those will be reflected with an extra indentation level): ordered, numbered, definition;
  • links and images, rendered as Gemtext links (inline links are rendered after their parent paragraph or other block element in a links block sorted by element type);
  • footnotes, rendered as paragraphs;
  • horizontal rules.

The renderer also treats lists of links and paragraphs consisting only of links in a special way: it renders only the links block for them.

To get a better idea of what source Markdown looks like after conversion to Gemtext, see the testdata directory.

gmnhg

This program converts Hugo Markdown content files from content/ in accordance with templates found in gmnhg/ to the output dir. It also copies static files from static/ to the output dir.

For more details about the rendering process, see the doc attached to the program.

Usage of gmnhg:
  -output string
        output directory (will be created if missing) (default "output/")
  -working string
        working directory (defaults to current directory)

md2gmn

This program reads Markdown input from either a text file (if -f filename is given) or stdin. The resulting Gemtext goes to stdout.

Usage of md2gmn:
  -f string
        input file

md2gmn is mainly made to facilitate testing the Gemtext renderer but can be used as a standalone program as well.

Site configuration

gmnhg will pick up some attributes such as site title, base URL, and language code from your Hugo configuration file (config.toml, config.yaml, or config.json). Presently these are used in the default RSS template.

gmnhg provides a way to override these attributes by defining a gmnhg section in the configuration file and nesting the attributes to override underneath this section. Presently you can override both baseUrl and title in this manner.

For example, you could add the following to your config.toml to override your baseUrl:

[gmnhg]
baseUrl = "gemini://mysite.com"

This is recommended, as it will ensure that RSS links on your Gemini site use the correct URL.
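
As a rough illustration of how such overrides could be resolved, here is a minimal sketch that reads a config.json and applies any non-empty values found under the gmnhg section. The siteConfig and overrides types and the effective function are hypothetical names made up for this example; this is not the code gmnhg itself uses, and it only covers the JSON variant to stay within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overrides holds the two attributes the README says can be overridden.
type overrides struct {
	BaseURL string `json:"baseUrl"`
	Title   string `json:"title"`
}

// siteConfig is a hypothetical, trimmed-down view of a Hugo config.json.
type siteConfig struct {
	BaseURL string    `json:"baseUrl"`
	Title   string    `json:"title"`
	Gmnhg   overrides `json:"gmnhg"`
}

// effective returns the site attributes with any non-empty values from the
// gmnhg section applied on top of the top-level ones.
func effective(raw []byte) (baseURL, title string, err error) {
	var c siteConfig
	if err = json.Unmarshal(raw, &c); err != nil {
		return "", "", err
	}
	baseURL, title = c.BaseURL, c.Title
	if c.Gmnhg.BaseURL != "" {
		baseURL = c.Gmnhg.BaseURL
	}
	if c.Gmnhg.Title != "" {
		title = c.Gmnhg.Title
	}
	return baseURL, title, nil
}

func main() {
	raw := []byte(`{
		"baseUrl": "https://mysite.com",
		"title": "My website",
		"gmnhg": {"baseUrl": "gemini://mysite.com"}
	}`)
	baseURL, title, err := effective(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(baseURL, title)
}
```

Here only baseUrl is overridden, so the title falls back to the top-level value.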

License

This program is redistributed under the terms and conditions of the GNU General Public License, more specifically version 3 of the License. For details, see COPYING.

gmnhg's Issues

Change to parsing of config.toml/yaml/json - add "overrides" for values used by gmnhg

In implementing the RSS rendering I introduced geminiBaseURL in config.toml. This works but it doesn't feel very clean. What do you think about allowing "overrides" in config.toml using dictionaries/tables?

So if I had the following:

baseUrl = "https://www.example.com"
title = "My website"

Then the RSS renderer would use these values for the base URL and title. But if I add overrides:

baseUrl = "https://www.example.com"
title = "My website"

[gmnhg]
baseUrl = "https://gemini.example.com"
title = "My capsule"

Then anything defined under gmnhg will override the default values. (Hugo will of course ignore these values.) The equivalents for yaml/json would work the same way.

This feels a lot cleaner to me. Thoughts?

Automatic processing (dithering) of images/preview images

Perhaps this is better for a plugin, or another program entirely, but I got this idea after seeing the Gemini "imageboard" someone just made (iich.space/img) and I couldn't help but share it.

What if there were an option to either generate dithered "preview" images (or to replace the images entirely) via this library: https://github.com/makeworld-the-better-one/dither

I'm more partial to generating preview images, because some things like technical diagrams wouldn't work so well with dithering. But I can imagine that people running their Gemini sites on low bandwidth/low power servers might want the option to shrink all their images.

Inherited index templates

With the metadata now being the same across all indices, I was wondering if it would make sense to have subdirectories inherit their template from parent directories by default. This would let users make a template that would be inherited across a subdirectory tree. With a little bit of work, it would also allow the default template to generate indices across the entire site.

The main argument against this is that some people may not want an index for certain directories, or they may want to use the default directory index generated by their server. I think we could get around this by either treating an empty template file as a "do not render" instruction, or by looking for a specific file (directoryname.norender?) that turns off index generation for a specific tree.
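
To make the lookup concrete, here is a minimal sketch of the inheritance idea, assuming the empty-template variant of the "do not render" rule. The findTemplate function and the templates map (directory to template contents, standing in for files on disk) are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"path"
)

// findTemplate walks from dir up to the site root (".") and returns the
// template of the closest directory that has one. The templates map is a
// stand-in for template files on disk. An empty template is treated as a
// "do not render" instruction for that directory and everything below it.
func findTemplate(dir string, templates map[string]string) (tmpl string, ok bool) {
	dir = path.Clean(dir)
	for {
		if found, exists := templates[dir]; exists {
			return found, found != ""
		}
		if dir == "." || dir == "/" {
			return "", false
		}
		dir = path.Dir(dir)
	}
}

func main() {
	templates := map[string]string{
		".":     "default index template",
		"notes": "", // empty file: no index for notes/ and its subdirectories
	}
	for _, dir := range []string{"posts/2021", "notes/private"} {
		tmpl, ok := findTemplate(dir, templates)
		fmt.Printf("%s: render=%v template=%q\n", dir, ok, tmpl)
	}
}
```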

H4-H6 should be handled per Gemini spec

First of all, thanks for making this! I plan on using it in a project.

The Gemini spec only supports three levels of headings: "Headings are limited to a single line and start with either one, two or three # symbols followed by one mandatory space character"

The extra #s generated by gmnhg seem to be inconsistently handled by some clients I've tried. It seems that the generated headings should be limited to three levels, although it would be nice to include some additional markup to help people distinguish H3 from H4-H6. It could be as simple as this:

### # Heading 4
### ## Heading 5
### ### Heading 6

Or maybe another character would be better?

### + Heading 4
### ++ Heading 5
### +++ Heading 6
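
A minimal sketch of this clamping, with the marker character left as a parameter so either variant above can be produced. The heading function is a hypothetical helper written for this example, not part of the renderer.

```go
package main

import (
	"fmt"
	"strings"
)

// heading renders a Markdown heading of the given level (1-6) as a Gemtext
// heading line. Levels above 3 are clamped to "###", and the extra depth is
// shown by repeating marker in front of the heading text.
func heading(level int, text, marker string) string {
	if level < 1 {
		level = 1
	}
	if level > 6 {
		level = 6
	}
	if level <= 3 {
		return strings.Repeat("#", level) + " " + text
	}
	return "### " + strings.Repeat(marker, level-3) + " " + text
}

func main() {
	for level := 1; level <= 6; level++ {
		fmt.Println(heading(level, fmt.Sprintf("Heading %d", level), "+"))
	}
}
```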

Optionally generate pages for taxonomies (tags, categories, etc)

https://gohugo.io/content-management/taxonomies/

I'd suggest duplicating the Hugo defaults, and using the defaults (tags/categories) OR the list from [taxonomies] if present, plus any exclusions from [disableKinds]. This could probably be handled in the same way as baseUrl and title, where overrides can be established in the [gmnhg] section.

Taxonomy index templates would be passed a list of pages with metadata, just like an index page.

I'll probably work on a PR for this when I have some free time.

Support TOML / JSON / org-mode post front matter

Hugo supports Markdown front matter for pages in TOML, YAML, JSON, or org-mode formats. They are specified like this:

TOML (+++ separator):

+++
key = value
+++
# Post content

Text.

org-mode (notice the lack of newlines between the last line of org-mode keywords and post content itself):

#+KEY: value
# Post content

Text.

JSON (identified as a JSON object with a newline after it):

{
"key": value
}

# Post content

YAML front matter is already implemented in gmnhg.

The front matter parsing code should also be removed from the library itself, as it's never been relevant to the Markdown -> Gemtext converter itself.

Render hooks for custom formatting of links

This may not be possible without ugly hacks, but I wanted to generate some discussion around it since it would be a useful feature.

There are some sites that publish on both Gemini and the web. If I am linking to such a blog from the web, I'll probably use the web link. If I'm linking from Gemini, I probably want to use the Gemini link. I wonder if there's any way to provide both links in such a way that the Hugo renderer will output the HTTP(S) link while gmnhg will output the Gemini link.

I looked at attributes, but they can't yet be applied to links in Commonmark (goldmark) or blackfriday (gomarkdown), although there has been discussion about it: see commonmark/commonmark-spec#105 and russross/blackfriday#181

The one thing that is supported by both Commonmark and blackfriday, and appears to be supported in gomarkdown, is the "title" attribute. I suppose that could be abused to hold an alternate Gemini link, and on the Hugo side a custom link template (layouts/_default/_markup/render-link.html) could be used to ignore that attribute entirely. But this feels wrong.

Here's what it would look like:

[My wish-list for the next YAML](https://drewdevault.com/2021/07/28/The-next-YAML.html "gemini://drewdevault.com/2021/07/28/The-next-YAML.gmi")

I don't think that the markup looks so bad, it's just that the abuse of the title attribute feels dirty. And of course there must be someone out there who uses the title attribute for legitimate purposes.

Any thoughts on a better approach for this? Or is this best shelved for some future date when link attributes are available?

Make front matter data accessible to index pages

gmnhg currently assumes users will type the index page title and other metadata unrelated to content right in the Markdown file, essentially controlling rendering by themselves. This makes a user unable to use _index.md as the single source of index content for both the Gemini and the Web site.

This is partially why _index.gmi.md was a thing at all: without a way to render metadata that would usually be controlled by the template, the user would need an additional copy of the index page with the metadata tossed in.

Tables do not render

Tables don't currently render. My suggestion is to print them inside a preformatted text block:

| Syntax      | Description |
| ----------- | ----------- |
| Header      | Title       |
| Paragraph   | Text        |

This should be legible in all clients and should translate more or less directly from Markdown. Using preformatted text will ensure that extra-long tables don't wrap.

One approach would be something like this:

  • Iterate over the table, counting characters in each cell to determine the maximum width of each column.
  • Render each row, padding each cell to the maximum width of the column.
  • While rendering rows, add in the header separator, borders, and border spacing as needed. The number of dashes in the header separator should be the same as the maximum width of the column.
  • To handle cell alignment, either do nothing (left aligned), move the cell padding to the left side (right aligned), or distribute cell padding evenly on left and right (center aligned).

Technically gomarkdown supports colspan > 1, but I haven't seen this in Hugo markdown and I'm not sure it's even supported. An initial implementation could probably ignore this for simplicity. Hypothetical tables with colspan > 1 would still render, they would just be misaligned.
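
The steps above can be sketched roughly as follows. This is an illustration only: it works on plain string cells rather than gomarkdown AST nodes, the names renderTable, pad, and align are made up for the example, colspan is ignored, and width is measured in runes, which will be off for wide or combining characters.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

type align int

const (
	left align = iota
	right
	center
)

// fence is the Gemtext preformatted toggle line.
var fence = strings.Repeat("`", 3)

// pad pads s with spaces to the given width according to the alignment.
func pad(s string, width int, a align) string {
	gap := width - utf8.RuneCountInString(s)
	if gap <= 0 {
		return s
	}
	switch a {
	case right:
		return strings.Repeat(" ", gap) + s
	case center:
		before := gap / 2
		return strings.Repeat(" ", before) + s + strings.Repeat(" ", gap-before)
	default:
		return s + strings.Repeat(" ", gap)
	}
}

// renderTable renders rows (rows[0] is the header) as a preformatted block.
func renderTable(rows [][]string, aligns []align) string {
	if len(rows) == 0 {
		return ""
	}
	// First pass: find the maximum width of each column.
	widths := make([]int, len(rows[0]))
	for _, row := range rows {
		for i, cell := range row {
			if n := utf8.RuneCountInString(cell); i < len(widths) && n > widths[i] {
				widths[i] = n
			}
		}
	}
	// Second pass: render rows, adding borders and the header separator.
	var b strings.Builder
	b.WriteString(fence + "\n")
	for r, row := range rows {
		b.WriteString("|")
		for i, w := range widths {
			cell, a := "", left
			if i < len(row) {
				cell = row[i]
			}
			if i < len(aligns) {
				a = aligns[i]
			}
			b.WriteString(" " + pad(cell, w, a) + " |")
		}
		b.WriteString("\n")
		if r == 0 {
			b.WriteString("|")
			for _, w := range widths {
				b.WriteString(" " + strings.Repeat("-", w) + " |")
			}
			b.WriteString("\n")
		}
	}
	b.WriteString(fence + "\n")
	return b.String()
}

func main() {
	fmt.Print(renderTable([][]string{
		{"Syntax", "Description"},
		{"Header", "Title"},
		{"Paragraph", "Text"},
	}, nil))
}
```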

Problem with footnotes

Not sure what's happening here, will take a look when I have some time.

Markdown:

Itatur? Quiatae cullecum[^1] rem ent aut odis in re eossequodi nonsequ idebis ne sapicia[^2] is sinveli squiatum, core et que aut hariosam[^3] ex eat.

[^1]: Example footnote
[^2]: Example footnote with link to [example.com](https://example.com)
[^3]: [example.com](https://example.com)

Output:

Itatur? Quiatae cullecum[^1] rem ent aut odis in re eossequodi nonsequ idebis ne sapicia[^2] is sinveli squiatum, core et que aut hariosam^3 ex eat.

=> [example.com](https://example.com) ^3

[^1]: Example footnote [^2]: Example footnote with link to example.com

=> https://example.com example.com

Migrate to Goldmark

gomarkdown, a fork of Blackfriday, is a large source of weird behavior (#33, #6, footnotes-oneliners, etc, etc), and can possibly lead to gmnhg flavor of Markdown being incompatible with current Hugo's defaults in unexpected ways.

As Hugo migrated to yuin's Goldmark in v0.60, there's little reason to stay with gomarkdown.

Shortcode templates

Just wanted to put this out there for consideration... it would be helpful to have some kind of shortcode processing in gmnhg. Given the variety of shortcodes available, as well as the ability for people to define custom shortcodes, it would make sense to build this around the idea of user-defined templates.

Here's one approach:

  • Shortcode templates will be placed in a specified folder and must be named to match the desired shortcode (highlight.go would match the built-in {{< highlight >}} shortcode, and foo/bar.go would match {{< foo/bar >}}).
  • Before parsing with gomarkdown, input files are parsed for shortcodes. This process will have to take into account unpaired, paired, and self-closing shortcodes. The templates themselves will have to handle "nested" shortcodes; the parser will not bother with this. This should simplify parsing somewhat.
  • Each shortcode template presents a common method with arguments that represent both shortcode parameters (if any) and inner content (if any). The method should return the processed text for rendering.
  • Depending on whether the shortcode was called with {{% %}} (further rendering requested) or {{< >}} (present content without further rendering), the template output will either be inlined with the text and passed to gomarkdown, or it will be set aside for inclusion in the final rendered text as-is. The latter could be accomplished by replacing the template with some kind of temporary token like {{<1>}}, which will be replaced after rendering; there may be better approaches as well.

List of links should be formatted as links

Markdown:

* [Gemini specification](https://gemini.circumlunar.space/docs/specification.gmi)
* [Gemtext markup specification](https://gemini.circumlunar.space/docs/gemtext.gmi)

Output:

* Gemini specification
* Gemtext markup specification

Proposed output:

=> https://gemini.circumlunar.space/docs/specification.gmi Gemini specification
=> https://gemini.circumlunar.space/docs/gemtext.gmi Gemtext markup specification

My suggestion if a list contains both link-only and non-link items is to split the list up. Not perfect, but at least no information is lost. Example:

Markdown:

* Additional info [TBD]
* [Gemini specification](https://gemini.circumlunar.space/docs/specification.gmi)
* [Gemtext markup specification](https://gemini.circumlunar.space/docs/gemtext.gmi)
* Item with [inline link](https://www.example.com)
* Even more links coming soon...

Proposed output:

* Additional info [TBD]

=> https://gemini.circumlunar.space/docs/specification.gmi Gemini specification
=> https://gemini.circumlunar.space/docs/gemtext.gmi Gemtext markup specification

* Item with inline link

=> https://www.example.com inline link

* Even more links coming soon...

From glancing at the list/link code, I think it's supposed to work this way already, but there's a bug somewhere. I'll dig into it more when I have time.
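
For reference, the splitting rule proposed above can be sketched like this. The item type and renderList function are hypothetical simplifications for this example: a real implementation would inspect gomarkdown list nodes, and the handling of inline links inside regular items (the "Item with inline link" case) is left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// item is a simplified list item. LinkOnly marks items that consist of a
// single link pointing at URL; Text is the visible text.
type item struct {
	Text     string
	URL      string
	LinkOnly bool
}

// renderList groups consecutive items of the same kind into blocks:
// link-only items become Gemtext link lines, everything else stays a list
// item. Blocks are separated by a blank line.
func renderList(items []item) string {
	var blocks, current []string
	flush := func() {
		if len(current) > 0 {
			blocks = append(blocks, strings.Join(current, "\n"))
			current = nil
		}
	}
	for i, it := range items {
		// Start a new block whenever the item kind changes.
		if i > 0 && it.LinkOnly != items[i-1].LinkOnly {
			flush()
		}
		if it.LinkOnly {
			current = append(current, "=> "+it.URL+" "+it.Text)
		} else {
			current = append(current, "* "+it.Text)
		}
	}
	flush()
	return strings.Join(blocks, "\n\n")
}

func main() {
	fmt.Println(renderList([]item{
		{Text: "Additional info [TBD]"},
		{Text: "Gemini specification", URL: "https://gemini.circumlunar.space/docs/specification.gmi", LinkOnly: true},
		{Text: "Gemtext markup specification", URL: "https://gemini.circumlunar.space/docs/gemtext.gmi", LinkOnly: true},
		{Text: "Even more links coming soon..."},
	}))
}
```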

Links are stripped from footnotes

Original Markdown:

Itatur? Quiatae cullecum[^1] rem ent aut odis in re eossequodi nonsequ idebis ne sapicia[^2] is sinveli squiatum, core et que aut hariosam[^3] ex eat.

[^1]: Example footnote
[^2]: Example footnote with link to [example.com](https://example.com)
[^3]: [example.com](https://example.com)

Output:

Itatur? Quiatae cullecum[^1] rem ent aut odis in re eossequodi nonsequ idebis ne sapicia[^2] is sinveli squiatum, core et que aut hariosam[^3] ex eat.

[^1]: Example footnote
[^2]: Example footnote with link to example.com
[^3]: example.com

Note that the links in the footnotes are completely lost. I guess it would make sense to put the link right after the footnote, like this:

Itatur? Quiatae cullecum[^1] rem ent aut odis in re eossequodi nonsequ idebis ne sapicia[^2] is sinveli squiatum, core et que aut hariosam[^3] ex eat.

[^1]: Example footnote
[^2]: Example footnote with link to example.com

=> https://example.com example.com

[^3]: example.com

=> https://example.com example.com

Unclear documentation about templates and "top"

I'm struggling to understand how the layouts work. I want to generate indexes but I don't understand where the "top" directory is supposed to be; I tried multiple places with no luck. The documentation is unclear and should specify more things (preferably it shouldn't be a series of comments inside the code either, because the GitHub Wiki feature is made for that).

Image issues

Images on my site don't seem to work in any clients, either on my live site or when testing locally. (I've just started dithering the images on my site, but this was also an issue before that.) I'm starting to wonder if gmnhg could be doing something to the images during the copy process that is messing with the ability of clients to display them? The thing is, the images display perfectly fine if I open them from the filesystem.

If you want to see an example, check out gemini://mntn.xyz/test-markdown-syntax and look under "Images." Or for a direct link: gemini://mntn.xyz/test-markdown-syntax/PIXNIO-2545657-5760x3840.gif

Server returns code 20 (ok) when the image is loaded, so it's not that. If I download the image using a client like gmni and view it externally, it displays just fine. If I try to view it in Lagrange or Ariane, it doesn't work. Extremely weird behavior.

Edit: only thing I can think of is that maybe gmnhg is chopping off a byte or something like that. Something that more forgiving image viewers might be able to handle but which the clients cannot.

add option to render links as they are on markdown

Basically, when it creates the links at the bottom, it breaks the view and format.
I have a long list of software I use, like:

  • Desktop:
  • Web:
  • rss feeds:

Putting the links at the bottom of the page is super annoying for someone who just wants to go to a section and click on the software link there, as I have it in Markdown.

as an example this is the result from parsing this:
https://git.sr.ht/~rek2/dotfiles/tree/main/item/README.md
to this:
gemini://rek2.hispagatos.org/software.gmi

In the meantime I found this other tool that does exactly what I need: https://github.com/makeworld-the-better-one/md2gemini

I'd rather use a compiled tool written in Go, Rust, C, etc., so if you ever add this feature I will switch. Thanks.

HTML tags in blockquotes are not stripped

Initially discovered in #5.

Despite (Renderer).paragraph() utilizing (mostly) the same logic as (Renderer).blockquote(), raw HTML is stripped from text paragraphs, but not from blockquotes. Appears to be a gomarkdown issue.

Blockquote with an HTML line break

Blockquotes

Just wanted to note a couple of issues:

  • Multiline blockquotes don't render properly; renderer should be able to handle line breaks and blank quoted lines
  • HTML tags in blockquotes are not stripped (the typical Hugo markdown example page uses <br> and <cite> which are present in the output)

Task list formatting

These should probably be formatted as regular lists, right now they are collapsed into a single line.

Markdown:

Example task list
- [x] Theme website
- [x] Write formatting test page
- [ ] Fix the bugs

Output:

Example task list - [x] Theme website - [x] Write formatting test page - [ ] Fix the bugs

Atom feeds

Creating "gemfeeds" is easy enough, but CAPCOM requires Atom feeds. I was thinking that maybe I could add Atom feed generation through a new type of template. Gmnhg could look for gmnhg/index.xml.gotmpl, and if it exists it would use that file for generating the feed /index.xml. Sub-feeds would be automatically generated for each branch node and placed into the proper folder, for instance /posts/index.xml would contain a feed with only posts.

Future enhancement for someone who wants to write it: add top/posts.xml.gotmpl to override the feed template for posts, top/series/one.xml.gotmpl to override the feed template for series/one, etc. I don't feel that this is necessary for a 1.0.

Thoughts? I'd be glad to work on a patch.

Escape URIs in links

In Gemtext, the URI and the alt text are separated with a space, meaning links containing spaces will be broken after render. Minimal reproduction sample:

[I am a link with spaces](content/filename with spaces.gmi).

A single pass of net/url.PathEscape() should be enough.
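
One detail worth keeping in mind: url.PathEscape from Go's net/url package treats its input as a single path segment, so it also escapes "/" as %2F. The sketch below therefore escapes each segment separately. It assumes the link target is a plain relative path (no scheme, query string, or already-escaped characters); escapePath and linkLine are hypothetical helper names for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapePath escapes every segment of a relative path separately, so that
// "/" separators survive while spaces become %20.
func escapePath(p string) string {
	segments := strings.Split(p, "/")
	for i, s := range segments {
		segments[i] = url.PathEscape(s)
	}
	return strings.Join(segments, "/")
}

// linkLine builds a Gemtext link line from a link target and its text.
func linkLine(target, text string) string {
	return "=> " + escapePath(target) + " " + text
}

func main() {
	fmt.Println(linkLine("content/filename with spaces.gmi", "I am a link with spaces"))
}
```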

Write autotests for renderer

This is the thing md2gmn was supposed to assist with, but manually testing input Markdown with it was never super convenient.

internal/renderer should include a number of test cases for at least the most commonly used Markdown features and their Gemtext representations, setting the baseline of what we expect the processed Gemtext to look like.

Definition list formatting

Might be nice to change up the formatting, maybe the term could be on one line and the definitions could be list items, with a space between terms? List type can be determined through a List's ListFlags.

Markdown:

saudade
: (n.) in Portuguese culture, a deep emotional state of melancholic longing for a person or thing that is absent

agape
: (n.) a Greek word meaning “sacrificial love”
: (adj.) wide open

Current output:

* saudade
* (n.) in Portuguese culture, a deep emotional state of melancholic longing for a person or thing that is absent
* agape
* (n.) a Greek word meaning “sacrificial love”
* (adj.) wide open

Proposed output:

saudade
* (n.) in Portuguese culture, a deep emotional state of melancholic longing for a person or thing that is absent

agape
* (n.) a Greek word meaning “sacrificial love”
* (adj.) wide open
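
A minimal sketch of the proposed layout, working on an already-grouped list of terms and meanings. The definition type and renderDefinitions function are hypothetical; in the renderer itself the grouping would have to come from the parsed list nodes and their ListFlags.

```go
package main

import (
	"fmt"
	"strings"
)

// definition is a term together with one or more meanings.
type definition struct {
	Term     string
	Meanings []string
}

// renderDefinitions puts each term on its own line, renders its meanings as
// list items, and separates terms with a blank line.
func renderDefinitions(defs []definition) string {
	blocks := make([]string, 0, len(defs))
	for _, d := range defs {
		lines := []string{d.Term}
		for _, m := range d.Meanings {
			lines = append(lines, "* "+m)
		}
		blocks = append(blocks, strings.Join(lines, "\n"))
	}
	return strings.Join(blocks, "\n\n")
}

func main() {
	fmt.Println(renderDefinitions([]definition{
		{Term: "saudade", Meanings: []string{"(n.) a deep emotional state of melancholic longing"}},
		{Term: "agape", Meanings: []string{"(n.) a Greek word meaning \"sacrificial love\"", "(adj.) wide open"}},
	}))
}
```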

Problem with bullets in blockquotes

I was quoting a post where someone used bullets and it concatenated all the bulleted text onto a single line.

This is an example of a broken blockquote

  • Item 1
  • Item 2
  • Item 3

Becomes

This is an example of a broken blockquote

Item 1Item 2Item 3

I'll look at it when I have some free time, I'm sure it's something about the handling of lists inside of blockquotes.
