Code Monkey home page Code Monkey logo

Comments (28)

josevalim avatar josevalim commented on August 16, 2024 1

I will have to think more about it. Right now, it is clear what we would get from it and its benefits, but we still need to decide if we want to go down this route and what coupling it introduces. :) Thank you!

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

Hi @pnezis, thanks for the proposal.

The example you proposed doesn't work semantically speaking (you can't invoke a function before you define it) but I think the embed idea is valid. I am wondering though if we need the embed:: prefix or if we just allow you to hijack anything. If we do the latter, we could even implement the current syntax highlighter as "embeds".

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

Hi @josevalim , thanks for the quick reply

The example you proposed doesn't work semantically speaking (you can't invoke a function before you define it

Correct, semantically the interpolation shouldn't work. It was just a demonstration of what we cannot do and how we can overcome this with the proposed "embed" functionality. Such a functionality could open up the road for auto-generated doc segments, for example you could have special embeds for ecto schemas:

defmodule User do
    @moduledoc """
    A `User` object

    ```ecto-schema
    function_that_auto_generates_ecto_schema_docs(User)
    ``` 
    use Ecto.Schema
    ... 
"""

Another example is the mermaid graph in the POC project, where this function doc:

image

is transformed to this rich version:

image

I am wondering though if we need the embed:: prefix or if we just allow you to hijack anything. If we do the latter, we could even implement the current syntax highlighter as "embeds".

You are right, we could hijack anything the embed:: prefix was just a POC suggestion. But I think we should limit it to fenced code blocks.

Also embeds could be exported from dependencies, which you could use directly in your project for enhancing the documentation, for example ecto could expose an embed that auto generates schema docs, and vega_lite could export an embed that pretty-prints a VegaLite plot.

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

What do you think about this:

  1. We introduce the concept of FenceProcessor.

  2. For each fenced code block, we call the FenceProcessor.process(format, fence, contents). You return {:ok, binary} | :skip.

  3. We allow folks to register custom fence processors to be called in order. The current syntax highlighter could be a fence processor (which we will use to dog food this API).

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

At which level do you see this to be applied? The syntax highlighter is applied once the original markdown has been transformed from AST to string and transforms the string to html.

I would expect this to be also applicable at the AST level. This way the output of the FenceProcessor could be plain markdown, e.g.

def process(_format, fence, contents) do
   """
   Returns a `markdown` formatted version of:

   ```elixir
   #{contents}
   ``` 
   """
end

We could add this in the ExDoc.Markdown.to_ast, something like:

defmodule ExDoc.Markdown

  def to_ast(text, opts \\ []) when is_binary(text) do
      {processor, options} = get_markdown_processor()

      text
      |> processor.to_ast(options |> Keyword.merge(opts))
      # add this
      |> apply_fence_processors(opts)
   end

   def apply_fence_processors(ast, opts)
        # call FenceProcessor.process on each valid fence block contents 
        # convert the returned binary to ast and replace the original
        # ast's code block with the new one 
   end 

Applying this to the string level is also valid. What if:

  • We make FenceProcessor a behaviour?
  • Add two optional callbacks:
    • ast_process
    • string_process

We allow folks to register custom fence processors to be called in order.

Shouldn't the fence processors be somehow mapped to the fence language, something like:

fence_processors: [
     elixir: [ExDoc.FenceProcessors.ElixirSyntaxHighlighter] 
     elixir_fancy: [
         ExDoc.FenceProcessors.ElixirSyntaxHighlighter,
         MyCustomElixirFenceProcessor
     ],
     vega_lite: [
          MyVegaLiteFenceProcessor
     ]
]

What do you think?

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

I don't want to pre-process the markdown because that comes with a lot of complexity.

Shouldn't the fence processors be somehow mapped to the fence language, something like:

Not necessary. The preprocessor will receive all fenced languages and, if it doesn't care about it, it will return :skip.

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

I don't want to pre-process the markdown because that comes with a lot of complexity.

Not the markdown but the AST, this seems to me straight forward (except if I am missing something). Otherwise how would the FenceProcessor support complex output? For example look at the implementation of the mermaid renderer in the demo project.

It returns a mardown string including two fenced blocks:

"""
```elixir
#{code}
```

```mermaid
#{mermaid}
```
"""

How would we handle this if the fence processing takes place at the string level? These fenced blocks should also be "fence processed".

Not necessary. The preprocessor will receive all fenced languages and, if it doesn't care about it, it will return :skip

Correct but you will lose composability this way. Imagine you have an Elixir fence processor and a generic processor that puts a code block in a fancy div. Now you want to use this fancy div only on some elixir code blocks.

By specifying an elixir_fancy language like this:

elixir: [ElixirFenceProcessor] 
elixir_fancy: [
      ElixirFenceProcessor,       
      FancyDivFenceProcessor
],

you will be able to have both block types in your codebase

```elixir
# this is the default elixir block
```

```elixir_fancy
# this will use the fancy processor on top of the default elixir processor
``` 

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

How would we handle this if the fence processing takes place at the string level? These fenced blocks should also be "fence processed".

Then you call the fences API recursively.

Correct but you will lose composability this way.

I am pretty sure you can call the fences API recursively here too. :) We will expose something like ExDoc.FencesProcessor.process(format, language, content).

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

Then you call the fences API recursively.

Correct but you will have to implement the markdown to html conversion again internally in your Processor. Right?

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

Yes, but I can't a good way to do pre-processing in Markdown, in a way we can inject HTML (necessary for syntax highlighting) and without coupling to Earmark. And your markdown example could easily return the HTML bits directly, no?

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

For the syntax highlighting use case, I totally agree this is the best way to proceed. But FenceProcessors could be something much more versatile.

And your markdown example could easily return the HTML bits directly, no?

It could return the HTML bits directly yes but I see some drawbacks here:

  • any change on the way ExDoc renders markdown would make the processors inconsistent
  • it wouldn't handle existing functionality e.g. autolinks
  • it would make the initial purpose of this proposal, actual code evaluation, very difficult since the transformed string will be escaped HTML.

Let's look at a concrete example. Let's imagine we want to implement a fence processor that:

  • Evaluates the inline content
  • Returns two fenced elixir blocks the first with the initial content and the second with the evaluated result
```inspect
Enum.map([1, 2, 3], fn x -> 2*x end)
```

Working with the AST

  • The implementation is straight forward since the content is verbatim the code in the original inspect block
def process(_format, "inspect", content) do
   {result, _} = Code.eval_string(code, [], __ENV__)

    """
    ```elixir
    #{code}
    ```

    ```elixir
    #{inspect(result)}
    ```
end
  • We only have to traverse the markdown ast, apply the processor on the matching pre blocks and replace the pre inspect block:
[
  {:pre, [],
   [{:code, [class: "inspect"], ["Enum.map([1, 2, 3], fn x -> 2*x end)"], %{}}],
   %{}}
]

with the processed one:

[
  {:pre, [],
   [{:code, [class: "elixir"], ["Enum.map([1, 2, 3], fn x -> 2*x end)"], %{}}],
   %{}},
  {:pre, [], [{:code, [class: "elixir"], ["[2, 4, 6]"], %{}}], %{}}
]
  • The output of the process function can be any markdown text, no need to handle the html internally, everything will be handled by ex_doc, autolinks will also work

Working on the string level

If I have understood correctly we will call the processors on the DocAST.to_string output so in this example the string will be:

<pre><code class=\"inspect\">Enum.map([1, 2, 3], fn x -&gt; 2*x end)</code></pre>

which means that the content part of the process will be Enum.map([1, 2, 3], fn x -&gt; 2*x end)

In this case we need to

  • unescape the string
  • apply the processor's logic
  • manually generate the html bits

What if we support both, e.g. add make the process signature:

FenceProcessor.process(format, fence, mode, contents)

where mode is one of :ast, :string?

Sorry for the back and forth, I am probably missing something.

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

We only have to traverse the markdown ast

I found the point of confusion. That's not the markdown ast, that's html. :) My proposal is that we will indeed match on the HTML AST but we will require an HTML string to be returned.

The output of the process function can be any markdown text, no need to handle the html internally, everything will be handled by ex_doc, autolinks will also work

Well, except it doesn't just work in corner cases. For example, what happens if the code you are inspect contains ``` in it? Now you need to escape it. I think Markdown is tricky as an auto-generated format due to all of its ambiguities.

Working with HTML is not much more complex, your inspect snippet would be implemented as:

def process(_format, "inspect", content) do
  {result, _} = Code.eval_string(code, [], __ENV__)

  with {:ok, content} <- ExDoc.FenceProcessor.format(format, "elixir", content),
       {:ok, result} <- ExDoc.FenceProcessor.format(format, "elixir", inspect(result)),
       do: {:ok, content <> result}
end

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

found the point of confusion. That's not the markdown ast, that's html. :)

That makes sense now, I was sure I was confused :)

Working with HTML is not much more complex

How would you handle a processor that wants to return something like:

def process(_format, "inspect", _content) do
    """
    This is a **rich** response, containing

    * lists
       * with sublists
    
    > Some quoted text

    ### A header

    This is an `Enum` (with an autolink) example:

    ```elixir
    Enum.map([1, 2, 3], fn x -> 2*x end 
    ```  
    """ 
end

For example, what happens if the code you are inspect contains ``` in it? Now you need to escape it.

Valid point

I think Markdown is tricky as an auto-generated format due to all of its ambiguities.

I agree but the docs are already in markdown. Also a lot of libraries have helper functions that generate markdown docs, take for example NimbleOptions.docs. You could use it directly in a fenced block without having to convert it to html.

def process(_format, "nimble_options", content) do
    {schema, _} = Code.eval_string(content, [], __ENV__)
    
    NimbleOptions.docs(schema)
end

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

How would you handle a processor that wants to return something like:

Do we have valid use cases for it? Perhaps the NimbleOptions is one but we can as easily interpolate it at compile-time nowadays?

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

It's me again @josevalim, now that I am thinking it again, if we expose an ExDoc.markdown_to_html we could handle both use cases:

def process(_format, "inspect", _content) do
    """
    This is a **rich** response, containing

    * lists
       * with sublists
    
    > Some quoted text

    ### A header

    This is an `Enum` (with an autolink) example:

    ```elixir
    Enum.map([1, 2, 3], fn x -> 2*x end 
    ```  
    """ 
    |> ExDoc.markdown_to_html()
end

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

Perhaps. The question is: can we expose something like that? I am afraid auto-linking requires further context that comes just later in the stack but I may be wrong.

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

Do we have valid use cases for it? Perhaps the NimbleOptions is one but we can as easily interpolate it at compile-time nowadays?

We can interpolate it at compile time if you define the schema as a module attribute. I often define schemas in a callback with complex logic which makes it impossible to interpolate. So I call the docs in another module and not the one where the schema is defined in order for interpolation to work.

Another use case is auto-generated documentation for ecto schemas. Currently I am having a function similar to NimbleOptions.docs:

defmodule MySchema do
    @moduledoc """
    #{ SchemaDocs.docs(__ENV__.file, "my_schema")}
    """
end

which loads the file, traverses the AST, and auto-generates the schema docs.

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

I am afraid auto-linking requires further context that comes just later in the stack but I may be wrong.

We can do it without having the context of the module. Which means local references won't work, which will likely break usage within NimbleOptions and perhaps Ecto schemas, so I don't think that's a viable path.

It seems in order to make this work we would have to couple several different parts of ExDoc and I am not sure the costs justify the benefits.

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

We can do it without having the context of the module. Which means local references won't work, which will likely break usage within NimbleOptions and perhaps Ecto schemas, so I don't think that's a viable path.

If you do the injection directly in the Markdown.to_ast this will work. Proof of concept:

  @doc """
  A `NimbleOptions` schema

  ```nimble_options
  MyModule.schema()
  ```
  """
  def schema do
    [
      name: [
        required: true,
        type: :atom,
        doc: "See also `echo_name/1`"
      ]
    ]
  end
image

Would you like to see a PR first and discuss the details there?

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

Yes, let's see a PR, because I can't see a straight-forward way of doing that, but maybe there is. :)

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

Thanks for the POC! I was missing that the autolinking actually happens in the HTML AST, so it is all good. I think we can do this contract:

{:html, content} | {:markdown, content} | {:next, content}

The first means we returned HTML, we return markdown, or we return updated content to continue to go through the pipeline (solving the recursion case).

WDYT?

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

I am not that familiar with the ex_doc codebase so correct me if I am wrong. I think the function under discussion is this one:

 defp autolink_and_render(doc, language, autolink_opts, opts) do
    doc # This is an HTML AST
    |> language.autolink_doc(autolink_opts)
    |> ExDoc.DocAST.to_string()
    |> ExDoc.DocAST.highlight(language, opts)
  end
  • If we want to support markdown expansion we need to update the HTML ast before the autolink_doc call
  • The syntax highlighting will take place after it.

So we need to expand the fences twice and we need to indicate at which point of the pipeline we are:

 defp autolink_and_render(doc, language, autolink_opts, opts) do
    doc
    |> apply_fence_processing(:markdown) # apply markdown expansion and update the AST like in the example
    |> language.autolink_doc(autolink_opts)
    |> ExDoc.DocAST.to_string()  # this will also apply fence processing internally and the current highlight
                                 # function will be an `elixir` FenceProcessor 
  end

So should the signature of the process be:

@spec process(format :: binary(), fence :: binary(), content :: {:html | :markdown, binary()} )

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

You are right. So basically my idea of integrating Makeup into this wouldn't actually work. We need to keep them as distinct operations.

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

Would you be interested on the functionality proposed by the PR #1719? IMHO it would help with auto-generated doc segments in many places.

If yes I can polish the existing PR, and create a new one for the syntax highlighting part.

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

Hi @josevalim, thinking it again I don't think this should be added as default behaviour to ex_doc. Since you can define markdown_processor I could just create an earmark wrapper that post-processes the ast.

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

@pnezis that's actually a great idea. Please give it a try and let me know if you run into any issues :)

from ex_doc.

pnezis avatar pnezis commented on August 16, 2024

@josevalim it works like a charm :)

Wrapper: https://github.com/pnezis/fancy_fences
Sample config on the POC project :

      markdown_processor:
        {FancyFences,
         [
           fences: %{
             "inspect" => {FenceProcessors, :io_inspect, []},
             "nimble_options" => {FenceProcessors, :nimble_options, []},
             ...
           }
         ]},

Thanks for all the help, feel free to close the issue

from ex_doc.

josevalim avatar josevalim commented on August 16, 2024

❤️ !!!

from ex_doc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.