mlc's People

Contributors

alexiri, becheran, dependabot[bot], diegorondini, g3rv4, hoijui, melvillian, paulhazen, rubo3

mlc's Issues

List of features

Could you maybe add a list of features to the README?

Things I'd be interested in knowing:

  • Does it only check links to full URLs, or also links to local files?
  • Does it check references? (the part of a URL after the #)
  • Does it check HTML links in Markdown files?

Surely there is more that would make sense in the list, but these are most interesting for me.
I need a tool to check consistency of machine documentation, and local files and references are very important for that.

Allow specifying HTTP request parameters

Is your feature request related to a problem? Please describe.
Some URLs require specific HTTP request parameters.
The GitHub docs pages are one example. This .md file fails:

$ cat mdtest.md 
= Test =

[Github docs link](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository)

$ mlc

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.15.2             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[Err ] ./mdtest.md (3, 1) => https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository - 403 - Forbidden

Result (1 links):

OK       0
Skipped  0
Warnings 0
Errors   1


The following links could not be resolved:

./mdtest.md (3, 1) => https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository.

The reason is that the page requires specific HTTP headers:
community/community#14773

Describe the solution you'd like
It would be nice to have a way to specify HTTP request parameters, possibly per-URL.

Panic when deleting file while checking

thread 'main' panicked at 'File could not be opened.: Os { code: 3, kind: NotFound, message: "Das System kann den angegebenen Pfad nicht finden." }', src\libcore\result.rs:1165:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.

Support reporting redirections

Is your feature request related to a problem? Please describe.
I'd love to have the ability to report redirections as failures. If there's a 301 redirect in place, I'd love to flag it as broken so that I can fix it and link to the appropriate place.

Describe the solution you'd like
An extra flag --report-redirections could work. Maybe it could even be --treat-status-codes-as-failure where we could pass status codes that are not successful from our PoV. I'd do --treat-status-codes-as-failure "301,307"
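As a rough sketch of how the proposed flag's value could be handled: the function names below are hypothetical and not part of mlc, and the rule that codes of 400 and above always fail is an assumption made for illustration.

```rust
use std::collections::HashSet;

/// Parse a comma-separated list such as "301,307" into a set of status codes.
/// Hypothetical helper; returns an error naming the first invalid entry.
fn parse_status_codes(raw: &str) -> Result<HashSet<u16>, String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| {
            s.parse::<u16>()
                .map_err(|_| format!("invalid status code: {}", s))
        })
        .collect()
}

/// Assumption: 4xx/5xx always fail; any other code fails only if listed.
fn is_failure(status: u16, treat_as_failure: &HashSet<u16>) -> bool {
    status >= 400 || treat_as_failure.contains(&status)
}
```

With "301,307" passed in, a 301 would be reported as a failure while a 302 would still pass.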

Describe alternatives you've considered
I looked at the different cli options, but I couldn't find something that helps here

Additional context
I really like to reduce points of failure :) I have 0 experience with Rust, but if this sounds like something you'd like to support, I'd be happy to attempt a PR.

Support for ignore link list from a file

Is your feature request related to a problem? Please describe.
At the moment the links to ignore are provided as option parameters on the command line:

mlc -i '<mylinktoignore>'

This is inconvenient if mlc is run during a pipeline, as adding / removing a link to ignore requires changing the pipeline config file.

Describe the solution you'd like
Allow specifying a file that provides the list of links to ignore. Changing the list would then only require editing that file, much like editing .gitignore to specify the files and directories that git should ignore.
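A minimal sketch of what reading such a file could look like. The file format (one pattern per line, '#' for comments) and both function names are assumptions for illustration; mlc does not define them.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parse the contents of a hypothetical ignore file: one link pattern per
/// line; blank lines and lines starting with '#' are skipped.
fn parse_ignore_list(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Read the ignore file from disk and parse it.
fn read_ignore_file(path: &Path) -> io::Result<Vec<String>> {
    Ok(parse_ignore_list(&fs::read_to_string(path)?))
}
```

Keeping the parsing separate from the file access makes the format easy to test without touching the disk.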

Evaluate keeping compatibility with some not-so-old glibc

If you try to run the precompiled mlc-x86_64-linux binary on latest stable Debian 11 bullseye you'll get:

# ./mlc
./mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./mlc)
./mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./mlc)
./mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./mlc)

This is due to Debian 11 using glibc 2.31:

# ldd --version
ldd (Debian GLIBC 2.31-13+deb11u5) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

The glibc version on Debian is not so old, so it would be nice to build the mlc binary with compatibility for Debian stable.

A similar issue is discussed here:
atanunq/viu#68

Thank you

Directory parameter is ignored if provided after an option

Describe the bug
Unlike what is described in the --help, the directory parameter, at least in the following specific case, seems to be ignored when provided after an option.

To Reproduce
The following command reports broken links that are in <workdir>, rather than just <workdir>/<myfolder>:

mlc -i '<mylinktoignore>' <myfolder>

This one works as expected (only broken links from <workdir>/<myfolder>):

mlc <myfolder> -i '<mylinktoignore>'

Expected behavior
Syntax indicated by:

mlc --help
...
USAGE:
    mlc [FLAGS] [OPTIONS] [directory]
...

should work.

Links with spaces are not parsed correctly

Describe the bug
A link with a space in the filename, such as <a href="test file.html"> is parsed as pointing to test. A link with an escaped space is parsed as the full filename, but the escape is not converted back to a space so it expects test%20file.html to exist on disk.

To Reproduce

mlc/jms_test % ls
test.html  testing 2.html
mlc/jms_test % cat test.html
<html>
    <body>
        <a href="testing 2.html">hello</a>
        <a href="testing%202.html">hello</a>
    </body>
</html>
mlc/jms_test % mlc

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.16.1             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[Err ] ./test.html (4, 3) => testing%202.html - Target filename not found.
[Err ] ./test.html (3, 3) => testing - Target not found.

Result (2 links):

OK       0
Skipped  0
Warnings 0
Errors   2


The following links could not be resolved:

./test.html (4, 3) => testing%202.html
./test.html (3, 3) => testing

Expected behavior

  • Spaces in filenames are parsed as part of links
  • and/or %20 in filenames are converted to spaces before checking local filenames (and maybe other escapes too)
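The second expectation could be sketched as a small decoding step applied before looking up a local file. This is an illustration using only the standard library, not mlc's code; the function name is hypothetical.

```rust
/// Decode %XX escapes in a link target, e.g. "testing%202.html" becomes
/// "testing 2.html". Malformed escapes are kept verbatim, and invalid
/// UTF-8 in the decoded bytes is replaced lossily.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape needs two hex digits after the '%'.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```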

Desktop (please complete the following information):

  • OS: macOS
  • Browser N/A
  • Version N/A

Additional context
I added the following test to validate my theory, but my rust is not good enough to fix the bug :(

diff --git a/src/link_extractors/html_link_extractor.rs b/src/link_extractors/html_link_extractor.rs
index f475514..1d83f8b 100644
--- a/src/link_extractors/html_link_extractor.rs
+++ b/src/link_extractors/html_link_extractor.rs
@@ -125,6 +125,20 @@ mod tests {
         assert!(result.is_empty());
     }

+    #[test]
+    fn space() {
+        let le = HtmlLinkExtractor();
+        let input = "blah <a href=\"some file.html\">foo</a>.";
+        let result = le.find_links(input);
+        let expected = MarkupLink {
+            target: "some file.html".to_string(),
+            line: 1,
+            column: 6,
+            source: "".to_string(),
+        };
+        assert_eq!(vec![expected], result);
+    }
+
failures:

---- link_extractors::html_link_extractor::tests::space stdout ----
thread 'link_extractors::html_link_extractor::tests::space' panicked at 'assertion failed: `(left == right)`
  left: `[ => some file.html (line 1, column 6)]`,
 right: `[ => some (line 1, column 6)]`', src/link_extractors/html_link_extractor.rs:139:9

Missing binaries for 0.15.x releases

Describe the bug
Binaries for 0.15.x releases are missing in the release page:
https://github.com/becheran/mlc/releases

To Reproduce
Run the command in the README:

$ curl -L https://github.com/becheran/mlc/releases/download/v0.15.2/mlc -o mlc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     9    0     9    0     0     32      0 --:--:-- --:--:-- --:--:--    32
$ file mlc 
mlc: ASCII text, with no line terminators
$ cat mlc
Not Found

crates.io links cannot be resolved

No crates.io links can be resolved. For example, a file only contains a single link:

https://crates.io/crates/tokio

mlc gives the following output:

> mlc

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.13.4             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[Err ] ./test.md (1, 1) => https://crates.io/crates/tokio. 404 - Not Found

Result (1 links):

OK       0
Skipped  0
Warnings 0
Errors   1


The following links could not be resolved:

./test.md (1, 1) => https://crates.io/crates/tokio.

By the way, are all browser and smartphone details in the issue template really needed for this project?

Support `ignore` or `disable` lines and blocks in checked files

Is your feature request related to a problem? Please describe.

I would like a way to disable or ignore specific parts of files that otherwise should be link checked.
For example, I use some hacks in a project that will eventually be valid links once a static site generator renders it (here).

Describe the solution you'd like

This is a common pattern: a comment on a line disables checking for that line, and a pair of comments around a block disables checking for that block.
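To make the pattern concrete, here is a sketch of how such markers might be evaluated. The three marker strings are purely hypothetical (mlc defines no such syntax), and the approach is line-based for simplicity.

```rust
// Hypothetical marker comments; none of these exist in mlc.
const DISABLE: &str = "<!-- mlc-disable -->";
const ENABLE: &str = "<!-- mlc-enable -->";
const DISABLE_LINE: &str = "<!-- mlc-disable-line -->";

/// Return the 1-based numbers of the lines that should still be link-checked.
fn checked_lines(text: &str) -> Vec<usize> {
    let mut disabled = false;
    let mut keep = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        let trimmed = line.trim();
        if trimmed == DISABLE {
            disabled = true;
            continue;
        }
        if trimmed == ENABLE {
            disabled = false;
            continue;
        }
        // Skip lines inside a disabled block or carrying the line marker.
        if !disabled && !line.contains(DISABLE_LINE) {
            keep.push(idx + 1);
        }
    }
    keep
}
```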

Describe alternatives you've considered

Otherwise I need to maintain a running list of links to ignore in the config that #65 supports, which gets unwieldy when there are many links to skip (mostly ones that return 403 or similar because sites block bots...?). It is not as bad for me, since I have a very specific syntax on internal links that would be broken ("*slides.*" works to skip the file, but not the links within it).

Additional context

I am keen on Markdown support, but a nice-to-have would be this implemented for all langs mlc supports.

Option to check links in code blocks

Is your feature request related to a problem? Please describe.
We have some links that are inside of code blocks, particularly to download config files as part of an installation process. It is important that these links are correct, but unfortunately mlc ignores them.

Describe the solution you'd like
A command line option to check links inside of code blocks.

Describe alternatives you've considered
Moving the links out of code blocks. This is not a good solution because it's necessary to mention the specific commands for users who are not familiar with Linux/bash.

Additional context
Example:

# download default config files
wget https://raw.githubusercontent.com/LemmyNet/lemmy/release/v0.17/docker/prod/docker-compose.yml
wget https://raw.githubusercontent.com/LemmyNet/lemmy/release/v0.17/docker/lemmy.hjson

# start the server
docker-compose up -d
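One simple way such an option could find candidates is to scan the code block for whitespace-separated tokens that start with an HTTP scheme. This is only a sketch under that assumption; the function name is hypothetical, and real extraction would need to handle quoting and trailing punctuation.

```rust
/// Collect whitespace-separated tokens in a code block that look like
/// http(s) URLs. Hypothetical helper, not part of mlc.
fn urls_in_code(code: &str) -> Vec<&str> {
    code.split_whitespace()
        .filter(|token| token.starts_with("http://") || token.starts_with("https://"))
        .collect()
}
```

Applied to the example above, this yields the two raw.githubusercontent.com URLs and ignores the comments and the docker-compose command.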

Support anchor link targets

Anchor links

The part after the #, called the anchor link, is currently not checked. Examples of Markdown links that include an anchor link target are [anchor](#go-to-anchor-on-same-page) and [external](http://external.com#headline-1).

How do anchor links work

HTML defines anchor targets via the anchor tag (e.g. <a id="generator"></a>). An anchor target can also be any HTML tag with an id attribute (e.g. <div id="fol-bar"></div>).

The official Markdown spec does not define anchor targets, but most interpreters and renderers generate default anchor targets in Markdown documents. For example, the GitHub Markdown flavor auto-generates link targets for all headlines (h1 to h6) with the following rules:

  1. downcase the headline
  2. remove anything that is not a letter, number, space or hyphen
  3. change any space to a hyphen
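The three rules can be sketched directly in Rust. This is a minimal illustration that covers only the rules listed here; the function name is an assumption, and real renderers may apply further rules.

```rust
/// Derive an anchor from a headline: lowercase it, keep only letters,
/// numbers, spaces and hyphens, then turn each space into a hyphen.
fn headline_to_anchor(headline: &str) -> String {
    headline
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || *c == ' ' || *c == '-')
        .map(|c| if c == ' ' { '-' } else { c })
        .collect()
}
```

For example, the headline "Go to Anchor on same page!" maps to go-to-anchor-on-same-page.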

Implementation hints

A good first step would be to add example Markdown files with valid anchor links to the benches dir, which is used by the [end-to-end unit tests](./tests/end_to_end.rs).

The library run method is the most important method which will use all submodules and does the actual execution.

In the link extractor module the part after the # needs to be extracted and saved in the MarkupLink struct.
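A sketch of that extraction step, assuming the anchor would be stored as an optional string next to the target. The function name and the return shape are illustrative and not the actual MarkupLink layout.

```rust
/// Split a link target into the resource part and the optional anchor
/// (the part after the first '#'). An empty anchor is treated as absent.
fn split_anchor(target: &str) -> (&str, Option<&str>) {
    match target.split_once('#') {
        Some((resource, anchor)) if !anchor.is_empty() => (resource, Some(anchor)),
        Some((resource, _)) => (resource, None),
        None => (target, None),
    }
}
```

A link to an anchor on the same page yields an empty resource part, which the validator could interpret as "the current file".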

The link validator module is responsible for the actual resolving: it checks whether a resource exists (either on disk or as a URL). This code needs to be enhanced so that, if an anchor link was extracted, it not only checks for existence but also parses the target file and extracts all anchor targets. The same must be done for web links. Right now a HEAD request is sent, and only if that fails is a GET sent. If an anchor link needs to be followed, a GET request is needed and the resulting page must be parsed for all anchors.

Besides the existing grouping of identical links, which are checked only once to improve performance, it would also make sense to parse a document that is the target of an anchor link only once and reuse the parse result for other references to the same document. Also for performance reasons, it would be great to download and parse only those documents that actually have an anchor link pointing to them, not all documents for all links.
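The "parse once, reuse for later references" idea could look roughly like the following. The type and its fields are hypothetical; the loader closure stands in for whatever parsing or downloading the validator would do, and it only runs when an anchor lookup is actually requested for a document.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache of anchor targets per document, filled lazily.
struct AnchorCache {
    parsed: HashMap<String, HashSet<String>>,
    /// Number of times a document was actually parsed (for illustration).
    parse_count: usize,
}

impl AnchorCache {
    fn new() -> Self {
        AnchorCache { parsed: HashMap::new(), parse_count: 0 }
    }

    /// Return whether `anchor` exists in `doc`. `load` is called only the
    /// first time a given document is requested.
    fn contains<F>(&mut self, doc: &str, anchor: &str, load: F) -> bool
    where
        F: FnOnce() -> HashSet<String>,
    {
        if !self.parsed.contains_key(doc) {
            self.parse_count += 1;
            self.parsed.insert(doc.to_string(), load());
        }
        self.parsed[doc].contains(anchor)
    }
}
```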

Support checking anchor links

It would be great if this tool was able to check anchor links. It would really help to maintain references to internal MD docs. (But if it could also parse external HTML resources, that would be beyond awesomeness 😃 ).

I guess it can be introduced with a CLI option to enable or disable anchor link checks.

And thanks for creating this tool! Looking forward to replacing markdown-link-check with it :)

Offline mode

When supplying a switch, e.g. --offline, mlc would not try to access anything outside the local machine.
Might this also need a new, neutral result state?

This can be useful when working off-grid and still wanting at least a preliminary (very fast) check of the documentation before committing.

It might only check local links, or use a cache of earlier checked links.
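A sketch of the "only check local links" variant, including a neutral outcome for remote links. The enum, its variant names, and the rule that only http(s) targets count as remote are assumptions for illustration.

```rust
/// What to do with a link; `SkipOffline` is the hypothetical neutral state.
#[derive(Debug, PartialEq)]
enum Plan {
    CheckLocal,
    CheckRemote,
    SkipOffline,
}

/// Assumption: only http(s) targets require network access.
fn is_remote(target: &str) -> bool {
    let lower = target.to_ascii_lowercase();
    lower.starts_with("http://") || lower.starts_with("https://")
}

/// Decide how a link is handled depending on the offline switch.
fn plan(target: &str, offline: bool) -> Plan {
    match (is_remote(target), offline) {
        (true, true) => Plan::SkipOffline,
        (true, false) => Plan::CheckRemote,
        (false, _) => Plan::CheckLocal,
    }
}
```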

Priority: low

First line separator is wrong on Windows

When running mlc on Windows without any arguments, file paths begin with ./ instead of the Windows-typical .\ prefix.

For example ./target\package\mlc-0.15.4\benches\throttle\same_ip.md (8, 1) => https://127.0.0.1/f5. should be .\target\package\mlc-0.15.4\benches\throttle\same_ip.md (8, 1) => https://127.0.0.1/f5.
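One possible fix is to build the prefix from the platform separator instead of hard-coding a forward slash. The sketch below uses `std::path::MAIN_SEPARATOR` from the standard library; the function names are hypothetical.

```rust
use std::path::MAIN_SEPARATOR;

/// Replace a leading "./" with "." followed by the given separator.
fn normalize_prefix(path: &str, separator: char) -> String {
    match path.strip_prefix("./") {
        Some(rest) => format!(".{}{}", separator, rest),
        None => path.to_string(),
    }
}

/// Same as above, using the separator of the platform mlc runs on.
fn normalize_for_platform(path: &str) -> String {
    normalize_prefix(path, MAIN_SEPARATOR)
}
```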

MLC github action

It would be nice to have a GitHub action that simplifies the usage of mlc in GitHub Actions CI.

Latest Docker images broken

Describe the bug

v0.15.x Docker images are broken.

To Reproduce
Steps to reproduce the behavior:

  1. Run and observe v0.15.2 does not work:
➜ docker run --rm -it becheran/mlc:0.15.2 mlc -h

mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by mlc)

Expected behavior

mlc should continue to function in Docker like v0.14.3 does.

➜ docker run --rm -it becheran/mlc:0.14.3 mlc -h

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.14.3             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Desktop (please complete the following information):

➜ docker --version

Docker version 20.10.14, build a224086

Mismatch between definition and access of `throttle`

Describe the bug
Usage of the --throttle (or -T) option causes a panic and fatal crash.

To Reproduce
Steps to reproduce the behavior:

  1. Run mlc:
mlc --root-dir ./ --match-file-extension --ignore-links "http://localhost:8080" --throttle 15 ./docs/

Expected behavior
Expecting standard predicted output

Behavior observed
Instead of output, the following is printed:

thread 'main' panicked at C:\Users\PaulPEW\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.4.7\src\parser\error.rs:32:9:
Mismatch between definition and access of `throttle`. Could not downcast to TypeId { t: 25882202575019293479932656973818029271 }, need to downcast to TypeId { t: 96503125482807615452342895184004937604 }

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With RUST_BACKTRACE=1 set in PowerShell, I get the following more complete output:

thread 'main' panicked at C:\Users\PaulPEW\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.4.7\src\parser\error.rs:32:9:
Mismatch between definition and access of `throttle`. Could not downcast to TypeId { t: 25882202575019293479932656973818029271 }, need to downcast to TypeId { t: 96503125482807615452342895184004937604 }

stack backtrace:
   0:     0x7ff73cd055ca - <unknown>
   1:     0x7ff73cd218cb - <unknown>
   2:     0x7ff73cd009e1 - <unknown>
   3:     0x7ff73cd0534a - <unknown>
   4:     0x7ff73cd07aea - <unknown>
   5:     0x7ff73cd07758 - <unknown>
   6:     0x7ff73cd0819e - <unknown>
   7:     0x7ff73cd0808d - <unknown>
   8:     0x7ff73cd05fb9 - <unknown>
   9:     0x7ff73cd07d90 - <unknown>
  10:     0x7ff73cd39b75 - <unknown>
  11:     0x7ff73c9d05cc - <unknown>
  12:     0x7ff73c99d2e3 - <unknown>
  13:     0x7ff73c9bd8bc - <unknown>
  14:     0x7ff73c9b1601 - <unknown>
  15:     0x7ff73c9bb676 - <unknown>
  16:     0x7ff73c9bb68c - <unknown>
  17:     0x7ff73ccf9ee8 - <unknown>
  18:     0x7ff73c9b178c - <unknown>
  19:     0x7ff73cd29a7c - <unknown>
  20:     0x7ff8f1d7257d - BaseThreadInitThunk
  21:     0x7ff8f256aa78 - RtlUserThreadStart

Desktop (please complete the following information):

  • OS: Windows 11
  • Version v0.16.2

Additional context:
It seems clear that the issue is being thrown by clap_builder-4.4.7, but it is unclear whether this is a bug in that crate, or whether the bug exists in how mlc implements usage of that crate.

It seems like this issue filed on clapper has some information in the thread about how to ameliorate the situation.

Unable to extract link if on the same line as inline code block

Describe the bug
mlc currently fails to find a valid link if it's on the same line as an inline code block.

To Reproduce
Steps to reproduce the behavior:

  1. This md file:
`bug` [code](http://example.net/), link!.
  2. This command: 'mlc .'
  3. See error: no link found

Expected behavior
A link should be found

Additional context
Test case submitted in PR #33.

Installation failed

Describe the bug
Installing with the command cargo install mlc failed.

Expected behavior
Successful installation.

Desktop (please complete the following information):

  • OS: windows
  • Version 11

Additional context
The error log:

error: cannot find derive macro `Deserialize` in this scope
 --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\logger.rs:6:30
  |
6 | #[derive(Debug, Clone, Copy, Deserialize)]
  |                              ^^^^^^^^^^^
  |
  = note: consider importing this derive macro:
          serde_derive::Deserialize
note: `Deserialize` is imported here, but it is only a trait, without a derive macro
 --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\logger.rs:3:5
  |
3 | use serde::Deserialize;
  |     ^^^^^^^^^^^^^^^^^^

error: cannot find derive macro `Deserialize` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:36:26
   |
36 | #[derive(Default, Debug, Deserialize)]
   |                          ^^^^^^^^^^^
   |
   = note: consider importing this derive macro:
           serde_derive::Deserialize
note: `Deserialize` is imported here, but it is only a trait, without a derive macro
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:13:5
   |
13 | use serde::Deserialize;
   |     ^^^^^^^^^^^^^^^^^^

error: cannot find attribute `serde` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:39:7
   |
39 |     #[serde(rename(deserialize = "markup-types"))]
   |       ^^^^^
   |
   = note: `serde` is in scope, but it is a crate, not an attribute

error: cannot find attribute `serde` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:42:7
   |
42 |     #[serde(rename(deserialize = "match-file-extension"))]
   |       ^^^^^
   |
   = note: `serde` is in scope, but it is a crate, not an attribute

error: cannot find attribute `serde` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:44:7
   |
44 |     #[serde(rename(deserialize = "ignore-links"))]
   |       ^^^^^
   |
   = note: `serde` is in scope, but it is a crate, not an attribute

error: cannot find attribute `serde` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:46:7
   |
46 |     #[serde(rename(deserialize = "ignore-path"))]
   |       ^^^^^
   |
   = note: `serde` is in scope, but it is a crate, not an attribute

error: cannot find attribute `serde` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:48:7
   |
48 |     #[serde(rename(deserialize = "root-dir"))]
   |       ^^^^^
   |
   = note: `serde` is in scope, but it is a crate, not an attribute

error: cannot find derive macro `Deserialize` in this scope
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:53:26
   |
53 | #[derive(Default, Debug, Deserialize)]
   |                          ^^^^^^^^^^^
   |
   = note: consider importing this derive macro:
           serde_derive::Deserialize
note: `Deserialize` is imported here, but it is only a trait, without a derive macro
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:13:5
   |
13 | use serde::Deserialize;
   |     ^^^^^^^^^^^^^^^^^^

error[E0277]: the trait bound `OptionalConfig: Deserialize<'_>` is not satisfied
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\cli.rs:15:30
   |
15 |         Ok(content) => match toml::from_str(&content) {
   |                              ^^^^^^^^^^^^^^ the trait `Deserialize<'_>` is not implemented for `OptionalConfig`
   |
   = help: the following other types implement trait `Deserialize<'de>`:
             &'a [u8]
             &'a std::path::Path
             &'a str
             ()
             (T0, T1)
             (T0, T1, T2)
             (T0, T1, T2, T3)
             (T0, T1, T2, T3, T4)
           and 130 others
note: required by a bound in `toml::from_str`
  --> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\toml-0.5.10\src\de.rs:75:8
   |
75 |     T: de::Deserialize<'de>,
   |        ^^^^^^^^^^^^^^^^^^^^ required by this bound in `toml::from_str`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `mlc` due to 9 previous errors
warning: build failed, waiting for other jobs to finish...
error: failed to compile `mlc v0.16.0`, intermediate artifacts can be found at `C:\Users\\AppData\Local\Temp\cargo-installR5kWcz`

Select output format

Instead of the current output format, which is mainly human oriented, we might want to add support for more machine-friendly formats as well. The desired format could be selected through a command-line switch.

crates.io links: 403 - Forbidden

Describe the bug
MLC reports that crates.io links could not be resolved. I'm not sure if it is related to #20, but this time the error is different: "403 - Forbidden" instead of "404 - Not Found". I'm not sure if this issue is on the MLC side.

To Reproduce
Steps to reproduce the behavior:

  1. Create a md file with any crates.io link, for example https://crates.io/crates/syn.
  2. Run mlc.
  3. See error:
[Err ] ./e.md (1, 1) => https://crates.io/crates/syn. 403 - Forbidden

Result (1 links):

OK       0
Skipped  0
Warnings 0
Errors   1


The following links could not be resolved:

./e.md (1, 1) => https://crates.io/crates/syn.

Expected behavior
I expect to see no errors.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.1 LTS
  • Browser Firefox 80.0.1 (64-bit)
  • Version 0.13.6

Additional context

Cannot parse a URL link containing closing parentheses ')'

Describe the bug
This is a valid URL, yet mlc cannot parse it correctly:

https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

To Reproduce
Steps to reproduce the behavior:

  1. This md file:
    [a Stack data structure](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))

  2. This command:
    mlc /path/to/mdfile

  3. See error

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+        markup link checker - mlc v0.13.7-alpha.0         +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[Err ] ../melvillian.github.io/posts/blah/blah.md (1, 26) => https://en.wikipedia.org/wiki/Stack_(abstract_data_type. 404 - Not Found

Result (1 links):

OK       0
Skipped  0
Warnings 0
Errors   1


The following links could not be resolved:

../melvillian.github.io/posts/blah/blah.md (1, 26) => https://en.wikipedia.org/wiki/Stack_(abstract_data_type.

Expected behavior
I expect mlc to succeed and return 1 OK and 0 Errors.

Desktop (please complete the following information):

  • OS: Ubuntu 18.04.5 LTS
  • Browser firefox
  • Version 80.0 64-bit

Additional context
I am working on a PR for this right now. My approach is to keep a tally of any additional ( that appear, and only break out of the forward_until when we've reached a matching ).
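The tally approach described above can be sketched as follows. This is a standalone illustration and not the code of the PR; the function name is hypothetical, and the input is assumed to be the text directly after the opening "(" of a Markdown inline link.

```rust
/// Return the URL at the start of `rest`, ending at the ")" that closes
/// the link. Nested "(" are tallied so their ")" do not end the URL early.
fn url_until_matching_paren(rest: &str) -> Option<&str> {
    let mut depth = 0usize;
    for (idx, ch) in rest.char_indices() {
        match ch {
            '(' => depth += 1,
            // An unmatched ")" closes the link itself.
            ')' if depth == 0 => return Some(&rest[..idx]),
            ')' => depth -= 1,
            _ => {}
        }
    }
    // No closing parenthesis found.
    None
}
```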

Option to throttle connections to each domain

Is your feature request related to a problem? Please describe.
If you have more than a couple github.com links in your markdown files, then you get lots of 429 - Too Many Requests responses when you run mlc.

Describe the solution you'd like
Limit the web requests so that there is at least N (milli)seconds between each request to each domain.
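A sketch of per-domain throttling: each domain keeps the earliest time its next request may start, and callers are told how long to wait. The type and method names are hypothetical, and passing `now` explicitly is a choice made to keep the logic deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical throttle that spaces out requests to the same domain.
struct DomainThrottle {
    min_gap: Duration,
    next_allowed: HashMap<String, Instant>,
}

impl DomainThrottle {
    fn new(min_gap: Duration) -> Self {
        DomainThrottle { min_gap, next_allowed: HashMap::new() }
    }

    /// Reserve the next request slot for `domain` and return how long the
    /// caller should wait, measured from `now`, before sending the request.
    fn reserve(&mut self, domain: &str, now: Instant) -> Duration {
        let slot = match self.next_allowed.get(domain) {
            Some(&next) if next > now => next,
            _ => now,
        };
        self.next_allowed.insert(domain.to_string(), slot + self.min_gap);
        slot.duration_since(now)
    }
}
```

Requests to different domains do not delay each other, which is the run time advantage over a single global throttle.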

Describe alternatives you've considered
A global throttle would be simpler to implement, but at a much higher run time cost.

Incorrect "the following links could not be resolved" list

Describe the bug
In some cases the resulting "the following links could not be resolved" list includes valid links. For example:

The following links could not be resolved:
./os_info/README.md (6, 87) => https://deps.rs/repo/github/stanislav-tkach/os_info.
./cli/README.md (6, 87) => https://deps.rs/repo/github/stanislav-tkach/os_info.
./README.md (6, 87) => https://deps.rs/repo/github/stanislav-tkach/os_info.
[ OK ] ./CHANGELOG.md (222, 10) => https://github.com/stanislav-tkach/os_info/compare/v0.7.0...v1.0. 

Result (90 links):

OK       87
Skipped  0
Warnings 0
Errors   3

There are only three failures in the log above, but the list includes four links.

To Reproduce
Unfortunately I'm unable to create a minimal reproducible example, and the issue does not reproduce locally. However, it reproduces reliably on CI. Perhaps the MLC GitHub action can affect this somehow?

Here are logs with and without debug output.

Expected behavior
I expect to see no "[ OK ]" links in the "the following links could not be resolved" section.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.1 LTS
  • Browser Firefox 82.0 (64-bit)
  • Version 0.13.11

Docker image doesn't work

Describe the bug
I tried to use the Docker image to avoid having to install Rust on OSX, where there are no precompiled binaries.

I got this error from all the HTTP requests:

[Err ] /capi/docs/book/src/developer/providers/implementers-guide/controllers_and_reconciliation.md (201, 13) => https://godoc.org/sigs.k8s.io/cluster-api/util#GetOwnerMachine. Http(s) request failed. error sending request for url (https://godoc.org/sigs.k8s.io/cluster-api/util#GetOwnerMachine): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate)

I'd guess that something isn't configured correctly with system certificates in the docker image.

To Reproduce

docker run --rm becheran/mlc bash -c "echo [hi]\(https://google.com\) > foo.md && mlc foo.md"

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.13.1             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Head request error: error sending request for url (https://google.com/): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate). Retry with get-request.
[Err ] foo.md (1, 6) => https://google.com. Http(s) request failed. error sending request for url (https://google.com/): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate)

The following links could not be resolved:
foo.md (1, 6) => https://google.com.

Result (1 links):

OK       0
Skipped  0
Warnings 0
Errors   1

(maybe) Support a `--secure-only` flag, that warns about http (vs https) links

Problem

Outdated content often has http://... links instead of the https://... versions,
which almost all sites support by now. It would be good if they were changed.

Is the same possible in other cases?
Looking at ftp vs sftp, for example, I think we could not do the same,
as one can not generally assume that there is sftp where there is ftp,
nor that it points to the same content.

Solution

With something like a --secure-only flag,
mlc issues a warning for each http (unencrypted) link.
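The check itself is small. A sketch, assuming the extracted links are already available as strings (the function name is hypothetical):

```rust
/// Return the links that use plain http, which a `--secure-only` style
/// check would warn about. The scheme is compared case-insensitively.
fn insecure_links<'a>(links: &[&'a str]) -> Vec<&'a str> {
    links
        .iter()
        .copied()
        .filter(|link| link.to_ascii_lowercase().starts_with("http://"))
        .collect()
}
```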

Alternatives

It would be quite trivial to do this manually using grep -e '^http://' on a list of exported, gathered links.
Personally, I prefer this way of doing it.

Only extract anchor targets/link fragments

With a command line switch, like --only-extract-anchors-targets, only that second phase of the process would be done, and the list of anchor targets would be printed.

This can be useful for debugging, for using other tools for doing the other steps, gathering statistics about anchors or the changing of anchors over time, or .. who knows what!
It would move mlc closer to the UNIX style of software, and give more power to the users of this software.

Option to (maybe conditionally) hide redirects

Is your feature request related to a problem? Please describe.
In our project, many of our links redirect (most of them are as simple as GitHub redirecting the link to the specific blob). As a result, each pull request is cluttered with warnings about redirects.

Describe the solution you'd like
A flag along the lines of --do-not-warn-for-redirect would probably suffice.
Something like --do-not-warn-for-redirect-to "https://github.com/*" similar to the --ignore-links option would be nice too!
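
The pattern variant could be sketched roughly as follows in Rust. This assumes `*` is the only wildcard and matches any run of characters. `wildcard_match` and `should_warn_for_redirect` are hypothetical names, and this is not how mlc's --ignore-links matching is necessarily implemented:

```rust
/// Returns true if `text` matches `pattern`, where `*` matches any run of characters.
/// Hypothetical helper; only `*` is treated specially.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard at all: require an exact match.
        return pattern == text;
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if !text.starts_with(first) {
        return false;
    }
    let mut rest = &text[first.len()..];
    // Match the middle literals left to right, as early as possible.
    for part in parts[1..parts.len() - 1].iter().copied() {
        match rest.find(part) {
            Some(idx) => rest = &rest[idx + part.len()..],
            None => return false,
        }
    }
    rest.ends_with(last)
}

/// Returns false when the redirect target matches one of the silenced patterns.
fn should_warn_for_redirect(target: &str, silenced: &[&str]) -> bool {
    !silenced.iter().any(|pattern| wildcard_match(pattern, target))
}
```

With the pattern "https://github.com/*", a redirect to any GitHub URL would no longer produce a warning, while the link itself is still checked for reachability.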

Describe alternatives you've considered
We could use --ignore-links, but then we lose all information about whether the target is reachable.

Additional context
If desired, I can try to implement the feature.

Support reading `.gitignore` if present to ignore paths

Is your feature request related to a problem? Please describe.

I don't want to duplicate the paths listed in .gitignore in mlc config. I would like the option (on by default) to use .gitignore as a baseline, and then respect ignoring any additional paths defined in the config.

Describe the solution you'd like

Ideally, mlc would detect a .gitignore in the working directory and/or in the target path being checked, and use it automatically. Alternative patterns:

Describe alternatives you've considered

Duplication of the ignore settings in this and other tools 🙃
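
A rough Rust sketch of the baseline-plus-config idea, assuming the .gitignore content has already been read into a string. `gitignore_patterns` and `merged_ignore_paths` are hypothetical names. The sketch only skips blank lines and comments; it does not implement negation (`!`) or the rest of the gitignore matching rules:

```rust
/// Collects the non-empty, non-comment lines of a .gitignore file.
/// Simplified: negation patterns (`!`) and escaping are not interpreted.
fn gitignore_patterns(content: &str) -> Vec<String> {
    content
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Uses the .gitignore entries as a baseline and appends the configured
/// ignore paths that are not already present.
fn merged_ignore_paths(gitignore: &str, configured: &[&str]) -> Vec<String> {
    let mut paths = gitignore_patterns(gitignore);
    for path in configured {
        if !paths.iter().any(|existing| existing.as_str() == *path) {
            paths.push((*path).to_string());
        }
    }
    paths
}
```

The configured paths come last, so a config file only needs to list what .gitignore does not already cover.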

--ignore-links syntax?

How do I pass multiple URLs/patterns to --ignore-links? I can't figure out the syntax.

Only validate links

With a command line switch, like --only-validate-links, only that last phase of the process would be done, and the list of links would have to be provided, instead of an input directory.

This can be useful for debugging, for using other tools for doing the link-extraction step (e.g. pandoc or the like, which support more formats), or .. who knows what!
It would move mlc closer to the UNIX style of software, and give more power to the users of this software.

Create snap release

It should be easy to create a self-contained snap for mlc using snapcraft. This would make it easier to install and use mlc on various Linux distributions.

Only extract links

With a command line switch, like --only-extract-links, only that first phase of the process would be done, and the list of links would be printed.

This can be useful for debugging, for using other tools for doing the other steps, gathering statistics about links or the changing of links over time, or .. who knows what!
It would move mlc closer to the UNIX style of software, and give more power to the users of this software.
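
To illustrate what such an extraction phase produces, here is a deliberately simplified Rust sketch that collects the targets of inline Markdown links of the form `[text](target)`. `extract_inline_links` is a hypothetical name. The sketch ignores reference-style links, HTML links, code blocks, link titles and targets containing parentheses, so it is not a substitute for mlc's actual extractor:

```rust
/// Returns the targets of inline Markdown links `[text](target)`.
/// Simplified sketch: looks for `](` and reads up to the next `)`.
fn extract_inline_links(markdown: &str) -> Vec<&str> {
    let mut links = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find("](") {
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                links.push(after[..end].trim());
                rest = &after[end + 1..];
            }
            // Unterminated link: stop scanning.
            None => break,
        }
    }
    links
}
```

Printing one target per line would make the output easy to pipe into other tools such as grep or sort.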

Spotify and LinkedIn links don't work

Describe the bug
See title.

To Reproduce
Try to validate either https://www.linkedin.com/in/jessemillar or https://open.spotify.com/playlist/1zl6faUb2RkH9eE6LwA0ae via mlc.

Expected behavior
I'd expect the links to validate since they work in a browser.

Screenshots

[Err ] ./linkedin/index.html (12, 51) => https://www.linkedin.com/in/jessemillar. Http(s) request failed. error sending request for url (https://www.linkedin.com/in/jessemillar): invalid HTTP status-code parsed
[Err ] ./spotify/4-3-2020/index.html (12, 51) => https://open.spotify.com/playlist/1zl6faUb2RkH9eE6LwA0ae. 400 - Bad Request

Desktop (please complete the following information):

  • OS: Ubuntu 18.04.4 LTS
  • Browser: Chrome Version 81.0.4044.138 (Official Build) (64-bit)
  • Rust: rustc 1.43.1 (8d69840ab 2020-05-04)

`.mlc.toml` options are not used

The file is read correctly:

mlc/src/cli.rs

Lines 14 to 20 in 377d20f

let mut opt: OptionalConfig = match fs::read_to_string(CONFIG_FILE_PATH) {
    Ok(content) => match toml::from_str(&content) {
        Ok(o) => o,
        Err(err) => panic!("Invalid TOML file {:?}", err),
    },
    Err(_) => OptionalConfig::default(),
};

As expected, I do get an error when a bad TOML file is passed in, but the settings from a valid file are then not respected by the command. I suspect the values in the opt variable are simply overwritten with the empty values from the CLI args when nothing is passed on the command line?
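
If that suspicion is right, one possible fix is to let a CLI value win only when it was actually given. A minimal Rust sketch of that merge, assuming a simplified stand-in for `OptionalConfig`: the two fields and the `merge` method are hypothetical and do not mirror the real struct in mlc:

```rust
/// Simplified stand-in for mlc's OptionalConfig; the field names are hypothetical.
#[derive(Debug, Default, Clone, PartialEq)]
struct OptionalConfig {
    no_web_links: Option<bool>,
    ignore_links: Option<Vec<String>>,
}

impl OptionalConfig {
    /// Treats `self` as the CLI options and falls back to the value from
    /// the config file whenever the CLI left an option unset.
    fn merge(self, file: OptionalConfig) -> OptionalConfig {
        OptionalConfig {
            no_web_links: self.no_web_links.or(file.no_web_links),
            ignore_links: self.ignore_links.or(file.ignore_links),
        }
    }
}
```

With this shape, an empty command line leaves every value from `.mlc.toml` intact, and an explicit flag still overrides the file.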

Potential alternative/similar project: mdbook-linkcheck

I stumbled over this by accident.
It also does not support anchors yet (and is unlikely to do so, as it is marked as "passively maintained" in its Cargo.toml); here is a link to the anchors issue there:
Michael-F-Bryan/mdbook-linkcheck#52

I have not yet looked at the code, but I imagine that it supports only Markdown, not HTML, as mdbook would have no need for HTML support unless it checked anchors.

Once one of us has an understanding of that project, should we maybe mention it in the README, with a short evaluation of where it is better or worse than mlc?
