becheran / mlc
Check for broken links in markup files
License: MIT License
Is your feature request related to a problem? Please describe.
In our project we have a lot of redirects in our links (most as simple as GitHub redirecting the link to the specific blob). With this, each Pull-Request is cluttered with warnings about the redirect.
Describe the solution you'd like
Probably, a flag along the line of --do-not-warn-for-redirect
would suffice.
Something like --do-not-warn-for-redirect-to "https://github.com/*"
similar to the --ignore-links
option would be nice too!
Describe alternatives you've considered
We could use --ignore-links
but with this we lose all information about whether the target is reachable.
Additional context
If desired, I can try to implement the feature.
How do I pass multiple URLs/patterns to --ignore-links
? I can't figure out the syntax.
Is your feature request related to a problem? Please describe.
I would like a way to disable or ignore specific parts of files that otherwise should be link checked.
For example, I use some hacks in a project that will eventually be valid links once a static site generator renders it (here).
Describe the solution you'd like
This is a common pattern: an inline comment, or a pair of comments around a block, that disables checking for that line or block.
Describe alternatives you've considered
Otherwise I need to amass a running list of links to ignore in the config that #65 supports, which gets unwieldy when there are many links to skip (mostly ones that throw 403
or similar, as sites block bots...?). It is not as bad for me, since I have a very specific syntax on internal links that would be broken ( "*slides.*"
works to skip the file, not the links within it.)
Additional context
I am keen on Markdown support, but a nice-to-have would be this implemented for all langs mlc
supports.
If you try to run the precompiled mlc-x86_64-linux
binary on latest stable Debian 11 bullseye you'll get:
# ./mlc
./mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./mlc)
./mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./mlc)
./mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./mlc)
This is due to Debian 11 using glibc 2.31:
# ldd --version
ldd (Debian GLIBC 2.31-13+deb11u5) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
The glibc version on Debian is not so old, so it would be nice to build the mlc
binary with compatibility for Debian stable.
A similar issue is discussed here:
atanunq/viu#68
Thank you
Describe the bug
A clear and concise description of what the bug is.
This is a valid URL link, yet mlc
cannot parse it correctly
https://en.wikipedia.org/wiki/Stack_(abstract_data_type)
To Reproduce
Steps to reproduce the behavior:
This md file:
[a Stack data structure](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))
This command '....'
mlc /path/to/mdfile
See error
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.13.7-alpha.0 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Err ] ../melvillian.github.io/posts/blah/blah.md (1, 26) => https://en.wikipedia.org/wiki/Stack_(abstract_data_type. 404 - Not Found
Result (1 links):
OK 0
Skipped 0
Warnings 0
Errors 1
The following links could not be resolved:
../melvillian.github.io/posts/blah/blah.md (1, 26) => https://en.wikipedia.org/wiki/Stack_(abstract_data_type.
Expected behavior
A clear and concise description of what you expected to happen.
I expect mlc
to succeed and report 1 OK and 0 Errors.
Desktop (please complete the following information):
OS: [e.g. iOS]
Ubuntu 18.04.5 LTS
Browser [e.g. chrome, safari]
firefox
Version [e.g. 22]
80.0 64-bit
Additional context
I am working on a PR for this right now. My approach is to keep a tally of any additional (
that appear, and only break out of the forward_until
when we've reached a matching )
.
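The tallying approach described above can be sketched roughly as follows. This is not the code from the PR; link_destination is a hypothetical helper that scans from the byte offset right after the opening ( of the destination and counts nested parentheses until the matching ) is found.

```rust
/// Returns the link destination that starts at `input[start..]`, i.e. right
/// after the `(` that opens the destination, honoring nested parentheses.
/// Hypothetical helper for illustration, not mlc's actual extractor.
fn link_destination(input: &str, start: usize) -> Option<&str> {
    let mut depth = 0usize; // number of nested '(' that are still open
    for (offset, ch) in input[start..].char_indices() {
        match ch {
            '(' => depth += 1,
            // A ')' with no nested '(' open closes the destination.
            ')' if depth == 0 => return Some(&input[start..start + offset]),
            // Otherwise it closes one nested '('.
            ')' => depth -= 1,
            _ => {}
        }
    }
    None // ran out of input without finding the matching ')'
}
```

With the Markdown line from this report, the scan would stop at the second closing parenthesis, so the Wikipedia URL is kept whole.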
Describe the bug
Binaries for 0.15.x releases are missing in the release page:
https://github.com/becheran/mlc/releases
To Reproduce
Run the command in the README:
$ curl -L https://github.com/becheran/mlc/releases/download/v0.15.2/mlc -o mlc
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 9 0 9 0 0 32 0 --:--:-- --:--:-- --:--:-- 32
$ file mlc
mlc: ASCII text, with no line terminators
$ cat mlc
Not Found
The line/column index calculation for the cmark library is wrong and inefficient. No additional Vec allocation is needed!
Is your feature request related to a problem? Please describe.
I'd love to have the ability to report redirections as failures. If there's a 301 redirect in place, I'd love to flag it as broken so that I can fix and link to the appropriate place.
Describe the solution you'd like
An extra flag --report-redirections
could work. Maybe it could even be --treat-status-codes-as-failure
where we could pass status codes that are not successful from our PoV. I'd do --treat-status-codes-as-failure "301,307"
Describe alternatives you've considered
I looked at the different cli options, but I couldn't find something that helps here
Additional context
I really like to reduce points of failure :) I have 0 experience with Rust, but if this sounds like something you'd like to support I'd be happy to attempt a PR.
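To make the request more concrete, here is a minimal Rust sketch of how a value such as "301,307" could be parsed and applied. The flag name comes from the proposal above and does not exist in mlc; parse_status_codes and is_failure are hypothetical names, and treating every status of 400 or above as a failure is an assumption about the baseline behavior.

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parses a comma-separated list such as "301,307" into a set of status codes.
fn parse_status_codes(arg: &str) -> Result<HashSet<u16>, ParseIntError> {
    arg.split(',')
        .map(|code| code.trim().parse::<u16>())
        .collect()
}

/// A response counts as a failure if it is a client/server error (assumed
/// baseline) or if the user explicitly listed its status code.
fn is_failure(status: u16, extra_failures: &HashSet<u16>) -> bool {
    status >= 400 || extra_failures.contains(&status)
}
```

A 302 would still pass in this sketch unless it is listed explicitly, which keeps the option opt-in per status code.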
Describe the bug
mlc currently fails to find a valid link if it's on the same line as an inline code block.
To Reproduce
Steps to reproduce the behavior:
`bug` [code](http://example.net/), link!.
Expected behavior
A link should be found
Additional context
Test case submitted in PR #33.
Is your feature request related to a problem? Please describe.
If you have more than a couple github.com links in your markdown files, then you get lots of 429 - Too Many Requests
responses when you run mlc.
Describe the solution you'd like
Limit the web requests so that there is at least N (milli)seconds between each request to each domain.
Describe alternatives you've considered
A global throttle would be simpler to implement, but at a much higher run time cost.
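A per-domain throttle could look roughly like the sketch below. It is only an illustration under assumptions: DomainThrottle and reserve are hypothetical names, the caller is expected to sleep for the returned duration, and extracting the host from a URL is left out.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks, per host, when the most recent request was scheduled.
struct DomainThrottle {
    min_gap: Duration,
    last_request: HashMap<String, Instant>,
}

impl DomainThrottle {
    fn new(min_gap: Duration) -> Self {
        DomainThrottle { min_gap, last_request: HashMap::new() }
    }

    /// Returns how long the caller should wait before contacting `host`
    /// and records the resulting request time for that host.
    fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        let wait = match self.last_request.get(host) {
            // Time remaining until `min_gap` has passed since the last request.
            Some(&last) => (last + self.min_gap).saturating_duration_since(now),
            None => Duration::ZERO,
        };
        self.last_request.insert(host.to_string(), now + wait);
        wait
    }
}
```

Because the bookkeeping is keyed by host, requests to different domains never delay each other, which is the run-time advantage over a global throttle.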
Support the .rst markup language. Need to write link extractor for .rst
Instead of the current output format, which is mainly human oriented, we might want to add support for more machine-friendly formats as well. The desired format could be selected through a command-line switch.
Describe the bug
In some cases the resulting "the following links could not be resolved" list includes valid links. For example:
The following links could not be resolved:
./os_info/README.md (6, 87) => https://deps.rs/repo/github/stanislav-tkach/os_info.
./cli/README.md (6, 87) => https://deps.rs/repo/github/stanislav-tkach/os_info.
./README.md (6, 87) => https://deps.rs/repo/github/stanislav-tkach/os_info.
[ OK ] ./CHANGELOG.md (222, 10) => https://github.com/stanislav-tkach/os_info/compare/v0.7.0...v1.0.
Result (90 links):
OK 87
Skipped 0
Warnings 0
Errors 3
There are only three failures in the log above, but the list includes four links.
To Reproduce
Unfortunately I'm unable to create a minimal reproducible example; moreover, the issue does not reproduce locally. However, it reproduces reliably on CI. Perhaps the MLC GitHub action can affect this somehow?
Here are logs with and without debug output.
Expected behavior
I expect to see no "[ OK ]" links in the "the following links could not be resolved" section.
Desktop (please complete the following information):
Describe the bug
I tried to use the Docker image to avoid having to install Rust on OSX, where there are no precompiled binaries.
I got this error from all the HTTP requests:
[Err ] /capi/docs/book/src/developer/providers/implementers-guide/controllers_and_reconciliation.md (201, 13) => https://godoc.org/sigs.k8s.io/cluster-api/util#GetOwnerMachine. Http(s) request failed. error sending request for url (https://godoc.org/sigs.k8s.io/cluster-api/util#GetOwnerMachine): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate)
I'd guess that something isn't configured correctly with system certificates in the docker image.
To Reproduce
docker run --rm becheran/mlc bash -c "echo [hi]\(https://google.com\) > foo.md && mlc foo.md"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.13.1 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Head request error: error sending request for url (https://google.com/): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify
failed:ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate). Retry with get-request.
[Err ] foo.md (1, 6) => https://google.com. Http(s) request failed. error sending request for url (https://google.com/): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate)
The following links could not be resolved:
foo.md (1, 6) => https://google.com.
Result (1 links):
OK 0
Skipped 0
Warnings 0
Errors 1
Describe the bug
Unlike what is described in the --help
, the directory parameter, at least in the following specific case, seems to be ignored when provided after an option.
To Reproduce
The following command reports broken links that are in <workdir>
, rather than just <workdir>/<myfolder>
:
mlc -i '<mylinktoignore>' <myfolder>
This one works as expected (only broken links from <workdir>/<myfolder>
):
mlc <myfolder> -i '<mylinktoignore>'
Expected behavior
Syntax indicated by:
mlc --help
...
USAGE:
mlc [FLAGS] [OPTIONS] [directory]
...
should work.
When running mlc on Windows without any arguments, it begins file paths with ./
instead of the Windows-typical .\ (backslash).
For example ./target\package\mlc-0.15.4\benches\throttle\same_ip.md (8, 1) => https://127.0.0.1/f5.
should be .\target\package\mlc-0.15.4\benches\throttle\same_ip.md (8, 1) => https://127.0.0.1/f5.
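One possible way to fix the prefix, sketched under assumptions: with_native_prefix is a hypothetical helper, and the separator is passed in so the logic can be exercised on any platform (a real implementation would likely pass std::path::MAIN_SEPARATOR).

```rust
/// Rewrites a leading "./" so that it uses the platform's path separator.
/// Paths without that prefix, or platforms that use '/', are left unchanged.
fn with_native_prefix(path: &str, sep: char) -> String {
    match path.strip_prefix("./") {
        Some(rest) if sep != '/' => format!(".{}{}", sep, rest),
        _ => path.to_string(),
    }
}
```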
Using workflow commands would enhance the mlc
GitHub action output with helpful annotations at what code location a link is broken.
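For illustration, a broken link could be turned into an error annotation roughly like this. The ::error file=...,line=...,col=...::message shape follows GitHub's workflow command syntax; error_annotation and escape_data are hypothetical helper names, and escaping of property values (for example commas in file names) is omitted in this sketch.

```rust
/// Escapes the message part of a workflow command
/// (percent signs, carriage returns, and newlines).
fn escape_data(message: &str) -> String {
    message
        .replace('%', "%25")
        .replace('\r', "%0D")
        .replace('\n', "%0A")
}

/// Builds an error annotation line for a broken link at the given location.
fn error_annotation(file: &str, line: usize, col: usize, message: &str) -> String {
    format!(
        "::error file={},line={},col={}::{}",
        file,
        line,
        col,
        escape_data(message)
    )
}
```

Printing such a line to stdout inside a GitHub Actions job is what would attach the message to the file and line in the pull request view.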
Is your feature request related to a problem? Please describe.
I don't want to duplicate the paths listed in .gitignore
in mlc
config. I would like the option (on by default) to use .gitignore
as a baseline, and then respect ignoring any additional paths defined in the config.
Describe the solution you'd like
Ideally, mlc would detect a .gitignore present in the pwd
and/or in the target path being checked and use it automatically. Alternative patterns:
Describe alternatives you've considered
Duplication of the ignore settings in this and other tools.
Is your feature request related to a problem? Please describe.
At the moment the links to ignore are provided as option parameters on the command line:
mlc -i '<mylinktoignore>'
This is inconvenient if mlc is run during a pipeline, as adding / removing a link to ignore requires changing the pipeline config file.
Describe the solution you'd like
Allow specifying a file that provides the list of links to ignore. This would make it possible to change the list of links to ignore by just editing that file, like when you edit .gitignore
to specify the files / dirs that should be ignored by git
.
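A minimal sketch of reading such a file, assuming one link or pattern per line, with blank lines and lines starting with # skipped as in .gitignore. The file format and the name parse_ignore_list are assumptions for illustration, not an existing mlc feature.

```rust
/// Extracts the links to ignore from the content of an ignore file:
/// one entry per line, skipping blank lines and '#' comments.
fn parse_ignore_list(content: &str) -> Vec<String> {
    content
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}
```

The pipeline config would then only need to point at the file, and the list itself could change without touching the pipeline.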
With a command line switch, like --only-extract-anchors-targets
, only that second phase of the process would be done, and the list of anchor targets would be printed.
This can be useful for debugging, for using other tools for doing the other steps, gathering statistics about anchors or the changing of anchors over time, or .. who knows what!
It would move mlc
closer to the UNIX style of software, and give more power to the users of this software.
thread 'main' panicked at 'File could not be opened.: Os { code: 3, kind: NotFound, message: "Das System kann den angegebenen Pfad nicht finden." }', src\libcore\result.rs:1165:5
note: run with RUST_BACKTRACE=1
environment variable to display a backtrace.
Support links such as:
Current output (v0.5.0) is:
[Err ] ./README.md (3, 2) => ](https://crates.io/crates/mlc). Could not determine link type of ](https://crates.io/crates/mlc).
Add a CLI argument for files which shall not be checked with mlc.
Could you maybe add a list of features to the README?
Things I'd be interested in knowing:
#
)Surely there is more that would make sense in the list, but these are most interesting for me.
I need a tool to check consistency of machine documentation, and local files and references are very important for that.
Describe the bug
See title.
To Reproduce
Try to validate either https://www.linkedin.com/in/jessemillar or https://open.spotify.com/playlist/1zl6faUb2RkH9eE6LwA0ae via mlc
.
Expected behavior
I'd expect the links to validate since they work in a browser.
Screenshots
[Err ] ./linkedin/index.html (12, 51) => https://www.linkedin.com/in/jessemillar. Http(s) request failed. error sending request for url (https://www.linkedin.com/in/jessemillar): invalid HTTP status-code parsed
[Err ] ./spotify/4-3-2020/index.html (12, 51) => https://open.spotify.com/playlist/1zl6faUb2RkH9eE6LwA0ae. 400 - Bad Request
Desktop (please complete the following information):
Describe the bug
A link with a space in the filename, such as <a href="test file.html">
is parsed as pointing to test
. A link with an escaped space is parsed as the full filename, but the escape is not converted back to a space so it expects test%20file.html
to exist on disk.
To Reproduce
mlc/jms_test % ls
test.html testing 2.html
mlc/jms_test % cat test.html
<html>
<body>
<a href="testing 2.html">hello</a>
<a href="testing%202.html">hello</a>
</body>
</html>
mlc/jms_test % mlc
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.16.1 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Err ] ./test.html (4, 3) => testing%202.html - Target filename not found.
[Err ] ./test.html (3, 3) => testing - Target not found.
Result (2 links):
OK 0
Skipped 0
Warnings 0
Errors 2
The following links could not be resolved:
./test.html (4, 3) => testing%202.html
./test.html (3, 3) => testing
Expected behavior
Desktop (please complete the following information):
Additional context
I added the following test to validate my theory, but my rust is not good enough to fix the bug :(
diff --git a/src/link_extractors/html_link_extractor.rs b/src/link_extractors/html_link_extractor.rs
index f475514..1d83f8b 100644
--- a/src/link_extractors/html_link_extractor.rs
+++ b/src/link_extractors/html_link_extractor.rs
@@ -125,6 +125,20 @@ mod tests {
assert!(result.is_empty());
}
+ #[test]
+ fn space() {
+ let le = HtmlLinkExtractor();
+ let input = "blah <a href=\"some file.html\">foo</a>.";
+ let result = le.find_links(input);
+ let expected = MarkupLink {
+ target: "some file.html".to_string(),
+ line: 1,
+ column: 6,
+ source: "".to_string(),
+ };
+ assert_eq!(vec![expected], result);
+ }
+
failures:
---- link_extractors::html_link_extractor::tests::space stdout ----
thread 'link_extractors::html_link_extractor::tests::space' panicked at 'assertion failed: `(left == right)`
left: `[ => some file.html (line 1, column 6)]`,
right: `[ => some (line 1, column 6)]`', src/link_extractors/html_link_extractor.rs:139:9
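For the second half of the problem (the escaped name testing%202.html not being mapped back to the file on disk), the target could be percent-decoded before the file lookup. The following is a hedged sketch using only the standard library; percent_decode is a hypothetical helper, and invalid escape sequences are deliberately left untouched.

```rust
/// Converts one ASCII hex digit to its numeric value.
fn hex_value(byte: u8) -> Option<u8> {
    (byte as char).to_digit(16).map(|d| d as u8)
}

/// Decodes %XX escapes (e.g. "%20" becomes a space). Sequences that are not
/// valid escapes are copied through unchanged.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Decoded bytes may not be valid UTF-8; replace invalid sequences.
    String::from_utf8_lossy(&out).into_owned()
}
```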
In VS Code it is very easy to follow links from the integrated terminal. It would help a lot if mlc formatted the links accordingly when run inside a VS Code console window.
Links should be formatted inside the VS Code console like this ./file.md:8:1
instead of ./file.md (8, 1)
This can easily be detected via an env var.
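A rough sketch of the idea, assuming the integrated terminal can be recognized by the TERM_PROGRAM environment variable being set to vscode. format_location is a hypothetical function; the variable's value is passed in as a parameter so the formatting stays easy to test.

```rust
/// Formats a link location. In a VS Code terminal (assumed to be signalled
/// by TERM_PROGRAM=vscode) the clickable `file:line:column` form is used,
/// otherwise the current `file (line, column)` form.
fn format_location(file: &str, line: usize, column: usize, term_program: Option<&str>) -> String {
    if term_program == Some("vscode") {
        format!("{}:{}:{}", file, line, column)
    } else {
        format!("{} ({}, {})", file, line, column)
    }
}
```

A caller would pass something like std::env::var("TERM_PROGRAM").ok().as_deref() as the last argument.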
Is your feature request related to a problem? Please describe.
Some URLs require specific HTTP request parameters.
One example is the github docs pages, for example this .md
will fail:
$ cat mdtest.md
= Test =
[Github docs link](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository)
$ mlc
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.15.2 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Err ] ./mdtest.md (3, 1) => https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository - 403 - Forbidden
Result (1 links):
OK 0
Skipped 0
Warnings 0
Errors 1
The following links could not be resolved:
./mdtest.md (3, 1) => https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository.
The reason is that the page requires specific HTTP headers:
community/community#14773
Describe the solution you'd like
It would be nice to have a way to specify HTTP request parameters, possibly per-URL.
It would be great if this tool was able to check anchor links. It would really help to maintain references to internal MD docs. (But if it could also parse external HTML resources, that would be beyond awesome.)
I guess it can be introduced with a CLI option to enable or disable anchor link checks.
And thanks for creating this tool! Looking forward to replacing markdown-link-check
with it :)
With a command line switch, like --only-validate-links
, only that last phase of the process would be done, and the list of links would have to be provided, instead of an input directory.
This can be useful for debugging, for using other tools for doing the link-extraction step (e.g. pandoc or the like, which support more formats), or .. who knows what!
It would move mlc
closer to the UNIX style of software, and give more power to the users of this software.
The file is read correctly:
Lines 14 to 20 in 377d20f
I do get an error as expected when a bad TOML file is passed in, but the settings are then not respected by the command. I suspect they are simply overridden: the opt
var gets clobbered when nothing is passed in the CLI args?
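If that suspicion is right, one way to avoid the clobbering is to merge field by field and fall back to the file value whenever the CLI did not set one. This is only a sketch; Options and its fields are hypothetical stand-ins, not mlc's actual config struct.

```rust
/// Hypothetical subset of the options, each one optional so that
/// "not given" can be told apart from an explicit value.
#[derive(Debug, Default, PartialEq)]
struct Options {
    throttle: Option<u32>,
    ignore_links: Option<Vec<String>>,
}

/// CLI values win; the config file fills in whatever the CLI left unset.
fn merge(cli: Options, file: Options) -> Options {
    Options {
        throttle: cli.throttle.or(file.throttle),
        ignore_links: cli.ignore_links.or(file.ignore_links),
    }
}
```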
Describe the bug
Usage of the --throttle
(or -T
) option causes a panic and fatal crash.
To Reproduce
Steps to reproduce the behavior:
Run mlc:
mlc --root-dir ./ --match-file-extension --ignore-links "http://localhost:8080" --throttle 15 ./docs/
Expected behavior
Expecting standard predicted output
Behavior observed
Instead of output, the following is printed:
thread 'main' panicked at C:\Users\PaulPEW\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.4.7\src\parser\error.rs:32:9:
Mismatch between definition and access of `throttle`. Could not downcast to TypeId { t: 25882202575019293479932656973818029271 }, need to downcast to TypeId { t: 96503125482807615452342895184004937604 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Turning on RUST_BACKTRACE=1
and in PowerShell, I get the following more complete output:
thread 'main' panicked at C:\Users\PaulPEW\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.4.7\src\parser\error.rs:32:9:
Mismatch between definition and access of `throttle`. Could not downcast to TypeId { t: 25882202575019293479932656973818029271 }, need to downcast to TypeId { t: 96503125482807615452342895184004937604 }
stack backtrace:
0: 0x7ff73cd055ca - <unknown>
1: 0x7ff73cd218cb - <unknown>
2: 0x7ff73cd009e1 - <unknown>
3: 0x7ff73cd0534a - <unknown>
4: 0x7ff73cd07aea - <unknown>
5: 0x7ff73cd07758 - <unknown>
6: 0x7ff73cd0819e - <unknown>
7: 0x7ff73cd0808d - <unknown>
8: 0x7ff73cd05fb9 - <unknown>
9: 0x7ff73cd07d90 - <unknown>
10: 0x7ff73cd39b75 - <unknown>
11: 0x7ff73c9d05cc - <unknown>
12: 0x7ff73c99d2e3 - <unknown>
13: 0x7ff73c9bd8bc - <unknown>
14: 0x7ff73c9b1601 - <unknown>
15: 0x7ff73c9bb676 - <unknown>
16: 0x7ff73c9bb68c - <unknown>
17: 0x7ff73ccf9ee8 - <unknown>
18: 0x7ff73c9b178c - <unknown>
19: 0x7ff73cd29a7c - <unknown>
20: 0x7ff8f1d7257d - BaseThreadInitThunk
21: 0x7ff8f256aa78 - RtlUserThreadStart
Desktop (please complete the following information):
v0.16.2
Additional context:
It seems clear that the issue is being thrown by clap_builder-4.4.7
, but it is unclear whether this is a bug in that crate, or whether the bug exists in how mlc
implements usage of that crate.
It seems like this issue filed on clapper has some information in the thread about how to ameliorate the situation.
Is your feature request related to a problem? Please describe.
We have some links that are inside of code blocks, particularly to download config files as part of an installation process. It is important that these links are correct, but unfortunately mlc ignores them.
Describe the solution you'd like
A command line option to check links inside of code blocks.
Describe alternatives you've considered
Moving the links out of code blocks. This is not a good solution because it's necessary to mention the specific commands for users who are not familiar with Linux/bash.
Additional context
Example:
# download default config files
wget https://raw.githubusercontent.com/LemmyNet/lemmy/release/v0.17/docker/prod/docker-compose.yml
wget https://raw.githubusercontent.com/LemmyNet/lemmy/release/v0.17/docker/lemmy.hjson
# start the server
docker-compose up -d
Describe the bug
v0.15.x Docker images are broken.
To Reproduce
Steps to reproduce the behavior:
$ docker run --rm -it becheran/mlc:0.15.2 mlc -h
mlc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by mlc)
Expected behavior
mlc
should continue to function in Docker like v0.14.3 does.
$ docker run --rm -it becheran/mlc:0.14.3 mlc -h
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.14.3 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Desktop (please complete the following information):
$ docker --version
Docker version 20.10.14, build a224086
When supplying a switch, e.g. --offline
, mlc
would not try to access anything outside the local machine.
Might this also need a new, neutral result state?
This can be useful when working off-grid, and still wanting at least a preliminary (very fast) check of the docu, before committing.
It might only check local links, or use a cache of earlier checked links.
Priority: low
Describe the bug
Installing with the command cargo install mlc
fails.
Expected behavior
Successful installation.
Desktop (please complete the following information):
Additional context
The error log:
error: cannot find derive macro `Deserialize` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\logger.rs:6:30
|
6 | #[derive(Debug, Clone, Copy, Deserialize)]
| ^^^^^^^^^^^
|
= note: consider importing this derive macro:
serde_derive::Deserialize
note: `Deserialize` is imported here, but it is only a trait, without a derive macro
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\logger.rs:3:5
|
3 | use serde::Deserialize;
| ^^^^^^^^^^^^^^^^^^
error: cannot find derive macro `Deserialize` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:36:26
|
36 | #[derive(Default, Debug, Deserialize)]
| ^^^^^^^^^^^
|
= note: consider importing this derive macro:
serde_derive::Deserialize
note: `Deserialize` is imported here, but it is only a trait, without a derive macro
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:13:5
|
13 | use serde::Deserialize;
| ^^^^^^^^^^^^^^^^^^
error: cannot find attribute `serde` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:39:7
|
39 | #[serde(rename(deserialize = "markup-types"))]
| ^^^^^
|
= note: `serde` is in scope, but it is a crate, not an attribute
error: cannot find attribute `serde` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:42:7
|
42 | #[serde(rename(deserialize = "match-file-extension"))]
| ^^^^^
|
= note: `serde` is in scope, but it is a crate, not an attribute
error: cannot find attribute `serde` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:44:7
|
44 | #[serde(rename(deserialize = "ignore-links"))]
| ^^^^^
|
= note: `serde` is in scope, but it is a crate, not an attribute
error: cannot find attribute `serde` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:46:7
|
46 | #[serde(rename(deserialize = "ignore-path"))]
| ^^^^^
|
= note: `serde` is in scope, but it is a crate, not an attribute
error: cannot find attribute `serde` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:48:7
|
48 | #[serde(rename(deserialize = "root-dir"))]
| ^^^^^
|
= note: `serde` is in scope, but it is a crate, not an attribute
error: cannot find derive macro `Deserialize` in this scope
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:53:26
|
53 | #[derive(Default, Debug, Deserialize)]
| ^^^^^^^^^^^
|
= note: consider importing this derive macro:
serde_derive::Deserialize
note: `Deserialize` is imported here, but it is only a trait, without a derive macro
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\lib.rs:13:5
|
13 | use serde::Deserialize;
| ^^^^^^^^^^^^^^^^^^
error[E0277]: the trait bound `OptionalConfig: Deserialize<'_>` is not satisfied
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\mlc-0.16.0\src\cli.rs:15:30
|
15 | Ok(content) => match toml::from_str(&content) {
| ^^^^^^^^^^^^^^ the trait `Deserialize<'_>` is not implemented for `OptionalConfig`
|
= help: the following other types implement trait `Deserialize<'de>`:
&'a [u8]
&'a std::path::Path
&'a str
()
(T0, T1)
(T0, T1, T2)
(T0, T1, T2, T3)
(T0, T1, T2, T3, T4)
and 130 others
note: required by a bound in `toml::from_str`
--> C:\Users\\.cargo\registry\src\github.com-1ecc6299db9ec823\toml-0.5.10\src\de.rs:75:8
|
75 | T: de::Deserialize<'de>,
| ^^^^^^^^^^^^^^^^^^^^ required by this bound in `toml::from_str`
For more information about this error, try `rustc --explain E0277`.
error: could not compile `mlc` due to 9 previous errors
warning: build failed, waiting for other jobs to finish...
error: failed to compile `mlc v0.16.0`, intermediate artifacts can be found at `C:\Users\\AppData\Local\Temp\cargo-installR5kWcz`
The part after the #
, called the anchor link, is currently not checked. A Markdown link that includes an anchor target is, for example, [anchor](#go-to-anchor-on-same-page)
, or [external](http://external.com#headline-1)
.
HTML defines anchor targets via the anchor name
tag (e.g. <a id="generator"></a>
). An anchor target can also be any html tag with an id
attribute (e.g. <div id="fol-bar"></div>
).
The official Markdown spec does not define anchor targets, but most interpreters and renderers support generating default anchor targets in Markdown documents. For example, the GitHub Markdown flavor supports auto-generated link targets for all headlines (h1
to h6
) with the following rules:
A good first step would be to add example Markdown files with valid anchor links to the benches dir, which will be used for the [end-to-end unit tests](./tests/end_to_end.rs).
The library run method is the most important method; it uses all submodules and does the actual execution.
In the link extractor module the part after the #
needs to be extracted and saved in the MarkupLink
struct.
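The extraction step could be as small as splitting the target at the first #. A sketch, with split_anchor as a hypothetical helper; how the result is stored in MarkupLink is left open.

```rust
/// Splits a link target into the document part and the optional anchor,
/// i.e. the part after the first '#'.
fn split_anchor(target: &str) -> (&str, Option<&str>) {
    match target.split_once('#') {
        Some((document, anchor)) => (document, Some(anchor)),
        None => (target, None),
    }
}
```

An empty document part, as in [anchor](#go-to-anchor-on-same-page), would mean the anchor refers to the file that contains the link.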
The link validator module is responsible for the actual resolving and checks whether a resource exists (either on disk or as a URL). This code needs to be enhanced so that, if an anchor
link was extracted, it not only checks for existence but also parses the target file and extracts all anchor targets. The same must be done for web links. Right now a HEAD request is sent, and only if that fails is a GET sent. If an anchor link needs to be followed, a GET request is needed and the resulting page must be parsed for all anchors.
Besides the existing grouping of identical links, which are only checked once for a performance boost, it would also make sense to parse a document that is the target of an anchor link only once and reuse the parse result for other references to the same document. Also for performance reasons, it would be great to download and parse only documents that actually have an anchor link pointing to them, not all documents for all links.
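The parse-once idea can be sketched with a small cache keyed by document. AnchorCache, has_anchor, and the parse callback are hypothetical names; the callback stands in for whatever reads (or downloads) the document and collects its anchor targets.

```rust
use std::collections::{HashMap, HashSet};

/// Remembers the anchor targets of every document parsed so far.
struct AnchorCache {
    parsed: HashMap<String, HashSet<String>>,
    parse_count: usize, // how often a document actually had to be parsed
}

impl AnchorCache {
    fn new() -> Self {
        AnchorCache { parsed: HashMap::new(), parse_count: 0 }
    }

    /// Checks whether `doc` contains `anchor`. `parse` is only called the
    /// first time a document is requested, so documents without anchor
    /// links pointing to them are never parsed at all.
    fn has_anchor<F>(&mut self, doc: &str, anchor: &str, parse: F) -> bool
    where
        F: FnOnce(&str) -> HashSet<String>,
    {
        if !self.parsed.contains_key(doc) {
            self.parse_count += 1;
            let anchors = parse(doc);
            self.parsed.insert(doc.to_string(), anchors);
        }
        self.parsed[doc].contains(anchor)
    }
}
```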
I stumbled over this by accident.
It also does not support anchors yet (and is unlikely to do so, as it is marked as "passively maintained" in its Cargo.toml); here is a link to the anchors issue there:
Michael-F-Bryan/mdbook-linkcheck#52
I have not yet looked at the code, but I imagine that it supports only Markdown, not HTML, as there would be no need for HTML in mdbook unless anchors are checked.
Once one of us has an understanding of that project, should we maybe mention it in the README, with a short evaluation of where it is better or worse than mlc?
Describe the bug
MLC reports that crates.io links could not be resolved. I'm not sure if it is related to #20, but this time the error is different: "403 - Forbidden" instead of "404 - Not Found". I'm not sure if this issue is on the MLC side.
To Reproduce
Steps to reproduce the behavior:
[Err ] ./e.md (1, 1) => https://crates.io/crates/syn. 403 - Forbidden
Result (1 links):
OK 0
Skipped 0
Warnings 0
Errors 1
The following links could not be resolved:
./e.md (1, 1) => https://crates.io/crates/syn.
Expected behavior
I expect to see no errors.
Desktop (please complete the following information):
Additional context
It should be easy to create a self-contained snap for mlc using snapcraft. This would make it easier to install and use mlc on various Linux systems.
With a command line switch, like --only-extract-links
, only that first phase of the process would be done, and the list of links would be printed.
This can be useful for debugging, for using other tools for doing the other steps, gathering statistics about links or the changing of links over time, or .. who knows what!
It would move mlc
closer to the UNIX style of software, and give more power to the users of this software.
Outdated content often has http://...
links instead of the https://...
versions,
which almost all sites support by now. It would be good if they were changed.
Is the same possible in other cases?
Looking at ftp
vs sftp
, for example, I think we could not do the same,
as one can not generally assume that there is sftp
where there is ftp,
nor that it points to the same content.
With something like a --secure-only
flag,
mlc
issues a warning for each http
(unencrypted) link.
It would be quite trivial to do this manually using grep -e '^http://'
on a list of exported, gathered links.
Personally, I prefer this way of doing it.
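For comparison, the check that a --secure-only style flag would perform is tiny. A sketch, with insecure_links as a hypothetical helper that works on an already gathered list of link targets.

```rust
/// Returns the links that use plain, unencrypted HTTP.
/// The scheme comparison is case-insensitive.
fn insecure_links<'a>(links: &[&'a str]) -> Vec<&'a str> {
    links
        .iter()
        .copied()
        .filter(|link| link.to_ascii_lowercase().starts_with("http://"))
        .collect()
}
```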
I need to soft link files in places for compatibility within a project. Sadly mlc will not detect that the file is a "redirect" to another file it is going to check.
Describe the solution you'd like
As a first step, some option for how to handle ln
objects - to follow or not.
Ideally I would love automatic detection of this, so as not to duplicate work if the file is going to be checked again, and if it is not, to default to checking the linked file (once).
When we check links to https://hub.docker.com/r/<...>
we get a 406 - Not Acceptable
even though the links exist and work fine.
Describe the bug
mlc 1.71
panics for ignore-path = [""]
or in any case where a path does not exist, like ignore-path = ["node_modules", "html-slides", "html-book"]
(for me these are build artifacts; should .gitignore
be respected?)
To Reproduce
Steps to reproduce the behavior:
Expected behavior
If a file or path does not exist, it is simply skipped.
Ideally this would use a glob match, but that can be a separate feature request.
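The expected behavior could be sketched like this, assuming the panic comes from resolving a path that does not exist. existing_ignore_paths is a hypothetical helper; entries that cannot be resolved are dropped instead of aborting.

```rust
use std::fs;
use std::path::PathBuf;

/// Resolves the configured ignore paths and silently skips every entry
/// that does not exist (including the empty string).
fn existing_ignore_paths(raw: &[&str]) -> Vec<PathBuf> {
    raw.iter()
        .filter_map(|path| fs::canonicalize(path).ok())
        .collect()
}
```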
No crates.io links can be resolved. For example, a file only contains a single link:
https://crates.io/crates/tokio
mlc
gives the following output:
> mlc
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.13.4 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Err ] ./test.md (1, 1) => https://crates.io/crates/tokio. 404 - Not Found
Result (1 links):
OK 0
Skipped 0
Warnings 0
Errors 1
The following links could not be resolved:
./test.md (1, 1) => https://crates.io/crates/tokio.
By the way, are all browser and smartphone details in the issue template really needed for this project?
It would be nice to have the github action that simplifies the usage of mlc on (github actions) CI.