Comments (10)

becheran commented on June 5, 2024

I like this idea. I just don't know how exactly one would pass all the possible header fields to mlc. Via a command-line argument?

from mlc.

diegorondini commented on June 5, 2024

Probably the best option would be a config file, otherwise it would be impractical to specify different headers for different URLs.

See for example:
https://github.com/orgs/github-community/discussions/14773#discussioncomment-2679987
https://github.com/tcort/markdown-link-check#config-file-format

diegorondini commented on June 5, 2024

I think your pipeline has been hit by this bug:
https://github.com/becheran/mlc/actions/runs/3559864946/jobs/5979511630

[Err ] ./README.md (62, 22) => https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions - 403 - Forbidden
Error: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions. 403 - Forbidden

becheran commented on June 5, 2024

@diegorondini fun fact: it does not fail when I run it locally. Does GitHub somehow prevent requests to GitHub.com from their own runners? You mentioned missing request parameters. What would those be in this case?

diegorondini commented on June 5, 2024

@becheran I think the first question is why the pipeline checks that link even though there's no such link in the README.md:

$ grep 'docs\.github' README.md

Returning to this bug, docs.github.com requires the Accept-Encoding: zstd, br, gzip, deflate header:

$ curl -i -X GET https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
HTTP/2 403 
x-azure-ref: 0wn2EYwAAAACr4P2HgpUzTatC1/nj5XnyTU5aMjIxMDYwNjEzMDIxADU5NmQ3OGEyLWNhNWYtNDc5ZC1iY2RjLTA4MzU4MzMxNzRiMg==
accept-ranges: bytes
via: 1.1 varnish, 1.1 varnish
date: Mon, 28 Nov 2022 09:22:10 GMT
x-served-by: cache-iad-kiad7000135-IAD, cache-mrs10563-MRS
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1669627330.213655,VS0,VE92
strict-transport-security: max-age=31557600

$ curl -i -H "Accept-Encoding: zstd, br, gzip, deflate" -X GET https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
HTTP/2 200 
cache-control: public, max-age=60
content-type: text/html; charset=utf-8
access-control-allow-origin: *
content-security-policy: default-src 'none';prefetch-src 'self';connect-src 'self';font-src 'self' data: githubdocs.azureedge.net;img-src 'self' github.com *.github.com *.githubusercontent.com *.githubassets.com data: githubdocs.azureedge.net placehold.it;object-src 'self';script-src 'self' data: githubdocs.azureedge.net;frame-src 'self' github.com *.github.com *.githubusercontent.com *.githubassets.com https://www.youtube-nocookie.com;frame-ancestors 'self' github.com *.github.com *.githubusercontent.com *.githubassets.com;style-src 'self' 'unsafe-inline' data: githubdocs.azureedge.net;child-src 'self';upgrade-insecure-requests;base-uri 'self';form-action 'self';script-src-attr 'none'
cross-origin-opener-policy: same-origin
cross-origin-resource-policy: same-origin
x-dns-prefetch-control: off
x-frame-options: SAMEORIGIN
x-download-options: noopen
x-content-type-options: nosniff
origin-agent-cluster: ?1
x-permitted-cross-domain-policies: none
referrer-policy: strict-origin-when-cross-origin
x-xss-protection: 0
x-powered-by: Next.js
x-azure-ref: 0hXyEYwAAAADMF8jkAx/XToTRxIg5u1m/UEhMMzBFREdFMDMxOQA1OTZkNzhhMi1jYTVmLTQ3OWQtYmNkYy0wODM1ODMzMTc0YjI=
content-encoding: br
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
date: Mon, 28 Nov 2022 09:22:29 GMT
age: 335
x-served-by: cache-iad-kiad7000135-IAD, cache-mrs10583-MRS
x-cache: CONFIG_NOCACHE, HIT, HIT
x-cache-hits: 3, 1
x-timer: S1669627349.305248,VS0,VE1
vary: Accept-Encoding
strict-transport-security: max-age=31557600
content-length: 38324

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.

diegorondini commented on June 5, 2024

Sorry, I just realized I should have checked out the github-action-output branch.
Now it fails for me as well with 0.15.4:

$ mlc ./README.md

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.15.4             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

09:31:29 [WARN] Broken reference link: Borrowed("possible values: md, html")
09:31:29 [WARN] Strip everything after #. The chapter part '#ci-pipeline-integration' is not checked.
[ OK ] ./README.md (19, 8) => #ci-pipeline-integration - 
[ OK ] ./README.md (64, 1) => ./docs/FailingAnnotation.PNG - 
[ OK ] ./README.md (32, 28) => https://doc.rust-lang.org/cargo/ - 
[ OK ] ./README.md (4, 2) => https://badgen.net/crates/d/mlc?color=blue - 
[ OK ] ./README.md (46, 56) => https://github.com/marketplace/actions/markup-link-checker-mlc - 
[ OK ] ./README.md (20, 29) => https://rust-lang.github.io/async-book/ - 
[ OK ] ./README.md (3, 2) => https://img.shields.io/crates/v/mlc.svg?color=orange - 
[ OK ] ./README.md (9, 1) => https://asciinema.org/a/299100 - 
[ OK ] ./README.md (9, 2) => https://asciinema.org/a/299100.svg - 
[ OK ] ./README.md (6, 2) => https://img.shields.io/badge/License-MIT-yellow.svg - 
[ OK ] ./README.md (5, 2) => https://github.com/becheran/mlc/actions/workflows/rust.yml/badge.svg - 
[ OK ] ./README.md (7, 2) => https://img.shields.io/badge/PRs-welcome-brightgreen.svg - 
[ OK ] ./README.md (3, 1) => https://crates.io/crates/mlc - 
[ OK ] ./README.md (4, 1) => https://crates.io/crates/mlc - 
[ OK ] ./README.md (32, 92) => https://crates.io/crates/mlc - 
[Err ] ./README.md (62, 22) => https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions - 403 - Forbidden
[ OK ] ./README.md (144, 60) => https://github.com/becheran/mlc/blob/master/LICENSE - 
[ OK ] ./README.md (75, 32) => https://github.com/becheran/ntest/blob/master/.github/workflows/ci.yml - 
[ OK ] ./README.md (79, 37) => https://hub.docker.com/repository/docker/becheran/mlc - 
[ OK ] ./README.md (140, 14) => https://github.com/becheran/mlc/blob/master/CHANGELOG.md - 
[ OK ] ./README.md (6, 1) => https://opensource.org/licenses/MIT - 
[ OK ] ./README.md (112, 221) => https://github.com/becheran/wildmatch - 
[ OK ] ./README.md (40, 54) => https://github.com/becheran/mlc/releases - 
[ OK ] ./README.md (5, 1) => https://github.com/becheran/mlc/actions/workflows/rust.yml - 
[ OK ] ./README.md (7, 1) => https://github.com/becheran/mlc/blob/master/CONTRIBUTING.md - 

Result (25 links):

OK       24
Skipped  0
Warnings 0
Errors   1


The following links could not be resolved:

./README.md (62, 22) => https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions.

becheran commented on June 5, 2024

Ah, right. I made the same mistake and ran it on the wrong branch locally 🤦‍♂️

becheran commented on June 5, 2024

@diegorondini would 'Accept-Encoding: *' help in this case? Might be a sane default since we don't care about the content anyways right now.

To make it configurable, I think a map of links with wildcards and associated headers would make sense as a config parameter. Will think about it.
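
A minimal sketch of what such a map could look like, assuming a simple `*`-only wildcard syntax. The `HeaderRule` type and the `wildcard_match` and `headers_for` functions are hypothetical names made up for illustration and are not part of mlc; the `api.example.com` rule and its token are placeholders:

```rust
/// One rule: a URL pattern (may contain `*`) and the headers to send
/// for links that match it. Hypothetical type, not part of mlc.
struct HeaderRule {
    pattern: String,
    headers: Vec<(String, String)>,
}

/// Minimal glob match: `*` matches any run of characters, including none.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard at all: require an exact match.
        return pattern == text;
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if !text.starts_with(first) {
        return false;
    }
    let mut pos = first.len();
    // Every literal between two wildcards must appear, in order.
    for part in &parts[1..parts.len() - 1] {
        match text[pos..].find(part) {
            Some(i) => pos += i + part.len(),
            None => return false,
        }
    }
    // The trailing literal must fit into what is left of the text.
    text[pos..].ends_with(last)
}

/// Collect the headers of every rule whose pattern matches `url`.
fn headers_for<'a>(rules: &'a [HeaderRule], url: &str) -> Vec<(&'a str, &'a str)> {
    rules
        .iter()
        .filter(|rule| wildcard_match(&rule.pattern, url))
        .flat_map(|rule| rule.headers.iter().map(|(k, v)| (k.as_str(), v.as_str())))
        .collect()
}

fn main() {
    let rules = vec![
        HeaderRule {
            pattern: "https://docs.github.com/*".to_string(),
            headers: vec![(
                "Accept-Encoding".to_string(),
                "zstd, br, gzip, deflate".to_string(),
            )],
        },
        HeaderRule {
            // Placeholder host and token, for illustration only.
            pattern: "https://api.example.com/*".to_string(),
            headers: vec![("Authorization".to_string(), "Bearer <token>".to_string())],
        },
    ];

    let url = "https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions";
    for (name, value) in headers_for(&rules, url) {
        println!("{name}: {value}");
    }
}
```

A real implementation would probably read the rules from a config file and could rely on an existing wildcard-matching crate instead of the hand-rolled matcher.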

diegorondini commented on June 5, 2024

@becheran well, not literally:

$ curl -i -H "Accept-Encoding: *" -X GET https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
HTTP/2 403
[...]

The official way to mean any encoding should be Accept-Encoding: */*, but I don't know how well it works in practice.
https://stackoverflow.com/questions/25182888/does-in-an-http-accepts-encoding-header-mean-gzip-is-supported

The library you're using (reqwest?) may support accepting all encodings. libcurl does that:
https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html

Not sure, though, whether servers that don't support compression / encoding gracefully ignore the "Accept-Encoding" header.
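
For reference, RFC 9110 treats an uncompressed ("identity") response as acceptable by default unless the request rules it out with `identity;q=0` or `*;q=0`, so a server without compression support can simply answer uncompressed. Below is a simplified sketch of that negotiation from the server's side. `parse_accept_encoding` and `choose_encoding` are hypothetical helper names, and real servers may behave differently, as the 403 from docs.github.com shows:

```rust
/// Parse an Accept-Encoding value such as "gzip;q=0.8, br" into
/// (coding, weight) pairs. The weight defaults to 1.0 when no q is given.
fn parse_accept_encoding(value: &str) -> Vec<(String, f32)> {
    value
        .split(',')
        .filter_map(|item| {
            let mut params = item.split(';');
            let coding = params.next()?.trim().to_ascii_lowercase();
            if coding.is_empty() {
                return None;
            }
            let weight = params
                .find_map(|p| p.trim().strip_prefix("q="))
                .and_then(|q| q.trim().parse::<f32>().ok())
                .unwrap_or(1.0);
            Some((coding, weight))
        })
        .collect()
}

/// Pick the coding a server would use, given the codings it can produce.
/// Returns None only if the client excluded everything, identity included.
fn choose_encoding(accept_encoding: &str, supported: &[&str]) -> Option<String> {
    let prefs = parse_accept_encoding(accept_encoding);

    // Weight of a coding: an explicit entry wins, otherwise the `*` entry.
    let weight_of = |coding: &str| -> Option<f32> {
        prefs
            .iter()
            .find(|pref| pref.0.as_str() == coding)
            .or_else(|| prefs.iter().find(|pref| pref.0.as_str() == "*"))
            .map(|pref| pref.1)
    };

    // Best supported coding with a non-zero weight (first one wins on ties).
    let mut best: Option<(&str, f32)> = None;
    for &coding in supported {
        if let Some(q) = weight_of(coding) {
            if q > 0.0 && best.map_or(true, |(_, best_q)| q > best_q) {
                best = Some((coding, q));
            }
        }
    }
    if let Some((coding, _)) = best {
        return Some(coding.to_string());
    }

    // Nothing in common: fall back to identity unless it was excluded.
    match weight_of("identity") {
        Some(q) if q == 0.0 => None,
        _ => Some("identity".to_string()),
    }
}

fn main() {
    // A server with no compression support at all.
    println!("{:?}", choose_encoding("zstd, br, gzip, deflate", &[]));
    // A server that can produce gzip and brotli.
    println!("{:?}", choose_encoding("gzip;q=0.5, br", &["gzip", "br"]));
}
```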

becheran commented on June 5, 2024

Yes, I am using reqwest. I turned on all supported encodings (brotli, gzip, deflate) and that did the trick for now. But I guess there are other cases where a custom request is still required, for example if an authentication token is required for a specific link.
