deadlinkchecker's Introduction

Dead link checker

An app that checks for dead links on Wikipedia. It is currently deployed at https://deadlinkchecker.toolforge.org/
The documentation of the tool is available on Meta at https://meta.wikimedia.org/wiki/Dead_Link_Checker

Getting started:

  1. Clone the repository
git clone https://github.com/WikiMovimentoBrasil/deadlinkchecker.git
  2. Enter the project directory
cd deadlinkchecker
  3. Create a virtual environment
python -m venv venv
  4. Activate the virtual environment (the command below is for Windows; on Linux or macOS use: source venv/bin/activate)
.\venv\Scripts\activate
  5. Install project dependencies
pip install -r requirements.txt
  6. Create a .env file in the root of the project and add the following variables to it
SOCIAL_AUTH_MEDIAWIKI_KEY= Your OAuth key from the OAuth consumer registration
SOCIAL_AUTH_MEDIAWIKI_SECRET= Your OAuth secret from the OAuth consumer registration
SOCIAL_AUTH_MEDIAWIKI_URL=https://meta.wikimedia.org/w/index.php
SESSION_SECRET= a randomly generated secret value
TOOL_TOOLSDB_USER= Tools database user
TOOL_TOOLSDB_PASSWORD= Tools database secret
REDIS_URL= Redis URL
REDIS_PREFIX= Redis key prefix
  7. Start the app in development
uvicorn app:app --reload

deadlinkchecker's People

Contributors

alwoch, albertoleoncio, justman100, lnbelo

Stargazers

Harshita Roonwal, Mike Peel

Forkers

alwoch

deadlinkchecker's Issues

Script throws an error on pages with 100 or more links

The script splits the links into batches of 10 or 15 (the batch size is configurable) when a page has more than 15 links. With a batch size of 10, it only returns responses for the first two batches, even though the third batch is sent to the server. With a batch size of 15, it only returns the first batch, and the second batch fails with the response below.

Sometimes the server returns a 500 error:

Traceback (most recent call last):
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/base.py", line 189, in __call__
    with collapse_excgroups():
  File "/layers/heroku_python/python/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/_utils.py", line 91, in collapse_excgroups
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/base.py", line 191, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/app.py", line 63, in session_middleware
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/base.py", line 165, in call_next
    raise app_exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/base.py", line 151, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/cors.py", line 91, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/cors.py", line 146, in simple_response
    await self.app(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/sessions.py", line 83, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/routing.py", line 762, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/routing.py", line 782, in app
    await route.handle(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/routes/linkchecker.py", line 102, in check_links
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/routes/linkchecker.py", line 36, in make_request
    response = await client.head(url[1])
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_client.py", line 1844, in head
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_client.py", line 1559, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_client.py", line 1646, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_client.py", line 1674, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_client.py", line 1711, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_client.py", line 1748, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpx/_transports/default.py", line 371, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_async/connection.py", line 99, in handle_async_request
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_async/connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_async/connection.py", line 156, in _connect
    stream = await stream.start_tls(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 78, in start_tls
    raise exc
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 69, in start_tls
    ssl_stream = await anyio.streams.tls.TLSStream.wrap(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/anyio/streams/tls.py", line 132, in wrap
    await wrapper._call_sslobject_method(ssl_object.do_handshake)
  File "/layers/heroku_python/dependencies/lib/python3.12/site-packages/anyio/streams/tls.py", line 172, in _call_sslobject_method
    raise EndOfStream from None
anyio.EndOfStream
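
In the traceback above, one failed TLS handshake (anyio.EndOfStream) propagates out of asyncio.gather and fails the whole request. The following is a minimal sketch of one possible mitigation, not the project's actual code: links are checked in fixed-size batches, and return_exceptions=True makes a failure on one link a recorded result instead of an error for the batch. The names chunked, check_all and fake_head are hypothetical; fake_head stands in for the real HEAD request so the sketch runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, Optional, Tuple


def chunked(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def check_all(
    urls: List[str],
    check: Callable[[str], Awaitable[int]],
    batch_size: int = 10,
) -> List[Tuple[str, Optional[int]]]:
    """Check every URL batch by batch; a failed check yields status None."""
    results: List[Tuple[str, Optional[int]]] = []
    for batch in chunked(urls, batch_size):
        # return_exceptions=True turns raised exceptions into return values,
        # so one unreachable link cannot fail the rest of the batch.
        outcomes = await asyncio.gather(
            *(check(url) for url in batch), return_exceptions=True
        )
        for url, outcome in zip(batch, outcomes):
            status = None if isinstance(outcome, BaseException) else outcome
            results.append((url, status))
    return results


async def fake_head(url: str) -> int:
    """Hypothetical stand-in for a real HEAD request (no network access)."""
    if "broken" in url:
        raise ConnectionError("simulated TLS handshake failure")
    return 200


demo_urls = [f"https://example.org/{i}" for i in range(24)] + ["https://broken.example"]
demo_results = asyncio.run(check_all(demo_urls, fake_head, batch_size=10))
```

With this shape, the caller can report a link whose check raised as "unreachable" rather than answering the whole request with a 500. Whether None is the right marker for such links is a design choice left open here.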

Create a frontend page

We need a public page for the website so that someone who opens the bare URL (without GET parameters) can understand what the tool does. The page should include the following information:

  • Instructions for installation and use
  • It is a fork of deadlinkfinder (it is important to give credit to the original author)
  • It is a request from the Lusophone technological wishlist
  • It was developed within the scope of the Outreachy project
  • It was developed by Wiki Movimento Brasil

Add caching using sqlite3 database

Description

Set up a cache using an sqlite3 database so that when a link is provided in the query parameters, it is looked up in the database and the stored result is returned.

Tasks:

  • Write a function that checks whether the link exists in the database and, if it does, returns the stored result
  • Add the link to the database if it does not exist
  • Automatically clear links that have been in the database for more than 24 hours
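
The three tasks above could be sketched roughly as follows, using only Python's standard sqlite3 module. The table name link_cache, its columns, and the function names are assumptions made for illustration, not the project's schema. The now parameter exists only so that expiry can be demonstrated without waiting 24 hours.

```python
import sqlite3
import time
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # entries older than 24 hours count as expired


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the cache database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS link_cache ("
        "url TEXT PRIMARY KEY, status INTEGER NOT NULL, checked_at REAL NOT NULL)"
    )
    return conn


def get_cached(conn: sqlite3.Connection, url: str,
               now: Optional[float] = None) -> Optional[int]:
    """Return the stored status for `url`, or None if missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT status FROM link_cache WHERE url = ? AND checked_at > ?",
        (url, now - TTL_SECONDS),
    ).fetchone()
    return row[0] if row else None


def store(conn: sqlite3.Connection, url: str, status: int,
          now: Optional[float] = None) -> None:
    """Insert the link, or refresh its status and timestamp if already present."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO link_cache (url, status, checked_at) VALUES (?, ?, ?)",
        (url, status, now),
    )
    conn.commit()


def purge_expired(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete entries older than the TTL and return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM link_cache WHERE checked_at <= ?", (now - TTL_SECONDS,)
    )
    conn.commit()
    return cur.rowcount


cache = open_cache()
store(cache, "https://example.org/a", 200, now=0.0)
store(cache, "https://example.org/b", 404, now=float(TTL_SECONDS))
```

Because get_cached already ignores expired rows, purge_expired only has to run occasionally (for example before each write) to keep the file small.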

Create a server for the script

Framework: Flask
Create a server that accepts a link as a query string parameter and runs a script that requests the link to obtain its status code.
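
The request-handling logic described above might look like the sketch below. To stay self-contained it deliberately leaves out Flask and the network call: handle_query is a hypothetical, framework-independent function that a Flask view could call with the raw query string, and fetch_status is an injected callable standing in for the script that requests the link. The parameter name url is also an assumption.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qs


def handle_query(query_string: str,
                 fetch_status: Callable[[str], int]) -> Tuple[int, Dict]:
    """Return (HTTP status of our own response, JSON-serialisable body)."""
    params = parse_qs(query_string)
    urls = params.get("url", [])  # "url" is an assumed parameter name
    if not urls:
        return 400, {"error": "missing 'url' query parameter"}
    target = urls[0]
    try:
        return 200, {"url": target, "status_code": fetch_status(target)}
    except Exception as exc:
        # The link could not be reached at all; report that instead of crashing.
        return 200, {"url": target, "status_code": None, "error": type(exc).__name__}


def fake_fetch(url: str) -> int:
    """Hypothetical stand-in for the real request to the link."""
    return 404 if url.endswith("/gone") else 200


ok = handle_query("url=https%3A%2F%2Fexample.org%2Fpage", fake_fetch)
gone = handle_query("url=https%3A%2F%2Fexample.org%2Fgone", fake_fetch)
missing = handle_query("", fake_fetch)
```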

Define app.py as an entry point

The default uwsgi configuration for the uwsgi webservice backend expects to find the uwsgi entry point as the variable app loaded from the $HOME/www/python/src/app.py module. If your application has another entry point, the easiest thing to do is create a $HOME/www/python/src/app.py module, import your entry point, and expose it as app.

Source: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Python#Using_a_uWSGI_app_with_a_default_entry_point_that_is_not_app.py

Multiple statuses are indicated on a single link with more than one instance

Description

When the same link appears more than once on a page, the status codes of all its occurrences tend to be appended to the first instance. For example, in the screenshots below, link 14 is the same as link 17.
Screenshot (17)

Screenshot (16)

TODO

Pick the index of each link occurrence and append the status to that occurrence.
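
One way to read this TODO is to key results by the position of each link on the page rather than by its URL, so that duplicates each receive exactly one status. The sketch below illustrates that idea under stated assumptions: statuses_by_index and counting_check are hypothetical names, and the checker is a stand-in that makes no real request.

```python
from typing import Callable, Dict, List


def statuses_by_index(links: List[str], check: Callable[[str], int]) -> Dict[int, int]:
    """Map each link position on the page to the status of its own URL."""
    status_by_url: Dict[str, int] = {}
    result: Dict[int, int] = {}
    for index, url in enumerate(links):
        if url not in status_by_url:  # check each distinct URL only once
            status_by_url[url] = check(url)
        result[index] = status_by_url[url]
    return result


checked: List[str] = []


def counting_check(url: str) -> int:
    """Hypothetical checker that records which URLs were actually requested."""
    checked.append(url)
    return 404 if "dead" in url else 200


page_links = ["https://example.org/dead", "https://example.org/ok", "https://example.org/dead"]
by_index = statuses_by_index(page_links, counting_check)
```

Here positions 0 and 2 both end up with a single 404 each, and the duplicated URL is requested only once.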
