salted's People

Contributors

ruedigervoigt

salted's Issues

Switch to SQLAlchemy

As salted uses SQLite only as a cache and does not do anything low-level, switching to SQLAlchemy seems like a good choice.

Create an optional GUI

Create a simple Graphical User Interface that can be opened via the command line parameter --gui.

Given their licenses and their deep integration into Python, Tkinter or wxPython seem to be good choices.
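
A minimal sketch of how the --gui switch could be wired up, assuming argparse for the command line and Tkinter for the window. The function names build_parser, launch_gui and main are hypothetical and not taken from salted's code base:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with an optional --gui switch (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="salted")
    parser.add_argument("--gui", action="store_true",
                        help="open a graphical user interface")
    return parser


def launch_gui() -> None:
    """Open a minimal placeholder window."""
    # Imported lazily so that plain CLI runs also work without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("salted")
    tk.Label(root, text="salted - validating hyperlinks").pack(padx=20, pady=20)
    root.mainloop()


def main(argv=None) -> None:
    """Dispatch to the GUI or the command line mode."""
    args = build_parser().parse_args(argv)
    if args.gui:
        launch_gui()
    else:
        print("running in command line mode")
```

Calling main(["--gui"]) would open the window, while main([]) stays on the command line. Importing Tkinter only inside launch_gui keeps the GUI truly optional.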

Handle mailto: Links

Hyperlinks might contain mailto:mailaddress in the href field.

  • Ignore those or do a basic check for validity.
  • Beware: sometimes those mailto links are scrambled in the raw HTML to prevent spammers from extracting mail addresses.
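
A basic validity check could look like the following sketch. It only uses the standard library, and the function name check_mailto as well as the deliberately loose address pattern are assumptions for illustration. Scrambled addresses would fail this check, so it may be better to report such links as unverifiable than as broken:

```python
import re
from urllib.parse import unquote, urlsplit

# Deliberately loose pattern: exactly one "@", no whitespace, a dot in the domain.
_ADDRESS = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_mailto(href: str) -> bool:
    """Return True if href is a mailto: link whose addresses look plausible."""
    parts = urlsplit(href.strip())
    if parts.scheme != "mailto":
        return False
    # urlsplit puts the address list into .path; "?subject=..." lands in .query.
    addresses = [unquote(part).strip() for part in parts.path.split(",")]
    return all(_ADDRESS.match(address) for address in addresses)
```

This does not prove that a mailbox exists. It only filters out links that cannot be valid addresses.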

Set user agent

Some servers behave differently if the user agent discloses that the client is a Python script. For example, they will not answer HTTP requests because they suspect malicious intent.

  1. salted should by default send a user agent like "salted - validating hyperlinks"

  2. The user should be able to set the UA string to impersonate a browser if necessary (for example if the rate of false results is too high).
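
Both points could be covered by a single parameter with a default value. The sketch below assumes the standard library's urllib.request, which is not necessarily the HTTP client salted uses. The names DEFAULT_USER_AGENT and build_request are hypothetical:

```python
import urllib.request

# Default identification, taken from the suggestion above.
DEFAULT_USER_AGENT = "salted - validating hyperlinks"


def build_request(url: str,
                  user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Create a HEAD request that carries an explicit User-Agent header."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent},
        method="HEAD",
    )
```

A user who sees too many false results could then pass a browser-like string, for example build_request(url, user_agent="Mozilla/5.0").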

Case sensitive match

This code in parser.py:

        soup = BeautifulSoup(file_content, 'html.parser')
        for link in soup.find_all('a'):

does not match if the HTML uses upper-case markup such as <A HREF='https://www.example.com/'>, as produced, for example, by the export function of the Chrome bookmark manager.

The simple solution would be to convert everything to lowercase.
This would also mean faster matching.
Yet the lowercased links would look wrong in the report.
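
One way to match tags case-insensitively without lowercasing the URLs is to normalize only tag and attribute names. The sketch below is an assumption-laden alternative to the BeautifulSoup snippet: it uses the standard library's html.parser.HTMLParser, which passes tag and attribute names in lowercase and leaves attribute values untouched. The names LinkCollector and extract_links are hypothetical:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags regardless of tag case."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lowercase,
        # while attribute values keep their original spelling.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list:
    """Return all href values found in anchor tags of the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Because only the markup is normalized, the report can still show each URL exactly as it appears in the file.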

Use a config file with CLI overrides

The number of features has grown and will likely grow even more. Therefore salted should use a config file. Simply running it from the CLI should still produce a result. The config file could be version controlled by the user.

  • Look for a config file in the current working directory.
  • Make it possible to provide the path to an alternative config file.
  • If there is a config file, read it.
  • Catch reading error.
  • Catch if the config file is corrupted.
  • Check if CLI parameters override parts of the config file.
  • Check if the config (file + CLI) misses flags. Set reasonable defaults.
  • Check if all strictly necessary parameters are set.
  • salted can be run from the CLI without creating an object.
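
The checklist above can be sketched as one function that merges defaults, the config file and the CLI parameters in that order. Everything specific here is an assumption for illustration: the INI format, the file name salted.ini, the section name [salted], and the option names searchpath, timeout and user_agent:

```python
import argparse
import configparser
from pathlib import Path

# Hypothetical defaults and required settings; values stay strings as in INI files.
DEFAULTS = {"timeout": "5", "user_agent": "salted - validating hyperlinks"}
REQUIRED = ("searchpath",)


def load_settings(argv=None, cwd=None) -> dict:
    """Merge defaults, an optional config file, and CLI overrides (in that order)."""
    parser = argparse.ArgumentParser(prog="salted")
    parser.add_argument("--config", help="path to an alternative config file")
    parser.add_argument("--searchpath")
    parser.add_argument("--timeout")
    parser.add_argument("--user_agent")
    args = parser.parse_args(argv)

    settings = dict(DEFAULTS)
    explicit = args.config is not None
    config_path = Path(args.config) if explicit else Path(cwd or ".") / "salted.ini"
    if config_path.is_file():
        file_cfg = configparser.ConfigParser()
        try:
            with open(config_path, encoding="utf-8") as handle:
                file_cfg.read_file(handle)
        except (OSError, UnicodeDecodeError, configparser.Error) as err:
            # Covers unreadable as well as corrupted config files.
            raise SystemExit(f"Cannot read config file {config_path}: {err}")
        if file_cfg.has_section("salted"):
            settings.update(file_cfg["salted"])
    elif explicit:
        raise SystemExit(f"Config file not found: {config_path}")

    # CLI values win over the file; options not given on the CLI are None.
    for key, value in vars(args).items():
        if key != "config" and value is not None:
            settings[key] = value

    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise SystemExit("Missing required setting(s): " + ", ".join(missing))
    return settings
```

With a salted.ini in the working directory, load_settings(["--timeout", "3"]) would keep the file's values and replace only the timeout.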

Check markdown files

Salted should be able to check links in markdown files.

  • The GitHub markdown syntax [link text](url) is easy to parse.
  • The pandoc dialect of markdown additionally supports the automatic link <url> and can include raw HTML in a markdown document.
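
Both link forms can be approximated with regular expressions. The following is a rough sketch, not a full markdown parser: the pattern names and extract_markdown_links are hypothetical, URLs containing parentheses are cut short, and raw HTML would still have to go through the HTML parser:

```python
import re

# [link text](url), optionally followed by a "title" that is ignored here.
INLINE_LINK = re.compile(r"\[[^\]]*\]\(\s*([^\s)]+)")
# <url> automatic links, restricted in this sketch to http(s) and mailto.
AUTO_LINK = re.compile(r"<((?:https?://|mailto:)[^\s>]+)>")


def extract_markdown_links(text: str) -> list:
    """Return URLs from inline and automatic links in document order."""
    found = [(m.start(), m.group(1)) for m in INLINE_LINK.finditer(text)]
    found += [(m.start(), m.group(1)) for m in AUTO_LINK.finditer(text)]
    return [url for _, url in sorted(found)]
```

Restricting automatic links to a few schemes keeps ordinary HTML tags such as <a href="..."> from being mistaken for URLs.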

Add support for .bib files

Salted has basic support for .tex files, but in scientific projects most links are stored in a BibTeX literature database.
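
A first step could be to pull the URLs out of url fields and \url{...} macros. The sketch below relies on regular expressions instead of a real BibTeX parser, so it is only an approximation. The names URL_FIELD, URL_MACRO and extract_bib_urls are hypothetical:

```python
import re

# url = {...} or url = "..." fields; values that start with a macro are skipped.
URL_FIELD = re.compile(r'\burl\s*=\s*[{"]\s*([^{}"\\\s]+)', re.IGNORECASE)
# \url{...} macros, for example inside a howpublished or note field.
URL_MACRO = re.compile(r"\\url\{([^}\s]+)\}")


def extract_bib_urls(bibtex: str) -> list:
    """Return the distinct URLs of a BibTeX database, fields first, then macros."""
    urls = URL_FIELD.findall(bibtex) + URL_MACRO.findall(bibtex)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(urls))
```

The extracted URLs could then be checked in the same way as links from any other file type.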
