anchovy's Introduction

Anchovy

Anchovy is a minimal, unopinionated file-processing framework equipped with a complete static website generation toolkit.

  • Minimal: Anchovy’s core is around a thousand lines of code and has no mandatory dependencies. Plus, Anchovy can be used for real projects with just a few pip-installable extras, even if you want to preprocess CSS.

  • Unopinionated: Anchovy offers a set of components which can be easily configured to your site’s exact requirements, without tediously ripping out or overriding entrenched behaviors. Anchovy does not assume you are building a blog or that you wish to design your templates in a specific way. You can even build things that aren’t websites! Plus, Anchovy operates on files, so it’s simple to integrate tools like imagemagick, dart-sass, or less.js if you need them.

  • Complete: Anchovy comes with a dependency auditing system, allowing you to grab any component you want without installing anything but Anchovy and find out what you will need to run your build. Choose from a wealth of Steps, Anchovy’s modular file processors, for everything from rendering Jinja templates and minifying CSS to unpacking archives and thumbnailing images. Plus, add a few extra parameters or lines of configuration to get automatic intelligent minimum builds based on input checksums, and get a reproducible run artifact to boot, even if you want to fetch HTTP resources or write your own Steps. Iterate quickly by launching a lightweight development-grade web server once the build is complete.

Installation

Anchovy has no essential prerequisites and can be installed with pip install anchovy to get just the framework and a few built-in components, but for typical usage pip install anchovy[base] is recommended. This will pull in support for Jinja2 templating, markdown, minification, and Anchovy’s CSS preprocessor. A full list of available extras may be found in the pyproject.toml file.

Alternatively, Anchovy may be installed directly from source with pip install git+https://github.com/pydsigner/anchovy or the corresponding pip install git+https://github.com/pydsigner/anchovy#egg=anchovy[base].

Command Line Usage

Anchovy operates on config files written in Python, or even modules directly.

  • python -m anchovy -h
  • anchovy -m mypackage.anchovyconf -o ../release/
  • python -m anchovy mysite/anchovy_site.py -- -h

Show Me

Run anchovy examples/code_index.py -s -p 8080, then open a browser to localhost:8080 (or click the link in the console). This example offers the most extensive demonstration of Anchovy’s functionality as of version 1.0.

What’s the Baseline?

Here’s a minimal example that does roughly what the staticjinja markdown example offers:

from pathlib import Path

from anchovy import (
    DirectCopyStep,
    InputBuildSettings,
    JinjaMarkdownStep,
    OutputDirPathCalc,
    REMatcher,
    Rule,
)


# Optional, and can be overridden with CLI arguments.
SETTINGS = InputBuildSettings(
    input_dir=Path('site'),
    working_dir=Path('working'),
    output_dir=Path('build'),
    custody_cache=Path('build-cache.json'),
)
RULES = [
    # Ignore dotfiles found in either the input_dir or the working dir.
    Rule(
        (
            REMatcher(r'(.*/)*\..*', parent_dir='input_dir')
            | REMatcher(r'(.*/)*\..*', parent_dir='working_dir')
        ),
        None
    ),
    # Render markdown files, then stop processing them.
    Rule(
        REMatcher(r'.*\.md'),
        [OutputDirPathCalc('.html'), None],
        JinjaMarkdownStep()
    ),
    # Copy everything else in static/ directories through.
    Rule(
        REMatcher(r'(.*/)*static/.*', parent_dir='input_dir'),
        OutputDirPathCalc(),
        DirectCopyStep()
    ),
]

This example is very simple, but it’s legitimately enough to start with for a small website, and offers an advantage over other minimal frameworks by putting additional batteries within arm’s reach. If we stored the configuration in config.py and added a raw site like this:

site/
    static/
        styles.css
        toolbar.js
    base.jinja.html
    index.md
    about.md
    contact.md

python -m anchovy config.py would produce output like this:

build/
    static/
        styles.css
        toolbar.js
    index.html
    about.html
    contact.html

This example can be found in runnable form as examples/basic_site.py in the source distribution. Available command line arguments can be seen by passing -h: python -m anchovy examples/basic_site.py -- -h. The -- is required because anchovy itself also accepts the flag.

Programmatic Usage

Anchovy is very usable from the command line, but projects desiring to customize behavior, for example by running tasks before or after pipeline execution, may utilize anchovy.cli.run_from_rules():

import time
from pathlib import Path

from anchovy.cli import run_from_rules
from anchovy.core import Context

from my_site.config import SETTINGS, RULES


class MyContext(Context):
    def find_inputs(self, path: Path):
        # Only process files modified in the last hour.
        hour_ago = time.time() - 3600
        for candidate in super().find_inputs(path):
            if candidate.stat().st_mtime > hour_ago:
                yield candidate


def main():
    print('Pretending to run pre-pipeline tasks...')
    run_from_rules(SETTINGS, RULES, context_cls=MyContext)
    print('Pretending to run post-pipeline tasks...')


if __name__ == '__main__':
    main()

anchovy's Issues

Add support for general matching functions

Right now the only way to match paths is to use regular expressions. Let's generalize this behavior so we can unlock the power of the Paths we're matching against.

  • Figure out a way to make this matching useful for output path determination.

Add system for reporting/discovering Step dependencies

As currently structured, dependencies are managed by isolating import dependencies into separate modules. This only works for Python dependencies, and limits opportunities to co-locate code with similar behaviors but different dependencies or even to expose groups of Steps through __init__.py modules. Let's move hard requirement checks into the processing stage where possible, and into Step __init__ methods where not, and offer a standardized methodology for exposing and reporting dependencies.

Development server does not recognize a mimetype for WEBP images

The development server detects mimetypes using the stdlib mimetypes guesser in strict mode. However, .webp is not included as a strict mimetype in any Python version at this time. Switching the guesser to non-strict mode will provide behavior more in line with user expectation and make our gallery example work better with the built-in server.
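
A minimal sketch of the non-strict lookup described above, using only the standard library. The helper name `guess_content_type` and the fallback type are illustrative assumptions, not Anchovy's actual server code. The explicit `add_type` registration is an extra safeguard added here so that the result does not depend on which Python version is running.

```python
import mimetypes

# Assumption for this sketch: register WebP as a non-standard type so the
# lookup below behaves the same regardless of the Python version in use.
mimetypes.add_type('image/webp', '.webp', strict=False)


def guess_content_type(path: str, default: str = 'application/octet-stream') -> str:
    """Guess a Content-Type for `path`, including non-standard types."""
    # strict=False also consults the table of common but unregistered types.
    content_type, _encoding = mimetypes.guess_type(path, strict=False)
    return content_type or default
```

Falling back to a generic binary type keeps the server from sending an empty Content-Type header for extensions it does not know.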

Add a standard method for general configuration of Anchovy/Step behavior

anchovy.simple.BaseStandardStep has encoding and newline attributes that some users might like to override. There is no simple way to do so without monkey-patching the class or overriding those attributes with a subclass everywhere they're inherited. Likewise, there are CLI arguments for Anchovy that cannot be specified in the Settings class. Supporting a more generalized dictionary — perhaps reworking Settings to be that more generalized option — would allow an obvious way to implement those features in Anchovy itself as well as similar features for those writing their own Steps.

Standardize markdown support

  • Drop support for markdown libraries other than markdown-it-py
  • Add support for standard YAML frontmatter in Markdown
  • Add flag to JinjaExtendedMarkdownStep to enable footnotes plugin

Add smart purge functionality

  • Checksum inputs, parameters, and anchovy version
  • Skip input files that match checksums
  • Make chain of custody work with aggregator steps
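
The checksum-and-skip idea in the list above could be sketched roughly as follows. This is a standalone illustration using `hashlib` and a JSON cache file; the function names (`file_checksum`, `is_stale`, `record`) and the cache layout are hypothetical and are not taken from Anchovy's custody implementation. It only covers input checksums, not parameters or the Anchovy version.

```python
import hashlib
import json
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open('rb') as handle:
        # Read in chunks so large inputs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def _load(cache_file: Path) -> dict:
    """Load the checksum cache, treating a missing file as an empty cache."""
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}


def is_stale(path: Path, cache_file: Path) -> bool:
    """Report whether `path` differs from its last recorded checksum."""
    return _load(cache_file).get(str(path)) != file_checksum(path)


def record(path: Path, cache_file: Path) -> None:
    """Store the current checksum of `path` in the cache file."""
    cache = _load(cache_file)
    cache[str(path)] = file_checksum(path)
    cache_file.write_text(json.dumps(cache, indent=2))
```

A build loop would call `is_stale()` before processing each input and `record()` after a successful run, skipping inputs whose checksums still match.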

Add a font minifier Step

  • Use cssselect, tinycss2, and lxml/beautifulsoup4 to identify which glyphs are used for each font
  • Beware of glyphs that may be inserted from JavaScript
  • Use fonttools.subset to reduce font files to the identified glyphs

Link rewriter step

Use custody information to track how file names change, then rewrite links to those files accordingly in HTML, CSS, and perhaps JS.

Catch up on documentation

  • Add missing docstrings for steps
  • Add missing docstrings for modules
  • Improve internal documentation

Add a CSS pruning step

Use cssselect, tinycss2, and lxml/beautifulsoup4 to identify unused CSS rules and exclude them.

Split anchovy_css out into its own package

These steps offer functionality with applicability outside of anchovy. Split the core functionality out into packages and make them dependencies, retaining provision of the Steps themselves inside anchovy.

Anchovy CSS preprocessor consumes essential whitespaces in media queries

Description:

Any media query with a declaration as an immediate child will have all the whitespace within the declaration removed, leading to problems when more than one value is supplied to the declaration. tinycss will supply a comment to force compliance, but minifiers may strip these out.

Minimal Input:

@media (max-width: 700px) {
    test: 1 2;
}

Expected:

@media (max-width: 700px) {
    test: 1 2;
}

Actual:

@media (max-width: 700px) {
test:1/**/2;}

Add a development webserver

Use http.server.ThreadingHTTPServer to offer a built-in webserver so something extra like nginx isn't needed for testing.

  • Directly runnable server with configurable directory and port
  • Automatic index support
  • Automatic mime type detection
  • Option to run from anchovy cli post-build
  • Automatically detect root directory from anchovy settings
  • Etag support
  • Some sort of way to configure port through anchovy config file?
  • Add some tests
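
As a rough sketch of the first few items, the standard library already covers a configurable directory and port, automatic index.html handling, and mimetype guessing. The function name `make_server` and its defaults are assumptions made for illustration; this is not the server that ships with Anchovy, and it does not attempt ETag support.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_server(directory: Path, port: int = 8080) -> ThreadingHTTPServer:
    """Build a threaded static file server rooted at `directory`."""
    # SimpleHTTPRequestHandler serves index.html for directory requests and
    # guesses Content-Type headers from file extensions.
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    # Bind to the loopback interface only; this is a development server.
    return ThreadingHTTPServer(('127.0.0.1', port), handler)
```

Calling `make_server(Path('build')).serve_forever()` would then serve the build output until interrupted; passing `port=0` lets the operating system pick a free port.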

Custody tracking does not check output files for UnpackArchiveStep

Related to #60, #66, and #67. After these three custody fixes, only the target directory for UnpackArchiveStep is being checked for existence, not the files that come from the archive. This means that when the purge flag is supplied but the UnpackArchiveStep is outputting into a non-transient directory like a build-cache working dir or output dir, the Step is not marked as needing to be refreshed, and the entry_from_path call errors out on the missing files when skip_step is run.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<snip>\anchovy\.venv\Scripts\anchovy.exe\__main__.py", line 7, in <module>
  File "<snip>\anchovy\src\anchovy\cli.py", line 225, in main
    run_from_rules(settings, rules, custodian, argv=remaining, prog=f'anchovy {label}')
  File "<snip>\anchovy\src\anchovy\cli.py", line 119, in run_from_rules
    context.run()
  File "<snip>\anchovy\src\anchovy\core.py", line 190, in run
    self.process(input_paths)
  File "<snip>\anchovy\src\anchovy\core.py", line 168, in process
    output_paths = self.custodian.skip_step(path, output_paths)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 234, in skip_step
    for o_entry in map(self.entry_from_path, prior_outputs):
  File "<snip>\anchovy\src\anchovy\custody.py", line 142, in entry_from_path
    stat = path.stat()
           ^^^^^^^^^^^
  File "<snip>\pathlib.py", line 1013, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'working\\code_index\\basic_site.py'

Custody tracker can crash on UnpackArchiveStep with customized PathCalc

A Rule like

    Rule(
        REMatcher(r'.*\.zip'),
        [WorkingDirPathCalc(transform=lambda p: p.parent), None],
        UnpackArchiveStep()
    )

will cause custody checking to crash on reruns, because taking the parent of the target dir of an archive that's being unpacked into the working dir root results in a path outside the working dir. It doesn't appear we need the target dir of the archive to actually be included in the UnpackArchiveStep's custom custody information.

match_re() can unintuitively capture portions of context.input_dir or context.working_dir

If context.input_dir has a directory starting with . anywhere between its children and the current working directory (for example, .venv/Lib/site-packages/mysite/package_data), it will be filtered out by expressions like match_re(r'(.*/)*\..*') which are intended to exclude dot files only within the input_dir. We should offer an out-of-the-box method for users to limit what the regular expression processes to the contents of input/working directories.
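
The problem and one possible remedy can be illustrated with plain `re` and `pathlib`. The helper `matches_within` below is hypothetical, and the use of `fullmatch` against the POSIX form of the path is an assumption for this sketch; neither is Anchovy's API. The idea is simply to strip the parent directory before the expression ever sees the path.

```python
import re
from pathlib import PurePosixPath

# The dotfile expression quoted in the issue.
DOTFILE = re.compile(r'(.*/)*\..*')


def matches_within(path: PurePosixPath, parent_dir: PurePosixPath,
                   pattern: re.Pattern = DOTFILE) -> bool:
    """Match `pattern` against `path` relative to `parent_dir` only."""
    try:
        relative = path.relative_to(parent_dir)
    except ValueError:
        # The path is not inside parent_dir at all, so it cannot match.
        return False
    return pattern.fullmatch(relative.as_posix()) is not None
```

With an input directory of `.venv/Lib/site-packages/mysite/package_data`, an `index.md` inside it no longer matches the dotfile expression, while a `.gitignore` inside it still does.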

Add featureful Matcher class

Matcher functions are fairly useful, but it would often be helpful to combine them with and or or. We could make a PathMatcher class with __and__ and __or__ implemented so we can get these behaviors. Follow the behavior of and and or for determining which match is sent to the PathCalcs.
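
One way the proposal could look, as a sketch only: the class below wraps any callable that returns a truthy match or None, and combines results with the same short-circuit semantics as Python's `and` and `or`. The constructor signature and the two example matchers are illustrative assumptions, not the class that Anchovy eventually shipped.

```python
import re
from pathlib import PurePosixPath
from typing import Any, Callable


class PathMatcher:
    """Wrap a match function so matchers can be combined with & and |."""

    def __init__(self, func: Callable[[PurePosixPath], Any]):
        self.func = func

    def __call__(self, path: PurePosixPath) -> Any:
        return self.func(path)

    def __and__(self, other: 'PathMatcher') -> 'PathMatcher':
        # Like `a and b`: the first falsy result, otherwise the second result.
        return PathMatcher(lambda path: self(path) and other(path))

    def __or__(self, other: 'PathMatcher') -> 'PathMatcher':
        # Like `a or b`: the first truthy result, otherwise the second result.
        return PathMatcher(lambda path: self(path) or other(path))


# Hypothetical matchers used to demonstrate the combination rules.
is_markdown = PathMatcher(lambda p: re.fullmatch(r'.*\.md', p.as_posix()))
in_posts = PathMatcher(lambda p: re.fullmatch(r'posts/.*', p.as_posix()))
```

Under these rules, `is_markdown & in_posts` hands the match from `in_posts` to the PathCalcs when both succeed, and `is_markdown | in_posts` hands over whichever match succeeds first.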

Add comprehensive automated regression tests

Right now the testbed for Anchovy consists of manual example site runs with ocular diffs. We need:

  • More example site configurations.
    • Example using working_dir, archive, advanced markdown, minification, and anchovy-css.
  • A pytest harness to execute configs and diff outputs.
    • Find a way to mark outputs as text or binary.
    • Find a consistently available diff tool for text outputs.
    • Make a tool to store and check hashes for binary outputs.
  • Updates to the GitHub CI script to execute the tests and check coverage.
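
For the "consistently available diff tool" item, the standard library's `difflib` is one candidate, since it is present wherever Python is. The helper below is a sketch under that assumption; the name `diff_text_outputs` and the UTF-8 encoding are illustrative choices, not part of Anchovy's test suite.

```python
import difflib
from pathlib import Path


def diff_text_outputs(expected: Path, actual: Path) -> list[str]:
    """Return unified diff lines between two text files (empty if equal)."""
    expected_lines = expected.read_text(encoding='utf-8').splitlines(keepends=True)
    actual_lines = actual.read_text(encoding='utf-8').splitlines(keepends=True)
    return list(difflib.unified_diff(
        expected_lines,
        actual_lines,
        fromfile=str(expected),
        tofile=str(actual),
    ))
```

In a pytest harness, a test could assert that the returned list is empty and print the joined lines on failure to show exactly where an output drifted.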

Debugger mode

  • Replace various prints with a more powerful logger
  • Add DEBUG level logs to Matchers and PathCalcs
  • Add CLI option to enable DEBUG level logs
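
A sketch of how the three items might fit together with `logging` and `argparse`. The logger name `anchovy.example` and the `--debug` flag are placeholders chosen for illustration; the issue does not specify what the real option is called.

```python
import argparse
import logging

# Hypothetical logger name; a real module would typically use __name__.
logger = logging.getLogger('anchovy.example')


def configure_logging(argv=None) -> int:
    """Parse a --debug flag and set the log level accordingly."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--debug', action='store_true',
                        help='emit DEBUG level logs')
    # Ignore arguments this sketch does not know about.
    args, _remaining = parser.parse_known_args(argv)
    level = logging.DEBUG if args.debug else logging.INFO
    # basicConfig does nothing if the root logger already has handlers, so
    # the level is also set explicitly on our own logger.
    logging.basicConfig(level=level, format='%(levelname)s %(name)s: %(message)s')
    logger.setLevel(level)
    return level
```

A Matcher or PathCalc could then call `logger.debug('matched %s', path)` freely; the message is only emitted when the flag is passed.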

Rework Markdown support to enable swappable engines

Replace md_parser and md_renderer arguments to JinjaMarkdownStep with parse and render methods. Default behavior could be to use commonmark, or to detect several different markdown libraries and use them. Good choices are commonmark, markdown-it-py, markdown, and mistletoe.

Anchovy CSS processor produces empty rules

Description:

Any qualified rule whose only children are other qualified rules or at rules will be present as an empty rule in the final output. This is not strictly harmful but is undesirable.

Minimal Input:

.cls {
    @media (max-width: 700px) {
        test: 1;
    }
}
a {
    b {
        padding: 0;
    }
}

Expected:

@media (max-width: 700px) {
    .cls {
        test:1;
    }
}
a b {
    padding: 0;
}

Actual:

.cls {
}
@media (max-width: 700px) {
.cls {
test:1;
}
}
a {
}
a b {
padding: 0;
}

Custody tracking incorrect for UnpackArchiveStep

#61 worked around the issue identified in #60, but actually broke the staleness checker for UnpackArchiveStep, causing it to always get unpacked with this message:

Missing upstream record (examples\code_index\code.zip)...

We will need to revert #61 and pursue a proper fix for #66 instead.

Smooth off edges of Dependency system

anchovy.dependencies.Dependency is currently an instantiable class, but does lookups in anchovy.dependencies.DEPENDENCY_TYPES, has inheritance conflicts with its Or/And children, and is only used through constructor functions anyways. Let's cut down on the functionality of Dependency and make the constructor functions subclasses.

PathCalc for removing file extensions from webpages

It would be useful to have a PathCalc that turned paths like projects/anchovy.html into projects/anchovy/index.html. This would then effectively hide the .html from the URL, while otherwise maintaining the same structure.
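
The path transformation itself is small. The sketch below shows it as a plain function over `pathlib` paths; the name `to_index_path` is hypothetical, and wiring it into a PathCalc is left out. Leaving existing index.html files and non-HTML files untouched is an assumption about the desired behavior.

```python
from pathlib import PurePosixPath


def to_index_path(path: PurePosixPath) -> PurePosixPath:
    """Turn `name.html` into `name/index.html`, leaving other paths alone."""
    # Files that are not HTML, or are already an index page, stay where they are.
    if path.suffix != '.html' or path.stem == 'index':
        return path
    return path.with_suffix('') / 'index.html'
```

With this mapping, `projects/anchovy.html` is written to `projects/anchovy/index.html`, so a server with automatic index support can answer requests for `projects/anchovy/`.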

Indexing steps are not marked stale when new indexable files are added

For most Steps, the set of processed files varies only with the files gathered by the containing Rule's Matcher; any other dependencies either remain consistent from one run to another or are defined within the entry-point file itself. For example, JinjaExtendedMarkdownStep will enter for each markdown file matched, then pull in either the default template from the Step configuration or the template defined in the markdown file's frontmatter. This explicit connection ensures that the output HTML will be regenerated any time either the markdown or the Jinja template changes.

In contrast, a Step that generates an index of files, like the CodeIndexStep proposed in #59, enters from the Jinja template itself, and establishes custody connections only to markdown files that exist when a fresh run occurs. If the only change that occurs is an added markdown file, the Step will not have a chance to gather the new file, leading to false up-to-date decisions.

Add a feature-rich Markdown processor

Take advantage of MarkdownIt features and plugins to provide a more complete markdown experience.

  • toml frontmatter
  • attrs
  • tables
  • custom containers
  • code highlighting with pygments
  • optional wordcount
  • optional templating/variable substitution
  • optional typography substitutions

Custodian.degenericize_path cannot process bare ContextDirs

The issue mentioned in #60, and worked around by removing directories from custody tracking, can still appear in other places, such as in testing for #65, where a glob in the root directory is resulting in a key of glob_manifest:working_dir:*.py, which includes a 'working_dir' component that must be degenericized. Presently, that results in explosions:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<snip>\.venv\Scripts\anchovy.exe\__main__.py", line 7, in <module>
  File "<snip>\anchovy\src\anchovy\cli.py", line 232, in main
    run_from_rules(settings, rules, custodian, argv=remaining, prog=f'anchovy {label}')
  File "<snip>\anchovy\src\anchovy\cli.py", line 119, in run_from_rules
    context.run()
  File "<snip>\anchovy\src\anchovy\core.py", line 190, in run
    self.process(input_paths)
  File "<snip>\anchovy\src\anchovy\core.py", line 176, in process
    self.process(further_processing)
  File "<snip>\anchovy\src\anchovy\core.py", line 159, in process
    stale, msg = self.custodian.refresh_needed(path, output_paths)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 310, in refresh_needed
    if not self.check_prior(up_key):
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 277, in check_prior
    return checker(CustodyEntry(ptype, key, pmeta))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "examples\code_index.py", line 77, in glob_manifest_stale
    parent = context.custodian.degenericize_path(path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 119, in degenericize_path
    dir_key = t.cast('ContextDir', str(path.parents[-2]))
                                       ~~~~~~~~~~~~^^^^
  File "<snip>\pathlib.py", line 445, in __getitem__
    raise IndexError(idx)
IndexError: -2
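
The traceback comes down to `parents[-2]` not existing for a path with a single component: a bare `working_dir` has only one parent (`.`). A sketch of a lookup that tolerates bare ContextDirs follows; `split_context_dir` is a hypothetical helper written for illustration, not the fix that was applied to Custodian.degenericize_path, and it assumes the path is non-empty.

```python
from pathlib import PurePosixPath


def split_context_dir(path: PurePosixPath) -> tuple[str, PurePosixPath]:
    """Split a generic path into its leading ContextDir key and the rest."""
    # parts[0] exists for any non-empty relative path, including a bare key,
    # whereas parents[-2] requires at least two components.
    key = path.parts[0]
    remainder = PurePosixPath(*path.parts[1:])
    return key, remainder
```

For a bare `working_dir`, the remainder is the empty path `.`, which can be joined onto the real working directory without special-casing.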

Unify DirPathCalc and OutputDirPathCalc/WorkingDirPathCalc inheritance structure

It is surprising that DirPathCalc is not the superclass of OutputDirPathCalc and WorkingDirPathCalc. This structure is the result of DirPathCalc not supporting ContextDir keys, which itself has rendered DirPathCalc essentially useless in common operations, which will not care to produce any files beyond the walls of working_dir and output_dir. Both the structural issue and the usefulness issue can be resolved by changing DirPathCalc's path parameter to support Paths or ContextDirs.

Allow Steps to export explicit chain of custody data at runtime

The way the Step API is designed, Steps receive one input path and a variable number of output paths, already identified at the beginning of each processing cycle. This is generally effective, but the edges have been pushed from the very beginning by JinjaMarkdownStep, which reads not only the input markdown file but also a Jinja template referenced by that file. More recently, ResourcePackerStep does away with the input file meaning anything at all beyond gathering other files.

This all works well enough, but as we look to features like #30, which will require knowing all files that could affect the output of a Step, we see that the engine's knowledge is insufficient. Further, proposed functionality like #27, #28, #34, and #35 will all require output paths calculated by the Steps themselves at runtime. #29 offers a possible workaround, but leans towards duplication of effort.

Instead, let's take advantage of the fact that Steps currently return nothing. Step.__call__() can be extended to allow either the current None, in which case custody will be handled as at present, or a tuple of two lists of Paths, the first being inputs and the second being outputs. Steps which wish to declare only one of the two can easily return the paths they were passed for the other; this will make Step.__call__() symmetrical apart from the input path parameter being singular and the input path return value being plural.
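
The engine side of this proposal could be sketched as below. The function name `resolve_custody` and the `StepResult` alias are assumptions made for illustration; only the None-or-tuple return convention comes from the proposal itself.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical alias: None, or (input paths, output paths) declared by a Step.
StepResult = Optional[tuple[Sequence[Path], Sequence[Path]]]


def resolve_custody(input_path: Path, output_paths: Sequence[Path],
                    result: StepResult) -> tuple[list[Path], list[Path]]:
    """Work out which inputs and outputs to record for one Step call."""
    if result is None:
        # Current behavior: trust the paths computed before the Step ran.
        return [input_path], list(output_paths)
    # New behavior: the Step reported what it actually read and wrote.
    inputs, outputs = result
    return list(inputs), list(outputs)
```

A Step like JinjaMarkdownStep could then return the markdown file plus its template as inputs, alongside the output paths it was given.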
