pydsigner / anchovy

A minimal, unopinionated framework for static website generation.

License: Apache License 2.0

Python 100.00%

anchovy's Introduction

Anchovy

Anchovy is a minimal, unopinionated file-processing framework equipped with a complete static website generation toolkit.

Minimal: Anchovy’s core is around a thousand lines of code and has no mandatory dependencies. Plus, Anchovy can be used for real projects with just a few pip-installable extras, even if you want to preprocess CSS.
Unopinionated: Anchovy offers a set of components which can be easily configured to your site’s exact requirements, without tediously ripping out or overriding entrenched behaviors. Anchovy does not assume you are building a blog or that you wish to design your templates in a specific way. You can even build things that aren’t websites! Plus, Anchovy operates on files, so it’s simple to integrate tools like imagemagick, dart-sass, or less.js if you need them.
Complete: Anchovy comes with a dependency auditing system, allowing you to grab any component you want without installing anything but Anchovy and find out what you will need to run your build. Choose from a wealth of Steps, Anchovy’s modular file processors, for everything from rendering Jinja templates and minifying CSS to unpacking archives and thumbnailing images. Plus, add a few extra parameters or lines of configuration to get automatic intelligent minimum builds based on input checksums, and get a reproducible run artifact to boot— even if you want to fetch HTTP resources or write your own Steps. Iterate quickly by launching a lightweight development-grade web server once the build is complete.

Installation

Anchovy has no essential prerequisites and can be installed with pip install anchovy to get just the framework and a few built-in components, but for typical usage pip install anchovy[base] is recommended. This will pull in support for Jinja2 templating, markdown, minification, and Anchovy’s CSS preprocessor. A full list of available extras may be found in the pyproject.toml file.

Alternatively, Anchovy may be installed directly from source with pip install git+https://github.com/pydsigner/anchovy or the corresponding pip install git+https://github.com/pydsigner/anchovy#egg=anchovy[base].

Command Line Usage

Anchovy operates on config files written in Python, or even modules directly.

python -m anchovy -h
anchovy -m mypackage.anchovyconf -o ../release/
python -m anchovy mysite/anchovy_site.py -- -h

Show Me

Run anchovy examples/code_index.py -s -p 8080, then open a browser to localhost:8080 (or click the link in the console). This example offers the most extensive demonstration of Anchovy’s functionality as of version 1.0.

What’s the Baseline?

Here’s a minimal example that does roughly what the staticjinja markdown example offers:

from pathlib import Path

from anchovy import (
    DirectCopyStep,
    InputBuildSettings,
    JinjaMarkdownStep,
    OutputDirPathCalc,
    REMatcher,
    Rule,
)


# Optional, and can be overridden with CLI arguments.
SETTINGS = InputBuildSettings(
    input_dir=Path('site'),
    working_dir=Path('working'),
    output_dir=Path('build'),
    custody_cache=Path('build-cache.json'),
)
RULES = [
    # Ignore dotfiles found in either the input_dir or the working dir.
    Rule(
        (
            REMatcher(r'(.*/)*\..*', parent_dir='input_dir')
            | REMatcher(r'(.*/)*\..*', parent_dir='working_dir')
        ),
        None
    ),
    # Render markdown files, then stop processing them.
    Rule(
        REMatcher(r'.*\.md'),
        [OutputDirPathCalc('.html'), None],
        JinjaMarkdownStep()
    ),
    # Copy everything else in static/ directories through.
    Rule(
        REMatcher(r'(.*/)*static/.*', parent_dir='input_dir'),
        OutputDirPathCalc(),
        DirectCopyStep()
    ),
]

This example is very simple, but it is genuinely enough to start a small website, and it offers an advantage over other minimal frameworks by keeping additional batteries within arm’s reach. If we stored the configuration in config.py and added a raw site like this:

site/
    static/
        styles.css
        toolbar.js
    base.jinja.html
    index.md
    about.md
    contact.md

python -m anchovy config.py would produce output like this:

build/
    static/
        styles.css
        toolbar.js
    index.html
    about.html
    contact.html

This example can be found in runnable form as examples/basic_site.py in the source distribution. Available command line arguments can be seen by passing -h: python -m anchovy examples/basic_site.py -- -h. The -- is required because anchovy itself also accepts the flag.

Programmatic Usage

Anchovy is very usable from the command line, but projects desiring to customize behavior, for example by running tasks before or after pipeline execution, may utilize anchovy.cli.run_from_rules():

import time
from pathlib import Path

from anchovy.cli import run_from_rules
from anchovy.core import Context

from my_site.config import SETTINGS, RULES


class MyContext(Context):
    def find_inputs(self, path: Path):
        # Only process files modified in the last hour.
        hour_ago = time.time() - 3600
        for candidate in super().find_inputs(path):
            if candidate.stat().st_mtime > hour_ago:
                yield candidate


def main():
    print('Pretending to run pre-pipeline tasks...')
    run_from_rules(SETTINGS, RULES, context_cls=MyContext)
    print('Pretending to run post-pipeline tasks...')


if __name__ == '__main__':
    main()

anchovy's Issues

Add support for general matching functions

Right now the only way to match paths is to use regular expressions. Let's generalize this behavior so we can unlock the power of the Paths we're matching against.

Figure out a way to make this matching useful for output path determination.

Add system for reporting/discovering Step dependencies

As currently structured, dependencies are managed by isolating import dependencies into separate modules. This only works for Python dependencies, and limits opportunities to co-locate code with similar behaviors but different dependencies or even to expose groups of Steps through __init__.py modules. Let's move hard requirement checks into the processing stage where possible, and into Step __init__ methods where not, and offer a standardized methodology for exposing and reporting dependencies.

Allow rules to directly control output path calculation

Development server does not recognize a mimetype for WEBP images

The development server detects mimetypes using the stdlib mimetypes guesser in strict mode. However, .webp is not included as a strict mimetype in any Python version at this time. Switching the guesser to non-strict mode will provide behavior more in line with user expectation and make our gallery example work better with the built-in server.
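
As a minimal sketch of the proposed change: the helper name guess_content_type and the explicit add_type registration below are assumptions for illustration, not Anchovy's actual server code. Non-strict guessing is combined with an explicit WEBP registration so the result does not depend on which Python version is running.

```python
import mimetypes

# A private database, so the global mimetypes state is left untouched.
_TYPES = mimetypes.MimeTypes()
# Register WEBP as a non-strict type in case this Python version lacks it.
_TYPES.add_type('image/webp', '.webp', strict=False)


def guess_content_type(filename: str, default: str = 'application/octet-stream') -> str:
    """Guess a Content-Type value using the non-strict type tables."""
    content_type, _encoding = _TYPES.guess_type(filename, strict=False)
    return content_type or default


print(guess_content_type('gallery/thumb.webp'))  # image/webp
print(guess_content_type('index.html'))          # text/html
```

A non-strict lookup still consults the strict table first, so standard types keep resolving as before.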

Unit test coverage for core/custody

Add a standard method for general configuration of Anchovy/Step behavior

anchovy.simple.BaseStandardStep has encoding and newline attributes that some users might like to override. There is no simple way to do so without monkey-patching the class or overriding those attributes with a subclass everywhere they're inherited. Likewise, there are CLI arguments for Anchovy that cannot be specified in the Settings class. Supporting a more generalized dictionary — perhaps reworking Settings to be that more generalized option — would allow an obvious way to implement those features in Anchovy itself as well as similar features for those writing their own Steps.

Standardize markdown support

Drop support for markdown libraries other than markdown-it-py
Add support for standard YAML frontmatter in Markdown
Add flag to JinjaExtendedMarkdownStep to enable footnotes plugin

Add smart purge functionality

Checksum inputs, parameters, and anchovy version
Skip input files that match checksums
Make chain of custody work with aggregator steps
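
A rough sketch of the input-checksum part of this idea, using only the standard library. The function names and the flat JSON cache layout are hypothetical and do not reflect Anchovy's custody cache format. Parameters and the Anchovy version would need to be folded into the record as well.

```python
import hashlib
import json
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_inputs(inputs: list[Path], cache_file: Path) -> list[Path]:
    """Return the inputs whose checksum differs from the cached record."""
    try:
        cached = json.loads(cache_file.read_text())
    except FileNotFoundError:
        # No previous run: everything needs to be processed.
        cached = {}
    return [p for p in inputs if cached.get(p.as_posix()) != checksum(p)]


def record_inputs(inputs: list[Path], cache_file: Path) -> None:
    """Store current checksums so the next run can skip unchanged files."""
    cache_file.write_text(json.dumps({p.as_posix(): checksum(p) for p in inputs}))
```

A build would call stale_inputs() before processing, run Steps only for the returned paths, and call record_inputs() once the run succeeds.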

Add a font minifier Step

Use cssselect, tinycss2, and lxml/beautifulsoup4 to identify which glyphs are used for each font
Beware of glyphs that may be inserted from JavaScript
Use fonttools.subset to reduce font files to the identified glyphs

Add a JS minifier Step

Right now we don't have any minifiers for JavaScript. minify-html comes with minify-js, but minify-js is known to have correctness issues: https://github.com/privatenumber/minification-benchmarks.

Link rewriter step

Use custody information to track how file names change, then rewrite links to those files accordingly in HTML, CSS, and perhaps JS.

Catch up on documentation

Add missing docstrings for steps
Add missing docstrings for modules
Improve internal documentation

Remove legacy support for non-class Steps

Add Steps for working with images

Add a CSS pruning step

Use cssselect, tinycss2, and lxml/beautifulsoup4 to identify unused CSS rules and exclude them.

Switch CSS minifier backend to lightningcss

Depending on how quickly attention is given to issues like sprymix/csscompressor#13, sprymix/csscompressor#14, and sprymix/csscompressor#9, we may wish to vendor a custom version of csscompressor for our minification.

See also #32, which this would really require.

Split anchovy_css out into its own package

These steps offer functionality with applicability outside of anchovy. Split the core functionality out into packages and make them dependencies, retaining provision of the Steps themselves inside anchovy.

Fix broken mistletoe markdown support

Class renaming in miyuchina/mistletoe#182 has broken our support for mistletoe 1.2.0+.

Anchovy CSS preprocessor consumes essential whitespaces in media queries

Description:

Any media query with a declaration as an immediate child will have all the whitespace within the declaration removed, leading to problems when more than one value is supplied to the declaration. tinycss will supply a comment to force compliance, but minifiers may strip these out.

Minimal Input:

@media (max-width: 700px) {
    test: 1 2;
}

Expected:

@media (max-width: 700px) {
    test: 1 2;
}

Actual:

@media (max-width: 700px) {
test:1/**/2;}

Add a development webserver

Use http.server.ThreadingHTTPServer to offer a built-in webserver so something extra like nginx isn't needed for testing.

Directly runnable server with configurable directory and port
Automatic index support
Automatic mime type detection
Option to run from anchovy cli post-build
Automatically detect root directory from anchovy settings
Etag support
~~Some sort of way to configure port through anchovy config file?~~
Add some tests
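
A minimal sketch of the first few items, assuming nothing beyond the standard library. The make_server helper and its defaults are illustrative and are not the server that ships with Anchovy. SimpleHTTPRequestHandler already serves index.html for directory requests and guesses mime types, so only the directory and port need wiring up.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_server(directory: Path, port: int = 8080) -> ThreadingHTTPServer:
    """Build a threaded HTTP server rooted at `directory`.

    Passing port=0 lets the operating system pick a free port.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    return ThreadingHTTPServer(('127.0.0.1', port), handler)
```

To use it, call make_server(Path('build')).serve_forever() and stop it with Ctrl+C. The bound port is available as server.server_port.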

Custody tracking does not check output files for UnpackArchiveStep

Related to #60, #66, and #67. After these three custody fixes, only the target directory for UnpackArchiveStep is being checked for existence, not the files that come from the archive. This means that when the purge flag is supplied but the UnpackArchiveStep is outputting into a non-transient directory like a build-cache working dir or output dir, the Step is not marked as needing to be refreshed, and the entry_from_path call errors out on the missing files when skip_step is run.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<snip>\anchovy\.venv\Scripts\anchovy.exe\__main__.py", line 7, in <module>
  File "<snip>\anchovy\src\anchovy\cli.py", line 225, in main
    run_from_rules(settings, rules, custodian, argv=remaining, prog=f'anchovy {label}')
  File "<snip>\anchovy\src\anchovy\cli.py", line 119, in run_from_rules
    context.run()
  File "<snip>\anchovy\src\anchovy\core.py", line 190, in run
    self.process(input_paths)
  File "<snip>\anchovy\src\anchovy\core.py", line 168, in process
    output_paths = self.custodian.skip_step(path, output_paths)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 234, in skip_step
    for o_entry in map(self.entry_from_path, prior_outputs):
  File "<snip>\anchovy\src\anchovy\custody.py", line 142, in entry_from_path
    stat = path.stat()
           ^^^^^^^^^^^
  File "<snip>\pathlib.py", line 1013, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'working\\code_index\\basic_site.py'

Live rebuilds

Detect added/changed input files, then rebuild.

Custody tracker can crash on UnpackArchiveStep with customized PathCalc

A Rule like

    Rule(
        REMatcher(r'.*\.zip'),
        [WorkingDirPathCalc(transform=lambda p: p.parent), None],
        UnpackArchiveStep()
    )

will cause custody checking to crash on reruns: taking the parent of the target dir of an archive that's being unpacked into the working dir root results in a path outside the working dir. It doesn't appear we need the target dir of the archive to actually be included in the UnpackArchiveStep's custom custody information.

Add Step for fetching external resources

Add a step that uses a config file to pull in an external resource, either from the internet or from the local filesystem.

match_re() can unintuitively capture portions of context.input_dir or context.working_dir

If context.input_dir has a directory starting with . anywhere between its children and the current working directory (for example, .venv/Lib/site-packages/mysite/package_data), it will be filtered out by expressions like match_re(r'(.*/)*\..*') which are intended to exclude dot files only within the input_dir. We should offer an out-of-the-box method for users to limit what the regular expression processes to the contents of input/working directories.
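
The problem can be reproduced with plain re and pathlib. The two helper functions below are hypothetical stand-ins rather than Anchovy's match_re(). They compare matching against the whole path with matching against only the part below the input directory, which appears to be what the parent_dir argument in the README's REMatcher example is for.

```python
import re
from pathlib import PurePosixPath

DOTFILE = re.compile(r'(.*/)*\..*')


def is_dotfile_naive(path: PurePosixPath) -> bool:
    """Match against the whole path, including input_dir's own parents."""
    return DOTFILE.match(path.as_posix()) is not None


def is_dotfile_scoped(path: PurePosixPath, parent_dir: PurePosixPath) -> bool:
    """Match only the part of the path below parent_dir."""
    return DOTFILE.match(path.relative_to(parent_dir).as_posix()) is not None


input_dir = PurePosixPath('.venv/Lib/site-packages/mysite/package_data')
page = input_dir / 'index.md'
hidden = input_dir / '.DS_Store'

print(is_dotfile_naive(page))                # True: '.venv' triggers the match
print(is_dotfile_scoped(page, input_dir))    # False
print(is_dotfile_scoped(hidden, input_dir))  # True
```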

Add featureful Matcher class

Matcher functions are fairly useful, but it would often be helpful to combine them with boolean and/or logic. We could make a PathMatcher class with __and__ and __or__ implemented so we can get these behaviors. Follow the behavior of Python’s and and or operators for determining which match is sent to the PathCalcs.
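
One way this could look, as a sketch only: the PathMatcher class and regex() helper here are written from the description in this issue and are not Anchovy's eventual implementation. Combined matchers return whichever match Python's own and/or would return, so a PathCalc receives the right-hand match for & and the first successful match for |.

```python
import re
from pathlib import PurePosixPath
from typing import Callable, Optional

MatchFunc = Callable[[PurePosixPath], Optional[object]]


class PathMatcher:
    """Wrap a match function so matchers compose with & and |."""

    def __init__(self, func: MatchFunc):
        self.func = func

    def __call__(self, path: PurePosixPath) -> Optional[object]:
        return self.func(path)

    def __and__(self, other: 'PathMatcher') -> 'PathMatcher':
        # Like `a and b`: the right-hand match wins when both succeed.
        return PathMatcher(lambda path: self(path) and other(path))

    def __or__(self, other: 'PathMatcher') -> 'PathMatcher':
        # Like `a or b`: the first truthy match wins.
        return PathMatcher(lambda path: self(path) or other(path))


def regex(pattern: str) -> PathMatcher:
    """Build a PathMatcher from a regular expression."""
    compiled = re.compile(pattern)
    return PathMatcher(lambda path: compiled.match(path.as_posix()))


markdown = regex(r'.*\.md')
in_posts = regex(r'posts/(?P<slug>.*)\..*')
both = markdown & in_posts
either = markdown | in_posts

print(both(PurePosixPath('posts/hello.md')).group('slug'))  # hello
print(either(PurePosixPath('about.md')).group(0))           # about.md
print(both(PurePosixPath('about.md')))                      # None
```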

Refactor install extras?

Add comprehensive automated regression tests

Right now the testbed for Anchovy consists of manual example site runs with ocular diffs. We need:

More example site configurations.
- Example using working_dir, archive, advanced markdown, minification, and anchovy-css.
A pytest harness to execute configs and diff outputs.
- Find a way to mark outputs as text or binary.
- Find a consistently available diff tool for text outputs.
- Make a tool to store and check hashes for binary outputs.
Updates to the GitHub CI script to execute the tests and check coverage.

Debugger mode

Replace various prints with a more powerful logger
Add DEBUG level logs to Matchers and PathCalcs
Add CLI option to enable DEBUG level logs

Rework Markdown support to enable swappable engines

Replace md_parser and md_renderer arguments to JinjaMarkdownStep with parse and render methods. Default behavior could be to use commonmark, or to detect several different markdown libraries and use them. Good choices are commonmark, markdown-it-py, markdown, and mistletoe.

Anchovy CSS processor produces empty rules

Description:

Any qualified rule whose only children are other qualified rules or at rules will be present as an empty rule in the final output. This is not strictly harmful but is undesirable.

Minimal Input:

.cls {
    @media (max-width: 700px) {
        test: 1;
    }
}

a {
    b {
        padding: 0;
    }
}

Expected:

@media (max-width: 700px) {
    .cls {
        test:1;
    }
}

a b {
    padding: 0;
}

Actual:

.cls {
}
@media (max-width: 700px) {
.cls {
test:1;
}
}

a {
}
a b {
padding: 0;
}

Custody tracking incorrect for UnpackArchiveStep

#61 worked around the issue identified in #60, but actually broke the staleness checker for UnpackArchiveStep, causing it to always get unpacked with this message:

Missing upstream record (examples\code_index\code.zip)...

We will need to revert #61 and pursue a proper fix for #66 instead.

Add Step for unpacking archives

Smooth off edges of Dependency system

anchovy.dependencies.Dependency is currently an instantiable class, but does lookups in anchovy.dependencies.DEPENDENCY_TYPES, has inheritance conflicts with its Or/And children, and is only used through constructor functions anyways. Let's cut down on the functionality of Dependency and make the constructor functions subclasses.

Gallery example thumbnails are broken

caa89b7 in #59 switched from thumbnail optimization with optipng to conversion to webp, but the HTML in the gallery index markdown file was not updated to match.

PathCalc for removing file extensions from webpages

It would be useful to have a PathCalc that turned paths like projects/anchovy.html into projects/anchovy/index.html. This would then effectively hide the .html from the URL, while otherwise maintaining the same structure.
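
The path transformation itself is small. Here is a sketch with pathlib, where to_index_path is a hypothetical helper name and not an existing Anchovy PathCalc:

```python
from pathlib import PurePosixPath


def to_index_path(path: PurePosixPath) -> PurePosixPath:
    """Map `projects/anchovy.html` to `projects/anchovy/index.html`."""
    if path.suffix != '.html' or path.stem == 'index':
        # Leave non-HTML files and existing index pages where they are.
        return path
    return path.with_suffix('') / 'index.html'


print(to_index_path(PurePosixPath('projects/anchovy.html')))  # projects/anchovy/index.html
print(to_index_path(PurePosixPath('index.html')))             # index.html
```

Existing index.html files are left alone so the site root does not become index/index.html.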

Indexing steps are not marked stale when new indexable files are added

The variation of processed files for most Steps consists in the files gathered by the containing Rule's Matcher, with any other dependencies remaining consistent from one run to another or defined within the entry-point file itself. For example, JinjaExtendedMarkdownStep will enter for each markdown file matched, then pull in either the default template from the Step configuration or the template defined in the markdown file's frontmatter. This explicit connection ensures that the output HTML will be regenerated any time either the markdown or the Jinja template changes.

In contrast, a Step that generates an index of files, like the CodeIndexStep proposed in #59, enters from the Jinja template itself, and establishes custody connections only to markdown files that exist when a fresh run occurs. If the only change that occurs is an added markdown file, the Step will not have a chance to gather the new file, leading to false up-to-date decisions.

Add dynamic PathCalc that enumerates files from a repository or archive

Add a feature-rich Markdown processor

Take advantage of MarkdownIt features and plugins to provide a more complete markdown experience.

Custodian.degenericize_path cannot process bare ContextDirs

The issue mentioned in #60, and worked around by removing directories from custody tracking, can still appear in other places, such as in testing for #65, where a glob in the root directory is resulting in a key of glob_manifest:working_dir:*.py, which includes a 'working_dir' component that must be degenericized. Presently, that results in explosions:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<snip>\.venv\Scripts\anchovy.exe\__main__.py", line 7, in <module>
  File "<snip>\anchovy\src\anchovy\cli.py", line 232, in main
    run_from_rules(settings, rules, custodian, argv=remaining, prog=f'anchovy {label}')
  File "<snip>\anchovy\src\anchovy\cli.py", line 119, in run_from_rules
    context.run()
  File "<snip>\anchovy\src\anchovy\core.py", line 190, in run
    self.process(input_paths)
  File "<snip>\anchovy\src\anchovy\core.py", line 176, in process
    self.process(further_processing)
  File "<snip>\anchovy\src\anchovy\core.py", line 159, in process
    stale, msg = self.custodian.refresh_needed(path, output_paths)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 310, in refresh_needed
    if not self.check_prior(up_key):
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 277, in check_prior
    return checker(CustodyEntry(ptype, key, pmeta))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "examples\code_index.py", line 77, in glob_manifest_stale
    parent = context.custodian.degenericize_path(path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 119, in degenericize_path
    dir_key = t.cast('ContextDir', str(path.parents[-2]))
                                       ~~~~~~~~~~~~^^^^
  File "<snip>\pathlib.py", line 445, in __getitem__
    raise IndexError(idx)
IndexError: -2

PathCalc that adds hashes to filenames

Add hashes to the ends of filenames to improve caching. Depends on #63 to be useful.
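
A sketch of the naming scheme, with assumptions stated up front: hashed_path is a hypothetical helper, SHA-256 truncated to eight hex characters is an arbitrary choice, and the hash is placed before the extension so that file types stay recognizable.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(path: PurePosixPath, content: bytes, length: int = 8) -> PurePosixPath:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    return path.with_name(f'{path.stem}.{digest}{path.suffix}')


# Produces a name of the form static/styles.<hash>.css
print(hashed_path(PurePosixPath('static/styles.css'), b'body { margin: 0 }'))
```

Because the name changes whenever the content does, links to the old name must be rewritten, which is why this depends on the link rewriting work.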

Integration tests utilizing CLI

Unify DirPathCalc and OutputDirPathCalc/WorkingDirPathCalc inheritance structure

It is surprising that DirPathCalc is not the superclass of OutputDirPathCalc and WorkingDirPathCalc. This structure is the result of DirPathCalc not supporting ContextDir keys, which itself has rendered DirPathCalc essentially useless in common operations, which will not care to produce any files beyond the walls of working_dir and output_dir. Both the structural issue and the usefulness issue can be resolved by changing DirPathCalc's path parameter to support Paths or ContextDirs.

Allow Steps to export explicit chain of custody data at runtime

The way the Step API is designed, Steps receive one input path and a variable number of output paths, already identified at the beginning of each processing cycle. This is generally effective, but the edges have been pushed really from the beginning with JinjaMarkdownStep, which reads the input markdown file but also a Jinja template referenced by the markdown file. More recently, ResourcePackerStep completely does away with the input file meaning anything at all except for gathering other files. This all works well enough, but as we look to features like #30, which will require knowing all files that could affect the output of the step, we see that the engine's knowledge is insufficient. Further, proposed functionalities like #27, #28, #34, and #35 are all really going to require output paths calculated by the Steps themselves at runtime. #29 offers a possible workaround, but leans towards duplication of effort. Instead, let's take advantage of the fact we don't return anything from Steps currently. Step.__call__() can be extended to allow either the current None, in which case custody will be handled as at present, or a tuple of two lists of Paths, the first being input and the second being output. Steps which wish to declare only one of the two can easily return the paths that go in as input; this will make Step.__call__() totally symmetrical apart from the input path parameter being singular and the input path return value being plural.
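
To make the proposal concrete, here is a sketch of a Step-like callable that reports its own custody data. ConcatStep, the header.txt convention, and the (path, output_paths) call signature are assumptions made for illustration. The class deliberately does not subclass anything from Anchovy.

```python
from pathlib import Path
from typing import Optional, Sequence

# None keeps the current behavior; a tuple reports (inputs, outputs).
CustodyResult = Optional[tuple[list[Path], list[Path]]]


class ConcatStep:
    """Hypothetical Step that also reads a header file next to its input."""

    def __call__(self, path: Path, output_paths: Sequence[Path]) -> CustodyResult:
        header = path.with_name('header.txt')
        text = path.read_text()
        inputs = [path]
        if header.exists():
            # The header affects the output, so it is reported as an input.
            text = header.read_text() + text
            inputs.append(header)
        for target in output_paths:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text)
        # Report every file that influenced the output, plus what was written.
        return inputs, list(output_paths)
```

With this shape, the engine could mark the outputs stale whenever header.txt changes, even though no Rule matched it as an input.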
