
html-build's People

Contributors

abacabadabacaba, annevk, cvrebert, dbaron, defunctzombie, domenic, foolip, hixie, hober, jeremyroman, leobalter, marti1125, sideshowbarker, stephenmcgruer, surma, takenspc, tawandamoyo, wakaba, xhmikosr, zcorpan


html-build's Issues

Drop wget dependency

Currently, we depend on both curl and wget. This is redundant, so we should pick just one. Personally, I'd prefer we drop wget since curl ships by default with OS X.

I'll take a stab at this later this week.
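For illustration, here is what a curl-only fetch helper might look like; the flags shown are standard curl options that cover what wget does by default (the actual invocations in build.sh may differ):

```shell
# fetch URL DEST: download with curl only, no wget.
# --fail exits non-zero on HTTP errors (like wget does),
# --location follows redirects (wget's default behavior),
# --show-error still reports failures despite --silent.
fetch() {
  curl --fail --location --silent --show-error --output "$2" "$1"
}
# e.g. fetch "$bugs_url" w3cbugs.csv
```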

Can we move the whole HTML build process to Travis

I've lost track a bit about how the various pieces fit together, but it seems whatwg/html already uses Travis to build the HTML Standard. Why not add some rsync/scp at the end of that and remove all scripts from the server? Having the server just host static resources seems much better.

Find and link to GitHub issues inline in the spec

We already include legacy bugs: whatwg/html#619

The simplest possible implementation would be to check for links to the spec in open issues, as the bug filing tool already includes a link. If that becomes too error-prone, we could limit it to either URLs in the first comment, or have a format like loc:https://html.spec.whatwg.org/#html-vs-xhtml:mime-type that's only for the bug scraping script.
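A minimal sketch of the scraping side, assuming the standard GitHub issues API; extract_spec_links is a hypothetical helper, and pagination/rate limits are left out:

```shell
# extract_spec_links: pull unique spec fragment URLs out of whatever
# text is piped in (e.g. the JSON body of the issues API response).
extract_spec_links() {
  grep -o 'https://html\.spec\.whatwg\.org/[^"<> ]*#[^"<> ]*' | sort -u
}
# Against the real API (first page of open issues only):
# curl --silent 'https://api.github.com/repos/whatwg/html/issues?state=open' \
#   | extract_spec_links
```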

Add a one-step build script.

It would be lovely if we could put together something like a Makefile so that there was a single command that would ensure that dependencies (like wattsi) were downloaded/updated/built, and would execute the various build commands to produce the generated documents.
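As a sketch of the dependency-checking part of such a script (the actual wattsi build steps are elided, since they live in wattsi's own repo):

```shell
# ensure_tool NAME INSTALL_CMD: run the install command only when the
# tool is missing from PATH, so repeat builds skip straight to the spec.
ensure_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: already installed"
  else
    eval "$2"
  fi
}
# ensure_tool wattsi 'git clone https://github.com/whatwg/wattsi.git && ...'
# ...followed by the usual build commands.
```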

Cut down on build dependencies (data files and binaries)

A few things are currently less than ideal:

  • A lot of data needs to be downloaded, and it's all thrown away when "Build tools have been updated since last run; clearing the cache."
  • A comment in .entity-processor.py says "this uses 658 MB", and in fact I cannot run it on my VPS.
  • Users need to install Subversion and Perl's XML::Parser, which are likely not pre-installed.
  • (minor) We don't track dependencies, so builds are not reproducible.

Wouldn't it be nice if building were just blazing fast by default, and rebuilding dependencies was an option that should rarely be used?

It looks like the files that are eventually used are only these 6:

  • caniuse.json
  • cldr.inc
  • entities-dtd.url
  • entities.inc
  • entities.json
  • w3cbugs.csv

Together they are only 1.7 MB, or 282 kB gzipped. That's a lot of room for savings.

Rough proposal:

  • Separate out the scripts for building these dependencies so that they can easily be built without also building the spec.
  • Set up an automatically updated html-build-deps repo that has the output.
  • In build.sh, by default use the html-build-deps repo, but have an option to generate from scratch.
  • (maybe) Track the exact html-build-deps commit to use, using either submodules or a DEPS file.

Related issues:
#24
#38
#55
#60 (would be made obsolete)

After the build script clones, I cannot get any remote branches

If you do a clean clone and run build.sh, then cd into the html subdirectory and run git branch -r, there are no remote branches. git fetch --all does not help, and you can't do things like git checkout fetch to check out the fetch branch.

Not really sure how to fix this. Presumably it's a result of the --depth 1? But even unshallowing the clone doesn't fix it.
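One likely explanation: --depth 1 implies --single-branch, which narrows the clone's fetch refspec to just the cloned branch, so git fetch --all has nothing else to fetch. Widening the refspec (a sketch, assuming the remote is named origin) makes the other branches appear:

```shell
# fix_shallow_clone DIR: widen a shallow, single-branch clone's fetch
# refspec so every remote branch becomes visible to `git branch -r`.
fix_shallow_clone() (
  cd "$1" || return 1
  git remote set-branches origin '*'
  git fetch --quiet --depth 1 origin
)
# Usage: fix_shallow_clone html && git -C html branch -r
```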

Fix MathML/HTML entity divergence

html5lib/html5lib-tests#71 surfaced the fact that MathML defines ⃛, ⃛, ⃜, and ̑ differently from HTML.

This seems to be because we don't match the behaviour of https://github.com/w3c/xml-entities/blob/gh-pages/entities.xsl#L174 (the template starting with <xsl:template match="entity">; note this is XSLT 2 so isn't supported in that many places!) in .entity-processor.py and .entity-processor-json.py (why oh why do we have two different files with so much duplicated code?).

/cc @fred-wang @davidcarlisle

Look into HTML diffs

Once #103 is fixed we should have another look at integrating with @tobie's tool to provide diffs for changes to the HTML Standard. One way we could do this is offer only diffs for the multipage documents that changed. That will require a somewhat custom setup unfortunately, but I don't think there's a way around that for the HTML Standard at this point.

Spell out all command line flags

I once saw some advice that really stuck with me. It went something like: "when writing commands meant to be read later by others, use the spelled-out version. Your future readers, including yourself, will be better able to understand it, and it costs nothing." The idea is that the short versions save you time when typing manually on the command line, but are not as appropriate when writing a script.

This would be good to keep in mind as we edit the build script.
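For example, with flags build.sh already uses (the long spellings are GNU grep extensions, though BSD grep accepts these particular ones too):

```shell
# The same check, spelled both ways. The terse form is fine at an
# interactive prompt; the long form documents itself in a script.
printf 'fine\nXXX todo\n' > /tmp/demo-source
grep -ni 'xxx' /tmp/demo-source
grep --line-number --ignore-case 'xxx' /tmp/demo-source
```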

hard-coded match strings make this project language dependent

I'm trying to translate whatwg/html here: https://whatwg-cn.github.io/html/multipage/

To keep that fork in sync (especially the not-yet-translated sections), the build tools (html-build, wattsi) are also used in that repo. I found that the hard-coded match strings (in .pre-process-annotate-attributes.pl, .pre-process-tag-omission.pl, and maybe others) break the build process, for example:

<dt><span data-x=\"concept-element-attributes\">Content attributes</span>:</dt>

https://github.com/whatwg/html-build/blob/master/.pre-process-annotate-attributes.pl#L18

For now, I have translated these Perl source files locally. Could there be a better solution to make this tool language-independent? Or should I push the zh-Hans version to this project, which may require a localization mechanism to be implemented?

Optionally validate the build output

If we at least validate the output when merging pull requests, we would not accumulate small issues like in whatwg/html#649

Would make a lot of sense together with #46

Serious errors will be caught quickly anyway, this is mostly a matter of appearances.

Add option to prime the cache (and do nothing else)

Inspired by #82 (comment).

I'd like ./build.sh --prime-cache or similar to download the w3cbugs.csv and caniuse.json, then bail. This helps the docker use case in complicated ways, but you can imagine it being useful e.g. before you get on a plane or similar.

An alternate approach: allow --no-update to be truly no-update, so that if you use that option with an empty cache, it will generate empty caniuse and w3cbugs files to use.
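A sketch of what --prime-cache could do; the flag name and cache layout are assumptions, and the URLs would be whatever build.sh already fetches:

```shell
# prime_cache DIR CANIUSE_URL W3CBUGS_URL: fetch the two network
# resources into the cache, then stop without doing any build work.
prime_cache() {
  mkdir -p "$1"
  curl --fail --location --silent --output "$1/caniuse.json" "$2"
  curl --fail --location --silent --output "$1/w3cbugs.csv" "$3"
  echo "Cache primed."
}
# In build.sh, roughly:
#   [ "$1" = --prime-cache ] && { prime_cache "$HTML_CACHE" "$caniuse_url" "$bugs_url"; exit 0; }
```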

redact location.ancestorOrigins according to Referrer Policy

@bzbarsky @dakami and I had a hallway discussion at the end of TPAC about the possibility of adding location.ancestorOrigins to Firefox. bz has had longstanding concerns about the information this leaks to child frames. We arrived at a local consensus that any leakage is roughly equivalent to what happens already with referrer, so it would make sense to redact ancestorOrigins according to referrer policy. (and this could resolve that objection to a Mozilla implementation of ancestorOrigins)

/cc @smaug---- @annevk

Minor cleanups after #62

Opening this so we don't forget.

  • Stop copying things into $HTML_CACHE, and just use them directly
  • Add a small section to the top-level README.md explaining what the quotes/ and entities/ directories are about.
  • Stop using HTML_* environment variables, pass arguments instead. (a natural part of "stop copying")
  • Figure out how to monitor for changes to import to quotes/ and entities/

... anything else?

Add built-time syntax highlighting

In whatwg/html#2751 @sideshowbarker proposes adding client-side HTML syntax highlighting. We may want to merge that sooner instead of blocking on what I propose below. But the below proposal avoids some of the issues there and has some other benefits, so we should do it eventually.

The proposal is to have @tabatkins extract his syntax highlighter from Bikeshed and then the html-build process and/or wattsi can shell out to it. The exact shape of this is TBD, see below.

Bikeshed's syntax highlighter consists of:

  • Pygments as the base
  • Support for highlighting even code that has interspersed markup, which we use a decent amount in HTML---such as <mark>, <ins>, <del>, or <a>
  • Web IDL syntax highlighting, as that is not a Pygments-supported language
  • Line numbering/highlighting (not relevant to us)

The benefits of this over the client-side solution are:

  • No potential startup jank for users
  • Consistency with other WHATWG specs (which use Bikeshed directly)
  • Allows interspersed markup as described above
  • Web IDL syntax highlighting

Also, I think we'd want to have this easily disabled during the build process, to get faster local builds. For deploys/in CI we would enable it of course.

This would probably all work best if we can shell out to a script extracted from Bikeshed. It would presumably be written in Python, Bikeshed/Pygments's language. There are a few possibilities for the overall workflow:

  1. Preprocess the spec before feeding it to wattsi; the syntax highlighter is responsible for finding all code blocks
    • Probably won't work: Wattsi input source is not real HTML
  2. Postprocess each page of the spec after building it; the syntax highlighter is responsible for finding all code blocks
    • Probably will work, although a second pass might be slow
    • Might be more work for @tabatkins
  3. Shell out each code fragment to be highlighted to the syntax highlighter tool
    • Would require Wattsi integration, not html-build integration
    • Would require a format for passing the data; @tabatkins prefers a [tagname, {attrs}, ...contents]-style tree instead of HTML, I believe so that he doesn't have to include an HTML parser

After writing this, I am leaning toward (2) right now, although that didn't align with @tabatkins's thoughts in IRC (he was thinking more along the lines of (3)), so I am curious what the right approach is.
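Option (2) is easy to sketch at the html-build level. Here the FILTER arguments stand in for the hypothetical highlighter script extracted from Bikeshed (which does not exist yet):

```shell
# postprocess_pages DIR FILTER...: run every built page in DIR through
# FILTER in place, after wattsi has produced its output.
postprocess_pages() {
  dir=$1
  shift
  for page in "$dir"/*.html; do
    "$@" < "$page" > "$page.tmp" && mv "$page.tmp" "$page"
  done
}
# e.g. postprocess_pages "$HTML_TEMP/wattsi-output/multipage-html" highlight-spec
```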

Move source linting to a separate script, and run it *before* compilation

html-build/build.sh

Lines 339 to 348 in 594dd34

$QUIET || echo
$QUIET || echo "Linting the output..."
# show potential problems
# note - would be nice if the ones with \s+ patterns actually cross lines, but, they don't...
grep -ni 'xxx' $HTML_SOURCE/source| perl -lpe 'print "\nPossible incomplete sections:" if $. == 1'
egrep -ni '( (code|span|var)(>| data-x=)|[^<;]/(code|span|var)>)' $HTML_SOURCE/source| perl -lpe 'print "\nPossible copypasta:" if $. == 1'
grep -ni 'chosing\|approprate\|occured\|elemenst\|\bteh\b\|\blabelled\b\|\blabelling\b\|\bhte\b\|taht\|linx\b\|speciication\|attribue\|kestern\|horiontal\|\battribute\s\+attribute\b\|\bthe\s\+the\b\|\bthe\s\+there\b\|\bfor\s\+for\b\|\bor\s\+or\b\|\bany\s\+any\b\|\bbe |be\b\|\bwith\s\+with\b\|\bis\s\+is\b' $HTML_SOURCE/source| perl -lpe 'print "\nPossible typos:" if $. == 1'
perl -ne 'print "$.: $_" if (/\ban (<[^>]*>)*(?!(L\b|http|https|href|hgroup|rb|rp|rt|rtc|li|xml|svg|svgmatrix|hour|hr|xhtml|xslt|xbl|nntp|mpeg|m[ions]|mtext|merror|h[1-6]|xmlns|xpath|s|x|sgml|huang|srgb|rsa|only|option|optgroup)\b|html)[b-df-hj-np-tv-z]/i or /\b(?<![<\/;])a (?!<!--grammar-check-override-->)(<[^>]*>)*(?!&gt|one)(?:(L\b|http|https|href|hgroup|rt|rp|li|xml|svg|svgmatrix|hour|hr|xhtml|xslt|xbl|nntp|mpeg|m[ions]|mtext|merror|h[1-6]|xmlns|xpath|s|x|sgml|huang|srgb|rsa|only|option|optgroup)\b|html|[aeio])/i)' $HTML_SOURCE/source| perl -lpe 'print "\nPossible article problems:" if $. == 1'
grep -ni 'and/or' $HTML_SOURCE/source| perl -lpe 'print "\nOccurrences of making Ms2ger unhappy and/or annoyed:" if $. == 1'
grep -ni 'throw\s\+an\?\s\+<span' $HTML_SOURCE/source| perl -lpe 'print "\nException marked using <span> rather than <code>:" if $. == 1'
All of this operates on the source file, which I never noticed until today. It should probably run before we try to compile. Also, it would be good to have it as a standalone script, so I could do ./lint.sh html/source or whatever.

Mysterious ">" resource

With the latest changes a ">" resource is created that contains the following:

/dev/null: Scheme missing.
--2015-09-05 06:30:45--  https://www.w3.org/Bugs/Public/buglist.cgi?columnlist=bug_file_loc,short_desc&query_format=advanced&resolution=---&ctype=csv
Resolving www.w3.org (www.w3.org)... 128.30.52.100
Connecting to www.w3.org (www.w3.org)|128.30.52.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘w3cbugs.csv’

     0K .......... .......... .......... .......... ..........  115K
    50K .......... .......... .......... .......... ..........  274K
   100K .......... .......... .......... .......... ..........  155K
   150K .......... .......... .......... .......... ..........  182K
   200K .......... .......... .......... .......... ..........  186K
   250K .....                                                  4.72M=1.5s

2015-09-05 06:30:50 (172 KB/s) - ‘w3cbugs.csv’ saved [261414]

FINISHED --2015-09-05 06:30:50--
Total wall clock time: 4.7s
Downloaded: 1 files, 255K in 1.5s (172 KB/s)
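One speculative reading of that log (not confirmed against the actual script): the ">" file contains wget's own log, with /dev/null rejected as a URL, which is exactly what happens when a redirection stored in an unquoted shell variable gets word-split into literal arguments:

```shell
# Hypothetical reconstruction: LOG was meant to be a redirection...
LOG='> /dev/null'
# ...but an unquoted expansion like
#   wget -o $LOG "$url"
# word-splits into:  wget -o '>' '/dev/null' "$url"
# so wget logs to a file literally named ">", treats /dev/null as a
# URL ("Scheme missing."), and then fetches the real URL, matching
# the contents of the mysterious ">" file above.
set -- $LOG
echo "argument 1: $1"
echo "argument 2: $2"
```

Quoting the expansion (or using a redirection directly) would avoid this class of bug.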

Getting rid of the Perl libxml prerequisite

I think this plan is good:

Add an endpoint to wattsi-server, called cldr.inc, which when pinged does the svn checkout of CLDR and the .cldr-processor.pl step and returns the result. This will generally be fast except the very first time.

We then add a guard in the build script (not sure how) so that if we find that XML::Parser is not installed, we skip the CLDR checkout and the .cldr-processor.pl step, and instead just download the result from the server.
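The guard itself is straightforward; a sketch, where the server URL is a placeholder:

```shell
# have_perl_module NAME: true if perl can load the module.
have_perl_module() {
  perl -M"$1" -e 1 2>/dev/null
}

if have_perl_module XML::Parser; then
  echo "building cldr.inc locally"
  # svn checkout + .cldr-processor.pl, as today
else
  echo "downloading prebuilt cldr.inc"
  # curl --fail --silent --output "$HTML_CACHE/cldr.inc" "$SERVER/cldr.inc"
fi
```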

@sideshowbarker, anything I'm missing? Seems like it should work.

Build tools should check to see if they are outdated before running

Given all the recent changes, I am a bit worried about people with outdated build tools running into problems.

If not run with --no-update, I think we should check for updates. I see a few options on how to implement this:

  • Find some GitHub website or API endpoint that will tell us the latest commit hash. Pro: doesn't modify the user's local checkout at all. Con: I haven't found one in 30 seconds of searching so maybe it doesn't exist.
  • Do git fetch then check against origin/master's HEAD revision. Potential minor cons: doesn't work if you checked out the build tools with a different remote name (like whatwg/master instead of origin/master), and does modify your local git checkout, which might be unexpected.

If we find an update or any other mismatch with origin/master's HEAD I think we should warn and give instructions. (Not error, and not auto-update.)
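A sketch of the second option, comparing HEAD against its tracking branch rather than hardcoding origin/master, which sidesteps the different-remote-name con (it does still modify the local checkout via the fetch):

```shell
# check_up_to_date DIR: fetch, compare HEAD to its upstream, and warn
# (not error, not auto-update) when they differ.
check_up_to_date() (
  cd "$1" || return 1
  git fetch --quiet origin
  if [ "$(git rev-parse HEAD)" != "$(git rev-parse '@{upstream}')" ]; then
    echo "Warning: build tools differ from upstream; consider git pull." >&2
    return 1
  fi
)
```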

Use a Git mirror of CLDR to get rid of Subversion dependency

I've set up https://github.com/foolip/cldr-data and a cron job using https://github.com/foolip/cldr-data-updater.

Edit: These repos have been removed, let me know if you want them back for some reason.

For me, checking out using svn takes 55 seconds, while a --depth 1 clone using git takes 32 seconds. Not amazing, but depending on how we do it, the incremental updating should be faster.

Does everyone hate git submodules? I think it'd be kind of nice to get all dependencies into Git and explicitly update dependencies, even if it's done by a roll bot.

Integrate build scripts with whatwg/html.

It seems strange to me that the build scripts are separate from the only source file that will ever use them. I'd suggest integrating them with the main HTML repository (perhaps in a build/ subdirectory) so that symlinks or copying of source files is no longer a necessary step in the process.

Restoring print.pdf

Given that we now use rsync we'd have to exclude the .cgi specifically or maybe we can exclude *.cgi to avoid revealing the filename? And I guess after rsync we should wget/curl the relevant remote URL? Is that okay to be public?

(And sorry for breaking this again without an upfront plan.)

cc @domenic @izh1979

Explore creating a Docker image with wattsi and other build dependencies

It seems like having an html-build Docker image could help solve problems for some contributors.

Using wattsi-server is probably the easiest solution for most contributors, but for contributors with less-stable or less-reliable Internet connections, a better solution might be to have the ability to run the build locally—but without also needing to deal with the not-so-easy steps of needing to build fpc and wattsi from the sources.

So my limited understanding of Docker makes me believe that it may provide a good solution in this case.

Split `build.sh` into separately-executable steps.

Currently, build.sh does everything every time. It would be lovely if we could at least split component updates (caniuse, unicode, etc) from the actual spec generation process such that they could be executed independently. There's no reason to require network access for spec generation, and hitting the network significantly slows things down.

Tweak commit snapshot production

We should align HTML's commit snapshots with the ones for Bikeshed-produced specs, such as https://streams.spec.whatwg.org/commit-snapshots/e75b9841572ae6153a167eea471433c55d06258e/.

  • Commit snapshots should have a big scary warning
  • Commit snapshots should say "Commit Snapshot" instead of "Living Standard" as their subtitle
  • Commit snapshots should have an appropriately-modified <title>
  • Commit snapshots should have a link back to the living spec in their header somewhere
  • The living standard should have a link to the current commit snapshot in its header somewhere
  • The commit-snapshots-shortcut-key.js file should be introduced to both the commit snapshot and the living standard

Design for output/input/etc.-splitting

We kind of mentioned this in #3 and @sideshowbarker is working on it. But let's outline what I envision a bit more explicitly.

Build script parameters:

  • source, corresponding to https://github.com/whatwg/html
  • cache, a cache directory where we store cached things (see below) to make incremental builds faster and to avoid network access
  • output, where it will put the final output files (currently listed in the readme)

Cache should contain:

  • cldr checkout. Currently in .cldr-data
  • w3cbugs.csv. Currently re-fetched every time as w3cbugs.csv
  • caniuse.json. Currently re-fetched every time as caniuse.json
  • entities.inc, entities-dtd.url, cldr.inc: currently created via fun scripts. These are included in the spec via <!-- BOILERPLATE $filename --> comments.
  • entities.json: Currently created via fun scripts. This is one of the output files.

All other files in https://github.com/whatwg/html-build/blob/master/.gitignore seem to be intermediate build files, and should ideally go in a temporary directory, not in the cache folder (these are distinct concepts).
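The cache/temp distinction can be encoded directly in the build script; a sketch, with all names as placeholders:

```shell
# setup_dirs SOURCE CACHE OUTPUT: the cache persists across builds to
# avoid network access; intermediate build files go to a temp dir that
# is removed on exit, keeping the two concepts distinct.
setup_dirs() {
  html_source=$1
  html_cache=$2
  html_output=$3
  html_temp=$(mktemp -d)
  trap 'rm -rf "$html_temp"' EXIT
  mkdir -p "$html_cache" "$html_output"
}
```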

Record the log somewhere visible

In particular it would be interesting if we could get alerts somehow when something is checked in that results in new errors or new XXX comments.

Wattsi is not re-run

I just hit a parse error and had to modify the build script to stop looking for the 65 thing to get the accurate location of the error.

I had just updated and built Wattsi, so something else seems amiss.

Workflow is not good for people without push access to whatwg/html who want to do PRs

Because it clones whatwg/html by default, if I directed a first-time contributor here, the instructions in the readme would get them stuck, unable to push after making changes.

I'd suggest a prompt in build.sh that asks them where to check out from. I am thinking:

Didn't find the HTML source on your system...
Enter where you would like to clone it from (GitHub username or URL):

(URL detection can be done by looking for :.)

Alternately a quick fix is just to update the readme to suggest checking out your fork in a sibling directory before building for the first time.
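The colon-based detection is a one-liner; a sketch, where the https URL form for bare usernames is an assumption:

```shell
# resolve_clone_url ANSWER: a ":" means a full URL (or scp-style
# remote); anything else is treated as a GitHub username.
resolve_clone_url() {
  case $1 in
    *:*) echo "$1" ;;
    *)   echo "https://github.com/$1/html.git" ;;
  esac
}
# read -r answer; git clone "$(resolve_clone_url "$answer")" html
```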

Server folders are not properly cleared

@zcorpan discovered that we have multiple 404.html and .htaccess resources, despite us having removed those from the build script a while back.

It seems this is due to htmlbuild/update-spec.sh invoking rsync without the --delete argument.

@domenic would it be safe to add that argument? Do you want to do that?

Also, should we have htmlbuild/ in version control somewhere?

Reduce file duplication

Due to

ln -s ../images $HTML_TEMP/wattsi-output/multipage-html/
ln -s ../link-fixup.js $HTML_TEMP/wattsi-output/multipage-html/
ln -s ../entities.json $HTML_TEMP/wattsi-output/multipage-html/

these files end up duplicated on our server. I would prefer we stop doing that and just refer to /{resource} instead as we already do for /404.html.

entities.json seems to require a change in whatwg/html. images/ too. link-fixup.js might require a change in wattsi since it's only used by multipage.

Thoughts?

Ideas for better continuous integration and deployment

Our current CI/CD situation works surprisingly well, but is rather hacked together. It consists basically of GitHub webhooks hitting some hand-crafted CGI, which then git-pulls from master, builds, and deploys.

What I would like to improve:

  • Contributors should be able to see the build results, including any errors, and the output html file. This should be linked in the GitHub interface, with a green checkmark or red X, like Travis does.
  • If the build/deployment gets broken on master, the team (= editors + MikeSmith + anyone else interested) should get an email.
  • It should be easy for the team to view the build status over time in a dashboard, similar to Travis CI.
  • Built results for master should be committed to a separate repository (whatwg/html-output?) for easy change-tracking, as has been requested a few times.
  • (Stretch goal) commit snapshots should be uploaded to html.spec.whatwg.org/commit-snapshots/, similar to https://streams.spec.whatwg.org/commit-snapshots/.

I think to do this properly we are probably going to want to learn about Jenkins or TeamCity and use one of those. I would love to use Travis CI, because the UI is great and I'm familiar with it, but we have so much tooling to install on each build that it doesn't seem terribly feasible, and my desire to e.g. deploy output snapshots for PRs seems beyond Travis's capabilities. Maybe their paid plan has this flexibility, but if I recall correctly it's costly.

If people in the community are knowledgeable about this kind of CI/CD work and would like to help guide us, or even help us get it set up, we'd be very grateful.
