First of all, I'd like to state that I think Alan should stick to using single-byte encoded strings internally (ISO-8859-1 or Mac), since there are no compelling reasons to make Alan Unicode-aware: the Latin-1 charset more than covers the needs of most adventures in any Western language, and the exceptions aren't worth the memory and performance overhead that Unicode support would introduce.
Having said that, I think that Alan should accept UTF-8 source code, and ARun should handle I/O text streams in UTF-8 by default. Here follows my rationale for this.
Basically, the Alan compiler should be able to read UTF-8 source files and transcode them to ISO-8859-1 before parsing. This shouldn't be a huge addition, since Alan sources are still expected to contain only characters from the ISO range (although comments could contain Unicode, which is fine and wouldn't affect compilation). Any character above 127 simply needs to be decoded back to a single byte, which is a quick operation on any input stream; I've seen code examples for this in various languages, and they rarely exceeded six lines of code.
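As a rough sketch of how small this step is, here is the transcode-before-parsing idea in Python (the function name is mine; the `'replace'` policy for non-Latin-1 code points in comments is one possible choice, not a proposal):

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Transcode a UTF-8 source file to ISO-8859-1 for the parser.

    Code points above U+00FF (which could only appear in comments)
    have no Latin-1 equivalent; substitute '?' rather than fail.
    """
    text = data.decode("utf-8")
    return text.encode("latin-1", errors="replace")

# A UTF-8 "ò" (2 bytes) becomes the single Latin-1 byte 0xF2:
assert utf8_to_latin1("però".encode("utf-8")) == b"per\xf2"
```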
Also, ARun should be able to handle I/O streams in UTF-8, which would allow it to accept command scripts in UTF-8, as well as to generate transcripts in UTF-8 (which would simplify interaction with other tools).
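The stream side is equally thin: decode UTF-8 on the way in, encode on the way out, with the interpreter core untouched. A minimal sketch, where `interpret` is a hypothetical stand-in for ARun's command loop taking and returning ISO-8859-1 bytes:

```python
import io

def run_script(script_path, transcript_path, interpret):
    """Feed a UTF-8 command script to the interpreter line by line,
    writing its replies back out as a UTF-8 transcript.

    interpret() is a hypothetical stand-in for ARun's command loop;
    it takes and returns ISO-8859-1 encoded bytes.
    """
    with io.open(script_path, encoding="utf-8") as commands, \
         io.open(transcript_path, "w", encoding="utf-8") as transcript:
        for line in commands:
            reply = interpret(line.encode("latin-1", errors="replace"))
            transcript.write(reply.decode("latin-1"))
```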
As for Glk-based terps, only the command scripts and transcripts would have to deal with UTF-8, since player input is handled by the Glk layer there.
Whether UTF-8 streams become the default or remain optional, ISO and Mac could still be supported (via a CLI option, or as the default), but I'm sure that most users would choose UTF-8, especially when working in their favourite editor or IDE. (A similar fate can be seen in TADS 3, one of the first IF tools to introduce and push UTF-8: it initially offered UTF-8 as an option, but today it's used almost exclusively with UTF-8.)
Editors Support
Most modern code editors assume UTF-8 as the default encoding, and many don't support ISO-8859-1 well. Those that do usually offer little protection against encoding breakage from paste operations: pasting from the clipboard often introduces Unicode characters into the document, silently converting it to UTF-8.
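The resulting corruption is easy to demonstrate: a character pasted as UTF-8 into a file that is then read back as ISO-8859-1 turns into mojibake, because each of its UTF-8 bytes gets interpreted as a separate Latin-1 character:

```python
# "è" pasted as UTF-8 (bytes 0xC3 0xA8), then reopened as
# ISO-8859-1, shows up as two unrelated characters:
pasted = "è".encode("utf-8")         # b'\xc3\xa8'
assert pasted.decode("latin-1") == "Ã¨"
```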
When I was initially working on the Italian library translation, I faced frequent file corruption during editing, until I created a dedicated Alan package for Sublime Text to enforce ISO-8859-1 (and had to open a feature request for it too, to prevent UTF-8 pasting from corrupting the source files). So I'm well aware of how easily this can become a disruptive and frustrating issue.
Maybe those who work with English only don't notice this, but anyone writing in a language with accented letters and/or diacritics will soon run into many such issues.
LSP Support
The Language Server Protocol is proving to be a successful idea, and more editors and languages adopt it every day as the protocol of choice for syntax highlighting, linting and even code refactoring.
The problem is that LSP requires all JSON-RPC messages to be sent in UTF-8, and LSP plug-ins working with other encodings are reporting lots of bugs and problems, especially in relation to text ranges and positions (which end up with wrong coordinates).
See microsoft/language-server-protocol#376 for a long discussion on this.
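The coordinate drift is easy to see in miniature. LSP positions are (roughly speaking) character-based, while a tool counting raw bytes of a UTF-8 buffer lands one column further right for every non-ASCII character earlier in the line:

```python
line = "caffè latte"
char_col = line.index("latte")                    # counted in characters
byte_col = line.encode("utf-8").index(b"latte")   # counted in UTF-8 bytes

# "è" occupies 1 character but 2 UTF-8 bytes, so the
# two conventions disagree by one column here:
assert (char_col, byte_col) == (6, 7)
```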
If ALAN could handle UTF-8 source files, it would open itself to a world of possibilities via LSP. Writing a language server for ALAN wouldn't be all that difficult: its syntax is simple enough to allow writing an error-tolerant parser for syntax highlighting. From there, we could have a binary language server for ALAN that works with any editor or IDE supporting LSP, allowing us to focus all energies on a single package for all editors, and eventually even support the new LSP features for code refactoring.
LSP is a growing standard, with new features being added over time. The future is clearly heading in that direction, and even if LSP were to be replaced by another protocol, most of its features would remain similar.
Tools Support
The above also applies to many tools, especially those related to version control: Git doesn't offer any specific settings for ISO encodings, and most diffing tools also expect UTF-8 text streams today.
In some cases, pipelines could actually corrupt ISO sources.
Toolchains Support
Just take Asciidoctor, which we're using for the Alan documentation, and all the problems we're facing due to the ISO-8859-1 encoding of Alan sources and game transcripts.
EDIT: Asciidoctor now supports use of include:: with ISO files!

Originally, because Asciidoctor didn't support include:: with ISO files (cf. asciidoctor/asciidoctor#3248), the documentation toolchains for the various libraries had to first convert all Alan sources, command scripts and generated transcripts to UTF-8 before they could be used in the documentation.
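Such a pre-conversion step is trivial but has to be wired into every toolchain. A minimal sketch (the function name, glob pattern and flat output layout are illustrative, not the actual toolchain scripts):

```python
from pathlib import Path

def convert_tree_to_utf8(src_dir, out_dir, patterns=("*.alan",)):
    """Write UTF-8 copies of ISO-8859-1 sources for the doc toolchain.

    The glob patterns and flat output layout are illustrative only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pattern in patterns:
        for path in Path(src_dir).glob(pattern):
            text = path.read_text(encoding="latin-1")
            (out / path.name).write_text(text, encoding="utf-8")
```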
Similar issues are going to become more common every day, especially with modern tools, since the use of any encoding besides UTF-8 is strongly discouraged nowadays (not only in text files, but also for internal string representations):