The sort order used on the complete word list page is not appropriate for Na'vi (in particular, it mixes a and ä, etc.)
There are several reasonable ways to do sorting for Na'vi words. Most importantly, one could argue that ll / rr / diphthongs count as distinct letters that should be sorted separately. In my opinion, this is not very useful if someone wants to search for a word manually (how are they supposed to know if awaiei is a-wa-i-e-i or aw-a-i-e-i?) so I'll just sort these in the English sort order.
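A minimal sketch of such a sort key, in Python for illustration only. The alphabet string is an assumption: English order, with ä placed directly after a and ì directly after i, and the glottal stop (') placed first. Diphthongs, ll and rr are not treated as units, so they fall out in plain letter order as described above. The names NAVI_ALPHABET and navi_sort_key are hypothetical.

```python
import unicodedata

# Assumed alphabet: English order, with ä right after a and ì right after i.
# Putting the glottal stop (') first is an arbitrary choice for this sketch.
NAVI_ALPHABET = "'aäbcdefghiìjklmnopqrstuvwxyz"
_RANK = {ch: i for i, ch in enumerate(NAVI_ALPHABET)}


def navi_sort_key(word: str) -> list[int]:
    """Map each character to its rank; unknown characters sort last."""
    # Normalize so that a + combining diaeresis is treated as a single ä.
    normalized = unicodedata.normalize("NFC", word.lower())
    return [_RANK.get(ch, len(NAVI_ALPHABET)) for ch in normalized]


words = ["ätxäle", "awaiei", "apxa", "ìlä", "ikran"]
print(sorted(words, key=navi_sort_key))
# → ['apxa', 'awaiei', 'ätxäle', 'ikran', 'ìlä']
```

With this key all a-words come before all ä-words, instead of the two being interleaved.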
If someone searches for tekreìl then Reykunyu should still respond with tekre + -l, but add a warning. Right now it doesn't find anything, which is unhelpful.
Right now multiword phrases don't have a pronunciation defined because syllables are separated by dashes, not spaces. Also, we cannot designate more than one syllable as stressed. This should be changed.
Also, this would remove the need to special-case si-verbs.
Then we can also send these to the client side to display the conjugated forms table.
Parse attributive forms of adjectives.
Parse nouns that are productively created from verbs (tulyu, tìtusaron).
Implement external lenition. This should be simple in principle. But there are many special cases:
doubly-lenited words (we shouldn't parse hitx as (ay-)kitx with external lenition, but on the other hand, we can parse saykitx as tsay-kitx with external lenition);
words with a non-lenitable initial consonant... naively parsing them would return two results for every such word (one with and one without external lenition);
similarly when parsing kitx we don't want to output kxitx twice; instead we want to output it once with two possible derivations;
many external lenitions are extremely unlikely to actually occur and can detract from the way-more-likely other result (heyn could be from keyn with external lenition, but it is way more likely just from heyn); we should probably have a scoring system and sort the results afterwards (or alternatively, just put the external lenitions always at the end);
if we are in sentence search mode, we should check if there is a leniting preposition before the word, to see if external lenition is even applicable (and in word search mode, we should clearly mark external lenitions).
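A rough sketch of the candidate generation, in Python for illustration only. It uses the standard lenition rules (kx → k, px → p, tx → t, k → h, p → f, t → s, ts → s) and deliberately leaves out the glottal stop. The names UNLENITE, unlenite_candidates and lookup are hypothetical, and the dictionary is just a set of strings here. It undoes exactly one lenition step, returns nothing for onsets that cannot result from lenition, and puts external-lenition results after the plain match (the simpler of the two ordering options mentioned above).

```python
# Lenited onset -> onsets it may have come from.
# The glottal stop (dropped by lenition) is left out of this sketch.
UNLENITE = {
    "k": ["kx"],
    "p": ["px"],
    "t": ["tx"],
    "h": ["k"],
    "f": ["p"],
    "s": ["t", "ts"],
}


def unlenite_candidates(word: str) -> list[str]:
    """Return the forms that lenite to `word` in a single step."""
    # kx, px, tx and ts are never the output of lenition.
    if word[:2] in ("kx", "px", "tx", "ts"):
        return []
    return [onset + word[1:] for onset in UNLENITE.get(word[:1], [])]


def lookup(word: str, dictionary: set[str]) -> list[tuple[str, bool]]:
    """Return (entry, via_external_lenition) pairs, plain match first."""
    results = []
    if word in dictionary:
        results.append((word, False))
    for candidate in unlenite_candidates(word):
        if candidate in dictionary:
            results.append((candidate, True))
    return results


dictionary = {"heyn", "keyn", "kxitx"}
print(lookup("heyn", dictionary))  # plain heyn first, then keyn
print(lookup("hitx", dictionary))  # empty: only one lenition step is undone
```

The boolean flag is what a frontend could use to mark external lenitions clearly in word search mode. Checking for a preceding leniting adposition in sentence mode is not covered here.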
The current format for the source field is either:
a single string (legacy from the original import), or
an array with three elements: source name, source URL and date (yyyy-mm-dd)
To support multiple sources, we should migrate to a new format that is an array of three-element arrays as above. Actually I would also want to add an optional fourth element to the array, namely for remarks.
Things to do here:
Migrate words.json:
for all "single string" source fields "...", change them into [["..."]]
for all three-element array source fields ["...", "...", "..."], change them into [["...", "...", "..."]]
Add support to the web frontend
Add support to the Discord bot
Add support to the editor for reading and writing this format (including the remarks field)
Add support to the editor for actually adding more than one source
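The words.json migration step could look roughly like the following sketch, in Python for illustration only. It assumes that words.json holds a list of entry objects, each with an optional "source" key; that layout, and the names migrate_source and migrate_words, are assumptions rather than the actual structure of the file. The conversion is written to be idempotent, so running it twice does no harm.

```python
def migrate_source(source):
    """Convert a legacy source field to the new list-of-lists format."""
    if isinstance(source, str):
        # Legacy single string "..." becomes [["..."]].
        return [[source]]
    if isinstance(source, list):
        if all(isinstance(item, list) for item in source):
            # Already in the new format: leave it untouched.
            return source
        # Old [name, URL, date] array becomes [[name, URL, date]].
        return [source]
    raise TypeError(f"unexpected source field: {source!r}")


def migrate_words(words):
    """Migrate every entry in place (assumed layout: a list of dicts)."""
    for entry in words:
        if "source" in entry:
            entry["source"] = migrate_source(entry["source"])
    return words


# Example entries with made-up placeholder sources.
words = [
    {"source": "some legacy source"},
    {"source": ["source name", "source URL", "2020-01-01"]},
    {},
]
print(migrate_words(words))
```

The optional fourth element for remarks needs no migration, since existing entries simply have none.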
Reykunyu currently implements vtrm. verbs as two separate verbs vtr. and vm. This turns out to be confusing for users. Implement the merged types and then merge these in the database.
It is annoying that users have to switch between search directions manually. The default should be some combination mode. Or alternatively, auto-switch the mode when the current mode gives no results. (Is this intuitive, however?)
Reykunyu forgets the query's capitalization and punctuation. It would be good to maintain this.
Unfortunately, this is one of these "it's buried deep in two-year-old code and changing this will probably break ten other things" features. But it would be useful to have for the new corpus feature.
Argh, because sko is a leniting adposition, Reykunyu wouldn't understand a sentence like sko tsìk sunu oeru, because it tries to unlenite tsìk which doesn't yield any results, obviously.
Some words have multiple allowed pronunciations, for example tsa-kem, which can be stressed on either syllable. Make the pronunciation field an array. Include support for marking a pronunciation as colloquial.
When searching for apxa Reykunyu happily gives the same result thrice because it tries to be helpful by suggesting that it can be apxa, a-apxa and apxa-a. Unfortunately this is not helpful at all in practice, so we should filter these out.
This already kind of works but many things still need to be fixed:
The Discord bot doesn't show multiple definitions yet.
English → Na'vi search doesn't show more than one definition.
Translations are annoying, because many translations may not have the split-up definitions yet. In that case Reykunyu shouldn't show the same definition multiple times for these languages.
If one searches for the word meylltxep, Reykunyu doesn't find any answer. The reason is that Reykunyu thinks this word is spelled m4LTep instead of meyLTep. Sadly, that is of course a problem that isn't simple to fix...
I can see two ways to fix it:
try every possible spelling (but I don't like this approach)
don't use the short spelling, but use the ordinary spelling directly (but that is more complicated)