Reproduction
When converting some strings to Greek, the output string can be wrong even though it looks correct:
toGreek('huḯdion', keyType.TRANSLITERATION);
// Expected: "ὑΐδιον"; Received: "ὑΐδιον"
Fact
The accented iota received a tonos rather than an oxia (visually, the tonos and the oxia are both acute accents specific to Greek).
In the example, we got ΐ (U+0390) Greek Small Letter Iota with Dialytika and Tonos rather than ΐ (U+1FD3) Greek Small Letter Iota with Dialytika and Oxia.
Although the two diacritics look the same in most fonts, the Unicode Standard encodes them as separate characters: the tonos is intended for 'modern' monotonic Greek and the oxia for 'ancient' polytonic Greek.
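Because the two glyphs are indistinguishable on screen, the difference only shows up when the code points are inspected. The following is a minimal sketch, not part of the library; the helper name codePoints is hypothetical.

```typescript
// Hypothetical helper (not part of the library): list the code points
// of a string as "U+XXXX" labels so look-alike characters can be told apart.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, '0')
  );
}

const iotaWithTonos = '\u0390'; // Greek Small Letter Iota with Dialytika and Tonos
const iotaWithOxia = '\u1FD3';  // Greek Small Letter Iota with Dialytika and Oxia

console.log(codePoints(iotaWithTonos));      // ["U+0390"]
console.log(codePoints(iotaWithOxia));       // ["U+1FD3"]
console.log(iotaWithTonos === iotaWithOxia); // false
```

A strict string comparison in a test therefore fails even though both strings render identically.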
Resolution
It seems that an oxia is replaced by a tonos whenever Unicode normalization is applied (e.g. str.normalize('NFC')). This is due to the poor canonical equivalences defined by the Unicode Standard, which treat the oxia form as equivalent to the tonos form.
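The replacement can be observed directly with String.prototype.normalize. This is only an illustrative sketch: it starts from the oxia character and prints what each normalization form produces.

```typescript
// Start from U+1FD3 (iota with dialytika and oxia).
const oxiaIota = '\u1FD3';

// Composed form: the oxia character becomes the tonos character.
const nfc = oxiaIota.normalize('NFC');
console.log(nfc === '\u0390'); // true

// Decomposed form: iota + combining diaeresis + combining acute accent.
// The combining acute (U+0301) is shared by tonos and oxia, so the
// distinction is already lost at this stage.
const nfd = oxiaIota.normalize('NFD');
console.log(nfd === '\u03B9\u0308\u0301'); // true

// Recomposing the decomposed form yields the tonos character again.
console.log(nfd.normalize('NFC') === '\u0390'); // true
```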
While some other characters can be replaced during the conversion process using utils/normalizeGreek() (e.g. tilde to Combining Greek Perispomeni), this one seems trickier: it's a diacritic, so if we want to process it individually we need to use NFD mode; but as soon as we come back to NFC, the conversion will be broken. If we want to process the tonos in NFC mode, then we must define all the combinations of diacritics that can be made for all the characters that can take a tonos.
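One way to picture the NFC-mode approach is a lookup table applied after normalization. The sketch below is hypothetical and is not the library's implementation: the names TONOS_TO_OXIA and tonosToOxia are invented for illustration, and the table only covers the lowercase precomposed vowels, assuming uppercase letters and any other affected characters would be added in the same way.

```typescript
// Hypothetical, partial table: precomposed "tonos" letters mapped to
// their "oxia" counterparts in the Greek Extended block.
const TONOS_TO_OXIA = new Map<string, string>([
  ['\u03AC', '\u1F71'], // alpha
  ['\u03AD', '\u1F73'], // epsilon
  ['\u03AE', '\u1F75'], // eta
  ['\u03AF', '\u1F77'], // iota
  ['\u03CC', '\u1F79'], // omicron
  ['\u03CD', '\u1F7B'], // upsilon
  ['\u03CE', '\u1F7D'], // omega
  ['\u0390', '\u1FD3'], // iota with dialytika
  ['\u03B0', '\u1FE3'], // upsilon with dialytika
]);

// Hypothetical helper: normalize to NFC first, then swap each tonos
// letter for its oxia counterpart. The result is intentionally NOT in
// NFC anymore; normalizing it again would restore the tonos.
function tonosToOxia(input: string): string {
  return Array.from(input.normalize('NFC'), (ch) => TONOS_TO_OXIA.get(ch) ?? ch).join('');
}

// The word from the reproduction, written with the tonos character.
const received = '\u1F51\u0390\u03B4\u03B9\u03BF\u03BD';
console.log(tonosToOxia(received) === '\u1F51\u1FD3\u03B4\u03B9\u03BF\u03BD'); // true
```

The trade-off is that this step must run last: any later normalization of the string, inside or outside the library, brings the tonos back.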
Before solving this issue, it might be wise to weigh the pros and cons of trying to force the usage of the oxia rather than the tonos.