Hi there. I'm Matt. You can find more info about me on my website ❤️
Views are my own and not of my employer.
This project is forked from life4/homoglyphs
Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group.
License: MIT License
Hi @yamatt,
Happy new year!
Thanks for maintaining this fork.
However, the PyPI page at https://pypi.org/project/homoglyphs_fork/ still shows the readme of the original project as its description, rather than the README.md contents specified in pyproject.toml.
I'm not sure what's causing that, but maybe setup.py needs to be updated?
I'm not familiar with the pyproject.toml specification yet. Something seems to make PyPI load the description from the original version's readme, even though the other metadata is correct.
P.S. Do you think it would make sense to turn this project into an org and move it to homoglyphs/homoglyphs?
That way several maintainers could be added to the project.
There is also a fork at AnatolyTimakov@a70aa9f that added some content.
Thanks,
It's my understanding that STRATEGY_IGNORE should "add characters to result", which to me sounds like it should keep a character in the output when it isn't matched.
However, I cannot seem to retain my complete original input:
>>> import homoglyphs_fork as hgf
>>> hg = hgf.Homoglyphs(strategy=hgf.STRATEGY_IGNORE)
>>> 'ß' in hgf.Categories.get_alphabet(['LATIN'])
True
>>> hg.to_ascii('ß')
[]
This is an issue because there are characters that, while not true homoglyphs, can still be used in their place. Consider the German eszett, ß, which is a common stand-in for 'B' online.
Because of this, I'm unable to properly detect (as an example) the string 'Сaptchaß𝗈t': Cyrillic ES (homoglyph of Latin C), German eszett (leet-speak for Latin B), and Mathematical o (normalized to Latin o). The best I've been able to achieve is Captchaot, with strategy LOAD and ascii_strategy REMOVE.
Is there a way to have homoglyphs simply pass-through any character that isn't matched?
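One possible workaround, as a minimal sketch rather than anything the library provides: convert the text one character at a time and keep the original character whenever the converter returns no ASCII variant. The helper name `ascii_or_passthrough` is hypothetical, and `demo_to_ascii` with its small lookup table is only a stand-in so the example runs without the library installed; in practice you would pass something like `hg.to_ascii`, assuming it returns a list of candidate strings as in the session above.

```python
from typing import Callable, Dict, List

def ascii_or_passthrough(text: str, to_ascii: Callable[[str], List[str]]) -> str:
    """Convert character by character; keep any character with no ASCII variant.

    Hypothetical helper, not part of homoglyphs. `to_ascii` is assumed to
    return a (possibly empty) list of ASCII candidates for one character.
    """
    out = []
    for char in text:
        variants = to_ascii(char)
        # An empty list means "no match": pass the original character through.
        out.append(variants[0] if variants else char)
    return "".join(out)

# Stand-in converter for illustration only (NOT the library's data).
_DEMO_TABLE: Dict[str, List[str]] = {
    "\u0421": ["C"],      # CYRILLIC CAPITAL LETTER ES
    "\U0001d5c8": ["o"],  # MATHEMATICAL SANS-SERIF SMALL O
}

def demo_to_ascii(char: str) -> List[str]:
    if char.isascii():
        return [char]
    return _DEMO_TABLE.get(char, [])

# Cyrillic ES + "aptcha" + eszett + mathematical o + "t"
sample = "\u0421aptcha\u00df\U0001d5c8t"
result = ascii_or_passthrough(sample, demo_to_ascii)
print(result)
```

With this approach the eszett survives in the output, so a later step can map it to 'B' (or anything else) using a custom table.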
>>> import homoglyphs as hg
>>> hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD).to_ascii('\x00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "homoglyphs/core.py", line 240, in to_ascii
    return self.uniq_and_sort(self._to_ascii(text))
  File "homoglyphs/core.py", line 169, in uniq_and_sort
    result = list(set(data))
  File "homoglyphs/core.py", line 235, in _to_ascii
    for variant in self._get_combinations(text, ascii=True):
  File "homoglyphs/core.py", line 218, in _get_combinations
    alt_chars = self._get_char_variants(char)
  File "homoglyphs/core.py", line 195, in _get_char_variants
    if not self._update_alphabet(char):
  File "homoglyphs/core.py", line 182, in _update_alphabet
    category = Categories.detect(char)
  File "homoglyphs/core.py", line 66, in detect
    category = unicodedata.name(char).split()[0]
ValueError: no such name
I guess it should rather return [].
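For context, the error comes from `unicodedata.name`, which raises `ValueError` for code points that have no Unicode name, such as control characters. The standard library function accepts a default value as a second argument, so a name lookup can be made tolerant. The sketch below only illustrates that idea; `first_name_word` is a hypothetical helper, not the library's actual `Categories.detect`.

```python
import unicodedata
from typing import Optional

def first_name_word(char: str) -> Optional[str]:
    """Return the first word of a character's Unicode name, or None if unnamed.

    Hypothetical helper for illustration. Passing a default to
    unicodedata.name avoids the ValueError seen in the traceback above.
    """
    name = unicodedata.name(char, None)
    if name is None:
        # Unnamed code point (e.g. '\x00'): the caller can treat it as unmatched.
        return None
    return name.split()[0]

print(first_name_word("\u00df"))  # eszett
print(first_name_word("\x00"))    # control character with no name
```

A caller could then treat `None` as "no category detected" and return an empty result instead of propagating the exception.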
(BTW, is this fork still maintained?)