

homoglyphs's People

Contributors

ariutta, inokenty90, jordiae, orsinium, porfanid, tapplencourt, typerslow, vadimych, yamatt

homoglyphs's Issues

PyPI page still shows readme of old version

Hi @yamatt,

Happy new year!
Thanks for maintaining this fork.

However, it seems the PyPI page at https://pypi.org/project/homoglyphs_fork/ still shows the README of the original version as the project description, instead of loading the README.md contents specified in pyproject.toml.

I'm not sure what's causing that, but maybe setup.py needs to be updated?
I'm not familiar with the pyproject.toml specification yet. Something seems to be causing PyPI to load the description from the original version's README, even though the other pieces are correct.

P.S. Do you think it would make sense to make this project an org and move it to homoglyphs/homoglyphs?
That way several maintainers could be added to the project.

There is also a fork at AnatolyTimakov@a70aa9f that added some content.

Thanks,

Some Latin characters cause to_ascii to return an empty result.

It's my understanding that STRATEGY_IGNORE should "add characters to result", which to me sounds like it should retain the character in the output if it isn't matched.

However, I cannot seem to retain my complete original input:

>>> import homoglyphs_fork as hgf
>>> hg = hgf.Homoglyphs(strategy=hgf.STRATEGY_IGNORE)
>>> 'ß' in hgf.Categories.get_alphabet(['LATIN'])
True
>>> hg.to_ascii('ß')
[]

This is an issue because some characters, while not true homoglyphs, can still be used as such. Consider the German eszett, ß, which is a common stand-in for 'B' online.

Because of this, I'm unable to properly detect (as an example) the string 'Сaptchaß𝗈t': Cyrillic ES (homoglyph of Latin C), German eszett (leet-speak for Latin B), and mathematical o (normalized to Latin o). The best I've been able to achieve is 'Captchaot', with strategy LOAD and ascii_strategy REMOVE.

Is there a way to have homoglyphs simply pass through any character that isn't matched?
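
For illustration, the pass-through behaviour asked about here can be sketched with the standard library alone. This is not how homoglyphs or homoglyphs_fork works internally; the function name `to_ascii_passthrough` and the `EXTRA_LOOKALIKES` table are hypothetical, and the table holds only the two characters from the example string. Characters that cannot be mapped are kept as-is rather than dropped.

```python
import unicodedata

# Hypothetical, hand-picked look-alike table for illustration only;
# it is not part of homoglyphs or homoglyphs_fork.
EXTRA_LOOKALIKES = {
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u00df": "B",  # LATIN SMALL LETTER SHARP S (eszett), used as a stand-in for B
}


def to_ascii_passthrough(text, lookalikes=EXTRA_LOOKALIKES):
    """Map look-alike characters to ASCII and keep everything else unchanged."""
    out = []
    for char in text:
        if char in lookalikes:
            # Explicit look-alike: use the table entry.
            out.append(lookalikes[char])
            continue
        # NFKC folds compatibility characters (e.g. mathematical letters).
        folded = unicodedata.normalize("NFKC", char)
        # Use the folded form only if it is plain ASCII; otherwise pass through.
        out.append(folded if folded.isascii() else char)
    return "".join(out)


# Cyrillic ES + "aptcha" + eszett + MATHEMATICAL SANS-SERIF SMALL O + "t"
sample = "\u0421aptcha\u00df\U0001d5c8t"
print(to_ascii_passthrough(sample))
```

Unlike `to_ascii`, this sketch returns a single string rather than a list of variants, so it only fits cases where one canonical form per character is acceptable.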

Character \x00 in to_ascii() raises an exception

import homoglyphs as hg
hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD).to_ascii('\x00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "homoglyphs/core.py", line 240, in to_ascii
    return self.uniq_and_sort(self._to_ascii(text))
  File "homoglyphs/core.py", line 169, in uniq_and_sort
    result = list(set(data))
  File "homoglyphs/core.py", line 235, in _to_ascii
    for variant in self._get_combinations(text, ascii=True):
  File "homoglyphs/core.py", line 218, in _get_combinations
    alt_chars = self._get_char_variants(char)
  File "homoglyphs/core.py", line 195, in _get_char_variants
    if not self._update_alphabet(char):
  File "homoglyphs/core.py", line 182, in _update_alphabet
    category = Categories.detect(char)
  File "homoglyphs/core.py", line 66, in detect
    category = unicodedata.name(char).split()[0]
ValueError: no such name

I guess it should rather return [].

(BTW, is this fork still maintained?)
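
The traceback ends in `unicodedata.name(char)`, which raises `ValueError` for characters that have no Unicode name, such as control characters. A minimal sketch of a guard, assuming the intent is to treat unnamed characters as having no category: the helper name `detect_category` is hypothetical and this is not the library's actual code or fix. It relies on the optional default argument of `unicodedata.name`.

```python
import unicodedata


def detect_category(char):
    """Return the first word of the character's Unicode name, or None.

    Hypothetical stand-in for the Categories.detect step in the traceback;
    characters without a Unicode name (e.g. '\x00') yield None instead of
    raising ValueError.
    """
    # Passing a default makes unicodedata.name return it instead of raising.
    name = unicodedata.name(char, None)
    if name is None:
        return None
    return name.split()[0]


print(detect_category("\x00"))
print(detect_category("a"))
```

A caller could then skip characters whose category is `None`; whether `to_ascii` should return `[]` in that case is a design decision for the maintainers.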
