
Comments (7)

dvirsky commented on May 19, 2024

Fixed as far as I can test (haven't run your test code though). Feel free to verify.

from redisearch.

mannol commented on May 19, 2024

Fix confirmed! Thanks!


dvirsky commented on May 19, 2024

Going over this, it looks like working with 32-bit runes will not be too hard, and it will allow full fuzzy support in Unicode, but it will make memory consumption terrible. Is Unicode-aware fuzzy matching critical for you?
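To illustrate why fuzzy matching wants whole code points ("runes") rather than raw UTF-8 bytes, here is a minimal sketch. It is not RediSearch's implementation: the `levenshtein` helper and the sample words are assumptions chosen only for illustration. It computes the same edit distance twice, once over code points and once over UTF-8 bytes.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical sample: two Cyrillic words that differ in exactly one letter.
word, typo = "шар", "жар"

# Python strings are sequences of code points, so this compares runes.
by_rune = levenshtein(word, typo)

# The same comparison over the raw UTF-8 bytes.
by_byte = levenshtein(word.encode("utf-8"), typo.encode("utf-8"))

print(by_rune, by_byte)
```

A single wrong letter counts as one edit over runes but as two edits over bytes, because both UTF-8 bytes of the letter differ. A byte-based matcher with a maximum distance of 1 would therefore miss this one-letter typo.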


mannol commented on May 19, 2024

It's essential, yes. How much would the memory requirements increase?


dvirsky commented on May 19, 2024

The idea is to avoid a variable-length encoding like UTF-8 or UTF-16 and use a fixed-length encoding instead, probably 32 bits per letter.

So for pure Latin text, which UTF-8 represents with 1 byte per letter, you would have to use 4 bytes per letter. Some of the memory consumption is of course pointers and metadata, so we are talking about x2-x3 the amount of RAM rather than x4.

For purely non-Latin text, which takes up 2-3 bytes per letter in UTF-8, it would probably be x1.5-x2.

If I manage to get away with 16 bits per letter, which won't cover all languages but will cover the most popular ones IIRC, it won't be so bad.
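The arithmetic above can be checked with a rough sketch. It counts only the bytes of the text itself, so it shows the upper bound on growth; the x2-x3 and x1.5-x2 estimates are lower because pointers and metadata are not included here. The `encoded_sizes` helper and the sample strings are assumptions for illustration.

```python
def encoded_sizes(text):
    """Return the payload size in bytes of `text` under three encodings."""
    return {
        "utf8": len(text.encode("utf-8")),
        # Fixed 16 bits per letter; only valid if every code point <= 0xFFFF.
        "fixed16": 2 * len(text),
        # UTF-32 without a byte-order mark is exactly 4 bytes per code point.
        "fixed32": len(text.encode("utf-32-le")),
    }


latin = encoded_sizes("hello")      # 5 letters, 1 byte each in UTF-8
cyrillic = encoded_sizes("привет")  # 6 letters, 2 bytes each in UTF-8

print(latin)
print(cyrillic)
```

For the Latin sample the raw text grows from 5 to 20 bytes at 32 bits per letter (10 bytes at 16 bits). For the Cyrillic sample it grows from 12 to 24 bytes at 32 bits, and stays at 12 bytes at 16 bits.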


mannol commented on May 19, 2024

Yeah, 16 bits should cover all the languages we are aiming at so that's okay.


dvirsky commented on May 19, 2024

Yeah, the range 0x0000-0xFFFF covers everything a sane person would need: https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane

Although Unicode and sanity don't go hand in hand, in this case it might be a good compromise.
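To see what the 16-bit compromise includes and excludes, a small sketch can test whether every code point in a string falls in 0x0000-0xFFFF (the Basic Multilingual Plane). The `fits_in_16_bits` helper and the samples are hypothetical, not part of RediSearch.

```python
def fits_in_16_bits(text):
    """True if every code point in `text` is in the range 0x0000-0xFFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Hypothetical samples: Cyrillic, Chinese and Hebrew are all in the BMP.
bmp_ok = fits_in_16_bits("привет 你好 שלום")

# Most emoji live outside the BMP, e.g. U+1F600.
emoji_ok = fits_in_16_bits("😀")

print(bmp_ok, emoji_ok)
```

Text that fails this check would need either the 32-bit representation or some fallback, which is the trade-off being accepted here.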
