Comments (7)
Fixed as far as I can test (haven't run your test code though). Feel free to verify.
Fix confirmed! Thanks!
Going over this, it looks like switching to 32-bit runes won't be too hard, and it would allow full fuzzy-matching support for Unicode. But it will make memory consumption terrible. Is Unicode-aware fuzzy matching critical for you?
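To illustrate why byte-oriented fuzzy matching breaks down for non-ASCII text, here is a minimal sketch (not RediSearch's actual code; `byte_distance` and `rune_distance` are hypothetical helpers) that computes the Levenshtein distance once over raw UTF-8 bytes and once over decoded code points ("runes"). A single accented letter costs two byte-level edits but only one rune-level edit, so a fuzzy match limited to an edit distance of 1 would miss it when working on bytes.

```python
def levenshtein(a, b) -> int:
    """Dynamic-programming edit distance between two sequences (str or bytes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute x -> y (free if equal)
            ))
        prev = cur
    return prev[-1]


def byte_distance(a: str, b: str) -> int:
    # Hypothetical helper: compare UTF-8 bytes, as a byte-level trie would.
    return levenshtein(a.encode("utf-8"), b.encode("utf-8"))


def rune_distance(a: str, b: str) -> int:
    # Hypothetical helper: compare whole code points, one per letter.
    return levenshtein(a, b)


word, query = "caf\u00e9", "cafe"   # "café" (precomposed U+00E9) vs "cafe"
print(byte_distance(word, query))  # 2: bytes 0xC3 0xA9 vs 0x65
print(rune_distance(word, query))  # 1: é vs e
```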
It's essential, yes. How much would the memory requirements increase?
The idea is to stop using a variable-length encoding like UTF-8 or UTF-16 and use a fixed-length encoding instead, probably 32 bits per letter.
So for pure Latin text, where each letter takes 1 byte in UTF-8, you would need 4 bytes per letter. Part of the memory consumption is pointers and metadata, of course, so overall we are talking about 2-3x the amount of RAM.
For purely non-Latin text, which takes 2-3 bytes per letter in UTF-8, it would probably be 1.5-2x.
If I can get away with 16 bits per letter, which won't cover every language but IIRC covers the most popular ones, it won't be so bad.
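These estimates can be sanity-checked with a short sketch that counts only the bytes needed for the letters themselves under each encoding. It deliberately ignores the pointers and metadata, which is why the raw ratios come out larger than the overall 2-3x estimate for Latin text. The helper `letter_bytes` is hypothetical, and its "fixed-16" entry is only defined when every letter fits in 16 bits.

```python
def letter_bytes(text: str) -> dict:
    """Bytes used by the letters alone (no trie pointers or metadata)."""
    fits_16 = all(ord(ch) <= 0xFFFF for ch in text)
    return {
        "utf-8": len(text.encode("utf-8")),              # variable: 1-4 bytes/letter
        "fixed-16": 2 * len(text) if fits_16 else None,  # 2 bytes/letter, BMP only
        "fixed-32": 4 * len(text),                       # 4 bytes/letter, any code point
    }


for sample in ("search", "\u043f\u043e\u0438\u0441\u043a", "\u641c\u7d22"):
    print(sample, letter_bytes(sample))
# search {'utf-8': 6, 'fixed-16': 12, 'fixed-32': 24}
# поиск {'utf-8': 10, 'fixed-16': 10, 'fixed-32': 20}
# 搜索 {'utf-8': 6, 'fixed-16': 4, 'fixed-32': 8}
```

For Latin text the fixed-width layouts cost 2-4x the UTF-8 payload, for Cyrillic 1-2x, and for CJK a 16-bit layout is actually smaller than UTF-8.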
Yeah, 16 bits should cover all the languages we're targeting, so that's okay.
Yeah, the range 0x0000-0xFFFF covers everything a sane person would need: https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
Unicode and sanity don't always go hand in hand, but in this case it might be a good compromise.
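A 16-bit layout only works if every letter lies in the Basic Multilingual Plane. Below is a minimal sketch of that check, packing text into an array of unsigned 16-bit integers. The function `to_runes16` is hypothetical and is not how RediSearch stores its trie; it only shows which inputs the 0x0000-0xFFFF compromise would reject.

```python
from array import array


def to_runes16(text: str) -> array:
    """Pack text into fixed-width 16-bit runes, rejecting non-BMP code points."""
    codes = [ord(ch) for ch in text]
    for c in codes:
        if c > 0xFFFF:
            # e.g. emoji and rare CJK ideographs live above U+FFFF
            raise ValueError(f"U+{c:04X} is outside the Basic Multilingual Plane")
    # "H" is unsigned short: 2 bytes on all mainstream platforms
    return array("H", codes)


print(list(to_runes16("caf\u00e9")))  # [99, 97, 102, 233]

try:
    to_runes16("\U0001F600")
except ValueError as err:
    print(err)  # U+1F600 is outside the Basic Multilingual Plane
```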
Related Issues (20)
- [BUG] Is There a offset limit when using `limit offset num`
- [BUG] Redis Search silently fails to Sort when the index schema is too large
- [BUG] ft.aggregate slowdown with high frequency updates
- [BUG] Wildcard redisearch on TEXT field does not return result
- [BUG] Unable to do full-text exact search with a colon in the text
- [BUG] simple ft.create/ft.search with <100 bytes of data is leaking 1300 bytes of memory.
- [BUG] FT.AGGREGATE performance problem
- [BUG] Order of precedence not honored in APPLY functions with exponents
- [Feature Request] Add FT.ALIASGET command
- L2 distance computation misunderstanding in documentation
- [BUG] I can't run "make build" command successfully
- Document Distributed Search (RSCoordinator) build/installation
- Boost File Error when building 2.8.13 with Bullseye
- [BUG] RediSearch HNSW indexing deadlock?
- Facing build issue on PPC64LE architecture.
- [BUG] APPLY substr function not using -1 count as documented - [MOD-6959]
- the results obtained after indexing are incomplete
- Please Help Fix RSCoordinator So that Redis Search (RediSearch) Can Be Used Across Redis Cluster - module-oss.so initialization failed
- Configuration of Custom Tokenizer
- [BUG] Redis freezes and stops responding with 100% CPU Utilization while using redissearch with HNSW vector indexes