liuying / coreseek-change
This project forked from nd791899/coreseek-change
change the sphinx weight
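The weights in coreseek are not computed with the BM25 algorithm, which causes a number of small problems. With short queries the following happens: if the database contains both "qq" and "qq音乐" (QQ Music) and we search for "qq", the two weights cannot be told apart. By our own search habits, "qq" should get the higher weight here and rank one position ahead of "qq音乐".

The BM25 formula shows that, by choosing different methods for morpheme (term) analysis, for determining morpheme weights, and for judging the relevance between a morpheme and a document, we can derive different ways of computing a search relevance score. This gives us considerable flexibility in designing the algorithm.

One simple factor matters here: the longer the title, that is, the longer the field, the smaller the weight should be, which matches our requirement. The full BM25 formula is too complex, though, so we can optimize in an approximate way without reproducing it exactly.

By modifying the algorithm in coreseek we can make this optimization. The result is shown below.

Data in the database (the sample text reads "According to foreign media reports" and "April 1 news: according to foreign media reports,"):

1 qq音乐 据国外媒体报道 2010-04-01 22:20:07 1 2
2 qq 4月1日消息,据国外媒体报道, 2010-04-01 12:01:00 2 3

After indexing the MySQL data, the test output is:

-> search phase 0 <-
Query 'qq' retrieved 2 of 2 matches in 0.113 sec.
Query stats:
    'qq' found 2 times in 2 documents
Matches:
1. doc_id=2, weight=499999, group_id=3, date_added=1270135548
2. doc_id=1, weight=499998, group_id=2, date_added=1270131607

As shown, the weight of doc_id=2 is higher than that of doc_id=1.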
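
To illustrate why field length can separate "qq" from "qq音乐", here is a minimal Python sketch of the textbook Okapi BM25 score with length normalization. It is not the modification made in this repository, which uses a simplified variant whose exact form is not given here. The function name `bm25_score`, the parameters `k1` and `b`, the "+1" form of the IDF term, and the pre-tokenized titles are all assumptions made for illustration.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with Okapi BM25.

    Illustrative only: uses the "+1" IDF variant so the IDF stays positive
    even when a term occurs in every document.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc_terms.count(term)  # term frequency in this document
        # Length normalization: longer fields get a larger denominator.
        norm = 1.0 - b + b * len(doc_terms) / avg_len
        score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return score


# Hypothetical tokenization of the two titles from the sample data.
titles = {1: ["qq", "音乐"], 2: ["qq"]}
corpus = list(titles.values())

scores = {doc_id: bm25_score(["qq"], terms, corpus)
          for doc_id, terms in titles.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [2, 1]: the shorter title "qq" ranks first
```

Both titles contain "qq" exactly once, so the term frequency and IDF are identical; only the length-normalization term differs, which is enough to rank the shorter title first.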