
coreseek-change's Introduction

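coreseek does not compute weights according to the BM25 algorithm, which causes a number of small problems. With short queries, for example, if the database contains both "qq" and "qq音乐" (QQ Music), a search for "qq" cannot tell the two apart by weight. Based on how we actually search, "qq" should receive a higher weight than "qq音乐" in this case and rank one place ahead of it.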


 
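As the BM25 formula shows, by varying the term (morpheme) analysis method, the term weighting method, and the method for judging the relevance between a term and a document, we can derive different ways of computing a search relevance score. This gives us considerable flexibility in designing the algorithm.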
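For reference, the standard Okapi BM25 score is usually written as below. The symbols are the conventional ones and are not taken from the coreseek source: $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length in the collection, and $k_1$ and $b$ are free tuning parameters.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

The $|D| / \mathrm{avgdl}$ term is the part that matters here: with everything else equal, a longer document (or field) gets a lower score.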
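One simple factor matters here: the longer the title, that is, the longer the field, the smaller the weight should be, which is exactly what we need. The full BM25 formula is too complex, though, so we can optimize in a roundabout way without reproducing it exactly.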
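A minimal sketch of this "longer field, lower weight" idea in Python. It is an illustration only, not the actual coreseek patch: the names `BASE_WEIGHT` and `length_penalized_weight`, the base value, the one-point-per-token penalty, and the hand-written tokenization of the titles are all assumptions.

```python
# Hypothetical base weight given to every matching document (illustrative value).
BASE_WEIGHT = 500_000


def length_penalized_weight(base_weight, field_tokens):
    """Subtract one point per token in the field, never going below zero."""
    return max(base_weight - len(field_tokens), 0)


# Title field of each document, already split into tokens (assumed tokenization).
titles = {
    1: ["qq", "音乐"],
    2: ["qq"],
}

query = "qq"

# Score only the documents whose title contains the query term.
weights = {
    doc_id: length_penalized_weight(BASE_WEIGHT, tokens)
    for doc_id, tokens in titles.items()
    if query in tokens
}

# Highest weight first: the shorter title wins.
ranked = sorted(weights, key=weights.get, reverse=True)
print(ranked)  # [2, 1]
```

A fixed per-token penalty is much cruder than BM25's length normalization, but it is enough to break ties between documents that all match a short query equally well.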
    
By modifying the algorithm in coreseek we can make this optimization. The effect is shown below.
Data in the database:

1	qq音乐	据国外媒体报道	2010-04-01 22:20:07	1	2
2	qq	4月1日消息,据国外媒体报道,	2010-04-01 12:01:00	2	3

 
After indexing the MySQL data, the test output is as follows:
-> search phase 0 <-

Query 'qq' retrieved 2 of 2 matches in 0.113 sec.
Query stats:
        'qq' found 2 times in 2 documents

Matches:
1. doc_id=2, weight=499999, group_id=3, date_added=1270135548
2. doc_id=1, weight=499998, group_id=2, date_added=1270131607
 
As the output shows, doc_id=2 (the "qq" record) now has a higher weight than doc_id=1 (the "qq音乐" record).
