Genetic Ranker

Genetic Ranker is a framework that uses genetic algorithms to find optimal search weights for Elasticsearch or Solr queries.

The reason

Finding optimal search weights for fields is a long and thankless process. Imagine an index with millions of documents, each with dozens of fields, and hundreds of queries that have to be tested to check how good the current configuration is. Your job is to determine how important each field is. It can take a lifetime.

I wrote a post about this on my blog; take a look if you need more information about the process.

Why a genetic algorithm?

First, the problem is easy to define as a set of numeric weights that can be altered during processing. The mutation, crossover, and reproduction steps of the algorithm are straightforward to perform on numbers.

Second, genetic algorithms preserve promising solutions even without any real understanding of the problem. Better individuals are kept and become ancestors of even better ones.

Third, the mutation and crossover steps add a bit of randomness to the process. This can help find a better answer even when the current one is already good.
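To make the three points above concrete, here is a minimal, standard-library-only sketch of mutation, crossover, and selection on a list of numeric weights. It is not the project's code (which uses deap); the function names, the rates, and the toy fitness function are all assumptions chosen for illustration.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover: the two children swap tails at a random cut."""
    point = rng.randint(1, len(parent_a) - 1)  # needs at least 2 weights
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(weights, rng, rate=0.2, sigma=0.5):
    """With probability `rate`, add Gaussian noise to a weight (clamped at 0)."""
    return [
        max(0.0, w + rng.gauss(0.0, sigma)) if rng.random() < rate else w
        for w in weights
    ]


def evolve(fitness, gene_size, rng, population_size=20, generations=30):
    """Keep the better half each generation and refill it with offspring."""
    population = [
        [rng.uniform(0.0, 10.0) for _ in range(gene_size)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            for child in crossover(parent_a, parent_b, rng):
                children.append(mutate(child, rng))
        population = (survivors + children)[:population_size]
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness (an assumption): closeness to a made-up "ideal" weight vector.
    # In a real run, fitness would come from executing queries against the index.
    ideal = [2.0, 0.5, 1.0]

    def toy_fitness(weights):
        return -sum((w - t) ** 2 for w, t in zip(weights, ideal))

    print(evolve(toy_fitness, gene_size=3, rng=random.Random(42)))
```

Because the best individuals always survive into the next generation, the best fitness never gets worse, while mutation keeps exploring nearby weight combinations.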

More on that, again, in the blog post.

How to run

To start GeneticRanker, run the ranker.py file. The project contains all the data required to demonstrate a simple use case.

Basic configuration

In ranker.py, define how many fields you want to use (gene_size in Ranker). Then specify those fields in the Evaluator class (fields).
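As a rough illustration of how a gene of gene_size weights lines up with the fields list, the sketch below pairs each weight with a field name using the `field^boost` syntax that both Elasticsearch (`multi_match`) and Solr (the DisMax `qf` parameter) understand. The field names and helper functions are hypothetical; the actual queries live in essearcher.py and solrsearcher.py and may look different.

```python
# Hypothetical field names; replace with the fields of your own index.
FIELDS = ["title", "description", "tags"]


def boosted_fields(weights, fields=FIELDS):
    """Pair each weight in the gene with its field as 'field^boost'."""
    if len(weights) != len(fields):
        raise ValueError("gene size must match the number of fields")
    return [f"{field}^{round(weight, 3)}" for field, weight in zip(fields, weights)]


def build_es_query(text, weights, fields=FIELDS):
    """Request body for an Elasticsearch multi_match query with per-field boosts."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(weights, fields),
            }
        }
    }


def build_solr_qf(weights, fields=FIELDS):
    """Value for the Solr DisMax/eDisMax 'qf' parameter (space-separated)."""
    return " ".join(boosted_fields(weights, fields))


if __name__ == "__main__":
    gene = [2.0, 0.5, 1.0]
    print(build_es_query("red shoes", gene))
    print(build_solr_qf(gene))
```

The gene and the field list must have the same length, which is why gene_size and fields have to be changed together.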

In queries.csv, specify the use cases you want to test. The format is:

[query],[document_id]:[expected_position]:[good_enough_position]
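A line in this format could be parsed roughly as follows. This is a sketch, not the project's parser: it assumes one document per line and that the last comma separates the query from the rest. The `score` function is purely illustrative; it guesses at how the two positions might be used (full credit at or above the expected position, partial credit up to the good-enough position) and the values 1.0 and 0.5 are made up.

```python
from typing import NamedTuple


class UseCase(NamedTuple):
    query: str
    document_id: str
    expected_position: int
    good_enough_position: int


def parse_use_case(line):
    """Parse '[query],[document_id]:[expected_position]:[good_enough_position]'."""
    query, target = line.strip().rsplit(",", 1)
    document_id, expected, good_enough = target.rsplit(":", 2)
    return UseCase(query, document_id, int(expected), int(good_enough))


def score(case, actual_position):
    """Illustrative scoring only; the project's real fitness may differ."""
    if actual_position is None:  # document not returned at all
        return 0.0
    if actual_position <= case.expected_position:
        return 1.0
    if actual_position <= case.good_enough_position:
        return 0.5
    return 0.0


if __name__ == "__main__":
    case = parse_use_case("red shoes,doc-3:1:3")  # made-up example line
    print(case, score(case, actual_position=2))
```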

In properties.ini you can define which search engine to use: Elasticsearch or Solr. The queries themselves should be defined in essearcher.py and solrsearcher.py, respectively.
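For orientation, reading such a setting with Python's standard configparser might look like the sketch below. The section name `search` and the key `engine` are assumptions made for this example; check the properties.ini shipped with the project for the real names.

```python
import configparser

# Hypothetical file content; the real properties.ini may use other names.
SAMPLE_INI = """
[search]
engine = solr
"""

# Module names taken from the README: one searcher module per engine.
SEARCHER_MODULES = {
    "elasticsearch": "essearcher",
    "solr": "solrsearcher",
}


def selected_engine(ini_text):
    """Return the configured engine name, defaulting to Elasticsearch."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    engine = parser.get("search", "engine", fallback="elasticsearch").strip().lower()
    if engine not in SEARCHER_MODULES:
        raise ValueError(f"unsupported search engine: {engine!r}")
    return engine


if __name__ == "__main__":
    engine = selected_engine(SAMPLE_INI)
    print(engine, "->", SEARCHER_MODULES[engine])
```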

Test it yourself

Imagine you are a search engineer who has to find optimal weights for the fields the queries run against. All you have is the queries.csv file, which contains the queries, and the documents in the index. Before running GeneticRanker, take a while to read this CSV file and analyze the documents from queries-es.txt.

There are only 8 documents and 10 queries, but I guarantee you can spend a while on this task.

The data directory contains the files with the documents. You can index them in any way Elasticsearch and Solr support; for the former I used Postman, and for the latter the Solr admin panel.

With the server up and running, that should be enough to run the script and see the results.

Used Python libs

deap
elasticsearch
pysolr

You should check their licenses and decide whether you want to use them in your software. The genetic algorithm is implemented using the deap framework.

Closing notes

Every use case may need changes to the genetic algorithm's parameters or to the algorithm itself, but the proposed configuration should be a good starting point. Happy using!