MSMARCO-Search-Engine

Project developed for the MIRVC course of the Master of Artificial Intelligence and Data Engineering at the University of Pisa.

This project consists of the design and implementation of a search engine for the MSMARCO dataset. Check out the assignment and the report for full details about the project.

To run this project, first download the MSMARCO dataset into the main folder of the repository.

Compiling on Windows

  1. Import the project into Visual Studio or Visual Studio Code

  2. Build the project using CMake

  3. Execute app.exe

Compiling on UNIX

  1. Install the required software
$ sudo apt-get install git cmake build-essential zlib1g-dev libboost-all-dev
  2. Download the source code
$ git clone --recursive https://github.com/edoardoruffoli/MSMARCO-Search-Engine
  3. Generate the build files
$ cd MSMARCO-Search-Engine
$ mkdir build && cd build
$ cmake ..
  4. Build
$ make
  5. Run
$ cd bin
$ ./app

Running

*** Started MSMARCO Search Engine ***
Available commands:
  help - display a list of commands
  query - perform a query
  eval - execute a queries dataset, saving the result file for trec_eval
  index - create the inverted index
  exit - exit the program

Enter a command:
>query
Enter the query execution mode:
    0 : CONJUNCTIVE_MODE
    1 : DISJUNCTIVE_MODE
    2 : DISJUNCTIVE_MODE_MAX_SCORE

>2
Select how many documents return:
>10
Enter the query:

>manhattan project

Results for: "manhattan project"
The elapsed time was 15 milliseconds, 15293700 nanoseconds.

RESULTS:
Doc Id  Score
2036644 4.31715
3870080 4.30079
2       4.29498
3615618 4.28213
2395250 4.27013
4404039 4.25136
3607205 4.23599
7243450 4.20026
3689999 4.1146
3870082 4.09159
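
The execution modes offered in the session above differ in which documents qualify. In the usual information retrieval sense, a conjunctive query keeps only documents that contain every query term, while a disjunctive query keeps documents that contain at least one. The following is a minimal C++ sketch of that distinction, assuming a toy in-memory index in which each posting already carries a precomputed per-term score. The names `Posting`, `Index`, `Mode`, and `run_query` are hypothetical and are not taken from the project's code. MaxScore, a dynamic pruning strategy that speeds up disjunctive queries by skipping documents that cannot reach the top results, is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; the project's index layout may differ.
using DocId = std::uint32_t;
struct Posting {
    DocId doc;
    double score;  // assumed precomputed contribution of this term to the document
};
using Index = std::unordered_map<std::string, std::vector<Posting>>;

enum class Mode { Conjunctive, Disjunctive };

// Returns up to k (doc id, score) pairs, best score first.
// Assumes query terms are distinct and each posting list holds a document at most once.
std::vector<std::pair<DocId, double>> run_query(const Index& index,
                                                const std::vector<std::string>& terms,
                                                Mode mode, std::size_t k) {
    struct Acc {
        double score = 0.0;
        std::size_t hits = 0;  // number of query terms found in the document
    };
    std::map<DocId, Acc> acc;
    for (const auto& term : terms) {
        const auto it = index.find(term);
        if (it == index.end()) continue;  // term absent from the collection
        for (const auto& p : it->second) {
            Acc& a = acc[p.doc];
            a.score += p.score;
            a.hits += 1;
        }
    }

    std::vector<std::pair<DocId, double>> out;
    for (const auto& entry : acc) {
        // Conjunctive: every term must match. Disjunctive: one match is enough.
        if (mode == Mode::Conjunctive && entry.second.hits != terms.size()) continue;
        out.emplace_back(entry.first, entry.second.score);
    }
    // Sort by score descending, breaking ties by smaller doc id.
    std::sort(out.begin(), out.end(),
              [](const std::pair<DocId, double>& a, const std::pair<DocId, double>& b) {
                  return a.second > b.second || (a.second == b.second && a.first < b.first);
              });
    if (out.size() > k) out.resize(k);
    assert(out.size() <= k);
    return out;
}
```

With an index where "manhattan" appears in documents 1 and 2 and "project" appears in documents 2 and 3, `run_query(index, {"manhattan", "project"}, Mode::Conjunctive, 10)` returns only document 2, whereas `Mode::Disjunctive` returns all three, ranked by summed score.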

Repository

The repository is organized as follows:

  • apps/ contains the entry points (main) of the programs
  • docs/ contains the project report and the assignment
  • evaluation/ contains the dataset used to evaluate the search engine with trec_eval
  • include/ contains the header files
  • src/ contains the source files
  • tests/ contains the unit tests
  • thirdparty/ contains the third-party dependencies
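
For reference, trec_eval reads ranked results as whitespace-separated lines of the form `query_id Q0 doc_id rank score run_tag`. The sketch below shows one way to produce such lines from a ranked result list. It is an illustration under assumptions, not the project's eval code: the function name `to_trec_run` and its parameters are hypothetical, and the default stream formatting of the score is a choice made here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: formats one query's ranked results in the run-file
// layout read by trec_eval (query_id, literal "Q0", doc_id, rank, score, run_tag).
// `ranked` is assumed to be sorted best-first already.
std::string to_trec_run(const std::string& query_id,
                        const std::vector<std::pair<std::uint32_t, double>>& ranked,
                        const std::string& run_tag) {
    // Fields are whitespace-separated, so the tag itself must not contain spaces.
    assert(run_tag.find(' ') == std::string::npos);
    std::ostringstream out;
    for (std::size_t i = 0; i < ranked.size(); ++i) {
        out << query_id << " Q0 " << ranked[i].first << ' '
            << (i + 1) << ' '               // ranks start at 1
            << ranked[i].second << ' ' << run_tag << '\n';
    }
    return out.str();
}
```

Calling `to_trec_run("1", {{2036644, 4.5}, {3870080, 2.25}}, "demo")` yields two lines, `1 Q0 2036644 1 4.5 demo` and `1 Q0 3870080 2 2.25 demo`, which can be written to a file and passed to trec_eval together with a relevance judgments file.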

Contributors

  • edoardoruffoli
  • mrfransis
