search-engine's Introduction

Simple Search Engine Project

This project builds a search engine for text documents: the user enters a query and the system displays the related documents. For a more complete description of the project in Farsi, refer to Project explanation.

Phase 1

In this phase, we build a simple information retrieval model. The documents must be indexed so that, when a query is received, the positional index can be used to retrieve the related documents. In short, the tasks in this phase are:

  • Data preprocessing
  • Creating a positional index
  • Answering the user's queries
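
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's implementation: the function names (`tokenize`, `build_positional_index`, `phrase_query`) and the sample documents are hypothetical, and preprocessing is reduced to lowercasing and splitting on word characters, whereas the real project may also normalize, stem, or remove stop words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Preprocessing stand-in: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_query(index, phrase):
    """Return sorted ids of documents in which the phrase's tokens are adjacent."""
    terms = tokenize(phrase)
    if not terms or any(term not in index for term in terms):
        return []

    # Only documents that contain every term can match the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])

    hits = []
    for doc_id in candidates:
        # Position sets of the 2nd, 3rd, ... term in this document.
        following = [set(index[term][doc_id]) for term in terms[1:]]
        for start in index[terms[0]][doc_id]:
            # The i-th following term must sit exactly i + 1 slots after `start`.
            if all(start + i + 1 in positions
                   for i, positions in enumerate(following)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample collection, used only for illustration.
docs = {
    1: "The quick brown fox",
    2: "A brown quick fox jumps",
    3: "Quick brown dogs bark",
}
index = build_positional_index(docs)
print(phrase_query(index, "quick brown"))  # [1, 3]
```

Document 2 contains both words but in the opposite order, so it is excluded. Storing positions rather than only document ids is what makes this kind of phrase matching possible.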

The Jupyter notebook for this phase is located at Phase 1.

Phase 2

In this phase, we extend the information retrieval model by representing documents as vectors, so that search results can be ranked by how closely they relate to the user's query. A numerical vector is extracted for each document as its representation in the vector space, and these vectors are stored. When a query is received, its vector is first built in the same vector space. Then, using a suitable similarity measure, the similarity between the query vector and every document vector is computed, and the results are sorted by similarity.
Various methods can be used to increase the response speed of the information retrieval model; they are described in detail in the project explanation file.
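
The vector-space ranking described above can be sketched as follows. The text only asks for "a suitable similarity criterion", so the choices here are assumptions: tf-idf weights of the form (1 + log10 tf) × log10(N / df) and cosine similarity. All function names and sample documents are hypothetical, and tokenization is a plain whitespace split.

```python
import math
from collections import Counter


def tokenize(text):
    """Simplified preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def weigh(tokens, idf):
    """Sparse tf-idf vector; terms never seen in the collection are dropped."""
    tf = Counter(tokens)
    return {term: (1 + math.log10(count)) * idf[term]
            for term, count in tf.items() if term in idf}


def build_tfidf_vectors(docs):
    """Return ({doc_id: vector}, idf) for a {doc_id: text} collection."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(tokenized)
    idf = {term: math.log10(n_docs / df) for term, df in doc_freq.items()}
    vectors = {doc_id: weigh(tokens, idf)
               for doc_id, tokens in tokenized.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, vectors, idf, top_k=3):
    """Return up to top_k (doc_id, score) pairs, most similar first."""
    query_vector = weigh(tokenize(query), idf)
    scored = [(doc_id, cosine(query_vector, vector))
              for doc_id, vector in vectors.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical sample collection, used only for illustration.
docs = {
    1: "cats chase mice",
    2: "dogs chase cats",
    3: "birds sing songs",
}
vectors, idf = build_tfidf_vectors(docs)
results = search("cats mice", vectors, idf)
print([doc_id for doc_id, _ in results])  # [1, 2]
```

Document 1 ranks first because it shares both query terms, including the rarer term "mice"; document 3 shares none and is omitted. This sketch compares the query with every document vector, which is the baseline that the speed-up methods mentioned above aim to improve on.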

The Jupyter notebook for this phase is located at Phase 2.

Notes

  • We use two important libraries in this project. You can click on them for more information:
  • If you want to read more about the topics discussed in this course, you can refer to Course slides.

search-engine's People

Contributors

mahdi-rahmani