Document retrieval system

Project description

This exercise is about developing a document retrieval system that returns the titles of scientific papers containing the answer to a given user question. You will use the first version of the COVID-19 Open Research Dataset (CORD-19) in your work (the articles in the comm_use_subset folder).
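Each article in the early CORD-19 releases is stored as one JSON file. The sketch below reads a folder of them into plain dictionaries. It assumes the per-paper schema of those releases (`paper_id`, `metadata.title`, and lists of paragraphs under `abstract` and `body_text`, each with a `text` field). Verify this against the files you actually download. `load_papers` is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path


def load_papers(folder):
    """Read every CORD-19 JSON file in `folder` into a simple dict.

    Assumes the early CORD-19 schema: metadata.title, abstract[*].text
    and body_text[*].text. Missing sections are treated as empty.
    """
    papers = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            doc = json.load(fh)
        title = doc.get("metadata", {}).get("title", "").strip()
        # Join the paragraph fragments of each section into one string.
        abstract = " ".join(p.get("text", "") for p in doc.get("abstract", []))
        body = " ".join(p.get("text", "") for p in doc.get("body_text", []))
        if not title:
            continue  # a paper without a title cannot be returned as an answer
        papers.append({
            "paper_id": doc.get("paper_id", path.stem),
            "title": title,
            "abstract": abstract.strip(),
            "body": body.strip(),
        })
    return papers
```

Keeping title, abstract and body separate leaves the preprocessing decision open. You can embed only title and abstract (short and cheap to embed), or split the body into sentences for finer-grained matching at a higher cost. The exercise asks you to justify whichever choice you make.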

For example, for the question “What are the coronaviruses?”, your system could return the paper title “Distinct Roles for Sialoside and Protein Receptors in Coronavirus Infection”, since this paper contains the answer to the question.
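Retrieval of this kind can be framed as a nearest-neighbour search. You embed the question and every paper with the same model, then return the titles whose vectors are most similar to the question vector. Below is a minimal sketch, assuming the embeddings are already computed as plain lists of floats. Cosine similarity is a common choice for comparing sentence embeddings, but `top_k_titles` and its `k` parameter are illustrative names only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_titles(question_vec, paper_vecs, titles, k=3):
    """Return the k titles whose paper embeddings are most similar to the question."""
    scored = [(cosine(question_vec, vec), title)
              for vec, title in zip(paper_vecs, titles)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]
```

A linear scan like this is fine for a few thousand papers. For much larger collections, an approximate nearest-neighbour index would usually replace it.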

To achieve the goal of this exercise, you will first need to read the paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks to understand how sentence embeddings can be created. The related work section of this paper also describes other approaches you can use to develop your model, for example GloVe embeddings. At this link, you can find extended versions of this dataset to test your model on, if you want. You are required to:
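One of the simplest approaches mentioned alongside Sentence-BERT is to average pre-trained word vectors such as GloVe. The paper reports averaged GloVe embeddings as a baseline. The sketch below assumes a GloVe text file with one word followed by its space-separated float components per line, and uses a deliberately simple regex tokenizer. All function names are hypothetical. For a Sentence-BERT model, the `sentence-transformers` package provides a `SentenceTransformer` class whose `encode` method maps a list of sentences to vectors. That path is not shown here because it requires downloaded model weights.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_glove(path):
    """Parse a GloVe-style text file into {word: [floats]}."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, *values = line.rstrip().split(" ")
            vectors[word] = [float(x) for x in values]
    return vectors


def average_embedding(text, word_vectors, dim):
    """Mean of the vectors of all in-vocabulary tokens; a zero vector if none are known."""
    vecs = [word_vectors[t] for t in tokenize(text) if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    # zip(*vecs) groups the i-th component of every word vector together.
    return [sum(column) / len(vecs) for column in zip(*vecs)]
```

Averaging ignores word order and weights every word equally, which is one reason the Sentence-BERT paper compares it against trained sentence encoders.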

  1. Preprocess the provided dataset. Decide which parts of each paper are useful to your model for creating the appropriate embeddings, and explain your decisions.
  2. Implement at least 2 different sentence embedding approaches (see the related work of the Sentence-BERT paper) that your model can use to retrieve the titles of the papers related to a given question.
  3. Compare your 2 models on at least 2 different criteria of your choice. Explain why you selected these criteria, your implementation choices, and the results. Some questions you can pose are included here. You must also provide any extra questions you posed to your model, together with the results for all of the questions.
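Step 3 leaves the criteria open. Two that are straightforward to measure are retrieval quality and query speed. The sketch below assumes you have hand-labelled which titles answer each question, since no such labels are described as part of the dataset. `hit_rate_at_k` returns the share of questions with at least one relevant title in the top k. `mean_latency` times a hypothetical `search` callable. Both are illustrative names, not a prescribed method.

```python
import time


def hit_rate_at_k(retrieved, relevant, k):
    """Fraction of questions whose top-k titles contain at least one relevant title.

    retrieved: list of ranked title lists, one per question.
    relevant:  list of sets of acceptable titles, aligned with `retrieved`.
    """
    if not retrieved:
        return 0.0
    hits = sum(1 for ranked, gold in zip(retrieved, relevant)
               if gold & set(ranked[:k]))  # non-empty intersection counts as a hit
    return hits / len(retrieved)


def mean_latency(search, questions):
    """Average wall-clock seconds per call of `search(question)`."""
    if not questions:
        return 0.0
    start = time.perf_counter()
    for q in questions:
        search(q)
    return (time.perf_counter() - start) / len(questions)
```

Running both functions on the same question set for each embedding approach gives a quality-versus-speed comparison. Averaged word vectors are typically much cheaper to compute than a transformer-based encoder.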

Contributors

myrto-iglezou