msmarco-document-ranking's Introduction

msmarco-document-ranking's People

Contributors

bmitra-msft, microsoftopensource

msmarco-document-ranking's Issues

How to get the train qrels dataset?

Hey,
I am confused about the meaning of "offset_trec" and "offset_tsv" in msmarco-docs-lookup.tsv. How can I use this file to build a training set in the format "qid \t docid \t label"?
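A minimal sketch of one way to use the lookup file. It assumes, based on our understanding of the MS MARCO document release rather than anything confirmed in this thread, that each row of msmarco-docs-lookup.tsv holds docid, offset_trec and offset_tsv (tab-separated), where the two offsets are byte positions of that document in msmarco-docs.trec and msmarco-docs.tsv, that each msmarco-docs.tsv line is docid, url, title, body, and that the training qrels use the four-column TREC layout "qid 0 docid label". All function names are hypothetical. Under these assumptions the offsets are only needed to fetch document text; the "qid \t docid \t label" file can be produced from the qrels alone.

```python
import os
import tempfile


def load_lookup(lookup_path):
    """Map docid -> (offset_trec, offset_tsv).

    Assumes each line is: docid, offset_trec, offset_tsv (tab-separated), with the
    offsets being byte positions in the .trec and .tsv corpus files.
    """
    offsets = {}
    with open(lookup_path, encoding="utf-8") as f:
        for line in f:
            docid, offset_trec, offset_tsv = line.rstrip("\n").split("\t")
            offsets[docid] = (int(offset_trec), int(offset_tsv))
    return offsets


def read_doc_tsv(docs_tsv, offset_tsv):
    """Seek to a byte offset in the TSV corpus (opened in binary mode) and parse one line.

    Assumes each corpus line is: docid, url, title, body (tab-separated).
    """
    docs_tsv.seek(offset_tsv)
    line = docs_tsv.readline().decode("utf-8").rstrip("\n")
    docid, url, title, body = line.split("\t", 3)
    return {"docid": docid, "url": url, "title": title, "body": body}


def qrels_to_triples(qrels_path, out_path):
    """Rewrite TREC-style qrels lines ('qid 0 docid label') as tab-separated qid, docid, label."""
    with open(qrels_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or malformed lines
            qid, _iteration, docid, label = parts
            dst.write(f"{qid}\t{docid}\t{label}\n")


if __name__ == "__main__":
    # Toy files standing in for the real corpus and lookup table (contents are made up).
    tmp = tempfile.mkdtemp()
    docs_path = os.path.join(tmp, "docs.tsv")
    lookup_path = os.path.join(tmp, "lookup.tsv")
    with open(docs_path, "wb") as docs, open(lookup_path, "w", encoding="utf-8") as lookup:
        for docid in ("D1", "D2"):
            # 0 is a placeholder for the .trec offset in this toy example.
            lookup.write(f"{docid}\t0\t{docs.tell()}\n")
            docs.write(f"{docid}\thttp://example.com/{docid}\tTitle {docid}\tBody\n".encode("utf-8"))
    offsets = load_lookup(lookup_path)
    with open(docs_path, "rb") as docs:
        print(read_doc_tsv(docs, offsets["D2"][1])["title"])  # Title D2
```

Opening the corpus once in binary mode and seeking per document avoids loading the whole collection into memory; byte offsets only line up with a file opened in binary mode, which is why read_doc_tsv decodes after reading.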

Understanding of dataset structure

Hello,
I had a hard time understanding the dataset structure, and I could not find it documented anywhere.

I guess that:

  • fulldocs.tsv contains URLs and documents, and it has 3213835 lines (so each line is one document).
  • docleaderboard-queries.tsv contains query IDs and queries.
  • docleaderboard-top100.tsv is a query-to-document relevance dataset.
    Structure:
      355339 Q0 D2612180 1 -5.21639 IndriQueryLikelihood
      355339 Q0 D2906504 2 -5.22419 IndriQueryLikelihood
      355339 Q0 D2076296 3 -5.39244 IndriQueryLikelihood
      355339 Q0 D894964 4 -5.46692 IndriQueryLikelihood
      355339 Q0 D260320 5 -5.47119 IndriQueryLikelihood

I assume the first column is the query ID; I don't know what the second one is; the third is possibly the document ID; and the next column is a relevance score from 1 to 10. I don't care about the rest.
If DXXXXXX is a document ID, and since fulldocs.tsv has no ID column, I assume it is the row number.
But if that is true, the maximum DXXXXXX number in docleaderboard-top100.tsv is D3563531, which is larger than the number of lines in fulldocs.tsv.

I am lost. Can you please help me? Is it possible that fulldocs.tsv is missing some documents?

Thank you
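The sample rows above match the standard six-column TREC run format used by tools such as trec_eval: query ID, a literal Q0 placeholder that evaluation tools ignore, document ID, rank, score, and run tag. Read that way, the fourth column is a rank position (up to 100 in a top-100 file) rather than a relevance grade, and the D-prefixed values are opaque document IDs rather than row numbers, which would be consistent with the largest ID exceeding the line count. The sketch below parses such rows and indexes a corpus by ID instead of by position; it assumes, hypothetically, a corpus file whose first tab-separated column is the document ID, and the function and class names are illustrative only.

```python
from collections import defaultdict
from typing import NamedTuple


class RunEntry(NamedTuple):
    """One row of a TREC run file."""
    qid: str
    docid: str
    rank: int
    score: float
    tag: str


def parse_run(lines):
    """Group 'qid Q0 docid rank score tag' rows by query ID, ordered by rank."""
    runs = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed rows
        qid, _q0, docid, rank, score, tag = parts
        runs[qid].append(RunEntry(qid, docid, int(rank), float(score), tag))
    for entries in runs.values():
        entries.sort(key=lambda entry: entry.rank)
    return dict(runs)


def index_docs_by_id(lines):
    """Map docid -> remaining fields, keyed by ID rather than by row number.

    Assumes (hypothetically) that the document ID is the first tab-separated column.
    """
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines without an ID column
        docid, rest = line.split("\t", 1)
        index[docid] = rest
    return index


# The rows quoted in the question.
SAMPLE_RUN = """\
355339 Q0 D2612180 1 -5.21639 IndriQueryLikelihood
355339 Q0 D2906504 2 -5.22419 IndriQueryLikelihood
355339 Q0 D2076296 3 -5.39244 IndriQueryLikelihood
355339 Q0 D894964 4 -5.46692 IndriQueryLikelihood
355339 Q0 D260320 5 -5.47119 IndriQueryLikelihood
""".splitlines()

run = parse_run(SAMPLE_RUN)
```

With the five rows above, run["355339"][0].docid is D2612180 and the entries come back ordered by rank. Both functions accept any iterable of lines, so an open file handle can be passed directly for the full files.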

Broken download links

Hi, I am trying to download the MS MARCO dataset from here.
However, no matter which version I choose, I get a 'ResourceNotFound' error.

Is there a mirror link for the dataset files?

Test queries and records count

  1. It seems like the test queries point to the TREC DL 19 queries. Is this correct? If so, I think the record count should be updated to 200.
  2. I think the record count for the dev set should be 5,193.
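One way to check claimed record counts is to count query IDs in the released files directly. The sketch below (function and script names are hypothetical) reports both the number of non-blank lines and the number of distinct query IDs, since a qrels file can list more than one judged document per query, so its line count may exceed its query count. It assumes the query ID is the first whitespace-separated field and does not assume any particular expected total.

```python
import sys


def count_records(lines):
    """Return (non_blank_lines, distinct_query_ids).

    Assumes the query ID is the first whitespace-separated field, which holds for
    both 'qid<TAB>query text' files and 'qid 0 docid label' qrels files.
    """
    n_lines = 0
    qids = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        n_lines += 1
        qids.add(fields[0])
    return n_lines, len(qids)


if __name__ == "__main__":
    # Usage: python count_records.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            n_lines, n_qids = count_records(f)
        print(f"{path}: {n_lines} lines, {n_qids} distinct query IDs")
```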
