microsoft / msmarco-document-ranking

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage/document ranking.

License: Creative Commons Attribution 4.0 International

Python 100.00%

msmarco-document-ranking's Introduction

To participate in the MS MARCO Document Ranking leaderboard, please go here: https://microsoft.github.io/msmarco/Submission.

msmarco-document-ranking's Issues

how to get train qrels dataset?

Hey,
I am confused about the meaning of "offset_trec" and "offset_tsv" in the lookup file msmarco-docs-lookup.tsv. How can I use this file to build a training dataset in the format "qid \t docid \t label"?
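
One plausible reading, offered as a sketch rather than a confirmed answer: each lookup row holds a document ID followed by two byte offsets, one into the TREC-formatted corpus and one into the TSV corpus, so a document can be fetched with a seek instead of a full scan. The Python below builds a tiny stand-in corpus to show the idea. The column order (docid, offset_trec, offset_tsv), the corpus columns (docid, url, title, body), the qrels layout ("qid 0 docid label"), and every function name here are assumptions for illustration, not something the repository confirms in this thread.

```python
import os
import tempfile


def load_offsets(lookup_path):
    """Return {docid: offset_tsv} from a three-column lookup file (assumed layout)."""
    offsets = {}
    with open(lookup_path, encoding="utf8") as lookup:
        for line in lookup:
            docid, _offset_trec, offset_tsv = line.rstrip("\n").split("\t")
            offsets[docid] = int(offset_tsv)
    return offsets


def get_document(docid, offsets, corpus):
    """Seek to the document's byte offset and return its tab-separated fields.

    `corpus` must be opened in binary mode so that seek() takes raw byte offsets.
    """
    corpus.seek(offsets[docid])
    fields = corpus.readline().decode("utf8").rstrip("\n").split("\t")
    if fields[0] != docid:
        raise ValueError(f"offset for {docid} points at {fields[0]!r}")
    return fields


def qrels_to_triples(qrels_lines):
    """Convert assumed 'qid 0 docid label' lines to tab-separated qid, docid, label."""
    for line in qrels_lines:
        qid, _unused, docid, label = line.split()
        yield f"{qid}\t{docid}\t{label}"


def build_demo_files(directory):
    """Write a two-document stand-in corpus and its lookup file (hypothetical data)."""
    docs = [
        ("D1", "http://a.example", "Title A", "Body of document A"),
        ("D2", "http://b.example", "Title B", "Body of document B"),
    ]
    corpus_path = os.path.join(directory, "docs.tsv")
    lookup_path = os.path.join(directory, "docs-lookup.tsv")
    with open(corpus_path, "wb") as corpus, open(lookup_path, "w", encoding="utf8") as lookup:
        for doc in docs:
            offset = corpus.tell()  # byte position where this document's line starts
            corpus.write(("\t".join(doc) + "\n").encode("utf8"))
            lookup.write(f"{doc[0]}\t0\t{offset}\n")  # 0 is a placeholder offset_trec
    return corpus_path, lookup_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        corpus_path, lookup_path = build_demo_files(tmp)
        offsets = load_offsets(lookup_path)
        with open(corpus_path, "rb") as corpus:
            print(get_document("D2", offsets, corpus))
    print(list(qrels_to_triples(["3 0 D2 1"])))
```

Under these assumptions the training file is just the qrels rows with the constant column dropped, and the offsets are only needed when the document text itself has to be attached to each pair.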

Understanding of dataset structure

Hello,
I had a hard time understanding the dataset structure, and I could not find it documented anywhere.

My guess is:

fulldocs.tsv contains URLs and documents, with 3213835 lines (so each line is one document).
docleaderboard-queries.tsv contains IDs and queries.
docleaderboard-top100.tsv is the query-to-document relevance dataset.
Structure:

355339 Q0 D2612180 1 -5.21639 IndriQueryLikelihood

355339 Q0 D2906504 2 -5.22419 IndriQueryLikelihood

355339 Q0 D2076296 3 -5.39244 IndriQueryLikelihood

355339 Q0 D894964 4 -5.46692 IndriQueryLikelihood

355339 Q0 D260320 5 -5.47119 IndriQueryLikelihood

I assume the first column is the question ID. I don't know what the second column is. The third is possibly the document ID, and the next column is relevance from 1-10. I don't care about the rest.
If DXXXXXX is the document ID, and fulldocs.tsv has no ID in each row, then I assume it is the row number.
But if that is true, the maximum DXXXXXX number in docleaderboard-top100.tsv is D3563531, which is more than the number of lines in fulldocs.tsv.

I am lost. Can you please help me? Is it possible that fulldocs.tsv is missing some docs?

Thank you
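
The sample lines above look like the standard six-column TREC run format: query ID, the literal Q0, document ID, rank, score, and run name. Under that reading, which is an assumption about this file rather than something stated in the thread, the fourth column is a rank position produced by the retrieval run, not a relevance grade. A minimal Python sketch of parsing such lines follows; `RunEntry` and `parse_run_line` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    qid: str
    docid: str
    rank: int
    score: float
    run_name: str


def parse_run_line(line):
    """Parse one whitespace-separated TREC-style run line (assumed six columns)."""
    qid, _q0, docid, rank, score, run_name = line.split()
    return RunEntry(qid, docid, int(rank), float(score), run_name)


if __name__ == "__main__":
    sample = [
        "355339 Q0 D2612180 1 -5.21639 IndriQueryLikelihood",
        "355339 Q0 D2906504 2 -5.22419 IndriQueryLikelihood",
    ]
    for entry in map(parse_run_line, sample):
        print(entry.qid, entry.docid, entry.rank, entry.score)
```

If the D-prefixed values are identifiers rather than row numbers, they would need to be matched against an ID column or a lookup file, and gaps in the numbering could explain a maximum ID larger than the line count.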

Test queries and records count

It seems like the test queries point to the TREC DL 19 queries. Is this correct? If so, I think the records count needs to be updated to 200.
I also think the records count for the dev set needs to be 5,193.
