The re-move from furkanyesiler

Da-Tacos Training subset

Where can I find the Da-Tacos training subset?
Thanks in advance!

Some bug in evaluation metrics with null positives

We were looking at the evaluation metrics implemented here, and ran into what I think is a pretty subtle bug in the MRR calculation.

re-move/utils/metrics.py

Lines 27 to 32 in 5ddd5df

    
    _, spred = torch.topk(ypred, k, dim=1)
    found = torch.gather(ytrue, 1, spred)
    temp = torch.arange(k).float() * 1e-6
    _, sel = torch.topk(found - temp, 1, dim=1)
    mrr = torch.mean(1/(sel+1).float())

The above code identifies the position of the first positive result if one exists. However, if no positive result exists, the topk call just returns the first position: found = [0, 0, 0, 0, ...], so found - temp = [0, -1e-6, -2e-6, -3e-6, ...], whose maximum is at index 0. Such a query is scored with a reciprocal rank of 1, which inflates the metric for queries with null ytrue sets.
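The failure mode can be reproduced without PyTorch. The following is a minimal sketch that mirrors the tie-break arithmetic in plain Python for a single query; `first_hit_rank` is a hypothetical helper name used only for illustration and is not part of re-move.

```python
def first_hit_rank(found):
    """Mimic topk(found - temp, 1): 0-based index of the largest found[i] - i * 1e-6."""
    scores = [f - i * 1e-6 for i, f in enumerate(found)]
    return max(range(len(scores)), key=scores.__getitem__)

with_hit = [0.0, 0.0, 1.0, 0.0]  # first relevant item is at rank 3
no_hit = [0.0, 0.0, 0.0, 0.0]    # query with no positives at all

rr_with_hit = 1.0 / (first_hit_rank(with_hit) + 1)
rr_no_hit = 1.0 / (first_hit_rank(no_hit) + 1)  # 1.0: scored like a perfect hit

print(rr_with_hit, rr_no_hit)
```

The query with no positives ends up with the same reciprocal rank as a query whose first result is correct.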

Looking a bit further down in the evaluation, I noticed that this case is handled correctly(*) in meanAP:

re-move/utils/metrics.py

Lines 40 to 41 in 5ddd5df

    
    ap = torch.sum(prec*mask, 1)/(torch.sum(ytrue, 1)+eps)
    ap = ap[torch.sum(ytrue, 1) > 0]

where the mean over AP scores is restricted only to those queries where sum(ytrue) > 0.

There are two caveats to this:

meanAP is potentially averaged over a different query set than the other metrics, which is not ideal.
When reduce_mean=False is passed in, the returned vector of per-query AP scores is already restricted to queries with positive results, but the evaluator has lost track of the corresponding indices. This makes it difficult to link the scores back to the input data later on.
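To illustrate the second caveat, here is a small plain-Python sketch with made-up scores (the values and variable names are assumptions for illustration only). It contrasts filtering out null queries with marking them as NaN in place.

```python
import math

ap_all = [0.8, 0.0, 0.5]             # per-query AP; query 1 has no positives
has_positives = [True, False, True]

# Filtering shortens the vector: position 1 now refers to query 2.
filtered = [ap for ap, ok in zip(ap_all, has_positives) if ok]

# Masking keeps one entry per query, so position i still refers to query i.
masked = [ap if ok else math.nan for ap, ok in zip(ap_all, has_positives)]

# The valid query indices can be recovered from the masked vector alone.
valid_ids = [i for i, v in enumerate(masked) if not math.isnan(v)]
print(filtered, masked, valid_ids)
```

With the masked form, the per-query vector always has the same length as the input, so no separate index bookkeeping is needed.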

So I have a couple of proposed modifications:

    # Identify queries with positive results
    has_positives = torch.sum(ytrue, 1) > 0

    _, spred = torch.topk(ypred, k, dim=1)
    found = torch.gather(ytrue, 1, spred)

    temp = torch.arange(k).float() * 1e-6
    _, sel = torch.topk(found - temp, 1, dim=1)

    # Knock out queries with no positives
    sel = sel.float()
    sel[~has_positives] = torch.nan

    mrr = torch.nanmean(1/(sel+1).float())
    mr = torch.nanmean((sel+1).float())
    top1 = torch.sum(found[:, 0])
    top10 = torch.sum(found[:, :10])

    pos = torch.arange(1, spred.size(1)+1).unsqueeze(0).to(ypred.device)
    prec = torch.cumsum(found, 1)/pos.float()
    mask = (found > 0).float()
    ap = torch.sum(prec*mask, 1)/(torch.sum(ytrue, 1)+eps)
    ap[~has_positives] = torch.nan

    if print_metrics:
        print('mAP: {:.3f}'.format(ap.nanmean().item()))
        print('MRR: {:.3f}'.format(mrr.item()))
        print('MR: {:.3f}'.format(mr.item()))
        print('Top1: {:.0f}'.format(top1.item()))
        print('Top10: {:.0f}'.format(top10.item()))
    return ap.nanmean() if reduce_mean else ap

Quick summary:

Replace means with nanmeans.
Populate NaNs for the MR and MRR results on queries with null result sets.

As an alternative suggestion: you might consider accepting a fill_nan= parameter here, which could replace NaNs with zeros. The reasoning is that 0 is a reasonable limiting value for MRR and meanAP (though not for MR) on null-result queries, and including those queries could make sense in some situations.
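A rough sketch of how such an option might behave, written in plain Python so it runs without PyTorch. The function name `reduce_scores` and the exact semantics of `fill_nan` are assumptions made for illustration; nothing like this exists in re-move today.

```python
import math

def reduce_scores(values, fill_nan=None):
    """Average per-query scores, where NaN marks a query with no positives.

    fill_nan=None skips NaN entries (nanmean behaviour);
    a number replaces each NaN before averaging.
    """
    if fill_nan is None:
        kept = [v for v in values if not math.isnan(v)]
    else:
        kept = [fill_nan if math.isnan(v) else v for v in values]
    return sum(kept) / len(kept) if kept else math.nan

per_query_rr = [1.0, 0.5, math.nan, 0.25]        # third query has no positives
skipped = reduce_scores(per_query_rr)            # mean over the 3 valid queries
zero_filled = reduce_scores(per_query_rr, 0.0)   # null query counted as 0

print(skipped, zero_filled)
```

Filling with zero lowers the average because the null query is counted as a miss, whereas skipping it leaves the average over valid queries unchanged.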

Update of the readme

The readme states that it will be updated soon. Are these updates ready?
