sed_eval's People

Contributors: toni-heittola
sed_eval's Issues

EventBasedMetrics documentation is misleading

The documentation for EventBasedMetrics is misleading.

It says:

t_collar : float (0,]

    Time collar used when evaluating validity of the onset and offset, in seconds. Default value 0.2

percentage_of_length : float in [0, 1]

    Second condition, percentage of the length within which the estimated offset has to be in order to be consider valid estimation. Default value 0.5

The documentation for percentage_of_length suggests that it is an "AND" condition, not an "OR" condition. However, the code shows that the maximum of the two tolerances is used, which is also what mir_eval does:

        # Detect field naming style used and validate onset
        if 'event_offset' in reference_event and 'event_offset' in estimated_event:
            annotated_length = reference_event['event_offset'] - reference_event['event_onset']

            return math.fabs(reference_event['event_offset'] - estimated_event['event_offset']) <= max(t_collar, percentage_of_length * annotated_length)

        elif 'offset' in reference_event and 'offset' in estimated_event:
            annotated_length = reference_event['offset'] - reference_event['onset']

            return math.fabs(reference_event['offset'] - estimated_event['offset']) <= max(t_collar, percentage_of_length * annotated_length)

I would suggest adapting the EventBasedMetrics documentation along the lines of mir_eval's documentation for offset_ratio and offset_min_tolerance to make this behaviour clearer.
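
For concreteness, a hypothetical worked example of the current behaviour: with the defaults t_collar = 0.2 and percentage_of_length = 0.5, a reference event that is 2.0 s long gets an offset tolerance of max(0.2, 0.5 * 2.0) = 1.0 s, so the estimated offset only has to fall within the larger of the two bounds, not within both.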

Confusion matrix implemented?

Hi! First of all, thanks for this nice toolbox, very helpful! I was just wondering whether you had implemented a confusion matrix somewhere? I would be particularly interested in having one for event-based metrics.

thanks
Dorian
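
There does not appear to be a confusion matrix in the toolbox itself. As a stopgap, a segment-level confusion matrix can be assembled outside sed_eval; a minimal sketch (hypothetical, assuming each segment is reduced to a single dominant label, with 'none' for silent segments):

    from sklearn.metrics import confusion_matrix

    # Hypothetical per-segment labels; 'none' marks segments with no active event.
    labels = ['speech', 'dog', 'none']
    ref_per_segment = ['speech', 'none', 'dog', 'dog']   # reference, one label per segment
    est_per_segment = ['speech', 'dog', 'dog', 'none']   # system output, one label per segment

    cm = confusion_matrix(ref_per_segment, est_per_segment, labels=labels)
    print(cm)  # rows = reference classes, columns = estimated classes

For event-based metrics a confusion matrix is less obviously defined, since unmatched events have no counterpart class to be confused with.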

evaluated_length_seconds and evaluated_files in the SegmentBasedMetrics and EventBasedMetrics classes are not reset to 0 by the reset() function

The reset() function is supposed to initialize all internal state, so shouldn't these variables also be set to zero?

In the SegmentBasedMetrics class, the suggested reset() would be:

def reset(self):
    """Reset internal state
    """
    self.evaluated_length_seconds = 0
    self.evaluated_files = 0

    self.overall = {
        'Ntp': 0.0,
        'Ntn': 0.0,
        'Nfp': 0.0,
        'Nfn': 0.0,
        'Nref': 0.0,
        'Nsys': 0.0,
        'ER': 0.0,
        'S': 0.0,
        'D': 0.0,
        'I': 0.0,
    }

    self.class_wise = {}
    for class_label in self.event_label_list:
        self.class_wise[class_label] = {
            'Ntp': 0.0,
            'Ntn': 0.0,
            'Nfp': 0.0,
            'Nfn': 0.0,
            'Nref': 0.0,
            'Nsys': 0.0,
        }

    return self

Thank you for your hard work!

What's the difference between accuracy in sed_eval's metrics and binary_accuracy in TensorFlow?

Great library!
I have used this library for a while, but I still can't tell the difference between the segment-based accuracy (with the segment length set to 1 frame) used in sed_eval and binary accuracy in TensorFlow. My understanding is that 1-frame segment-based accuracy should equal binary accuracy in TensorFlow, since both compute the accuracy per frame. However, the two values are not equal for my model, and I don't know why.

The accuracy I mean is

accuracy = ( TP + TN ) / ( TP + TN + FP + FN )

Binary accuracy in TensorFlow (Keras) is

def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)

The two formulas above should mean the same thing. My output is frame-by-frame, and each frame has num_of_class scalars representing the presence of each class.
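
For what it's worth, on aligned, already-binarized activity matrices the two formulas do agree, which suggests any discrepancy comes from thresholding, segmentation, or frame alignment rather than from the formula itself. A quick numpy sketch on hypothetical data:

    import numpy as np

    y_true = np.random.randint(0, 2, size=(100, 5))        # frames x classes, binary reference
    y_pred = (np.random.rand(100, 5) > 0.5).astype(int)    # already-binarized predictions

    # Keras-style binary accuracy: mean over classes per frame, then mean over frames
    keras_style = np.mean(np.mean(y_true == y_pred, axis=-1))

    # Accuracy from pooled counts: (TP + TN) / (TP + TN + FP + FN)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    pooled = (tp + tn) / (tp + tn + fp + fn)

    print(keras_style, pooled)  # identical for data like this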

Please tell me whether I have misunderstood something. Thanks!

sed_eval.io.load_event_list does not support .csv

Hi Toni,

I found that the sed_eval.io.load_event_list function works well for .txt files, but when I change the suffix to .csv the events are parsed incorrectly. Do you have any ideas? Many thanks!

Qiuqiang
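
A possible workaround until .csv files are handled (a minimal sketch, not the library's documented API) is to parse the comma-separated file by hand into a list of event dictionaries, using field names the evaluators already understand:

    import csv

    def load_csv_event_list(path):
        """Hypothetical helper: read onset, offset, event_label columns from a headerless CSV."""
        events = []
        with open(path, newline='') as f:
            for row in csv.reader(f):
                if not row:
                    continue
                events.append({
                    'event_onset': float(row[0]),
                    'event_offset': float(row[1]),
                    'event_label': row[2].strip(),
                })
        return events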

Unified output types?

There's some inconsistency in the types of the computed metrics: error rate is float, but f-score is numpy.float64. Nsubs is float, but Ntp is numpy.float64. Should they maybe be unified?

F1 metric is NaN when it should be 0

In a particular test, my system failed to produce any event for one of the reference classes (C). As expected, its recall is 0 and its precision is NaN for that class (see below). Yet the F1 score for that class should, IMHO, be 0: yes, 2*P*R / (P+R) is formally 0/0, but at a higher level it is a complete miss, so the denominator is really ε.

In turn, the class-wise average should take class C into account as 0, rather than as NaN as it does currently, and should be the average of A and C instead of just A.

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec    | ER       Del      Ins    | Sens     Spec     Bacc     Acc     
    ------------ | -----   ----- | ------   ------   ------ | ------   ------   ------ | ------   ------   ------   ------  
    A            | 37      30    | 74.6%    83.3%    67.6%  | 0.46     0.32     0.14   | 67.6%    97.2%    82.4%    92.1%   
    B            | 0       0     | nan%     nan%     nan%   | 0.00     0.00     0.00   | 0.0%     100.0%   50.0%    100.0%  
    C            | 33      0     | nan%     nan%     0.0%   | 1.00     1.00     0.00   | 0.0%     100.0%   50.0%    84.7%   
    D            | 0       0     | nan%     nan%     nan%   | 0.00     0.00     0.00   | 0.0%     100.0%   50.0%    100.0%  
  Class-wise average metrics (macro-average)
  ======================================
  F-measure
    F-measure (F1)                  : 74.63 %
    Precision                       : 83.33 %
    Recall                          : 33.78 %
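
For reference, a NaN-safe class-wise F1 along the lines argued above (a sketch of the proposed behaviour, not sed_eval's current code) could look like:

    def f_measure(ntp, nref, nsys):
        """Return 0.0 instead of NaN when a class is a complete miss."""
        precision = ntp / nsys if nsys > 0 else 0.0
        recall = ntp / nref if nref > 0 else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

Classes with neither reference nor system events (B and D above) could still be left out of the macro-average, so that it becomes the average over A and C.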

SegmentBasedMetrics constructor doesn't type check

The SegmentBasedMetrics constructor takes a list of valid event labels, which must be of type list. Passing a numpy.ndarray works during construction, but causes the code to crash when calling evaluate() with a not-so-easy-to-parse error. It would be helpful if the constructor checked the types of its two input arguments (a list and a float > 0) and raised errors if they are incorrect.
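
Until such checks exist, the simplest workaround is to convert to the expected built-in types before constructing the metrics object (the labels below are just an illustration):

    import numpy as np
    import sed_eval

    labels = np.array(['speech', 'dog'])   # e.g. labels coming out of a numpy pipeline

    metrics = sed_eval.sound_event.SegmentBasedMetrics(
        event_label_list=labels.tolist(),  # plain Python list of str
        time_resolution=1.0                # plain float > 0
    )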

Examples in docs for evaluating results directly in python without loading from files

Currently the only example (that I could find) in the documentation for using sed_eval in Python assumes the reference and estimated events are saved to disk as lab files. It would be helpful to have an example showing how to use sed_eval to compare reference/estimate lists that live in memory directly, including the expected data format (EventList?). Thanks!
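
A minimal sketch of what such an example might look like, assuming plain dictionaries with event_onset/event_offset/event_label keys are accepted (the field-name handling in the event matching code suggests they are):

    import sed_eval

    reference = [
        {'event_onset': 0.5, 'event_offset': 2.0, 'event_label': 'speech'},
        {'event_onset': 3.0, 'event_offset': 4.5, 'event_label': 'dog'},
    ]
    estimated = [
        {'event_onset': 0.6, 'event_offset': 2.1, 'event_label': 'speech'},
    ]

    metrics = sed_eval.sound_event.SegmentBasedMetrics(
        event_label_list=['speech', 'dog'],
        time_resolution=1.0,
    )
    metrics.evaluate(reference_event_list=reference, estimated_event_list=estimated)
    print(metrics.results_overall_metrics())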

KeyError: 'file'

Hey there,
I just noticed that during evaluation of scene classification results, the following error happens:

Traceback (most recent call last):
  File "../../../../system/evaluate17.py", line 36, in <module>
    file_pair['estimated_scene_list'])
  File ".local/lib/python3.6/site-packages/sed_eval/scene.py", line 159, in evaluate
    if estimated_item['file'] == reference_item['file']:
KeyError: 'file'

I checked the source of .load() in dcase_util and it seems that the field has been renamed from file to filename. A simple fix would be to just replace the name.

Or did I misunderstand some of the usage of this script?
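
Until the naming is reconciled, one workaround (a hypothetical helper, not part of either library) is to copy the filename field back into file in both scene lists before calling evaluate():

    def restore_file_field(scene_list):
        """Copy 'filename' into the 'file' field expected by sed_eval.scene."""
        for item in scene_list:
            if 'file' not in item and 'filename' in item:
                item['file'] = item['filename']
        return scene_list

    # e.g. reference_scene_list = restore_file_field(reference_scene_list)
    #      estimated_scene_list = restore_file_field(estimated_scene_list)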

dcase_util dependency?

Hey all, thanks for providing this package!

I noticed that there is now a dependence on dcase_util, even though only two pieces of that package are used in sed_eval: MetaDataContainer and FancyStringifier. I understand that these packages are developed together, but I wonder if it's worth refactoring the design a bit to reverse the direction of the dependency? That is, make dcase_util depend on sed_eval instead?

I bring this up because dcase_util itself has a rather heavy dependency chain, while sed_eval's is comparatively light. As far as I can tell, there's nothing in sed_eval that requires any audio or signal processing, but dcase_util brings over a load of otherwise unused dependencies (librosa, youtube-dl, etc.). More to the point, there are many contexts outside of DCASE where sed_eval could be useful, so it would be beneficial to keep the footprint as small as possible.

Graph-based matching for optimal (and correct) reference to estimate event matching

Just found out about this library, great work! (was about to start writing basically the same but decided to have one final look to see if there's anything out there already - glad I found it).

For the event-based metrics the evaluate function matches events by iterating over all reference and estimated events in a nested loop. I have two questions about this:

  1. Doesn't this mean that the same estimated event can be matched against more than one reference event? I'm not sure that's a desired behavior - presumably every reference event should be matched against at most one estimated event and vice versa?
  2. Assuming the goal of (1) is indeed to match every reference event against at most one estimated event (and vice versa), then using a nested loop means the matching is greedy and not necessarily optimal (i.e. it might not find the optimal pairing of matching events), leading to an underestimation of performance. This is exactly the case for note transcription, which is basically the same problem: matching pairs of events based on onset, offset and label (pitch) criteria. To get around this in mir_eval we match notes using bipartite graph matching, which is guaranteed to find the optimal pairing of reference and estimated events; the same approach is used everywhere else in mir_eval where two sets of events need to be matched into pairs (a sketch is given after this list).

Let me know what you think, cheers!
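
For illustration, here is a sketch of optimal one-to-one matching via bipartite graph matching (a hypothetical helper built on scipy, simplified to an onset-and-label criterion; not sed_eval's current code):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_events(reference, estimated, t_collar=0.2):
        """Return (ref_idx, est_idx) pairs with each event used at most once."""
        # Boolean "hit" matrix: True where an estimated event is a valid match.
        hits = np.zeros((len(reference), len(estimated)), dtype=bool)
        for i, ref in enumerate(reference):
            for j, est in enumerate(estimated):
                onset_ok = abs(ref['event_onset'] - est['event_onset']) <= t_collar
                label_ok = ref['event_label'] == est['event_label']
                hits[i, j] = onset_ok and label_ok
        # Maximise the number of matched pairs (minimise negated hits).
        row, col = linear_sum_assignment(-hits.astype(float))
        return [(i, j) for i, j in zip(row, col) if hits[i, j]]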

Process to generate the event list as a CSV formatted text file

Hi @toni-heittola,
I am working on the SED task for the first time and got stuck generating the event list as a CSV formatted text file for the evaluation on the development and evaluation datasets of DCASE 2017 tasks 2 and 3. I can get the test predictions, but I am not sure how to convert these frame-wise predictions into time intervals (onset and offset) for the event list CSV file. Kindly point me to any reference where I can learn this process.

Stay Safe
Best Regards
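
One common approach is to binarize the frame-wise predictions and scan each class's activity for onset/offset transitions. A minimal sketch (hypothetical helper, assuming binary predictions of shape frames x classes and a fixed hop size in seconds):

    import csv
    import numpy as np

    def frames_to_event_csv(predictions, class_labels, hop_seconds, out_path):
        """Write (onset, offset, event_label) rows derived from binary frame activity."""
        predictions = np.asarray(predictions)
        with open(out_path, 'w', newline='') as f:
            writer = csv.writer(f)
            for c, label in enumerate(class_labels):
                # Pad with zeros so events touching the clip edges still produce transitions.
                activity = np.concatenate(([0], predictions[:, c].astype(int), [0]))
                changes = np.diff(activity)
                onsets = np.where(changes == 1)[0]
                offsets = np.where(changes == -1)[0]
                for onset, offset in zip(onsets, offsets):
                    writer.writerow([onset * hop_seconds, offset * hop_seconds, label])

Depending on the exact submission format, a filename column or a tab delimiter may also be required, and some post-processing (e.g. median filtering or a minimum event length) is usually applied before extracting the events.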
