tut-arg / sed_eval
Evaluation toolbox for Sound Event Detection
Home Page: http://tut-arg.github.io/sed_eval
License: MIT License
The documentation for EventBasedMetrics is misleading.
It says:
t_collar : float (0,]
Time collar used when evaluating validity of the onset and offset, in seconds. Default value 0.2
percentage_of_length : float in [0, 1]
Second condition, percentage of the length within which the estimated offset has to be in order to be consider valid estimation. Default value 0.5
The documentation for percentage_of_length suggests that it is an "AND" condition, not an "OR" condition. However, the code indicates that the max is being used, which is also what mir_eval does:
# Detect field naming style used and validate offset
if 'event_offset' in reference_event and 'event_offset' in estimated_event:
    annotated_length = reference_event['event_offset'] - reference_event['event_onset']
    return math.fabs(reference_event['event_offset'] - estimated_event['event_offset']) <= max(t_collar, percentage_of_length * annotated_length)
elif 'offset' in reference_event and 'offset' in estimated_event:
    annotated_length = reference_event['offset'] - reference_event['onset']
    return math.fabs(reference_event['offset'] - estimated_event['offset']) <= max(t_collar, percentage_of_length * annotated_length)
I would suggest adapting the documentation for EventBasedMetrics based upon the mir_eval documentation for offset_ratio and offset_min_tolerance to make the sed_eval documentation clearer.
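To make the max() behaviour concrete, here is a minimal standalone sketch of the offset check quoted above (the function name and the numbers are made up for illustration):

import math

def offset_is_valid(ref_onset, ref_offset, est_offset,
                    t_collar=0.2, percentage_of_length=0.5):
    # The tolerance is the larger ("OR") of the fixed collar and the
    # length-relative tolerance, mirroring the max() in the source above.
    annotated_length = ref_offset - ref_onset
    return math.fabs(ref_offset - est_offset) <= max(
        t_collar, percentage_of_length * annotated_length)

# For a 2.0 s reference event the tolerance is max(0.2, 0.5 * 2.0) = 1.0 s,
# so an offset estimated 0.8 s late still counts as valid.
print(offset_is_valid(0.0, 2.0, 2.8))  # True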
Hi! First, thanks for this nice toolbox, very helpful! I was just wondering whether you have implemented a confusion matrix somewhere? I would be particularly interested in having one for the event-based metrics.
Thanks,
Dorian
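Not the maintainer, but in case it helps: a rough sketch of an event-based confusion count could pair each reference event with an estimated event whose onset falls within a collar and tally the label pairs. Everything here, including the 'none' bucket for misses, is a made-up illustration, not sed_eval API:

def event_confusions(reference_events, estimated_events, t_collar=0.2):
    # Counts (reference label, estimated label) pairs; unmatched
    # references are tallied against the pseudo-label 'none'.
    counts = {}
    used = set()
    for ref in reference_events:
        match = 'none'
        for j, est in enumerate(estimated_events):
            if j in used:
                continue
            if abs(ref['event_onset'] - est['event_onset']) <= t_collar:
                match = est['event_label']
                used.add(j)
                break
        key = (ref['event_label'], match)
        counts[key] = counts.get(key, 0) + 1
    return counts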
sed_eval/sed_eval/sound_event.py, line 92 (commit e8b9690)
The reset() function initializes all internal state, so isn't it natural to set these variables to zero too? In the SegmentBasedMetrics class:
def reset(self):
    """Reset internal state
    """
    self.evaluated_length_seconds = 0
    self.evaluated_files = 0
    self.overall = {
        'Ntp': 0.0,
        'Ntn': 0.0,
        'Nfp': 0.0,
        'Nfn': 0.0,
        'Nref': 0.0,
        'Nsys': 0.0,
        'ER': 0.0,
        'S': 0.0,
        'D': 0.0,
        'I': 0.0,
    }
    self.class_wise = {}
    for class_label in self.event_label_list:
        self.class_wise[class_label] = {
            'Ntp': 0.0,
            'Ntn': 0.0,
            'Nfp': 0.0,
            'Nfn': 0.0,
            'Nref': 0.0,
            'Nsys': 0.0,
        }
    return self
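The practical consequence shows up when one metrics object is reused across evaluation runs. A minimal sketch (event data and labels are made up; if plain dicts are not accepted by your version, they may need wrapping in a dcase_util MetaDataContainer):

import sed_eval

ref = [{'event_onset': 0.0, 'event_offset': 1.0, 'event_label': 'speech'}]
est = [{'event_onset': 0.2, 'event_offset': 1.1, 'event_label': 'speech'}]

metrics = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=['speech'], time_resolution=1.0)
metrics.evaluate(reference_event_list=ref, estimated_event_list=est)
print(metrics.results_overall_metrics())

# Without also zeroing evaluated_length_seconds and evaluated_files,
# a second evaluation after reset() would silently accumulate state
# from the first run.
metrics.reset()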
Thank you for your hard work!
Great library!
I have used this library for a while, but I still can't tell the difference between the segment-based accuracy used in SED (with segment length set to 1 frame) and binary accuracy in TensorFlow. My understanding is that 1-frame segment-based accuracy is equal to binary accuracy in TensorFlow, since both calculate the accuracy for each frame. However, the two values are not equal in my model, and I just don't know why.
The accuracy I mean is:
accuracy = (TP + TN) / (TP + TN + FP + FN)
Binary accuracy in TensorFlow is:
def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
The two formulas above actually mean the same thing. My output is frame by frame, and each frame has num_of_class scalars representing the presence of each class.
Please tell me whether I have misunderstood it. Thanks!
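For what it's worth, for hard 0/1 targets and a single class the two definitions do coincide; a small self-contained check with synthetic data:

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0], dtype=float)
y_pred = np.array([0.9, 0.1, 0.2, 0.8, 0.7, 0.3])

# Pooled counts, as in the segment-based definition above.
y_hat = np.round(y_pred)
tp = np.sum((y_true == 1) & (y_hat == 1))
tn = np.sum((y_true == 0) & (y_hat == 0))
fp = np.sum((y_true == 0) & (y_hat == 1))
fn = np.sum((y_true == 1) & (y_hat == 0))
pooled = (tp + tn) / (tp + tn + fp + fn)

# Frame-wise mean, as in the Keras binary_accuracy above.
framewise = np.mean(y_true == y_hat)

print(pooled, framewise)  # identical: 0.666...

So if the numbers differ, my guess is that the discrepancy comes from how the segments are formed: sed_eval derives segment activity from onset/offset annotations at its own time resolution, which need not line up with the model's frame hop, and rounding at event boundaries and the last partial segment can also shift the counts.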
Hi Toni,
I found that the sed_eval.io.load_event_list function works well for .txt files, but when I change the suffix to .csv, the events are parsed incorrectly. Do you have any ideas? Many thanks!
Qiuqiang
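Not sure what the loader does internally, but one guess is that parsing is keyed off the file extension, so a tab-separated file named .csv gets split on commas. A workaround that sidesteps extension-based parsing is to read the file yourself and build the event list manually (the file name and column order here are hypothetical; set the real delimiter for your data):

import csv

reference_event_list = []
with open('reference.csv', 'r') as f:
    for row in csv.reader(f, delimiter='\t'):
        reference_event_list.append({
            'event_onset': float(row[0]),
            'event_offset': float(row[1]),
            'event_label': row[2],
        })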
There's some inconsistency in the types of the computed metrics: error rate is float, but f-score is numpy.float64. Nsubs is float, but Ntp is numpy.float64. Should they maybe be unified?
In a particular test, my system failed to produce any events for one of the reference classes (C). As expected, its recall is 0 and its precision is NaN on that particular class (see below). Yet the F1 score on that class should, IMHO, be 0: yes, 2·P·R / (P + R) is 0/0, but at a higher level it's a complete miss, so the denominator is effectively ε.
In turn, the class-wise average should count class C as 0 rather than NaN as it currently does, and should be the average of A and C instead of just A.
Class-wise metrics
======================================
Event label | Nref Nsys | F Pre Rec | ER Del Ins | Sens Spec Bacc Acc
------------ | ----- ----- | ------ ------ ------ | ------ ------ ------ | ------ ------ ------ ------
A | 37 30 | 74.6% 83.3% 67.6% | 0.46 0.32 0.14 | 67.6% 97.2% 82.4% 92.1%
B | 0 0 | nan% nan% nan% | 0.00 0.00 0.00 | 0.0% 100.0% 50.0% 100.0%
C | 33 0 | nan% nan% 0.0% | 1.00 1.00 0.00 | 0.0% 100.0% 50.0% 84.7%
D | 0 0 | nan% nan% nan% | 0.00 0.00 0.00 | 0.0% 100.0% 50.0% 100.0%
Class-wise average metrics (macro-average)
======================================
F-measure
F-measure (F1) : 74.63 %
Precision : 83.33 %
Recall : 33.78 %
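One way to get the behaviour asked for above (a sketch of the proposed fix, not the library's current code) is to treat a zero denominator as a zero F-score, similar to scikit-learn's zero_division=0:

def f_measure(Ntp, Nref, Nsys):
    precision = Ntp / Nsys if Nsys > 0 else 0.0
    recall = Ntp / Nref if Nref > 0 else 0.0
    if precision + recall == 0.0:
        return 0.0  # a complete miss counts as 0, not NaN
    return 2.0 * precision * recall / (precision + recall)

# Class C from the table above: Ntp=0, Nref=33, Nsys=0 gives F = 0.0,
# which would then pull the macro-average down instead of being skipped.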
The SegmentBasedMetrics constructor takes a list of valid event labels, which must be of type list. Giving it a numpy.ndarray will work during construction, but will cause the code to crash when calling evaluate(), with a not-so-easy-to-parse error. It would be helpful if the constructor checked the types of its two input args (list and float > 0) and raised errors if they are incorrect.
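A sketch of the kind of validation being suggested (the parameter names follow the documented constructor; this is illustrative, not the library's code):

class SegmentBasedMetricsSketch:
    # Hypothetical subset of the real constructor, showing the checks only.
    def __init__(self, event_label_list, time_resolution=1.0):
        if not isinstance(event_label_list, list):
            raise TypeError('event_label_list must be a list, got {}'.format(
                type(event_label_list).__name__))
        if not isinstance(time_resolution, (int, float)) or time_resolution <= 0:
            raise ValueError('time_resolution must be a number > 0')
        self.event_label_list = event_label_list
        self.time_resolution = time_resolution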
Currently the only example (I could find) in the documentation for using sed_eval in Python assumes the reference and estimated events are saved to disk as lab files. It would be helpful to have an example showing how to use sed_eval to compare ref/est that live in memory directly, including the expected data format (EventList?). Thanks!
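For anyone landing here, a sketch of what such an example might look like, using plain lists of dicts with the event_onset/event_offset/event_label fields that appear in the source (labels and times are made up; if plain lists are not accepted by your version, wrapping them in dcase_util.containers.MetaDataContainer should give the expected container type):

import sed_eval

reference_event_list = [
    {'event_onset': 0.0, 'event_offset': 2.0, 'event_label': 'speech'},
    {'event_onset': 3.0, 'event_offset': 5.0, 'event_label': 'car'},
]
estimated_event_list = [
    {'event_onset': 0.1, 'event_offset': 2.2, 'event_label': 'speech'},
]

metrics = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=['speech', 'car'],
    time_resolution=1.0,
)
metrics.evaluate(
    reference_event_list=reference_event_list,
    estimated_event_list=estimated_event_list,
)
print(metrics.results_overall_metrics())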
Hey there,
I just noticed an error: during evaluation for scene detection, the following happens:
Traceback (most recent call last):
  File "../../../../system/evaluate17.py", line 36, in <module>
    file_pair['estimated_scene_list'])
  File ".local/lib/python3.6/site-packages/sed_eval/scene.py", line 159, in evaluate
    if estimated_item['file'] == reference_item['file']:
KeyError: 'file'
I checked the source for .load() in dcase_util and it seems that the field has been renamed to filename rather than file. A simple fix would be to just replace the name. Or did I misunderstand some of the usage of this script?
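Until that lands, one possible stopgap (a sketch; reference_scene_list and estimated_scene_list stand for whatever you pass to evaluate) is to mirror the dcase_util field into the one sed_eval expects:

# Copy the dcase_util 'filename' field into the 'file' field that
# sed_eval.scene currently looks up.
for item in list(reference_scene_list) + list(estimated_scene_list):
    if 'filename' in item and 'file' not in item:
        item['file'] = item['filename']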
Hey all, thanks for providing this package!
I noticed that there is now a dependency on dcase_util, even though only two pieces of that package are used in sed_eval: MetaDataContainer and FancyStringifier. I understand that these packages are developed together, but I wonder if it's worth refactoring the design a bit to reverse the direction of the dependency? That is, make dcase_util depend on sed_eval instead?
I bring this up because dcase_util itself has a rather heavy dependency chain, while sed_eval's is comparatively light. As far as I can tell, there's nothing in sed_eval that requires any audio or signal processing, but dcase_util pulls in a load of otherwise unused dependencies (librosa, youtube-dl, etc.). More to the point, there are many contexts outside of DCASE where sed_eval could be useful, so it would be beneficial to keep the footprint as small as possible.
Just found out about this library, great work! (was about to start writing basically the same but decided to have one final look to see if there's anything out there already - glad I found it).
For the event-based metrics the evaluate function matches events by iterating over all reference and estimated events in a nested loop. I have two questions about this: can this greedy, order-dependent pairing miss the best set of matches, and if so, would you consider an optimal matcher? In mir_eval we match notes using bipartite graph matching, which guarantees finding the optimal pairing of reference and estimated events; the same approach is also used everywhere else in mir_eval where two sets of events need to be matched into pairs. Let me know what you think, cheers!
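For reference, a sketch of optimal pairing in the spirit described above, using SciPy's assignment solver on a 0/1 "valid pair" matrix (the onset-only tolerance check and the event format are simplifying assumptions):

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_events(ref_onsets, est_onsets, t_collar=0.2):
    ref = np.asarray(ref_onsets)
    est = np.asarray(est_onsets)
    # valid[i, j] == 1 when reference i and estimate j may be paired.
    valid = (np.abs(ref[:, None] - est[None, :]) <= t_collar).astype(float)
    # Maximising total 0/1 weight yields a maximum-cardinality matching.
    rows, cols = linear_sum_assignment(valid, maximize=True)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if valid[i, j] > 0]

# A greedy first-match pairing would link reference 0 to estimate 0 and
# leave reference 1 unmatched; the optimal matching pairs both.
print(match_events([0.0, 0.15], [0.15, -0.1]))  # [(0, 1), (1, 0)]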
Hi @toni-heittola,
I am working on the SED task for the first time and got stuck generating the event list as a CSV-formatted text file for evaluation on the development and evaluation datasets of DCASE 2017 tasks 2 and 3. I can get the test predictions, but I am not sure how to convert these frame-wise predictions into time intervals (onset and offset times) for the CSV-formatted event list. Kindly point me to any reference where I can learn this process.
Stay Safe
Best Regards
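Not the maintainer, but the usual recipe is to threshold the frame-wise class probabilities (often after median filtering) and then merge runs of consecutive active frames into events. A self-contained sketch (hop size, threshold, and label are made up):

import numpy as np

def frames_to_events(activity, label, hop_seconds=0.02):
    # Turn a binary per-frame activity vector into
    # (onset, offset, label) tuples by merging runs of active frames.
    events = []
    onset = None
    for i, active in enumerate(np.asarray(activity) > 0):
        if active and onset is None:
            onset = i * hop_seconds                 # a run starts
        elif not active and onset is not None:
            events.append((onset, i * hop_seconds, label))
            onset = None
    if onset is not None:                           # run still open at the end
        events.append((onset, len(activity) * hop_seconds, label))
    return events

print(frames_to_events([0, 1, 1, 1, 0, 0, 1, 1], 'speech'))
# [(0.02, 0.08, 'speech'), (0.12, 0.16, 'speech')] (modulo float rounding)

The resulting tuples can then be written out with the csv module using the delimiter the evaluation expects.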