In order to have "the best dataset we can at a time t", we have chosen some constraints

Add strategy for prioritizing annotations to be voted about freesound-datasets HOT 5 CLOSED

mtg commented on June 8, 2024

Add strategy for prioritizing annotations to be voted

from freesound-datasets.

Comments (5)

ffont commented on June 8, 2024

I agree with what you propose.
You probably want to have these scores precomputed and stored in a property of the Annotation model, because otherwise it's probably too complicated to compute all the scores in real time (especially if the score is complex to compute).

I suggest you start by implementing a function which, given an annotation, returns a "priority score". This could be a method of the Annotation class.
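A minimal sketch of what such a method could look like. The `Annotation` class here is a plain stand-in for the Django model, and the field names (`num_votes`, `priority_score`) and the scoring rule are placeholders for illustration, not the platform's actual code:

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Stand-in for the Annotation model; field names are assumptions."""
    sound_id: int
    num_votes: int = 0
    # Stored value, refreshed offline instead of being computed per request.
    priority_score: float = 0.0

    def compute_priority_score(self) -> float:
        # Placeholder rule: more existing votes gives a higher score.
        return float(self.num_votes)

    def update_priority_score(self) -> None:
        # Refresh the stored score, e.g. after a vote is saved or from a
        # periodic job; the caller is responsible for persisting it.
        self.priority_score = self.compute_priority_score()
```

Keeping the stored value separate from the function that computes it means the scoring rule can change later without touching the code that reads `priority_score`.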

xavierfav commented on June 8, 2024

For now, the prioritization is based on votes:
Annotations that have at least one vote are prioritized.
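As a rough illustration of this rule (not the code running on the platform), candidates can be ordered so that those with at least one vote come first. The `(annotation_id, num_votes)` pair format is an assumption made for the example:

```python
def order_for_voting(candidates):
    """Return candidates with already-voted annotations first.

    `candidates` is a list of (annotation_id, num_votes) pairs. The sort is
    stable, so the original order is kept inside each of the two groups.
    """
    # False (has at least one vote) sorts before True (no votes yet).
    return sorted(candidates, key=lambda c: c[1] < 1)
```

For example, `order_for_voting([(1, 0), (2, 3), (3, 0), (4, 1)])` puts annotations 2 and 4 ahead of 1 and 3.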

In order to include the other constraints listed in this post, we need some Freesound metadata that we don't have in the current platform (ratings, number of downloads).

@ffont
Should we use the API to get this data, or should I load it into our model so that we have it in the FSD platform?

Moreover, about the first point: voting on all annotation candidates for a sound (in order to get closer to a "complete" annotation for that sound).
I would say that, as it is now, it is not worth doing: because we have not yet worked on population and on prioritizing leaf nodes, we would prioritize annotations that are not worth voting on (e.g. voting on "dog bark", "dog" and "animal"). We should first work out how to populate once an annotation is considered ground truth.
We have been inspecting ambiguous cases with edufonseca (categories with more than one parent) to see whether it makes sense to distinguish two categories, and whether it makes sense to populate to the different parents.

ffont commented on June 8, 2024

@xavierfav We should use the API to load the data in the FSD platform ;)
There has always been the idea of writing a management command that iterates over all sounds and gets data from Freesound to store in the JSON field of each sound. I'm not sure if something similar was ever implemented (I guess not). I think this is the way to go: have a command that you can run from time to time to re-sync with Freesound.

When implementing the command, I'd iterate over all sounds in groups of N, and then use the API to make a search restricting the results to the IDs of these sounds (you can "OR" sound IDs in the search filter). Then, using the fields param, you decide which information you want returned and stored in the FSD platform. In this way, the number of requests needed is n_sounds/N instead of n_sounds. N could theoretically be set to 150 (max number of search results per page), but the limitation here is the length of the URL (as all the filtered sound IDs will be in the URL). I think N=50 should be fine. Otherwise try lower or higher values.
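The batching described above could be sketched roughly as follows. This only builds the request URLs and makes no network calls. The endpoint and the `num_downloads` / `avg_rating` field names are written from memory of the Freesound APIv2 docs and should be checked there; authentication is left out, and the helper names are made up for the example:

```python
from urllib.parse import urlencode

# Assumed text-search endpoint of Freesound APIv2; verify against the docs.
SEARCH_URL = "https://freesound.org/apiv2/search/text/"


def chunks(ids, n):
    """Yield consecutive groups of at most n IDs."""
    for start in range(0, len(ids), n):
        yield ids[start:start + n]


def build_search_url(sound_ids, fields=("id", "num_downloads", "avg_rating")):
    """Build one search URL restricted to the given sound IDs."""
    # "OR" the IDs together in the search filter, as suggested above.
    id_filter = "id:(" + " OR ".join(str(i) for i in sound_ids) + ")"
    params = {
        "filter": id_filter,
        "fields": ",".join(fields),
        "page_size": len(sound_ids),  # one page holds the whole batch
    }
    return SEARCH_URL + "?" + urlencode(params)


def build_sync_urls(all_ids, n=50):
    """One URL per batch: about n_sounds / n requests instead of n_sounds."""
    return [build_search_url(batch) for batch in chunks(all_ids, n)]
```

With 120 sound IDs and N=50 this produces three URLs (batches of 50, 50 and 20). A real management command would then fetch each URL with an API token and store the returned fields on the matching sounds.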

edufonseca commented on June 8, 2024

In the constraints listed in the first comment, it was suggested to prioritize sounds with length < 30 sec. I think we should specify further in this direction. How about prioritizing (apart from the other aforementioned constraints):

sounds with length < 10s (just as in AudioSet). This will presumably imply having more PP and also shorter sounds that, at this point, may be more useful.
once the above are exhausted, sounds with length < 20s
once the above are exhausted, sounds with length < 30s
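The tiers above could be expressed as a small sort key, for instance as in this sketch. The `(sound_id, duration_sec)` pair format and the function names are assumptions for the example; sounds of 30s or longer simply fall into a last tier:

```python
# Upper limits of the three tiers proposed above, in seconds.
DURATION_TIERS_SEC = (10, 20, 30)


def duration_tier(duration_sec):
    """Return 0 for < 10s, 1 for < 20s, 2 for < 30s, 3 otherwise."""
    for tier, limit in enumerate(DURATION_TIERS_SEC):
        if duration_sec < limit:
            return tier
    return len(DURATION_TIERS_SEC)


def order_by_duration_tier(sounds):
    """Sort (sound_id, duration_sec) pairs so that lower tiers come first.

    The sort is stable, so any earlier ordering (e.g. by votes) is kept
    within each tier.
    """
    return sorted(sounds, key=lambda s: duration_tier(s[1]))
```

Because the sort is stable, this key can be applied on top of the other constraints without undoing their ordering inside a tier.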

xavierfav commented on June 8, 2024

Sounds with length < 10 sec are now prioritized: #70
