Marc found a situation where silence splitting made the first word span half the DNA r

DNA audio and silence splitting interaction (studio issue, open)

readalongs commented on August 14, 2024

DNA audio and silence splitting interaction

from studio.

Comments (5)

roedoejet commented on August 14, 2024

Are we sure this is an issue though and not just a coincidence? Using remove will remove the first 4576 ms of the audio, and so it's possible that the first word does align at 2.288s in the new audio. What happens if you change the dna method to mute instead? Also, when visualizing the readalong, is it wrong?

joanise commented on August 14, 2024

Since soundswallower would not have seen that range at all with remove, the word can't have been aligned to that timestamp; it has to have happened in some of the postprocessing we do on the results from soundswallower.
Yes, the readalong is wrong when looking at it.
Good idea to try and see what happens with mute, I'll test that.

roedoejet commented on August 14, 2024

Right. It looks like this is an interaction with the way we're adjoining silence between words:

if not bare:
        # Split adjoining silence/noise between words
        last_end = 0.0
        last_word = dict()
        for word in results["words"]:
            silence = word["start"] - last_end
            midpoint = last_end + silence / 2
            if silence > 0:
                if last_word:
                    last_word["end"] = midpoint
                word["start"] = midpoint
            last_word = word
            last_end = word["end"]
        silence = final_end - last_end
        if silence > 0:
            if last_word is not None:
                last_word["end"] += silence / 2

I'm not really sure what the intended functionality should be here. Maybe we should include dna segments as possible last_end values?
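
To make the interaction concrete, here is a minimal sketch that restates the midpoint logic above on plain dictionaries. It is not the studio code itself: the function name `split_silence`, the word timings, and `final_end` are made up for illustration. It assumes a DNA segment covering 0 to 4.576 s (the figure mentioned earlier in this thread) and that, after `remove`, the aligned timestamps were shifted back so the first word starts right at 4.576 s.

```python
def split_silence(words, final_end):
    """Split the silence between consecutive words at its midpoint.

    Hypothetical stand-alone restatement of the snippet quoted above;
    `words` is a list of {"start": float, "end": float} dicts in seconds.
    """
    last_end = 0.0
    last_word = None
    for word in words:
        silence = word["start"] - last_end
        midpoint = last_end + silence / 2
        if silence > 0:
            if last_word is not None:
                last_word["end"] = midpoint
            word["start"] = midpoint
        last_word = word
        last_end = word["end"]
    silence = final_end - last_end
    if silence > 0 and last_word is not None:
        last_word["end"] += silence / 2
    return words


# Assumed scenario: the DNA segment is 0.0-4.576 s, so the first aligned
# word cannot start before 4.576 s in the original audio timeline.
words = [{"start": 4.576, "end": 5.0}, {"start": 5.4, "end": 6.0}]
split_silence(words, final_end=6.5)
print(words[0]["start"])  # 2.288, i.e. halfway into the DNA segment
```

Because `last_end` starts at 0.0, the whole DNA segment is treated as ordinary leading silence and the first word is pulled back to its midpoint, which would explain a start time of 2.288 s.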

roedoejet commented on August 14, 2024

This is a possible fix: 752553f

joanise commented on August 14, 2024

752553f: just reading the code, I think it should fix the case where the silence goes back into the previous dna segment, so that probably works. I don't have a test case so I cannot check right now, but will it also avoid pushing the silence at the end of a word into a dna segment that follows it?
I'll have to test this fix anyway, but I'm not ready to do that right now, although maybe Marc would be able to.
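
For reference while testing, one way to cover both directions (silence reaching back into a preceding DNA segment, and silence pushed forward into a following one) could look like the sketch below. This is not what 752553f does; it is a hypothetical illustration. The function name `split_silence_dna`, the `dna_segments` parameter (a list of `(start, end)` tuples in seconds), and the sample timings are all assumptions. The design choice here is that when a gap contains DNA audio, the previous word ends where the DNA starts and the next word starts where the DNA ends; otherwise the gap is split at its midpoint as before.

```python
def split_silence_dna(words, final_end, dna_segments):
    """Split silence between words without extending words into DNA segments.

    Hypothetical sketch: `words` is a sorted list of
    {"start": float, "end": float} dicts, `dna_segments` a list of
    (start, end) tuples, all in seconds on the original audio timeline.
    """

    def gap_bounds(gap_start, gap_end):
        # Return (new end for the previous word, new start for the next word).
        inside = [
            (max(s, gap_start), min(e, gap_end))
            for s, e in dna_segments
            if s < gap_end and e > gap_start
        ]
        if not inside:
            mid = gap_start + (gap_end - gap_start) / 2
            return mid, mid
        # Stop at the first DNA start; resume at the last DNA end.
        return min(s for s, _ in inside), max(e for _, e in inside)

    last_end = 0.0
    last_word = None
    for word in words:
        if word["start"] > last_end:
            new_end, new_start = gap_bounds(last_end, word["start"])
            if last_word is not None:
                last_word["end"] = new_end
            word["start"] = new_start
        last_word = word
        last_end = word["end"]
    if last_word is not None and final_end > last_end:
        last_word["end"], _ = gap_bounds(last_end, final_end)
    return words


# Made-up test case: one DNA segment before the first word, one between words.
words = [{"start": 4.576, "end": 5.0}, {"start": 7.0, "end": 7.5}]
dna = [(0.0, 4.576), (5.5, 6.5)]
split_silence_dna(words, final_end=8.0, dna_segments=dna)
print(words)
```

With these inputs the first word keeps its 4.576 s start, its end stops at 5.5 s where the second DNA segment begins, and the second word starts at 6.5 s where that segment ends.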
