Comments (5)

roedoejet commented on August 14, 2024

Are we sure this is an issue, though, and not just a coincidence? Using remove cuts the first 4576 ms out of the audio, so it's possible that the first word really does align at 2.288s in the new audio. What happens if you change the dna method to mute instead? Also, when visualizing the readalong, is it wrong?

joanise commented on August 14, 2024

With remove, soundswallower never sees that range at all, so it can't have aligned anything to that timestamp. It has to have happened in some of the postprocessing we do with the results from soundswallower.
Yes, the readalong is wrong when looking at it.
Good idea to try and see what happens with mute. I'll test that.

roedoejet commented on August 14, 2024

Right. It looks like this is an interaction with the way we're splitting adjoining silence between words:

if not bare:
    # Split adjoining silence/noise between words
    last_end = 0.0
    last_word = dict()
    for word in results["words"]:
        silence = word["start"] - last_end
        midpoint = last_end + silence / 2
        if silence > 0:
            if last_word:
                last_word["end"] = midpoint
            word["start"] = midpoint
        last_word = word
        last_end = word["end"]
    silence = final_end - last_end
    if silence > 0:
        if last_word is not None:
            last_word["end"] += silence / 2

I'm not really sure what the intended functionality should be here. Maybe we should include dna segments as possible last_end values?
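
To make the interaction concrete: last_end starts at 0.0, so a first word whose start is 4.576s gets moved to the midpoint of that leading "silence", which is 2.288s, inside the dna segment. Below is a minimal sketch of one way dna segments could act as boundaries for the split. It assumes dna segments are available as (start, end) pairs in seconds on the same timeline as the word timestamps. The names split_silence and dna_segments are hypothetical, not the project's API, and splitting the remaining silence in half at a dna boundary is an assumption rather than settled behaviour.

```python
def split_silence(words, final_end, dna_segments=()):
    """Split silence between words without extending words into dna segments.

    words: list of dicts with "start" and "end" in seconds (modified in place).
    dna_segments: hypothetical iterable of (start, end) pairs in seconds.
    """
    dna = sorted(dna_segments)
    last_end = 0.0
    last_word = None
    for word in words:
        start = word["start"]
        # dna segments lying entirely inside the gap before this word
        in_gap = [(s, e) for s, e in dna if s >= last_end and e <= start]
        if in_gap:
            left_limit = in_gap[0][0]    # previous word must end before this
            right_limit = in_gap[-1][1]  # this word must start after this
            if last_word is not None and left_limit > last_end:
                last_word["end"] = last_end + (left_limit - last_end) / 2
            if start > right_limit:
                word["start"] = right_limit + (start - right_limit) / 2
        elif start > last_end:
            # No dna segment in the gap: same midpoint split as the original
            midpoint = last_end + (start - last_end) / 2
            if last_word is not None:
                last_word["end"] = midpoint
            word["start"] = midpoint
        last_word = word
        last_end = word["end"]
    if last_word is not None:
        # Trailing silence stops at the next dna segment, if there is one
        limit = min([s for s, _ in dna if s >= last_end] + [final_end])
        if limit > last_end:
            last_word["end"] = last_end + (limit - last_end) / 2
    return words


# Without dna information, the first word is pulled back to 2.288s
print(split_silence([{"start": 4.576, "end": 5.0}], 6.0))
# With the removed range declared as dna, the start stays at 4.576s
print(split_silence([{"start": 4.576, "end": 5.0}], 6.0, [(0.0, 4.576)]))
```

In this sketch the same in_gap check also keeps the end of the previous word from being pushed into a dna segment that follows it.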

roedoejet commented on August 14, 2024

This is a possible fix: 752553f

joanise commented on August 14, 2024

752553f: just reading the code, I think it should fix the case where the silence goes back into the previous dna segment, so that probably works. I don't have a test case so I cannot check right now, but will it also avoid pushing the silence at the end of a word into a dna segment that follows it?
I'll have to test this fix anyway, but I'm not ready to do that right now, although maybe Marc would be able to.
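
For whoever picks up the testing, a check along these lines might cover both directions (a word start pulled back into a preceding dna segment, and a word end pushed into a following one). It is only a sketch: words_overlapping_dna is a hypothetical helper, and it assumes words are dicts with "start" and "end" in seconds and dna segments are (start, end) pairs on the same timeline.

```python
def words_overlapping_dna(words, dna_segments):
    """Return the words whose (start, end) interval intersects a dna segment.

    Touching a segment boundary exactly does not count as an overlap.
    """
    return [
        word
        for word in words
        if any(
            word["start"] < seg_end and word["end"] > seg_start
            for seg_start, seg_end in dna_segments
        )
    ]


# The reported case: first word starting at 2.288s, dna segment 0 to 4.576s
bad = words_overlapping_dna([{"start": 2.288, "end": 5.0}], [(0.0, 4.576)])
print(len(bad))  # 1 word reaches into the dna segment

# A word whose end was pushed into a dna segment that follows it
bad = words_overlapping_dna([{"start": 0.25, "end": 2.0}], [(1.5, 2.5)])
print(len(bad))  # 1
```

An empty result from this helper after alignment would suggest that neither side of the silence split lands inside a dna segment.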
