
Comments (8)

palonso avatar palonso commented on June 16, 2024 1

From StartStopSilence's doc:

Note: In standard mode the algorithm is to be run iteratively on a sequence of frames. The outputs are updated on each iteration, and the final result is produced at the end of the sequence.

For example:

from essentia.standard import MonoLoader, FrameGenerator, StartStopSilence

audio = MonoLoader(filename="your_audio.mp3")()
startStopSilence = StartStopSilence()

for frame in FrameGenerator(audio):
    start, stop = startStopSilence(frame)

print("start:", start, "stop:", stop)
start: 81 stop: 11061

Note that the outputs are frame indices, not seconds.

For FadeDetection, the algorithm expects a vector of RMS values, computed with a frame rate of 4 frames per second by default:

from essentia.standard import MonoLoader, FrameGenerator, FadeDetection, RMS

audio = MonoLoader(filename="rock.mp3")()
rms = RMS()
fadeDetection = FadeDetection()

rms_values = [rms(frame) for frame in FrameGenerator(audio, frameSize=11025, hopSize=11025)]
fade_ins, fade_outs = fadeDetection(rms_values)

print("fade-ins:", fade_ins, "fade-outs:", fade_outs)
fade-ins: [] fade-outs: [[122. 129.]

Note that in my example, the algorithm did not detect any fade-ins.

The output matrices are:

  • fadeIn (matrix_real) - 2D-array containing start/stop timestamps corresponding to fade-ins [s] (ordered chronologically)
  • fadeOut (matrix_real) - 2D-array containing start/stop timestamps corresponding to fade-outs [s] (ordered chronologically)

from essentia.

Galvo87 avatar Galvo87 commented on June 16, 2024

Any update on this one please?


palonso avatar palonso commented on June 16, 2024

@Galvo87,
note that StartStopSilence expects to process a sequence of frames, not the full audio vector.
Check this Python example


Galvo87 avatar Galvo87 commented on June 16, 2024

Thanks, so I cannot work directly with Loaders, but must generate frames with FrameGenerator or similar?
I suppose I must then feed each frame (vector_real), i.e. the actual input audio frames, to StartStopSilence... what is the best approach?


palonso avatar palonso commented on June 16, 2024

After checking StartStopCut's doc, I realized your result could be correct.
An output of (0, 0) means that there are no cuts at either the start or the end of the audio (i.e., there is no audio signal in the first 10 ms or the last 10 ms of audio). This means that your audio has a healthy silence margin.

You can check that the algorithm works correctly by trying the opposite case, for example with a continuous noise signal:

import numpy as np
from essentia.standard import StartStopCut

audio = np.random.randn(44100).astype(np.float32)  # one second of noise, as float32 for Essentia
print(StartStopCut()(audio))
>>> (1,1)

Note that the output should be interpreted as a pair of flags, each indicating whether the audio contains a cut (at the start and at the end, respectively).


Galvo87 avatar Galvo87 commented on June 16, 2024

Ok got it, thanks.
What about algos like StartStopSilence that expect frame (vector_real) as input?
Is a FrameGenerator necessary in that case?
Also algos like FadeDetection, which expect an array of RMS values...


Galvo87 avatar Galvo87 commented on June 16, 2024

Thank you, understood.
Is there any handy Essentia method for converting frames to actual audio timestamps?

Also, is it possible for StartStopSilence to have ms precision?


palonso avatar palonso commented on June 16, 2024

Frames to seconds: frame_index * hop_size / sample_rate
Default hop_size: 512
Default sample_rate: 44100

StartStopSilence's resolution depends on the analysis hop size.
By default, it is 1000 * 512 / 44100 = 11.6 ms. You can increase the time resolution by decreasing the analysis hop size.
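The conversion above can be wrapped in a small helper. This is a minimal sketch: `frames_to_seconds` and `resolution_ms` are hypothetical names (not Essentia functions), and the defaults simply mirror the hop size and sample rate quoted above. Whether a frame index refers to the start or the center of a frame depends on how the frames were generated, so treat the results as approximate to within one hop.

```python
def frames_to_seconds(frame_index, hop_size=512, sample_rate=44100):
    """Convert a frame index to a time in seconds (hypothetical helper)."""
    return frame_index * hop_size / sample_rate


def resolution_ms(hop_size=512, sample_rate=44100):
    """Time step between consecutive frames, in milliseconds (hypothetical helper)."""
    return 1000 * hop_size / sample_rate


# Frame indices taken from the StartStopSilence example earlier in the thread.
start_s = frames_to_seconds(81)
stop_s = frames_to_seconds(11061)
print(f"start: {start_s:.2f} s, stop: {stop_s:.2f} s")

# Default resolution, and the finer resolution obtained with a smaller hop size.
print(f"hop 512: {resolution_ms():.1f} ms")
print(f"hop 128: {resolution_ms(hop_size=128):.1f} ms")
```

Remember to pass the same `hopSize` to FrameGenerator as the one used in the conversion.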

