twardoch / audiostretchy

AudioStretchy is a Python wrapper around the `audio-stretch` C library, which performs fast, high-quality time-stretching of WAV/MP3 files without changing their pitch. Works well for speech, can time-stretch silence separately.

Home Page: https://pypi.org/project/audiostretchy/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
audio-speed audio-stretching time-domain-harmonic-scaling wav-audio

audiostretchy's Introduction

AudioStretchy

AudioStretchy is a Python library and CLI tool that performs fast, high-quality time-stretching of WAV/MP3 files without changing their pitch. It works well for speech and can time-stretch silence separately. The library is a wrapper around David Bryant’s audio-stretch C library.

Version: 1.3.5

Features

  • Fast, high-quality time stretching of audio files without changing their pitch
  • Adjustable stretching ratio from 0.25 to 4.0
  • Cross-platform: Windows, macOS, and Linux
  • Supports WAV files and file-like objects. With [all] installation, also supports MP3 files and file-like objects
  • With [all] installation, also supports resampling

Time-domain harmonic scaling (TDHS) is a method for time-scale modification of speech (or other audio signals), allowing the apparent rate of speech articulation to be changed without affecting the pitch-contour and the time-evolution of the formant structure. TDHS differs from other time-scale modification algorithms in that time-scaling operations are performed in the time domain (not the frequency domain).
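
To make the time-domain idea concrete, here is a deliberately simplified sketch. It is not the audio-stretch implementation: it only shows how one pitch period can be removed from a periodic signal by cross-fading two adjacent periods into one. The function name `merge_two_periods`, the linear fade, and the fixed, known period are assumptions made for illustration; a real TDHS implementation has to detect the period itself.

```python
import math


def merge_two_periods(samples, start, period):
    """Shorten `samples` by one period using a linear cross-fade.

    The two adjacent periods beginning at `start` are blended into a single
    period: the first fades out while the second fades in.
    """
    first = samples[start:start + period]
    second = samples[start + period:start + 2 * period]
    if len(second) < period:
        raise ValueError("need two full periods after `start`")
    blended = [
        a * (1 - i / period) + b * (i / period)
        for i, (a, b) in enumerate(zip(first, second))
    ]
    return samples[:start] + blended + samples[start + 2 * period:]


# A 100 Hz sine sampled at 8 kHz has a period of exactly 80 samples.
period = 80
tone = [math.sin(2 * math.pi * n / period) for n in range(800)]
shorter = merge_two_periods(tone, start=160, period=period)
print(len(tone), len(shorter))  # 800 720
```

Because whole periods are removed, the shortened tone keeps the same frequency, which is the sense in which the pitch is preserved while the duration changes.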

The core functionality of this package is provided by David Bryant’s excellent audio-stretch C library that performs fast, high-quality TDHS on WAV in the ratio range of 0.25 (4× faster) to 4.0 (4× slower).

The library gives very good results with speech recordings, especially with modest stretching at ratios between 0.9 (10% shorter) and 1.1 (10% longer). AudioStretchy is a Python wrapper around that library. The Python package also offers some additional, optional functionality: it supports MP3 (in addition to WAV) and lets you perform resampling.

Demo

Below are links to a short audio file (as WAV and MP3), with the same file stretched at 1.2 (20% slower):

Input Stretched
audio.wav audio-1.2.wav
audio.mp3 audio-1.2.mp3

Installation

Full installation

To be able to stretch and resample both WAV and MP3 files, install AudioStretchy using pip like so:

python3 -m pip install audiostretchy[all]

This installs the package and the pre-compiled audio-stretch libraries for macOS, Windows and Linux.

This also installs optional dependencies:

  • for MP3 support: pydub on macOS, pymp3 on Linux and Windows
  • for resampling: soxr

On macOS, you also need to install Homebrew and then run the following in Terminal:

brew install ffmpeg

Minimal installation

To only be able to stretch WAV files (no resampling, no MP3 support), install AudioStretchy with minimal dependencies like so:

python3 -m pip install audiostretchy

This only installs the package and the pre-compiled audio-stretch libraries for macOS, Windows and Linux.

Full development installation

To install the development version, use:

python3 -m pip install git+https://github.com/twardoch/audiostretchy#egg=audiostretchy[all]

Usage

CLI

audiostretchy INPUT_PATH OUTPUT_PATH <flags>

POSITIONAL ARGUMENTS
    INPUT_PATH
        The path to the input WAV or MP3 audio file.
    OUTPUT_PATH
        The path to save the stretched WAV or MP3 audio file.

FLAGS
    -r, --ratio=RATIO
        The stretch ratio, where values greater than 1.0 will extend the audio and 
        values less than 1.0 will shorten the audio. From 0.5 to 2.0, or with `-d` 
        from 0.25 to 4.0. Default is 1.0 = no stretching.
    -g, --gap_ratio=GAP_RATIO
        The stretch ratio for gaps (silence) in the audio. 
        Default is 0.0 = uses ratio.
    -u, --upper_freq=UPPER_FREQ
        The upper frequency limit for period detection in Hz. Default is 333 Hz.
    -l, --lower_freq=LOWER_FREQ
        The lower frequency limit. Default is 55 Hz.
    -b, --buffer_ms=BUFFER_MS
        The buffer size in milliseconds for processing the audio in chunks 
        (useful with `-g`). Default is 25 ms.
    -t, --threshold_gap_db=THRESHOLD_GAP_DB
        The threshold level in dB to determine if a section of audio is considered 
        a gap (for `-g`). Default is -40 dB.
    -d, --double_range=DOUBLE_RANGE
        If set, doubles the min/max range of stretching.
    -f, --fast_detection=FAST_DETECTION
        If set, enables fast period detection, which may speed up processing but 
        reduce the quality of the stretched audio.
    -n, --normal_detection=NORMAL_DETECTION
        If set, forces the algorithm to use normal period detection instead 
        of fast period detection.
    -s, --sample_rate=SAMPLE_RATE
        The target sample rate for resampling the stretched audio in Hz (if installed 
        with `[all]`). Default is 0 = use sample rate of the input audio.
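
The `-g`, `-b` and `-t` flags suggest that each buffer is classified as signal or gap by its level. The sketch below shows one way such a classification could work, assuming 16-bit samples and an RMS level expressed in dB relative to full scale. `level_dbfs` and `is_gap` are hypothetical helpers, not part of AudioStretchy, and the library's actual detection may differ.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the largest 16-bit sample


def level_dbfs(samples):
    """Return the RMS level of 16-bit samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / FULL_SCALE)


def is_gap(samples, threshold_gap_db=-40.0):
    """Treat a buffer as a gap when its level is below the threshold."""
    return level_dbfs(samples) < threshold_gap_db


quiet = [100] * 200    # about -50 dBFS
loud = [8000] * 200    # about -12 dBFS
print(is_gap(quiet), is_gap(loud))  # True False
```

Under this reading, raising the threshold (for example to -30 dB) would classify more of the recording as gap, so more of it would be stretched with the gap ratio.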

Python

from audiostretchy.stretch import stretch_audio

stretch_audio("input.wav", "output.wav", ratio=1.1)

In this example, the input.wav file will be time-stretched by a factor of 1.1, meaning it will be 10% longer, and the result will be saved in the output.wav file.
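
The arithmetic behind the ratio can be sketched as follows. The limits come from the CLI help above (0.5 to 2.0, or 0.25 to 4.0 with `double_range`); `check_ratio` and `expected_duration` are hypothetical helpers, and this README does not state whether the library itself raises, clamps, or otherwise handles an out-of-range ratio.

```python
def check_ratio(ratio, double_range=False):
    """Validate `ratio` against the documented limits and return it."""
    low, high = (0.25, 4.0) if double_range else (0.5, 2.0)
    if not low <= ratio <= high:
        raise ValueError(f"ratio {ratio} is outside {low}..{high}")
    return ratio


def expected_duration(input_seconds, ratio, double_range=False):
    """Estimate the output length: ratios above 1.0 lengthen the audio."""
    return input_seconds * check_ratio(ratio, double_range)


print(round(expected_duration(10.0, 1.1), 6))  # 11.0
```

Checking the ratio before calling `stretch_audio` makes it explicit when a value such as 3.0 needs `double_range=True`.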

For advanced usage, you can use the AudioStretch class that lets you open and save files provided as paths or as file-like BytesIO objects:

from audiostretchy.stretch import AudioStretch

audio_stretch = AudioStretch()
# This needs [all] installation for MP3 support
audio_stretch.open(file=MP3DataAsBytesIO, format="mp3") 
audio_stretch.stretch(
    ratio=1.1,
    gap_ratio=1.2,
    upper_freq=333,
    lower_freq=55,
    buffer_ms=25,
    threshold_gap_db=-40,
    double_range=False,
    fast_detection=False,
    normal_detection=False,
)
# This needs [all] installation for soxr support
audio_stretch.resample(sample_rate=44100) 
audio_stretch.save(file=WAVDataAsBytesIO, format="wav")
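
A fuller in-memory round trip might look like the sketch below. `make_test_wav` builds a synthetic WAV with the standard library only; `stretch_in_memory` uses just the `open`, `stretch` and `save` calls shown above. Both helper names are hypothetical, and the second function has not been verified against a specific AudioStretchy release.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds=1.0, rate=16000, freq=220.0):
    """Build a mono 16-bit sine-wave WAV entirely in memory."""
    frames = int(seconds * rate)
    pcm = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * n / rate)))
        for n in range(frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    buffer.seek(0)
    return buffer


def stretch_in_memory(wav_in, ratio):
    """Stretch a WAV held in a BytesIO and return a new BytesIO."""
    # Imported lazily so the helper above works without audiostretchy.
    from audiostretchy.stretch import AudioStretch

    stretcher = AudioStretch()
    stretcher.open(file=wav_in, format="wav")
    stretcher.stretch(ratio=ratio)
    wav_out = io.BytesIO()
    stretcher.save(file=wav_out, format="wav")
    return wav_out  # rewind with seek(0) before reading it back


source = make_test_wav()
print(len(source.getvalue()))  # 32044
```

With the package installed, `stretch_in_memory(make_test_wav(), 1.1)` would be the call to try.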

Changelog

  • v1.3.5: fix for MP3 writing
  • v1.3.2: fix for MP3 opening
  • v1.3.0: actually working on Windows as well
  • v1.2.x: working on macOS and Linux

License

audiostretchy's People

Contributors

twardoch

audiostretchy's Issues

Error when trying to use audiostretchy

Hello, I'm getting this error when trying to run audiostretchy 1.3.1.
I tried uninstalling and reinstalling it with "pip install audiostretchy[all]", but that doesn't seem to help. Could you tell me how to fix this, please?
I get this error when I run: audiostretchy m1.mp3 m12.mp3 -r1.25
Traceback (most recent call last):
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\Scripts\audiostretchy.exe\__main__.py", line 7, in <module>
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\audiostretchy\__main__.py", line 10, in cli
    fire.Fire(stretch_audio)
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\audiostretchy\stretch.py", line 336, in stretch_audio
    audio_stretch.open(input_path)
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\audiostretchy\stretch.py", line 57, in open
    self.open_mp3(audio_file)
  File "C:\Users\bekaba\AppData\Local\Programs\Python\Python310\lib\site-packages\audiostretchy\stretch.py", line 72, in open_mp3
    with open(BytesIO(), "wb") as wav_io:
TypeError: expected str, bytes or os.PathLike object, not BytesIO

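The `TypeError` in this report comes from passing a `BytesIO` instance to the built-in `open()`, which only accepts a path. A `BytesIO` is already a writable file-like object, so it can be used directly. The snippet below reproduces the failure and shows the working pattern with the standard library only. It is an illustration, not the maintainer's patch, although the changelog entry "v1.3.2: fix for MP3 opening" may correspond to this report.

```python
import io

# Reproduces the failure: open() needs a path, not a BytesIO instance.
try:
    open(io.BytesIO(), "wb")
except TypeError as error:
    message = str(error)

# The buffer is already writable, so it can be used directly.
with io.BytesIO() as wav_io:
    wav_io.write(b"RIFF")
    data = wav_io.getvalue()

print(message)
print(data)  # b'RIFF'
```
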
_Stretch.so file not found

I ran a Linux Docker image that uses the audiostretchy package and got this error:

translation-audio-worker-1 | File "/app/libs/voice_clone.py", line 6, in <module>
translation-audio-worker-1 | from audiostretchy.stretch import stretch_audio
translation-audio-worker-1 | File "/usr/local/lib/python3.10/site-packages/audiostretchy/stretch.py", line 9, in <module>
translation-audio-worker-1 | from .interface.tdhs import TDHSAudioStretch
translation-audio-worker-1 | File "/usr/local/lib/python3.10/site-packages/audiostretchy/interface/tdhs.py", line 35, in <module>
translation-audio-worker-1 | stretch_lib = ctypes.cdll.LoadLibrary(str(lib_path))
translation-audio-worker-1 | File "/usr/local/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
translation-audio-worker-1 | return self._dlltype(name)
translation-audio-worker-1 | File "/usr/local/lib/python3.10/ctypes/__init__.py", line 374, in __init__
translation-audio-worker-1 | self._handle = _dlopen(self._name, mode)
translation-audio-worker-1 | OSError: /usr/local/lib/python3.10/site-packages/audiostretchy/interface/linux/_stretch.so: cannot open shared object file: No such file or directory

How can I fix this so that it works? The audiostretchy package was working on Mac, so it must be an OS / system package issue.

I ran ldd on the machine and got: not a dynamic executable

@twardoch
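
One possible explanation, not confirmed in this thread, is that the bundled `_stretch.so` was built for a different CPU architecture than the container. The sketch below reads the ELF header of a shared library and reports the architecture it targets, so it can be compared with `platform.machine()`. The `e_machine` codes come from the ELF specification; the helper names and the hand-built header are hypothetical.

```python
import platform
import struct

# e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(header):
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not an ELF file at all
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def describe_library(path):
    """Compare a shared library's architecture with the running interpreter."""
    with open(path, "rb") as handle:
        built_for = elf_machine(handle.read(20))
    return f"library: {built_for}, interpreter: {platform.machine()}"


# A hand-built header for a little-endian x86-64 file (hypothetical input).
fake = b"\x7fELF\x02\x01\x01" + bytes(9) + struct.pack("<HH", 3, 62)
print(elf_machine(fake))  # x86_64
```

Running `describe_library` on the path from the error message inside the container would show whether the two architectures agree.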

How does gap_ratio work?

Hi,
First of all, thank you so much for your work, your lib is super useful for my project with ElevenLabs.

I can't wrap my head around the gap_ratio parameter. If I understand correctly, it is meant to increase the length of silences in the audio, isn't it?
So if I want to slow down speech by 30% and double the length of the silences (an increase of 100%), I would use this command:
audiostretchy tests_audio.wav test_audio-1.3.wav -r 1.3 -g 2
But it doesn't seem to increase the silence. I feel I'm missing something...
Thanks for your help!

OSError: exception: access violation writing 0x00000284C1557000

Hello, I keep getting this error when using audiostretchy for a WAV-to-WAV speedup. Any help would be appreciated. I am running it on Windows.

code:
from audiostretchy.stretch import stretch_audio
stretch_audio("input.wav", "output.wav", ratio=1.1)

error:
OSError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16828\3037458555.py in
1 from audiostretchy.stretch import stretch_audio
2
----> 3 stretch_audio("input.wav", "output.wav", ratio=1.1)

~\anaconda3\lib\site-packages\audiostretchy\stretch.py in stretch_audio(input_path, output_path, ratio, gap_ratio, upper_freq, lower_freq, buffer_ms, threshold_gap_db, double_range, fast_detection, normal_detection, sample_rate)
354 audio_stretch = AudioStretch()
355 audio_stretch.open(input_path)
--> 356 audio_stretch.stretch(
357 ratio,
358 gap_ratio,

~\anaconda3\lib\site-packages\audiostretchy\stretch.py in stretch(self, ratio, gap_ratio, upper_freq, lower_freq, buffer_ms, threshold_gap_db, double_range, fast_detection, normal_detection)
314 stretcher.output_capacity(self.nframes, ratio), dtype=np.int16
315 )
--> 316 num_samples = stretcher.process_samples(
317 self.in_samples, len(self.in_samples), self.samples, ratio
318 )

~\anaconda3\lib\site-packages\audiostretchy\interface\tdhs.py in process_samples(self, samples, num_samples, output, ratio)
114 :return: The number of processed samples.
115 """
--> 116 return self.stretch_samples(self.handle, samples, num_samples, output, ratio)
117
118 def flush(self, output: np.ndarray) -> int:

OSError: exception: access violation writing 0x00000284C17F7000

silence in the audios

When I stretch audio with a ratio in the interval (0, 1) other than 0.5, the audio is indeed sped up, but the duration of the file doesn't change. For example, when I do this in Python:

stretch_audio("input.wav", "output.wav", ratio=0.7)

The output.wav file has the same duration as input.wav, but the beginning is correctly sped up. The remainder is filled with silence.

I plotted a graph with the ratio on the x-axis (in steps of 0.1) and the duration on the y-axis. You can see that there are steps and that the duration doesn't change for "special" values:

[Figure: duration of a 47-second audio file as a function of ratio]
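
A measurement like the one described in this report can be reproduced with the standard `wave` module alone. The sketch below compares the ratio actually achieved with the ratio requested; the helper names are hypothetical, and the silent in-memory files stand in for real input and output files.

```python
import io
import wave


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_ratio(input_source, output_source):
    """Return the output/input duration ratio that was actually achieved."""
    return wav_duration_seconds(output_source) / wav_duration_seconds(input_source)


def silent_wav(frames, rate=8000):
    """Build a silent mono 16-bit WAV in memory (test data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(2 * frames))
    buffer.seek(0)
    return buffer


print(duration_ratio(silent_wav(8000), silent_wav(5600)))  # 0.7
```

Calling `duration_ratio("input.wav", "output.wav")` after a stretch with `ratio=0.7` should return a value close to 0.7; a value of 1.0 would reproduce the behavior reported here.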

Stretched videos not at the correct length

I am trying to sync translated audio segments with a video using timestamps returned alongside the audio segment itself from a speech-to-text package. However, even with the stretch ratio calculated correctly, the duration of certain audio segments becomes too long, mainly because of a strange long pause at the end of the audio segment. For example, the attached zip folder contains the original audio and the stretched one. When calculating the stretch ratio based on the timestamps, the resulting duration should be about 5-6 seconds, with a stretch ratio of around 1.1. However, after passing it to the stretch_audio function, the result is 8 seconds long and includes a 3-second pause. It would be great to know what is causing the problem and whether there is something I am unaware of. The relevant code and audio files are below. Thank you!

import os

from pydub import AudioSegment  # assumption: AudioSegment is pydub's class
from audiostretchy.stretch import stretch_audio

# `model`, `output_dir`, and `speed` are defined elsewhere in the reporter's project.


def generate_segment_audio(segment, speaker_id):
    start, end, translated_text = segment  # Gets start and end timestamps from the audio segment
    segment_path = os.path.join(output_dir, f'segment_{start}_{end}.wav')
    stretched_path = os.path.join(output_dir, f'segment_{start}_{end}_stretched.wav')
    duration = end - start
    # Generate the audio file with the TTS model
    model.tts_to_file(translated_text, speaker_id, segment_path, speed=speed)

    # Adjust the audio speed to match the duration
    segment_audio = AudioSegment.from_file(segment_path)
    current_duration = len(segment_audio) / 1000  # Convert to seconds
    stretch_ratio = duration / current_duration
    print(f'{stretch_ratio} = {duration} / {current_duration}')
    stretch_audio(segment_path, stretched_path, ratio=stretch_ratio)
    return segment_path

audiofiles.zip
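
To quantify a pause like the one described here, the trailing silence of the stretched file can be measured directly. The sketch below works on raw mono 16-bit PCM, for example the bytes returned by `wave.Wave_read.readframes`. The function name and the amplitude threshold of 64 are assumptions chosen for illustration, and the input is synthetic.

```python
import array


def trailing_silence_seconds(pcm_bytes, rate, threshold=64):
    """Measure the run of near-silent samples at the end of mono 16-bit PCM.

    Assumes native byte order matches the little-endian WAV data, which is
    true on x86 and most ARM systems.
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    silent = 0
    for value in reversed(samples):
        if abs(value) > threshold:
            break
        silent += 1
    return silent / rate


# 1 s of signal followed by 3 s of zeros at 8 kHz (synthetic data).
signal = array.array("h", [1000] * 8000 + [0] * 24000).tobytes()
print(trailing_silence_seconds(signal, rate=8000))  # 3.0
```

Comparing this figure for the original and the stretched segment would show whether the pause is already present in the input or is added during stretching.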
