EPUB to Audiobook Converter

This project provides a command-line tool to convert EPUB ebooks into audiobooks. It now supports both the Microsoft Azure Text-to-Speech API and the OpenAI Text-to-Speech API to generate the audio for each chapter in the ebook. The output audio files are optimized for use with Audiobookshelf.

This project is developed with the help of ChatGPT.

Audio Sample

If you're interested in hearing a sample of an audiobook generated by this tool, check the links below.

Requirements

Audiobookshelf Integration

The audiobooks generated by this project are optimized for use with Audiobookshelf. Each chapter in the EPUB file is converted into a separate MP3 file, with the chapter title extracted and included as metadata.

Chapter Titles

Parsing and extracting chapter titles from EPUB files can be challenging, as the format and structure may vary significantly between different ebooks. The script employs a simple but effective method for extracting chapter titles, which works for most EPUB files. The method involves parsing the EPUB file and looking for the title tag in the HTML content of each chapter. If the title tag is not present, a fallback title is generated using the first few words of the chapter text.

Please note that this approach may not work perfectly for all EPUB files, especially those with complex or unusual formatting. However, in most cases, it provides a reliable way to extract chapter titles for use in Audiobookshelf.
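
The title lookup and its fallback can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library, not the project's actual code: the function name `extract_chapter_title` and the `fallback_words` parameter are hypothetical, and the real script may parse chapters differently.

```python
from html.parser import HTMLParser


class _TitleAndTextParser(HTMLParser):
    """Collects the <title> text and the remaining text of one chapter."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> goes to the title, everything else to the body.
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def extract_chapter_title(chapter_html, fallback_words=6):
    """Return the <title> text, or the first few words of the chapter.

    `fallback_words` is an assumed default, chosen only for illustration.
    """
    parser = _TitleAndTextParser()
    parser.feed(chapter_html)
    # Collapse runs of whitespace so multi-line titles become one line.
    title = " ".join("".join(parser.title_parts).split())
    if title:
        return title
    words = " ".join(parser.text_parts).split()
    return " ".join(words[:fallback_words])
```

A chapter with `<title>Chapter 1</title>` yields that title directly, while a chapter without a title tag gets a title built from its opening words.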

When you import the generated MP3 files into Audiobookshelf, the chapter titles will be displayed, making it easy to navigate between chapters and enhancing your listening experience.

Installation

  1. Clone this repository:

    git clone https://github.com/p0n1/epub_to_audiobook.git
    cd epub_to_audiobook
  2. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Set the following environment variables with your Azure Text-to-Speech API credentials, or your OpenAI API key if you're using OpenAI TTS:

    export MS_TTS_KEY=<your_subscription_key> # for Azure
    export MS_TTS_REGION=<your_region> # for Azure
    export OPENAI_API_KEY=<your_openai_api_key> # for OpenAI
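
Before starting a long conversion, it can help to confirm that the variables for your chosen provider are set. The sketch below is a hypothetical helper, not part of the tool; it only encodes the variable names listed above.

```python
import os

# Variable names as listed in the installation steps above.
REQUIRED_ENV_VARS = {
    "azure": ["MS_TTS_KEY", "MS_TTS_REGION"],
    "openai": ["OPENAI_API_KEY"],
}


def missing_env_vars(tts_provider, environ=None):
    """Return the required variables that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the function easy to try out.
    """
    environ = os.environ if environ is None else environ
    return [
        name
        for name in REQUIRED_ENV_VARS[tts_provider]
        if not environ.get(name)
    ]


if __name__ == "__main__":
    for provider in REQUIRED_ENV_VARS:
        missing = missing_env_vars(provider)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```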

Usage

To convert an EPUB ebook to an audiobook, run the following command, specifying the TTS provider of your choice with the --tts option:

python3 epub_to_audiobook.py <input_file> <output_folder> [options]

To check the latest option descriptions for this script, you can run the following command in the terminal:

python3 epub_to_audiobook.py -h
usage: epub_to_audiobook.py [-h] [--tts {azure,openai}] [--log LOG]
                            [--preview] [--language LANGUAGE]
                            [--newline_mode {single,double}]
                            [--chapter_start CHAPTER_START]
                            [--chapter_end CHAPTER_END] [--output_text]
                            [--remove_endnotes] [--voice_name VOICE_NAME]
                            [--break_duration BREAK_DURATION]
                            [--output_format OUTPUT_FORMAT]
                            [--openai_model OPENAI_MODEL]
                            [--openai_voice OPENAI_VOICE]
                            [--openai_format OPENAI_FORMAT]
                            input_file output_folder

Convert EPUB to audiobook

positional arguments:
  input_file            Path to the EPUB file
  output_folder         Path to the output folder

options:
  -h, --help            show this help message and exit
  --tts {azure,openai}  Choose TTS provider (default: azure). azure: Azure
                        Cognitive Services, openai: OpenAI TTS API. When using
                        azure, environment variables MS_TTS_KEY and
                        MS_TTS_REGION must be set. When using openai,
                        environment variable OPENAI_API_KEY must be set.
  --log LOG             Log level (default: INFO), can be DEBUG, INFO,
                        WARNING, ERROR, CRITICAL
  --preview             Enable preview mode. In preview mode, the script will
                        not convert the text to speech. Instead, it will print
                        the chapter index, titles, and character counts.
  --language LANGUAGE   Language for the text-to-speech service (default: en-
                        US). For Azure TTS (--tts=azure), check
                        https://learn.microsoft.com/en-us/azure/ai-
                        services/speech-service/language-
                        support?tabs=tts#text-to-speech for supported
                        languages. For OpenAI TTS (--tts=openai), their API
                        detects the language automatically. But setting this
                        will also help on splitting the text into chunks with
                        different strategies in this tool, especially for
                        Chinese characters. For Chinese books, use zh-CN, zh-
                        TW, or zh-HK.
  --newline_mode {single,double}
                        Choose the mode of detecting new paragraphs: 'single'
                        or 'double'. 'single' means a single newline
                        character, while 'double' means two consecutive
                        newline characters. (default: double, works for most
                        ebooks but will detect less paragraphs for some
                        ebooks)
  --chapter_start CHAPTER_START
                        Chapter start index (default: 1, starting from 1)
  --chapter_end CHAPTER_END
                        Chapter end index (default: -1, meaning to the last
                        chapter)
  --output_text         Enable Output Text. This will export a plain text file
                        for each chapter specified and write the files to the
                        output folder specified.
  --remove_endnotes     This will remove endnote numbers from the end or
                        middle of sentences. This is useful for academic
                        books.

Azure TTS Options:
  --voice_name VOICE_NAME
                        Voice name for the text-to-speech service (default:
                        en-US-GuyNeural). You can use zh-CN-YunyeNeural for
                        Chinese ebooks.
  --break_duration BREAK_DURATION
                        Break duration in milliseconds for the different
                        paragraphs or sections (default: 1250). Valid values
                        range from 0 to 5000 milliseconds.
  --output_format OUTPUT_FORMAT
                        Output format for the text-to-speech service (default:
                        audio-24khz-48kbitrate-mono-mp3). Support formats:
                        audio-16khz-32kbitrate-mono-mp3
                        audio-16khz-64kbitrate-mono-mp3
                        audio-16khz-128kbitrate-mono-mp3
                        audio-24khz-48kbitrate-mono-mp3
                        audio-24khz-96kbitrate-mono-mp3
                        audio-24khz-160kbitrate-mono-mp3
                        audio-48khz-96kbitrate-mono-mp3
                        audio-48khz-192kbitrate-mono-mp3. See
                        https://learn.microsoft.com/en-us/azure/ai-
                        services/speech-service/rest-text-to-
                        speech?tabs=streaming#audio-outputs. Only mp3 is
                        supported for now. Different formats will result in
                        different audio quality and file size.

OpenAI TTS Options:
  --openai_model OPENAI_MODEL
                        Available OpenAI model options: tts-1 and tts-1-hd.
                        Check https://platform.openai.com/docs/guides/text-to-
                        speech/audio-quality.
  --openai_voice OPENAI_VOICE
                        Available OpenAI voice options: alloy, echo, fable,
                        onyx, nova, and shimmer. Check
                        https://platform.openai.com/docs/guides/text-to-
                        speech/voice-options.
  --openai_format OPENAI_FORMAT
                        Available OpenAI output options: mp3, opus, aac, and
                        flac. Check
                        https://platform.openai.com/docs/guides/text-to-
                        speech/supported-output-formats.

Example:

python3 epub_to_audiobook.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder

Executing the above command will generate a directory named output_folder and save the MP3 files for each chapter inside it. Once generated, you can import these audio files into Audiobookshelf or play them with any audio player of your choice.
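
The --newline_mode option in the help output above controls how paragraphs are detected. The following sketch shows one plausible reading of the two modes; `split_paragraphs` is a hypothetical name, and the tool's own implementation may differ.

```python
import re


def split_paragraphs(text, newline_mode="double"):
    """Split chapter text into paragraphs.

    'single' treats every newline as a paragraph boundary;
    'double' requires at least two consecutive newlines.
    """
    if newline_mode == "single":
        pattern = r"\n+"
    elif newline_mode == "double":
        pattern = r"\n{2,}"
    else:
        raise ValueError("newline_mode must be 'single' or 'double'")
    # Drop empty pieces and surrounding whitespace.
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

With the text `"A\nB\n\nC"`, single mode finds three paragraphs and double mode finds two, which matches the help text's note that double mode detects fewer paragraphs in some ebooks.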

Preview Mode

Before converting your EPUB file to an audiobook, you can use the --preview option to get a summary of each chapter. Instead of converting the text to speech, the script prints the chapter index, title, and character count of each chapter, along with the total character count.

Example:

python3 epub_to_audiobook.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --preview
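
Character counts matter because TTS providers typically bill by the amount of text processed. The sketch below illustrates how a preview could be computed, including the 1-based --chapter_start and the -1 default of --chapter_end described in the help output. The function name and the `(title, text)` tuple layout are assumptions made for this example.

```python
def preview_chapters(chapters, chapter_start=1, chapter_end=-1):
    """Summarize chapters without converting them.

    `chapters` is a list of (title, text) tuples. Indexes are 1-based,
    and chapter_end == -1 means "through the last chapter".
    Returns (rows, total), where each row is (index, title, char_count).
    """
    if chapter_end == -1:
        chapter_end = len(chapters)
    if not 1 <= chapter_start <= chapter_end <= len(chapters):
        raise ValueError("chapter range is out of bounds")
    rows = []
    for index in range(chapter_start, chapter_end + 1):
        title, text = chapters[index - 1]
        rows.append((index, title, len(text)))
    total = sum(count for _, _, count in rows)
    return rows, total


if __name__ == "__main__":
    demo = [("One", "abc"), ("Two", "de"), ("Three", "f")]
    rows, total = preview_chapters(demo, chapter_start=2)
    for index, title, count in rows:
        print(f"{index}\t{title}\t{count}")
    print(f"total\t{total}")
```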

Using with Docker

This tool is available as a Docker image, making it easy to run without needing to manage Python dependencies.

First, make sure you have Docker installed on your system.

You can pull the Docker image from the GitHub Container Registry:

docker pull ghcr.io/p0n1/epub_to_audiobook:latest

Then, you can run the tool with the following command:

docker run --rm -v ./:/app -e MS_TTS_KEY=$MS_TTS_KEY -e MS_TTS_REGION=$MS_TTS_REGION ghcr.io/p0n1/epub_to_audiobook your_book.epub audiobook_output --tts azure

For OpenAI, you can run:

docker run --rm -v ./:/app -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io/p0n1/epub_to_audiobook your_book.epub audiobook_output --tts openai

Replace $MS_TTS_KEY and $MS_TTS_REGION with your Azure Text-to-Speech API credentials. Replace $OPENAI_API_KEY with your OpenAI API key. Replace your_book.epub with the name of the input EPUB file, and audiobook_output with the name of the directory where you want to save the output files.

The -v ./:/app option mounts the current directory (.) to the /app directory in the Docker container. This allows the tool to read the input file and write the output files to your local file system.

You can also check this example config file for docker compose usage.

User-Friendly Guide for Windows Users

For Windows users, especially if you're not very familiar with command-line tools, we've got you covered. We understand the challenges and have created a guide specifically tailored for you.

Check this step-by-step guide and leave a message if you encounter any issues.

How to Get Your Azure Cognitive Service Key?

Source: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech#prerequisites

How to Get Your OpenAI API Key?

Check https://platform.openai.com/docs/quickstart/account-setup. Make sure you check the price details before use.

Customization of Voice and Language

You can customize the voice and language used for the Text-to-Speech conversion by passing the --voice_name and --language options when running the script.

Microsoft Azure offers a range of voices and languages for the Text-to-Speech service. For a list of available options, consult the Microsoft Azure Text-to-Speech documentation.

You can also listen to samples of the available voices in the Azure TTS Voice Gallery to help you choose the best voice for your audiobook.

For example, if you want to use a British English female voice for the conversion, you can use the following command:

python3 epub_to_audiobook.py <input_file> <output_folder> --voice_name en-GB-LibbyNeural --language en-GB

For OpenAI TTS, you can specify the model, voice, and format options using --openai_model, --openai_voice, and --openai_format, respectively.

More examples

Here are some examples that demonstrate various option combinations:

Examples Using Azure TTS

  1. Basic conversion using Azure with default settings
    This command will convert an EPUB file to an audiobook using Azure's default TTS settings.

    python3 epub_to_audiobook.py "path/to/book.epub" "path/to/output/folder" --tts azure
  2. Azure conversion with custom language, voice and logging level
    Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.

    python3 epub_to_audiobook.py "path/to/book.epub" "path/to/output/folder" --tts azure --language zh-CN --voice_name "zh-CN-YunyeNeural" --log DEBUG
  3. Azure conversion with chapter range and break duration
    Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.

    python3 epub_to_audiobook.py "path/to/book.epub" "path/to/output/folder" --tts azure --chapter_start 5 --chapter_end 10 --break_duration "1500"
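
To make the --voice_name, --language, and --break_duration options more concrete, here is a sketch of how they could map onto an SSML document, the markup format Azure TTS accepts. It assumes the pause is expressed as an SSML break element between paragraphs; `build_ssml` is a hypothetical helper, and the tool may assemble its requests differently.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(paragraphs, voice_name="en-US-GuyNeural",
               language="en-US", break_duration=1250):
    """Join paragraphs into one SSML document with pauses between them.

    The defaults mirror the defaults shown in the help output.
    """
    if not 0 <= break_duration <= 5000:
        raise ValueError("break_duration must be between 0 and 5000 ms")
    pause = f'<break time="{break_duration}ms"/>'
    # Escape &, < and > so the book text cannot break the XML.
    body = pause.join(escape(paragraph) for paragraph in paragraphs)
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice_name)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml(["First paragraph.", "Second paragraph."],
                     break_duration=1500))
```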

Examples Using OpenAI TTS

  1. Basic conversion using OpenAI with default settings
    This command will convert an EPUB file to an audiobook using OpenAI's default TTS settings.

    python3 epub_to_audiobook.py "path/to/book.epub" "path/to/output/folder" --tts openai
  2. OpenAI conversion with HD model and specific voice
    Converts an EPUB file to an audiobook using the high-definition OpenAI model and a specific voice choice.

    python3 epub_to_audiobook.py "path/to/book.epub" "path/to/output/folder" --tts openai --openai_model "tts-1-hd" --openai_voice "fable"
  3. OpenAI conversion with preview and text output
    Enables preview mode together with text output: the script prints the chapter index and titles instead of converting the chapters to speech, and also exports the chapter text.

    python3 epub_to_audiobook.py "path/to/book.epub" "path/to/output/folder" --tts openai --preview --output_text
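
The help text for --language mentions that the tool splits text into chunks. Chunking is needed because a TTS request accepts only a limited amount of input. The sketch below shows one simple strategy that packs whole lines into chunks and hard-splits only lines that are too long on their own. The `max_chars=4096` default reflects the input limit in OpenAI's speech API documentation at the time of writing and should be verified; the function name and strategy are assumptions, and the tool's language-aware splitting is likely more sophisticated.

```python
def split_into_chunks(text, max_chars=4096):
    """Split text into chunks of at most max_chars characters.

    Lines are kept whole where possible and joined with newlines.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split a paragraph that alone exceeds the limit.
        for start in range(0, len(paragraph), max_chars):
            piece = paragraph[start:start + max_chars]
            candidate = piece if not current else current + "\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate request and the resulting audio segments concatenated in order.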

Troubleshooting

ModuleNotFoundError: No module named 'importlib_metadata'

This error usually means that your Python version is older than 3.8. You can install the missing package manually with pip3 install importlib-metadata, or upgrade to a newer Python version.
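
The 3.8 threshold comes from the standard library: `importlib.metadata` was added in Python 3.8, and older interpreters need the separate `importlib_metadata` backport. A quick check, with a hypothetical helper name, might look like this:

```python
import sys


def needs_importlib_metadata_backport(version_info=sys.version_info):
    """Return True if this Python predates the stdlib importlib.metadata."""
    return tuple(version_info[:2]) < (3, 8)


if __name__ == "__main__":
    if needs_importlib_metadata_backport():
        print("Python < 3.8 detected: run `pip3 install importlib-metadata`.")
    else:
        print("importlib.metadata is available in the standard library.")
```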

License

This project is licensed under the MIT License. See the LICENSE file for details.
