oratio's Introduction

Open Oratio

An open source pipeline to translate .mp4 video files to .mov video files in 20 different languages.

Generate quality video and podcast localizations at scale.

Setup

Most important:

Python 3.7 or newer (check with python --version)

pip install -r docs/requirements.txt

Also install rubberband: brew install rubberband

Then follow the instructions in docs/ for AWS and gcloud integration, and set the names of the S3 or gcloud bucket you will store your audio in: set the AWS_BUCKET_NAME and GCLOUD_BUCKET_NAME constants in src/constants/constants.py.
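As a rough illustration, the two constants might look like the sketch below. Only the constant names and the file path come from this README; the bucket names are placeholders, and the real file may contain other constants.

```python
# src/constants/constants.py (sketch; the values below are placeholders)

# Name of the S3 bucket used to store audio (replace with your own bucket).
AWS_BUCKET_NAME = "my-oratio-audio-aws"

# Name of the gcloud bucket used to store audio (replace with your own bucket).
GCLOUD_BUCKET_NAME = "my-oratio-audio-gcloud"
```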

Optional Setup

Also install ImageMagick if you want text overlay: brew install imagemagick

Set up pre-commit if you want to contribute: pre-commit install

Test the setup: pre-commit run. This should run black and run_tests.py, but both should be skipped until the code changes.

Running the pipeline

python src/main.py tests/test_config.yaml will test your setup to make sure everything is in the right place.

After test_config.yaml starts working, make your own project folder in media/prod and edit the config.yaml to get going! Check out my test video in media/prod/kaiser to familiarize yourself with the setup.

python src/main.py will use the default config.yaml provided in the home directory.

Understanding the Repo

Start with src/main.py. Run it. Read it.

Follow the commands it executes with a debugger.

Then check out src/client.py. This is our biggest piece of abstraction, and especially if you are adding an API feature, you'll want a good understanding of what it is doing.

src/config.py and src/video_project.py have important setup information and maintain the state of the project.

File structure

. home
/docs - contains documentation on ideas, most documentation is in the relevant .py files
/src - contains source code for the pipeline
/src/api - the neural apis we work with, abstracted in the client.py
/media - contains input and output media
/media/dev - stores temporary files made during translation
/media/prod - stores the finalized input and output files
/media/test - stores test input files
/tests - unit tests for the pipeline

Metrics

Performance (speed) Performance (accuracy)

oratio's Issues

Integrate a segment-level quality score

Machine translation is integrated, but of course some of the translations are bad.

With an instant segment-level quality score, Oratio can implement various flavours of "hybrid" translation:

  • create a priority queue for human post-editors
  • send segments below a certain quality to human post-editors

It could also monitor aggregate quality and more system- or process-level issues (bad segmentation, wrong language, issues with content types...).
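A minimal sketch of the routing idea, assuming each segment already carries a quality score between 0 and 1 where higher is better. The score source, the threshold, and the function names are all hypothetical; no particular scoring API is called here.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple


def build_post_edit_queue(
    scored_segments: Sequence[Tuple[str, float]], threshold: float = 0.7
) -> List[Tuple[float, int, str]]:
    """Return a min-heap of (score, index, text) for segments below threshold.

    The lowest-scoring segment is popped first, so post-editors see the
    worst translations before the borderline ones.
    """
    queue: List[Tuple[float, int, str]] = []
    for index, (text, score) in enumerate(scored_segments):
        if score < threshold:
            heapq.heappush(queue, (score, index, text))
    return queue


def drain(queue: List[Tuple[float, int, str]]) -> Iterator[Tuple[int, str, float]]:
    """Yield (index, text, score) in order of increasing score."""
    while queue:
        score, index, text = heapq.heappop(queue)
        yield index, text, score
```

Segments at or above the threshold never enter the queue, so they would go straight to synthesis in this sketch.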

I'd suggest the ModelFront API.

Full-disclosure: I'm a co-founder of ModelFront.

Allow per language providers

Some providers do better on certain languages or have better options. For example, Google has a lot of female-only voices.
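One way this could look is a per-locale override table with a default. The provider labels, the table contents, and the function name below are assumptions for illustration, not the project's current config format.

```python
from typing import Mapping

# Hypothetical default and overrides; keys may be full locales or bare languages.
DEFAULT_PROVIDER = "gcloud"
PROVIDER_OVERRIDES = {"de-DE": "aws", "ja": "aws"}


def provider_for(
    locale: str,
    overrides: Mapping[str, str] = PROVIDER_OVERRIDES,
    default: str = DEFAULT_PROVIDER,
) -> str:
    """Pick a provider: exact locale match, then language match, then default."""
    if locale in overrides:
        return overrides[locale]
    language = locale.split("-")[0]
    return overrides.get(language, default)
```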

Write helper function to improve locale/language distinction

The project class should have a helper function which takes two options:

  • locale or language
  • include input or only targets

example uses:

for locale in self.get_targets(type=LOCALE, include_input=True)

for lang_code in self.get_targets(type=LANGUAGE_CODE, include_input=False)
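A possible shape for that helper, as a sketch only. The Project class, its attributes, and the choice to deduplicate language codes are assumptions; the parameter names follow the example uses above (including `type`, which shadows the Python builtin inside the method).

```python
from enum import Enum
from typing import List, Sequence


class TargetType(Enum):
    LOCALE = "locale"
    LANGUAGE_CODE = "language_code"


LOCALE = TargetType.LOCALE
LANGUAGE_CODE = TargetType.LANGUAGE_CODE


class Project:
    """Hypothetical stand-in for the project class."""

    def __init__(self, input_locale: str, target_locales: Sequence[str]) -> None:
        self.input_locale = input_locale
        self.target_locales = list(target_locales)

    def get_targets(self, type: TargetType = LOCALE, include_input: bool = False) -> List[str]:
        """Return locales (e.g. "es-MX") or language codes (e.g. "es")."""
        locales = list(self.target_locales)
        if include_input:
            locales.insert(0, self.input_locale)
        if type is LOCALE:
            return locales
        # Assumption: two locales of one language collapse to a single code.
        codes: List[str] = []
        for locale in locales:
            code = locale.split("-")[0]
            if code not in codes:
                codes.append(code)
        return codes
```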

Add background noise to tts

Making the TTS sound more natural is 90% of our product. Even with the recent development, we are dropping all background audio while the TTS is playing. We could add slight background tracks for when background music isn't provided to us.

Low priority, should go with a general rework/clarification of how we handle sound and track generation.

Audio only files are broken

Audio only has several broken features in config.py.

I'll investigate further to see where it is broken. My hunch is that some of the recent original-sentences work should be treated as video only.

It could be related to the video-only flags: config.py might not be accounting for them properly, or project.py is not properly abstracted.

Add caption input/output

We will want to allow captions to be input with the video (since some YouTubers will have those) and also output captions in the foreign languages.
These are stored in SRT files, which have the following format:

<section number>
<start time> --> <end time>
<Caption text
Can be multiple lines>

Example:

1
00:00:00,000 --> 00:00:04,400
Here is the first caption of the video.

2
00:00:06,200 --> 00:01:00,000
Then we have a really long
caption that takes up most of the video.
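A minimal sketch of reading and writing this format using only the standard library. The Caption class and function names are hypothetical, and the parser assumes well-formed input like the example above (no styling tags, no byte-order mark).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    index: int
    start: float  # seconds
    end: float  # seconds
    text: str  # may span multiple lines


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def parse_time(stamp: str) -> float:
    """Convert "HH:MM:SS,mmm" to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def format_time(seconds: float) -> str:
    """Convert seconds to "HH:MM:SS,mmm"."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def parse_srt(content: str) -> List[Caption]:
    """Parse SRT text into captions; blocks are separated by blank lines."""
    captions = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip incomplete blocks rather than guess
        start, end = lines[1].split("-->")
        captions.append(
            Caption(int(lines[0]), parse_time(start), parse_time(end), "\n".join(lines[2:]))
        )
    return captions


def write_srt(captions: List[Caption]) -> str:
    """Serialize captions back to SRT text."""
    blocks = [
        f"{c.index}\n{format_time(c.start)} --> {format_time(c.end)}\n{c.text}"
        for c in captions
    ]
    return "\n\n".join(blocks) + "\n"
```

Translated captions could reuse the parsed start and end times and replace only the text before calling write_srt.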

Tag individual sentences as speakers

We need to start considering multispeaker transitions.
One step here would be tagging sentences with an identity that tracks which synthesis model is being used (e.g. en-US-wavenet-A, en-US-wavenet-B or AWS Dave). This would allow manual implementation of multispeaker videos before we start working with people.
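As a sketch of what the tagging could look like, the snippet below attaches a voice identifier to each sentence and groups consecutive sentences that share one. The Sentence class and its fields are hypothetical; only the voice names come from the issue text.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence, Tuple


@dataclass
class Sentence:
    text: str
    start: float  # seconds
    end: float  # seconds
    voice: str = "en-US-wavenet-A"  # identity of the synthesis model to use


def group_by_voice(sentences: Sequence[Sentence]) -> List[Tuple[str, List[str]]]:
    """Group consecutive sentences sharing a voice into (voice, texts) runs."""
    return [
        (voice, [s.text for s in run])
        for voice, run in groupby(sentences, key=lambda s: s.voice)
    ]
```

Each run could then be sent to its own synthesis model, which keeps speaker changes explicit even when the tags are assigned by hand.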

Mimic input audio volume

Listening to some vloggers, they often modulate the volume of their voice - could be cool to add this feature.

An initial step would be tagging sentences at 3 different volume levels and applying numpy masks.

Bigger steps:

  • if we can tag word-to-word or phrase-to-phrase translation, we can better connect these.
  • if we can quantify the amplitude of the input audio, we could make a continuous volume mask.
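The three-level idea can be sketched as a per-sample gain mask. To stay dependency-free this version uses plain lists where the issue suggests numpy arrays; the level names, gain values, and function names are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical gains for the three volume levels.
VOLUME_GAINS = {"quiet": 0.5, "normal": 1.0, "loud": 1.5}


def build_gain_mask(
    n_samples: int, sample_rate: int, tagged_spans: Sequence[Tuple[float, float, str]]
) -> List[float]:
    """Build one gain per sample from (start_s, end_s, level) spans.

    Samples outside every span keep a gain of 1.0.
    """
    mask = [1.0] * n_samples
    for start, end, level in tagged_spans:
        first = max(0, int(start * sample_rate))
        last = min(n_samples, int(end * sample_rate))
        for i in range(first, last):
            mask[i] = VOLUME_GAINS[level]
    return mask


def apply_mask(samples: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Scale each sample by its gain (an elementwise product)."""
    return [sample * gain for sample, gain in zip(samples, mask)]
```

With numpy, the inner loop would become a slice assignment and apply_mask a single array multiplication. A continuous mask would replace the three fixed gains with values derived from the input audio's amplitude.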

Allow for more precise timing

AWS gives us hundredths of a second for STT. We should make that our standard and allow gcloud to just be slightly less precise, instead of truncating AWS.
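One way to standardize on hundredths is to store every timestamp as an integer count of centiseconds, so a coarser value converts without loss and a finer one is never truncated. This is a sketch; the function name is hypothetical and it assumes timestamps arrive as seconds in string or float form.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Union


def to_centiseconds(seconds: Union[str, float, int]) -> int:
    """Convert a timestamp in seconds to whole hundredths of a second.

    Decimal avoids binary floating-point surprises such as 1.15 * 100
    landing just below 115.
    """
    value = Decimal(str(seconds)) * 100
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

A value with tenths precision such as "1.2" becomes 120, and a hundredths value such as "1.23" stays 123.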
