TRS-to-WebVTT

This Python script converts TRS subtitle files to the simpler and more common WebVTT format. TRS files are XML files produced by the older transcription tools Transcriber and TranscriberAG.

The only other script I found to convert TRS files is geomedialab/convert-trs-srt. However, besides an easily fixable bug triggered by specific timecodes, it loses most temporal information when processing long segments by a single speaker. I thus found it easier to write my own script and skip the intermediate conversion from SubRip to WebVTT, which can be done, e.g., using nwoltman/srt-to-vtt-converter.

The script does not handle the named entity annotations described in the TranscriberAG manual, and comments are currently removed.

Requirements

Python 3.6+

There are no other dependencies unless you want to replace the vulnerable standard XML library with defusedxml.
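A common way to make such an optional dependency a drop-in replacement is to try the hardened parser first and fall back to the standard library. The following is only a minimal sketch of that pattern, not necessarily how convert.py does it; parse_trs is a hypothetical helper name.

```python
import io

try:
    # defusedxml mirrors the ElementTree API but rejects malicious XML constructs
    from defusedxml.ElementTree import parse
except ImportError:
    # Fall back to the standard library if defusedxml is not installed
    from xml.etree.ElementTree import parse


def parse_trs(source):
    """Parse a TRS file (path or binary file object) and return the root <Trans> element."""
    return parse(source).getroot()


# Tiny in-memory stand-in for a real TRS file
sample = io.BytesIO(b"<Trans><Episode/></Trans>")
root = parse_trs(sample)
print(root.tag)
```

With this pattern, installing defusedxml is enough to switch parsers; no code change is needed.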

Usage

python convert.py [-h] [-o PATH] [-l LANG] [-s] [-n] INPUTFILE

Unless an output path is specified using -o or --output, the result will be written to stdout.

The -l or --language argument adds a language header to the file.

When using -s or --speakers, each spoken line is prefixed with <v SPEAKER>, if this information is annotated.

When using -n or --noise, noise events such as (laughter), (silence), or (unintelligible) are preserved.
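For illustration, a command-line interface matching the usage line above could be declared with argparse roughly as follows. This is a sketch under assumptions: the help texts and the build_parser name are made up here, and the actual script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring: convert.py [-h] [-o PATH] [-l LANG] [-s] [-n] INPUTFILE"""
    parser = argparse.ArgumentParser(prog="convert.py")
    parser.add_argument("inputfile", metavar="INPUTFILE", help="TRS file to convert")
    parser.add_argument("-o", "--output", metavar="PATH",
                        help="write the result to PATH instead of stdout")
    parser.add_argument("-l", "--language", metavar="LANG",
                        help="add a language header to the file")
    parser.add_argument("-s", "--speakers", action="store_true",
                        help="prefix each spoken line with <v SPEAKER>")
    parser.add_argument("-n", "--noise", action="store_true",
                        help="preserve noise events")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["example.trs", "-l", "en", "--speakers"])
print(args)
```

Because --output defaults to None in this sketch, the caller can simply check for None to decide between writing to a file and writing to stdout.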

Example

Input TRS file example.trs created with Transcriber:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Trans SYSTEM "trans-14.dtd">
<Trans scribe="(unknown)" audio_filename="18061-006" version="1" version_date="020301">
<Speakers>
<Speaker id="spk1" name="interviewer" check="no" dialect="native" accent="" scope="local"/>
<Speaker id="spk2" name="interviewee" check="no" dialect="native" accent="" scope="local"/>
</Speakers>
<Episode>
<Section type="report" startTime="0" endTime="902.624">
<Turn speaker="spk1" startTime="0" endTime="15.665">
<Sync time="0"/>

<Sync time="9.037"/>
survivor is Abraham [Bommer] the date is August fourteenth nineteen ninety six
</Turn>
<Turn speaker="spk2" startTime="15.665" endTime="34.685">
<Sync time="15.665"/>
you know something I forgot to mention is about
<Sync time="20.3"/>
a man by the name of Captain [Jello]
<Sync time="23.99"/>
he was almost the head man from the
<Event desc="EE-HESITATION" type="noise" extent="instantaneous"/>
 Czechoslovakia army
<Sync time="28.385"/>
he came over there according my knowledge is
<Sync time="32.56"/>
with his wife because his wife was Jewish
</Turn>

[...]

Simple WebVTT output: python convert.py example.trs

WEBVTT

00:00:09.037 --> 00:00:15.665
survivor is Abraham [Bommer] the date is August fourteenth nineteen ninety six

00:00:15.665 --> 00:00:20.300
you know something I forgot to mention is about

00:00:20.300 --> 00:00:23.990
a man by the name of Captain [Jello]

00:00:23.990 --> 00:00:28.385
he was almost the head man from the Czechoslovakia army

00:00:28.385 --> 00:00:32.560
he came over there according my knowledge is

00:00:32.560 --> 00:00:34.685
with his wife because his wife was Jewish

[...]
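The timecodes above follow from the Sync times in the input: TRS stores times as plain seconds, while WebVTT expects HH:MM:SS.mmm. A minimal sketch of that conversion could look like this; format_timestamp is a hypothetical name, not necessarily the function used in convert.py.

```python
def format_timestamp(seconds):
    """Convert seconds (str or float, as found in TRS attributes) to HH:MM:SS.mmm."""
    # Work in integer milliseconds to avoid floating-point artifacts
    total_ms = round(float(seconds) * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


print(format_timestamp("9.037"))  # 00:00:09.037
print(format_timestamp("20.3"))   # 00:00:20.300
```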

WebVTT output with language, speaker, and noise info: python convert.py example.trs --language en --speakers --noise

WEBVTT
Language: en

00:00:09.037 --> 00:00:15.665
<v interviewer>survivor is Abraham [Bommer] the date is August fourteenth nineteen ninety six

00:00:15.665 --> 00:00:20.300
<v interviewee>you know something I forgot to mention is about

00:00:20.300 --> 00:00:23.990
<v interviewee>a man by the name of Captain [Jello]

00:00:23.990 --> 00:00:28.385
<v interviewee>he was almost the head man from the <i>(ee-hesitation)</i> Czechoslovakia army

00:00:28.385 --> 00:00:32.560
<v interviewee>he came over there according my knowledge is

00:00:32.560 --> 00:00:34.685
<v interviewee>with his wife because his wife was Jewish

[...]
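To show how the cues above relate to the TRS structure, here is a rough sketch that splits one Turn into cues at each Sync element. It is an illustration only, not the script's actual implementation: turn_to_cues and its parameters are hypothetical, empty segments are assumed to produce no cue, and only Event elements of type "noise" are rendered.

```python
import xml.etree.ElementTree as ET

TURN = """<Turn speaker="spk2" startTime="23.99" endTime="32.56">
<Sync time="23.99"/>
he was almost the head man from the
<Event desc="EE-HESITATION" type="noise" extent="instantaneous"/>
 Czechoslovakia army
<Sync time="28.385"/>
he came over there according my knowledge is
</Turn>"""


def turn_to_cues(turn, speaker=None, noise=False):
    """Return (start, end, text) tuples for one <Turn>; times stay in seconds."""
    segments = []  # each entry: [start_time, list_of_text_parts]
    for child in turn:
        if child.tag == "Sync":
            # Every Sync opens a new segment
            segments.append([float(child.get("time")), []])
        elif child.tag == "Event" and child.get("type") == "noise" and noise and segments:
            segments[-1][1].append(f"<i>({child.get('desc', '').lower()})</i>")
        # The transcribed text follows each element as its "tail"
        if segments and child.tail and child.tail.strip():
            segments[-1][1].append(child.tail.strip())

    cues = []
    for i, (start, parts) in enumerate(segments):
        if not parts:
            continue  # assumption: segments without text produce no cue
        # A segment ends where the next one starts, or at the end of the turn
        end = segments[i + 1][0] if i + 1 < len(segments) else float(turn.get("endTime"))
        text = " ".join(parts)
        if speaker:
            text = f"<v {speaker}>{text}"
        cues.append((start, end, text))
    return cues


cues = turn_to_cues(ET.fromstring(TURN), speaker="interviewee", noise=True)
for start, end, text in cues:
    print(start, end, text)
```

In a full converter, the speaker name would be looked up from the Speakers element via the Turn's speaker id, and the start and end times would be formatted as WebVTT timecodes.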

Contributors

chbridges
