
aws-transcription-parser

Automated parsing of Amazon-Transcribe-annotated episode transcripts

This is an automatic parser for Amazon Transcribe jobs run on podcast episodes. It converts each job's JSON output into HTML.

Compatible with Python 2.7 and Python 3+.

To use it, create an input file named input.txt. Each line holds two fields separated by a semicolon: the name of an Amazon Transcribe output JSON file, and a comma-separated, ordered list of speaker names.

For example, the following input.txt file causes episode_1.json, episode_2.json and episode_3.json to be processed one after another. On each line, the listed names replace Amazon Transcribe's automatically generated placeholders in order: the first name replaces spk_0, the second replaces spk_1, and so on. In episode_3.json, for instance, speaker_1, speaker_2 and speaker_3 replace spk_0, spk_1 and spk_2. Each output HTML file is named after the jobName in its input JSON file.

episode_1.json;speaker_1,speaker_2
episode_2.json;speaker_2,speaker_3
episode_3.json;speaker_1,speaker_2,speaker_3
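
A minimal sketch of how one input.txt line could be parsed and turned into a placeholder-to-name map. It is written for illustration only; parse_input_line, build_label_map and read_input_file are hypothetical names, not functions taken from process_aws_output.py.

```python
def parse_input_line(line):
    """Split one input.txt line into (json_filename, [speaker names]).

    Hypothetical helper: assumes the format "<file>.json;<name1>,<name2>,...".
    """
    json_name, _, speaker_field = line.strip().partition(";")
    speakers = [name.strip() for name in speaker_field.split(",") if name.strip()]
    return json_name.strip(), speakers


def build_label_map(speakers):
    """Map Transcribe's spk_0, spk_1, ... placeholders to names, in list order."""
    return {"spk_%d" % index: name for index, name in enumerate(speakers)}


def read_input_file(path="input.txt"):
    """Yield (json_filename, speakers) for every non-blank line of the file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_input_line(line)
```

For example, parse_input_line("episode_2.json;speaker_2,speaker_3") returns ("episode_2.json", ["speaker_2", "speaker_3"]), and build_label_map on that list maps spk_0 to speaker_2 and spk_1 to speaker_3.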

Once you've created input.txt and placed it in the same directory as process_aws_output.py, run the script with Python:

$ python process_aws_output.py
SUCCESS!

A SUCCESS! message means that all HTML outputs have been written to that same directory.
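
The README does not describe how the script builds its HTML. One possible approach, sketched here for Python 3 and assuming the standard Amazon Transcribe output layout (words and punctuation in results.items, speaker tags in results.speaker_labels.segments), is to group consecutive words by speaker and wrap each turn in a paragraph. speaker_turns and render_html are hypothetical names, not the script's actual functions.

```python
import html


def speaker_turns(transcribe_output, label_map):
    """Group Transcribe words into consecutive (speaker, text) turns.

    Assumes results.items holds words and punctuation, and that
    results.speaker_labels.segments[*].items tags each word's start_time
    with a speaker_label such as "spk_0".
    """
    results = transcribe_output["results"]
    # Look up each word's speaker by its start_time (punctuation has none).
    label_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            label_at[item["start_time"]] = item["speaker_label"]

    turns = []  # each entry is [speaker_name, text]
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if turns:
                turns[-1][1] += content  # attach "," or "." to the previous word
            continue
        label = label_at.get(item["start_time"], "unknown")
        name = label_map.get(label, label)  # fall back to the raw label
        if turns and turns[-1][0] == name:
            turns[-1][1] += " " + content
        else:
            turns.append([name, content])
    return [(name, text) for name, text in turns]


def render_html(job_name, turns):
    """Render turns as a minimal HTML page titled with the jobName."""
    paragraphs = "\n".join(
        "<p><b>%s:</b> %s</p>" % (html.escape(name), html.escape(text))
        for name, text in turns
    )
    return "<html><head><title>%s</title></head><body>\n%s\n</body></html>" % (
        html.escape(job_name), paragraphs)
```

Here label_map is a dict such as {"spk_0": "speaker_1", "spk_1": "speaker_2"}; labels without a mapped name are shown as-is instead of raising an error.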

Please don't hesitate to ask questions or request changes or improvements via the Issues section.

If you're feeling generous, donations are welcome:

BTC: 1QFNgTV3GQby8uv3mXwLKBHAgKUEenSREd

ETH: 0xa7350d9fb3c6193759b587bb984f0dfe3568c8ed

LTC: LW3SNJ61CXUfRQTpehpDfV7vv1iVdLh9En

ADA: DdzFFzCqrhtBbS7o5LQ3u1ZxFVz3Q6b2bQ86FEYanf6UsRgK6D3So4grpZEHPXcitQWEuRfnAA7jzi3xmj9Md6kng2UiVn4QLxEsAefK

BCH: 1QFNgTV3GQby8uv3mXwLKBHAgKUEenSREd

Contributors

crypto-jeronimo

Issues

list index out of range

I have a JSON file from Amazon with two speakers, but I get the error below:
Traceback (most recent call last):
  File "process_aws_output.py", line 114, in <module>
    run('input.txt')
  File "process_aws_output.py", line 109, in run
    parse_raw_transcription(fname, speakers)
  File "process_aws_output.py", line 96, in parse_raw_transcription
    html = build_html(lines, end_times, job_name, speaker_names)
  File "process_aws_output.py", line 81, in build_html
    html, current_speaker = _update_speaker(html, range_speaker)
  File "process_aws_output.py", line 31, in update_speaker
    speaker = speaker_map[int(speaker_name[-1])]
IndexError: list index out of range
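
The last frame indexes speaker_map with int(speaker_name[-1]), i.e. the final character of a label such as spk_2. Assuming speaker_map is the list of names from the matching input.txt line (the traceback does not confirm this), one plausible cause is that Transcribe assigned more speaker labels than names were listed, for example spk_0, spk_1 and spk_2 with only two names. Reading only the last character would also turn spk_10 into index 0. A hypothetical, more defensive lookup:

```python
import re


def speaker_name_for(label, speakers):
    """Return the name for a label like "spk_12", or fail with a clear message.

    Hypothetical helper: parses the whole number after "spk_" instead of
    only the last character, and reports a missing name explicitly.
    """
    match = re.match(r"spk_(\d+)$", label)
    if match is None:
        raise ValueError("unexpected speaker label: %r" % label)
    index = int(match.group(1))
    if index >= len(speakers):
        raise ValueError(
            "%s needs at least %d speaker names in input.txt, got %d"
            % (label, index + 1, len(speakers)))
    return speakers[index]
```

With a check like this, the error names the label that has no matching entry in input.txt instead of a bare IndexError.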

Speaker labels

Hi,

This script assumes there will be at least two speakers and that the speakers are labeled. I have a .json file with only one person speaking. I don't see any speaker labels in it, so I get an error when I run the program. Any suggestions?
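
Amazon Transcribe only includes results.speaker_labels when speaker identification was enabled for the job, so a transcript produced without it has no speaker section at all. A minimal sketch, assuming the standard output layout, that falls back to attributing every word to spk_0 when that section is missing; speaker_label_lookup is a hypothetical name, not part of process_aws_output.py.

```python
def speaker_label_lookup(results):
    """Map each word's start_time to a speaker label.

    Hypothetical helper: if the job ran without speaker identification,
    results has no "speaker_labels" key, so every word is assigned "spk_0".
    """
    speaker_labels = results.get("speaker_labels")
    if speaker_labels is None:
        # Single-speaker fallback: only pronunciation items carry a start_time.
        return {
            item["start_time"]: "spk_0"
            for item in results["items"]
            if item["type"] == "pronunciation"
        }
    return {
        item["start_time"]: item["speaker_label"]
        for segment in speaker_labels["segments"]
        for item in segment["items"]
    }
```

With a fallback like this, the input.txt line for such a file would list a single name, for example episode_1.json;speaker_1.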