saurabhshri / ccaligner Goto Github PK

🔮 Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.

CMake 0.92% C++ 72.14% Python 4.15% Shell 0.37% Objective-C 1.44% Ruby 0.04% MATLAB 0.05% Batchfile 0.01% Java 3.39% C 15.21% Objective-C++ 0.92% Assembly 0.07% HTML 0.07% JavaScript 0.29% Makefile 0.25% M4 0.35% Perl 0.13% Roff 0.18% C# 0.01% Yacc 0.02%

subtitles aligner subtitle-alignment closed-captions forced-alignment word-level-alignment transcription karaoke api cli

ccaligner's Issues

Can't create vocabulary

I have tried many times with different parameters and several test files, but I always get an error.

Here is the error shown on the screen:
grammar_tools.cpp (403) : generate | Something went wrong while creating vocabulary!

PS: test.wav is just white noise, and test.srt just contains some test subtitles.
test.zip

CCAligner crashes when srt timestamp is larger than the audio file

For example, CCAligner will crash when the audio file is only 5 seconds long but the srt file contains subtitles that start or end at 6 seconds.
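A guard against this case can be sketched as follows. This is a minimal illustration in Python, not CCAligner's own code (which is C++): the function names `audio_duration_ms` and `split_by_duration` are hypothetical, and cues are assumed to be available as `(start_ms, end_ms, text)` tuples.

```python
import wave


def audio_duration_ms(wav_file) -> int:
    """Return the duration of a WAV file (path or file object) in milliseconds."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() * 1000 // wav.getframerate()


def split_by_duration(cues, duration_ms):
    """Split (start_ms, end_ms, text) cues into those inside and outside the audio."""
    inside, outside = [], []
    for cue in cues:
        start_ms, end_ms, _text = cue
        if 0 <= start_ms <= end_ms <= duration_ms:
            inside.append(cue)
        else:
            outside.append(cue)
    return inside, outside
```

Cues in the `outside` list could then be reported to the user and skipped rather than passed to the recogniser.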

Heap buffer overflow in pocketsphinx

There is a heap buffer overflow error in PocketsphinxAligner::recognise. It can be reproduced with the latest master and the following files by executing ./ccaligner -wav Math.wav -srt Math.srt. Here's the complete log for anyone who wants to investigate (I have no idea what causes this, sorry):

==16346==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62e00001668a at pc 0x0000004a9d79 bp 0x7ffccbbcf1b0 sp 0x7ffccbbce950
READ of size 320 at 0x62e00001668a thread T0
    #0 0x4a9d78 in memcpy /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:739:5
    #1 0xdbafab in fe_shift_frame (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdbafab)
    #2 0xdb84e4 in fe_process_frames_ext (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdb84e4)
    #3 0xdb80a9 in fe_process_frames (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdb80a9)
    #4 0xd80ed2 in acmod_process_raw (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xd80ed2)
    #5 0xd78aa3 in ps_process_raw (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xd78aa3)
    #6 0x9ad10c in PocketsphinxAligner::recognise() /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:477:19
    #7 0x9afe4b in PocketsphinxAligner::align() /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:557:13
    #8 0x56044f in CCAligner::initAligner() /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:58:42
    #9 0x560abe in main /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:76:28
    #10 0x7fcc542e8f69 in __libc_start_main (/usr/lib/libc.so.6+0x20f69)
    #11 0x48f599 in _start (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0x48f599)

0x62e00001668a is located 0 bytes to the right of 41610-byte region [0x62e00000c400,0x62e00001668a)
allocated by thread T0 here:
    #0 0x55cb62 in operator new(unsigned long) /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:92:3
    #1 0x937f38 in std::__1::__allocate(unsigned long) /usr/bin/../include/c++/v1/new:228:10
    #2 0x937f38 in std::__1::allocator<short>::allocate(unsigned long, void const*) /usr/bin/../include/c++/v1/memory:1790
    #3 0x937f38 in std::__1::allocator_traits<std::__1::allocator<short> >::allocate(std::__1::allocator<short>&, unsigned long) /usr/bin/../include/c++/v1/memory:1544
    #4 0x937f38 in std::__1::vector<short, std::__1::allocator<short> >::allocate(unsigned long) /usr/bin/../include/c++/v1/vector:937
    #5 0x9cd670 in _ZNSt3__16vectorIsNS_9allocatorIsEEE6assignIPsEENS_9enable_ifIXaasr21__is_forward_iteratorIT_EE5valuesr16is_constructibleIsNS_15iterator_traitsIS7_E9referenceEEE5valueEvE4typeES7_S7_ /usr/bin/../include/c++/v1/vector:1414:9
    #6 0x979bc7 in std::__1::vector<short, std::__1::allocator<short> >::operator=(std::__1::vector<short, std::__1::allocator<short> > const&) /usr/bin/../include/c++/v1/vector:1359:9
    #7 0x979bc7 in PocketsphinxAligner::PocketsphinxAligner(Params*) /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:45
    #8 0x560446 in CCAligner::initAligner() /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:58:9
    #9 0x560abe in main /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:76:28
    #10 0x7fcc542e8f69 in __libc_start_main (/usr/lib/libc.so.6+0x20f69)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:739:5 in memcpy
Shadow bytes around the buggy address:
  0x0c5c7fffac80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffac90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffaca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffacb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffacc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c5c7fffacd0: 00[02]fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7ffface0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffacf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffad00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffad10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffad20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==16346==ABORTING
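The log shows a 320-byte read starting exactly at the end of the 41610-byte sample buffer (20805 16-bit samples) allocated in the PocketsphinxAligner constructor. One plausible reading is that the sample range handed to ps_process_raw extends past the end of that buffer, though the trace alone does not prove it. Under that assumption, clamping the requested window to the buffer size would avoid the overrun. The sketch below is in Python for illustration only; `clamp_sample_window` is a hypothetical helper, not a function in CCAligner.

```python
SAMPLE_RATE_HZ = 16000  # 16 kHz mono input, as mentioned elsewhere in these issues


def clamp_sample_window(start_ms, end_ms, total_samples, sample_rate=SAMPLE_RATE_HZ):
    """Convert a millisecond range to [first, last) sample indices inside the buffer."""
    first = max(0, start_ms * sample_rate // 1000)
    last = max(0, end_ms * sample_rate // 1000)
    # Never start or end beyond the last available sample.
    first = min(first, total_samples)
    last = min(max(last, first), total_samples)
    return first, last
```

With this guard, a subtitle that starts after the end of the audio yields an empty window (`first == last`), so nothing is passed to the decoder.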

[Build] Test building on Windows.

The program is built and tested on Linux and Mac, but should also work on Windows after some modifications. I don't currently have a Windows machine available, so I could not try it. I'm hoping someone can try building CCAligner on Windows and report and/or fix the build errors.

Grammar tools will probably not be working directly on Windows as they use Unix binaries right now, but I'll open a separate issue for it.

Improve logging.

There are logger functions implemented in the program (see: https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/commons.cpp), but logging is not implemented everywhere. Use those functions to perform proper logging at the relevant places.

Find and integrate a text tokenisation library.

The current implementation of text tokenisation is pretty naive and doesn't cover all cases. A good tokenisation library should be able to generate all possible text tokens for currency, dates, numbers, symbols, etc.

For example:

In 1996, 1996 people sent emails at someone @ example . com at 1:30 PM.

In nineteen ninety six, one thousand nine hundred and ninety six people sent emails at someone at example dot com at one thirty p m

and all the alternative versions.

The library needs to be integrated into the subtitle parser (srtparser.h).
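To make the "1996" example concrete, here is a rough sketch of how one numeric token could be expanded into alternative spoken forms. It is written in Python purely for illustration, covers only integers from 0 to 9999, and all names (`cardinal`, `year_reading`, `expansions`) are hypothetical. A real tokenisation library would also need to handle currency, times, e-mail addresses and so on.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n):
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])


def cardinal(n):
    """Spell out 0..9999 as a quantity, e.g. 'one thousand nine hundred and ninety six'."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, tail = divmod(rest, 100)
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if tail or not parts:
        if parts:
            parts.append("and")
        parts.append(below_hundred(tail))
    return " ".join(parts)


def year_reading(n):
    """Read a four-digit number as a year, e.g. 'nineteen ninety six'."""
    high, low = divmod(n, 100)
    if low == 0:
        return cardinal(n)  # simplification: no special case for round years
    if low < 10:
        return below_hundred(high) + " oh " + ONES[low]
    return below_hundred(high) + " " + below_hundred(low)


def expansions(token):
    """Return the alternative spoken forms of a token (unchanged if not numeric)."""
    if not token.isdigit() or int(token) > 9999:
        return [token]
    value = int(token)
    if len(token) == 4 and value >= 1000:
        return [year_reading(value), cardinal(value)]
    return [cardinal(value)]
```

Both readings are returned because, as in the example sentence, the same digits can be a year or a count, and only the audio can disambiguate them.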

Docker Build

A Docker image of this with a simple API to access the commands would be awesome! I am having trouble getting the dependencies set up, and it would be extremely nice if they were already set up and you just needed to pull a Docker image.

CCAligner crashes while recognizing and aligning

Following commit 96ce9a7

CCAligner crashes after initializing the pocketsphinx decoder. Note that the program runs successfully on previous commits with same input files and parameters.
OS: Windows 7
Parameters: default (-wav file -srt file)
The program also runs with transcription turned on.

Segmentation fault CCAligner errors

There are two errors that cause a Segmentation fault (core dumped).

When we try to read .wav data from /dev/null (or just nul on Windows), we get a Segmentation fault error.

When we try to process an empty .wav file, we also get a Segmentation fault error.

An empty .wav file
BBC.zip
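Both cases could be caught by validating the input before any samples are read. The sketch below uses Python's standard `wave` module to show the kind of checks involved; it is an illustration, not CCAligner's C++ reader, and `check_wav` is a hypothetical name. The expected format (16-bit PCM, mono, 16 kHz) is the one quoted elsewhere in these issues.

```python
import wave


def check_wav(source):
    """Return a list of problems found in a WAV file; an empty list means usable."""
    try:
        with wave.open(source, "rb") as wav:
            problems = []
            if wav.getnchannels() != 1:
                problems.append("expected mono audio")
            if wav.getsampwidth() != 2:
                problems.append("expected 16-bit samples")
            if wav.getframerate() != 16000:
                problems.append("expected a 16 kHz sample rate")
            if wav.getnframes() == 0:
                problems.append("file contains no audio frames")
            return problems
    except (wave.Error, EOFError) as exc:
        # Empty input (such as /dev/null) or a missing RIFF header ends up here.
        return [f"not a readable WAV file: {exc!r}"]
```

Returning a list of problems instead of raising lets the caller print every issue at once and exit cleanly rather than crash.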

Failed to create recognizer

I have been trying to install ccaligner for a few days and keep getting different errors, mostly related to dependencies. I even tried installing it on a fresh Ubuntu 18.04 LTS. I am getting this error.
Screen shot --

Segmentation fault (core dumped) stops the program immediately

Any ideas on this 'Segmentation fault (core dumped)' error? I'm running on Ubuntu 16.04 and made sure to resample the audio files.

Cheers

Reverse-Inception?

Not a bug, but an enhancement / feature request.

How about going one layer shallower and supporting the conversion of transcripts (txt) to subtitles (srt) (example: https://www.grc.com/sn/sn-676.txt for the audio from https://www.youtube.com/watch?v=stUjByfyLfk )? This would remove the biggest burden (synchronization) from subtitling. It looks like you already did the heavy lifting.

Just a thought; thanks for your time.

[Feature Request] Pass raw samples as input.

Instead of passing a wave file, allow passing raw samples directly. Introduce a new parameter -raw (see /src/lib_ccaligner/params.cpp) and store it directly in _samples in WaveFileData (https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/read_wav_file.h#L27) so that they can be used from there.
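As an illustration of what "raw samples" would mean in practice, the sketch below decodes a headerless byte stream into signed 16-bit integers. It assumes little-endian 16-bit PCM, which the request does not specify, and `decode_raw_pcm16` is a hypothetical name. The actual feature would fill `_samples` in the C++ code instead.

```python
import struct


def decode_raw_pcm16(raw: bytes) -> list:
    """Decode headerless little-endian 16-bit PCM bytes into a list of samples."""
    if len(raw) % 2 != 0:
        raise ValueError("raw PCM16 data must contain an even number of bytes")
    count = len(raw) // 2
    # '<' = little-endian, 'h' = signed 16-bit integer
    return list(struct.unpack(f"<{count}h", raw))
```

Because raw input has no header, the sample rate and channel count cannot be verified and would have to be trusted or passed as extra parameters.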

Tested CCAligner

I tested CCAligner's error handling. First I checked whether it correctly handles damaged files, files with the wrong extension (not .wav), and .wav files in the wrong format (anything other than 16-bit PCM mono sampled at 16 kHz).
Then I ran a valid file with wrong parameters, and the process was aborted with the error InvalidParameters. Finally, I ran CCAligner with correct options and got an .srt file with subtitles inside.

1. Tried to open a file that doesn't exist, and then a file that exists but has the wrong extension. (Error was handled) +

2. Tried to process a file without grammar installed on my computer. (Error was handled) +

3. Tried to process a file in the wrong format (not 16-bit PCM mono sampled at 16 kHz). (Error was handled) +

4. Ran CCAligner with wrong parameters. (Error was handled) +

5. Manually damaged a .wav file and processed it with CCAligner. I opened the file in a text editor and added a few characters that shouldn't be there, such as random letters and numbers. The file became damaged (a few noises can be heard when playing it). Then I got a "Core dumped" error, which wasn't handled. (Error wasn't handled) -

Damaged file:
https://drive.google.com/open?id=1Xx8fm2louuJg_VNbW6l0Izbl5SgfgRLG

Mac installation hiccup

My apologies if this is not the way to do this, but I've always found GitHub very confusing to use. For decades now, I'm afraid.

Under the Linux/Mac dependencies installation, I tried to run sudo python setup.py install, but I got:

running install
running bdist_egg
running egg_info
writing requirements to g2p_seq2seq.egg-info/requires.txt
writing g2p_seq2seq.egg-info/PKG-INFO
writing top-level names to g2p_seq2seq.egg-info/top_level.txt
writing dependency_links to g2p_seq2seq.egg-info/dependency_links.txt
writing entry points to g2p_seq2seq.egg-info/entry_points.txt
reading manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.15-x86_64/egg
running install_lib
running build_py
creating build/bdist.macosx-10.15-x86_64/egg
creating build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/seq2seq_model.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/__init__.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/data_utils.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/app.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/g2p.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/seq2seq_model.py to seq2seq_model.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/data_utils.py to data_utils.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/app.py to app.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/g2p.py to g2p.pyc
creating build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/PKG-INFO -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/SOURCES.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/dependency_links.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/entry_points.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/requires.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/top_level.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/g2p_seq2seq-5.0.0a0-py2.7.egg' and adding 'build/bdist.macosx-10.15-x86_64/egg' to it
removing 'build/bdist.macosx-10.15-x86_64/egg' (and everything under it)
Processing g2p_seq2seq-5.0.0a0-py2.7.egg
Removing /Library/Python/2.7/site-packages/g2p_seq2seq-5.0.0a0-py2.7.egg
Copying g2p_seq2seq-5.0.0a0-py2.7.egg to /Library/Python/2.7/site-packages
g2p-seq2seq 5.0.0a0 is already the active version in easy-install.pth
Installing g2p-seq2seq script to /usr/local/bin

Installed /Library/Python/2.7/site-packages/g2p_seq2seq-5.0.0a0-py2.7.egg
Processing dependencies for g2p-seq2seq==5.0.0a0
Searching for termcolor>=1.1.0
Reading https://pypi.org/simple/termcolor/
Downloading https://files.pythonhosted.org/packages/c1/ee/ad1f448e360e4b662fbff9e75cd210b73ad79998ce6483086e9df5b8e7e2/termcolor-2.0.1.tar.gz#sha256=6b2cf769e93364a2676e1de56a7c0cff2cf5bd07f37e9cc80b0dd6320ebfe388
Best match: termcolor 2.0.1
Processing termcolor-2.0.1.tar.gz
error: Couldn't find a setup script in /tmp/easy_install-YKz0m5/termcolor-2.0.1.tar.gz

Did I do something wrong? Is the problem with the code? I am not a programmer; I just need to align subtitles.

Thank you.

Changes for TF 1.13.1 and CUDA 10.0

Getting error AttributeError: module 'tensorflow.contrib.rnn' has no attribute 'core_rnn_cell'

My TF is 1.13.1 and CUDA is 10.1

I'm guessing this is because of an incompatibility between my version of TF and the presumed 1.0.0 version. However, I do not want to downgrade my CUDA from 10 to 8 just to use TF 1.0.0.

To fix this, I:

Made the minor code change suggested in this closed PR
Downloaded the current seq2seq, unzipped into the dependencies directory, and installed
Recompiled using build.sh as described in the README
Replaced the install/g2p-seq2seq-cmudict directory with one matching the seq2seq version

This seems to be working with my configuration. Since the aforementioned PR was declined, I'm documenting this only as an issue.

Failed to create recognizer

I couldn't figure out why this error is popping up:

Program aborted because an exception has occurred.
Exception details:
Type: 12UnknownError.
Reason: [11-07 21:01:28][Fatal] /home/Sarthak/CCAligner/src/lib_ccaligner/recognize_using_pocketsphinx.cpp (148) : initDecoder | Failed to create recognizer, see log for details

Please help

Memory Leaks

Running valgrind on ccaligner gives -

Invalid JSON

A few small changes appear to be needed for the JSON output to be valid:

escape double quotes in subtitle
add commas between subtitle objects

Example (the :exclamation: markers flag the problem spots):

"subtitles": [
{
"subtitle" : "<font size=:exclamation:"24":exclamation:>Announcer: AND NOW A FIRESIDE CHAT",
"edited_text" : "AND NOW A FIRESIDE CHAT",
"start" : "3103",
"end" : "5606",
"words" : [
{
"word" : "AND",
"recognised" : "0",
"start" : "3103",
"end" : "3133",
"duration" : "30"
},
{
"word" : "NOW",
"recognised" : "1",
"start" : "3133",
"end" : "3693",
"duration" : "560"
},
{
"word" : "A",
"recognised" : "1",
"start" : "3703",
"end" : "3793",
"duration" : "90"
},
{
"word" : "FIRESIDE",
"recognised" : "1",
"start" : "3803",
"end" : "4313",
"duration" : "510"
},
{
"word" : "CHAT",
"recognised" : "1",
"start" : "4323",
"end" : "4603",
"duration" : "280"
}
],
"phonemes" : [
]
}:exclamation: {
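Both problems go away when the output is produced by a JSON serialiser instead of by string concatenation. The sketch below shows the idea with Python's standard `json` module; CCAligner itself is C++, so this is only an illustration of the expected result, and `build_output` plus the second subtitle entry are hypothetical.

```python
import json


def build_output(subtitles):
    """Serialise subtitle dictionaries; the serialiser escapes quotes and adds commas."""
    return json.dumps({"subtitles": subtitles}, indent=2)


example = [
    {
        "subtitle": '<font size="24">Announcer: AND NOW A FIRESIDE CHAT',
        "edited_text": "AND NOW A FIRESIDE CHAT",
        "start": "3103",
        "end": "5606",
    },
    # Hypothetical second entry, only here to show the separating comma.
    {"subtitle": "SECOND LINE", "edited_text": "SECOND LINE", "start": "5700", "end": "6000"},
]

text = build_output(example)
```

The resulting `text` contains `size=\"24\"` with escaped quotes and parses back to the original data with `json.loads`.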

So can this turn a paragraph into a synced subtitle, or can it only modify an existing subtitle?

Before I delve into the project, I would like to learn this.

Can we provide an entire paragraph of transcript and expect it to generate synced subtitles from it, like YouTube does?

Thank you

Fix dependencies in Windows.

For grammar and language model generation, CCAligner has some dependencies. The current implementation requires the user to have them installed; the code calls them using system calls and then processes their output. It currently works only on Linux/Unix.

Both dependencies can be compiled on Windows (list: https://github.com/saurabhshri/CCAligner#installing-dependencies). Relevant file: https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/grammar_tools.cpp.

Add support for these dependencies in Windows.

[Feature Request] Allow passing raw text transcript.

Allow passing text transcript instead of subtitles. Add a new parameter -txt. When this mode is chosen, do not allow normal word level synchronisation, but only allow complete timed transcription.

initDecoder | Failed to create recognizer

I'm using this CLI:
./ccaligner -wav ../test/data/goforward.wav -srt ../test/data/goforward.srt --generate-grammar no
With the files provided for testing in the repo.

I tried the installation with TF 1.0.0 and the packages provided in the repository.
After that I installed TF 1.13.0, as recommended in PR #88, with no success.

I don't know if it is a problem with my build or installation. I can't get it to work. Any help would be appreciated.

Python 3.5.1 Packages
https://pastebin.com/X8sb67Mk

CCAligner logs:
https://pastebin.com/DHh0mecr
https://pastebin.com/DcUjffss

Handle more errors!

Error handling is extremely important but is missing in many crucial places. There are some helper functions such as fatal() (see: https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/commons.cpp). Add error handling for those cases to make the program more robust.

[Feature Request] Changing the way information is displayed

Right now a lot of data is printed to stdout in the report: not only [Info] and [Fatal] but also [Debug] and [Verbose].

In my opinion this is too much data for the user to read. So I suggest not printing [Debug] and [Verbose] in normal mode and adding an option like -debug (or something similar) to print this data.
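The suggested behaviour amounts to a level threshold that a debug switch lowers. A minimal sketch, in Python for illustration only: the `Level` ordering and the `format_log` helper are assumptions, not CCAligner's actual logger API.

```python
from enum import IntEnum


class Level(IntEnum):
    VERBOSE = 0
    DEBUG = 1
    INFO = 2
    FATAL = 3


def format_log(level, message, debug_mode=False):
    """Return the line to print, or None when the message is suppressed."""
    if level < Level.INFO and not debug_mode:
        return None  # hide [Verbose] and [Debug] unless debugging was requested
    return f"[{level.name.capitalize()}] {message}"
```

Here `debug_mode` stands in for the proposed -debug option: with it set, all four levels are printed; without it, only [Info] and [Fatal] reach the user.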

Which movie player can recognize the generated file.xml?

Which movie player can recognize the generated file.xml, or do I need to convert it to srt or ssa format?
I use PotPlayer to play movies on Windows.
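If conversion turns out to be necessary, writing SRT from timed text is straightforward. The sketch below assumes the timings have already been extracted as `(start_ms, end_ms, text)` tuples. Parsing the XML is not shown because its schema is not described here, and the function names are hypothetical.

```python
def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from (start_ms, end_ms, text) tuples."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, `to_srt([(3103, 5606, "AND NOW A FIRESIDE CHAT")])` produces a single cue running from 00:00:03,103 to 00:00:05,606.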
