saurabhshri / ccaligner
Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.
I have tried many times with different parameters and test files, but every run ends with an error.
Here is the error shown on the screen:
grammar_tools.cpp (403) : generate | Something went wrong while creating vocabulary!
PS: test.wav is just white noise, and test.srt contains only a few test subtitles.
test.zip
For example, CCAligner crashes when the audio file is only 5 seconds long but the srt file contains a subtitle that starts or ends at 6 seconds.
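One possible guard, as a minimal sketch: clamp each subtitle's time window to the number of samples actually available before slicing the audio. The helper name clampWindow and the default 16000 Hz sample rate are assumptions made for illustration; this is not CCAligner's own code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper, not part of CCAligner: convert a subtitle's
// [startMs, endMs] window into a sample range that is guaranteed to lie
// inside a buffer of totalSamples samples.
// Returns {firstSample, sampleCount}; the count is 0 when the window
// starts at or past the end of the audio.
std::pair<std::size_t, std::size_t> clampWindow(std::int64_t startMs,
                                                std::int64_t endMs,
                                                std::size_t totalSamples,
                                                std::int64_t sampleRate = 16000)
{
    assert(sampleRate > 0);

    // Negative starts are moved to 0; inverted windows become empty.
    if (startMs < 0) startMs = 0;
    if (endMs < startMs) endMs = startMs;

    const std::int64_t total = static_cast<std::int64_t>(totalSamples);
    const std::int64_t first = std::min(startMs * sampleRate / 1000, total);
    const std::int64_t last  = std::min(endMs * sampleRate / 1000, total);

    return {static_cast<std::size_t>(first),
            static_cast<std::size_t>(last - first)};
}
```

With a 5 second file (80000 samples at 16 kHz), a subtitle at 6 to 7 seconds yields an empty range instead of a read past the end of the buffer, and the caller can skip it or log a warning.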
There is a heap buffer overflow error in PocketsphinxAligner::recognise. It can be reproduced with the latest master and the following files by executing ./ccaligner -wav Math.wav -srt Math.srt. Here's the complete log for someone who wants to investigate (I have no idea what causes this, sorry):
==16346==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62e00001668a at pc 0x0000004a9d79 bp 0x7ffccbbcf1b0 sp 0x7ffccbbce950
READ of size 320 at 0x62e00001668a thread T0
#0 0x4a9d78 in memcpy /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:739:5
#1 0xdbafab in fe_shift_frame (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdbafab)
#2 0xdb84e4 in fe_process_frames_ext (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdb84e4)
#3 0xdb80a9 in fe_process_frames (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdb80a9)
#4 0xd80ed2 in acmod_process_raw (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xd80ed2)
#5 0xd78aa3 in ps_process_raw (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xd78aa3)
#6 0x9ad10c in PocketsphinxAligner::recognise() /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:477:19
#7 0x9afe4b in PocketsphinxAligner::align() /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:557:13
#8 0x56044f in CCAligner::initAligner() /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:58:42
#9 0x560abe in main /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:76:28
#10 0x7fcc542e8f69 in __libc_start_main (/usr/lib/libc.so.6+0x20f69)
#11 0x48f599 in _start (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0x48f599)
0x62e00001668a is located 0 bytes to the right of 41610-byte region [0x62e00000c400,0x62e00001668a)
allocated by thread T0 here:
#0 0x55cb62 in operator new(unsigned long) /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:92:3
#1 0x937f38 in std::__1::__allocate(unsigned long) /usr/bin/../include/c++/v1/new:228:10
#2 0x937f38 in std::__1::allocator<short>::allocate(unsigned long, void const*) /usr/bin/../include/c++/v1/memory:1790
#3 0x937f38 in std::__1::allocator_traits<std::__1::allocator<short> >::allocate(std::__1::allocator<short>&, unsigned long) /usr/bin/../include/c++/v1/memory:1544
#4 0x937f38 in std::__1::vector<short, std::__1::allocator<short> >::allocate(unsigned long) /usr/bin/../include/c++/v1/vector:937
#5 0x9cd670 in _ZNSt3__16vectorIsNS_9allocatorIsEEE6assignIPsEENS_9enable_ifIXaasr21__is_forward_iteratorIT_EE5valuesr16is_constructibleIsNS_15iterator_traitsIS7_E9referenceEEE5valueEvE4typeES7_S7_ /usr/bin/../include/c++/v1/vector:1414:9
#6 0x979bc7 in std::__1::vector<short, std::__1::allocator<short> >::operator=(std::__1::vector<short, std::__1::allocator<short> > const&) /usr/bin/../include/c++/v1/vector:1359:9
#7 0x979bc7 in PocketsphinxAligner::PocketsphinxAligner(Params*) /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:45
#8 0x560446 in CCAligner::initAligner() /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:58:9
#9 0x560abe in main /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:76:28
#10 0x7fcc542e8f69 in __libc_start_main (/usr/lib/libc.so.6+0x20f69)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:739:5 in memcpy
Shadow bytes around the buggy address:
0x0c5c7fffac80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c5c7fffac90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c5c7fffaca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c5c7fffacb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c5c7fffacc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c5c7fffacd0: 00[02]fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c5c7ffface0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c5c7fffacf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c5c7fffad00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c5c7fffad10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c5c7fffad20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==16346==ABORTING
The program is built and tested on Linux and Mac, but should also work on Windows after some modifications. I don't have a Windows machine available to me currently, so I could not try it. I'm hoping someone could try building CCAligner on Windows and report and/or fix the build errors.
Grammar tools will probably not be working directly on Windows as they use Unix binaries right now, but I'll open a separate issue for it.
There are logger functions implemented in the program (see: https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/commons.cpp), but logging is not implemented everywhere. Use those functions to perform proper logging at the relevant places.
The current implementation of text tokenisation is pretty naive and doesn't cover all cases. A good tokenisation library should be able to generate all possible spoken forms of text tokens such as currency, dates, numbers, symbols, etc.
For example :
In 1996, 1996 people sent emails at someone @ example . com at 1:30 PM.
In nineteen ninety six, one thousand nine hundred and ninety six people sent emails at someone at example dot com at one thirty p m
and all the alternative versions.
The library needs to be integrated in subtitle parser (srtparser.h).
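To make the idea concrete, here is a minimal sketch of generating two alternative readings for a four-digit number, matching the "1996" example above. The function names cardinal and yearReading are hypothetical, the range is limited to 0..9999, and currency, dates, times and e-mail addresses are not covered.

```cpp
#include <cassert>
#include <string>

namespace {
const char* const kOnes[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen"};
const char* const kTens[] = {"", "", "twenty", "thirty", "forty",
                             "fifty", "sixty", "seventy", "eighty", "ninety"};

// Spell out 0..99, e.g. 96 -> "ninety six".
std::string belowHundred(int n)
{
    if (n < 20) return kOnes[n];
    std::string out = kTens[n / 10];
    if (n % 10 != 0) out += std::string(" ") + kOnes[n % 10];
    return out;
}
}  // namespace

// Cardinal reading for 0..9999,
// e.g. 1996 -> "one thousand nine hundred and ninety six".
std::string cardinal(int n)
{
    assert(n >= 0 && n <= 9999);
    if (n < 100) return belowHundred(n);

    std::string out;
    if (n >= 1000) out = std::string(kOnes[n / 1000]) + " thousand";
    const int hundreds = (n % 1000) / 100;
    if (hundreds != 0) {
        if (!out.empty()) out += " ";
        out += std::string(kOnes[hundreds]) + " hundred";
    }
    const int rest = n % 100;
    if (rest != 0) out += " and " + belowHundred(rest);
    return out;
}

// Pairwise "year" reading for 1000..9999, e.g. 1996 -> "nineteen ninety six".
// Years such as 2000..2009 are usually spoken as cardinals; that special
// case is deliberately not handled in this sketch.
std::string yearReading(int year)
{
    assert(year >= 1000 && year <= 9999);
    const int hi = year / 100;
    const int lo = year % 100;
    if (lo == 0) return belowHundred(hi) + " hundred";
    if (lo < 10) return belowHundred(hi) + " oh " + kOnes[lo];
    return belowHundred(hi) + " " + belowHundred(lo);
}
```

A tokeniser could emit both readings as alternatives for the same token and let the aligner's grammar accept either one.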
A Docker image of this with a simple API to access the commands would be awesome! I am having trouble getting the dependencies set up, and it would be extremely nice if they were already set up so that you only needed to pull a Docker image.
Following commit 96ce9a7, CCAligner crashes after initializing the pocketsphinx decoder. Note that the program runs successfully on previous commits with the same input files and parameters.
OS: Windows 7
Parameters: default (-wav file -srt file)
The program also runs with transcription turned on.
There are two cases that cause a Segmentation fault (core dumped).
When we try to read .wav data from /dev/null (try just nul on Windows), we get a Segmentation fault.
When we try to process an empty .wav file, we also get a Segmentation fault.
An empty .wav file
BBC.zip
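Both cases could be rejected before any audio is processed. The following is a minimal sketch of such a pre-check, assuming the canonical 44-byte PCM WAV header layout and the 16 bit PCM mono 16 kHz input that CCAligner expects. The names readLE16 and isSupportedWav are hypothetical, and real WAV files may carry extra chunks that this check would wrongly reject.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Read a little-endian 16-bit field; the caller guarantees the offset is valid.
static unsigned readLE16(const std::vector<unsigned char>& b, std::size_t off)
{
    assert(off + 2 <= b.size());
    return b[off] | (b[off + 1] << 8);
}

// Hypothetical pre-check, not CCAligner's actual reader. Returns false for
// empty input (such as the zero bytes read from /dev/null), truncated
// headers, wrong magic values, or an unsupported sample format.
bool isSupportedWav(const std::vector<unsigned char>& bytes)
{
    constexpr std::size_t kHeaderSize = 44;  // canonical PCM header
    if (bytes.size() < kHeaderSize) return false;

    if (std::memcmp(bytes.data(), "RIFF", 4) != 0) return false;
    if (std::memcmp(bytes.data() + 8, "WAVE", 4) != 0) return false;

    const unsigned format   = readLE16(bytes, 20);  // 1 means PCM
    const unsigned channels = readLE16(bytes, 22);
    const unsigned rate     = readLE16(bytes, 24) | (readLE16(bytes, 26) << 16);
    const unsigned bits     = readLE16(bytes, 34);

    return format == 1 && channels == 1 && rate == 16000 && bits == 16;
}
```

A caller would read the file into a byte vector, run the check, and report a normal error instead of continuing into the decoder with no data.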
Not a bug, but an enhancement / feature request.
How about going one layer shallower and supporting the conversion of transcripts (txt) to subtitles (srt) (example: https://www.grc.com/sn/sn-676.txt for the audio from https://www.youtube.com/watch?v=stUjByfyLfk )? This would remove the biggest burden (synchronization) from subtitling. It looks like you already did the heavy lifting.
Just a thought; thanks for your time.
Instead of passing a wave file, allow passing raw samples directly. Introduce a new parameter -raw (see /src/lib_ccaligner/params.cpp) and store the samples directly in _samples in WaveFileData (https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/read_wav_file.h#L27) so that they can be used from there.
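As a rough sketch of what the decoding step might look like, the function below turns headerless bytes into 16-bit samples. It assumes the raw input is 16-bit little-endian mono PCM, which the issue does not specify, and decodeRawPcm16 is a hypothetical name rather than an existing CCAligner function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoder for a "-raw" style input: headerless 16-bit
// little-endian mono PCM. A trailing odd byte, if any, is ignored.
std::vector<std::int16_t> decodeRawPcm16(const std::vector<unsigned char>& bytes)
{
    std::vector<std::int16_t> samples;
    samples.reserve(bytes.size() / 2);

    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte.
        const std::uint16_t u =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        // Reinterpret as signed; two's complement on all common platforms.
        samples.push_back(static_cast<std::int16_t>(u));
    }

    assert(samples.size() == bytes.size() / 2);
    return samples;
}
```

The resulting vector has the same element type as the samples seen in the sanitizer trace (std::vector<short>), so it could in principle be stored where the wave reader currently puts its data.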
I tested error handling in CCAligner. First, I checked whether it correctly handles damaged files, files with the wrong extension (not .wav), and .wav files in the wrong format (not 16 bit PCM mono sampled at 16 kHz).
Then I ran a valid file with wrong parameters, and the process was aborted with the error InvalidParameters. Finally, I ran CCAligner with correct options and got an .srt file with subtitles inside.
1. Tried to open a file that doesn't exist, and then a file that exists but has the wrong extension. (Error was handled) +
2. Tried to process a file without grammar installed on my computer. (Error was handled) +
3. Tried to process a file in the wrong format (not 16 bit PCM mono sampled at 16 kHz). (Error was handled) +
4. Ran CCAligner with wrong parameters. (Error was handled) +
5. Manually damaged a .wav file and processed it with CCAligner. I opened the file in a text editor and added a few symbols that shouldn't be there, such as random letters and numbers. After that the file was damaged (you can hear some noise when playing it). CCAligner then failed with "Core dumped", which wasn't handled. (Error wasn't handled) -
Damaged file:
https://drive.google.com/open?id=1Xx8fm2louuJg_VNbW6l0Izbl5SgfgRLG
My apologies if this is not the way to do this, but I've always found GitHub very confusing to use. For decades now, I'm afraid.
Under the Linux/Mac dependencies installation, I tried to run sudo python setup.py install, but I got:
running install
running bdist_egg
running egg_info
writing requirements to g2p_seq2seq.egg-info/requires.txt
writing g2p_seq2seq.egg-info/PKG-INFO
writing top-level names to g2p_seq2seq.egg-info/top_level.txt
writing dependency_links to g2p_seq2seq.egg-info/dependency_links.txt
writing entry points to g2p_seq2seq.egg-info/entry_points.txt
reading manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.15-x86_64/egg
running install_lib
running build_py
creating build/bdist.macosx-10.15-x86_64/egg
creating build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/seq2seq_model.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/__init__.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/data_utils.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/app.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/g2p.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/seq2seq_model.py to seq2seq_model.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/data_utils.py to data_utils.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/app.py to app.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/g2p.py to g2p.pyc
creating build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/PKG-INFO -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/SOURCES.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/dependency_links.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/entry_points.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/requires.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/top_level.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/g2p_seq2seq-5.0.0a0-py2.7.egg' and adding 'build/bdist.macosx-10.15-x86_64/egg' to it
removing 'build/bdist.macosx-10.15-x86_64/egg' (and everything under it)
Processing g2p_seq2seq-5.0.0a0-py2.7.egg
Removing /Library/Python/2.7/site-packages/g2p_seq2seq-5.0.0a0-py2.7.egg
Copying g2p_seq2seq-5.0.0a0-py2.7.egg to /Library/Python/2.7/site-packages
g2p-seq2seq 5.0.0a0 is already the active version in easy-install.pth
Installing g2p-seq2seq script to /usr/local/bin
Installed /Library/Python/2.7/site-packages/g2p_seq2seq-5.0.0a0-py2.7.egg
Processing dependencies for g2p-seq2seq==5.0.0a0
Searching for termcolor>=1.1.0
Reading https://pypi.org/simple/termcolor/
Downloading https://files.pythonhosted.org/packages/c1/ee/ad1f448e360e4b662fbff9e75cd210b73ad79998ce6483086e9df5b8e7e2/termcolor-2.0.1.tar.gz#sha256=6b2cf769e93364a2676e1de56a7c0cff2cf5bd07f37e9cc80b0dd6320ebfe388
Best match: termcolor 2.0.1
Processing termcolor-2.0.1.tar.gz
error: Couldn't find a setup script in /tmp/easy_install-YKz0m5/termcolor-2.0.1.tar.gz
Did I do something wrong? Is the problem with the code? I am not a programmer; I just need to align subtitles.
Thank you.
Getting error AttributeError: module 'tensorflow.contrib.rnn' has no attribute 'core_rnn_cell'
My TF is 1.13.1 and CUDA is 10.1
I am guessing this is because of an incompatibility between my version of TF and the presumed 1.0.0 version. However, I do not want to downgrade my CUDA from 10 to 8 just to use TF 1.0.0.
To fix this, I:
This seems to be working with my configuration. Since the aforementioned PR was declined, I'm documenting this only as an issue.
I couldn't figure out why this error is popping up:
Program aborted because an exception has occurred.
Exception details:
Type: 12UnknownError.
Reason: [11-07 21:01:28][Fatal] /home/Sarthak/CCAligner/src/lib_ccaligner/recognize_using_pocketsphinx.cpp (148) : initDecoder | Failed to create recognizer, see log for details
Please help
A few small changes appear to be needed:
Example:
"subtitles": [
{
"subtitle" : "<font size=:exclamation:"24":exclamation:>Announcer: AND NOW A FIRESIDE CHAT",
"edited_text" : "AND NOW A FIRESIDE CHAT",
"start" : "3103",
"end" : "5606",
"words" : [
{
"word" : "AND",
"recognised" : "0",
"start" : "3103",
"end" : "3133",
"duration" : "30"
},
{
"word" : "NOW",
"recognised" : "1",
"start" : "3133",
"end" : "3693",
"duration" : "560"
},
{
"word" : "A",
"recognised" : "1",
"start" : "3703",
"end" : "3793",
"duration" : "90"
},
{
"word" : "FIRESIDE",
"recognised" : "1",
"start" : "3803",
"end" : "4313",
"duration" : "510"
},
{
"word" : "CHAT",
"recognised" : "1",
"start" : "4323",
"end" : "4603",
"duration" : "280"
}
],
"phonemes" : [
]
}:exclamation: {
Before I delve into the project, I would like to learn this.
Can we provide an entire paragraph of transcript and expect it to generate synced subtitles from it, like YouTube does?
Thank you
For grammar and language model generation, CCAligner has some dependencies. The current implementation requires the user to have them installed; the code invokes them through system calls and then processes their output. It currently works only on Linux/Unix.
Both the dependencies can be compiled on Windows (List: https://github.com/saurabhshri/CCAligner#installing-dependencies). Relevant file : (https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/grammar_tools.cpp).
Add support for these dependencies in Windows.
Allow passing a text transcript instead of subtitles. Add a new parameter -txt. When this mode is chosen, do not allow normal word level synchronisation, but only allow complete timed transcription.
I'm using this CLI:
./ccaligner -wav ../test/data/goforward.wav -srt ../test/data/goforward.srt --generate-grammar no
With the provided files for testing in the repo.
I tried the installation with TF 1.0.0 and the packages provided in the repository.
After that I installed TF 1.13.0, as recommended in PR #88, with no success.
I don't know if it is a problem with my build or my installation. I can't get it to work. Any help would be appreciated.
Python 3.5.1 Packages
https://pastebin.com/X8sb67Mk
CCAligner logs:
https://pastebin.com/DHh0mecr
https://pastebin.com/DcUjffss
Error handling is extremely important but is missing at many crucial places. There are some functions such as fatal() (see: https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/commons.cpp). Add error handling for those cases to make the program more robust.
Currently a lot of data is printed to stdout in the report: not only [Info] and [Fatal] but also [Debug] and [Verbose].
In my opinion, this is too much for a user to read. I suggest not printing [Debug] and [Verbose] in normal mode and adding an option such as -debug (or something like that) to print them.
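A minimal sketch of how such filtering could work is shown below. The enum, the function names and the -debug switch are hypothetical; the level names only mirror the tags mentioned above and do not reflect how CCAligner's logger is actually implemented.

```cpp
// Hypothetical levels, ordered from most to least important.
enum class LogLevel { Fatal = 0, Info = 1, Debug = 2, Verbose = 3 };

// Decide whether a message should be printed. With the default threshold
// only [Fatal] and [Info] messages pass.
constexpr bool shouldPrint(LogLevel message, LogLevel threshold = LogLevel::Info)
{
    return static_cast<int>(message) <= static_cast<int>(threshold);
}

// Map a hypothetical "-debug" command line switch onto a threshold:
// when set, everything up to [Verbose] is printed.
constexpr LogLevel thresholdFromFlag(bool debugFlag)
{
    return debugFlag ? LogLevel::Verbose : LogLevel::Info;
}
```

The logger would call shouldPrint once per message, so normal runs stay quiet while a run with the switch keeps the full output for bug reports.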
Which movie players can recognize the generated file.xml? Do I need to convert it to srt or ssa format?
I use PotPlayer to play movies on Windows.
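If a conversion to srt turns out to be necessary, the core of it is formatting the timings. The sketch below assumes the start and end values are in milliseconds, which the values 3103 and 5606 in the JSON example above suggest but which is not confirmed. srtTimestamp and srtCue are hypothetical helper names, and parsing the XML or JSON output is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a non-negative millisecond count as an SRT timestamp (HH:MM:SS,mmm).
std::string srtTimestamp(long ms)
{
    assert(ms >= 0);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02ld:%02ld:%02ld,%03ld",
                  ms / 3600000,        // hours
                  (ms / 60000) % 60,   // minutes
                  (ms / 1000) % 60,    // seconds
                  ms % 1000);          // milliseconds
    return buf;
}

// Build one SRT cue: index line, timing line, text, then a blank line.
std::string srtCue(int index, long startMs, long endMs, const std::string& text)
{
    return std::to_string(index) + "\n" +
           srtTimestamp(startMs) + " --> " + srtTimestamp(endMs) + "\n" +
           text + "\n\n";
}
```

For the subtitle in the JSON example, srtCue(1, 3103, 5606, "AND NOW A FIRESIDE CHAT") produces a cue timed from 00:00:03,103 to 00:00:05,606.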