
saurabhshri / ccaligner


🔮 Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.

CMake 0.92% C++ 72.14% Python 4.15% Shell 0.37% Objective-C 1.44% Ruby 0.04% MATLAB 0.05% Batchfile 0.01% Java 3.39% C 15.21% Objective-C++ 0.92% Assembly 0.07% HTML 0.07% JavaScript 0.29% Makefile 0.25% M4 0.35% Perl 0.13% Roff 0.18% C# 0.01% Yacc 0.02%
subtitles aligner subtitle-alignment closed-captions forced-alignment word-level-alignment transcription karaoke api cli

ccaligner's People

Contributors

aravindkk14, flyingtwigs, harrynull, hemangrajvanshy, himanshu40, navimakarov, nikunj-taneja, rakete1111, reconsolidated, saurabhshri, soumyaranjanbhol2002, sphericalkat


ccaligner's Issues

Can't create vocabulary

I have tried many times with different parameters and several test files, but every run fails with an error.

Here is the error shown on the screen:
grammar_tools.cpp (403) : generate | Something went wrong while creating vocabulary!

PS: test.wav is just white noise, and test.srt contains only a few test subtitles.
test.zip

Heap buffer overflow in pocketsphinx

There is a heap buffer overflow in PocketsphinxAligner::recognise. It can be reproduced with the latest master and the following files by executing ./ccaligner -wav Math.wav -srt Math.srt. Here's the complete log for anyone who wants to investigate (I have no idea what causes this, sorry):

==16346==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62e00001668a at pc 0x0000004a9d79 bp 0x7ffccbbcf1b0 sp 0x7ffccbbce950
READ of size 320 at 0x62e00001668a thread T0
    #0 0x4a9d78 in memcpy /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:739:5
    #1 0xdbafab in fe_shift_frame (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdbafab)
    #2 0xdb84e4 in fe_process_frames_ext (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdb84e4)
    #3 0xdb80a9 in fe_process_frames (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xdb80a9)
    #4 0xd80ed2 in acmod_process_raw (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xd80ed2)
    #5 0xd78aa3 in ps_process_raw (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0xd78aa3)
    #6 0x9ad10c in PocketsphinxAligner::recognise() /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:477:19
    #7 0x9afe4b in PocketsphinxAligner::align() /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:557:13
    #8 0x56044f in CCAligner::initAligner() /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:58:42
    #9 0x560abe in main /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:76:28
    #10 0x7fcc542e8f69 in __libc_start_main (/usr/lib/libc.so.6+0x20f69)
    #11 0x48f599 in _start (/home/blitz/projects/CCAligner-upstream/install/ccaligner+0x48f599)

0x62e00001668a is located 0 bytes to the right of 41610-byte region [0x62e00000c400,0x62e00001668a)
allocated by thread T0 here:
    #0 0x55cb62 in operator new(unsigned long) /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:92:3
    #1 0x937f38 in std::__1::__allocate(unsigned long) /usr/bin/../include/c++/v1/new:228:10
    #2 0x937f38 in std::__1::allocator<short>::allocate(unsigned long, void const*) /usr/bin/../include/c++/v1/memory:1790
    #3 0x937f38 in std::__1::allocator_traits<std::__1::allocator<short> >::allocate(std::__1::allocator<short>&, unsigned long) /usr/bin/../include/c++/v1/memory:1544
    #4 0x937f38 in std::__1::vector<short, std::__1::allocator<short> >::allocate(unsigned long) /usr/bin/../include/c++/v1/vector:937
    #5 0x9cd670 in _ZNSt3__16vectorIsNS_9allocatorIsEEE6assignIPsEENS_9enable_ifIXaasr21__is_forward_iteratorIT_EE5valuesr16is_constructibleIsNS_15iterator_traitsIS7_E9referenceEEE5valueEvE4typeES7_S7_ /usr/bin/../include/c++/v1/vector:1414:9
    #6 0x979bc7 in std::__1::vector<short, std::__1::allocator<short> >::operator=(std::__1::vector<short, std::__1::allocator<short> > const&) /usr/bin/../include/c++/v1/vector:1359:9
    #7 0x979bc7 in PocketsphinxAligner::PocketsphinxAligner(Params*) /home/blitz/projects/CCAligner-upstream/src/lib_ccaligner/recognize_using_pocketsphinx.cpp:45
    #8 0x560446 in CCAligner::initAligner() /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:58:9
    #9 0x560abe in main /home/blitz/projects/CCAligner-upstream/src/ccaligner.cpp:76:28
    #10 0x7fcc542e8f69 in __libc_start_main (/usr/lib/libc.so.6+0x20f69)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/blitz/projects/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:739:5 in memcpy
Shadow bytes around the buggy address:
  0x0c5c7fffac80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffac90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffaca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffacb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c5c7fffacc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c5c7fffacd0: 00[02]fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7ffface0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffacf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffad00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffad10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c5c7fffad20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==16346==ABORTING
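
The report shows a 320-byte read that starts exactly at the end of a 41610-byte buffer of `short` samples (20805 samples). That pattern is consistent with a caller asking for more samples than remain in the buffer, but the cause is not confirmed here. The following is only a Python sketch of the bounds arithmetic; `chunk_spans` and the window size are hypothetical and are not taken from CCAligner's code.

```python
def chunk_spans(total_samples, window):
    """Yield (offset, count) pairs that never extend past total_samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    offset = 0
    while offset < total_samples:
        # Clamp the final chunk so offset + count stays inside the buffer.
        count = min(window, total_samples - offset)
        yield offset, count
        offset += count


# 41610 bytes of 16-bit samples, as in the ASan report above.
total = 41610 // 2
# The window size is an assumption made for illustration only.
spans = list(chunk_spans(total, window=2048))
```

Every span produced this way satisfies `offset + count <= total`, so a consumer that is handed exactly `count` samples has no reason to read past the end of the vector.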

[Build] Test building on Windows.

The program is built and tested on Linux and Mac, but it should also work on Windows after some modifications. I don't currently have a Windows machine available, so I could not try it. I'm hoping someone can try building CCAligner on Windows and report and/or fix the build errors.

The grammar tools will probably not work directly on Windows, as they currently use Unix binaries, but I'll open a separate issue for that.

Find and integrate a text tokenisation library.

The current implementation of text tokenisation is pretty naive and doesn't cover all cases. A good tokenisation library should be able to generate all possible text tokens for items such as currency, dates, numbers and symbols.

For example:

In 1996, 1996 people sent emails at someone @ example . com at 1:30 PM.

In nineteen ninety six, one thousand nine hundred and ninety six people sent emails at someone at example dot com at one thirty p m

and all the alternative versions.

The library needs to be integrated into the subtitle parser (srtparser.h).
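
To make the "alternative versions" idea concrete, here is a small Python sketch that expands a numeric token into the two readings used in the example (pair-wise, as for a year, and as a plain cardinal). All function names are hypothetical, it only covers 0 to 9999, and a real tokenisation library would handle far more (currency, dates, times, e-mail addresses).

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n):
    """Spell a number in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def cardinal(n):
    """Spell 0..9999 as a cardinal, with 'and' before the last two digits."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n < 100:
        return below_hundred(n)
    thousands, rest = divmod(n, 1000)
    hundreds, low = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if low:
        parts.append("and " + below_hundred(low))
    return " ".join(parts)


def year_style(n):
    """Read a four-digit number in pairs (19 + 96); None if not applicable."""
    high, low = divmod(n, 100)
    if not 10 <= high <= 99 or low < 10:
        return None
    return below_hundred(high) + " " + below_hundred(low)


def alternatives(token):
    """Return every spoken reading this sketch knows for a token."""
    if not re.fullmatch(r"[0-9]{1,4}", token):
        return [token]  # non-numeric tokens pass through unchanged
    n = int(token)
    readings = [cardinal(n)]
    paired = year_style(n)
    if paired:
        readings.insert(0, paired)
    return readings
```

With this, `alternatives("1996")` yields both "nineteen ninety six" and "one thousand nine hundred and ninety six", which are the two forms that appear in the example sentence.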

Docker Build

A Docker image of this with a simple API to access the commands would be awesome! I am having trouble getting the dependencies set up, and it would be extremely nice if they were already set up so that you only needed to install a Docker image.

CCAligner crashes while recognizing and aligning

Following commit 96ce9a7

CCAligner crashes after initializing the pocketsphinx decoder. Note that the program runs successfully on previous commits with same input files and parameters.
OS: Windows 7
Parameters: default (-wav file -srt file)
The program also runs with transcription turned on.


Segmentation fault CCAligner errors

There are two errors that cause a Segmentation fault (core dumped).

First, when we try to read .wav data from /dev/null (on Windows, try just nul), we get a Segmentation fault.

Second, when we try to process an empty .wav file, we also get a Segmentation fault.

An empty .wav file
BBC.zip
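
Both cases could be rejected before any audio is decoded. The sketch below uses Python's standard `wave` module to show the kind of up-front check meant here; `check_wav` is a hypothetical helper, and the expected format (16-bit PCM mono at 16 kHz) is the one mentioned elsewhere in this tracker. CCAligner itself is C++, so this illustrates only the logic, not a patch.

```python
import wave


def check_wav(path):
    """Return None if the file looks usable, else a short reason string."""
    try:
        with wave.open(path, "rb") as wav:
            if wav.getnchannels() != 1:
                return "expected mono audio"
            if wav.getsampwidth() != 2:
                return "expected 16-bit samples"
            if wav.getframerate() != 16000:
                return "expected a 16 kHz sample rate"
            if wav.getnframes() == 0:
                return "file contains no audio frames"
    except (wave.Error, EOFError, OSError) as exc:
        # Empty files, /dev/null, missing files and broken headers land here.
        return "not a readable WAV file: %s" % exc
    return None
```

A caller would print the returned reason and exit with an error instead of passing the data on to the recogniser.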

Failed to create recognizer

I have been trying to install ccaligner for a few days, and I keep getting different errors, mostly related to dependencies. I even tried installing it on a freshly installed Ubuntu 18.04 LTS. I am getting the error shown in this screenshot:
[screenshot from 2018-10-02 19-22-37]

Tested CCAligner

I tested CCAligner's error handling. First, I checked whether it correctly handles damaged files, files with the wrong extension (not .wav), and .wav files in the wrong format (anything other than 16-bit PCM mono sampled at 16 kHz).
Then I ran a valid file with wrong parameters, and the process was aborted with the error InvalidParameters. Finally, I ran CCAligner with correct options and got an .srt file with subtitles inside.

1. Tried to open a file that doesn't exist, and then a file that exists but has the wrong extension. (Error was handled) +

2. Tried to process a file without the grammar installed on my computer. (Error was handled) +

3. Tried to process a file in the wrong format (not 16-bit PCM mono sampled at 16 kHz). (Error was handled) +

4. Ran CCAligner with wrong parameters. (Error was handled) +

5. Manually damaged a .wav file and processed it with CCAligner. I opened the file in a text editor and added a few characters that shouldn't be there, such as random letters and numbers. After that the file was damaged (you can hear some noise when playing it). CCAligner then failed with "Core dumped". (Error wasn't handled) -

Damaged file:
https://drive.google.com/open?id=1Xx8fm2louuJg_VNbW6l0Izbl5SgfgRLG

Mac installation hiccup

My apologies if this is not the way to do this, but I've always found GitHub very confusing to use. For decades now, I'm afraid.

Under the Linux/Mac dependencies installation, I tried to run sudo python setup.py install, but I got:

running install
running bdist_egg
running egg_info
writing requirements to g2p_seq2seq.egg-info/requires.txt
writing g2p_seq2seq.egg-info/PKG-INFO
writing top-level names to g2p_seq2seq.egg-info/top_level.txt
writing dependency_links to g2p_seq2seq.egg-info/dependency_links.txt
writing entry points to g2p_seq2seq.egg-info/entry_points.txt
reading manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.15-x86_64/egg
running install_lib
running build_py
creating build/bdist.macosx-10.15-x86_64/egg
creating build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/seq2seq_model.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/__init__.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/data_utils.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/app.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
copying build/lib/g2p_seq2seq/g2p.py -> build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/seq2seq_model.py to seq2seq_model.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/data_utils.py to data_utils.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/app.py to app.pyc
byte-compiling build/bdist.macosx-10.15-x86_64/egg/g2p_seq2seq/g2p.py to g2p.pyc
creating build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/PKG-INFO -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/SOURCES.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/dependency_links.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/entry_points.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/requires.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
copying g2p_seq2seq.egg-info/top_level.txt -> build/bdist.macosx-10.15-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/g2p_seq2seq-5.0.0a0-py2.7.egg' and adding 'build/bdist.macosx-10.15-x86_64/egg' to it
removing 'build/bdist.macosx-10.15-x86_64/egg' (and everything under it)
Processing g2p_seq2seq-5.0.0a0-py2.7.egg
Removing /Library/Python/2.7/site-packages/g2p_seq2seq-5.0.0a0-py2.7.egg
Copying g2p_seq2seq-5.0.0a0-py2.7.egg to /Library/Python/2.7/site-packages
g2p-seq2seq 5.0.0a0 is already the active version in easy-install.pth
Installing g2p-seq2seq script to /usr/local/bin

Installed /Library/Python/2.7/site-packages/g2p_seq2seq-5.0.0a0-py2.7.egg
Processing dependencies for g2p-seq2seq==5.0.0a0
Searching for termcolor>=1.1.0
Reading https://pypi.org/simple/termcolor/
Downloading https://files.pythonhosted.org/packages/c1/ee/ad1f448e360e4b662fbff9e75cd210b73ad79998ce6483086e9df5b8e7e2/termcolor-2.0.1.tar.gz#sha256=6b2cf769e93364a2676e1de56a7c0cff2cf5bd07f37e9cc80b0dd6320ebfe388
Best match: termcolor 2.0.1
Processing termcolor-2.0.1.tar.gz
error: Couldn't find a setup script in /tmp/easy_install-YKz0m5/termcolor-2.0.1.tar.gz

Did I do something wrong? Is the problem with the code? I am not a programmer; I just need to align subtitles.

Thank you.

Changes for TF 1.13.1 and CUDA 10.0

I am getting the error AttributeError: module 'tensorflow.contrib.rnn' has no attribute 'core_rnn_cell'

My TF is 1.13.1 and CUDA is 10.1

I am guessing this is because of an incompatibility between my version of TF and the presumed 1.0.0 version. However, I do not want to downgrade my CUDA from 10 to 8 just to use TF 1.0.0.

To fix this, I:

  1. Made the minor code change suggested in this closed PR
  2. Downloaded the current seq2seq, unzipped into the dependencies directory, and installed
  3. Recompiled using build.sh as described in the README
  4. Replaced the install/g2p-seq2seq-cmudict directory with one matching the seq2seq version

This seems to be working with my configuration. Since the aforementioned PR was declined, I'm documenting this only as an issue.

Failed to create recognizer

I couldn't figure out why this error is popping up:

Program aborted because an exception has occurred.
Exception details:
Type: 12UnknownError.
Reason: [11-07 21:01:28][Fatal] /home/Sarthak/CCAligner/src/lib_ccaligner/recognize_using_pocketsphinx.cpp (148) : initDecoder | Failed to create recognizer, see log for details

Please help

Invalid JSON

A few small changes appear to be needed:

  1. escape double quotes in subtitle
  2. add commas between subtitle objects

Example (each :exclamation: marks a spot that needs one of these fixes):

"subtitles": [
{
"subtitle" : "<font size=:exclamation:"24":exclamation:>Announcer: AND NOW A FIRESIDE CHAT",
"edited_text" : "AND NOW A FIRESIDE CHAT",
"start" : "3103",
"end" : "5606",
"words" : [
{
"word" : "AND",
"recognised" : "0",
"start" : "3103",
"end" : "3133",
"duration" : "30"
},
{
"word" : "NOW",
"recognised" : "1",
"start" : "3133",
"end" : "3693",
"duration" : "560"
},
{
"word" : "A",
"recognised" : "1",
"start" : "3703",
"end" : "3793",
"duration" : "90"
},
{
"word" : "FIRESIDE",
"recognised" : "1",
"start" : "3803",
"end" : "4313",
"duration" : "510"
},
{
"word" : "CHAT",
"recognised" : "1",
"start" : "4323",
"end" : "4603",
"duration" : "280"
}
],
"phonemes" : [
]
}:exclamation: {
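
Both problems go away when the output is produced by a JSON serialiser instead of by concatenating strings. A minimal Python sketch follows, reusing the field names from the example above; the second subtitle entry is invented purely to show the separator, and since CCAligner is C++ this only illustrates the expected output shape.

```python
import json

subtitles = [
    {
        # The embedded double quotes are escaped by the serialiser.
        "subtitle": '<font size="24">Announcer: AND NOW A FIRESIDE CHAT',
        "edited_text": "AND NOW A FIRESIDE CHAT",
        "start": "3103",
        "end": "5606",
        "words": [
            {"word": "AND", "recognised": "0",
             "start": "3103", "end": "3133", "duration": "30"},
            {"word": "NOW", "recognised": "1",
             "start": "3133", "end": "3693", "duration": "560"},
        ],
        "phonemes": [],
    },
    {
        # Hypothetical second entry, present only to show the comma.
        "subtitle": "SECOND SUBTITLE",
        "edited_text": "SECOND SUBTITLE",
        "start": "5606",
        "end": "6000",
        "words": [],
        "phonemes": [],
    },
]

text = json.dumps({"subtitles": subtitles}, indent=2)

# Round-tripping confirms that the generated text is valid JSON.
parsed = json.loads(text)
```

The serialiser writes `size=\"24\"` for the quoted attribute and places a comma between the two subtitle objects, which are exactly the two changes listed above.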

Fix dependencies in Windows.

CCAligner has some dependencies for generating grammars and language models. The current implementation requires the user to have them installed; the code invokes them through system calls and then processes their output. This currently works only on Linux/Unix.

Both dependencies can be compiled on Windows (list: https://github.com/saurabhshri/CCAligner#installing-dependencies). Relevant file: https://github.com/saurabhshri/CCAligner/blob/master/src/lib_ccaligner/grammar_tools.cpp

Add support for these dependencies on Windows.

initDecoder | Failed to create recognizer

I'm using this command line:
./ccaligner -wav ../test/data/goforward.wav -srt ../test/data/goforward.srt --generate-grammar no
with the test files provided in the repo.

I first tried the installation with TF 1.0.0 and the packages provided in the repository.
After that I installed TF 1.13.0, as recommended in PR #88, with no success.

I don't know whether the problem is with my build or my installation. I can't get it to work. Any help would be appreciated.

Python 3.5.1 packages:
https://pastebin.com/X8sb67Mk

CCAligner logs:
https://pastebin.com/DHh0mecr
https://pastebin.com/DcUjffss

[Feature Request] Changing the way information is displayed

Currently a lot of data is printed to stdout in the report: not only [Info] and [Fatal] messages but also [Debug] and [Verbose] ones.

In my opinion this is too much data for a user to read. So I suggest not printing [Debug] and [Verbose] messages in normal mode and adding an option such as -debug (or something like that) to print them.
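
A rough Python sketch of the proposed behaviour, for illustration only: the level names come from the output described above, while the `-debug` flag and every function name here are hypothetical and do not exist in CCAligner today.

```python
import argparse

# Levels that are hidden unless the user explicitly asks for them.
QUIET_LEVELS = {"Debug", "Verbose"}


def should_print(level, debug_enabled):
    """Decide whether a message of the given level reaches stdout."""
    return debug_enabled or level not in QUIET_LEVELS


def filter_messages(messages, debug_enabled):
    """Format the (level, text) pairs that pass the filter."""
    return ["[%s] %s" % (level, text)
            for level, text in messages
            if should_print(level, debug_enabled)]


def parse_args(argv):
    """Parse the hypothetical -debug switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-debug", action="store_true",
                        help="also print [Debug] and [Verbose] messages")
    return parser.parse_args(argv)
```

In normal mode only [Info] and [Fatal] lines would be shown; passing `-debug` restores the full output for anyone investigating a problem.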
