tympanix / subsync

Synchronize your subtitles using machine learning

License: Apache License 2.0

Makefile 1.56% Python 98.44%

neural-network subtitles mfcc machine-learning speech-detection shift-subtitle subtitle delay shift fix

subsync's Introduction

Subsync

Subsync analyses and processes the sound from your media files and uses machine learning to detect speech. Speech detection is used to shift existing subtitles for a perfect match in audio and text!

Features

Machine learning model for voice activity detection (not recognition)
Shift subtitle as a whole for best match
Sync every sentence in the subtitle individually
Sync using an existing matched subtitle in a different language
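
To make the "shift subtitle as a whole" feature concrete, here is a minimal sketch of the general idea: given a per-frame speech mask and a per-frame subtitle mask, try every offset within a margin and keep the one with the most overlap. This is a brute-force illustration only, not subsync's actual fitting procedure; the function name `best_shift` and the frame-based masks are assumptions made for the example.

```python
def best_shift(speech, subtitle_mask, max_shift):
    """Return the frame offset in [-max_shift, max_shift] that maximizes
    the number of frames where a shifted subtitle overlaps detected speech.

    speech and subtitle_mask are sequences of 0/1 flags, one per frame.
    A positive result means the subtitles should be moved later.
    """
    best_offset, best_score = 0, -1
    for offset in range(-max_shift, max_shift + 1):
        score = 0
        for i, active in enumerate(subtitle_mask):
            j = i + offset
            # Count frames where both the shifted subtitle and speech are active.
            if active and 0 <= j < len(speech) and speech[j]:
                score += 1
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


# Hypothetical data: the subtitles appear two frames too early.
speech = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
subtitles = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
offset = best_shift(speech, subtitles, max_shift=3)  # 2
```

Multiplying the returned offset by the frame length gives the shift in seconds, which plays the same role as the margin-limited search described by the `--margin` option.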

Dependencies

ffmpeg (https://www.ffmpeg.org/download.html)

Installation

pip install subsync

Help

usage: subsync [-h] [--version] [--graph] [-d SECONDS] [-m SECONDS] [-s] [-r]
               [--logfile PATH]
               MEDIA [MEDIA ...]

positional arguments:
  MEDIA                 media for which to synchronize subtitles

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --graph               show graph for subtitle synchronization (default:
                        False)
  -d SECONDS, --duration SECONDS
                        duration (in seconds) of the sample audio; a longer
                        sample increases precision but reduces speed
                        (default: 900)
  -m SECONDS, --margin SECONDS
                        the margin in which to search for a subtitle match
                        (default: 12)
  -s, --start           sample audio from the start of the media instead of the
                        middle (default: False)
  -r, --recursive       recursively sync every sentence in the subtitle
                        (default: False)
  --logfile PATH        path to location of log file for logging application
                        specific information (default: None)
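
For scripted use, the options above can be assembled into a command line from Python. The sketch below only builds the argument list from the flags shown in the help text; the helper name `build_subsync_command` is hypothetical and not part of subsync.

```python
def build_subsync_command(media, duration=None, margin=None,
                          start=False, recursive=False, logfile=None):
    """Build an argv list for the subsync CLI from the documented options.

    media is a list of media file paths; options left at None/False are
    omitted so that subsync falls back to its own defaults.
    """
    cmd = ["subsync"]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if margin is not None:
        cmd += ["--margin", str(margin)]
    if start:
        cmd.append("--start")
    if recursive:
        cmd.append("--recursive")
    if logfile is not None:
        cmd += ["--logfile", logfile]
    cmd += list(media)
    return cmd


# Example: a shorter audio sample and per-sentence syncing for one file.
command = build_subsync_command(["movie.mkv"], duration=600, recursive=True)
```

The resulting list can be passed to `subprocess.run(command, check=True)`, assuming the `subsync` executable is on the PATH.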

Special thanks

[1] Automatic Subtitle Synchronization through Machine Learning

subsync's Issues

Can't install tensorflow

Hello,

I am getting this error when installing subsync.

Now I get this:

No matching distribution found for tensorflow

Hello, I'm a complete beginner with Python, but I really want to use this.
When I run pip install subsync, I get this error:
Collecting subsync
Using cached https://files.pythonhosted.org/packages/9c/6a/ebdc4e6a54cc7c9c80284ec1bbf3333ec8f7cc817a05cb9ff5c28055fb2f/subsync-0.1.5.tar.gz
Collecting tensorflow==1.5.1 (from subsync)
Could not find a version that satisfies the requirement tensorflow==1.5.1 (from subsync) (from versions: )
No matching distribution found for tensorflow==1.5.1 (from subsync)

Remove non-spoken sentences (e.g. advertisements)

Am I missing something? What does it actually do?

(my36) root@ubuntu:~# subsync /root/sample.mkv
Transcoding...
Analysing...
Predicting values...
Fitting...
Shift 0.032 seconds:

The entire subtitle got shifted by 0.032 s. I thought this app would either generate a subtitle from speech detection or, if an .srt is provided, re-adjust and align each sentence with the beginning and end of the voice.

Ensure subtitles are saved in UTF-8

Not able to install in Ubuntu 19.10/ Python 3.7

Hi,

I'm not able to install subsync due to the following error:

$ pip3 install subsync
...
ERROR: Could not find a version that satisfies the requirement tensorflow==1.5.1 (from subsync) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0)
ERROR: No matching distribution found for tensorflow==1.5.1 (from subsync)

It seems to me that tensorflow 1.5.1 is not available for Python 3.7 (only 1.13 onwards).

This issue is slightly different from #9: in that case it seemed like a network issue or something similar, and no versions were listed at all.

Thanks!

Idea/Question for another usecase

Hi,

I was wondering whether this software could also be used to synchronize text and audio (for example, an ebook and an audiobook).
If that is possible, the result could be converted to EPUB 3 format and would amount to a self-made version of Amazon's Whispersync (words are highlighted as they are spoken in the audiobook, which is perfect for reading along while listening).

best,
gelsas

How to train the network?

Hey! I was wondering how I could train the network.

FileNotFound error

Hello,
First of all, thanks for taking the time to share this project!
I'm getting this error when I run:

python -m subsync moviename.mp4

....................................................................................................
Windows 10, Python 3.6.3
....................................................................................................
Traceback (most recent call last):
  File "C:\Program Files (x86)\Miniconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Miniconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\__main__.py", line 4, in <module>
    run()
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\main.py", line 42, in run
    m.mfcc(duration=args.duration, seek=not args.start)
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\media.py", line 83, in mfcc
    transcode = Transcode(self.filepath, duration=duration, seek=seek)
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\ffmpeg.py", line 28, in __init__
    self.length = self.__length()
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\ffmpeg.py", line 55, in __length
    cmd = subprocess.Popen(['ffprobe', self.input], stdout=PIPE, stderr=STDOUT)
  File "C:\Program Files (x86)\Miniconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files (x86)\Miniconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
....................................................................................................

I don't really know what's happening here. Reinstalling didn't help, and I get the error both when the subtitle is inside an MKV container and when it's a separate file.
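
The failing frame in this traceback is `subprocess.Popen(['ffprobe', ...])`, and a `FileNotFoundError` raised there means the `ffprobe` executable itself could not be found on the PATH, not the media file. A quick check, sketched below with the standard library's `shutil.which`, reports which of the ffmpeg tools are missing. The helper name `missing_tools` is made up for this example.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of the given executables that are not on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both available")
```

If either tool is reported missing, installing ffmpeg (see Dependencies) and adding its `bin` directory to the PATH should let subsync get past this step.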

Fix paths

The program crashes when the argument is a file in the current working directory. Logging should use absolute paths.
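
One possible way to handle this, sketched under the assumption that the crash comes from `os.path.dirname` returning an empty string for a bare file name such as `movie.mkv`, is to convert every argument to an absolute path before deriving a directory or logging it. The helper name `split_media_path` is hypothetical.

```python
import os


def split_media_path(arg):
    """Return (absolute directory, file name without extension) for a media argument.

    Converting to an absolute path first means a bare file name in the
    current working directory still yields a usable, non-empty directory.
    """
    path = os.path.abspath(arg)
    directory, filename = os.path.split(path)
    stem, _extension = os.path.splitext(filename)
    return directory, stem


# A bare file name has no directory component of its own ...
assert os.path.dirname("movie.mkv") == ""
# ... but the normalized form does.
directory, stem = split_media_path("movie.mkv")
```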

Try recurrent neural network

Add Information About Command-Line Usage In Readme

Thanks for all the work that was done to develop this great tool.

I think it might be useful to include a few lines in the Readme file with information about how to use subsync from the command line after installing it with pip.

I suggest this in part because I've had no luck so far getting subsync to run from the command line with a simple test .mp4 file accompanied by a .srt that I'd like to resynchronize with subsync.

I fully admit that I have next to no Python experience, which is making it very difficult to figure out how to use the program from the command line.

I try as best I can, but all I can get is error messages like this:

$ sudo /home/tuser/.local/bin/subsync temp.mp4 temp.srt
/home/tuser/.local/lib/python2.7/site-packages/librosa/__init__.py:35: FutureWarning: You are using librosa with Python 2. Please note that librosa 0.7 will be the last version to support Python 2, after which it will require Python 3 or later.
  FutureWarning)
Traceback (most recent call last):
  File "/home/tuser/.local/bin/subsync", line 4, in <module>
    subsync.run()
  File "/home/tuser/.local/lib/python2.7/site-packages/subsync/main.py", line 32, in run
    from .media import Media
  File "/home/tuser/.local/lib/python2.7/site-packages/subsync/media.py", line 17, in <module>
    from .ffmpeg import Transcode
  File "/home/tuser/.local/lib/python2.7/site-packages/subsync/ffmpeg.py", line 9, in <module>
    from subprocess import DEVNULL, STDOUT, PIPE
ImportError: cannot import name DEVNULL
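
The paths in this traceback show subsync running under Python 2.7, and `subprocess.DEVNULL` only exists from Python 3.3 onwards, which explains the `ImportError`. Running subsync with Python 3 is the simplest way around it. For illustration only, the sketch below shows the usual compatibility pattern for this particular import; it is not taken from subsync, and other parts of the package may still require Python 3.

```python
import os
import subprocess
import sys

try:
    from subprocess import DEVNULL  # available in Python 3.3 and later
except ImportError:
    # Python 2 fallback: an open handle on the null device behaves the same
    # when passed as stdout/stderr to a child process.
    DEVNULL = open(os.devnull, "wb")


def run_quietly(argv):
    """Run a command with its output discarded and return its exit code."""
    return subprocess.call(argv, stdout=DEVNULL, stderr=subprocess.STDOUT)


# Example: run a trivial child process with the current interpreter.
exit_code = run_quietly([sys.executable, "-c", "print('hello')"])
```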

Most files work, but some give errors

Any help?

Traceback (most recent call last):
  File "/home/jeroen/.local/bin/subsync", line 4, in <module>
    subsync.run()
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/main.py", line 42, in run
    m.mfcc(duration=args.duration, seek=not args.start)
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/media.py", line 83, in mfcc
    transcode = Transcode(self.filepath, duration=duration, seek=seek)
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/ffmpeg.py", line 28, in __init__
    self.length = self.__length()
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/ffmpeg.py", line 57, in __length
    match = re.search(r'(\d\d):(\d\d):(\d\d).(\d\d)', duration[0])
IndexError: list index out of range
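
The `IndexError` is raised by `duration[0]`, so the list of candidate duration lines was empty for these files. A more defensive parse, sketched below, searches the probe output directly and returns `None` when no duration is present so that the caller can skip the file or report a clear error. It assumes ffprobe-style output containing a line such as `Duration: 01:30:05.12, ...`; the function name `parse_duration` is hypothetical and this is not subsync's own code.

```python
import re

# Matches e.g. "Duration: 01:30:05.12". The dot before the fraction is
# escaped here, unlike in the pattern shown in the traceback.
DURATION_RE = re.compile(r"Duration: (\d\d):(\d\d):(\d\d)\.(\d\d)")


def parse_duration(probe_output):
    """Return the media duration in seconds, or None if it cannot be found."""
    match = DURATION_RE.search(probe_output)
    if match is None:
        return None
    hours, minutes, seconds, hundredths = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + hundredths / 100.0


sample = "  Duration: 00:01:30.50, start: 0.000000, bitrate: 1205 kb/s"
length = parse_duration(sample)             # 90.5
unknown = parse_duration("Duration: N/A")   # None
```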

Automate syncing of new incoming srt files

I made a little script, run by systemd, that checks every 10 minutes for new subtitles to sync.

#!/bin/bash -e

# Directory to scan for subtitles (first argument).
indir=$1
# Subtitles already synced in earlier passes.
handled='/media/data/programs/SyncSubs_processed.txt'
# Subtitles picked up during the current pass.
handled_now='/media/data/programs/SyncSubs_processing.txt'

while true; do
	# -f: do not fail when the file is missing on the first pass.
	rm -f "$handled_now" && touch "$handled_now"
	touch "$handled"

	IFS=$'\n'
	# One "mtime|path" line per .srt file modified in the last 3 days.
	srtfiles=$(find "$indir" -name '*srt' -mtime -3 -printf '%T@|%p\n' 2>/dev/null) || true
	for srt in $srtfiles; do
		IFS='|'
		set -- $srt
		epochfile=$1
		srtfile=$2
		sf_dir=$(dirname "$srtfile")
		sf_bn=$(basename "$srtfile")
		# -F: match the line literally, since paths may contain regex characters.
		if grep -qF "$srt" "$handled" || grep -qF "$srt" "$handled_now"; then
			echo "Already handled : $srt"
		else
			## Only handle files older than 200 seconds, so all other subs can be downloaded for the video.
			curr_epoch=$(date +%s)
			if [ $(($curr_epoch - ${epochfile%%\.*})) -gt 200 ]; then
				## echo "Start handling : $sf_bn"
				# Strip ".srt" and an optional 2-3 character suffix such as ".en" or ".eng".
				movie=$(echo "$sf_bn" | perl -pe 's/(\...?.)?\.srt$//g')
				moviefile=$(find "$sf_dir" -size +300M -name "${movie}.*")
				find "$sf_dir" -name "${movie}*.srt" -printf "%T@|%p\n" >> "$handled_now"
				/home/jeroen/.local/bin/subsync "$moviefile" || true
				find "$sf_dir" -name "${movie}*.srt" -printf "%T@|%p\n" >> "$handled"
				## echo "end handling $movie"
			else
				echo "Not old enough : $sf_bn"
			fi
		fi
	done
	rm -f "$handled_now"
	sleep 600
done

cat /etc/systemd/system/SyncSubs.service
[Unit]
Description=Huubs Subtitle Sync Service

[Service]
User=jeroen
Group=jeroen
ExecStart=/media/data/programs/fixSubs.sh /media/data/complete

[Install]
WantedBy=multi-user.target

Recursively split subtitles where spaces occur

Illegal instruction (core dumped)

Heya, I tried to run it, having installed via pip, pip2 and pip3.
Ubuntu 18, 64-bit
.........................................................
python3.6 /home/riku/.local/bin/subsync 'moviename.mp4'
Illegal instruction (core dumped)
.........................................................
python /home/riku/.local/bin/subsync 'moviename.mp4'
Traceback (most recent call last):
  File "/home/riku/.local/bin/subsync", line 4, in <module>
    subsync.run()
  File "/home/riku/.local/lib/python2.7/site-packages/subsync/main.py", line 32, in run
    from .media import Media
  File "/home/riku/.local/lib/python2.7/site-packages/subsync/media.py", line 16, in <module>
    from .ffmpeg import Transcode
  File "/home/riku/.local/lib/python2.7/site-packages/subsync/ffmpeg.py", line 9, in <module>
    from subprocess import DEVNULL, STDOUT, PIPE
ImportError: cannot import name DEVNULL

I'm not sure what is happening here. One run is with Python 3 and the other with Python 2. There is no info on which Python version to use, so I tried both.

Subtitle syncing for other language than being spoken?

First of all, thanks for the complete write-up. For someone with very basic ML knowledge, it was very interesting to learn about the additional metrics such as log-loss.

Anyway, I was trying to automate subtitle retrieval and syncing for my own native language. Often I can get subtitles in my own language, but they do not line up at all. Is it possible to sync subtitles in a language other than the one spoken in the media file?

Also, does this take into account that subtitles sometimes contain advertisements, which would feed in false information?

Edit: perhaps it's a good idea to create a wrapper that uses both https://github.com/Diaoul/subliminal to download and subsync to align each subtitle. That would be a killer solution. AFAIK there is no proper subtitle solution available.