subsync's Introduction

Subsync

Synchronize your subtitles using machine learning

Subsync analyses the audio track of your media files and uses machine learning to detect speech. The detected speech is then used to shift existing subtitles so that the text lines up with the audio.
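As a rough illustration of the idea, and not subsync's actual implementation, the search for a shift can be pictured as sliding a subtitle on/off pattern over a speech on/off pattern and keeping the position where the two agree most. The frame representation and the names `overlap` and `best_shift` are assumptions made for this sketch only.

```python
def overlap(speech, subs, shift):
    """Count audio frames where the shifted subtitle pattern agrees with detected speech."""
    score = 0
    for i, speech_on in enumerate(speech):
        j = i - shift  # subtitle frame that lands on audio frame i after shifting
        sub_on = subs[j] if 0 <= j < len(subs) else 0
        score += int(speech_on == sub_on)
    return score


def best_shift(speech, subs, margin):
    """Try every shift in [-margin, margin] frames and return the best-scoring one."""
    return max(range(-margin, margin + 1), key=lambda k: overlap(speech, subs, k))


# Hypothetical data: 1 = speech (or subtitle shown) in that frame, 0 = silence.
speech = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
subs = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]  # same pattern, two frames too early
print(best_shift(speech, subs, margin=3))
```

A brute-force scan like this is only practical because the search window is small; the `--margin` option described under Help plays a comparable limiting role.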

Features

  • Machine learning model for voice activity detection (not recognition)
  • Shift subtitle as a whole for best match
  • Sync every sentence in the subtitle individually
  • Sync using existing matched subtitle in a different language
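Shifting a subtitle "as a whole" amounts to adding one constant offset to every timestamp. The following is a minimal sketch of that step for SRT-style timestamps; it is not taken from subsync, and the function names are made up for illustration.

```python
import re
from datetime import timedelta

# Matches SRT timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset):
    """Return the matched timestamp moved by `offset`, clamped at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    moved = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
    total_ms = max(0, moved // timedelta(milliseconds=1))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, seconds):
    """Shift every timestamp in an SRT document by `seconds` (may be negative)."""
    offset = timedelta(seconds=seconds)
    return TIMESTAMP.sub(lambda m: shift_timestamp(m, offset), text)


print(shift_srt("00:00:01,000 --> 00:00:02,500", 0.032))
```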

Dependencies

Installation

pip install subsync

Help

usage: subsync [-h] [--version] [--graph] [-d SECONDS] [-m SECONDS] [-s]
                   [--logfile PATH]
                   MEDIA [MEDIA ...]

positional arguments:
  MEDIA                 media for which to synchronize subtitles

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --graph               show graph for subtitle synchronization (default:
                        False)
  -d SECONDS, --duration SECONDS
                        duration (in seconds) of the sample audio length
                        increases precision but reduces speed (default: 900)
  -m SECONDS, --margin SECONDS
                        the margin in which to search for a subtitle match
                        (default: 12)
  -s, --start           sample audio from the start of the media instead of the
                        middle (default: False)
  -r, --recursive       recursively sync every sentence in the subtitle
                        (default: False)
  --logfile PATH        path to location of log file for logging application
                        specific information (default: None)
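
For scripted use, the options listed above can be assembled into an argument list before handing it to `subprocess`. This is only a sketch: `build_command` is a hypothetical helper, and it assumes the `subsync` executable is on the PATH.

```python
def build_command(media, duration=900, margin=12, from_start=False):
    """Build a subsync command line from the options shown in the help text."""
    cmd = ["subsync", "--duration", str(duration), "--margin", str(margin)]
    if from_start:
        cmd.append("--start")
    return cmd + list(media)


# Example: sample 20 minutes of audio instead of the default 15.
print(build_command(["movie.mkv"], duration=1200))
# The list can then be run with subprocess.run(cmd, check=True).
```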

Special thanks

[1] Automatic Subtitle Synchronization through Machine Learning

subsync's People

Contributors

paaff, tympanix

subsync's Issues

No matching distribution found for tensorflow

Hello, I'm a complete beginner with Python, but I really want to use this.
When running pip install subsync I get this error:
Collecting subsync
Using cached https://files.pythonhosted.org/packages/9c/6a/ebdc4e6a54cc7c9c80284ec1bbf3333ec8f7cc817a05cb9ff5c28055fb2f/subsync-0.1.5.tar.gz
Collecting tensorflow==1.5.1 (from subsync)
Could not find a version that satisfies the requirement tensorflow==1.5.1 (from subsync) (from versions: )
No matching distribution found for tensorflow==1.5.1 (from subsync)

Am I missing something? What does it actually do?

(my36) root@ubuntu:~# subsync /root/sample.mkv
Transcoding...
Analysing...
Predicting values...
Fitting...
Shift 0.032 seconds:

The entire subtitle got shifted by 0.032 s. I thought this app would either generate subtitles from speech detection or, if an .srt is provided, re-adjust and align each sentence with the beginning and end of the voice.

Not able to install on Ubuntu 19.10 / Python 3.7

Hi,

I'm not able to install subsync due to the following error:

$ pip3 install subsync
...
ERROR: Could not find a version that satisfies the requirement tensorflow==1.5.1 (from subsync) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0)
ERROR: No matching distribution found for tensorflow==1.5.1 (from subsync)

Seems to me that tensorflow 1.5.1 is not available for python 3.7 (just 1.13 onwards).
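
A quick way to see whether an interpreter falls into this case is to compare its version against the boundary reported above. The sketch below encodes only what this report states (no tensorflow 1.5.1 build offered on Python 3.7); the lower bound of Python 3 and the function name are assumptions.

```python
import sys

# Assumption taken from the report above: pip offers no tensorflow==1.5.1
# build for Python 3.7 or newer.
FIRST_UNSUPPORTED = (3, 7)


def pin_is_installable(version_info=sys.version_info):
    """Return True if this Python 3 version is older than the first unsupported one."""
    major_minor = tuple(version_info[:2])
    return (3, 0) <= major_minor < FIRST_UNSUPPORTED


print(pin_is_installable())
```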

This issue is slightly different from #9, as that case seemed to be a network issue or similar and no versions were listed at all.

Thanks!

Idea/Question for another use case

Hi,

I have wondered if this software could also be used to synchronize text and audio (for example, an ebook and an audiobook).
If that were possible, the result could be converted to the EPUB 3 format and would amount to a self-made version of Amazon's Whispersync (words are highlighted as they are spoken in the audiobook, which is perfect for reading along while listening).

best,
gelsas

FileNotFound error

Hello,
First of all, thanks for taking the time to share this project!
I'm getting this error when I run:

python -m subsync moviename.mp4

....................................................................................................
Windows 10, Python 3.6.3
....................................................................................................
Traceback (most recent call last):
  File "C:\Program Files (x86)\Miniconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Miniconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\__main__.py", line 4, in <module>
    run()
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\main.py", line 42, in run
    m.mfcc(duration=args.duration, seek=not args.start)
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\media.py", line 83, in mfcc
    transcode = Transcode(self.filepath, duration=duration, seek=seek)
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\ffmpeg.py", line 28, in __init__
    self.length = self.__length()
  File "C:\Program Files (x86)\Miniconda3\lib\site-packages\subsync\ffmpeg.py", line 55, in __length
    cmd = subprocess.Popen(['ffprobe', self.input], stdout=PIPE, stderr=STDOUT)
  File "C:\Program Files (x86)\Miniconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files (x86)\Miniconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
....................................................................................................

I don't really know what's happening here. Reinstalling didn't help, and I get the error both when the subtitle is inside an MKV container and when it's a separate file.
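
The traceback ends inside `subprocess.Popen(['ffprobe', ...])`, and a `FileNotFoundError` at that point means the executable itself could not be found, not the media file. A small check like the following can confirm that before running subsync. Including `ffmpeg` in the list is an assumption based on the module name `ffmpeg.py`; only `ffprobe` appears in the traceback.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available")
```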

Fix paths

When the argument is a file in the current working directory, the program crashes. Logging should also log absolute paths.
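
One plausible cause, stated here as a guess since the failing code is not shown, is that a bare filename has an empty directory part. Resolving the argument to an absolute path first avoids that and also gives the logger a full path to print. `media_location` is a hypothetical helper name.

```python
import os


def media_location(path):
    """Split a media argument into (absolute directory, file name)."""
    full = os.path.abspath(path)
    return os.path.dirname(full), os.path.basename(full)


# A bare filename has an empty directory part until it is made absolute.
print(repr(os.path.dirname("movie.mp4")))
print(media_location("movie.mp4"))
```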

Add Information About Command-Line Usage In Readme

Thanks for all the work that was done to develop this great tool.

I think it might be useful to include a few lines in the Readme file with information about how to use subsync from the command line after installing it with pip.

I suggest this in part because I've had no luck so far getting subsync to run from the command line with a simple test .mp4 file accompanied by a .srt that I'd like to resynchronize with subsync.

I fully admit that I have next to no Python experience, which is making it very difficult to figure out how to use the program from the command line.

I try as best I can, but all I can get is error messages like this:

$ sudo /home/tuser/.local/bin/subsync temp.mp4 temp.srt
/home/tuser/.local/lib/python2.7/site-packages/librosa/__init__.py:35: FutureWarning: You are using librosa with Python 2. Please note that librosa 0.7 will be the last version to support Python 2, after which it will require Python 3 or later.
  FutureWarning)
Traceback (most recent call last):
  File "/home/tuser/.local/bin/subsync", line 4, in <module>
    subsync.run()
  File "/home/tuser/.local/lib/python2.7/site-packages/subsync/main.py", line 32, in run
    from .media import Media
  File "/home/tuser/.local/lib/python2.7/site-packages/subsync/media.py", line 17, in <module>
    from .ffmpeg import Transcode
  File "/home/tuser/.local/lib/python2.7/site-packages/subsync/ffmpeg.py", line 9, in <module>
    from subprocess import DEVNULL, STDOUT, PIPE
ImportError: cannot import name DEVNULL
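
`subprocess.DEVNULL` only exists on Python 3.3 and later, which is why this import fails under Python 2.7. Running subsync with Python 3 avoids the error. For reference, the usual compatibility idiom is sketched below; whether the rest of subsync would then work on Python 2 is not something this issue establishes.

```python
import os

try:
    from subprocess import DEVNULL  # available on Python 3.3+
except ImportError:
    # Fallback for older interpreters: an open handle on the null device.
    DEVNULL = open(os.devnull, "wb")

print(DEVNULL)
```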

Most files work but some give errors

Any help?

Traceback (most recent call last):
  File "/home/jeroen/.local/bin/subsync", line 4, in <module>
    subsync.run()
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/main.py", line 42, in run
    m.mfcc(duration=args.duration, seek=not args.start)
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/media.py", line 83, in mfcc
    transcode = Transcode(self.filepath, duration=duration, seek=seek)
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/ffmpeg.py", line 28, in __init__
    self.length = self.__length()
  File "/home/jeroen/.local/lib/python3.6/site-packages/subsync/ffmpeg.py", line 57, in __length
    match = re.search(r'(\d\d):(\d\d):(\d\d).(\d\d)', duration[0])
IndexError: list index out of range
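
The `IndexError` suggests that `duration` was an empty list, that is, no duration line was found in the ffprobe output for those files. The surrounding code is not shown, so this is an assumption. A more defensive parse, sketched here with the hypothetical name `parse_duration`, returns `None` instead of raising when no duration is reported (ffprobe prints `Duration: N/A` for some inputs).

```python
import re

# "Duration: HH:MM:SS.ff" as printed by ffprobe; the dot is escaped on purpose.
DURATION = re.compile(r"Duration:\s*(\d+):(\d\d):(\d\d)\.(\d+)")


def parse_duration(ffprobe_output):
    """Return the media length in seconds, or None if no duration is reported."""
    match = DURATION.search(ffprobe_output)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float("0." + fraction)


print(parse_duration("  Duration: 01:02:03.50, start: 0.000000, bitrate: 128 kb/s"))
print(parse_duration("  Duration: N/A, bitrate: N/A"))
```

The caller can then skip or report a file with an unknown length rather than crash partway through a batch.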

Automate syncing of new incoming srt files

I made a little script, run by systemd, that checks every 10 minutes for new subtitles to sync.

#!/bin/bash -e

indir=$1
handled='/media/data/programs/SyncSubs_processed.txt'
handled_now='/media/data/programs/SyncSubs_processing.txt'

while true; do
	## start every pass with an empty "processing" list
	rm -f "$handled_now"
	touch "$handled_now" "$handled"

	IFS=$'\n'
	srtfiles=$(find "$indir" -name '*srt' -mtime -3 -printf '%T@|%p\n' 2>/dev/null) || true
	for srt in $srtfiles; do
		## each entry looks like "<mtime epoch>|<path>"
		IFS='|'
		set -- $srt
		epochfile=$1
		srtfile=$2
		sf_dir=$(dirname "$srtfile")
		sf_bn=$(basename "$srtfile")
		## -F: match the entry literally, paths may contain regex characters
		if grep -qF "$srt" "$handled" || grep -qF "$srt" "$handled_now"; then
			echo "Already handled : $srt"
		else
			## we handle only files older than 200 seconds, so all other subs can be downloaded for the video
			curr_epoch=$(date +%s)
			if [ $(($curr_epoch - ${epochfile%%\.*})) -gt 200 ]; then
				## echo "Start handling : $sf_bn"
				movie=$(echo "$sf_bn" | perl -pe 's/(\...?.)?\.srt$//g')
				moviefile=$(find "$sf_dir" -size +300M -name "${movie}.*")
				find "$sf_dir" -name "${movie}*.srt" -printf "%T@|%p\n" >> "$handled_now"
				/home/jeroen/.local/bin/subsync "$moviefile" || true
				find "$sf_dir" -name "${movie}*.srt" -printf "%T@|%p\n" >> "$handled"
				## echo "end handling $movie"
			else
				echo "Not old enough : $sf_bn"
			fi
		fi
	done
	rm -f "$handled_now"
	sleep 600
done

cat /etc/systemd/system/SyncSubs.service
[Unit]
Description=Huubs Subtitle Sync Service

[Service]
User=jeroen
Group=jeroen
ExecStart=/media/data/programs/fixSubs.sh /media/data/complete

[Install]
WantedBy=multi-user.target

Illegal instruction (core dumped)

Heya, I tried to run this after installing with pip, pip2 and pip3.
Ubuntu 18, 64-bit
.........................................................
python3.6 /home/riku/.local/bin/subsync 'moviename.mp4'
Illegal instruction (core dumped)
.........................................................
python /home/riku/.local/bin/subsync 'moviename.mp4'
Traceback (most recent call last):
  File "/home/riku/.local/bin/subsync", line 4, in <module>
    subsync.run()
  File "/home/riku/.local/lib/python2.7/site-packages/subsync/main.py", line 32, in run
    from .media import Media
  File "/home/riku/.local/lib/python2.7/site-packages/subsync/media.py", line 16, in <module>
    from .ffmpeg import Transcode
  File "/home/riku/.local/lib/python2.7/site-packages/subsync/ffmpeg.py", line 9, in <module>
    from subprocess import DEVNULL, STDOUT, PIPE
ImportError: cannot import name DEVNULL

I'm not sure what is happening here. One run uses Python 3 and the other Python 2. There is no information on which Python version to use, so I tried both.

Subtitle syncing for a language other than the one being spoken?

First of all, thanks for the complete write-up. As someone with very basic ML knowledge, I found it very interesting to learn about the additional metrics such as log-loss.

Anyway, I was trying to automate subtitle retrieval and syncing for my own native language. Often I can get subtitles in my own language, but they do not line up at all. Is it possible to sync subtitles in a language other than the one spoken in the media file?

Also, does this take into account that subtitles sometimes contain advertisements, which would feed false information into the sync?

Edit: perhaps it's a good idea to create a wrapper that uses both https://github.com/Diaoul/subliminal to download and subsync to align each subtitle. That would be a killer solution. AFAIK there is no proper subtitle solution available.
