Code Monkey home page Code Monkey logo

subtite-add-subtitles-to-videos's People

Contributors

novacer avatar

Watchers

 avatar

subtite-add-subtitles-to-videos's Issues

Allow client to receive live stream of stdout from SubprocessExecutor

Currently, SubprocessExecutor only returns stdout or stderr when client exits. For longer running tasks, we may instead want to allow stdout to be received in a streaming fashion. Most likely, the API will be a callback which is called when new data is read from the stdout pipe.

For example, ffmpeg will intermittenly print something like

frame=7149
fps=48.44
stream_0_0_q=0.0
bitrate=N/A
total_size=762
out_time_us=0
out_time_ms=0
out_time=00:00:00.000000
dup_frames=0
drop_frames=0
speed=0x
progress=continue

So Subprocess Executor calls FFMPEG wrapper with what ever data it just read. This may data may be "incomplete", so FFMPEG needs to separately buffer it. Note each section will have last line progress=continue or progress=end. So FFMPEG buffers until it sees progress=, parses the current progress, and returns that to the application.

Note the actual ffmpeg subprocess will be blocked until we read from the buffer, so have to do this quickly. Alternatively, we can use ffmpeg option -stats_period 5 so progress is only written every 5 seconds (thus we reduce number of blocking IO operations and give us more time to process it).

Better documentation for building from source

The readme.md right now is a bit disorganized since there are build instructions for windows and linux. The instructions aren't cleanly separated. We should separate the build instructions for each environment to avoid confusion.

Implement POSIX compliant version of SubprocessExecutor

The current implementation of SubprocessExecutor is using windows system calls. Since we want to have the app support Unix-y systems we will need to write another implementation.

Namely, instead of CreateProcess() we will use the posix "sort-of-equivalent" posix_spawnp.

I suspect we will also need multithreading to be able to read from stdout and stderr separately.

Example on stackoverflow

Loading an existing subtitle in UTF-8 BOM doesn't work.

UTF-8 BOM (byte order mark) inserts a few extra bytes to the beginning of the file which tells of the encoding type. The issue is that this BOM interferes with the parsing of the file (doesn't work).

We can either

  1. Detect BOM, then tell user to convert the encoding to UTF-8 or
  2. Ignore the BOM ourselves. (But when we output the file, should we keep the BOM back?)

Enable ASAN for MSVC, also fix deadlock issue when using ASAN with GetOpenFileName

We want to be able to use ASAN on windows build for debugging. The release build will not use ASAN.

We can take inspiration from these steps to enable ASAN for msvc with bazel: https://github.com/bazelbuild/bazel/issues/12955.
Further, from here it looks like we have to add some dlls to our system path if we want to have dynamic linking

I've already tried adding this locally, but with ASAN enabled GetOpenFileName will hang forever and not allow the user to select their file. On stack overflow it's said this is a windows bug, doesn't look to be fixed yet. https://stackoverflow.com/a/69718929/17786559. But apparently the following is a workaround.

#include <windows.h>

int main() {
  SetProcessAffinityMask(GetCurrentProcess(), 1);
  // ...
}

Child process will hang if main process is not reading from it's stdout.

Read: https://devblogs.microsoft.com/oldnewthing/20110707-00/?p=10223

The subprocess launched by SubprocessExecutor will hang if the client does not call

executor.Start();
executor.WaitUntilFinished();

immediately after each other. This is because the output pipe of the child process is being directed to the subprocess executor. However, if the subprocess executor is not reading the output pipe, then the child process is literally stuck writing to the pipe. Note that WaitUntilFinished() performs reading of the pipe.

// Spin until we are unable to read stdout of child process anymore.
for (;;) {
success = ReadFile(
/* hFile= */ fields->hStdOutPipeRead,
/* lpBuffer= */ buffer,
/* nNumberOfBytesToRead= */ BUFFER_SIZE,
/* lpNumberOfBytesRead= */ &amount_read,
/* lpOverlapped= */ NULL);
if (!success || amount_read == 0) {
break;
}

This is problematic since we will not be able to perform async operations while the child process is running. For example:

SubprocessExecutor executor("ffplay /path/to/movie");
executor.Start() // start video player
cin >> subtitle_line; // allow user to input subtitle for that section
executor.WaitUntilFinished();

will cause video player to hang forever.

The solution is either:

  1. Do not perform long-running async operations (not preferred)
  2. Disable async operations when capture_stdout_ is ON. (By changing API methods)
  3. Write pipes to a file instead and retrieve that later.

A collection of ideas

Some ideas on how to improve to make using CLI faster:

  • Instead of defaulting video segment to 5s, show 10s at a time. Then user can remark where they want to cut off the dialogue. PR #29
  • If user changes duration only, then just play the end of the video instead of the full clip, so they know they got the right timing.
  • Offer abbreviations for the positions. #31
    • bottom-left = bl, middle-center = mc, top-right = tr
  • While in sub mode, offer commands such as /play so user can replay video while subbing. PR #29
  • Allow user to use /cancel while in add sub mode to exit without adding a subtitle. PR #29
  • Offer way to edit subtitle text or position on existing subtitle. PR #32
  • Offer way to set default subtitle position

Setup CI pipeline

Should require automated bazel build and test on both windows and linux before merging into master.

Buffer overflow bug when using ReadFile()

The result of buffer after running ReadFile() may not be null-terminated.
Thus the returned string of the stdout may contain arbitrary data at the end of the buffer.

ReadFile() returns the number of bytes read. Thus, buffer[amount_read] = '\0' will correctly terminate the buffer.

std::string SubprocessExecutor::WaitUntilFinished() {
DWORD amount_read;
CHAR buffer[BUFFER_SIZE];
BOOL success = FALSE;
std::ostringstream str;
// Spin until we are unable to read stdout of child process anymore.
for (;;) {
success = ReadFile(
/* hFile= */ fields->hStdOutPipeRead,
/* lpBuffer= */ buffer,
/* nNumberOfBytesToRead= */ BUFFER_SIZE,
/* lpNumberOfBytesRead= */ &amount_read,
/* lpOverlapped= */ NULL);
if (!success || amount_read == 0) {
break;
}
if (capture_stdout_) {
str << buffer;
}
}

There is one remaining issue, which if the underlying file bytes contains a null char in between the data. Then any buffered bytes after the null char will not be read by ostringstream. However, in this context we are reading from stdout of specific programs like ffmpeg, ffprobe etc. So shouldn't be an issue.

Investigate how to store API keys for Speech-to-text services.

Possibly we could just store keys on disk in plaintext, but since the APIs are paid it's better to be careful with how the keys are stored. We could use something like libsodium to encrypt the files, or just don't store any keys (user has to re-enter them everytime).

New feature to load existing subtitles

When user selects an existing .srt file for the output paths, we should attempt to load that file internally. Then, users can edit their existing subtitles instead of creating new ones each time. This will also have the added benefit that users can save their work, quit, and then come back later to complete it.

Paths wrapped around with quotes cannot be opened by ofstream.

There's a bug in cli/main.cpp where if the subtitle file path contains quotes (as inputted directly by user from cin), then we cannot open the file using ofstream. Thus we need to differentiate between when file paths actually need quotes or not.

if (FLAGS_output_subtitle_path.empty()) {
std::cout << "Please provide path to the file you want to save the subtitles:" << std::endl;
if (!std::getline(std::cin, FLAGS_output_subtitle_path)) {
LOG(ERROR) << "Unable to get output_subtitle_path!";
return 1;
}
}
// Sanity test writing to output path beforehand so we know it works.
{
// Write empty string in append mode.
std::ofstream ofs{FLAGS_output_subtitle_path, std::ios_base::app};
if (!ofs) {
LOG(ERROR) << "Unable to open file: " << FLAGS_output_subtitle_path;
return 1;
}
ofs << "";
}

There already is a helper function for adding quotes to the input video_path, since it is passed as an command line argument to ffplay.

void FixInputPath(std::string &path) {
// Any input paths with a space in them needs to be wrapped in quotes.
if (path.find(' ') != std::string::npos) {
std::ostringstream stream;
stream << '"' << path << '"';
path = stream.str();
}
}

For paths passed to ofstream, we basically need the opposite.
Particularly:

  • Path passed somewhere as a command line argument => needs quotes
  • Path used to directly open the file => no quotes.

autogen_timeline.cpp only works on windows and not linux

As title, autogen_timeline.cpp which was generated using rcc on windows unsurprisingly only works on windows.

We need to run rcc on linux to get the version which works on linux. Options:

  1. Either check in linux and windows versions of the RCC files and have bazel select the right one or
  2. Integrate rcc compiling in the current bazel toolchain. It would require modifying https://github.com/Novacer/bazel_rules_qt so I'm not sure if I want to do that yet. But this is the preferred solution.

Implement method to open ffplay video player

Using subprocess executor fixed in #10, we should add another class which wraps this to open a video player.
We can choose ffplay as the video player.

It is important that the video player be running asynchronously from the subtitler. Since subtitler can become blocked waiting for user input, this must not allow the video player to block.

Add support for ffprobe

FFProbe can be used to detect the duration of the video, whether there are audio/video/subtitle streams, and a lot of detailed information that we need.

Add option to remux subtitle with video as mkv

Currently export dialog allows subtitles to be burned into the mp4. This is fairly slow since it has to decode, add subtitles, and then re-encode the video frames. We can add an additional option to remux the subtitles with the video as mkv. This way, we copy the video and audio streams from the input mp4, and copy the srt to combine everything in an mkv. This should give O(1) processing time (not accounting for time spent copying data).

Benefits:

  • much faster than burning subtitles
  • user can later choose to disable the subtitles in their player (ex. VLC)
  • user can easily replace the subtitles also

Drawbacks:

  • not all players may support displaying subtitles from mkv.
  • if the user doesn't have the same fonts installed on their system, they may see different subtitles.

Thus we should make sure to communicate these benefits/drawbacks to the user in the gui.

Investigate use of hardware decoding.

Currently using software decoding as it is more stable.

On windows, possible to use d3d11, dxva2, or cuda for GPU accelerated decoding instead. In theory, this should give better performance especially for 4k videos, but my simple benchmarks so far show that the HW decoding implementation in https://github.com/Novacer/QtAVPlayer perform worse on any video > 1080p60fps.

Thus we should investigate why the performance is worse (does threading help, or increasing frame buffers etc?) Then hopefully integrate it with the player.

Playing twice should close the previous player

As title, using CLI to play twice should close the current player.

A few ideas:

  • Try to play again, if it throws exception then close and replay
  • Use a flag to store if the player is playing. If last command was "play" then close before playing. This is preferred since we will need a similar pattern for end. If last command was play and user ends, then close the player. If not, then just move on.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.