

DeepBackSub

Virtual Video Device for Background Replacement with Deep Semantic Segmentation

Screenshots with my stupid grinning face (Credits for the nice backgrounds to Mary Sabell and PhotoFunia)

In these modern times where everyone is sitting at home and skype-ing/zoom-ing/webrtc-ing all the time, I was a bit annoyed about always showing my messy home office to the world. Skype has a "blur background" feature, but that's also getting boring after a while. Zoom has something similar built-in, but I'm not touching that software with a bargepole. So I decided to look into how to roll my own implementation without being dependent on any particular video conferencing software to support this.

This whole shebang involves three main steps with varying difficulty:

  • find person in video (hard)
  • replace background (easy)
  • pipe data to virtual video device (medium)

Finding person in video

Attempt 0: Depth camera (Intel Realsense)

I've been working a lot with depth cameras previously, also for background segmentation (see SurfaceStreams), so I just grabbed a leftover RealSense camera from the lab and gave it a shot. However, the depth data in a cluttered office environment is quite noisy, and no matter how I tweaked the camera settings, it could not produce any depth data for my hair...? I looked like a medieval monk who had the top of his head chopped off, so ... next.

Attempt 1: OpenCV BackgroundSubtractor

See https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html for a tutorial. This should work OK for mostly static backgrounds with small moving objects, but it does not work for a mostly static person in front of a static background, because anything that stops moving is gradually learned as background. Next.
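To illustrate why a person sitting still defeats this approach, here is a deliberately simplified sketch: a running-average background model on grayscale frames stored as plain byte vectors. This is not OpenCV's actual MOG2/KNN algorithm, only the same adaptive idea; the class name `RunningAverageBg` and the `alpha`/`threshold` parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, deliberately simplified adaptive background model. NOT
// OpenCV's MOG2/KNN: the background estimate is just an exponential
// running average of every frame seen so far.
class RunningAverageBg {
public:
    RunningAverageBg(std::size_t pixels, double alpha, double threshold)
        : bg_(pixels, 0.0), alpha_(alpha), threshold_(threshold) {}

    // Returns a foreground mask (255 = foreground) for one grayscale frame,
    // then blends that frame into the background estimate.
    std::vector<uint8_t> apply(const std::vector<uint8_t>& frame) {
        assert(frame.size() == bg_.size());
        std::vector<uint8_t> fg(frame.size(), 0);
        for (std::size_t i = 0; i < frame.size(); ++i) {
            if (!initialized_) bg_[i] = frame[i];  // first frame = background
            if (std::fabs(frame[i] - bg_[i]) > threshold_) fg[i] = 255;
            // Anything that stays still long enough is absorbed into bg_.
            bg_[i] = (1.0 - alpha_) * bg_[i] + alpha_ * frame[i];
        }
        initialized_ = true;
        return fg;
    }

private:
    std::vector<double> bg_;
    double alpha_;
    double threshold_;
    bool initialized_ = false;
};
```

With `alpha = 0.05`, a person who appears with a brightness difference of 190 against a threshold of 30 is still detected on the first frame, but after roughly three dozen frames of sitting still the difference has decayed below the threshold and the person has become "background".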

Attempt 2: OpenCV Face Detector

See https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html for a tutorial. Works okay-ish, but obviously only detects the face and not the rest of the person. Also, the result only roughly matches an ellipse, which ends up looking rather weird. Next.

Attempt 3: Deep learning!

I've heard good things about this deep learning stuff, so let's try that. I first had to find my way through a pile of frameworks (Keras, Tensorflow, PyTorch, etc.), but after I found a ready-made model for semantic segmentation based on Tensorflow Lite (DeepLab v3+), I settled on that.

I had a look at the corresponding Python example, C++ example, and Android example, and based on those, I first cobbled together a Python demo. That was running at about 2.5 FPS, which is really excruciatingly slow, so I built a C++ version which manages 10 FPS without too much hand optimization. Good enough.

Replace Background

This is basically one line of code with OpenCV: bg.copyTo(raw,mask); Told you that's the easy part.
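For readers without OpenCV at hand, here is roughly what that call does, written out as a plain C++ sketch. It assumes both images are packed YUYV frames of the same size (2 bytes per pixel, like an OpenCV CV_8UC2 matrix) and a one-byte-per-pixel mask; the name `maskedCopy` is hypothetical. Like `cv::Mat::copyTo(dst, mask)`, it copies a pixel only where the mask is non-zero, so the mask handed to it has to mark the pixels that should become background.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-C++ equivalent of bg.copyTo(raw, mask) for packed
// YUYV frames. Each pixel occupies 2 bytes (Y plus either U or V); the mask
// holds one byte per pixel, non-zero meaning "replace with background".
void maskedCopy(const std::vector<uint8_t>& bg,
                std::vector<uint8_t>& raw,
                const std::vector<uint8_t>& mask) {
    assert(bg.size() == raw.size());
    assert(raw.size() == mask.size() * 2);
    for (std::size_t px = 0; px < mask.size(); ++px) {
        if (mask[px] != 0) {
            raw[2 * px]     = bg[2 * px];      // luma (Y)
            raw[2 * px + 1] = bg[2 * px + 1];  // chroma (U or V)
        }
    }
}
```

Since YUYV shares one U/V pair between two neighbouring pixels, a mask edge that falls inside such a pair mixes chroma from the camera frame and the background image.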

Virtual Video Device

I'm using v4l2loopback to pipe the data from my userspace tool into any software that can open a V4L2 device. This isn't too hard because of the nice examples, but there are some catches, most notably color space. It took quite some trial and error to find a common pixel format that's accepted by Firefox, Skype, and guvcview, and that is YUYV. Nicely enough, my webcam can output YUYV directly as raw data, so that does save me some colorspace conversions.
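A minimal sketch of the loopback side, assuming a 640x480 YUYV frame and a v4l2loopback device node such as /dev/video1 (the actual node depends on how the module was loaded). It fills a `v4l2_format` for an output device and applies it with the `VIDIOC_S_FMT` ioctl from the kernel's `linux/videodev2.h`; the helper names `makeYuyvFormat` and `openLoopback` are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical helper: describe a packed YUYV frame for a V4L2 *output*
// device (we write frames in, other programs read them as a camera).
v4l2_format makeYuyvFormat(unsigned width, unsigned height) {
    assert(width % 2 == 0);  // YUYV stores pixels in pairs
    v4l2_format fmt;
    std::memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    fmt.fmt.pix.width = width;
    fmt.fmt.pix.height = height;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    fmt.fmt.pix.field = V4L2_FIELD_NONE;
    fmt.fmt.pix.bytesperline = width * 2;  // Y0 U Y1 V = 4 bytes per 2 px
    fmt.fmt.pix.sizeimage = width * height * 2;
    fmt.fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
    return fmt;
}

// Hypothetical helper: open the loopback node (path is an assumption,
// e.g. "/dev/video1") and set its format. Returns -1 on failure.
int openLoopback(const char* path, unsigned width, unsigned height) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    v4l2_format fmt = makeYuyvFormat(width, height);
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Each processed frame can then be pushed with a plain `write(fd, data, width * height * 2)`.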

End Result

The dataflow through the whole program is roughly as follows:

  • init
    • load background.png, convert to YUYV
    • load DeepLab v3+ network, initialize TFLite
    • set up V4L2 Loopback device (w,h,YUYV)
  • loop
    • grab raw YUYV image from camera
    • extract square ROI in center
      • downscale ROI to 257 x 257 (*)
      • convert to RGB (*)
      • run DeepLab v3+
      • convert result to binary mask for class "person"
      • denoise mask using erode/dilate
    • upscale mask to raw image size
    • copy background over raw image with mask (see above)
    • write() data to virtual video device

(*) these are required by the input format of DeepLab v3+
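Two of these steps can be sketched in plain C++: picking the centered square ROI, and turning the network output into a binary mask. The mask part assumes the output is a float tensor of per-class scores in row-major height x width x class order with the 21 PASCAL VOC classes, where "person" has index 15; that matches the commonly distributed 257 x 257 DeepLab v3+ TFLite model but is an assumption here. The names `centerSquare` and `backgroundMask` are hypothetical, and marking background with 255 is a choice of this sketch so the mask can be fed straight into a masked copy of the background.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Roi { int x, y, size; };

// Hypothetical helper: largest centered square inside a w x h frame,
// e.g. 480x480 starting at x=80 for a 640x480 frame.
Roi centerSquare(int w, int h) {
    int s = w < h ? w : h;
    return Roi{(w - s) / 2, (h - s) / 2, s};
}

// Hypothetical helper: per-pixel argmax over class scores (assumed HWC
// layout, numClasses floats per pixel). Returns 0 where "person" wins and
// 255 everywhere else, i.e. the pixels to replace with background.
std::vector<uint8_t> backgroundMask(const std::vector<float>& scores,
                                    std::size_t pixels, std::size_t numClasses,
                                    std::size_t personClass) {
    assert(scores.size() == pixels * numClasses);
    std::vector<uint8_t> mask(pixels, 255);
    for (std::size_t p = 0; p < pixels; ++p) {
        const float* s = &scores[p * numClasses];
        std::size_t best = 0;
        for (std::size_t c = 1; c < numClasses; ++c)
            if (s[c] > s[best]) best = c;
        if (best == personClass) mask[p] = 0;
    }
    return mask;
}
```

Under the assumptions above, the real call would be `backgroundMask(output, 257 * 257, 21, 15)`, followed by the erode/dilate and upscaling steps.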

Requirements

Tested with the following dependencies:

  • Ubuntu 18.04.5, x86-64
  • Linux kernel 4.15 (stock package)
  • OpenCV 3.2.0 (stock package)
  • V4L2-Loopback 0.10.0 (stock package)
  • Tensorflow Lite 2.1.0 (from repo)
    • Ultra-short build guide for Tensorflow Lite C++ library: clone repo above, then...
      • run ./tensorflow/lite/tools/make/download_dependencies.sh
      • run ./tensorflow/lite/tools/make/build_lib.sh

Tested with the following software:

  • Firefox 74.0.1 (works)
  • Skype 8.58.0.93 (works)
  • guvcview 2.0.5 (works)
  • Chrome 80.0.3987.87 (b0rks, might be an issue with v4l2loopback)

Limitations/Extensions

As usual: pull requests welcome.

  • The project name isn't catchy enough. Help me find a nice backronym.
  • Resolution is currently hardcoded to 640x480 (lowest common denominator).
  • Background image size needs to match camera resolution.
  • Only works with Linux, because that's what I use.
  • Needs a webcam that can produce raw YUYV data (but extending to the common YUV420 format should be trivial)
  • CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS.
  • Uses stock DeepLab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance?
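On the YUV420 point: assuming planar I420 input (what V4L2 calls V4L2_PIX_FMT_YUV420: a full-resolution Y plane followed by quarter-resolution U and V planes), converting to YUYV mostly means interleaving the planes and reusing each chroma row for two image rows. A minimal sketch; the name `i420ToYuyv` is hypothetical and no chroma interpolation is done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert a planar I420 frame (Y plane, then U, then V,
// chroma subsampled 2x both ways) into packed YUYV. Each chroma row is simply
// reused for two image rows; no interpolation.
std::vector<uint8_t> i420ToYuyv(const std::vector<uint8_t>& i420,
                                std::size_t w, std::size_t h) {
    assert(w % 2 == 0 && h % 2 == 0);
    assert(i420.size() == w * h * 3 / 2);
    const uint8_t* Y = i420.data();
    const uint8_t* U = Y + w * h;
    const uint8_t* V = U + (w / 2) * (h / 2);
    std::vector<uint8_t> out(w * h * 2);
    for (std::size_t row = 0; row < h; ++row) {
        for (std::size_t col = 0; col < w; col += 2) {
            std::size_t c = (row / 2) * (w / 2) + col / 2;  // shared chroma sample
            uint8_t* o = &out[(row * w + col) * 2];
            o[0] = Y[row * w + col];
            o[1] = U[c];
            o[2] = Y[row * w + col + 1];
            o[3] = V[c];
        }
    }
    return out;
}
```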

Fixed

  • Should probably do an erosion (+ dilation?) operation on the mask.
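For reference, an erosion followed by a dilation (a morphological opening) removes specks smaller than the kernel from a binary mask while roughly keeping larger regions intact. In OpenCV this corresponds to `cv::erode` followed by `cv::dilate`; the plain C++ version below with a 3x3 square kernel is only a sketch of the idea, and the names `morph3x3` and `openMask` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one 3x3 binary morphology pass over a w x h mask.
// erode=true: a pixel stays 255 only if all in-image neighbours are set.
// erode=false (dilate): a pixel becomes 255 if any in-image neighbour is set.
static std::vector<uint8_t> morph3x3(const std::vector<uint8_t>& in,
                                     int w, int h, bool erode) {
    assert(in.size() == static_cast<std::size_t>(w) * h);
    std::vector<uint8_t> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool all = true, any = false;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    bool on = in[ny * w + nx] != 0;
                    all = all && on;
                    any = any || on;
                }
            out[y * w + x] = (erode ? all : any) ? 255 : 0;
        }
    return out;
}

// Hypothetical helper: opening = erosion, then dilation.
std::vector<uint8_t> openMask(const std::vector<uint8_t>& mask, int w, int h) {
    return morph3x3(morph3x3(mask, w, h, true), w, h, false);
}
```

A single isolated pixel vanishes after the erosion and cannot come back, whereas a 3x3 block shrinks to its center pixel and is then restored by the dilation.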

Other links

Firefox preferred formats: https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_capture/linux/video_capture_linux.cc#142-159

Contributors

floe
