
Gesture-Based Media Control System

Overview

This project enables media control using custom gestures. By leveraging hand landmark detection technology, users can interact with media playback and other system functionalities using hand movements. This README provides information on how to set up, configure, and use the gesture-based media control system.

*Figure: hand landmarks*


Usage

  1. To use the gesture media controller, simply run the execution.py file.

  2. The following gestures are currently supported:

    • Palm: Pause/Play
    • Thumbs-up: Volume up
    • Thumbs-down: Volume down
    • Thumbs-left (neutral): Mute/unmute
    • V-up (2 fingers): Brightness up
    • V-down: Brightness down
    • Point-left: Previous track
    • Thumb-right: Next track
  3. The fist is a neutral gesture; use it as an intermediate step between two consecutive pause/play or other toggle actions.

Method

Data Collection

The project uses OpenCV to access the device's webcam. A Python script (collect_images.py) captures 500 images of each class, at a sample rate of 20 images per second, into a data folder. Another script (extract_marks.py) extracts the coordinates of each landmark of the detected hand using the MediaPipe Hand Landmarker and saves the list of data points in pickle format.
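The capture-and-extract pipeline above can be sketched as follows. This is a minimal sketch, not the project's actual scripts: the `data`/`labels` keys inside the pickle and the per-class folder layout are assumptions, and `capture_class` requires opencv-python and a webcam.

```python
import os
import pickle
import time

DATASET_SIZE = 500   # images per class (from the README)
SAMPLE_RATE = 20     # images per second (from the README)

def capture_class(label, out_dir="data"):
    """Grab DATASET_SIZE webcam frames for one gesture class.

    Requires opencv-python and a webcam; cv2 is imported lazily so the
    rest of this sketch works without either.
    """
    import cv2  # assumption: opencv-python is installed
    class_dir = os.path.join(out_dir, str(label))
    os.makedirs(class_dir, exist_ok=True)
    cap = cv2.VideoCapture(0)
    for i in range(DATASET_SIZE):
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(class_dir, f"{i}.jpg"), frame)
        time.sleep(1.0 / SAMPLE_RATE)   # ~20 frames per second
    cap.release()

def save_landmarks(points, labels, path="data.pickle"):
    """Persist extracted landmark coordinates, roughly the way
    extract_marks.py might (the 'data'/'labels' keys are an assumption)."""
    with open(path, "wb") as f:
        pickle.dump({"data": points, "labels": labels}, f)
```

Each hand yields 21 landmarks with 3 coordinates, so one sample is a flat list of 63 values.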

*Figure: hand landmarks, taken from [1]*

Training

During training, the pickled data points are loaded. Cross-entropy loss is used to calculate the loss, i.e., the deviation from the actual labels. The Adam optimizer then updates the weights with momentum, using the gradients computed by back-propagation.
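For reference, the cross-entropy loss mentioned above reduces to the negative log of the softmax probability assigned to the true class. A stdlib-only sketch of the computation (independent of the project's actual training code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """-log p(true_class): near zero when the model is confidently
    correct, large when it is confidently wrong."""
    return -math.log(softmax(logits)[true_class])
```

During training this loss is averaged over a batch before back-propagation.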

Neural Network Architecture

The neural network has a sub-module for each of the fingers and one for the palm (which includes the base of each finger), i.e., six in total. The outputs from these sub-modules are concatenated and fed into fully connected layers. The last layer (self.fc3 of class Gesture) has one output per gesture class (10 in our case).
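A PyTorch sketch of this architecture, under stated assumptions: the landmark grouping (MediaPipe indices 1-4 for the thumb, 5-8, 9-12, 13-16, 17-20 for the other fingers, and wrist plus finger bases for the palm) and all hidden-layer sizes are illustrative, not the project's actual values.

```python
import torch
import torch.nn as nn

# Assumed landmark grouping (MediaPipe indices; hypothetical split):
PALM = [0, 1, 5, 9, 13, 17]          # wrist + base of each finger
FINGERS = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12],
           [13, 14, 15, 16], [17, 18, 19, 20]]
GROUPS = FINGERS + [PALM]            # six sub-modules in total

class Gesture(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # one small sub-module per finger and one for the palm
        self.subs = nn.ModuleList([
            nn.Sequential(nn.Linear(len(idx) * 3, 16), nn.ReLU())
            for idx in GROUPS
        ])
        self.fc1 = nn.Linear(6 * 16, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, n_classes)   # one output per gesture

    def forward(self, x):                     # x: (batch, 21, 3)
        parts = [sub(x[:, idx, :].flatten(1))
                 for sub, idx in zip(self.subs, GROUPS)]
        h = torch.cat(parts, dim=1)           # concatenate sub-module outputs
        h = torch.relu(self.fc1(h))
        h = torch.relu(self.fc2(h))
        return self.fc3(h)                    # raw logits; softmax at inference
```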


Deployment

During the execution phase, individual frames are read from the webcam and passed to the MediaPipe Hand Landmarker, which returns the x, y, z coordinates of each landmark in a detected hand (labelled 0-20). The coordinates are passed to the neural network model. The softmax of the model prediction is passed to the control function, which accepts an integer key and performs the corresponding action. A dictionary maps each integer key to its associated Virtual Key code, and the key press is then simulated using pywin32.
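The control step can be sketched as below. The Virtual Key codes are the standard Windows media-key codes, but the gesture-index-to-key mapping is an assumption for illustration; the real dictionary lives in execution.py, and `control` only actually presses keys on Windows with pywin32 installed.

```python
# Standard Windows virtual-key codes for media keys.
# The gesture-index -> key mapping is an assumption for illustration.
GESTURES = {
    0: 0xB3,  # palm        -> VK_MEDIA_PLAY_PAUSE
    1: 0xAF,  # thumbs-up   -> VK_VOLUME_UP
    2: 0xAE,  # thumbs-down -> VK_VOLUME_DOWN
    3: 0xAD,  # thumbs-left -> VK_VOLUME_MUTE
    4: 0xB1,  # point-left  -> VK_MEDIA_PREV_TRACK
    5: 0xB0,  # thumb-right -> VK_MEDIA_NEXT_TRACK
}

def control(pred):
    """Simulate the key press for the predicted gesture index.

    win32api is imported lazily: pywin32 is Windows-only, and the
    mapping above should stay importable elsewhere.
    """
    vk = GESTURES.get(pred)
    if vk is None:           # e.g. the neutral fist gesture: do nothing
        return
    import win32api  # assumption: pywin32 is installed (Windows only)
    win32api.keybd_event(vk, 0, 0, 0)   # key down
    win32api.keybd_event(vk, 0, 2, 0)   # key up (KEYEVENTF_KEYUP)
```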

Installation

  1. Clone the Repository:
    git clone https://github.com/mz-hassan/gesture-based-media-control.git
    
  2. Install required packages from the requirements.txt file
    pip install -r requirements.txt
    

Create Custom Gestures and Commands

  1. Use the collect_images.py script to scan your own gestures. Set the number of classes and dataset size as per your requirements.

  2. Use extract_marks.py to detect the hand landmarks in the captured images and save them as data.pickle.

  3. Train the model using train.py. Adjust the parameters as per your requirements and save the weights.

  4. Access the weights in execution.py and run it. You can uncomment the imshow call if you prefer to see the live capture.

  5. To create custom commands, you can simulate any button press by specifying the Virtual Key code associated with it. Simply add the key code next to the associated numerical key in the gestures dictionary.
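For example, a newly trained gesture can be bound to the media-stop key by adding one entry to the dictionary. The dictionary contents and the class index 8 below are illustrative; match them to your own execution.py.

```python
# Existing mapping (illustrative): gesture index -> Windows virtual-key code.
gestures = {0: 0xB3, 1: 0xAF}   # play/pause, volume up

# Hypothetical new gesture trained as class index 8, bound to
# VK_MEDIA_STOP (standard Windows virtual-key code 0xB2).
gestures[8] = 0xB2
```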

References

[1] https://developers.google.com/mediapipe/solutions/vision/hand_landmarker
