
License: MIT License


Video Deepfake Detection Master's Graduation Project

This project undertook a comprehensive exploration of fake and real video datasets, employing advanced techniques in face detection, data preprocessing, and the creation of structured training, validation, and testing sets. It served as the culmination of my Master's degree in Ottawa in 2023.

  • Google Colab Pro+: ensure you have access to Colab Pro+ for enhanced features.
  • Required libraries: scikit-learn, pandas, matplotlib.
  • Execute cells in a Jupyter Notebook environment.
  • A GPU is required for the processing workload.
  • Libraries:
    • OpenCV: Version 4.8.1.78
    • NumPy: Version 1.24.3
    • cvlib
    • Matplotlib: Version 3.7.2
    • TensorFlow: Version 2.15.0
    • Keras: Version 2.15.0
    • scikit-learn: Version 1.3.0

Binary classification: detect and classify deepfake videos as Real or Fake.

Design Overview

The deepfake detection system utilizes a multi-stage approach involving data preprocessing, feature extraction, deep learning-based classification, and a user-friendly web interface. It employs state-of-the-art algorithms to distinguish between authentic and manipulated videos, addressing the challenge of deepfake proliferation.

Dataset Description:

  • Dataset Overview:
    • Compilation of 2000 videos.
    • Each video's duration ranges from 8 to 13 seconds.
  • Data Sources:
    • Combination of sponsor-contributed data (under confidentiality agreements).
    • Internally generated data using the ROOP Face Swap technique.
  • Dataset Composition:
    • Balanced composition with 1000 authentic videos.
    • Includes 1000 deepfake videos.
  • Realism and Applicability:
    • Diverse subjects featured in the videos.
    • Encompasses both celebrities and ordinary individuals.
    • Enhances realism and ensures broader applicability of the dataset.

Key Tasks Undertaken

  1. Data Collection

    • Mixture of sponsor-contributed data and ROOP Face Swap technique-generated data.
  2. Data Exploration

    • Assess real and fake video counts.
          Number of Fake Videos: 1000
          Number of Real Videos: 1000
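    • A minimal sketch of this count, assuming a hypothetical directory layout of dataset/fake/*.mp4 and dataset/real/*.mp4:

          import os

          # Hypothetical layout: dataset/fake/*.mp4 and dataset/real/*.mp4
          dataset_dir = "dataset"
          for label in ("fake", "real"):
              count = len(os.listdir(os.path.join(dataset_dir, label)))
              print(f"Number of {label} videos: {count}")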
  3. Video Processing

    • Extract frames at 1 frame per 1s, 2s, and 4s intervals.
      • 1 Frame per 1s
          Capture one frame every 1 second
          Total number of videos: 1999
          Total number of frames: 16370
          Average frames per video: 8.19
      • 1 Frame per 2s
          Capture one frame every 2 seconds
          Total number of videos: 1999
          Total number of frames: 7965
          Average frames per video: 3.98
      • 1 Frame per 4s
          Capture one frame every 4 seconds
          Total number of videos: 1999
          Total number of frames: 3258
          Average frames per video: 1.63
    • Resize frames to 128x128 pixels.
    • Store metadata: Video ID, Frame ID, Video Label.
    • Face detection using cvlib, resizing images to 300x300, and drawing rectangles around detected faces (see the sketch below).
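    • A minimal sketch of the frame-extraction and face-detection steps, assuming OpenCV's VideoCapture API and cvlib's detect_face; the function names are illustrative:

          import cv2
          import cvlib as cv

          def extract_frames(video_path, interval_s=1, size=(128, 128)):
              """Capture one frame every `interval_s` seconds and resize it."""
              cap = cv2.VideoCapture(video_path)
              fps = cap.get(cv2.CAP_PROP_FPS) or 30       # fall back if FPS metadata is missing
              step = max(1, int(round(fps * interval_s)))
              frames, idx = [], 0
              while True:
                  ok, frame = cap.read()
                  if not ok:
                      break
                  if idx % step == 0:
                      frames.append(cv2.resize(frame, size))
                  idx += 1
              cap.release()
              return frames

          def mark_faces(frame):
              """Resize to 300x300, detect faces with cvlib, and draw rectangles."""
              resized = cv2.resize(frame, (300, 300))
              faces, confidences = cv.detect_face(resized)
              for (x1, y1, x2, y2) in faces:
                  cv2.rectangle(resized, (x1, y1), (x2, y2), (0, 255, 0), 2)
              return resized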
  4. Data Preprocessing

    • Normalize pixel values to [0, 1].
    • Label encoding for fake/real using LabelEncoder.
    • Split data into 80% training, 10% validation, and 10% testing (for each of the 1s, 2s, and 4s frame intervals); see the sketch below.
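    • A minimal sketch of these preprocessing steps, assuming `frames` is a NumPy array of shape (n, 128, 128, 3) and `labels` holds "REAL"/"FAKE" strings (both names illustrative):

          import numpy as np
          from sklearn.preprocessing import LabelEncoder
          from sklearn.model_selection import train_test_split

          frames = frames.astype("float32") / 255.0        # normalize pixel values to [0, 1]
          encoded = LabelEncoder().fit_transform(labels)    # FAKE -> 0, REAL -> 1

          # 80% train, 10% validation, 10% test via two successive stratified splits
          X_train, X_tmp, y_train, y_tmp = train_test_split(
              frames, encoded, test_size=0.2, stratify=encoded, random_state=42)
          X_val, X_test, y_val, y_test = train_test_split(
              X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)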
  5. Data Preparation

    • Convert the normalized frame data and label columns to TensorFlow tensors (see the sketch below).
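    • A minimal sketch of this conversion, reusing the names from the preprocessing sketch; labels are one-hot encoded here because the models end in a 2-unit softmax trained with categorical_crossentropy:

          import tensorflow as tf

          X_train_t = tf.convert_to_tensor(X_train, dtype=tf.float32)
          y_train_t = tf.one_hot(y_train, depth=2)   # one-hot to match categorical_crossentropy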
  6. Model Creation and Training

    • ResNet50, InceptionResNetV2, MobileNetV2, VGG16 models pre-trained on ImageNet.
    • Transfer learning with a custom classification head on each backbone, e.g. for ResNet50:
        resnet_model = ResNet50(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
        x = GlobalAveragePooling2D()(resnet_model.output)  # pool feature maps into a vector
        x = Dense(512, activation='relu')(x)               # fully connected layer
        x = Dropout(0.5)(x)                                # regularization against overfitting
        x = Dense(2, activation='softmax')(x)              # two-class output: Real / Fake
        model = Model(inputs=resnet_model.input, outputs=x)
    • Model compilation:
      • ResNet50 Model
          custom_optimizer = Adam(learning_rate=0.0001)
          model.compile(optimizer=custom_optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
      • InceptionResNetV2 Model
          initial_learning_rate = 0.0001  # assumed value; the original snippet left this undefined
          lr_schedule = ExponentialDecay(initial_learning_rate, decay_steps=100000, decay_rate=0.96, staircase=True)
          optimizer = Adam(learning_rate=lr_schedule)
          model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
      • MobileNetV2 Model
          sgd = SGD(learning_rate=0.0001)  # Stochastic Gradient Descent with a fixed learning rate
          model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    • Training details: epochs, batch size, early stopping (a training-call sketch follows).
         epochs = 100
         batch_size = 32
         learning_rate = 0.00001
         early_stopping = EarlyStopping(monitor='val_loss', patience=7, restore_best_weights=True)
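    • A minimal sketch of the training call under these settings, assuming the tensors from the data-preparation step (`X_val_t`, `y_val_t` are the analogous validation tensors) and the `early_stopping` callback above:

          history = model.fit(
              X_train_t, y_train_t,
              validation_data=(X_val_t, y_val_t),   # assumed validation tensors
              epochs=100,
              batch_size=32,
              callbacks=[early_stopping],
          )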
  7. Evaluation and Result Analysis

    • Confusion matrix for video-label determination: calculated by applying a threshold to the predicted frames of each video to determine the video label (REAL or FAKE).

      • Prediction Threshold:

        • The evaluation counts the occurrences of the REAL label among the predicted frames of each video.
        • If more than 70% of a video's frames are predicted as REAL, the video is categorized as REAL; otherwise it is classified as FAKE (see the sketch below).
      • Comparison with Actual Labels:

        • The video-level label obtained from this majority rule is then compared with the actual label of each video.
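    • A minimal sketch of this aggregation rule, assuming per-frame softmax outputs ordered [FAKE, REAL] (the LabelEncoder mapping above yields FAKE → 0, REAL → 1):

          import numpy as np

          def video_label(frame_probs, threshold=0.7):
              """Aggregate per-frame softmax outputs into a video-level decision."""
              frame_preds = np.argmax(frame_probs, axis=1)   # 0 = FAKE, 1 = REAL
              real_ratio = np.mean(frame_preds == 1)         # fraction of frames predicted REAL
              return "REAL" if real_ratio > threshold else "FAKE"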

    Here is an example illustrating the evaluation process on the ResNet50 model using the test set at one frame per second. The green column (Actual Label) contains the known actual label of each video, while the red column (Model Decision) is derived from the two blue columns (Predicted Fake Count, Predicted Real Count).
    • F1 score calculation based on video-level predictions and labels.
    • Learning and loss curve analysis based on video-level predictions and labels.
    • Model comparison to identify champion model.

      • 1-Second Superiority: selecting a one-frame-per-second interval for video processing is recommended due to its consistently high training and validation accuracy across the different models (ResNet50, InceptionResNetV2, MobileNetV2, VGG16). This interval strikes a balance between capturing essential temporal information, ensuring better generalization, and reducing computational load for improved efficiency in training and inference.
  8. Cross-Validation

    • Model selection based on Model Comparison results: MobileNetV2 was selected for cross-validation.
    • 5-fold split maintaining class distribution.
    • For each fold, the model is trained on the training subset and evaluated on the held-out fold; accuracy and F1 score are computed, and the trained model and training history are saved for each fold (see the sketch below).
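    • A minimal sketch of the 5-fold procedure, reusing `frames` and `encoded` from the preprocessing sketch; `build_mobilenet()` is a hypothetical factory returning a freshly compiled MobileNetV2 transfer model:

          import numpy as np
          import tensorflow as tf
          from sklearn.model_selection import StratifiedKFold
          from sklearn.metrics import accuracy_score, f1_score

          skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
          for fold, (tr, va) in enumerate(skf.split(frames, encoded), start=1):
              model = build_mobilenet()                     # hypothetical model factory
              history = model.fit(frames[tr], tf.one_hot(encoded[tr], 2),
                                  epochs=100, batch_size=32, verbose=0)
              preds = np.argmax(model.predict(frames[va]), axis=1)
              print(f"Fold {fold}: accuracy={accuracy_score(encoded[va], preds):.3f}, "
                    f"F1={f1_score(encoded[va], preds):.3f}")
              model.save(f"mobilenet_fold_{fold}.h5")       # keep the model per fold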

  9. Soft Voting

    • Predict probabilities with the top 3 models: ResNet50, InceptionResNetV2, and MobileNetV2.
    • Average the per-class probabilities and apply a threshold of 0.5 for the binary prediction (see the sketch below).
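    • A minimal sketch of the soft-voting step, assuming `resnet`, `inception`, and `mobilenet` (hypothetical names) are the trained models and `X_test_t` is the test tensor:

          import numpy as np

          # Each model outputs softmax probabilities of shape (n, 2): [FAKE, REAL]
          probs = [resnet.predict(X_test_t),
                   inception.predict(X_test_t),
                   mobilenet.predict(X_test_t)]
          avg_probs = np.mean(probs, axis=0)             # soft voting: average the probabilities
          preds = (avg_probs[:, 1] > 0.5).astype(int)    # 0.5 threshold on the REAL class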

  10. Hyperparameter Tuning on the Champion Model: MobileNetV2

    • Different learning rates: with batch size 32 and early-stopping patience of 5 epochs.

    • Different batch sizes: with learning rate 10^(-4) and early-stopping patience of 5 epochs.

    • Different early-stopping patience values: with learning rate 10^(-4) and batch size 32 (a sweep sketch follows).
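    • A minimal sketch of one such sweep (over learning rates; batch size and patience are swept analogously), reusing the tensors from earlier sketches and the hypothetical `build_mobilenet()` factory:

          from tensorflow.keras.callbacks import EarlyStopping

          for lr in (1e-3, 1e-4, 1e-5):
              model = build_mobilenet(learning_rate=lr)   # hypothetical factory
              early_stopping = EarlyStopping(monitor='val_loss', patience=5,
                                             restore_best_weights=True)
              model.fit(X_train_t, y_train_t, validation_data=(X_val_t, y_val_t),
                        epochs=100, batch_size=32, callbacks=[early_stopping], verbose=0)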

  11. Overall Comparison and The Superior Model - Save the superior model for further development.

  12. Deployment Phase - Implement deployment using Flask (a minimal endpoint sketch follows).
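    • A minimal sketch of such a Flask endpoint, reusing `extract_frames` and `video_label` from the earlier sketches; the saved model path and the `preprocess()` helper are hypothetical:

          from flask import Flask, request, jsonify
          from tensorflow.keras.models import load_model

          app = Flask(__name__)
          model = load_model("champion_mobilenetv2.h5")   # hypothetical saved champion model

          @app.route("/predict", methods=["POST"])
          def predict():
              video = request.files["video"]              # uploaded video file
              video.save("upload.mp4")
              frames = extract_frames("upload.mp4", interval_s=1)   # from the sketch above
              probs = model.predict(preprocess(frames))   # hypothetical resize/normalize helper
              return jsonify({"label": video_label(probs)})

          if __name__ == "__main__":
              app.run(debug=True)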
