
multimodal-deepfake's People

Contributors

rshaojimmy, tianxingwu


multimodal-deepfake's Issues

The text part of the dataset DGM4

Hi, I have downloaded your DGM4 dataset directly via the link, but after checking it I only found images in the 'manipulation' and 'origin' folders, which is different from your dataset samples.

Missing keys problem

Hello, to use BERT I need to download the model locally and then specify the path to it, but during training I am prompted about Missing keys. Does this have any impact?
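For anyone hitting the same warning, a minimal sketch of how to inspect which keys actually fail to load, assuming the stock HuggingFace transformers API rather than this repo's customized BERT (the local path below is hypothetical):

    from transformers import BertModel

    # output_loading_info=True additionally returns the lists of missing and
    # unexpected keys, so you can check whether they belong to the BERT
    # backbone itself or only to task-specific heads added on top of it.
    model, info = BertModel.from_pretrained(
        "path/to/local/bert-base-uncased",  # hypothetical local path
        output_loading_info=True,
    )
    print(info["missing_keys"])
    print(info["unexpected_keys"])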

How much memory is needed for training

Hello,
I am trying to train, but an out-of-memory error always occurs. Could you tell me how much memory you used and how much time the training took?

Lmac

For the text projection text_feat in Lmac, you use BERT's last hidden layer rather than the CLS token. Was this choice made because you ran experiments comparing the performance?

Dataset link failed

Could you please re-share the link to the dataset? It is not working now.

Details of comparison with uni-modal methods

Thanks for your awesome work!

I was wondering: when comparing with the deepfake detection and sequence tagging methods, do you retrain the model using uni-modal data? If so, are the multi-modal modules such as the Multi-Modal Aggregator removed, or is the input of the other modality replaced with all-zero data?

Training parameters

Hello, when using your dataset and model I was unable to reach the results reported in your paper. Could you tell me the detailed training parameters you used?

The two mtcnn annotations

[screenshot]
The explanation of mtcnn I have seen is that it gives the maximal boundary coordinates of the face. Since this image is real, why are there two sets of coordinates? I have not yet understood what these two coordinates represent, and I hope you can explain.

Suggestion: Dump the command line configs into yaml config

Dear Sir,

  • I'm going to reproduce your work and use the pretrained best checkpoint for transfer learning, but I am struggling to cross-check the config parameters back and forth among config/*.yaml, the *.sh shell scripts, and the parser.add_argument() calls in the Python scripts.

  • I think aggregating all these configs into YAML files would be cleaner and more readable, and would make it more convenient for others to use your checkpoint; a minimal sketch follows below.

Appreciate it.
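One minimal sketch of the suggestion, under the assumption that the parsed argparse namespace holds the final merged configuration (the argument names and file paths below are illustrative, not this repo's actual entry point):

    import argparse
    import yaml

    parser = argparse.ArgumentParser()
    parser.add_argument('--config', default='config/train.yaml')  # hypothetical
    parser.add_argument('--distributed', default=False, action='store_true')
    args = parser.parse_args()

    # Dump the final configuration next to the checkpoints, so one YAML file
    # is enough to reproduce (or transfer from) a given run.
    with open('run_config.yaml', 'w') as f:
        yaml.safe_dump(vars(args), f, default_flow_style=False)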

About Text_Swap

I want more details on text_swap. I have found that, in the dataset, some texts labeled 'orig' and some labeled 'text_swap' are identical. Can you provide a more detailed explanation of text_swap and its fake_text_pos?
Here is an example:
{
  "id": 683133,
  "image": "DGM4/origin/guardian/0385/488.jpg",
  "text": "Making a song and dance David Hasselhoff will perform a oneman show at the Edinburgh festival fringe",
  "fake_cls": "orig",
  "fake_image_box": [],
  "fake_text_pos": [],
  "mtcnn_boxes": [...]
},
{
  "id": 896499,
  "image": "DGM4/origin/guardian/0114/251.jpg",
  "text": "Making a song and dance David Hasselhoff will perform a oneman show at the Edinburgh festival fringe",
  "fake_cls": "text_swap",
  "fake_image_box": [],
  "fake_text_pos": [0, 7, 8, 9, 10, 11, 13, 14, 15, 16],
  "mtcnn_boxes": [...]
}
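For what it's worth, the indices appear to point into the whitespace-split tokens of "text"; a minimal sketch under that assumption (this is an inference from the sample above, not a documented schema):

    # Assumption: fake_text_pos indexes whitespace-split tokens of "text".
    text = ("Making a song and dance David Hasselhoff will perform a oneman "
            "show at the Edinburgh festival fringe")
    fake_text_pos = [0, 7, 8, 9, 10, 11, 13, 14, 15, 16]

    tokens = text.split()
    print([tokens[i] for i in fake_text_pos])
    # ['Making', 'will', 'perform', 'a', 'oneman', 'show', 'the',
    #  'Edinburgh', 'festival', 'fringe']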

Visualization

Hi,
I have read your paper and code and was deeply impressed, but I had some difficulty reproducing your visualizations. How do you produce them? What should I run to get the same results as shown in the Visualization Results section?
Thanks

Error in Training Codes with Default Distributed Setting as False

I am trying out the training code that you have provided. I am not using a distributed GPU system; here is the config I am using.
(I changed the argparse code so that the distributed argument defaults to False.)

But I have encountered this error:

Start training
Traceback (most recent call last):
  File "train.py", line 561, in <module>
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(args, config))
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "Github/MultiModal-DeepFake-root/MultiModal-DeepFake/train.py", line 416, in main_worker
    train_stats = train(args, model, train_loader, optimizer, tokenizer, epoch, warmup_steps, device, lr_scheduler, config, summary_writer)
  File "Github/MultiModal-DeepFake-root/MultiModal-DeepFake/train.py", line 141, in train
    loss_MAC, loss_BIC, loss_bbox, loss_giou, loss_TMG, loss_MLC = model(image, label, text_input, fake_image_box, fake_token_pos, alpha = alpha)
  File "/anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "Github/MultiModal-DeepFake-root/MultiModal-DeepFake/models/HAMMER.py", line 211, in forward
    self._dequeue_and_enqueue(image_feat_m, text_feat_m)
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "Github/MultiModal-DeepFake-root/MultiModal-DeepFake/models/HAMMER.py", line 363, in _dequeue_and_enqueue
    image_feats = concat_all_gather(image_feat)
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "Github/MultiModal-DeepFake-root/MultiModal-DeepFake/models/HAMMER.py", line 386, in concat_all_gather
    for _ in range(torch.distributed.get_world_size())]
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "anaconda3/envs/DGM4/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Do correct me if I'm wrong, but I have already set distributed to False, so why do the errors still reference the distributed code paths?

Here is the change I made to argparse:
parser.add_argument('--distributed', default=False, action='store_true')
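A possible explanation (an assumption, not the authors' confirmed fix): per the traceback, _dequeue_and_enqueue in models/HAMMER.py calls concat_all_gather unconditionally, so the flag never reaches that code path; also, with action='store_true' the default only applies when the flag is absent, so a --distributed still present in train.sh would flip it back to True. A minimal sketch of a guard that makes the gather a no-op when no process group exists:

    import torch

    @torch.no_grad()
    def concat_all_gather(tensor):
        # Without an initialized process group (single-GPU, non-distributed
        # run), just return the local tensor instead of gathering across ranks.
        if not (torch.distributed.is_available()
                and torch.distributed.is_initialized()):
            return tensor
        tensors_gather = [torch.ones_like(tensor)
                          for _ in range(torch.distributed.get_world_size())]
        torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
        return torch.cat(tensors_gather, dim=0)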

AttributeError: 'list' object has no attribute 'size'

Hi, thanks for your wonderful work!
But when I ran train.sh, I encountered the AttributeError in the title.
I checked the type before and inside model.forward(): before forward(), the type of text.input_ids is correct, i.e., torch.LongTensor, but inside forward() it changes to a list. Can you help me find where the mistake is? Thank you very much!
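As a reference point, a minimal sketch of a tokenizer call that keeps input_ids as a LongTensor end-to-end (assuming the stock HuggingFace tokenizer; the repo's actual call and max_length may differ):

    import torch
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    text_input = tokenizer(['a news caption'], padding='longest',
                           truncation=True, max_length=128,
                           return_tensors='pt')  # without this, ids are lists
    assert torch.is_tensor(text_input.input_ids)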

test videos

Can we test our own videos with it? And how should a test video be pre-processed to produce the metadata.json file?
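Not an official answer, but a minimal sketch of writing one metadata entry for your own image-text pair, copying the field names visible in the "About Text_Swap" samples above (the exact schema is an assumption; all values here are illustrative placeholders):

    import json

    entry = {
        "id": 0,
        "image": "my_data/frames/000001.jpg",  # e.g. a frame from your video
        "text": "caption describing this frame",
        "fake_cls": "orig",
        "fake_image_box": [],
        "fake_text_pos": [],
        "mtcnn_boxes": [],  # face boxes, e.g. from an MTCNN detector
    }

    with open("metadata.json", "w") as f:
        json.dump([entry], f, indent=2)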

Visualization Results

I have read your paper and code and was deeply impressed, but I had some difficulty reproducing your visualizations. How do you produce them? I tried the grad-cam module and found it difficult to integrate into this project.
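In the absence of an official visualization script, a generic sketch of grabbing an attention map with a forward hook and plotting it as a heatmap. The stand-in layer below is hypothetical; in practice you would register the same hook on a cross-attention module inside the trained HAMMER model, and the 14x14 ViT patch grid is an assumption:

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    # Stand-in for one attention block of the real model.
    layer = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

    captured = {}
    def grab(module, inputs, output):
        # nn.MultiheadAttention returns (attn_output, attn_weights); the
        # weights are averaged over heads, shape (batch, queries, keys).
        captured['attn'] = output[1].detach()

    handle = layer.register_forward_hook(grab)
    x = torch.randn(1, 197, 64)  # CLS token + 14x14 = 196 image patches
    layer(x, x, x)
    handle.remove()

    # CLS query's attention over the patch grid, plotted as a heatmap.
    heat = captured['attn'][0, 0, 1:].reshape(14, 14)
    plt.imshow(heat.numpy(), cmap='jet')
    plt.savefig('attention.png')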

Data download

Hi, nice work. But I always get an error when downloading the dataset directly from Microsoft 365. Is there another way to download the data? Thanks a lot.

download datasets

The dataset download link has been removed. Could you share a new one? Thank you!

train.sh not running

How do I run train.sh? In VS Code it reports that the command is not found, but if I run it in Git Bash it shows:
$ sh train.sh
Traceback (most recent call last):
  File "train.py", line 18, in <module>
    import torch.nn as nn
  File "C:\Users\athen\anaconda3\envs\DGM4\lib\site-packages\torch\nn\__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "C:\Users\athen\anaconda3\envs\DGM4\lib\site-packages\torch\nn\modules\__init__.py", line 1, in <module>
    from .module import Module
  File "C:\Users\athen\anaconda3\envs\DGM4\lib\site-packages\torch\nn\modules\module.py", line 7, in <module>
    from ..parameter import Parameter
  File "C:\Users\athen\anaconda3\envs\DGM4\lib\site-packages\torch\nn\parameter.py", line 2, in <module>
    from torch._C import _disabled_torch_function_impl
ModuleNotFoundError: No module named 'torch._C'

I'm running this on a normal i5 CPU.

How to get the region of FA?

Hi, I was wondering how to get the region of FA. In the paper's Section 3.2 (Face Attribute Manipulation), the authors mention, "we first predict ..... using GAN-based methods". Can I understand this as follows: the authors first apply an expression detector to the face to get the expression region (e.g., a smiling mouth), then employ StyleCLIP to modify the expression of the face, and finally replace the original expression region with the modified one?

How many face attributes does the dataset provide (e.g., "smile to angry")? What is the expression detector? And is the replacement process as simple as copy and paste?

Thanks!
