Papers
arxiv:2403.14974

AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies

Published on Mar 22, 2024
Authors:
,
,
,
,

Abstract

A dual Transformers-based model with dynamic weight fusion improves deepfake detection performance across multiple datasets by capturing both visual and audio modalities.

With the continuous improvements of deepfake methods, forgery messages have transitioned from single-modality to multi-modal fusion, posing new challenges for existing forgery detection algorithms. In this paper, we propose AVT2-DWF, the Audio-Visual dual Transformers grounded in Dynamic Weight Fusion, which aims to amplify both intra- and cross-modal forgery cues, thereby enhancing detection capabilities. AVT2-DWF adopts a dual-stage approach to capture both spatial characteristics and temporal dynamics of facial expressions. This is achieved through a face transformer with an n-frame-wise tokenization strategy encoder and an audio transformer encoder. Subsequently, it uses multi-modal conversion with dynamic weight fusion to address the challenge of heterogeneous information fusion between audio and visual modalities. Experiments on DeepfakeTIMIT, FakeAVCeleb, and DFDC datasets indicate that AVT2-DWF achieves state-of-the-art performance intra- and cross-dataset Deepfake detection. Code is available at https://github.com/raining-dev/AVT2-DWF.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2403.14974
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2403.14974 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2403.14974 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.14974 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.