Deepfake: A Comprehensive Survey of Generation and Detection Methods

Abstract

Driven by generative models such as Generative Adversarial Networks (GANs) and Diffusion Models, facial deepfake technology has made significant progress. Among its applications, deep face swapping, one of the most widespread and impactful research directions within Deepfake, has attracted broad attention. On the one hand, its use in areas such as everyday entertainment and film production has opened up new creative possibilities and propelled the development of related industries. On the other hand, its rapid evolution poses an increasingly serious challenge to personal privacy, social stability, and even national security. Against this backdrop, developing efficient and reliable facial deepfake detection techniques has become a key strategy for countering this threat. This article first surveys deep-learning-based face-swapping methods, categorizing and summarizing them by generation paradigm and guidance mechanism. It then systematically reviews facial deepfake detection techniques for images from the spatial-domain and frequency-domain perspectives. Furthermore, it organizes video-oriented facial deepfake detection methods according to intra-frame image forgery features, inter-frame spatiotemporal feature fusion, and multimodal information fusion. Finally, the article summarizes the challenges that deep face swapping and detection algorithms face in terms of technical issues and privacy and security concerns, and discusses future research directions.
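To make the frequency-domain cue mentioned above concrete, the following minimal Python sketch (an illustration only, not code from the surveyed works; it assumes NumPy and a grayscale face crop as a 2D array) computes an azimuthally averaged log-power spectrum of the 2D FFT. Spectrum-based detectors exploit the observation that generated faces often show abnormal behavior in such frequency profiles, which can then be fed to an ordinary classifier.

```python
import numpy as np

def radial_power_spectrum(gray_face: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged log-power spectrum of a grayscale face crop.

    Illustrative feature for spectrum-based deepfake detection; the
    classifier trained on top of this profile is left out of the sketch.
    """
    # 2D FFT, shifted so that low frequencies sit at the center
    spec = np.fft.fftshift(np.fft.fft2(gray_face))
    power = np.log1p(np.abs(spec) ** 2)

    # Distance of every pixel from the spectrum center
    h, w = gray_face.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.indices((h, w))
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)

    # Average power over rings of equal radius -> 1D frequency profile
    bins = np.linspace(0.0, radius.max() + 1e-6, n_bins + 1)
    profile = np.empty(n_bins)
    for i in range(n_bins):
        mask = (radius >= bins[i]) & (radius < bins[i + 1])
        profile[i] = power[mask].mean() if mask.any() else 0.0
    return profile

# Usage (hypothetical): compute radial_power_spectrum(face) for real and
# fake crops and train any off-the-shelf classifier on the profiles.
```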

DOI: 10.48014/ccsr.20240102002
Article Type: Review
Received: 2024-01-02
Accepted: 2024-04-20
Published: 2024-09-28
Keywords: Deepfake, deep face-swapping, multimedia forensics, deep learning
Authors: NIU Yuanchen, LI Yuanman*, LI Bin, LI Xia
Affiliation: Shenzhen University, Shenzhen 518060, China
Views: 659
Downloads: 197
Citation: NIU Yuanchen, LI Yuanman, LI Bin, et al. Deepfake: a comprehensive survey of generation and detection methods[J]. Chinese Computer Sciences Review, 2024, 2(3): 24-37.