An Adversarial Example Method Based on Dropping Information to Improve Attack Capability
Abstract

Deep neural networks are now widely used in many research fields. However, as research on artificial intelligence has deepened, it has become clear that AI technology based on deep neural networks brings convenience but also carries potential security risks. For example, an attacker can use an adversarial example method to add slight perturbations to a clean image and thereby mislead an image classification model into outputting a wrong result with high confidence. In contrast to earlier methods that generate adversarial examples by adding extra information to the image, Ranjie Duan et al. recently proposed the AdvDrop algorithm, which generates adversarial examples by deleting existing information from the image, realized by adjusting the quantization step. However, during quantization AdvDrop does not account for the fact that different gradient values of the quantization table affect the adversarial effect differently. To address this, we propose AdvDrop+, which updates the quantization tables in each iteration according to their gradient values scaled by a factor. To find a proper scaling factor, AdvDrop+ locates the gradient value with the highest frequency in the gradient histogram and takes its logarithm; the result is the scaling factor. Experiments show that, under the targeted attack setting and with nearly identical image distortion, AdvDrop+ achieves better attack performance than AdvDrop, while retaining AdvDrop's ability to drop information.
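The core step described above, scaling the quantization-table gradient by the logarithm of the most frequent gradient value in its histogram, can be illustrated with a short PyTorch sketch. This is not the authors' implementation: the function names, the histogram bin count, the use of absolute gradient values, the update rule, the step size, and the clamping bound are all assumptions made for illustration only.

```python
import torch

def advdrop_plus_scale_factor(grad, num_bins=100, eps=1e-12):
    """Return log(most frequent gradient magnitude) as the scaling factor.

    Minimal sketch of the heuristic described in the abstract; the bin count,
    the use of absolute gradient values, and the epsilon guard are assumptions.
    """
    g = grad.detach().abs().flatten()
    hist = torch.histc(g, bins=num_bins, min=float(g.min()), max=float(g.max()))
    mode_idx = torch.argmax(hist)                                 # bin with the highest frequency
    bin_width = (g.max() - g.min()) / num_bins
    mode_value = g.min() + (mode_idx.float() + 0.5) * bin_width   # centre of that bin
    return torch.log(mode_value + eps)

def update_quant_table(q_table, grad, step=1.0):
    """Hypothetical AdvDrop+-style step: scale the raw gradient of the
    quantization table by the histogram-derived factor before applying it.
    Update direction, step size, and the lower bound of 1 are assumptions."""
    s = advdrop_plus_scale_factor(grad)
    return torch.clamp(q_table + step * s * grad, min=1.0)
```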

DOI: 10.48014/ccsr.20231107001
Article type: Research paper
Received: 2023-11-07
Accepted: 2023-11-23
Published: 2024-06-28
Keywords: neural networks, adversarial example, image classification, dropping information
Authors: LI Shimin*, ZHANG Yongming
Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
References
[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
[2] Zeng M, Wang Y, Luo Y. Dirichlet latent variable hierarchical recurrent encoder-decoder in dialogue generation[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019: 1267-1272.
https://doi.org/10.18653/v1/D19-1124.
[3] Spencer M, Eickholt J, Cheng J. A Deep Learning Network Approach to ab Initio Protein Secondary Structure Prediction[J]. IEEE/ACM transactions on computational biology and bioinformatics, 2014, 12(1): 103-112.
https://doi.org/10.1109/TCBB.2014.2343960.
[4] Liao Y, Vakanski A, Xian M. A Deep Learning Framework for Assessing Physical Rehabilitation Exercises[J]. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2020, 28(2): 468-477.
https://doi.org/10.1109/TNSRE.2020.2966249.
[5] Dahl G E, Yu D, Deng L, et al. Context-dependent Pretrained Deep Neural Networks for Large-vocabulary Speech Recognition[J]. IEEE Transactions on audio, speech, and language processing, 2011, 20(1): 30-42.
https://doi.org/10.1109/TASL.2011.2134090.
[6] Chan W, Jaitly N, Le Q, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition[C]//2016 IEEE international conference on acoustics, speech and signal processing(ICASSP). IEEE, 2016: 4960-4964.
https://doi.org/10.1109/ICASSP.2016.7472621.
[7] Shokri R, Stronati M, Song C, et al. Membership inference attacks against machine learning models[C]//2017 IEEE symposium on security and privacy(SP). IEEE, 2017: 3-18.
https://doi.org/10.1109/SP.2017.41.
[8] Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks[J]. arXiv preprint arXiv:1312.6199, 2013.
https://doi.org/10.48550/arXiv.1312.6199.
[9] Bojarski M, Del Testa D, Dworakowski D, et al. End-to-end learning for self-driving cars[J]. arXiv preprint arXiv:1604.07316, 2016.
https://doi.org/10.48550/arXiv.1604.07316.
[10] Lopes A T, De Aguiar E, De Souza A F, et al. Facial expression recognition with convolutional neural networks: coping with few data and the training sample order[J]. Pattern Recognition, 2017, 61: 610-628.
https://doi.org/10.1016/j.patcog.2016.07.026.
[11] Grosse K, Papernot N, Manoharan P, et al. Adversarial examples for malware detection[C]//European symposium on research in computer security. Springer, Cham, 2017: 62-79.
https://doi.org/10.1007/978-3-319-66399-9_4.
[12] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: learning augmentation strategies from data[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 113-123.
[13] Pan W W, Wang X Y, Song M L, et al. Survey on generating adversarial examples[J]. Journal of Software, 2020, 31(1): 67-81.
https://doi.org/10.13328/j.cnki.jos.005884.
[14] Khan M E, Khan F. A comparative study of white box, black box and grey box testing techniques[J]. International Journal of Advanced Computer Science and Applications, 2012, 3(6): 1-15.
https://doi.org/10.14569/IJACSA.2012.030603.
[15] Papernot N, McDaniel P, Goodfellow I, et al. Practical black-box attacks against machine learning[C]//Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. 2017: 506-519.
https://doi.org/10.1145/3052973.3053009.
[16] Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[J]. arXiv preprint arXiv:1412.6572, 2014.
https://doi.org/10.48550/arXiv.1412.6572.
[17] Fawzi A, Fawzi O, Frossard P. Fundamental limits on adversarial robustness[C]//Proc. ICML, Workshop on Deep Learning. 2015: 55.
[18] Gilmer J, Metz L, Faghri F, et al. Adversarial spheres[J]. arXiv preprint arXiv:1801.02774, 2018.
https://doi.org/10.48550/arXiv.1801.02774.
[19] Schmidt L, Santurkar S, Tsipras D, et al. Adversarially robust generalization requires more data[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018: 5019-5031.
[20] Ilyas A, Santurkar S, Tsipras D, et al. Adversarial examples are not bugs, they are features[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019: 125-136.
[21] Duan R, Chen Y, Niu D, et al. AdvDrop: Adversarial attack to DNNs by dropping information[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 7506-7515.
[22] Kurakin A, Goodfellow I J, Bengio S. Adversarial examples in the physical world[M]//Artificial Intelligence Safety and Security. Chapman and Hall/CRC, 2018: 99-112.
[23] Madry A, Makelov A, Schmidt L, et al. Towards deep learning models resistant to adversarial attacks[J]. arXiv preprint arXiv:1706.06083, 2017.
https://doi.org/10.48550/arXiv.1706.06083.
[24] Papernot N, McDaniel P, Jha S, et al. The limitations of deep learning in adversarial settings[C]//2016 IEEE European symposium on security and privacy(EuroS&P). IEEE, 2016: 372-387.
https://doi.org/10.1109/EuroSP.2016.36.
[25] Moosavi-Dezfooli S M, Fawzi A, Frossard P. DeepFool: a simple and accurate method to fool deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2574-2582.
[26] Carlini N, Wagner D. Towards evaluating the robustness of neural networks[C]//2017 IEEE symposium on security and privacy(SP). IEEE, 2017: 39-57.
https://doi.org/10.1109/SP.2017.49.
Citation: LI Shimin, ZHANG Yongming. An adversarial example method based on dropping information to improve attack capability[J]. Chinese Computer Sciences Review, 2024, 2(2): 14-23.