Certified Robustness of Malware Deep Learning Identification Model Based on Random Smoothing
-
Abstract: Robustness, the ability to resist uncertain perturbations, is an important property of machine learning models, and certification methods based on randomized smoothing can certify the robustness of large, complex models. In malware identification, however, noisy samples produced by adding noise to all features with a randomized smoothing algorithm may lose their malicious functionality. Moreover, existing certification algorithms construct the certified region from noise-space regions taken in decreasing order of likelihood ratio, which yields a small certified robust region and low certified accuracy. This paper therefore proposes a robustness certification method based on randomized smoothing for malware identification deep learning models. The method adds discrete Bernoulli noise only to features that are not essential to the malicious functionality, building a certifiable smoothed model, and selects regions with smaller likelihood ratios to construct the certified region, achieving more accurate robustness certification. Experiments show that the average certified radius of the proposed method on three datasets is 4.37, 2.67 and 2.72 times that of the comparison methods. The proposed method provides a certified radius closer to the actual robustness boundary and has strong practical value for evaluating model robustness.
-
Key words:
- certified robustness
- random smoothing
- malware
-
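The core idea of the abstract — flip discrete Bernoulli noise only onto features that are not essential to the malicious functionality, then classify by majority vote over many noisy copies — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `flip_prob` parameter, and the binary benign/malware label encoding are all illustrative.

```python
import random

def smooth_sample(x, essential, flip_prob=0.1, rng=random):
    """Return a noisy copy of the binary feature vector x.

    Each feature NOT in `essential` is flipped with probability
    flip_prob (discrete Bernoulli noise); essential features (e.g.
    the permission categories of Table 1) are left untouched so the
    noisy sample keeps its malicious functionality.
    """
    return [
        (1 - v) if (i not in essential and rng.random() < flip_prob) else v
        for i, v in enumerate(x)
    ]

def smoothed_predict(model, x, essential, n=1000, flip_prob=0.1):
    """Majority vote of the base classifier over n noisy copies of x."""
    votes = [0, 0]  # binary task: index 0 = benign, 1 = malware
    for _ in range(n):
        votes[model(smooth_sample(x, essential, flip_prob))] += 1
    return votes.index(max(votes)), votes
```

Because essential features are never perturbed, a base classifier whose decision depends only on them is untouched by the smoothing, while noise on the remaining features is what the certification bounds reason about.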
Table 1. Sensitive permission categories and examples

Category            Example permission
Sensor              BODY_SENSORS
Camera              CAMERA
Call log            READ_CALL_LOG
SMS                 READ_SMS
Audio               RECORD_AUDIO
Calendar            WRITE_CALENDAR
Contacts            WRITE_CONTACTS
File system         MOUNT_FORMAT_FILESYSTEMS
Storage             READ_EXTERNAL_STORAGE
Location            ACCESS_FINE_LOCATION

Table 2. Average certified radius of different certification algorithms

Dataset     JIA [9]   COHEN [6]   LECUYER [5]   Proposed
Drebin      1.42      0.00        0.00          6.20
Androzoo    1.02      0.00        0.00          2.74
CIC         1.40      0.00        0.00          3.81
-
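For context on the COHEN baseline in Table 2: the Gaussian randomized-smoothing certificate of ref. [6] gives a certified L2 radius of sigma * Phi^{-1}(p_A), where p_A is a lower bound on the probability that the base classifier returns the top class under noise, and the certificate degenerates to radius 0 when p_A <= 0.5. A minimal sketch (the function name and the `sigma` default are illustrative, not from the paper):

```python
from statistics import NormalDist

def cohen_radius(p_a_lower, sigma=1.0):
    """Certified L2 radius of Gaussian randomized smoothing (ref. [6]):
    R = sigma * Phi^{-1}(p_A). Returns 0.0 when p_A <= 0.5, i.e. when
    the smoothed classifier cannot certify any radius."""
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

A zero entry in the COHEN column thus corresponds to the smoothed classifier failing to certify any radius on those discrete malware features, which is consistent with Gaussian noise being ill-suited to binary feature vectors.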
[1] TIAN Donghai, WEI Hang, ZHANG Bo, et al. Research and implementation of kernel malicious code detection based on machine learning[J]. Transactions of Beijing Institute of Technology, 2020, 40(12): 1295 − 1301. (in Chinese)
[2] LIU Yashu, WANG Zhihai, LI Jingwei, et al. An Android malware detection method based on chi-squared test[J]. Transactions of Beijing Institute of Technology, 2019, 39(3): 290 − 294. (in Chinese)
[3] CUI Jia, SHI Lei, LI Juan, et al. An effective malicious domain detection framework[J]. Transactions of Beijing Institute of Technology, 2019, 39(1): 64 − 67. (in Chinese)
[4] GOODFELLOW I, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[C/OL]//International Conference on Learning Representations. (2015-05-20) [2021-11-15]. https://arxiv.org/abs/1412.6572.
[5] LECUYER M, ATLIDAKIS V, GEAMBASU R, et al. Certified robustness to adversarial examples with differential privacy[C]//2019 IEEE Symposium on Security and Privacy. San Francisco: IEEE Press, 2019: 656 − 672.
[6] COHEN J, ROSENFELD E, KOLTER J. Certified adversarial robustness via randomized smoothing[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach: ACM, 2019: 1310 − 1320.
[7] LEE G H, YUAN Y, CHANG S Y, et al. Tight certificates of adversarial robustness for randomly smoothed classifiers[C]//Neural Information Processing Systems. Vancouver: MIT Press, 2019: 1 − 12.
[8] LI B, CHEN C, WANG W, et al. Certified adversarial robustness with additive noise[C]//The 33rd Conference on Neural Information Processing Systems. Vancouver: MIT Press, 2019: 1 − 11.
[9] JIA J, CAO X Y, WANG B H, et al. Certified robustness of community detection against adversarial structural perturbation via randomized smoothing[C]//Proceedings of the Web Conference 2020. New York: ACM, 2020: 2718 − 2724.
[10] WANG Y H, ZHANG H, CHEN H, et al. On Lp-norm robustness of ensemble decision stumps and trees[C]//International Conference on Machine Learning. Vienna: ACM, 2020: 10104 − 10114.
[11] GOOGLE. Permissions on Android[EB/OL]. (2021-05-10) [2021-05-10]. https://developer.android.com/guide/topics/permissions/overview#dangerous_permissions.
[12] ARP D, SPREITZENBARTH M, HUBNER M, et al. Drebin: effective and explainable detection of Android malware in your pocket[C]//21st Annual Network and Distributed System Security Symposium. San Diego: The Internet Society, 2014: 23 − 26.
[13] ALLIX K, BISSYANDÉ T F, KLEIN J, et al. AndroZoo: collecting millions of Android apps for the research community[C]//Proceedings of the 13th International Conference on Mining Software Repositories. New York: ACM, 2016: 468 − 471.
[14] LASHKARI A H, KADIR A F A, TAHERI L, et al. Toward developing a systematic approach to generate benchmark Android malware datasets and classification[C]//2018 International Carnahan Conference on Security Technology. Bangalore: IEEE, 2018: 1 − 7.