Received: 2021-04-18  Revised: 2022-03-27
Abstract: As the number of cryptographic algorithms grows, ciphertext data becomes more complex, and interference between data increases, single-layer identification schemes lose accuracy and stability. To address these problems, a hybrid gradient boosting decision tree and logistic regression (HGBDTLR) model was proposed, and a block cipher algorithm identification scheme based on this model (HGLBIS) was constructed. In this scheme, the 15 tests of the NIST randomness test standard were first used to extract features from ciphertext files, and 10 meaningful feature values were selected as the classifier input. These 10 features were then used to train a gradient boosting decision tree model, and the trees it learned were used to construct new features. Finally, the new features were one-hot encoded and added to the original features to train a logistic regression model for prediction. In the ciphertext-only scenario, nine identification schemes were constructed from nine different classifier models. These nine schemes were used in binary classification experiments on ciphertext files of different sizes encrypted with two typical block ciphers, AES and 3DES, and in five-class classification experiments on ciphertext files of different sizes encrypted with five commonly used block ciphers: AES, 3DES, Blowfish, CAST, and RC2. The experimental results showed that, compared with the other identification schemes, the proposed scheme achieved the highest identification accuracy in almost all cases for ciphertext files of the same size, in both the binary and the five-class problems. Identification accuracy fluctuated as the ciphertext size changed; the proposed scheme showed the smallest fluctuation range, was the least affected, and was the most stable.
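The feature-construction steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes the frequency (monobit) test of NIST SP 800-22 as one example feature (the abstract does not say which 10 of the 15 tests were kept), and it takes the leaf index reached in each tree as given, whereas in the scheme those indices would come from a trained gradient boosting decision tree. The function names `monobit_p_value`, `one_hot_leaves`, and `augment` are hypothetical, and the final logistic regression step is not shown.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """P-value of the NIST SP 800-22 frequency (monobit) test.

    Used here only as an example of one randomness-test feature; the
    abstract does not state which tests the selected features come from.
    """
    n = len(data) * 8
    if n == 0:
        raise ValueError("need at least one byte of ciphertext")
    ones = sum(bin(byte).count("1") for byte in data)
    s_n = 2 * ones - n  # each 1 bit counts +1, each 0 bit counts -1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def one_hot_leaves(leaf_ids, leaves_per_tree):
    """One-hot encode the leaf reached in each tree and concatenate the codes.

    leaf_ids[i] is the (hypothetical) index of the leaf that a sample
    reaches in tree i; leaves_per_tree[i] is that tree's number of leaves.
    """
    if len(leaf_ids) != len(leaves_per_tree):
        raise ValueError("one leaf index is required per tree")
    encoded = []
    for leaf, width in zip(leaf_ids, leaves_per_tree):
        if not 0 <= leaf < width:
            raise ValueError("leaf index out of range")
        code = [0.0] * width
        code[leaf] = 1.0
        encoded.extend(code)
    return encoded


def augment(features, leaf_ids, leaves_per_tree):
    """Append the one-hot leaf codes to the original feature vector."""
    return list(features) + one_hot_leaves(leaf_ids, leaves_per_tree)
```

For example, `augment([monobit_p_value(ciphertext)], [1, 0], [3, 2])` would yield a six-element vector: the original feature followed by the codes for two trees with three and two leaves. A logistic regression model would then be trained on vectors of this form.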
Keywords: cryptographic algorithm identification; machine learning; ensemble learning; gradient boosting decision tree; logistic regression
Article ID: 202100341  CLC number: TP309.7  Document code:
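Funding: National Key Research and Development Program of China (2018YFA0704703); National Natural Science Foundation of China (61802111; 61972073; 61972215); Natural Science Foundation of Tianjin (20JCZDJC00640); Key Research and Development and Promotion Special Project of Henan Province (222102210062); Basic Research Program of Key Scientific Research Projects of Higher Education Institutions of Henan Province (22A413004); National Undergraduate Innovation Training Program (202110475072)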
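About the authors: First author: YUAN Ke (born 1982), male, associate professor, Ph.D. Research interests: cryptography and information security. E-mail: yuanke@henu.edu.cn; Corresponding author: DU Zhanfei, E-mail: duzhanfei@henu.edu.cn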
Cite this article:
YUAN Ke, HUANG Yabing, DU Zhanfei, LI Jiabao, JIA Chunfu. Block Cipher Algorithm Identification Scheme Based on Hybrid Gradient Boosting Decision Tree and Logistic Regression Model[J]. Advanced Engineering Sciences, 2022, 54(4): 218-227.