Advanced Engineering Sciences: 2022, 54(2): 180-187
Speech Dereverberation Based on Generative Adversarial Network with Additive Frequency Domain Decomposition
(School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)
Received: 2021-03-30    Revised: 2021-10-06
Abstract: A reverberant speech signal contains frequency components introduced by path delays, and these components are modulated in a correlated way in the frequency domain. To reduce the high spectral correlation of reverberant speech, an improved generative adversarial network (GAN) algorithm based on additive frequency domain decomposition is proposed. First, a logarithm is applied to the short-time amplitude spectra of the reverberant speech, converting the modulated amplitude spectra into linear ones so that the convolved speech components can be decomposed. Next, the spectra are normalized with the sigmoid nonlinear function to balance the data distribution, and the demodulated amplitude spectra are fed to a deep fully convolutional network to train the GAN model. Finally, through adversarial learning between the generative model and the discriminative model, the distribution diversity of the reverberant speech and the source speech is learned effectively, guiding the generative model to reconstruct the enhanced speech more accurately. The Aishell Chinese speech data set was used to evaluate the algorithm. The dereverberation performance of GAN, FCN, and DNN models with and without additive frequency domain decomposition was compared, and differences in the spectrograms were used to demonstrate the effectiveness of the proposed method. Experimental results show that, under four different reverberation times, the PESQ, STOI, and LSD scores of the GAN, FCN, and DNN models with additive frequency domain decomposition improve by about 10% over those of the same models without it. Additive frequency domain decomposition therefore effectively improves the performance of the GAN in speech dereverberation, and the method also generalizes well to a non-homologous test set.
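The preprocessing described in the abstract (logarithm of the short-time amplitude spectrum, then sigmoid normalization) can be illustrated with a minimal NumPy sketch. It relies on the common approximation that convolution with a room impulse response becomes a product of amplitude spectra in each short-time frame, so the logarithm turns that product into a sum. This is not the authors' implementation: the frame length, hop size, Hann window, `eps` offset, the synthetic impulse response, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    """Short-time amplitude spectrum (frames x bins) with a Hann window.

    frame_len and hop are assumed values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def additive_decomposition(magnitude, eps=1e-8):
    """Log of the amplitude spectrum followed by sigmoid normalization.

    The log maps a product of spectra to a sum; the sigmoid squashes the
    result into (0, 1). eps is an assumed guard against log(0).
    """
    log_mag = np.log(magnitude + eps)
    return 1.0 / (1.0 + np.exp(-log_mag))

def invert_decomposition(features, eps=1e-8):
    """Undo the sigmoid (logit) and the log (exp) to recover amplitudes."""
    clipped = np.clip(features, 1e-12, 1.0 - 1e-12)
    log_mag = np.log(clipped / (1.0 - clipped))
    return np.maximum(np.exp(log_mag) - eps, 0.0)

# Synthetic stand-ins for speech and a room impulse response (assumptions).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.exp(-np.arange(4000) / 800.0) * rng.standard_normal(4000)
reverberant = np.convolve(clean, rir)[: len(clean)]

mag = stft_magnitude(reverberant)
features = additive_decomposition(mag)   # candidate network input
recovered = invert_decomposition(features)
```

In a full system, `features` would be the input to the generator, and `invert_decomposition` would map the generator output back to an amplitude spectrum before waveform reconstruction. How the phase is handled is not stated in the abstract, so it is left out of this sketch.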
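Article ID: 202100267    CLC number: TP912    Document code:
Funding: National Natural Science Foundation of China (41364002; 61861023)
About the authors: First author: QUAN Haiyan (1970-), male, associate professor, Ph.D. Research interests: intelligent signal processing; machine learning. E-mail: quanhaiyan@163.com. Corresponding author: WANG Tao, E-mail: wtao762@163.com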
Cite this article:
QUAN Haiyan, WANG Tao, ZHENG Zhiqing. Speech Dereverberation Based on Generative Adversarial Network with Additive Frequency Domain Decomposition[J]. Advanced Engineering Sciences, 2022, 54(2): 180-187.