Advanced Engineering Sciences: 2017, 49(3): 162-169
Multi-label Emotion Classification for Microblog Based on CNN Feature Space
(1. School of Computer, Wuhan Univ., Wuhan 430072, China; 2. State Key Lab. of Software Eng., Wuhan Univ., Wuhan 430072, China)
Received: 2016-08-07    Revised: 2017-01-02
Abstract: Microblog emotion evaluation is a multi-label classification problem, and traditional text representations based on the vector space model fail to provide effective semantic features for it. Word embeddings learned with deep learning capture the syntactic and semantic relations between words well, and sentence representations can be built from them effectively according to the principle of semantic compositionality. The authors propose a multi-label emotion classification system for microblog sentences. First, word embeddings for Chinese words are learned from a large-scale unlabeled microblog text dataset. Second, a convolutional neural network (CNN) model is trained as a supervised multi-emotion classifier. Third, the learned CNN model is used to compose the word embeddings of each microblog sentence into a sentence vector. Finally, these sentence vectors are used as features to train a multi-label classifier, which performs the multi-label emotion classification. On the public dataset of the microblog emotion evaluation task of the 2013 NLPCC (Natural Language Processing and Chinese Computing) conference, the best performance of the proposed system improves on the best evaluation result by 19.16% on the loose metric and 17.75% on the strict metric. It also improves by 3.66% and 2.89% on the two metrics over the best performance previously reported in the literature, which was obtained by composing sentence vectors with a Recursive Neural Tensor Network model. Several multi-label classifiers were used to compare different feature representations, and the sentence vectors based on the CNN feature space were shown to have the most discriminative emotion semantics. An analysis of the iterative training of the CNN model reveals pattern-recognition regularities in the semantic composition process. Future work includes introducing more suitable deep learning models and further exploring semantic composition based on word embeddings.
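The composition step described in the abstract, in which a CNN turns the word embeddings of a sentence into one sentence vector, can be illustrated with a minimal sketch. The abstract does not give the network architecture, so everything here is an assumption for illustration only: a single convolution layer with a tanh activation followed by max-over-time pooling, random untrained weights, toy 4-dimensional embeddings, and the hypothetical function name `conv_max_pool`. It is not the authors' implementation.

```python
import math
import random


def conv_max_pool(word_vectors, filters, window):
    """Compose word vectors into one sentence vector (illustrative sketch).

    Each filter is a flat weight list of length window * dim. It slides over
    every window of consecutive words, tanh is applied to each dot product,
    and the maximum over all positions becomes one sentence-vector component.
    """
    dim = len(word_vectors[0])
    # Pad short sentences with zero vectors so at least one window exists.
    padded = list(word_vectors)
    while len(padded) < window:
        padded.append([0.0] * dim)

    sentence_vector = []
    for weights in filters:
        activations = []
        for start in range(len(padded) - window + 1):
            # Concatenate the word vectors inside the current window.
            flat = [x for vec in padded[start:start + window] for x in vec]
            dot = sum(w * x for w, x in zip(weights, flat))
            activations.append(math.tanh(dot))
        sentence_vector.append(max(activations))
    return sentence_vector


# Hypothetical toy setup: random embeddings and filters, not trained values.
rng = random.Random(0)
dim, window, n_filters = 4, 2, 3
vocab = ["today", "very", "happy"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
filters = [[rng.uniform(-1, 1) for _ in range(window * dim)]
           for _ in range(n_filters)]

sentence = ["today", "very", "happy"]
features = conv_max_pool([embeddings[w] for w in sentence], filters, window)
print(len(features))  # one component per filter
```

In a system of the kind the abstract describes, the filter weights would come from the supervised CNN training stage, and the resulting fixed-length vectors would then serve as input features for a separate multi-label classifier.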
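Article ID: 201600780     CLC number:     Document code:
Foundation item: National Natural Science Foundation of China (61303115; 61373039; 61472290); Specialized Research Fund for the Doctoral Program of Higher Education (2013014111002512)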
Cite this article:
SUN Songtao, HE Yanxiang. Multi-label Emotion Classification for Microblog Based on CNN Feature Space[J]. Advanced Engineering Sciences, 2017, 49(3): 162-169.