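1. School of Computer Science, Wuhan University, Wuhan 430072, Hubei
2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei
Print publication date: 2017
Online publication date: 2017-1-2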
Sun Songtao, He Yanxiang. Multi-label Emotion Classification for Microblog Based on CNN Feature Space[J]. Advanced Engineering Sciences, 2017, 49(3): 162-169. DOI: 10.15961/j.jsuese.201600780.
Abstract: Emotion evaluation for microblogs is a multi-label classification problem, and traditional text representations based on the vector space model fail to provide effective semantic features for it. Word embeddings learned with deep learning capture the syntactic and semantic relations between words well, and sentence representations can be built from them effectively according to the principle of semantic compositionality. A multi-label emotion classification system for microblog sentences is proposed. First, word embeddings for Chinese words are learned from a large-scale unlabeled Chinese microblog text dataset. Second, a convolutional neural network (CNN) model is trained as a supervised multi-emotion classifier. Third, the learned CNN model is used to compose the word embeddings of each microblog sentence into a sentence vector. Finally, these sentence vectors are used as semantic features to train a multi-label classifier, which performs the multi-label emotion classification of microblogs. On the public dataset of the microblog emotion evaluation task of the NLPCC (Natural Language Processing and Chinese Computing) 2013 conference, the best performance of the proposed system improves on the best evaluation result by 19.16% on the loose metric and 17.75% on the strict metric. The system also outperforms the best performance previously reported in the literature, achieved by composing sentence vectors with the Recursive Neural Tensor Network model, by 3.66% and 2.89% on the two metrics, respectively. Several multi-label classifiers were used to compare different feature representations, and the sentence vectors from the CNN feature space showed the most discriminative emotion semantics. An analysis of the iterative training of the CNN model reveals the pattern recognition behavior in the semantic composition process. Future work includes introducing more suitable deep learning models and exploring semantic composition based on word embeddings in greater depth.
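The step in which a CNN composes word vectors into a sentence vector can be illustrated with a minimal sketch. The abstract does not specify the authors' architecture, so the code below assumes a common single-layer design: each filter slides over windows of adjacent word vectors, a tanh activation is applied, and max-over-time pooling keeps one value per filter. The function name `compose_sentence_vector`, the filter width, and the random toy embeddings and weights are all hypothetical; in the described system the embeddings would come from unlabeled microblog text and the filters from supervised training.

```python
import math
import random


def compose_sentence_vector(word_vectors, filters, biases, width):
    """Compose a fixed-length sentence vector from a list of word vectors.

    Assumed design (not confirmed by the source): one convolution layer
    with tanh activation followed by max-over-time pooling.
    """
    n = len(word_vectors)
    if n < width:
        raise ValueError("sentence has fewer words than the filter width")
    # Each window is the concatenation of `width` adjacent word vectors.
    windows = [
        [x for vec in word_vectors[i:i + width] for x in vec]
        for i in range(n - width + 1)
    ]
    sentence_vector = []
    for weights, bias in zip(filters, biases):
        # Response of this filter at every window position.
        responses = [
            math.tanh(sum(w * x for w, x in zip(weights, window)) + bias)
            for window in windows
        ]
        # Max-over-time pooling: keep the strongest response per filter.
        sentence_vector.append(max(responses))
    return sentence_vector


# Toy data: a 5-word sentence with 4-dimensional embeddings (hypothetical).
random.seed(0)
dim, width, n_filters = 4, 2, 3
sentence = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
filters = [[random.uniform(-1, 1) for _ in range(dim * width)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters

features = compose_sentence_vector(sentence, filters, biases, width)
print(len(features))  # one feature per filter, regardless of sentence length
```

Because pooling yields one value per filter, sentences of any length map to vectors of the same size, which is what allows them to serve as input features for a separate multi-label classifier.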
Keywords: emotion classification; multi-label classification; word embedding; convolutional neural network; semantic compositionality