Advanced Engineering Sciences, 2012, 44(6): 127-132
Research on Word-level Contextual Paraphrase Retrieving with Five-features
(Original Chinese title: 一种句词五特征融合模型的复述研究)
(School of Computer Science, Sichuan University)
Received: 2012-06-27    Revised: 2012-09-20
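Abstract (translated from the Chinese): To address the problem that the Chinese synonym thesaurus Tongyici Cilin (同义词词林) cannot be used as a context-dependent paraphrase corpus, a word-level paraphrase method is proposed. Working with a large Chinese corpus, the method extracts the paraphrase target word and the paraphrase candidate words according to the given context. It builds a hierarchical probabilistic statistical model that fuses word-level and sentence-level information, and defines five feature values for computing sentence and word paraphrase similarity. These values are used to train a binary classifier, which then screens the candidate paraphrase words. The experimental results show that: 1) extracting candidate paraphrase words by data mining over a large corpus has practical value, yielding 3.1 correct paraphrase words per target word in its given context sentence; 2) using a binary classifier to confirm paraphrases is effective, with precision reaching 0.65; 3) 32% of the extracted paraphrases cannot be found in the Expanded Chinese Synonym Thesaurus (《中文同义词扩展词林》), so the method effectively extends traditional synonym-based paraphrasing.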
Abstract: The Chinese synonym dictionary Tongyici-Cilin cannot be used as a context-dependent paraphrase corpus. To address this weakness, a word-level paraphrase method is presented to improve the accuracy of Chinese paraphrase extraction. Based on its context sentence, the target word's paraphrase candidates are identified and extracted from a large corpus. The target word is then paired with each candidate, and a five-feature hierarchical probability model is established that captures information about the target word, the context sentence, and the paraphrase candidate. The values of these five features are used to train a binary classifier, which subsequently filters the paraphrase candidates. The experiments show that: 1) retrieving candidate paraphrases from a large corpus through data mining has practical value, with an average of 3.1 correct paraphrases obtained per target word in its given context sentence; 2) the binary classifier is effective at confirming paraphrases, with a precision of 0.65; 3) 32% of the retrieved paraphrases cannot be found in the Expanded Chinese Synonym Dictionary, so the method extends traditional synonym-based paraphrasing.
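The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the five features or the classifier type, so everything below is an assumption made for illustration: the feature vectors are hypothetical similarity scores in [0, 1], the toy training data is invented, and logistic regression stands in for whatever binary classifier the authors actually used. The `precision` helper shows the metric reported as 0.65, namely true positives divided by all candidates the classifier accepts.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy data: each row holds five similarity scores for one
# (target word, candidate word, context sentence) triple. The real five
# features are not specified in the abstract.
TRAIN_X: List[Tuple[float, ...]] = [
    (0.9, 0.8, 0.7, 0.9, 0.8),
    (0.8, 0.9, 0.8, 0.7, 0.9),
    (0.7, 0.7, 0.9, 0.8, 0.7),
    (0.2, 0.1, 0.3, 0.2, 0.1),
    (0.1, 0.3, 0.2, 0.1, 0.2),
    (0.3, 0.2, 0.1, 0.3, 0.3),
]
TRAIN_Y: List[int] = [1, 1, 1, 0, 0, 0]  # 1 = valid paraphrase, 0 = not


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Estimated probability that the candidate is a paraphrase."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 500,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = score(x, weights, bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def is_paraphrase(
    x: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Accept a candidate when its score reaches the threshold."""
    return score(x, weights, bias) >= threshold


def precision(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of accepted candidates that are truly paraphrases."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    return tp / (tp + fp) if tp + fp else 0.0
```

In use, one would call `train_logistic(TRAIN_X, TRAIN_Y)` once, then keep only those candidates for which `is_paraphrase` returns `True`. A precision of 0.65 would mean that about 65 of every 100 accepted candidates are correct paraphrases.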
Article ID: 201200489
Funding: Supported by the Sichuan Province Science and Technology Platform Support Program (JCPT2011-7)
Citation:
He Xianjiang, He Weiwei, Zuo Hang. Research on Word-level Contextual Paraphrase Retrieving with Five-features[J]. Advanced Engineering Sciences, 2012, 44(6): 127-132.