An Algorithm that Learns What's in a Name

Views:

1448

Authors:

D. M. Bikel, R. Schwartz, R. M. Weischedel

Abstract:

In this paper, we present IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the model in English (based on data from the Sixth and Seventh Message Understanding Conferences [MUC-6, MUC-7] and broadcast news) and in Spanish (based on data distributed through the First Multilingual Entity Task [MET-1]), and on speech input (based on broadcast news). We report results here on standard materials only to quantify performance on data available to the community, namely, MUC-6 and MET-1. Results have been consistently better than reported by any other learning algorithm. IdentiFinder's performance is competitive with approaches based on handcrafted rules on mixed case text and superior on text where case information is not available. We also present a controlled experiment showing the effect of training set size on performance, demonstrating that as little as 100,000 words of training data is adequate to get performance around 90% on newswire. Although we present our understanding of why this algorithm performs so well on this class of problems, we believe that significant improvement in performance may still be possible.

DOI:

10.1023/A:1007558221122

Citations:

1714

Year:

1999

Source journal:

Machine Learning
1999/02/01

Citation trend: 144 citations in 2008
