Data Mining: Information Extraction

Niket Bhargava, in Journal of Advances in Science and Technology


An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can be used to directly extract abstract knowledge from a text corpus, or to extract concrete data from a set of documents which can then be further analyzed with traditional data-mining techniques to discover more general patterns. We discuss methods and implemented systems for both of these approaches and summarize results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions. We also discuss challenges that arise when employing current information extraction technology to discover knowledge in text.
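To make the second approach concrete, the following is a minimal sketch of extracting structured records from free text and then looking for patterns across them. It is not one of the systems discussed here: the job announcements, the regular expression, and the names `POSTINGS`, `PATTERN`, and `extract` are all invented for illustration, and a hand-written pattern stands in for a real IE system.

```python
import re
from collections import Counter

# Hypothetical job announcements, invented for this illustration.
POSTINGS = [
    "Acme Corp is hiring a Software Engineer in Austin. Requires Python and SQL.",
    "Globex seeks a Data Analyst in Boston. Requires SQL and Excel.",
    "Initech is hiring a Web Developer in Austin. Requires JavaScript and SQL.",
]

# A hand-written pattern standing in for a real extractor. It captures
# entities (company, title, city) and the stated "requires" relationship.
PATTERN = re.compile(
    r"(?P<company>[A-Z]\w+(?: Corp)?) (?:is hiring|seeks) an? "
    r"(?P<title>[A-Z]\w+(?: [A-Z]\w+)*) in (?P<city>[A-Z]\w+)\. "
    r"Requires (?P<skills>[\w ]+)\."
)


def extract(text):
    """Turn one unstructured announcement into a structured record, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    record = match.groupdict()
    record["skills"] = record["skills"].split(" and ")
    return record


# Step 1: information extraction yields one record per document.
records = [r for r in map(extract, POSTINGS) if r is not None]

# Step 2: simple counting over the extracted fields stands in for
# traditional data mining on the resulting structured data.
skill_counts = Counter(skill for r in records for skill in r["skills"])
city_counts = Counter(r["city"] for r in records)

print(records[0])
print(skill_counts.most_common(1))
```

Once the text has been reduced to records, any standard technique that works on tabular data can be applied to them; the counting step here is only the simplest possible stand-in.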