TEXT MINING
An enormous amount of knowledge resides today in text documents that are stored either within the organization or outside of it. Text databases are growing rapidly because of the increasing amount of information available in electronic form, such as electronic publications, digital libraries, e-mail, and the World Wide Web. Data stored in most text databases are semistructured, and special data-mining techniques, called text mining, have been developed for discovering new information from large collections of textual data.
In general, two key technologies make online text mining possible: Internet searching capabilities and text analysis methodology. Internet searching has been around for a few years. With the explosion of Web sites in that time, numerous search engines designed to help users find content appeared practically overnight. Yahoo, Alta Vista, and Excite are three of the earliest. Search engines operate by indexing the content of a particular Web site and allowing users to search these indexes. With the new generation of Internet-searching tools, users can obtain relevant information while processing a smaller number of links, pages, and indexes.
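The indexing idea described above can be illustrated with a minimal sketch of an inverted index, which maps each word to the set of pages that contain it. This is a simplified illustration, not how any particular search engine is built; the page names, sample texts, and the helper functions `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page ids in which it appears."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical pages, used only for illustration.
pages = {
    "page1": "Text mining discovers new information in text",
    "page2": "Search engines index the content of Web sites",
    "page3": "Text analysis is older than Internet searching",
}
index = build_index(pages)

print(sorted(search(index, "text")))           # ['page1', 'page3']
print(sorted(search(index, "text analysis")))  # ['page3']
```

The index is built once, so each query only looks up the words it contains instead of rescanning every page.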
Text analysis, as a field, has been around longer than Internet searching. It has been part of the effort to make computers understand natural languages, and it is commonly thought of as a problem for artificial intelligence. Text analysis can be used wherever there is a large amount of text that needs to be analyzed. Although automatic processing of documents using different techniques does not allow the depth of analysis that a human can bring to the task, it can be used to extract key points, categorize documents, and generate summaries in situations where a large number of documents makes manual analysis impossible. Market research, business-intelligence gathering, e-mail management, claim analysis, e-procurement, and automated help desks are only a few of the possible applications where text mining can be deployed successfully.
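As a rough illustration of extracting key points automatically, the sketch below picks the most frequent words in a document after discarding common stop words. It is one simple approach among many, not a specific method described here; the stop-word list, the sample document, and the function name `key_terms` are assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
              "for", "on", "was", "were", "by"}


def key_terms(text, top_n=3):
    """Return the top_n most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]


# Hypothetical claim description, used only for illustration.
doc = ("The claim was filed for water damage. The adjuster reviewed the claim "
       "and approved the claim for water damage repair.")

print(key_terms(doc))
```

For this sample the most frequent remaining word is "claim", followed by "water" and "damage". The same word counts could serve as a starting point for categorizing documents, for example by comparing a document's key terms against the terms typical of each category.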
Tuesday, December 16, 2008