
Three Knowledge Bases

Focusing on information extraction from semi-structured data, we summarize the main components of the three categories of knowledge as follows:

The three categories of knowledge have different priorities when they are used for information extraction. The priorities are given as follows:

  1. Site specific knowledge (S)
  2. Domain knowledge (D)
  3. General knowledge (G)

During information extraction, site specific knowledge has the highest priority and general knowledge the lowest. When items of knowledge conflict, the higher priority knowledge overrides the lower priority knowledge. When the agent processes a particular site, it searches for site specific knowledge first. If any is found, it is used in place of the corresponding knowledge in the domain knowledge base or the general knowledge base. For example, if a site specific extraction pattern exists for a particular knowledge unit, that pattern is used to extract the unit instead of the more general pattern in the domain knowledge base.
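The override rule above can be sketched as a lookup that consults the knowledge bases in priority order. This is only a minimal illustration, assuming each knowledge base can be modelled as a mapping from a knowledge unit name to an extraction pattern; the function name `resolve_pattern`, the unit names and the regular expressions are hypothetical and do not come from the system described here.

```python
from typing import Dict, Optional

# Assumption: a knowledge base maps a knowledge unit name to an extraction pattern.
KnowledgeBase = Dict[str, str]


def resolve_pattern(
    unit: str,
    site: KnowledgeBase,
    domain: KnowledgeBase,
    general: KnowledgeBase,
) -> Optional[str]:
    """Return the pattern for `unit` from the highest priority base that has one."""
    # Priority order: site specific (S), then domain (D), then general (G).
    for knowledge_base in (site, domain, general):
        if unit in knowledge_base:
            return knowledge_base[unit]
    return None  # no knowledge base has a pattern for this unit


# Hypothetical example data, for illustration only.
general_kb = {"date": r"\d{4}-\d{2}-\d{2}"}
domain_kb = {"date": r"\d{1,2}/\d{1,2}/\d{4}", "price": r"\$\d+\.\d{2}"}
site_kb = {"price": r"Price:\s*\$(\d+\.\d{2})"}

price_pattern = resolve_pattern("price", site_kb, domain_kb, general_kb)
date_pattern = resolve_pattern("date", site_kb, domain_kb, general_kb)
title_pattern = resolve_pattern("title", site_kb, domain_kb, general_kb)
```

Here the site specific pattern is chosen for "price", the domain pattern is chosen for "date" because no site specific one exists, and "title" resolves to nothing because no knowledge base covers it.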


Xiaoying Gao
Tue Dec 11 16:30:56 NZDT 2001