[Repost] Ontologies – Description and Applications

Abstract

The word “ontology” has gained considerable popularity within the AI community. An ontology is usually viewed as a high-level description consisting of concepts that organize the upper parts of a knowledge base. However, the meaning of the term “ontology” tends to be somewhat vague, as the term is used in different ways. In this paper we attempt to clarify the meaning of ontology, including the philosophical views, and show why ontologies are useful and important. We give an overview of ontology structures in several particular systems. A field proposed within ontological efforts, “ontological engineering”, is also described. The usage of ontologies in several particular ways is discussed. These include systems and ideas to support knowledge base sharing and reuse, both for computers and humans; ontology-based communication in multi-agent systems; applications of ontologies for natural language processing; applications in document search and enrichment of knowledge bases, particularly for the World Wide Web environment; and the construction of educational systems, particularly intelligent tutoring systems.

1 Introduction

The word “ontology” has gained considerable popularity within the AI community. An ontology is usually viewed as a high-level description consisting of concepts that organize the upper parts of a knowledge base. However, the meaning of the term “ontology” tends to be somewhat vague, as the term is used in different ways. In this paper we attempt to clarify the meaning of ontology and show why ontologies are useful and important. We discuss the usage of ontologies in several particular ways, such as knowledge base reuse, knowledge sharing, communication in multi-agent systems, applications of ontologies for the WWW, for natural language processing, and for intelligent tutoring systems.

1.1 Motivation

In the history of AI research, two types of research have been identified [31, 8]: form-oriented research (mechanism theories) and content-oriented research (content theories). The former deals with logic and knowledge representation, while the latter deals with the content of knowledge. The former has clearly dominated AI research to date. Recently, however, content-oriented research has attracted more attention, because solving many real-world problems, such as knowledge reuse, facilitation of agent communication, media integration through understanding, and large-scale knowledge bases, requires not only advanced theories or reasoning methods but also sophisticated treatment of the content of knowledge.

Formal theories such as predicate logic provide us with a powerful tool to guarantee sound reasoning and thinking. They even enable us to discuss the limits of our reasoning in a principled way. However, they cannot answer questions such as what knowledge we should have for solving given problems, what knowledge is at all, what properties a specific piece of knowledge has, and so on. Sometimes the AI community gets excited by mechanisms such as neural nets, fuzzy logic, genetic algorithms, constraint propagation, etc. These mechanisms are proposed as the “secret” of making intelligent machines. At other times, it is realized that, however wonderful the mechanism, it cannot do much without a good content theory of the domain on which it is to work. Moreover, we often recognize that once a good content theory is available, many different mechanisms might be used equally well to implement effective systems, all using essentially the same content.

The importance of content-oriented research is being recognized more and more nowadays. Unfortunately, there seem to be no widely recognized, sophisticated methodologies for content-oriented research at present. Until recent years, the major results were only the development of knowledge bases.

The reasons for this may be [31]:

1. content-oriented research tends to be ad hoc
2. there is no methodology that enables research results to be accumulated

It is necessary to overcome these difficulties in content-oriented research. Ontologies are proposed for that purpose. Ontology engineering, as proposed in e.g. [31], is a research methodology which gives us the design rationale of a knowledge base, a kernel conceptualization of the world of interest, and strict definitions of the basic meanings of basic concepts, together with sophisticated theories and technologies enabling the accumulation of knowledge that is indispensable for modeling the real world.

Interest in ontologies has also grown as researchers and system developers have become more interested in reusing or sharing knowledge across systems. Currently, one key impediment to sharing knowledge is that different systems use different concepts and terms for describing domains. These differences make it difficult to take knowledge out of one system and use it in another. If we could develop ontologies that could be used as the basis for multiple systems, those systems would share a common terminology that would facilitate sharing and reuse. Developing such reusable ontologies is an important goal of ontology research. Similarly, if we could develop tools that support merging ontologies and translating between them, sharing would be possible even between systems based on different ontologies.

1.2 Philosophical View

The term ontology was taken from philosophy. According to Webster’s Dictionary, an ontology is

1. a branch of metaphysics relating to the nature and relations of being;
2. a particular theory about the nature of being or the kinds of existence.

Ontology (the “science of being”) is a word, like metaphysics, that is used in many different senses. It is sometimes considered to be identical to metaphysics, but we prefer to use it in a more specific sense, as that part of metaphysics that specifies the most fundamental categories of existence, the elementary substances or structures out of which the world is made. Ontology will thus analyze the most general and abstract concepts or distinctions that underlie every more specific description of any phenomenon in the world, e.g. time, space, matter, process, cause and effect, system.

Recently, the term “ontology” has been taken up by researchers in Artificial Intelligence, who use it to designate the building blocks out of which models of the world are made.

An agent (e.g. an autonomous robot) using a particular model will only be able to perceive that part of the world that its ontology is able to represent. In this sense, only the things in its ontology can exist for that agent. In that way, an ontology becomes the basic level of a knowledge representation scheme. An example is a set of link types for a semantic network representation that is based on a set of “ontological” distinctions: changing–invariant and general–specific.

2 What is an Ontology?

The term “ontology” is used in many different ways. In this section we discuss what an ontology is on the basis of several definitions that are currently in use.

2.1 Common Definitions

The most widespread definitions of ontology are given below.

1. Ontology is a term in philosophy and its meaning is “theory of existence”.
2. Ontology is an explicit specification of conceptualization [21].
3. Ontology is a theory of vocabulary or concepts used for building artificial systems [31].
4. Ontology is a body of knowledge describing some domain (e.g. a common sense knowledge domain in CYC [45]).

Definition 1 is radically different from all the others (including the additional ones discussed below). We will briefly discuss some implications of its meaning for the definition of “ontology” for AI purposes. The second definition is generally proposed as a definition of what an ontology is for the AI community. It may be classified as “syntactic”, but its precise meaning depends on the understanding of the terms “specification” and “conceptualization”. The third definition is a proposal for a definition within the knowledge engineering community. The fourth definition differs from the previous two: it views the ontology as an inner body of knowledge, not as the way to describe the knowledge.

Although these definitions are compact, they are not sufficient for an in-depth understanding of what an ontology is. We will try to give more comprehensive definitions and insights.

2.1.1 Ontology as a Philosophical Term

Following [24], we use the convention that the uppercase initial letter “O” distinguishes “Ontology” as a philosophical discipline from other usages of this term. Ontology is a branch of philosophy that deals with the nature and the organization of reality. It tries to answer questions like “what is existence”, “what properties can explain existence”, etc. Aristotle defined Ontology as the science of being as such. Unlike the special sciences, each of which investigates a class of beings and their determinations, Ontology regards “all the species qua being and the attributes that belong to it qua being” (Aristotle, Metaphysics, IV, 1). In this sense Ontology tries to answer the question “what is being?” or, in a meaningful reformulation, “what are the features common to all beings?”.

This is what is today called “General Ontology”, in contrast with various Special or Regional Ontologies (e.g. Biological, Social). From this, Formal Ontology is defined as an area that has to determine the conditions of the possibility of the object in general and the individualization of the requirements that every object’s constitution has to satisfy. According to [24], Formal Ontology can be defined as the systematic, formal, axiomatic development of the logic of all forms and modes of being.

From this, Formal Ontology is concerned not so much with the existence of certain objects as with the rigorous description of their forms of being, i.e. their structural features. In practice, Formal Ontology can be understood as the theory of distinctions that can be applied independently of the state of the world, i.e. the distinctions:

1. among the entities of the world (physical objects, events, regions...)
2. among the meta-level categories used to model the world (concept, property, quality, state, role, part...)

In this sense, Formal Ontology, as a discipline, may be relevant to both Knowledge Representation and Knowledge Acquisition [24].

2.1.2 Ontology as a Specification of Conceptualization

The second definition of ontology mentioned above, explicit specification of conceptualization, is briefly described in [20]. The definition comes from work [22], where the ontology is used in the context of knowledge sharing. According to Thomas Gruber, explicit specification of conceptualization means that an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as a set of concept definitions, but is more general. In this sense, ontology is important for the purpose of enabling knowledge sharing and reuse. An ontology is in this context a specification used for making ontological commitments. Practically, an ontological commitment is an agreement to use a vocabulary (i.e. to ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. Agents are then built that commit to ontologies, and ontologies are designed so that knowledge can be shared with and among these agents.

A body of knowledge is based on a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest, and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.

For these systems, what “exists” is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g. classes, relations, functions, or other objects) with human-readable text describing what the names mean, and with formal axioms that constrain the interpretation and well-formed use of these terms. Formally, it can be said that an ontology is a statement of a logical theory [20].

Ontologies are often equated with taxonomic hierarchies of classes without class definitions and the subsumption relation. Ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions, that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world. To specify a conceptualization, one needs to state axioms that do constrain the possible interpretations of the defined terms.

Pragmatically, a common ontology defines the vocabulary with which queries and assertions are exchanged among agents. The agents sharing a vocabulary need not share a knowledge base. An agent that commits to an ontology is not required to answer all queries that can be formulated in the shared vocabulary. In short, a commitment to a common ontology is a guarantee of consistency, but not completeness, with respect to queries and assertions using the vocabulary defined in the ontology.
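The pragmatic reading of an ontological commitment can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than part of any system discussed in this paper: the names `SHARED_VOCABULARY`, `Agent`, `tell` and `ask`, and the two relation names, are assumptions made for the example. Two agents commit to the same vocabulary but keep separate knowledge bases, and an agent may leave a well-formed query unanswered (consistency without completeness).

```python
from typing import Optional

# Hypothetical shared vocabulary: the relation names the agents agree to use.
SHARED_VOCABULARY = {"type-of", "component-of"}


class Agent:
    """An agent that commits to the shared vocabulary but owns its knowledge base."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.facts = set()  # (relation, subject, object) triples known to this agent

    def _check(self, relation: str) -> None:
        # Committing to the ontology means using only the agreed vocabulary.
        if relation not in SHARED_VOCABULARY:
            raise ValueError(f"{relation!r} is not in the shared vocabulary")

    def tell(self, relation: str, subject: str, obj: str) -> None:
        """Make an assertion using the shared vocabulary."""
        self._check(relation)
        self.facts.add((relation, subject, obj))

    def ask(self, relation: str, subject: str, obj: str) -> Optional[bool]:
        """Answer a query; None means the agent cannot answer it."""
        self._check(relation)
        return True if (relation, subject, obj) in self.facts else None


robot = Agent("robot")
robot.tell("type-of", "operational amplifier", "electronic device")
planner = Agent("planner")  # same vocabulary, empty knowledge base

print(robot.ask("type-of", "operational amplifier", "electronic device"))    # True
print(planner.ask("type-of", "operational amplifier", "electronic device"))  # None
```

Returning `None` instead of `False` is a deliberate choice in this sketch: the agent does not claim that the statement is false, only that its own knowledge base does not contain it.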

2.1.3 Ontology as a Representational Vocabulary

The third definition of ontology proposed above says that it is in fact a representational vocabulary [8, 31]. The vocabulary can be specialized to some domain or subject matter. More precisely, it is not the vocabulary as such that qualifies as an ontology, but the conceptualization that the terms in the vocabulary are intended to capture. Thus, translating the terms in an ontology from one language to another, for example from Czech to English, does not change the ontology conceptually.

In engineering design, one might discuss the ontology of an electronic devices domain, which might include vocabulary that describes conceptual elements (transistors, operational amplifiers, and voltages) and the relations between these elements (operational amplifiers are a type-of electronic device, and transistors are component-of operational amplifiers). Identifying such a vocabulary and the underlying conceptualization generally requires careful analysis of the kinds of objects and relations that can exist in the domain.

The term ontology is sometimes used to refer to a body of knowledge describing some domain (see below), typically a common sense knowledge domain, using a representational vocabulary. For example, CYC [45] often refers to its knowledge representation of some area of knowledge as its ontology. In other words, the representation vocabulary provides a set of terms with which one can describe the facts in some domain, while the body of knowledge using that vocabulary is a collection of facts about a domain. However, this distinction is not as clear as it might first appear. In the electronic-device example, that a transistor is a component-of an operational amplifier, or that the latter is a type-of electronic device, is just as much a fact about its domain as a CYC fact about some aspect of space, time or numbers. The distinction is that the former emphasizes the use of ontology as a set of terms for representing specific facts in an instance of the domain, while the latter emphasizes the view of ontology as a general set of facts to be shared.
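The electronic-device example can be sketched in Python to show why translating the terms leaves the ontology conceptually unchanged. This is an illustration under assumptions: the concept identifiers `c1` to `c3`, the function `render`, and the Czech labels are chosen for the example and do not come from the cited works.

```python
# Relations of the representational vocabulary, taken from the example in the text.
TYPE_OF = "type-of"
COMPONENT_OF = "component-of"

# Conceptualization: language-neutral concept identifiers (hypothetical keys)
# and the relations that hold among them.
FACTS = {
    ("c2", TYPE_OF, "c1"),       # operational amplifier type-of electronic device
    ("c3", COMPONENT_OF, "c2"),  # transistor component-of operational amplifier
}

# Two sets of terms for the same concepts (Czech labels are illustrative).
LABELS_EN = {"c1": "electronic device", "c2": "operational amplifier", "c3": "transistor"}
LABELS_CS = {"c1": "elektronické zařízení", "c2": "operační zesilovač", "c3": "tranzistor"}


def render(facts, labels):
    """Express the same facts with the terms of one particular language."""
    return {(labels[s], relation, labels[o]) for s, relation, o in facts}


english = render(FACTS, LABELS_EN)
czech = render(FACTS, LABELS_CS)

# The surface terms differ, but both renderings come from one and the same
# set of concept-level facts, so the underlying structure is untouched.
print(sorted(english))
print(sorted(czech))
```

Keeping the labels separate from `FACTS` mirrors the point made above: the ontology is the conceptualization that the terms capture, not the terms themselves.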

2.1.4 Ontology as a Body of Knowledge

Sometimes, ontology is defined as a body of knowledge describing some domain, typically a common sense knowledge domain, using a representation vocabulary as described above. In this case, an ontology is not only the vocabulary, but the whole “upper” knowledge base (including the vocabulary that is used to describe this knowledge base). The typical example of this usage is the CYC project (http://www.cyc.com/, [45]), which defines its knowledge base as an ontology for any other knowledge-based system.

CYC is the name of a very large, multi-contextual knowledge base and inference engine. The development of CYC started during the early 1980s, headed by Douglas Lenat. CYC is an attempt to do symbolic AI on a massive scale. It is based neither on numerical methods such as statistical probabilities, nor on neural networks or fuzzy logic. All of the knowledge in CYC is represented declaratively in the form of logical assertions. CYC contains over 400,000 significant assertions [45], which include simple statements of fact, rules about what conclusions to draw if certain statements of fact are satisfied (true), and rules about how to reason with certain types of facts and rules. New conclusions are derived by the inference engine using deductive reasoning.

The CYC team does not believe there is any shortcut toward being intelligent or creating an artificial-intelligence-based agent. In their view, the need for a large body of knowledge with content and context can only be addressed by manually organizing and collating information. This knowledge includes heuristic, rule-of-thumb problem-solving strategies, as well as facts that can only be known to a machine if it is told.

Much of the useful common sense knowledge needed for life is prescientific and has therefore not been analyzed in detail. Thus a large part of the work of the CYC project is to formalize common relationships and fill in the gaps between the highly systematized knowledge used by specialists.

It is necessary to divide such a large knowledge base into smaller pieces to enable reasoning in reasonable time. Because of this, the CYC knowledge base uses a special context space [29], which is divided by 12 dimensions into smaller pieces (contexts) that have something in common and can be used to reason about a specific problem in that context. It is possible to “lift” an assertion from one context to another when the problem requires it.

The CYC common sense knowledge can be used as the body of a knowledge base for any knowledge-intensive system. In this sense, this body of knowledge can be viewed as an ontology of the knowledge base of the system.

2.2 Other Ontology Definitions

As we can see from the above discussion, the exact definition of ontology is not obvious; however, the definitions have much in common. In addition to the above definitions, there are many other proposals for ontology definitions. Some other definitions collected from [24] are:

1. informal conceptual system
2. formal semantic account
3. representation of a conceptual system via a logical theory
(a) characterized by specific formal properties
(b) characterized only by its specific purposes
4. vocabulary used by a logical theory
5. (meta-level) specification of a logical theory

Definitions 1 and 2 conceive an ontology as a conceptual “semantic” entity, either formal or informal, while according to interpretations 3, 4 and 5 an ontology is a specific “syntactic” object. According to interpretation 1, an ontology is the conceptual system which may be assumed to underlie a particular knowledge base. Under interpretation 2, instead, the ontology that underlies a knowledge base is expressed in terms of suitable formal structures at the semantic level. In both cases, we may say that “the ontology of knowledge base A is different from that of knowledge base B”.

Under interpretation 3, an ontology is nothing other than a logical theory. The issue is whether such a theory needs to have particular formal properties in order to be an ontology or, rather, whether it is the intended purpose which lets us consider a logical theory as an ontology. The latter position can be supported by arguing that an ontology is an annotated and indexed set of assertions about something: “leaving off the annotations and indexing, this is a collection of assertions: what in logic is called a theory” (Pat Hayes, statement in [24]).

According to interpretation 4, an ontology is not viewed as a logical theory, but just as the vocabulary used by a logical theory. Such an interpretation collapses into 3(a) if an ontology is thought of as a specification of a vocabulary consisting of a set of logical definitions. We may anticipate that Gruber’s interpretation (specification of conceptualization) collapses into 3(a) as well when a conceptualization is intended as a vocabulary.

Finally, under interpretation 5, an ontology is seen as a specification of a logical theory in the sense that it specifies the “architectural components” (or primitives) used within a particular domain theory.

3 Ontology Structure

From the overview above we can see that an ontology can be approached in basically two ways. The first approach treats an ontology as a representational vocabulary, where the conceptual structure of terms should remain unchanged during translation. The other approach, which is discussed in this section, treats an ontology as the body of knowledge describing a domain, in particular a common sense domain.

An ontology can be divided in several ways. We describe some of the proposals here. Particularly interesting is the so-called “upper ontology”, which is intended to serve as the upper part of the ontology of practically all knowledge-based systems. Some of the divisions presented here are intended to be merged into an upper ontology standard by the IEEE Standard Upper Ontology Study Group [39]. The pages linked from [39] contain many other examples that could be used as some kind of upper ontology.

(figure 1)

Figure 1: How ontologies differ in their analyses of the most general concepts [8]

It is interesting that many authors agree that the upper class of the ontology is “thing”; however, even at the second level they do not agree on the separation, as can be seen in figure 1. The initiative [39] tries to unify these views.

3.1 CYC

The ontology of CYC is based on several terms that form the fundamental vocabulary of the CYC knowledge base. The universal set is #$Thing (see figure 1). It is the set of everything. Every CYC constant in the knowledge base is a member of this collection. In the prefix notation of the language CycL [10], we express that fact as (#$isa CONST #$Thing). Likewise, every collection in the knowledge base is a subset of the collection #$Thing. In CycL, that fact is expressed as (#$genls COL #$Thing).

The set #$Thing has some subsets, such as PathGeneric, Intangible, Individual, SimpleSegmentOfPath, PathSimple, MathematicalOrComputationalThing, IntangibleIndividual, Product, TemporalThing, SpatialThing, Situation, EdgeOnObject, FlowPath, ComputationalObject, Microtheory, plus about 1500 more public subsets and about 13600 unpublished subsets.

#$Individual is the collection of all things that are not sets or collections. Thus, #$Individual includes (among other things) physical objects, temporal subabstractions of physical objects, numbers, relations, and groups (#$Group). An element of #$Individual may have parts or a structure (including parts that are discontinuous), but no instance of #$Individual can have elements or subsets.

#$Collection is the collection of all CYC collections. CYC collections are natural kinds or classes, as opposed to mathematical sets. Their elements have some common attribute(s). Each CYC collection is like a set in so far as it may have elements, subsets, and supersets, and may not have parts or spatial or temporal properties. Sets, however, differ from collections in that a mathematical set may be an arbitrary set of things which have nothing in common (#$Set-Mathematical). In contrast, the elements of a collection will all have in common some feature(s), some ‘intensional’ qualities. In addition, two instances of #$Collection can be co-extensional (i.e. have all the same elements) without being identical, whereas if two arbitrary sets had the same elements, they would be considered equal.
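The difference between co-extensional collections and equal sets can be made concrete with a small Python sketch. It is only an analogy under stated assumptions: the class `Collection`, its method `co_extensional`, and the collection names used below are hypothetical and are not CYC constants or CycL constructs.

```python
class Collection:
    """A named collection whose identity comes from its name (its intension),
    not from the elements it happens to contain (its extension)."""

    def __init__(self, name, elements):
        self.name = name
        self.elements = frozenset(elements)

    def co_extensional(self, other):
        """True if both collections have exactly the same elements."""
        return self.elements == other.elements

    def __eq__(self, other):
        # Two collections are identical only if they are the same named kind.
        return isinstance(other, Collection) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


# Hypothetical collections that happen to share all their elements.
with_heart = Collection("CreatureWithHeart", {"Rex", "Tom"})
with_kidney = Collection("CreatureWithKidney", {"Rex", "Tom"})

print(with_heart.co_extensional(with_kidney))  # True
print(with_heart == with_kidney)               # False

# Mathematical sets with the same elements are simply equal.
print(frozenset({"Rex", "Tom"}) == frozenset({"Tom", "Rex"}))  # True
```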

#$Individual and #$Collection are disjoint collections. No CYC constant can be an instance of both.

#$Predicate is the set of all CYC predicates. Each element of #$Predicate is a truth-functional relationship in CYC which takes some number of arguments. Each of those arguments must be of some particular type. Informally, one can think of elements of #$Predicate as functions that always return either true or false. More formally, when an element of #$Predicate is applied to the legal number and type of arguments, an expression is formed which is a well-formed formula (wff) in CycL. Such expressions are called atomic formulas, or ground atomic formulas (gaf) if they contain no variables.

#$isa:<#$ReifiableTerm> <#$Collection> expresses the ISA relationship. (#$isa EL COL) means that EL is an element of the collection COL. CYC knows that #$isa distributes over #$genls. That is, if one asserts (#$isa EL COL) and (#$genls COL SUPER), CYC will infer that (#$isa EL SUPER). Therefore, in practice one only manually asserts a small fraction of the #$isa assertions; the vast majority are inferred automatically by CYC.

#$genls:<#$Collection> <#$Collection> expresses a similar relationship for collections (generalization). (#$genls COL SUPER) means that SUPER is one of the supersets of COL. Both arguments must be elements of #$Collection. Again, as with #$isa, CYC knows that #$genls is transitive; therefore, in practice one only manually asserts a small fraction of the #$genls assertions, since the rest are inferred automatically.

More details about the structure of the CYC ontology and about how the CYC knowledge base is constructed can be found at http://www.cyc.com.
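The two inference rules described above (transitivity of #$genls, and #$isa distributing over #$genls) can be sketched in Python as follows. This is a toy illustration and not how the CYC inference engine is implemented; the class `TinyKB`, its method names, and the example terms are assumptions made for this sketch.

```python
from collections import defaultdict


class TinyKB:
    """Toy knowledge base storing only manually asserted isa/genls links."""

    def __init__(self):
        self.genls = defaultdict(set)  # collection -> directly asserted supersets
        self.isa = defaultdict(set)    # term -> directly asserted collections

    def assert_genls(self, col, sup):
        """Record (genls COL SUPER)."""
        self.genls[col].add(sup)

    def assert_isa(self, el, col):
        """Record (isa EL COL)."""
        self.isa[el].add(col)

    def all_genls(self, col):
        """All supersets of col, following genls links transitively."""
        seen = set()
        stack = [col]
        while stack:
            current = stack.pop()
            for sup in self.genls.get(current, ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    def all_isa(self, el):
        """All collections el belongs to: asserted ones plus their supersets."""
        result = set()
        for col in self.isa.get(el, ()):
            result.add(col)
            result |= self.all_genls(col)
        return result


kb = TinyKB()
kb.assert_genls("Dog", "Mammal")
kb.assert_genls("Mammal", "Animal")
kb.assert_genls("Animal", "Thing")
kb.assert_isa("Rex", "Dog")  # the only isa assertion made by hand

print(sorted(kb.all_isa("Rex")))  # ['Animal', 'Dog', 'Mammal', 'Thing']
```

Only one isa assertion and three genls assertions are entered by hand; the remaining memberships follow from the two rules, which is the economy the text describes.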

3.2 Russell & Norvig’s General Ontology

Yet another view of general ontology structure is presented in Russell & Norvig’s book [38]. Every category of their ontology (see figure 2) is discussed in detail on example axioms. An example of this ontology in KIF [18] can be found at http://ltsc.ieee.org/suo/ ontologies/Russell-Norvig.txt.

(Figure 2)

Figure 2: Russell & Norvig’s general ontology structure [38]

3.3 Ontology Engineering

Ontology engineering is a field of artificial intelligence or computer science that is concerned with ontology creation and usage. Report [31], which proposes and comments on this field, declares that the ultimate purpose of ontology engineering should be “to provide a basis of building models of all things in which computer science is interested”.

3.3.1 Structure of Usage

According to [31], ontologies can be divided into the following subcategories from the point of view of knowledge reuse and ontology engineering. This is a classification of ontologies by their usage rather than a division of one general ontology. Some examples are included.

Workplace ontology
This is an ontology for the workplace which affects task characteristics by specifying several boundary conditions which characterize and justify problem solving behaviour in the workplace. Workplace and task ontologies collectively specify the context in which domain knowledge is intended and used during problem solving. Examples from circuit troubleshooting: fidelity, efficiency, precision, high reliability.

Task ontology
Task ontology is a system of vocabulary for describing the problem solving structure of all existing tasks domain-independently. It does not cover the control structure. It covers components or primitives of unit inferences taking place while tasks are performed. Task knowledge in turn specifies domain knowledge by giving roles to objects and to the relations between them. Examples from scheduling tasks: schedule recipient, schedule resource, goal, constraint, availability, load, select, assign, classify, remove, relax, add.

Domain ontology
Domain ontology can be either task dependent or task independent. Task independent ontology usually relates to activities of objects.
– Task-dependent ontology
  A task structure requires not all the domain knowledge but some specific domain knowledge in a certain specific organization. This special type of domain knowledge can be called task-domain ontology because it depends on the task. Examples from job-shop scheduling: job, order, line, due date, machine availability, tardiness, load, cost.
– Task-independent ontology
  – Activity-related ontology
    – Object ontology. This ontology covers the structure, behaviour and function of the object. Examples from circuit boards: component, connection, line, chip, pin, gate, bus, state, role.
    – Activity ontology. Examples from enterprise ontology: use, consume, produce, release, state, resource, commit, enable, complete, disable.
  – Activity-independent ontology
    – Field ontology. This ontology is related to theories and principles which govern the domain. It contains primitive concepts appearing in the theories and relations, formulas, and units constituting the theories and principles.
    – Units ontology. Examples: mole, kilogram, meter, ampere, radian.
    – Engineering mathematics ontology. Examples: linear algebra, physical quantity, physical dimension, unit of measure, scalar quantity, physical components.

General or Common ontology
Examples: things, events, time, space, causality or behaviour, function etc.
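
The statement that task knowledge gives roles to domain objects can be made concrete with a small Python sketch. The two vocabularies are subsets of the scheduling and job-shop examples listed above, but the role assignment in `ROLE_OF` is an assumption made for illustration and is not taken from [31]. The helper names `check_roles` and `terms_playing` are likewise hypothetical.

```python
# Task ontology vocabulary (subset of the scheduling-task examples).
TASK_ROLES = {"schedule recipient", "schedule resource", "goal", "constraint"}

# Task-domain vocabulary (subset of the job-shop scheduling examples).
DOMAIN_TERMS = {"job", "order", "line", "due date", "tardiness", "load", "cost"}

# Hypothetical assignment of task roles to domain terms (an assumption).
ROLE_OF = {
    "job": "schedule recipient",
    "line": "schedule resource",
    "due date": "constraint",
    "tardiness": "goal",
}


def check_roles(role_of, task_roles, domain_terms):
    """Return the assignments that use a term or role outside the vocabularies."""
    return {term: role for term, role in role_of.items()
            if term not in domain_terms or role not in task_roles}


def terms_playing(role, role_of):
    """Return the domain terms that play the given task role, sorted."""
    return sorted(t for t, r in role_of.items() if r == role)


print(check_roles(ROLE_OF, TASK_ROLES, DOMAIN_TERMS))  # empty dict: all names are known
print(terms_playing("schedule resource", ROLE_OF))
```

Keeping the task vocabulary separate from the domain vocabulary is what would allow the same scheduling roles to be reused with a different domain by swapping only the assignment table.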

3.3.2 Ontology Engineering Subfields

We can also divide the ontology or ontologies from the point of view of ontology engineering as a field. The subjects which should be covered by ontology engineering are listed in [31]. They include basic issues in philosophy, knowledge representation, ontology design, standardization, EDI, reuse and sharing of knowledge, media integration, etc., which are the essential topics in future knowledge engineering. Of course, they should be constantly refined through further development of ontology engineering.

Basic Subfield
– Philosophy (Ontology, Meta-mathematics)
  Ontology which philosophers have discussed since Aristotle is discussed as well as logic and meta-mathematics.
– Scientific philosophy
  Investigation of ontology from the point of view of physics, e.g., time, space, process, causality, etc. is made.
– Knowledge representation
  Basic issues on knowledge representation, especially on representation of ontological stuff, are discussed.

Subfield of Ontology Design
– General (Common) ontology
  General ontologies such as time, space, process, causality, part/whole relation, etc. are designed. Both in-depth investigation of the meaning of every concept and relation and formal representation of ontologies are discussed.
– Domain ontologies
  Various ontologies in, say, Plant, Electricity, Enterprise, etc. are designed.

Subfield of Common Sense Knowledge
– Parallel to general ontology design, common sense knowledge is investigated and collected and knowledge bases of common sense are built.

Subfield of Standardization
– EDI (Electronic Data Interchange) and data element specification
  Standardization of primitive data elements which should be shared among people for enabling full automatic EDI.
– Basic semantic repository
  Standardization of primitive semantic elements which should be shared among people for enabling knowledge sharing.
– Conceptual schema modeling facility (CSMF)
– Components for qualitative modeling
  Standardization of functional components such as pipe, valve, pump, boiler, register, battery, etc. for qualitative model building.

Subfield of Data or Knowledge Interchange
– Translation of ontology
  Translation methodologies of one ontology into another are developed.
– Database transformation
  Transformation of data in a database into another of a different conceptual schema.
– Knowledge base transformation
  Transformation of a knowledge base into another built on a different ontology.

Subfield of Knowledge Reuse
– Task ontology
  Design of ontology for describing and modeling human ways of problem solving.
– T-domain ontology
  Task-dependent domain ontology is designed under some specific task context.
– Methodology for knowledge reuse
  Development of methodologies for knowledge reuse using the above two ontologies.

Subfield of Knowledge Sharing
– Communication protocol
  Development of communication protocols between agents which can behave cooperatively under a specified goal.
– Cooperative task ontology
  Task ontology design for cooperative communication.

Subfield of Media Integration
– Media ontology
  Ontologies of the structural aspects of documents, images, movies, etc. are designed.
– Common ontologies of content of the media
  Ontologies common to all media such as those of human behavior, story, etc. are designed.
– Media integration
  Development of a meaning representation language for media and media integration through understanding media representation are done.

Subfield of Ontology Design Methodology
– Methodology
– Support environment

Subfield of Ontology Evaluation
– Evaluation of the ontologies designed is made using real world problems by forming a consortium.

Posted on 2008-02-01 21:44 by Shaird. Category: AI:General
