Ontology Summit 2007: OntologySummit2007_Communique Draft

NIST, Gaithersburg, MD

April 24, 2007

Introduction

The term "ontology" covers many different types of artifacts, created and used in different communities to represent entities and their relations. Their purposes include annotating datasets, supporting natural language understanding, integrating information sources, and serving as background knowledge in various applications.

The Ontology Summit 2007, "Ontology, Taxonomy, Folksonomy: Understanding the Distinctions", is an attempt to bring together various communities (computer scientists, information scientists, philosophers, domain experts) that have different understandings of what an ontology is, and to foster dialog and cooperation among these communities.

In practice, the name ontology covers a spectrum of artifacts, from formal upper-level ontologies expressed in first order logic (e.g., Basic Formal Ontology (BFO) and DOLCE) to simple lists of user-defined keywords used, for example, to annotate resources on the Web. The latter are called "folksonomies" and play an important role in Web 2.0. Between the two ends of the ontology spectrum are taxonomies and controlled vocabularies (e.g., MeSH), which are often used for information indexing and retrieval and whose organization is mostly hierarchical. Finally, there are richer ontologies, often based on formalisms such as frames or description logics, which represent not only subsumption relations but also other kinds of relations among entities (e.g., functional, physical). Examples of such ontologies in the biomedical domain include the Foundational Model of Anatomy, SNOMED CT and the NCI Thesaurus.

The goal of the Ontology Summit is not to establish a single, definitive meaning for the word "ontology", a task which has proved extremely challenging due to the diversity of artifacts the word can refer to. Rather, we propose to identify a limited number of key dimensions along which ontologies can be characterized and to provide operational definitions for these dimensions. The relative position of ontologies in the space defined by these dimensions, the "Framework", is indicative of the similarities and differences between these ontologies. The Framework has been applied to the characterization of a dozen ontologies, whose descriptions were collected through a survey.

History

The Ontology Summit is an outgrowth of the work and discussions of the Ontolog Forum. Last year the Ontology Summit was concerned with an examination of Upper Ontologies. This year the Ontology Summit was concerned with characterizing a wide variety of ontology and ontology-like activities.

The Framework Dimensions

One major goal of the Ontology Summit 2007 was to bring together the diverse communities working on ontology-like activities so as to encourage cooperative efforts. Toward this end the summit has attempted to characterize what an ontology is, i.e., to construct a typology of ontologies. The framework of dimensions comprises two groups: semantic dimensions and pragmatic dimensions. Semantic dimensions include expressiveness, structure, and representational granularity. Pragmatic dimensions include intended use, use of automated reasoning, and prescriptive vs. descriptive.
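As an illustration only, the six dimensions listed above could be recorded as a small profile per ontology, so that two ontologies can be compared dimension by dimension. This is a minimal sketch, not part of the Framework itself: the class name `OntologyProfile`, the function `differing_dimensions`, the field value vocabulary, and the two example profiles are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OntologyProfile:
    """Hypothetical record of one ontology's position in the Framework."""
    name: str
    # Semantic dimensions
    expressiveness: str
    structure: str
    granularity: str
    # Pragmatic dimensions
    intended_use: str
    automated_reasoning: bool
    prescriptive: bool  # False means descriptive


def differing_dimensions(a, b):
    """Return the names of the dimensions on which two profiles differ."""
    return [
        f.name
        for f in fields(OntologyProfile)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Illustrative values only; these do not describe any real ontology.
tag_set = OntologyProfile(
    name="example tag set",
    expressiveness="none",
    structure="glossary",
    granularity="coarse",
    intended_use="resource annotation",
    automated_reasoning=False,
    prescriptive=False,
)
clinical_terms = OntologyProfile(
    name="example clinical terminology",
    expressiveness="description logic",
    structure="dag",
    granularity="fine",
    intended_use="recording diagnoses",
    automated_reasoning=True,
    prescriptive=True,
)

print(differing_dimensions(tag_set, clinical_terms))
```

Free-text values keep the sketch short; a real survey instrument would more likely use a controlled list of values per dimension.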

Expressiveness is a property of the knowledge representation language (KRL). It describes the extent and ease with which the KRL can describe increasingly complex semantics; compare propositional logic, description logic(s), first order logic, sorted logics, modal logics, ...

Structure is a property of the ontology itself: it records how elaborate (or well organized) the semantics encoded by the ontology are. It may match the expressiveness of the knowledge representation language in which the ontology is encoded, or it may be less than that expressiveness. Thus a simple taxonomy, e.g., a tree, may be encoded in RDF, in a description logic language such as OWL-DL, or in first order logic, e.g., Common Logic. Viewed from a graph-theoretic perspective, the level of structure might be a simple set of terms (glossary), a tree structure (taxonomy), a directed acyclic graph, e.g., a partial order (faceted classification schemes), or an arbitrary directed graph (e.g., RDF).
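The graph-theoretic levels of structure can be told apart mechanically. The following is a minimal sketch, assuming the ontology is given as a set of terms plus (child, parent) links; the function name `structure_level` and the returned labels are hypothetical, and a forest with several roots is reported as a tree for simplicity.

```python
from collections import defaultdict, deque


def structure_level(terms, edges):
    """Classify a term graph as 'glossary', 'tree', 'dag' or 'directed graph'.

    terms: iterable of term names
    edges: iterable of (child, parent) pairs
    """
    nodes = set(terms)
    edge_set = set(edges)
    for child, parent in edge_set:
        nodes.update((child, parent))
    if not edge_set:
        return "glossary"  # a plain set of terms with no links

    parents = defaultdict(set)   # child -> its parents
    children = defaultdict(set)  # parent -> its children
    for child, parent in edge_set:
        parents[child].add(parent)
        children[parent].add(child)

    # Kahn's algorithm: repeatedly remove terms with no remaining parents.
    # If some terms can never be removed, the graph contains a cycle.
    remaining = {node: len(parents[node]) for node in nodes}
    queue = deque(node for node, count in remaining.items() if count == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if removed < len(nodes):
        return "directed graph"

    # Acyclic: a tree (or forest) if no term has more than one parent.
    if all(len(parents[node]) <= 1 for node in nodes):
        return "tree"
    return "dag"


taxonomy = [("dog", "mammal"), ("whale", "mammal"), ("mammal", "animal")]
print(structure_level(["dog", "whale", "mammal", "animal"], taxonomy))
```

Adding a second parent for "whale" (for example, a "marine animal" facet) would move the same data from the tree level to the directed acyclic graph level.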

The granularity dimension concerns the level of detail at which the ontology is specified. A crude measure of granularity would be the number of concepts (nodes) and the number of relation instances (links or edges in graph representations). However, this fails to recognize that some ontologies may have larger scopes (domains) than others. A coarse-grained ontology might be suitable for use as an upper ontology or a broad subject index, while a fine-grained ontology (such as SNOMED CT with 300K concepts) may be better suited for encoding medical diagnoses.
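The crude measure described above amounts to two counts. A minimal sketch, assuming concepts are given as names and relation instances as (subject, relation, object) triples; the function name `crude_granularity` and the toy data are hypothetical.

```python
def crude_granularity(concepts, relation_instances):
    """Return the two counts proposed as a crude granularity measure.

    Duplicates are ignored. The counts are not normalized by scope, so they
    are only comparable between ontologies covering a similar domain.
    """
    return {
        "concepts": len(set(concepts)),
        "relation_instances": len(set(relation_instances)),
    }


concepts = ["animal", "mammal", "whale"]
relations = [("mammal", "is_a", "animal"), ("whale", "is_a", "mammal")]
print(crude_granularity(concepts, relations))
```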

Intended use is the dimension which records the original purpose(s) of the ontology. These may include semantically informed search, data semantics specification for databases or data entry, data integration across multiple data sources, agent communication languages, controlled vocabularies for recording medical diagnoses, etc.

Automated reasoning is a dimension which records the extent to which it is anticipated that an ontology will be used by automated reasoning software, e.g., for question answering. If such use is anticipated, one would expect the ontology to be encoded using some form of logic, e.g., first order logic.

Prescriptive vs. descriptive is a dimension which characterizes the intent of the ontology developer. A descriptive ontology simply describes contemporary semantic usage without much regard for the scientific correctness of the encoded knowledge (e.g., in common parlance a whale might be described as a large fish). Examples of such descriptive ontologies include folksonomies and most linguistic ontologies. Alternatively, an ontology may be intended as a normative, prescriptive document whose correctness is of considerable concern, e.g., a whale is a mammal, not a fish. Other prescriptive ontologies include medical diagnostic terminologies, legal or regulatory ontologies, accounting ontologies, mathematical or engineering ontologies, etc.

The governance dimension is concerned with how decisions about the structure and, especially, the content of an ontology are made. There was agreement at the summit that ontology developers need to defer to existing legal, regulatory, and professional organizations concerning the natural language definitions of concepts and semantic relationships. Ontology development should be viewed as an effort to organize and formalize concept definitions and relationships which are conventionally defined by existing institutions, not as an attempt to replace existing definitions with de novo definitions generated by autonomous computer scientists. As a corollary, it was observed that it is necessary to record the provenance of every definition incorporated into an ontology, e.g., the controlling legislation, regulation, or standard from which the definition is taken.
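One way to make the provenance requirement concrete is to store the source alongside each definition and flag entries that lack one. This is a minimal sketch under that assumption; the names `Definition` and `missing_provenance`, and the placeholder source string, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """A natural language definition together with its provenance."""
    term: str
    text: str
    source: str  # legislation, regulation, or standard the text is taken from


def missing_provenance(definitions):
    """Return the terms whose definitions record no source."""
    return [d.term for d in definitions if not d.source.strip()]


definitions = [
    # The source below is a placeholder, not a real citation.
    Definition("whale", "A marine mammal.", "hypothetical zoological standard"),
    Definition("large fish", "Any big sea creature.", ""),
]
print(missing_provenance(definitions))
```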

Folksonomies and Formal Ontologies

One of the issues discussed was the relationship between social tagging and folksonomies on the one hand and more traditional structured / formal ontologies, such as taxonomies and axiomatized ontologies, on the other. Until recently these efforts have been viewed as competing approaches. The consensus of the Ontology Summit was that social tagging efforts should be viewed as large scale corpora to be used for inferring and validating more formal ontologies, akin to the use of large text corpora in computational linguistics studies. In addition, more formal ontologies can be used to inform social tagging by providing improved tag sets and faceted tagging.
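The summit did not prescribe a method for using tag corpora. As one simple illustration of treating tags as a corpus, the sketch below counts how often two tags are applied to the same resource; frequent pairs are candidates for a relation that an ontology developer could then review. Co-occurrence counting is a technique chosen here for illustration, and the function name `tag_cooccurrence` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(tagged_resources):
    """Count how often each pair of tags appears on the same resource.

    tagged_resources: iterable of tag lists, one list per resource.
    Pairs are stored in alphabetical order so (a, b) and (b, a) coincide.
    """
    counts = Counter()
    for tags in tagged_resources:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


resources = [
    ["whale", "mammal", "ocean"],
    ["whale", "mammal"],
    ["whale", "fish"],
]
print(tag_cooccurrence(resources).most_common(3))
```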

Ontologies as Software Artifacts

Tom Gruber and Paola Di Maio both argued that ontologies should be considered a type of software artifact, and that ontological engineering should be thought of as a discipline akin to software engineering or database design - i.e., a standard component of the software professional's toolkit, taught routinely to every CS student.

Survey

To elicit the distinctions between various kinds of ontologies, an interactive survey was designed and posted on the Web to engage various communities. The respondents were invited to identify the community they represented and to describe the value of ontologies, as well as issues with ontologies, in that community. The last section of the survey invited the respondents to describe and characterize the ontologies or related artifacts in use in their community.

Over fifty respondents from 24 communities submitted entries to the survey. The best represented communities were Formal ontology, Applications development, Standards development, Web 2.0 and Biomedicine. 41 terms were identified as closely related to ontology, including formal ontology, upper ontology, concept system and controlled vocabulary. Some 70 ontologies from a variety of domains were characterized in the survey, including formal ontologies (e.g., BFO, DOLCE, SUMO), biomedical ontologies (e.g., Gene Ontology, SNOMED CT, UMLS, ICD), thesauri (e.g., MeSH, National Agricultural Library Thesaurus), folksonomies (e.g., social bookmarking tags), general ontologies (e.g., WordNet, OpenCyc) and specific ontologies (e.g., Process Specification Language). The list also includes markup languages (e.g., NeuroML), representation formalisms (e.g., Entity-Relation model, OWL, WSDL-S) and various ISO standards (e.g., ISO 11179). This sample clearly illustrates the diversity of artifacts collected under "ontology".

This is the proposed first draft.

See the finalized and adopted version of the document at: OntologySummit2007_Communique