Ontolog Forum

Ontology Summit 2020 Whence Working Group

  • What brought about the use of graphs as persistence mechanisms? We will need to address other non-relational persistence mechanisms and the historical background; this can include parts of the 'why'.

First Draft

Knowledge Graph, Whence Team: Janet Singer, John Sowa, George Hurlburt and Ravi Sharma

Whence - Introduction

John Sowa: By the 1970s, the developers of database and knowledge base systems recognized that standards for semantics are necessary for systems to process shared data correctly. The database community proposed a conceptual schema for representing the semantics of shared data. For the knowledge bases in artificial intelligence and the Semantic Web, an ontology specified in some version of logic contains equivalent information.

  • The top picture shows the work on the conceptual schema, which is identical to what people now call an ontology; the two terms mean the same thing. The figure shows the conceptual schema, or ontology, at the center of the database, the applications, and the user interface. Yellow marks the sharing between the applications and the database, violet the sharing between the database and the user interface, and light blue the sharing between the user interface and the applications; the conceptual schema defines everything. The diagram on the left side of the figure comes straight out of the 1978 ANSI/SPARC document. It shows the conceptual schema in the middle, related to the external schema, which is like the APIs used by various programs, and, at the bottom, the internal schema. That is it!
  • Ravi Sharma: If I look at this Venn diagram, you are, in my opinion, defining the knowledge graph, because you have database, query, action, and sharing. But you don't have a thing called results or findings?
  • John Sowa replied: What, who, when, where, why: a finding is just a type of query.

That is the basis for what people were doing in the 1970s and 1980s; all of the database systems of the 1980s were based on the ANSI/SPARC work on conceptual schemas. Users do two basic things: they ask a question to get some information, or they click on something to run an application. As for the database, you could say that it is now the entire Semantic Web; the entire World Wide Web is now our database. Rather than adding more bullet items about what the database evolved from, one can simply say that the databases of the 1970s have evolved into the World Wide Web. So the entire web is the current database.
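
The three-schema arrangement described above can be illustrated with a minimal Python sketch. It is not taken from the ANSI/SPARC report; the `Employee` class, the `_STORAGE` list and the `directory_view` function are hypothetical names chosen only to show how one conceptual schema can sit between an internal storage layout and an external, application-facing view.

```python
from dataclasses import dataclass


# Conceptual schema: the shared definition of what an "employee" is.
# The class and field names are hypothetical, chosen for illustration.
@dataclass(frozen=True)
class Employee:
    emp_id: int
    name: str
    department: str


# Internal schema: how the records happen to be stored (plain tuples here).
_STORAGE = [
    (1, "Ada", "Research"),
    (2, "Grace", "Engineering"),
]


def load_employees():
    """Map internal storage tuples onto the conceptual schema."""
    return [Employee(*row) for row in _STORAGE]


# External schema: a view exposed to one application or user interface.
def directory_view(employees):
    """Expose only the fields a directory application needs."""
    return [{"name": e.name, "department": e.department} for e in employees]


if __name__ == "__main__":
    for entry in directory_view(load_employees()):
        print(entry)
```

In this sketch the storage layout or the view could change independently, as long as both still map to the same `Employee` definition.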

  • Janet Singer: Labels have changed, and some scopes have changed, but basically the same schema has persisted throughout all these years.
  • John Sowa: An ACM workshop in 1980 brought together researchers in database systems, artificial intelligence, and programming languages. John McCarthy introduced first-order theories of concepts and propositions in the 1980s, worked on and revised contexts through the 1990s, and in 2007 published “From here to human-level AI”. The Cyc project, founded by Doug Lenat in 1984, designed and implemented the world’s largest and most detailed formal ontology and reasoning system. Shared Reusable Knowledge Bases (SRKB) was a DARPA-sponsored project from 1991 to 1996; the Knowledge Interchange Format (KIF) was one of its deliverables. Efforts on KIF and conceptual graphs were combined by 2007 into ISO/IEC standard 24707 for Common Logic.
  • John’s URL on IKL lists technologies and standards related to ontology and knowledge representation. Semantic Networks, by John Sowa (1992): this article in the Wiley Encyclopedia of AI surveyed the development of network notations for logic and ontology since the 1960s. It relates networks used in AI and machine translation to the foundations in logic, linguistics, and philosophy. This version has been updated with references and discussion of more recent developments in knowledge representation, linguistics, and the Semantic Web.
  • In the following section we will discuss how the RDF, RDFS and Logic stacks matured architecturally during 2000-2005 and how Tim Berners-Lee, R. V. Guha, Pat Hayes, Deborah McGuinness, and Jim Hendler contributed to developing the architectural track.

Figure 2 is Figure 4 of the above-cited URL (IKL), showing the evolution of the Semantic Web; most readers will be familiar with the architectural stacks labelled 2000 and 2005.

Previous Draft

As a possible starting point for our overview, here are some excerpts from Gutierrez and Sequeda, “A Brief History of Knowledge Graph's Main Ideas”. They give accomplishments and foci for each of the eras, followed by ‘realizations’ and ‘limitations’. A related report by Juan Sequeda is at http://www.juansequeda.com/blog/2019/05/11/2019-knowledge-graph-conference-trip-report/ Sequeda and others also presented Whence-related material at the Stanford Course Lectures (CS520): https://web.stanford.edu/class/cs520/

  • Introduction: “Those who cannot remember the past are condemned to repeat it” (George Santayana)
    • Knowledge Graphs can be considered to be fulfilling an early vision in Computer Science of creating intelligent systems that integrate knowledge and data at large scale. The term “Knowledge Graph” was introduced by researchers at the turn of this century and has rapidly gained popularity in academia and industry since Google popularized it in 2012. It is paramount to note that, regardless of the discussions on, and definitions of the term “Knowledge Graph”, it stems from scientific advancements in diverse research areas such as Semantic Web, Databases, Knowledge Representation and Reasoning, NLP, Machine Learning, among others. … The integration of ideas and techniques from such disparate disciplines give the richness to the notion of Knowledge Graph, but at the same time presents a challenge to practitioners and researchers to know how current advances develop from, and are rooted in, early techniques.
    • (Gary - I like this as an intro piece (above), whichever version of whence we use)
  • How is this paper written?
    • The essential elements involved in the notion of knowledge graphs can be traced to ancient history. If one would like to dig into their origins, several disciplines should be considered, among them mathematics, philosophy, linguistics, and psychology.[2] However, we do not have the time to go back to ancient times[3] and revisit broad areas of science. Thus, from a temporal point of view, we will concentrate on the evolution after the advent of computing in its modern sense (1950s). … We periodized by decades, but are conscious that the boundaries are much more blurry.[4]
    • (Gary - can we make the material below a more condensed table??? We have different parts to the table: Data and Knowledge. Some of it fits with what is called First Draft above. Some may be communicated by an additional graphic showing the flow of ideas.)
  • Advent of the digital age (1950s and 1960s)
    • Realizations during the decades of the 50s and 60s:
      • Importance and possibility of automated reasoning.
      • The problem of dealing with large search spaces.
      • Need to understand natural language and other human representations of knowledge
      • Potential of semantic nets (and graphical representations in general) as abstraction layers
      • Relevance of systems and high level languages to manage data.
    • Limitations of contemporary (50s and 60s) techniques:
      • Physical, technical and cost limitations of hardware
      • Gap between graphical representation and linear implementation
      • Gap between the logic of human language and data as handled by computer systems
  • Foundations Data and Knowledge (1970s)
    • Realizations:
      • The need for representational independence, with the relational model as the first example. This approach could also be implemented in practical systems.
      • The need to formalize semantic networks using the tools of formal logic.
      • The possibilities of combining logic and data by means of networks.
    • Contemporary Limitations:
      • On the DATA side, more flexible data structures were needed to represent new forms of data giving rise to Object Oriented and Graph data structures.
      • On the KNOWLEDGE side, more understanding was needed on the formalization of knowledge in logic giving rise to Description Logics.
  • Managing Data and Knowledge (1980s)
    • Realizations:
      • Logic and data need to be tightly coupled (not just a Prolog or expert system layered on top of a database)
      • Tradeoff between expressive power of logical languages and computational complexity of reasoning tasks
    • Contemporary Limitations:
      • Negation was a killer. It was not well understood at this time.
      • Reasoning at large scale was still hard. Hardware was not going to be up to the task.
      • Realization of what would be known as the knowledge acquisition bottleneck
  • Data, Knowledge and the Web (1990s)
    • Realizations:
      • The Web was rapidly starting to change the world of data, information and knowledge
      • New types of data were spreading (particularly media: images, video, voice)
      • Data needs to be (and now can be) connected to get value
    • Limitations:
      • Computational power to handle the new levels of data produced by the Web
      • Pure logical techniques have complexity bounds that made scalability infeasible
  • Data and Knowledge at Large Scale (2000s)
    • Realizations
      • We learned to think about data and knowledge in a much bigger way (at Web scale)
      • Entering the era of Neural Networks due to new hardware and clever learning techniques
    • Limitations
      • Do not know how to integrate logical and statistical views
      • Statistical methods (particularly in neural networks) do not provide information about the process of “reasoning” or “deduction”, which generates problems in areas where explanation is needed
  • Where are we now?
    • Throughout this history, we observed two important threads:
      • Represent and manage data and knowledge at large scale
      • Integrate the most diverse, disparate and almost unlimited amount of sources of data and knowledge (structured data, text, rules, images, voice, videos, etc.).
      • Furthermore, all of this must be available and accessible for “normal” users.
    • In 2012, Google announced a product called the Knowledge Graph, which is based on representing data in the form of a graph connected with knowledge. … Later on, a myriad of companies (e.g. Microsoft, Facebook, IBM) and organizations started to use the Knowledge Graph keyword to refer to the integration of data giving rise to entities and relations forming graphs.[37] Academia began to use this keyword to loosely designate systems that integrate data with some structure of graphs, a reincarnation of the Semantic Web and Linked Data.
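
As a rough illustration of "entities and relations forming graphs", the sketch below stores a few statements from this page as subject-predicate-object triples and answers a simple pattern query. The predicate names (`founded_by`, `deliverable_of`, and so on) and the `match` helper are assumptions made for this example; they are not part of any product or standard.

```python
# Each edge of the graph is a (subject, predicate, object) triple.
# The facts come from this page; the predicate names are hypothetical.
TRIPLES = {
    ("Cyc", "founded_by", "Doug Lenat"),
    ("KIF", "deliverable_of", "SRKB"),
    ("SRKB", "sponsored_by", "DARPA"),
    ("Common Logic", "standardized_as", "ISO/IEC 24707"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Everything known about SRKB as a subject.
    print(match(TRIPLES, s="SRKB"))
    # Everything that points at SRKB as an object.
    print(match(TRIPLES, o="SRKB"))
```

A plain set of tuples is used here only to keep the sketch self-contained; it says nothing about how any particular knowledge graph system stores or queries its data.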

Discussion on 20200624 John Sowa Neuro-symbolic

  • Subsequently, several dozen email inputs have been received, but these are covered in the "What" sections of Knowledge Graphs (KG).

Good material on KG history to digest and incorporate

  1. It turns out Google was *not* the first to introduce even the KG label with their blog post announcement in 2012 https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html This promotional post included no specifics, but has still been widely cited as authoritative and original; it is cited in the WP article on KGs https://en.m.wikipedia.org/wiki/Knowledge_Graph
  2. According to https://research.utwente.nl/en/publications/25-years-development-of-knowledge-graph-theory-the-results-and-th, Knowledge graph theory was initiated [in 1982] by C. Hoede, a discrete mathematician at the University of Twente and F.N. Stokman, a mathematical sociologist at the University of Groningen, both in the Netherlands. Abstract: “The project on knowledge graph theory was begun in 1982. At the initial stage, the goal was to use graphs to represent knowledge in the form of an expert system. By the end of the 80's expert systems in medical and social science were developed successfully using knowledge graph theory. In the following stage, the goal of the project was broadened to represent natural language by knowledge graphs. Since then, this theory can be considered as one of the methods to deal with natural language processing. At the present time knowledge graph representation has been proven to be a method that is language independent. The theory can be applied to represent almost any characteristic feature in various languages. The objective of the paper is to summarize the results of 25 years of development of knowledge graph theory and to point out some challenges to be dealt with in the next stage of the development of the theory. The paper will give some highlight on the difference between this theory and other theories like that of conceptual graphs which has been developed and presented by Sowa in 1984 and other theories like that of formal concept analysis by Wille or semantic networks.”
  3. Below is an excellent article that does cite the work by Hoede et al. and discusses the problem for the community of not having a clear definition of KGs. Abstract: “Recently, the term knowledge graph has been used frequently in research and business, usually in close association with Semantic Web technologies, linked data, large-scale data analytics and cloud computing. Its popularity is clearly influenced by the introduction of Google’s Knowledge Graph in 2012, and since then the term has been widely used without a definition. A large variety of interpretations has hampered the evolution of a common understanding of knowledge graphs. Numerous research papers refer to Google’s Knowledge Graph, although no official documentation about the used methods exists. The prerequisite for widespread academic and commercial adoption of a concept or technology is a common understanding, based ideally on a definition that is free from ambiguity. We tackle this issue by discussing and defining the term knowledge graph, considering its history and diversity in interpretations and use. Our goal is to propose a definition of knowledge graphs that serves as basis for discussions on this topic and contributes to a common vision.” https://www.researchgate.net/profile/Wolfram_Woess/publication/323316736_Towards_a_Definition_of_Knowledge_Graphs/links/5a8d6e8f0f7e9b27c5b4b1c3/Towards-a-Definition-of-Knowledge-Graphs.pdf

Whence Chat notes 20200624

We had a productive conversation but did not take notes: we were relying on transcribing the recording but unfortunately lost it due to internet connection problems. The topics covered were:

  1. Follow-on discussion to the presentation by George Hurlburt, including parallels between the history of general systems research and the history of semantic technologies, where pieces of earlier more comprehensive work are forgotten, to be rediscovered/re-branded often in simpler form. John related the story of how DL focus superseded the original call for a more sophisticated unifying logic layer in the semantic web vision.
  2. Discussion of “Good material” insights above underscored the need to develop a more coherent story about what led to the KG efforts of today so the communiqué doesn’t end up adding to the false historical narrative crediting Google with some landmark innovation in 2012.
  3. The two “whence” presentations we have are John’s and Chaitanya Baru’s, with the latter focused on the very recent era of the NSF OKN initiative. In order to present a clear summary story for the communiqué, we can draw on those but need agreement among ourselves on what KGs are in relation to database architectures, knowledge bases, conceptual schemas, ontologies, reasoners, and technologies for natural language processing, entity extraction, machine learning, etc. In John’s view KGs are identified with RDF, but he agreed it’s best not to identify a general definition with a specific technology. However the level of expressiveness of DL as opposed to other logics should be brought out.
  4. We agreed with Gary that it would be good to have a standard architecture diagram to refer to throughout the communiqué. Before the next meeting on Monday, Janet will collect candidate diagrams, including Gary’s and others from John, Ravi and George, plus characterizations of the related concepts listed above.