Ontolog Forum
The following is a preliminary summary of material on this topic, which may also be useful for the Summit Communique. The material is roughly organized as follows:
- The History of Commonsense Knowledge and Explanations in AI
- Additional Views of Explanation
- An overview of Building Applications
- Challenges and issues to investigate
- Preliminary findings
Commonsense and Explanation (Preliminary Synthesis)

Commonsense reasoning and knowledge (CSK) featured prominently in how Artificial Intelligence (AI) was originally conceptualized. It was assumed to be important in the development and enhancement of human-like, intelligent systems, including the ability of a system to explain its reasoning and what it knows. Since its inception, commonsense research has focused on studying the consensus reality, knowledge, causality, and rationales that are ubiquitous in everyday thinking, speaking, and perceiving as part of ordinary human interaction with the world. As the name suggests, this knowledge and reasoning is common and available to the overwhelming majority of people, and is manifest in children's behavior by the age of 3 to 5. Commonsense and the ability to get along in an ordinary world was assumed in the original Turing Test, for example, as discussed in Session 1 by Michael Gruninger. Some examples of CSK include the following types:
- Taxonomic: Cats are mammals
- Causality: Eating a pear makes you less hungry
- Goals: I don't want to get hot so let's find shade
- Spatial: You often find a toaster in the kitchen
- Functional: You can sit on park benches if tired
- Planning: To check today's weather, look in a paper
- Linguistic: "can't" is the same as "cannot"
- Semantic: "dog" and "canine" have a similar meaning

More recently, with ML systems able to describe images, a visual Turing Test might involve question answering based on real-world images, such as detecting and localizing instances of objects and relationships among the objects in a scene. Example questions include:
- What is Person1 carrying?
- What is Person2 placing on the table?
- Is Person3 standing behind a desk?
- Is Vehicle1 moving?
This page summarizes some of the research ideas discussed in our commonsense reasoning and knowledge session as part of the 2019 Ontology Summit on Explanations. CSK and explanation are both current topics of interest in AI and Machine Learning (ML), and they are related. The current emphasis on explanation, for example, grows in part out of the opacity of Deep Learning (DL) solutions for tasks such as labeling images or translating text (as discussed in more detail in other sessions of this Summit). These efforts motivate some opportunities for related work on commonsense that may be supportive. This session on CSK examined issues around both commonsense and explanation, particularly as they have been developing under the influence of modern ML and DL models. In part, the excitement around ML grows out of the impact of big data and the recognition that, to move forward, we must automate activities that currently require much manual intervention.

Compounding the challenge of explainability is the fact that current, rapid advances in AI include these neural-network and DL approaches. Among them is the interestingly named Long Short-Term Memory (LSTM), a recurrent neural network (RNN) architecture (in which outputs from previous steps are fed as inputs to the current step) used in deep learning. LSTMs were developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs, and they have proven a useful architecture for processing sequential data such as speech/discourse or a series of visual images. However, understanding a series of images in a standard American movie requires a viewer to make numerous inferences, such as the intentions of characters, what caused one action to follow another, the nature of physical objects, the roles that characters play, etc. (Davis and Marcus, 2015). Thus extending image-identification applications raises questions about what such systems really know and what they can say about their knowledge and judgments. It still seems true, as Davis and Marcus (2015) assert, that most NLP tasks can be carried out purely in terms of manipulating individual words or short phrases, without attempting any deeper understanding; commonsense knowledge is evaded in order to focus on short-term results. But it is difficult to see how human-level understanding can be achieved without greater attention to commonsense, such as how time is expressed in common terms like "shortly after", which means very different things to a child versus an adult, or in a geological reference versus a seasonal one.
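For readers unfamiliar with the architecture, the following is a minimal sketch of an LSTM run over a toy token sequence, assuming PyTorch; the layer sizes and inputs are arbitrary placeholders, not drawn from any system discussed in the Summit.

```python
# Minimal sketch of an LSTM over a toy token sequence (sizes are illustrative).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
# The gating inside nn.LSTM is what mitigates the vanishing-gradient problem
# noted above for plain RNNs.
lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
               num_layers=1, batch_first=True)

# A batch of 2 "sentences" of 5 token ids each (random placeholders).
tokens = torch.randint(0, vocab_size, (2, 5))
embedded = embedding(tokens)            # (batch, seq_len, embed_dim)
outputs, (h_n, c_n) = lstm(embedded)    # one hidden vector per time step

print(outputs.shape)                    # torch.Size([2, 5, 128])
```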
1. Some History and Background to Commonsense and Explanation in AI

There is a long history showing the relevance of commonsense knowledge and reasoning to explanation. Certainly AI founders such as John McCarthy believed so and argued that a major long-term goal of AI is to endow computers with standard commonsense reasoning capabilities. In "Programs with Common Sense", McCarthy (1959) described three ways for AI to proceed: 1. imitate the human central nervous system, 2. study human cognition, or 3. "understand the common sense world in which people achieve their goals." Another, related goal has been to endow AI systems with natural language (NL) understanding and production. It is easy to see that a system with both CSK and an NL facility would be able to provide smart advice as well as explanations of that advice. We see this relation in McCarthy's early conceptualization of a smart advice-taker system, which would have causal knowledge available to it for "a fairly wide class of immediate logical consequences of anything it is told and its previous knowledge."
McCarthy further noted that this useful property, if designed well, would be expected to have much in common with what makes us describe certain humans as “having common sense.” He went on to use this idea to define CSK - “We shall therefore say that a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows”
In practice, back in the 1970s and 1980s, AI systems were not what the founders envisioned. They were brittle, with handcrafted production rules that encoded useful information about diseases, for example. In practice this rule knowledge was fragmented and opaque, and would break down in the face of obvious errors, due in part to a lack of common sense. It also came with a very simple, technical, but not commonsense idea of what was called an "explanation facility": a technical trace of rule firings, which was thought to provide an explanation. Proofs found by automated theorem provers can provide a map from inputs to outputs, and in a narrow, logical sense the "gold standard" concept of explanation is a deductive proof done using a formal knowledge representation (KR). Indeed there are multiple formalizations of deductive models involving some form of knowledge and reasoning.

But as made clear in our sessions, there are other forms, styles, or meanings of explanation. One concerns the provenance or source of some fact or statement. For example, "fact sentence F41 was drawn from document D73, at URL U84, in section 22, line 18." This makes clear the documented source of the data; that is important too and allows follow-up. Another type of explanation is the transparency of the algorithmic inference process: is it using sub-classing or something more complex like abduction? In some cases it is very hard for a human to understand how the explanation was arrived at, since the underlying structure is hidden. Other structurings of inference in a presentation are possible using a clausal resolution graph or a Bayes net graph. But an important question to ask is, "do these make something clear?" They may provide the "how" of an answer, in steps and which rules were involved, but not the justifying "why" of a satisfactory explanation. If a tree structure is involved in an explanation process, we might get more of a "why" understanding, with the possibility of drilling down and browsing the tree, having a focal point of attention on critical information, or having the option of displaying a graphic representation that a human can understand.

An example provided by Niket Tandon concerns a vehicle-controller AI system's explanation of driving based on visual sensing. The system describes itself as "moving forward" as an activity, while a human description is the more functional and common "driving down the street." As to explanations, the system says "because there are no other cars in my lane", while the human explanation is "because the lane is clear." These are similar, but "clear" is a more comprehensive idea of a situation, which might include construction, trees, etc. A more elaborate example concerns how a smart system explains judgments of "what is a healthy meal." As shown in the Figure below, justifying explanations may point to specific items or qualify the overall variety of items in the meal in a multi-modal way. Commonsense assumptions and presumptions in a knowledge representation may be an important aspect of explanations and serve as a focal point. The ability to focus on relevant points may be part of the way a system is judged competent, as is its perceived correctness: that it provides a good answer and a good explanation. Involved in such a judgment may also be evaluation of ethicality, fairness and, where relevant, legality, as well as the various roles involved, such as:
- Relational role
- Processual role
- Social role
An example of this is that the role of legal advice is different in the context of a banking activity compared to that of lying under oath. Part of the reason for limited explanations, if not brittleness, was discovered by Clancey (1983), who found that Mycin's individual rules play different roles, have different kinds of justifications, and are constructed using different rationales for the ordering and choice of premise clauses. Since this knowledge is not made explicit, it cannot be used as part of explanations. There are also structural and strategic concepts which lie outside early AI system rule representations; these can only be supported by appealing to some deeper level of (background) knowledge. One solution approach was to use ontologies to capture and formalize this knowledge. The argument was that ontologies are needed to make explicit the structural, strategic, and support knowledge, which enhances the ability to understand and modify the system (e.g., knowledge debugging as part of KB development) as well as to support suitable explanations.

To some extent, efforts like CYC, which started in the 1980s, were an attempt to avoid these problems by providing a degree of commonsense and modular knowledge. CYC can provide a partial inference chain constructed in response to queries such as "Can the Earth run a marathon?" In terms of a commonsense explanation we get a "no", because the Earth is not animate and the role capability of running a marathon is detailed by the knowledge in a sports module (called a micro-theory or MT). Around this time the issue for the development and application of ontologies was that the commonsense context was seldom explicitly stated, and doing so is difficult. But the need for a formal mechanism for specifying a commonsense context had become recognized, and approaches to it, such as CYC's microtheories, arose. These descended from John McCarthy's tradition of treating contexts as formal objects over which one can quantify and express first-order properties. In the 1980s, CYC-type knowledge was also seen as important to associate systems, whose proponents argued that "systems should not only handle tasks automatically, but also actively anticipate the need to perform them... agents are actively trying to classify the user's activities, predict useful sub-tasks and expected future tasks, and, proactively, perform those tasks or at least the sub-tasks that can be performed automatically" (Wahlster and Kobsa, 1989).

Around this time another influential CSK development was Pat Hayes' "Naive Physics Manifesto" (1978), which proposed to develop a formal theory encompassing the large body of knowledge of the physical world. The vision was of an axiomatically dense KB using a unified conceptualization whose axioms were represented in symbolic logic. Emphasis was on formalizing foundational, common, "everyday" physical concepts, including: measurements and scales; spatial concepts (shape, orientation, direction, containment); substances and states (solids, liquids); physical forces, energy, and movement; and manufactured objects and assemblies. Much ontological work has followed the spirit of this idea, if not the exact program outlined. More recent work, reflecting the ability of ML systems to learn about visual information and even text, has led to more distinctions being made about commonsense knowledge.
An example (see Figure) provided by Tandon et al. (2018) distinguishes the visual modality expressing a type of knowledge (is it a simply observed property like color, or a simple relation like part-of?) from implications (shiny things imply smoothness and so less friction). As shown in the Figure below, properties of agents such as emotions in difficult circumstances, while commonly known, are more implicit, as are the actions involved in fixing a tire or avoiding a traffic jam.
Such a KB would be supported by commonsense reasoners, which might include answering "why" questions in an understandable way.
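As a toy illustration only (not CYC's actual microtheory machinery), the "Can the Earth run a marathon?" query above can be caricatured as a capability check whose failed precondition doubles as the "why" of the answer; the facts and rule below are invented for illustration.

```python
# Toy commonsense QA with a "why" explanation; facts and the rule are invented
# placeholders, not drawn from CYC or any real microtheory.
FACTS = {
    "Earth": {"is_animate": False},
    "Alice": {"is_animate": True},
}

def can_run_marathon(entity: str):
    """A capability rule from a hypothetical 'sports' module:
    running a marathon requires an animate agent."""
    if FACTS[entity]["is_animate"]:
        return True, f"{entity} is animate, so the capability is not ruled out."
    return False, f"No: {entity} is not animate, and running a marathon requires an animate agent."

for e in ("Earth", "Alice"):
    answer, why = can_run_marathon(e)
    print(e, answer, "-", why)
```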
2. Additional Concepts of Explanations & Commonsense Understandings
It is worth noting in passing that CSK has been discussed in prior Ontology Summits.
In particular, the 2017 Summit on AI, Learning, Reasoning, and Ontologies and its track on "Using Automation and Machine Learning to Extract Knowledge and Improve Ontologies" are relevant, but not all of that material will be discussed here. One prominent example from 2017 was NELL (Never-Ending Language Learner). Central to the NELL effort is the idea that we will never truly understand machine or human learning until we can build computer programs that share some similarity to the way humans learn. In particular, such systems, like people, learn:
- many different types of everyday knowledge or functions, and thus many contexts,
- from years of diverse, mostly self-supervised experience,
- in a staged, curricular fashion, where previously learned knowledge in one context enables learning further types of knowledge,
- using self-reflection and the ability to formulate new representations and new learning tasks, which enable the learner to avoid stagnation and performance plateaus.
As reported at the 2017 Summit, NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits)). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer (we might say using commonsense reasoning) new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL acquires two types of knowledge in a variety of ways. It learns free-form text patterns for extracting this knowledge from sentences in a large-scale corpus of web sites. NELL also exploits a coupled process which learns text patterns corresponding to type and relation assertions, and then applies them to extract new entities and relations. In practice it also learns to extract this knowledge from semi-structured web data such as tables and lists. In the process it learns morphological regularities of instances of categories, and it learns probabilistic Horn clause rules that enable it to infer new instances of relations from other relation instances that it has already learned. Reasoning is also applied for consistency checking and for removing inconsistent axioms, as in other KG generation efforts.
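A minimal caricature of the text-pattern side of this pipeline is sketched below; the patterns and sentences are hand-written placeholders for illustration and are far cruder than NELL's learned, confidence-weighted extractors.

```python
# Toy pattern-based relation extraction, loosely in the spirit of the coupled
# text-pattern learning described above. Patterns are invented placeholders.
import re

PATTERNS = [
    (re.compile(r"(\w+) is often served with (\w+)"), "servedWith"),
    (re.compile(r"(\w+) are a kind of (\w+)"), "isA"),
]

sentences = [
    "tea is often served with biscuits",
    "cats are a kind of mammal",
]

for sentence in sentences:
    for pattern, relation in PATTERNS:
        for arg1, arg2 in pattern.findall(sentence):
            # Each extraction would still need confidence weighting and
            # consistency checking against the existing KB, as noted above.
            print(f"{relation}({arg1}, {arg2})")
```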
NELL might learn a number of facts from a sentence defining "icefield", for example:
- "a mass of glacier ice; similar to an ice cap, and usually smaller and lacking a dome-like shape; somewhat controlled by terrain."
In the context of this sentence and this newly extracted "background knowledge", it might then extract supporting facts/particulars from following sentences: "Kalstenius Icefield, located on Ellesmere Island, Canada, shows vast stretches of ice. The icefield produces multiple outlet glaciers that flow into a larger valley glacier." It might also note not only the textual situation relating the extracted facts but also the physical location (e.g., Ellesmere Island) and any temporal situations expressed in these statements. An important context is that AI systems increasingly use advanced techniques such as deep learning, which may in turn require additional techniques to make them more understandable to humans and system designers, as well as trusted.
Some factors making CSK (and explanation) more understandable (and hence trusted) were mentioned by Benjamin Grosof as part of his session presentation. These include:
- Influentiality: heavily weighted hidden nodes and edges can affect discrimination of output in a deep learning neural network (NN), or some constructed approximation of an output, even if partial, might provide a more understandable explanation. Such approximations may represent the top-3 weighted sub-models within some overall ML ensemble model.
- "Model the model": the learned NN model can itself be learned as a secondary and simpler decision-tree model.
- Lateral relevance: interactivity for neighborhood, centrality, or top-k-within-category exploration within graph structures.

3. Contemporary Applications & Benefits of Explanation (that make sense)

There is an obvious benefit if semi- or fully-automatic explanations can be provided as part of decision support systems. This seems like a natural extension of some long-used and well-understood techniques such as logical proofs. Benefits can easily be seen if rich and deep deductions could be supported in areas regarding policies and legal issues, but also as part of automated education and training, such as e-learning. An example of this is the Digital Socrates application developed by Janine Bloomfield of Coherent Knowledge. The interactive tutor system provides an answer to a particular topical question but also provides the logical chain of reasoning needed to arrive at the correct solution. Knowledge reuse and transfer is an important issue in making such systems scalable.

Some Examples from Industry Using Explanation Technology

Among the examples offered by Benjamin Grosof were:
Coherent Knowledge – previously mentioned as providing eLearning systems with semantic, logical, deductive reasoning, so that proofs provide a natural deduction in programs using declarative, extended logic. To make explanations understandable, the systems employ NL generation along with a drill-down capability and interactive navigation of the knowledge space. Provenance of the knowledge is also provided.
Tableau Software – provides a different ability, with specialized presentation of information via bar and other charts.

Kyndi – provides a more cognitive search in NL knowledge graphs (KGs). Here the capabilities include a focus on relevant knowledge, including lateral relevance and provenance, within an extended KG that is constructed using a combination of NLP, ML, and knowledge representation and reasoning (KRR).

A best-practice architecture (see Figure) and functional example offered by Grosof et al. (2014) concerns Automated Decision Support for Financial Regulatory/Policy Compliance using Textual Rulelog software technology implemented with the ErgoAI Suite. The system encodes regulations and related information as semantic rules and ontologies, which support fully and robustly automated run-time decisions and related querying.
Figure here
Understandable full explanations are provided in English reflecting a digital audit trail, with provenance information.
Rulelog helps handle the increasing complexity of real-world challenges, such as those found in data and system integration involving conflicting policies, special cases, and legal/business exceptions.
For example, it understands regulated prohibitions of banking transactions where a counterparty is an "affiliate" of the bank.
Notably, Textual Rulelog (TR) extends Rulelog with natural language processing and uses logic to help do both text interpretation and text generation. With a proper use of rules, the mapping is much simpler and closer than with other KRs, and Rulelog's high expressiveness is much closer to NL's conceptual abstraction level. As an added feature, the system allows what-if scenarios to analyze the impact of new regulations and policy changes.
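As a rough, hedged sketch of the kind of rule involved in the affiliate-prohibition example, plain Python stands in here for the Rulelog/ErgoAI encoding; the entity names, the rule identifier, and the explanation text are invented.

```python
# Toy compliance check in the spirit of the affiliate-prohibition example.
# Plain Python stands in for the semantic rules; all names are invented.
AFFILIATES = {("BigBank", "BigBank Securities"), ("BigBank", "BB Leasing")}

def is_affiliate(bank: str, counterparty: str) -> bool:
    return (bank, counterparty) in AFFILIATES

def check_transaction(bank: str, counterparty: str, kind: str):
    """Return (allowed, explanation) with a crude provenance placeholder."""
    if kind == "covered_transaction" and is_affiliate(bank, counterparty):
        return (False,
                f"Prohibited: {counterparty} is an affiliate of {bank} "
                "(per encoded rule R23 -- provenance placeholder).")
    return (True, "No applicable prohibition found.")

print(check_transaction("BigBank", "BigBank Securities", "covered_transaction"))
```

A real system would, as described above, also generate the explanation in English from the rule text and attach the digital audit trail.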
4. Issues and Challenges Today in the Field of CSK (and Explanation)

The Issue of Making the Case for CSK

Modern ML and Deep Learning (DL) techniques have proven powerful for some problems, such as those in computer vision employed for navigation or image identification. Research now reliably shows the value of transfer training/learning as part of this. In pre-training, a neural network model works on a known task, using stored images from a general source like ImageNet. The resulting trained neural network (with an implied "model") is then used for a new, but related, purpose-specific model. Of course it can be difficult to find training data for all types of scenarios and specific situations of interest. There are problems of representativeness and the seduction of the typical in some generalizations, such as "shiny surfaces are typically hard" (but some are not). There is the problem of perspective: the moon in the sky and a squirrel under a tree, which may appear in the same image, may seem the same size, but we know from experience that they are at very different distances. Many cognitive abilities developed in the first years of life provide the commonsense knowledge to handle these, along with problems like conservation of objects (if I put my toys in the drawer, they will still be there tomorrow). One may imagine handling such problems by building in commonsense knowledge or by letting a system have training experience with conservation over time or place. These types of problems also arise with situational understanding: some important things are unseen but implied in a picture as part of the larger or implied situation, such as an environmental or ecological one with many dependencies. An example offered by Niket Tandon was the implication of an arrow in a food-web diagram, which communicates "consumes" to a human (a frog consumes bugs). The problem for a deep net is that it is unlikely to have seen arrows used visually this way often enough to generalize a "consumes" meaning.

More recently, related research has demonstrated that a similar pre-training technique can be useful for many natural language tasks. An example is Bidirectional Encoder Representations from Transformers (BERT). So there is some general usefulness here, but the black-box nature and brittleness/distractibility of the NN knowledge calls for some additional work. For example, as speaker Niket Tandon suggested, adversarial training (see Li et al., 2018) during learning could help with mis-identification, and commonsense considerations are one way to generate adversarial data. There remain many problems with ordinary text understanding, such as the implications and scope of negations and what is entailed. For example, Kang et al. (2018) showed the problems of textual entailment with sentences from the Stanford Natural Language Inference set, and how guided examples of an adversarial, commonsense nature could help reduce errors on sentences like "The red box is in the blue box." We can also borrow from the previous ecological example as a language understanding example: a trained NN would not have seen insects, frogs, and raccoons in one sentence frequently. To a human, the use of an arrow as indicating consumption may be communicated in one-trial situational learning ("this is what we mean by an arrow"). So again, the higher level of situations and situational questions may require commonsense to understand the circumstances. Handling focus and scale is another problem in visual identification.
In a lake scene with a duck, an ML vision system may see water features like dark spots as objects. Here, as Niket Tandon argued, deep neural nets can easily be fooled (see evolvingai.org/fooling), so there may be a need for a model of the situation and of what is the focus of attention: a duck object. Some use of commonsense as part of model-based explanations might help during model debugging and decision making, to correct apparently unreasonable predictions. In summary, Niket Tandon suggests that, for the above reasons, commonsense-aware models and representations may be assumed to be DL-friendly and may provide several benefits, especially for the NN and DL efforts underway. They may:

- help to create adversarial training data (for a DL)
- help to generalize to other novel situations, alternate scenarios, and domains
- compensate for limited training data
- be amenable to and facilitate explanation, e.g., with intermediate structures
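As a toy illustration of the first bullet above, commonsense relations can be used to manufacture implausible negatives for training or evaluation; the tiny "food web" facts and the swap heuristic below are invented, not taken from any of the cited work.

```python
# Toy generation of commonsense-flavored adversarial examples: swap the
# arguments of a "consumes" relation to create implausible negatives that a
# model should learn to reject. Facts and heuristic are illustrative only.
CONSUMES = [("frog", "bug"), ("raccoon", "frog"), ("bird", "insect")]

def adversarial_negatives(facts):
    examples = []
    for eater, eaten in facts:
        examples.append((f"The {eater} consumes the {eaten}.", 1))  # plausible
        # The reversed relation is implausible by commonsense, so it makes a
        # useful hard negative.
        examples.append((f"The {eaten} consumes the {eater}.", 0))
    return examples

for text, label in adversarial_negatives(CONSUMES):
    print(label, text)
```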
These points are further illustrated in the sub-sections below by some of the reasons DL faces these challenges.
To understand situations (what exactly is happening?), a naive computational system has to track everything involved in a situation/event. This may involve a long series of events with many objects and agents. The ecological example provided before is illustrative, as is visualizing a play in basketball, even one as simple as a made or missed dunk. Images of the activity can be described by a few NL sentences (1-3): "He charges forward. And a great leap. He made a basket."
But this may be understood in terms of some underlying state-action changes. There is a sequence of actions, such as jumping, and there are associated but also implied states (1-3):
1. The ball is in his hands. (not actually said, but seen and important for the play)
2. The player is in the air. (implied by the leap)
3. The ball is in the hoop. (technically, how a basket is made)
We can represent the location of things (1-3) simply as:
Location(ball) = player's hand
Location(player) = air
Location(ball) = hoop
The point is that these all fit into a coherent action within the context of basketball, and we know, and can focus on, the fact that the location of the ball at the end of the jump is a key result. On the other hand, as shown by Dalvi, Tandon, and Clark, in a naive training setting it is expensive to develop a large training set for such activities, and the resulting state-action-change models have so many possible inferred candidate structures (is the ball still in his hand? maybe it was destroyed?) that common events can evoke an NP-complete problem. And without sufficient data (remember, it is costly to construct), the model can produce what we consider absurd, unrealistic choices based on commonsense experience, such as the player being in the hoop. But the model presents it as possible, as shown in the images below.
A solution is to have a commonsense-aware system that constrains the search for plausible event sequences. This is possible with the design and application of a handful of universally applicable rules. For example, these constraints seem reasonable based on commonsense:

- An entity must exist before it can be moved or destroyed. (destruction is certainly not likely in basketball)
- An entity cannot be created if it already exists.

In the work discussed by Niket Tandon, these constraints were directly derivable from SUMO rules such as MakingFn, DestructionFn, and MotionFn. This provides preliminary evidence that ontologies, even early ones such as SUMO, could be good guides for producing a handful of generic hard constraints in new domains. So how much help do these constraints provide? Commonsense-biased search improves precision by nearly 30% over state-of-the-art DL efforts: Recurrent Entity Networks (EntNet) (Henaff et al., 2017), Query Reduction Networks (QRN) (Seo et al., 2017), and ProGlobal (Dalvi et al., 2018).
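The following is a minimal sketch of this kind of constraint-based re-scoring for the basketball example; the candidate sequences and scores are invented, and the constraints merely paraphrase the generic rules above rather than the actual SUMO axioms or the published models.

```python
# Toy commonsense filter over candidate state-change sequences.
# Candidates and scores are invented for illustration.
candidates = [
    # (sequence of (entity, event, resulting_location), model score)
    ([("ball", "move", "player's hand"), ("ball", "move", "hoop")], 0.40),
    ([("ball", "destroy", None), ("ball", "move", "hoop")], 0.45),     # absurd
    ([("player", "move", "air"), ("player", "move", "hoop")], 0.35),   # absurd
]

def violates_commonsense(sequence):
    destroyed = set()
    for entity, event, location in sequence:
        if entity in destroyed:
            return True          # cannot move or destroy a non-existent entity
        if event == "destroy":
            destroyed.add(entity)
        if entity == "player" and location == "hoop":
            return True          # domain-specific sanity check (illustrative)
    return False

plausible = [(seq, score) for seq, score in candidates
             if not violates_commonsense(seq)]
print(max(plausible, key=lambda pair: pair[1]))
```

The effect is the one described above: probability mass assigned by the learned model to implausible sequences is simply removed from the search before the best candidate is chosen.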
Deep learning can provide some explanations of what it identifies in simple visual QA datasets such as VQA and CLEVR. Systems can answer questions like "What is the man riding on?" in response to being shown the figure below:
Commonsense knowledge is more important when the visual compositions are more dynamic and involve multiple objects and agents, as in VCR (Visual Commonsense Reasoning).
4.1. How to acquire CSK

As noted earlier, acquiring CSK can be costly, but some ruling constraints can be derived from existing ontologies. An extraction process from text using AI, ML, or NLP tools may be less costly but much noisier, and it has the problem of placing knowledge in situational context. In some advanced machine learning cases, prior (background) knowledge may be used to judge the relevance of extracted facts, which makes this a bit of a bootstrapping situation. However, much of what is needed may be implicit and inferred, and is currently only available in unstructured and un-annotated forms such as free text. NELL is an example of how NLP and ML approaches can be used to build CSK and domain knowledge, but source context as well as ontology context needs to be taken into account to move forward. It seems reasonable that the role of existing and emerging Semantic Web technologies and associated ontologies is central to making CSK population viable, and that an extraction process using a core of CSK may be a useful way of proceeding. Another problem is that there are known biases, such as reporting bias, involved in common human understanding. Adding to this, the knowledge exists in multi-modal and related forms, which makes coverage challenging across different contextual situations. Some of these contextual issues were discussed as part of the 2018 Summit.
4.2. Confusion about explanation and commonsense concepts

It is perhaps not surprising that this confusion is notable among non-research industry and the mainstream, as well as in the technical media covering computation and systems but not AI. As with many specialties, this needs to be addressed first in the research community, where a useful consensus can be reached and a common way of discussing these concepts can be developed and then scaled down to be understood by others.
4.3 The Range of Ontologies Needed for an Adequate CSK

Michael Gruninger's work suggests some significant types of ontologies that might be needed to support something as reasonable as a Physical Embodied Turing Test. The resulting suite is called PRAxIS (Perception, Reasoning, and Action across Intelligent Systems), with the following components:

- SoPhOs (Solid Physical Objects)
- Occupy (location; occupation is a relation between a physical body and a spatial region, there being no mereotopological relationship between spatial regions and physical bodies)
- PSL (Process Specification Language)
- ProSPerO (Processes for Solid Physical Objects)
- OVid (Ontologies for Video)
- FOUnt (Foundational Ontologies for Units of measure)
4.4 Mission creep, i.e., expansivity of task/aspect

Again, this is a common phenomenon in AI among theorists and in hot new areas like Deep Learning. One sees the expansion of the topic, for example, at the IJCAI-18 workshop on explainable AI, in a range of topical questions drawn from several disciplines, including cognitive science, human factors, and psycholinguistics, such as: How should explainable models be designed? How should user interfaces communicate decision making? What types of user interactions should be supported? How should explanation quality be measured? A perennial problem is ignorance of past research and of a sense of what is already practical. For example, in the area of deep policy/legal deduction for decisions, is it really difficult to provide full explanations of extended logic programs, along with NL generation and interactive drill-down navigation? We seem to have reached a point where we can have a good version of cognitive search that takes into account provenance along with focus and lateral relevance, using extended knowledge graphs (see Kyndi research).
4.5. How to evaluate explanations or CSK – what is a quality explanation (or a good caption for an image)?

Evaluation remains a research issue in both CSK and explanation. As we have seen, there are many criteria for judging the goodness of an automated process, from simple labeling to full explanations. How do we validate the knowledge? In some cases regular domain knowledge will not support commonsense reasoning and some enhancement is needed. Michael Gruninger cited the example of translating a domain ontology to a domain state ontology. In this enhancement activity, occurrences correspond to mappings between models of the domain ontology, and we would need to classify activities with respect to possible changes. An implication here is that some additional ontological steps are needed for such an enhancement. As an example, Gruninger offered an approach looking at fluents and verbs. For each verb, the process is to (a minimal sketch follows the list):
- Identify the fluents (i.e., states of the world, leveraging PSL and the situation calculus) that are changed when the process corresponding to the verb occurs.
- Axiomatize the domain state ontologies that contain the fluents in their signature.
- Identify the domain ontologies for these domain state ontologies. This may require additional axiomatizations if such ontologies have not previously been considered. One commonsense assumption is that events are atomic: one event occurs at a time, and a reasoner need only consider the state of the world at the beginning and the end of the event, not the intermediate states while the event is in progress. Of course, an attempt to dunk a ball that slips out of one's hands after takeoff and before landing is possible.
- Axiomatize the domain process ontologies.
- The process corresponding to the verb will either be an atomic activity in one of the domain process ontologies or it is a complex activity that is composed of activities in the domain process ontologies.
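The following is a minimal, hedged sketch of the first two steps for a single verb, with plain Python standing in for the PSL/situation-calculus axioms; the verb "drop", its fluents, and its precondition are invented for illustration.

```python
# Toy illustration of identifying the fluents changed by a verb and the
# precondition/effect structure of the corresponding atomic activity.
# Stands in for PSL-style axioms; the verb and fluents are invented.
state = {
    "holding(agent, ball)": True,
    "location(ball)": "agent's hand",
}

def drop(s):
    """Atomic 'drop' activity: changes the holding and location fluents."""
    if not s["holding(agent, ball)"]:
        raise ValueError("precondition failed: agent is not holding the ball")
    after = dict(s)
    after["holding(agent, ball)"] = False
    after["location(ball)"] = "ground"
    return after   # only the begin/end states matter if events are atomic

print(drop(state))
```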
Other knowledge engineering tasks might include adding CSK modules as in CYC. Since these activities are not typically covered in ontological engineering methods, some additional education may be needed. Evaluation is not as simple as checking whether a system provides an exact match of words to what a human might produce, given the many ways that meaning may be expressed. And it is costly to test system-generated explanations, or even captions, against human ones, due to the human effort involved. One interesting research approach is to train a system to distinguish human from ML/DL-generated captions (for images, etc.); after training, the resulting criteria can be used to critique the quality of the ML/DL-generated labels (Cui et al., 2018). A particular task is evaluating the quality of knowledge (both CSK and non-CSK) extracted from text. Provenance information, or source context, is one quality needed. In some cases, and increasingly so, a variety of extracted CSK/information is aligned (e.g., some information converges from different sources) by means of an extant (hopefully high-quality) ontology, and perhaps several. This means that some aspect of the knowledge in the ontologies provides an interpretive or validating activity when building artifacts like KGs. Knowledge graphs can also be filled in by internal processes looking for such things as consistency with common ideas, as well as by external processes which add information from human and/or automated sources. An example currently in use is to employ something like Freebase's data as a "gold standard" to evaluate data in DBpedia, which in turn is used to populate a KG. We can again note that a key requirement for validatable quality of knowledge is the ability to trace back from a KB to the original documents (such as Linked Data) and, if filled in from other sources such as humans, to make that provenance understandable or trustworthy. It is useful to note that this process of building such popular artifacts as KGs clearly shows that they are not equivalent in quality to supporting ontologies. In general there is some confusion in equating the quality of information extracted from text, KGs, KBs, the knowledge inherent in DL systems, and ontologies.
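A minimal sketch of such a gold-standard comparison is shown below; the triples are invented, and this is generic precision/recall scoring rather than the actual Freebase/DBpedia tooling.

```python
# Toy evaluation of extracted triples against a gold-standard set,
# in the spirit of using one curated KB to check another. Triples invented.
gold = {("cat", "isA", "mammal"), ("tea", "servedWith", "biscuits"),
        ("toaster", "locatedIn", "kitchen")}
extracted = {("cat", "isA", "mammal"), ("tea", "servedWith", "biscuits"),
             ("moon", "isA", "squirrel")}   # one spurious extraction

true_positives = gold & extracted
precision = len(true_positives) / len(extracted)
recall = len(true_positives) / len(gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```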
4.6. Alternative values and disconnects between users, funders and investors

Users often perceive critical benefits/requirements for things like commonsense explanation. Their context is that their current AI systems may produce confusing results, with explanations that do not make sense and/or are complex and unwieldy, while often being misleading or inaccurate. Funders certainly consider big-picture benefits, such as more overall effectiveness of systems with less exposure to risk of non-compliance. But they should also look at the research issues, such as the agile development and knowledge maintenance involved, since current methods for producing explanations and/or commonsense reasoning are expensive, perhaps not scalable, or too slow to update. Investors (both venture and enterprise-internal) may look for a shorter-term and more dramatic feature impact and business benefit compared to currently deployed methods. Among those benefits are accurate decisions, but a case should also be made for better explanations that can be cost-effective by requiring less labor or less of the precious time of subject matter experts, who can now participate in a closer knowledge refinement loop. Investors may fail to perceive the value of what they consider add-ons, like explanation, which seems costly, and especially commonsense reasoning, which is seen as simple.
4.7. How to represent CSK knowledge useful for explanations

Both formal/logical and informal commonsense knowledge of assertions, queries, decisions, answers, and conclusions, along with their explanations, has to be represented. As noted in the work of Gruninger, single ontologies are not likely to be suitable as work expands and more contexts are encountered; this will require multiple ontologies. Big knowledge, with its heterogeneity and depth of complexity, may be as much of a problem as Big Data, especially if we are leveraging heterogeneous, noisy, and conflicting data to create CSK and explanations. It is hard to imagine that this problem can be avoided. The ontology experience is that, as a model of the real world, we need to select some part of it based on interest and a conceptualization of that interest. Differences of selection and interpretation are impossible to avoid, and it can be expected that different external factors will generate different contexts for any intelligent agent doing the selection and interpretation needed as part of a domain explanation.

For some temporal phenomena one can use the situation calculus (SC), with a branching model of time, such as characterizes planning that considers alternative possible actions. However, as Davis and Marcus (2015) note, SC does not work well for things like narrative interpretation, since the order of events may not be known. For narrative interpretation, a different representation, the event calculus (EC), is more suitable for expressing the many temporal relations that may be present in discourse and narratives. Despite this understanding, success in applying EC to the interpretation of natural language texts is still limited.

The Figure below from Tandon et al. (2018) provides one view of some of the CSK, organized by visual modalities, available in a variety of forms. As can be seen, CYC is well represented.

Figure here

In the context of applying highly structured knowledge such as ontologies to DL system knowledge, there is the issue of how to first connect the noisy space of data to the perfect world of ontologies/KBs and their modules, such as CYC microtheories. Such connections may not be scalable, and the effort is not easily crowdsourced (Tandon et al., 2018). Various approaches exist for different forms of CSK, and integrating them is challenging. Linked Data may view some formal knowledge as a set of linked assertions, but for integration these may need to be linked to the regular sentences expressing those assertions, from which sentences may be generated. While commonsense is an important asset for DL models, its logical representation, such as microtheories, has not been successfully employed. Instead, tuples or knowledge graphs comprising natural language nodes have shown some promise, but these face the problem of string matching, i.e., linguistic variations. More recent work on supplying commonsense in the form of adversarial examples, or in the form of unstructured paragraphs or sentences, has been gaining attention. But experience in this area suggests that construction is noisy/approximate when scaling up to significant amounts of diverse data, and that task- or application-specific KBs may be more effective when scale is involved. In the previously cited example (Financial Regulatory/Policy Compliance), an NL-syntax sentence may have one or more logic-syntax sentences associated with it that formally encode the understood assertions and allow some reasoning (unlike free text, which can represent CSK but does not directly support reasoning).
For explanations, these logic-syntax sentences may also assert their provenance, or even represent their text interpretation in some other expression ("this means that..."). Conversely, a logic-syntax sentence may have one or more NL-syntax sentences expressively associated with it, which can be output by a text generation process. There may also be a round-trip view: source sentences used in text interpretation produce a logical representation, from which other expressions of the knowledge may in turn be generated. Work at this level is still in its early stages.
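A minimal sketch of pairing an NL-syntax sentence with its logic-syntax counterparts and provenance follows; the field names and the example rule are invented for illustration (this is a plain dataclass, not the Rulelog encoding), and the provenance string reuses the placeholder example given earlier.

```python
# Toy representation of an NL sentence paired with its logical encodings and
# provenance, supporting round-tripping to generated text. Names are invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EncodedAssertion:
    nl_text: str                       # source NL-syntax sentence
    logic_forms: List[str]             # one or more logic-syntax sentences
    provenance: str                    # where the sentence came from
    generated_text: List[str] = field(default_factory=list)  # NL re-generations

a = EncodedAssertion(
    nl_text="A bank may not purchase a low-quality asset from an affiliate.",
    logic_forms=["prohibited(purchase(Bank, Asset, Party)) :- "
                 "lowQuality(Asset), affiliateOf(Party, Bank)."],
    provenance="document D73, URL U84, section 22, line 18 (placeholder)",
)
a.generated_text.append("Purchasing a low-quality asset from an affiliate is prohibited.")
print(a)
```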
4.8. Robustness and Understandability

How robust are commonsense reasoning and decision making? As noted, there are now many ML applications which are increasingly looked on as mature enough to use for some ordinary tasks. Visual recognition is one of these, but Shah et al. (2019) suggest that some such applications are not robust. Simple alternative NL syntactic formulations lead to different answers. For example, "What is in the basket?" and "What is contained in the basket?" (or "What can be seen inside the basket?") evoke different answers. Humans understand these as similar, commonsense meanings, but ML systems may have learned something different. Another problem is that DL models, by themselves, are black-box in nature. So while these approaches allow powerful predictions, their outputs cannot by themselves be directly explained. Additional functionality is needed, including a degree of commonsense reasoning. Such focused, good, fair explanations may use natural language understanding and even be part of a conversational, dialogue-based human-computer interaction (HCI), in which the system uses previous knowledge of the user's (audience's) knowledge and goals to discuss output explanations. Such "associate systems" may arrive at satisfactory answers because they include a capability to adaptively learn user knowledge and goals, and are accountable for doing so over time, as is commonly true for human associates.
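One concrete technique along these lines, mentioned earlier as "model the model", is to fit an interpretable surrogate to a black-box model's predictions. The following is a minimal sketch assuming scikit-learn and synthetic data; it is a generic surrogate-model example, not the specific tooling of any system discussed above.

```python
# Toy "model the model": train a black-box classifier, then fit a shallow
# decision tree to its predictions to obtain a human-readable approximation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X, y)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))   # learn the model's behavior, not the labels

# The printed tree is a (partial, approximate) explanation of the black box;
# "fidelity" measures how well the surrogate mimics it.
print(export_text(surrogate, feature_names=[f"f{i}" for i in range(5)]))
print("fidelity:", surrogate.score(X, black_box.predict(X)))
```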
4.9 Enhancing Ontology Engineering Practices

We will need to arrive at a focused understanding of CSK that can be incorporated into ontological engineering practices. For efforts like CSK base building, this should include guidance and best practices for the extraction of rules from extant, quality ontologies. If knowledge is extracted from text and online information, building CSK will require methods to clean, refine, and organize it, probably with the assistance of tools. Extracted information is often focused on taxonomic aspects, but things like ordinary causal relations and roles are important and challenging. In light of this, future work will need to refine a suite of tools and technologies to make the lifecycle of CSK bases easier and faster to build.
4.10 Fundamental Understanding

Despite progress, and as noted in Michael Gruninger's talk, many of the domains involved in commonsense reasoning are only partially understood and lightly axiomatized, without general consensus on which modules are needed, how they are formalized, and how they are related. This situation is similar to the problem expressed about the CYC MT approach. It is easy to agree with Davis and Marcus that:
- "We are far from a complete understanding of domains such as physical processes, knowledge and communication, plans and goals, and interpersonal interactions."
Conclusions

It seems clear that both CSK and explanation remain important topics in AI research and its surging branch of ML. Further, they are mutually supportive, although explanation may be the more active area of diverse work just now. We seem close to AI systems that will do common tasks such as driving a car or giving advice on everyday matters like eating. It seems clear that such everyday tasks need to exhibit robust commonsense knowledge and reasoning to be trusted. Thus, as intelligent agents become more autonomous, sophisticated, and prevalent, it becomes increasingly important that humans can interact with them effectively, to answer such questions as "why did my self-driving vehicle take an unfamiliar turn?" Current AI systems are good at recognizing objects, but cannot explain what they see in ways understandable to laymen. Nor can they read a textbook and understand the questions in the back of the book, which leads researchers to conclude they are devoid of common sense. We agree, as DARPA's Machine Common Sense (MCS) proposal put it, that the lack of common sense is "perhaps the most significant barrier" between the focus of AI applications today (such as those previously discussed) and the human-like systems we dream of. At least one of the areas where such an ability would play a role is in useful explanations. It may also be true, as NELL researchers argue, that we will never produce true NL understanding systems until we have systems that react to arbitrary sentences with "I knew that", or "I didn't know that and accept it (or disagree) because X". Some general recurring questions we suggest are worth considering include:
- How can we leverage the best of the two most common approaches to achieving commonsense?
formal representations of commonsense knowledge (e.g. encoded in an ontology's content as in Cyc or Pat Hayes’ Ontology of Liquids) vs. strategies for commonsense reasoning (e.g. default reasoning, prototypes, uncertainty quantification, etc.)
- How to best inject commonsense knowledge into machine learning approaches?
Some progress has been made on learning using taxonomic labels, but this just scratches the surface.
- How to bridge formal knowledge representations (formal concepts and relations as axiomatized in logic) and representations of language use (e.g., WordNet)?
Commonsense knowledge and reasoning could assist in addressing some challenges in DL as well as in explanation, and in turn work in those areas can look to CSK and reasoning for assistance. Many challenges remain, including those of adequately representative knowledge, how to acquire the proper knowledge, and the number of different but consistent and mutually supportive ontologies that may be needed even for simple tasks. How to use these to mitigate the brittleness of DL models in the face of various adversarial or confusing alternative inputs is a problem, as is the brittleness of explanations. Among the problems that DL systems face as they address increasingly complex situations is generalizing, as humans seem to do, to unseen situations when faced with limited training data/experience. Another problem is that DL systems are not aware of an overall context when processing subtle patterns such as social interactions. A recent commonsense-aware DL model makes more sensible predictions despite limited training data. Here, commonsense is the prior knowledge of state changes, e.g., it is unlikely that a ball gets destroyed in a basketball game scenario. Commonsense knowledge and reasoning can compensate for limited training data and make it easier to generate explanations, given that the commonsense is available in an easily consumable representation. The model injects commonsense at the decoding phase by re-scoring the search space so that probability mass is driven away from unlikely situations, resulting in much better performance.

References:
- Cui et al. (2018). Learning to Evaluate Image Captioning.
- Dalvi, Tandon, & Clark. ProPara task: data.allenai.org/propara
- Davis, Ernest, and Gary Marcus. "Commonsense reasoning and commonsense knowledge in artificial intelligence." Commun. ACM 58.9 (2015): 92-103.
- Deep neural networks are easily fooled: evolvingai.org/fooling
- Grosof, Benjamin, et al. "Automated decision support for financial regulatory/policy compliance, using textual Rulelog." Financial Times (2014).
- Kang et al. (2018). AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples.
- Li, Yitong, Timothy Baldwin, and Trevor Cohn. "What's in a Domain? Learning Domain-Robust Text Representations using Adversarial Training." arXiv preprint arXiv:1805.06088 (2018).
- Shah et al. (2019). Cycle-Consistency for Robust Visual Question Answering.
- Tandon et al. Commonsense tutorial at CIKM 2017.