Session	Demos of information extraction via hybrid systems
Duration	1 hour
Date/Time	1 Nov 2023 16:00 GMT
	9:00am PDT/12:00pm EDT
	4:00pm GMT/5:00pm CET
Convener	Andrea Westerinen and Mike Bennett

Ontology Summit 2024 Fall Series Demos of information extraction via hybrid systems

Agenda

Andrea Westerinen, Creator of DNA (Deep Narrative Analysis)
- Title: Populating Knowledge Graphs: The Confluence of Ontology and Large Language Models
- Abstract: Ontology-based Knowledge Graphs (KGs) stand at the forefront of semantic data representation, providing structured views of the data in complex domains. Traditionally, populating these KGs from unstructured text involved convoluted natural language analyses and custom code, but the environment has changed with the use of Large Language Models (LLMs). This talk explores one use case: the population of a KG from news articles. The evolution of the DNA application from employing spaCy APIs to OpenAI is described, and the current (open-source) implementation is discussed. Implementation issues such as sourcing the data, LLM prompts, mapping the LLM responses onto the ontology, and populating the knowledge graph are reviewed.
- Slides
Prasad Yalamanchi, Lead Semantics CTO
- Title: Harvest Knowledge From Language - Harness the power of Large Language Models and Semantic Technology
- Abstract: Language (both text and voice) holds much of the knowledge accessible to humans. It is also the best store of collective human knowledge. Historically, accessing this knowledge was manual, and until recently it had progressed only to varying degrees of semi-automation. But with the advent of Language Models, and particularly Large Language Models in the last couple of years, fully automated access to the knowledge carried in language is now becoming a reality! TextDistil, the software product from Lead Semantics, applies LLMs and ontologies to extract computable knowledge in the form of RDF triples from text.
- Slides
Video Recording

Conference Call Information

Date: Wednesday, 1 November 2023
Start Time: 9:00am PDT / 12:00pm EDT / 5:00pm CET / 4:00pm GMT / 1600 UTC
- Note that Daylight Saving Time has ended in Europe but not in the US or Canada.
- ref: World Clock
Expected Call Duration: 1 hour
Video Conference URL: https://bit.ly/48lM0Ik
- Conference ID: 876 3045 3240
- Passcode: 464312

The unabbreviated URL is: https://us02web.zoom.us/j/87630453240?pwd=YVYvZHRpelVqSkM5QlJ4aGJrbmZzQT09

Participants

Prasad Yalamanchi
Ken Baclawski
Andrea Westerinen
Mike Bennett
Ravi Sharma
Michael DeBellis
Susanne Vejdemo
Todd Schneider
Janet Singer
Douglas Miles
John Sowa
Bart Gajderowicz
Gary Berg-Cross
Zefi Kavvadia
Dan (Telicent)
Helena (Telicent)
Mark Underwood
Sundos Al Subhi
Jeff
James Logan
Gian Piero Zarri
Mariusz Bronowicki
Ram Sriram
Jim Rhyne
Mark Ressler
Robin McEntire
Ibrahim Gh
Matt Turner
Asiyah Yu Lin
Mark Fox
Alex Shkotin
Victor Agroskin

The complete list of participants was not captured.

Discussion

Ravi Sharma: What is the typical accuracy, and does the accuracy of the triples vary by language?
- Ravi Sharma: That is, does it depend only on the language, or would domain concepts also affect the KG?

Michael DeBellis: Is the ontology COMPLETELY created from the Corpus, or do you start from a foundation ontology and extend it based on the Corpus?

Ravi Sharma: You implied that humans do preprocessing and semantic understanding before the KG is generated? How much effort does that take?

Ravi Sharma: Is it possible to reduce the duration by sacrificing some accuracy?

Susanne Vejdemo: I understand we have a KG constructed from the unstructured docs. And then your query is translated to triples? I am a bit uncertain where the LLM comes into this.
- Janet Singer: My question as well: how exactly does the LLM come in?
- Andrea Westerinen: I will address this in my presentation, but can’t answer for Prasad.
- Prasad Yalamanchi: The LLM comes in at multiple places in the TextDistil pipeline: once when producing the final summary string of the result items, and also in building the KG.
- Bart Gajderowicz: Prasad, what does the LLM do in building the KG?

Ravi Sharma: Can it show the visuals during progress such as the KG?

Douglas Miles: Can ChatGPT-3.5, in some instances, work well enough to be used over ChatGPT-4?
- Andrea Westerinen: It MAY, but I have found profound differences. Linguistic analysis is much better in 4.
- Douglas Miles: ChatGPT-3.5 can be so much faster in returning results. I've considered running both to see when 3.5 is sufficient. Admittedly, mostly it isn't, but "convert this to OWL" is often "acceptable".
- Andrea Westerinen: @Douglas Miles Not sure that I agree about acceptability.
- Douglas Miles: OK, true. I mean, 3.5 can't even begin to convert to CLIF or CycL, whereas it at least tries with RDF/OWL.

Ravi Sharma: If there are no human interventions, how much is the KG affected?
- Michael DeBellis: Ravi, that's my question as well. IMO it is usually better to have a person in the loop, because an ontology created completely from a corpus may not be well designed. That's why I asked about starting from a basic ontology and then extending it, rather than creating the entire ontology from scratch.

Douglas Miles: Prasad's presentation was awesome!

Ravi Sharma: Prasad - If you do two such exercises, is the result the same/repeatable?
- Prasad Yalamanchi: There is a language interpretation step that maps the query string to the ontology.
- Prasad Yalamanchi: So, if two queries (exercises, as you mentioned) result in the same interpretation, then the final answers will be the same.

Douglas Miles: Something that impresses me and is unique about Andrea's work (even a year or two ago) is that she actually supports full modality representations in RDF-ish languages. That is stuff I would normally only dare to represent in CLIF!

Ravi Sharma: Are there similarities to the rhetorical possibilities of metaphor, context, explanation, etc., that could improve your results?
- Andrea Westerinen: I am not sure what is being asked. I am exposing the use of rhetorical devices to help readers understand how the text might be affecting their interpretations of it.

Ravi Sharma: Andrea, does ML or AI enter into this exercise and the results you showed? If so, where?
Ravi Sharma: I mean, what ML and training sets were used in OpenAI?
- Andrea Westerinen: OpenAI's complete technology stack is not disclosed but their website says "We build our generative models using a technology called deep learning, which leverages large amounts of data to train an AI system to perform a task."

Douglas Miles: To help answer how LLMs can be useful in translation: https://chat.openai.com/share/039d72c3-8432-48d1-98b8-63e15614bbef

Mark Underwood: Excellent presentations and important work for the ontology community

Sundos Al Subhi: Thank you all!! Great information.

Janet Singer: Excellent presentations! Looking forward to seeing these ideas integrated in future session(s).

Douglas Miles: (there is no question that the KR Andrea is doing is rock solid!) Here is my question though: Are any of the RDF reasoners good enough to do the reasoning/query that Andrea expects?
- Andrea Westerinen: @Douglas Miles Yes, I use Stardog. It also allows the use of Voicebox, which encodes NL queries in SPARQL!

Douglas Miles: Thank you that was great!

Dan (Telicent): Thank you for your presentations 🙂

Zefi Kavvadia: Thank you!

Mariusz (Telicent): Thank you.

Resources

Previous Meetings

	Session
ConferenceCall 2023 10 25	A look across the industry, Part 2
ConferenceCall 2023 10 18	A look across the industry, Part 1
ConferenceCall 2023 10 11	Setting the stage

Next Meetings

	Session
ConferenceCall 2023 11 08	Broader thoughts
ConferenceCall 2023 11 15	Synthesis
ConferenceCall 2024 02 21	Overview

Ontolog Forum
