Ontolog Forum
Session | Demos of information extraction via hybrid systems |
---|---|
Duration | 1 hour |
Date/Time | 1 Nov 2023 16:00 GMT (9:00am PDT / 12:00pm EDT / 4:00pm GMT / 5:00pm CET) |
Convener | Andrea Westerinen and Mike Bennett |
Ontology Summit 2024 Fall Series: Demos of information extraction via hybrid systems
Agenda
- Andrea Westerinen, Creator of DNA, Deep Narrative Analysis
- Title: Populating Knowledge Graphs: The Confluence of Ontology and Large Language Models
- Abstract: Ontology-based Knowledge Graphs (KGs) stand at the forefront of semantic data representation, providing structured views of the data in complex domains. Traditionally, populating these KGs from unstructured text involved convoluted natural language analyses and custom code, but the environment has changed with the use of Large Language Models (LLMs). This talk explores one use case - the population of a KG from news articles. The evolution of the DNA application from employing spaCy APIs to OpenAI is described, and the current (open-source) implementation discussed. Implementation issues such as sourcing the data, LLM prompts, mapping the LLM responses onto the ontology, and populating the knowledge graph are overviewed.
- Slides
- Prasad Yalamanchi, Lead Semantics CTO
- Title: Harvest Knowledge From Language - Harness the power of Large Language Models and Semantic Technology
- Abstract: Language (both text and voice) holds much of the knowledge accessible to humans and is the best store of collective human knowledge. Historically, accessing this knowledge was manual, and until recently it had progressed only to varying degrees of semi-automation. With the advent of Language Models, and particularly Large Language Models in the last couple of years, fully automated access to the knowledge carried in language is becoming a reality. TextDistil, the software product from Lead Semantics, applies LLMs and ontologies to extract computable knowledge from text in the form of RDF triples.
- Slides
- Video Recording
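The steps named in the first abstract (prompting an LLM, mapping its response onto the ontology, and populating the knowledge graph) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DNA or TextDistil implementation: the JSON response shape, the `http://example.org/onto#` namespace, the class and property names, and the `CLASS_MAP`/`PROPERTY_MAP` tables are all hypothetical, and the LLM call itself is replaced by a canned string.

```python
import json

# Hypothetical namespace and ontology terms; not taken from DNA or TextDistil.
EX = "http://example.org/onto#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed mappings from labels an LLM might return to ontology terms.
CLASS_MAP = {"person": EX + "Person", "organization": EX + "Organization"}
PROPERTY_MAP = {"works_for": EX + "worksFor"}


def iri_for(name):
    """Mint an IRI from an entity name (a naive slug, for illustration only)."""
    return EX + "".join(ch if ch.isalnum() else "_" for ch in name.strip())


def response_to_triples(llm_response):
    """Map a JSON LLM response onto ontology terms, skipping unknown labels."""
    data = json.loads(llm_response)
    triples = []
    for entity in data.get("entities", []):
        cls = CLASS_MAP.get(entity["type"].lower())
        if cls is not None:  # labels outside the ontology are dropped
            triples.append((iri_for(entity["name"]), RDF_TYPE, cls))
    for rel in data.get("relations", []):
        prop = PROPERTY_MAP.get(rel["predicate"].lower())
        if prop is not None:
            triples.append((iri_for(rel["subject"]), prop, iri_for(rel["object"])))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Canned stand-in for what an LLM might return when asked for entities and relations.
CANNED_RESPONSE = json.dumps({
    "entities": [
        {"name": "Jane Doe", "type": "Person"},
        {"name": "Acme Corp", "type": "Organization"},
        {"name": "Paris", "type": "City"},  # no matching class in CLASS_MAP
    ],
    "relations": [
        {"subject": "Jane Doe", "predicate": "works_for", "object": "Acme Corp"},
        {"subject": "Jane Doe", "predicate": "visited", "object": "Paris"},
    ],
})

print(to_ntriples(response_to_triples(CANNED_RESPONSE)))
```

Dropping labels that have no counterpart in the ontology is only one possible design choice; a real pipeline might instead log them for human review or use them to propose ontology extensions.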
Conference Call Information
- Date: Wednesday, 1 November 2023
- Start Time: 9:00am PDT / 12:00pm EDT / 5:00pm CET / 4:00pm GMT / 1600 UTC
- Note that Daylight Saving Time has ended in Europe but not in the US or Canada.
- ref: World Clock
- Expected Call Duration: 1 hour
- Video Conference URL: https://bit.ly/48lM0Ik
- Conference ID: 876 3045 3240
- Passcode: 464312
The unabbreviated URL is: https://us02web.zoom.us/j/87630453240?pwd=YVYvZHRpelVqSkM5QlJ4aGJrbmZzQT09
Participants
- Prasad Yalamanchi
- Ken Baclawski
- Andrea Westerinen
- Mike Bennett
- Ravi Sharma
- Michael DeBellis
- Susanne Vejdemo
- Todd Schneider
- Janet Singer
- Douglas Miles
- John Sowa
- Bart Gajderowicz
- Gary Berg-Cross
- Zefi Kavvadia
- Dan (Telicent)
- Helena (Telicent)
- Mark Underwood
- Sundos Al Subhi
- Jeff
- James Logan
- Gian Piero Zarri
- Mariusz Bronowicki
- Ram Sriram
- Jim Rhyne
- Mark Ressler
- Robin McEntire
- Ibrahim Gh
- Matt Turner
- Asiyah Yu Lin
- Mark Fox
- Alex Shkotin
- Victor Agroskin
Complete list of participants was not captured
Discussion
- Ravi Sharma: What is the normal accuracy and does the accuracy of triples vary by language?
- Ravi Sharma: Namely, would it depend on the language only, or would domain concepts also affect the KG?
- Michael DeBellis: Is the ontology COMPLETELY created from the Corpus, or do you start from a foundation ontology and extend it based on the Corpus?
- Ravi Sharma: You implied preprocessing and semantic understanding by humans before the KG is generated. How much effort is it?
- Ravi Sharma: Is there a possibility to reduce the duration by compromising the accuracy somewhat?
- Susanne Vejdemo: I understand we have a KG constructed from the unstructured docs. And then there’s translation of your query to triples? I am a bit uncertain where the LLM comes into this?
- Janet Singer: My question as well — how exactly does the LLM come in?
- Andrea Westerinen: I will address this in my presentation, but can’t answer for Prasad.
- Prasad Yalamanchi: The LLM comes in at multiple places in the TextDistil pipeline: once in producing the final summary string of the result items, and also in building the KG.
- Bart Gajderowicz: Prasad, what does the LLM do in building the KG?
- Ravi Sharma: Can it show the visuals during progress such as the KG?
- Douglas Miles: ChatGPT-3.5 in some instances can work well enough to be used over ChatGPT-4?
- Andrea Westerinen: It MAY, but I have found profound differences. Linguistic analysis is much better in 4
- Douglas Miles: ChatGPT-3.5 can be so much faster at returning results. I've considered running both to see when 3.5 was sufficient. Admittedly, mostly it isn't, but "convert this to owl" often is "acceptable".
- Andrea Westerinen: @Douglas Miles Not sure that I agree about acceptability.
- Douglas Miles: OK, true. I mean 3.5 can't even begin to convert to CLIF or CycL, whereas it at least tried with RDF/OWL.
- Ravi Sharma: If there are no human interventions, how much is the KG affected?
- Michael DeBellis: Ravi, that's my question as well. IMO it is usually better to have a person in the loop, because an ontology created completely from a corpus may not be well designed. That's why I asked the question about starting from a basic ontology and then extending it, rather than creating the entire ontology from scratch.
- Douglas Miles: Prasad's presentation was very awesome!
- Ravi Sharma: Prasad - If you do two such exercises, is the result the same/repeatable?
- Prasad Yalamanchi: There is a language interpretation step to map the query string to the ontology.
- Prasad Yalamanchi: So, if two queries (exercises as you mentioned) result in the same interpretation, then the final answers will be the same.
- Douglas Miles: Something that impresses me and is unique about Andrea's work (even a year or two ago): she actually supports full modality representations in RDF-ish languages. Stuff that normally I would only dare to use CLIF to represent!
- Ravi Sharma: Are there similarities to the rhetoric possibilities of metaphor, context, explanation, etc to improve your results?
- Andrea Westerinen: I am not sure what is being asked. I am exposing the use of rhetorical devices to help readers understand how the text might be affecting their interpretations of it.
- Ravi Sharma: Andrea, does ML or AI enter this exercise and the results you showed? If so, where?
- Ravi Sharma: I mean, what ML and learning sets were used in OpenAI?
- Andrea Westerinen: OpenAI's complete technology stack is not disclosed but their website says "We build our generative models using a technology called deep learning, which leverages large amounts of data to train an AI system to perform a task."
- Douglas Miles: To help answer how LLMs can be useful in translation: https://chat.openai.com/share/039d72c3-8432-48d1-98b8-63e15614bbef
- Mark Underwood: Excellent presentations and important work for the ontology community
- Sundos Al Subhi: Thank you all!! Great information.
- Janet Singer: Excellent presentations — Looking forward to seeing these ideas integrated in the future session(s)
- Douglas Miles: (there is no question that the KR Andrea is doing is rock solid!) Here is my question though: Are any of the RDF reasoners good enough to do the reasoning/query that Andrea expects?
- Andrea Westerinen: @Douglas Miles Yes, I use Stardog. It also allows use of Voicebox, which encodes NL queries in SPARQL!
- Douglas Miles: Thank you that was great!
- Dan (Telicent): Thank you for your presentations 🙂
- Zefi Kavvadia: Thank you!
- Mariusz (Telicent): Thank you.
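The repeatability point raised in the discussion (a query string is first interpreted against the ontology, and two queries with the same interpretation yield the same answers) can be illustrated with a small sketch. Everything here is assumed for illustration: the `LEXICON` keyword table stands in for the language-interpretation step, and the `ex:` terms and query template are hypothetical rather than TextDistil's or Voicebox's actual method.

```python
# Hypothetical lexicon mapping surface words to ontology classes. It stands in
# for the language-interpretation step, which in practice may involve an LLM.
LEXICON = {
    "company": "ex:Organization",
    "companies": "ex:Organization",
    "firm": "ex:Organization",
    "firms": "ex:Organization",
    "person": "ex:Person",
    "people": "ex:Person",
}


def interpret(query):
    """Reduce a natural-language query to a canonical tuple of ontology terms."""
    words = (w.strip("?.,!").lower() for w in query.split())
    # Sorting the de-duplicated terms makes the interpretation order-independent.
    return tuple(sorted({LEXICON[w] for w in words if w in LEXICON}))


def to_sparql(interpretation):
    """Build a SPARQL query deterministically from an interpretation."""
    patterns = " ".join(
        f"?x{i} a {cls} ." for i, cls in enumerate(interpretation)
    )
    return (
        "PREFIX ex: <http://example.org/onto#> "
        f"SELECT * WHERE {{ {patterns} }}"
    )


q1 = "Which companies are mentioned?"
q2 = "List the firms."
q3 = "Which people are mentioned?"

# Different wording, same interpretation, hence the same query text.
print(to_sparql(interpret(q1)) == to_sparql(interpret(q2)))
# A different interpretation gives a different query.
print(to_sparql(interpret(q1)) == to_sparql(interpret(q3)))
```

Because the query text is a pure function of the interpretation, any variation in answers would have to come from the interpretation step or from changes in the underlying graph.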
Resources
Previous Meetings
Session | |
---|---|
ConferenceCall 2023 10 25 | A look across the industry, Part 2 |
ConferenceCall 2023 10 18 | A look across the industry, Part 1 |
ConferenceCall 2023 10 11 | Setting the stage |
Next Meetings
Session | |
---|---|
ConferenceCall 2023 11 08 | Broader thoughts |
ConferenceCall 2023 11 15 | Synthesis |
ConferenceCall 2024 02 21 | Overview |