Ontolog Forum

Ontology Summit 2012: (Track-4) "Large-scale domain applications" Synthesis

Mission Statement

This track will help to ground the discussions in the other tracks and bring key challenges to light by describing current large-scale systems and systems of systems that either use, or could use, ontologies in their deployment. "Large-scale" can mean any of the following: very large data sets, very complex data sets, federated systems, highly distributed systems, or real-time, continuous data systems. Examples of large data sets include scientific observations and studies; complex data sets could be technical data packages for manufactured products or electronic health records; federated systems could include information sharing to combat terrorism; highly distributed systems include the smart electrical grid (aka Smart Grid); and real-time systems include network management systems. Of course, some big systems might exhibit all five aspects.

see also: OntologySummit2012_Applications_CommunityInput


In implemented systems, ontologies are...

  • Strong for:
    • Supporting change and aggregation
    • Enabling community aggregation and annotation
    • Automated data ingestion
    • Data validation
    • Ensuring consistency of terms across many data sets (Distributed systems)
    • Supporting reasoning
    • Self-describing systems
    • Systems with many complex constraints, rules, laws, with frequent changes (Dynamically changing systems)
    • Data mining / semantic signature extraction
    • Rapid system building
  • Weak for:
    • Being understandable by software engineers and customers
    • Query performance (compared to relational databases)
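
To make the "data validation" and "consistency of terms across many data sets" points concrete, here is a minimal sketch in Python. It is not taken from any system presented in this track: the vocabulary, the two data sets, and the `validate_records` helper are all hypothetical, and a plain dictionary stands in for what would normally be an ontology or schema.

```python
# Hypothetical shared vocabulary: each agreed term maps to the
# Python type its values are expected to have.
SHARED_VOCABULARY = {
    "sensor_id": str,
    "temperature_celsius": float,
    "observed_at": str,
}


def validate_records(records, vocabulary=SHARED_VOCABULARY):
    """Return a list of (record_index, problem) tuples; an empty list means valid."""
    problems = []
    for index, record in enumerate(records):
        for term, value in record.items():
            if term not in vocabulary:
                problems.append((index, f"unknown term: {term}"))
            elif not isinstance(value, vocabulary[term]):
                problems.append((index, f"wrong datatype for {term}"))
    return problems


# Two hypothetical data sets from different contributors.
dataset_a = [{"sensor_id": "s1", "temperature_celsius": 21.5}]
dataset_b = [{"sensor_id": "s2", "temp_c": 19.0}]  # uses a local, undeclared term

problems_a = validate_records(dataset_a)
problems_b = validate_records(dataset_b)
print(problems_a)
print(problems_b)
```

The point of the sketch is only that one explicit, shared set of terms lets every contributing data set be checked the same way at ingestion time; a real deployment would express the vocabulary in an ontology language rather than in code.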

Needs

  • Need better standards for common elements:
    • Datatypes
    • Ontology patterns (e.g. whole/part patterns)
    • Collect ontological primitives from observation data
  • Need repositories
    • Repositories of ontological patterns could be more useful than repositories of ontologies
  • Need industrial strength semantic services resident in the cloud
  • Need better visualization tools and approaches
  • Need better tools to help interpret legacy systems and transform them into semantic systems
  • Need to establish feedback mechanisms that carry end-user input from the point of use directly to ontology designers

Recommendations

  • Look for the 80-20 rule of semantic development
  • Use well-defined and narrow use cases to demonstrate benefits of semantic approaches
  • Having explicit vocabularies (classifiers) is a must in a distributed system
  • Community should be included in the development and evolution of vocabularies
  • It is critical to capture and evolve domain knowledge in a form that the community is comfortable with
  • Transition from implicit domain knowledge to explicit encoding requires community consensus - and an organization to manage the consensus
  • Some have recommended exposing users only to SKOS semantics, and using more complicated constructs on the back end only if necessary
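
As a rough illustration of what "SKOS semantics" can look like from a user's point of view, the sketch below stores a tiny concept scheme as plain (subject, predicate, object) tuples and walks the `skos:broader` links. The property names `skos:prefLabel` and `skos:broader` come from SKOS; the `ex:` concepts, the `pref_label` and `broader_chain` helpers, and the assumption that each concept has at most one broader concept are made up for this example.

```python
# Triples as (subject, predicate, object). The ex: identifiers are hypothetical.
TRIPLES = [
    ("ex:Transformer", "skos:prefLabel", "Transformer"),
    ("ex:Transformer", "skos:broader", "ex:GridEquipment"),
    ("ex:GridEquipment", "skos:prefLabel", "Grid equipment"),
    ("ex:GridEquipment", "skos:broader", "ex:Equipment"),
    ("ex:Equipment", "skos:prefLabel", "Equipment"),
]


def pref_label(concept):
    """Return the preferred label of a concept, or None if it has none."""
    for subject, predicate, obj in TRIPLES:
        if subject == concept and predicate == "skos:prefLabel":
            return obj
    return None


def broader_chain(concept):
    """Follow skos:broader upward and return labels from nearest to most general.

    Assumes at most one broader concept per concept (a simplification).
    """
    chain = []
    seen = {concept}  # guards against accidental cycles in the data
    current = concept
    while True:
        parents = [o for s, p, o in TRIPLES if s == current and p == "skos:broader"]
        if not parents or parents[0] in seen:
            break
        current = parents[0]
        seen.add(current)
        chain.append(pref_label(current))
    return chain


chain = broader_chain("ex:Transformer")
print(chain)
```

Labels and broader/narrower links are usually enough for browsing and tagging; anything requiring richer axioms would stay on the back end, as the recommendation suggests.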

Other Observations / Lessons learned

  • UML to OWL is a common requirement for legacy systems
    • Starting from scratch is rare.
  • Ontology patterns are very helpful, and encourage model reuse
  • Semantic techniques work best when not compromised by implementation tradeoffs
  • Semantic methods are faster to implement and easier to maintain
  • Semantic approaches are particularly suited to systems with many complex constraints, rules, laws, with frequent changes
  • Incremental implementation is possible through federation of datastores
  • Ontologies are not always applied to enable reasoners - sometimes just as a more rigorous data modeling approach
  • Engineers turned ontologists often don't have the necessary background/skills
  • Existing infrastructure supports traditional software development far better than large-scale ontology development
  • There are many ontologies of dubious quality
  • Service-oriented architectures allow separation of code and ontology updates
  • Reasoner and query engine performance is highly dependent upon the exact formulation of rules and queries
  • No single technology/tool currently provides the best solution across all large system use cases
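
The observation about query formulation can be illustrated with a toy example. The sketch below is a deliberately naive nested-loop matcher over an in-memory list of triples, written only for this illustration; the data, the `match` and `evaluate` helpers, and the comparison counter are all hypothetical. Production reasoners and query engines use indexes and optimizers, so the effect there may be smaller or may differ in kind.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must match exactly.
            return None
    return new_binding


def evaluate(patterns, triples):
    """Evaluate patterns in the given order; return (solutions, comparisons)."""
    bindings = [{}]
    comparisons = 0
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                comparisons += 1
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings, comparisons


# Hypothetical data: 100 sensors, only one of which has a known location.
triples = [(f"ex:sensor{i}", "rdf:type", "ex:Sensor") for i in range(100)]
triples.append(("ex:sensor7", "ex:locatedIn", "ex:Room42"))

type_pattern = ("?s", "rdf:type", "ex:Sensor")
location_pattern = ("?s", "ex:locatedIn", "ex:Room42")

# Same query, two formulations: broad pattern first versus selective pattern first.
broad_first, broad_cost = evaluate([type_pattern, location_pattern], triples)
selective_first, selective_cost = evaluate([location_pattern, type_pattern], triples)
print(broad_first == selective_first, broad_cost, selective_cost)
```

Both formulations return the same answer, but placing the selective pattern first leaves far fewer intermediate bindings to carry into the second pattern, so this naive evaluator performs far fewer comparisons.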

--

maintained by the Track-4 champions: Steve Ray & Trish Whetzel ... please do not edit