
OntologySummit2014_Hackathon - Project:

Optimized SPARQL performance management via native API

Project roster page: OntologySummit2014_Hackathon_ReferenceDataForSPARQLPeformanceBenchmarking (this page).

Team lead: VictorChernov (MSK, UTC+4) vchernov at nitrosbase.com

The event starts on 29 March 2014 at 14:00 MSK / 10:00 UTC / 03:00 PDT and will be held worldwide via mikogo.com (the session # will come later).


The goals of the project

The project studies the kinds of queries that reveal the advantages of one RDF database over another. This implies:

  • Selecting a SPARQL query subset from SP2Bench
  • Forming a dataset and loading it into all triplestores
  • Implementing measurement aids and testing them
  • Measuring times accurately and computing min, max, average, and median values
  • Reflecting on the results and on the advantages and disadvantages of each triplestore for each selected query

The following triplestores will be compared:

  • Virtuoso
  • Stardog
  • NitrosBase

These triplestores share the following important advantages:

  • Very high performance demonstrated on the SP2Bench benchmark
  • Linux and Windows versions
  • Native API for fast query processing

It is important to use a native API for fast query execution. All three tools provide one (a usage sketch follows the list):

  • Virtuoso: Jena, Sesame, and Virtuoso ODBC RDF extensions for SPASQL
  • Stardog: the core SNARL (Stardog Native API for the RDF Language) classes and interfaces
  • NitrosBase: C++ and .NET native API
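
As an illustration, here is a minimal sketch of running a SPARQL query against Virtuoso through its Jena provider, i.e. over the native JDBC path rather than the HTTP endpoint. The class names are those of OpenLink's Jena provider as we understand them; the connection URL, credentials, and query are placeholders that would need adjusting for a real installation.

    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;
    import virtuoso.jena.driver.VirtGraph;
    import virtuoso.jena.driver.VirtuosoQueryExecution;
    import virtuoso.jena.driver.VirtuosoQueryExecutionFactory;

    public class VirtuosoNativeQuery {
        public static void main(String[] args) {
            // Placeholder connection parameters: adjust host, port and credentials.
            VirtGraph graph = new VirtGraph("jdbc:virtuoso://localhost:1111", "dba", "dba");

            String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

            // The provider sends the query through Virtuoso's JDBC driver,
            // bypassing the public HTTP SPARQL endpoint.
            VirtuosoQueryExecution exec = VirtuosoQueryExecutionFactory.create(sparql, graph);
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                System.out.println(row);
            }
            exec.close();
            graph.close();
        }
    }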

We plan to write the additional code needed for accurate testing (a sketch follows the list):

  • Accurate time measurement;
  • Functions for computing min, max, average, and median times;
  • Functions for timing a scan through the whole query result;
  • Functions for timing retrieval of the first few records (for example, the first page of a web grid);
  • Etc.
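
A minimal sketch of such measurement helpers is below. All names are ours (hypothetical), not part of any store's API; the helpers only assume that a store-specific adapter can run a query and hand back an iterator over the result rows.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;
    import java.util.function.Supplier;

    // Hypothetical measurement helpers; store-specific query execution is
    // supplied from outside as a Supplier that runs the query and returns
    // an iterator over the result rows.
    public class QueryTimer {

        // Time a full scan through the whole query result, in milliseconds.
        public static double timeFullScan(Supplier<Iterator<?>> run) {
            long start = System.nanoTime();
            Iterator<?> it = run.get();
            while (it.hasNext()) {
                it.next();                       // consume every row
            }
            return (System.nanoTime() - start) / 1e6;
        }

        // Time retrieval of only the first k records
        // (for example, the first page of a web grid).
        public static double timeFirstK(Supplier<Iterator<?>> run, int k) {
            long start = System.nanoTime();
            Iterator<?> it = run.get();
            for (int i = 0; i < k && it.hasNext(); i++) {
                it.next();
            }
            return (System.nanoTime() - start) / 1e6;
        }

        // Repeat a measurement and return { min, max, average, median }.
        public static double[] stats(Supplier<Double> measurement, int repetitions) {
            List<Double> times = new ArrayList<Double>();
            for (int i = 0; i < repetitions; i++) {
                times.add(measurement.get());
            }
            Collections.sort(times);
            double sum = 0;
            for (double t : times) {
                sum += t;
            }
            int n = times.size();
            double median = (n % 2 == 1)
                    ? times.get(n / 2)
                    : (times.get(n / 2 - 1) + times.get(n / 2)) / 2;
            return new double[] { times.get(0), times.get(n - 1), sum / n, median };
        }
    }

For example, stats(() -> timeFullScan(adapter), 20) would yield the min, max, average, and median of twenty full scans, where adapter is a store-specific Supplier wrapping one of the native APIs above.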

The following steps are needed to load the test dataset:

  • Selecting a data subset from the SP2Bench benchmark
  • Measuring the data loading time

Note: Data are considered loaded as soon as the system is ready to answer the simplest search query. This is done to exclude background processes (e.g. indexing) from the measurement. A sketch of the measurement follows.
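
A hypothetical sketch of that criterion, assuming a store-specific adapter that starts the bulk load and can attempt the trivial query:

    // Hypothetical sketch: loading counts as finished as soon as the store
    // answers the simplest query, per the note above. The Store interface is
    // a placeholder abstraction over Virtuoso, Stardog, and NitrosBase.
    public class LoadTimer {

        interface Store {
            void startBulkLoad(String datasetPath);
            boolean trySimpleQuery();   // e.g. "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
        }

        public static double measureLoadSeconds(Store store, String datasetPath)
                throws InterruptedException {
            long start = System.nanoTime();
            store.startBulkLoad(datasetPath);
            // Poll until the simplest search query succeeds.
            while (!store.trySimpleQuery()) {
                Thread.sleep(100);
            }
            return (System.nanoTime() - start) / 1e9;
        }
    }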

We are going to explore query execution performance on the databases under consideration (Virtuoso, Stardog, NitrosBase).

The queries should be fairly simple and cover different techniques, for example (illustrative query shapes follow the list):

  • Search over a small range of values
  • Search over a large range of values
  • Sorting
  • Aggregation
  • Several different join queries
  • Retrieving part of the result
  • Retrieving the whole result
  • Etc.
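
The following constants are illustrative query shapes only, not the final query set. They assume the SP2Bench vocabulary (dc:creator, dcterms:issued) and plain integer years, which would need checking against the actual dataset.

    // Illustrative query shapes only (hypothetical, not the final query set).
    public class SampleQueries {

        static final String PREFIXES =
            "PREFIX dc: <http://purl.org/dc/elements/1.1/> " +
            "PREFIX dcterms: <http://purl.org/dc/terms/> ";

        // Search over a small range of values: a single year.
        static final String SMALL_RANGE = PREFIXES +
            "SELECT ?doc WHERE { ?doc dcterms:issued ?yr . FILTER(?yr = 1990) }";

        // Search over a large range of values: five decades.
        static final String BIG_RANGE = PREFIXES +
            "SELECT ?doc WHERE { ?doc dcterms:issued ?yr . FILTER(?yr >= 1950 && ?yr <= 2000) }";

        // Sorting.
        static final String SORTED = PREFIXES +
            "SELECT ?doc ?yr WHERE { ?doc dcterms:issued ?yr } ORDER BY ?yr";

        // Aggregation (SPARQL 1.1).
        static final String AGGREGATE = PREFIXES +
            "SELECT ?yr (COUNT(?doc) AS ?n) WHERE { ?doc dcterms:issued ?yr } GROUP BY ?yr";

        // A join of two patterns, retrieving only part of the result via LIMIT.
        static final String JOIN_FIRST_PAGE = PREFIXES +
            "SELECT ?doc ?author WHERE { ?doc dc:creator ?author . ?doc dcterms:issued ?yr } LIMIT 20";
    }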

Note: During testing, each database may allocate significant resources, which can affect the performance of the other databases. That is why each test should be started after a system reboot.