Sunday, October 23, 2011

Query Federation and Linked Closed Data

This is a blog post with highlights from the 10th International Semantic Web Conference, taking place in Bonn, that I picked up while following the event from a distance.

>> Updated 2 November with a presentation by Peter Haase on FedBench, see below. Also added a link to the nice blog post by Ivan Herman (the leader of W3C's Semantic Web work): Some notes on ISWC2011…

>> Updated 13 November with a link to a paper on federated search in life science, see below.

Today and tomorrow, Sunday and Monday, 23-24 October, are the workshop days, with 16 workshops arranged before the main conference. Throughout the day I have been catching up, on and off, on the busy Twitter feeds on my iPhone while out walking in the nice weather on the West Coast of Sweden. Now in the evening I have picked out two things that I found especially interesting from an enterprise perspective on linked data and URIs.

Query Federation
Being able to run federated queries over data from different internal and external data sources is a key capability in an enterprise context. An interesting paper presented at the Consuming Linked Data Workshop describes how this can be done using the VoID vocabulary (Vocabulary of Interlinked Datasets): SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions by Olaf Görlitz and Steffen Staab.

The paper uses a scenario in which researchers in the life science domain have numerous databases at hand, containing detailed information about pathways, genes, proteins, drugs and so forth. It describes and evaluates a framework called SPLENDID, which includes an Index Manager, a Query Optimizer, and a Query Executor.
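To make the idea a bit more concrete, here is a minimal Python sketch of how dataset descriptions can help decide which endpoints to ask. It is not SPLENDID's actual implementation: the `VoidDescription` class, the `select_sources` function, and all endpoint and vocabulary URLs are hypothetical names I made up for illustration. I also assume that each description simply lists the predicates a dataset uses together with a triple count.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TriplePattern = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class VoidDescription:
    """Hypothetical, simplified stand-in for a VoID dataset description."""
    endpoint: str
    # predicate URI -> number of triples using that predicate (assumed statistic)
    predicates: Dict[str, int] = field(default_factory=dict)


def select_sources(
    patterns: List[TriplePattern],
    descriptions: List[VoidDescription],
) -> Dict[TriplePattern, List[str]]:
    """Map each triple pattern to the endpoints that may hold matching triples."""
    selection: Dict[TriplePattern, List[str]] = {}
    for pattern in patterns:
        predicate = pattern[1]
        if predicate.startswith("?"):
            # Unbound predicate: the descriptions cannot rule out any endpoint.
            sources = [d.endpoint for d in descriptions]
        else:
            # Bound predicate: keep only endpoints that list this predicate.
            sources = [d.endpoint for d in descriptions if predicate in d.predicates]
        selection[pattern] = sources
    return selection


# Hypothetical endpoints and vocabulary terms, for illustration only.
descriptions = [
    VoidDescription(
        "http://example.org/drugs/sparql",
        {"http://example.org/vocab/targets": 1200},
    ),
    VoidDescription(
        "http://example.org/proteins/sparql",
        {"http://example.org/vocab/encodedBy": 5400},
    ),
]

patterns = [
    ("?drug", "http://example.org/vocab/targets", "?protein"),
    ("?protein", "http://example.org/vocab/encodedBy", "?gene"),
    ("?gene", "?anyPredicate", "?value"),
]

selection = select_sources(patterns, descriptions)
for pattern, sources in selection.items():
    print(pattern, "->", sources)
```

In this sketch, the first two patterns are each routed to a single endpoint, while the pattern with an unbound predicate has to be sent to both. A real federation engine would go further and use the statistics to order joins, which this sketch does not attempt.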

A takeaway highlighted in the Twitter feed from the presentation of the paper is the value of publishing VoID data for Linked Data sets. The paper also includes references to an interesting product that I have spotted in earlier tweets: FedX, a framework for transparent access to Linked Data sources through a federation, using optimization techniques. See also a recent discussion thread, SPARQL Federated Query Clients, on W3C's Linked Open Data mailing list. I also find this 2009 paper, by key people in W3C's interest group for the Semantic Web in life sciences, highly relevant: A journey to Semantic Web query federation in the life sciences.

Later in the conference, a research paper was presented that addresses the topic of central repositories vs. federated querying and processing.

Linked Closed Data
In an enterprise context, the value of transparency and of open sharing of data is gaining more and more recognition. By applying the Linked Data principles, corporations can enable meaningful use of data. See my previous blog post on Corporate Transparency and Linked Data. At the same time, there are of course datasets for which access to and use of the data is subject to legal, business, data privacy or ethical restrictions that go beyond attribution and share-alike obligations.

A vision paper presented at the Consuming Linked Data Workshop outlines A research agenda for Linked Closed Data by Marcus Cobden, Jennifer Black, Nicholas Gibbins, Les Carr, and Nigel Shadbolt. The authors define Linked Closed Data as Semantic Web datasets which are published in accordance with Linked Data principles, but which include access and licence restrictions.

I was glad to see that Ivan Herman also highlights this in his blog post: "we can and we should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well."

Kudos to @juansequeda and @ivan_herman for your great tweets today.
(While writing this blog post I can see on Twitter that the folks at the conference in Bonn are now having a linked data gathering and getting ready to play "#semanticbeerpong" :) For me it's time for a cup of tea instead ...)