Tuesday, June 11, 2013

Standards for common aspects

Over the last three years I have been engaged with different groups working on standards, both for data exchange, such as CDISC, and for vocabularies, such as MedDRA MSSO and NCI EVS. They are now starting to see the value of using "standards for standards".

Push Back
From Flickr bitpuddle

Standards for standards

So I "push back" to standards organisations, asking them to use semantic web standards and linked data principles to make their standards directly usable for humans and for machines.

A good example is CDISC and their growing interest in using semantic web standards (based on RDF, Resource Description Framework): CDISC2RDF. For some background see Clinical studies and the road to Linked Data. Today the FDA, CDISC, pharma companies, CROs and software vendors are working together on this in an FDA working group for Semantic Technology organised by PhUSE.

Standards for common aspects

Over the last year or so, I have also tried to keep up to date with groups developing RDF-based standards for common aspects, such as:
  • data descriptions (VoID)
  • data provenance and versioning (PROV and PAV)
  • concept based vocabularies and value sets (SKOS)
  • multi-dimensional statistical data (RDF Data Cube)
I try to ensure that we have a good view of the maturity and applicability of these standards so that we can use them in our internal “integration factory”. But most of all, I want to “push back” to vendors. In the same way that we started to add requirements on web interfaces for better end-user usability back in the late 90s, I foresee that we should now start to add requirements on web interfaces for better machine usability. So we need to understand how to incorporate these common aspects in our URSs, RFIs, RFPs, etc.

I would like software vendors to use RDF-based standards for common aspects, for example:
  • MediData's Rave and Perceptive's IMPACT to describe datasets using VoID.
  • Accelrys' Pipeline Pilot to use W3C PROV.
  • Microsoft's SharePoint to use term sets for tagging in SKOS.
  • SAS Institute's Drug Development to create analysis results using RDF Data Cube.
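
To give a feel for what "describing a dataset using VoID" could mean in practice, here is a minimal sketch in plain Python that writes a few VoID statements as N-Triples. The dataset IRI, title and triple count are made up for illustration, and the helper functions are my own; this is not how any of the products above work, only an assumption of what such an export might contain.

```python
# Real vocabulary namespaces (VoID, Dublin Core Terms, RDF, XSD).
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def literal(text, datatype=None):
    """Format a Python string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def void_description(dataset_iri, title, triple_count, vocabularies):
    """Return N-Triples lines describing one dataset with VoID terms."""
    statements = [
        (RDF_TYPE, f"<{VOID}Dataset>"),
        (DCTERMS + "title", literal(title)),
        (VOID + "triples", literal(str(triple_count), XSD_INTEGER)),
    ]
    # One void:vocabulary statement per vocabulary the dataset uses.
    statements += [(VOID + "vocabulary", f"<{v}>") for v in vocabularies]
    return [f"<{dataset_iri}> <{p}> {o} ." for p, o in statements]


# Hypothetical dataset IRI and values, for illustration only.
description = void_description(
    "http://example.org/dataset/study-123",
    "Study 123 clinical data",
    42000,
    ["http://www.w3.org/2004/02/skos/core#"],
)
print("\n".join(description))
```

A real export would more likely use an RDF library than string formatting; plain strings are used here only to keep the sketch free of dependencies.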

So this interview with Reza B'Far, Vice President of Development at Oracle, on the W3C blog made me very glad: Oracle on Data on the Web
Oracle to use W3C provenance standard to create a single audit time line across systems
"One of the hugest problems we faced was maintaining transaction audit trails in a heterogeneous environment in a standard and compatible way. Audit trails are described with literally millions of different formats in different organizations. This used to mean it was impossible to create a single audit time line. PROV solves this problem. We now provide (and consume) a PROV feed that unifies the audit trails generated by transactions across heterogeneous systems."
See also the Implementation report with 60+ examples of usage of the W3C Provenance specifications.
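
The idea of a single audit time line can be sketched roughly as follows. This is not Oracle's implementation, which the interview does not describe in detail; the two log formats, the system names and the function names are all invented for illustration. The only things taken from the standard are the PROV-O term names used as dictionary keys (prov:Activity, prov:startedAtTime, prov:wasAssociatedWith).

```python
from datetime import datetime, timezone

# Two made-up audit formats, standing in for heterogeneous systems.
SYSTEM_A_LOG = [
    {"ts": "2013-06-11T09:15:00+00:00", "user": "alice", "op": "create-order"},
    {"ts": "2013-06-11T09:45:00+00:00", "user": "bob", "op": "approve-order"},
]
SYSTEM_B_LOG = [
    # (action, actor, Unix epoch seconds)
    ("ship-order", "carol", 1370943000),
]


def from_system_a(entry):
    """Map a system A record onto PROV-style keys."""
    return {
        "type": "prov:Activity",
        "rdfs:label": entry["op"],
        "prov:wasAssociatedWith": entry["user"],
        "prov:startedAtTime": datetime.fromisoformat(entry["ts"]),
    }


def from_system_b(entry):
    """Map a system B record onto the same PROV-style keys."""
    action, actor, epoch = entry
    return {
        "type": "prov:Activity",
        "rdfs:label": action,
        "prov:wasAssociatedWith": actor,
        "prov:startedAtTime": datetime.fromtimestamp(epoch, tz=timezone.utc),
    }


def unified_timeline(*sources):
    """Merge normalised activities from all sources, ordered by start time."""
    activities = [activity for source in sources for activity in source]
    return sorted(activities, key=lambda a: a["prov:startedAtTime"])


timeline = unified_timeline(
    map(from_system_a, SYSTEM_A_LOG),
    map(from_system_b, SYSTEM_B_LOG),
)
for activity in timeline:
    print(
        activity["prov:startedAtTime"].isoformat(),
        activity["rdfs:label"],
        activity["prov:wasAssociatedWith"],
    )
```

Once every system maps its own format onto the same small set of terms, merging the trails is just a sort on the shared timestamp property, which is the point of agreeing on a common standard.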

For a nice intro to the W3C Provenance Specifications, see the tutorial by Paul Groth (@pgroth) at the Extended (European) Semantic Web conference.
