Sunday, May 6, 2012

Semantic models for CDISC based standard and metadata management

In mid-April we gave a presentation at the 2012 CDISC (Clinical Data Interchange Standards Consortium) Interchange Europe with the title: Semantic models for CDISC based standard and metadata management (see our slides and short paper). This time the conference was held in a sunny, but chilly, Stockholm at a very nice hotel (Elite Marina Tower). Last year Frederik Malfait, consulting at Roche, and I, working for AstraZeneca, gave two separate presentations at the 2011 conference in Brussels. See my blog post: Linking Clinical Data Standards.

Since then we have seen more interest in semantic web standards in the CDISC community, see for example the article in Applied Clinical Trials Online (@Clin_Trials): Digital Data, the Semantic Web, and Research, by Wayne Kubick, the new CTO of CDISC. This year Frederik and I gave a joint presentation with a key message to the CDISC organisation: "Put semantics into the semantics". That is, to start using semantic web standards and linked data principles for the whole suite of CDISC standards. See our list of proposals below.

In my introduction I described the current situation, where the question is now "Not when, but how" to best adopt CDISC standards. At the same time, the different CDISC standards are not linked and are published in different formats, and so-called metadata registries (MDR) are requested for robust life cycle management of standards.

Real world use 

In my brief introduction (see slides 5-11) to the core semantic web standard, the so-called RDF triple, I showed an example of how Google uses RDF based standards to improve search (see my previous blog post). I also showed how NCI uses RDF to publish the NCI Thesaurus, see RDF/OWL download of NCIt via LexEVS, and how RDF is used for an early version of the domain model for biomedical research (BRIDG), see RDF/OWL representation of BRIDG/ISO21090. In both these cases the RDF is published as XML, but RDF triples can also be published in other serialisation formats (e.g. JSON, Turtle, and N-Triples). I also showed the latest version of the Linked Open Data cloud, with even more linked datasets than the one Frederik and I had in our presentations last year. I then turned to the main part of our presentation, describing two real-world examples of how two sponsors are now starting to use semantic web standards and linked data principles.
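
To make the idea of an RDF triple and its serialisation a bit more concrete, here is a minimal sketch in Python that writes one triple as a line of N-Triples. It is only an illustration: the http://example.org/ namespace and the helper names are made up for this example and are not taken from the slides, and a real project would more likely use an RDF library than hand-written string formatting.

```python
from typing import NamedTuple

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


class Triple(NamedTuple):
    subject: str              # URI of the thing being described
    predicate: str            # URI of the property
    obj: str                  # a URI, or a plain string when is_literal is True
    is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if triple.is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Subject: a (hypothetical) URI for the variable AEOUT; object: its label.
label_triple = Triple(EX + "variable/AEOUT", RDFS_LABEL, "AEOUT", is_literal=True)
print(to_ntriples(label_triple))
```

The same triple could equally be written as RDF/XML, Turtle or JSON; the statement stays the same and only the serialisation changes.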

Linked Data cloud to grow across AstraZeneca R&D

At AstraZeneca we have a new program called Integrative Informatics (i2) establishing the components required to let a linked data cloud grow across R&D. A key component is the URI policy for how to make, for example, a Clinical Study linkable by giving it a URI, that is, a Uniform Resource Identifier. Such a URI is an identifier for a clinical study, e.g. the one with the study code D5890C00003, that should be persistent and not dependent on any system. In the same way we will give guidance on how to use URI:s to make other key entities such as Investigator and Lab linkable. Standard data elements from CDISC, and internal ones to be managed in a future MDR, should also have URI:s to make them linkable. For more information on how URI:s are being used by, for example, the UK and US governments, see my URI design page.
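
As a rough sketch of what such a URI policy could look like in code, the Python function below builds a URI from a base, an entity type and a local identifier such as a study code. The base http://example.org/id and the path pattern are assumptions made for this illustration only; they are not the actual AstraZeneca URI design.

```python
from urllib.parse import quote

# Hypothetical base for identifiers; a real policy would use a stable,
# system-independent domain chosen by the organisation.
BASE = "http://example.org/id"


def mint_uri(entity_type: str, local_id: str, base: str = BASE) -> str:
    """Build a URI of the form <base>/<entity type>/<local id>.

    Both path segments are percent-encoded so that characters such as
    spaces or slashes in a local identifier cannot change the path.
    """
    return "/".join([
        base.rstrip("/"),
        quote(entity_type.lower(), safe=""),
        quote(local_id, safe=""),
    ])


study_uri = mint_uri("study", "D5890C00003")
print(study_uri)
```

The point of keeping this in one small function is that the pattern is decided once and then reused for studies, investigators, labs and data elements alike, so no URI depends on the system that happens to store the data.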

A semantic web standard based MDR in Roche

Frederik described the schema, content and architecture of the Roche Biomedical MDR. He then went through a demo using an RDF representation of a CDISC standard example and of an internal Roche standard (you will find the screenshots from the demo at the end of the slide deck). He first showed how the standards could be viewed using a general tool (TopBraid Composer from TopQuadrant, but it could be any other RDF tool such as Protégé, a common open source tool). On slides 20-28 you can see how SDTM model v.1.2, SDTM IG v3.1.2, and SDTM CT:s are all linked together (for example Observation Class: Event - Domain: AE - Variable: AEOUT - Submission value: NOT RECOVERED/NOT RESOLVED). He then showed the same RDF representation via the application Roche Global Standard Data Browser (slides 29-37). Frederik also showed how the linked data standards can be exported in SAS and Excel formats (slides 42-50). Finally, he showed an example from a Roche standard questionnaire.
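
The linked chain from observation class down to a submission value can be sketched with a few triples and a small path-following function. This is only a toy illustration in Python: the ex: names and the three predicates are invented for the example and are not the schema Frederik developed for the Roche MDR.

```python
from collections import defaultdict

# Toy triples mirroring the chain on the slides. The "ex:" names and the
# predicates are hypothetical, not the actual Roche MDR schema.
TRIPLES = [
    ("ex:Event", "ex:hasDomain", "ex:AE"),
    ("ex:AE", "ex:hasVariable", "ex:AEOUT"),
    ("ex:AEOUT", "ex:hasSubmissionValue", "NOT RECOVERED/NOT RESOLVED"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookups."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def follow(index, start, path):
    """Follow a sequence of predicates from a start node and return
    every node reached at the end of the path."""
    frontier = [start]
    for predicate in path:
        frontier = [
            obj
            for node in frontier
            for obj in index.get((node, predicate), [])
        ]
    return frontier


index = build_index(TRIPLES)
values = follow(
    index,
    "ex:Event",
    ["ex:hasDomain", "ex:hasVariable", "ex:hasSubmissionValue"],
)
print(values)
```

Because the model, the implementation guide and the controlled terminology sit in one graph, a question like "which submission values can occur under the Event class?" becomes a simple walk along the links instead of a lookup across several documents.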

Proposals to CDISC

In the slides you can see that Frederik had to transform CDISC standards into RDF using a schema he developed for Roche and give them URI:s in a Roche namespace (e.g. for one of the data elements). This is not an ideal approach; instead we would like CDISC to provide these. Hence the drive from our leadership in Roche and AstraZeneca for Frederik and myself to push back to CDISC.

Below is a draft list of proposals to CDISC:
  • Decide on a URI design for CDISC standards (e.g.
  • Review the schema Frederik has proposed for the core MDR in CDISC SHARE. 
  • Publish the new SDTM v1.3 and SDTM IG v.3.1.3 as RDF in XML, JSON, Turtle, and N-Triples formats using the reviewed schema and URI design. (As options to current publication formats, i.e. PDF, html, csv, xml/odm.)
  • Work together with NCI on enhancing the RDF/OWL version of NCI Thesaurus. Also review the option to use the RDF/SKOS standard and apply linked data principles. Publish coming versions of CDISC CT:s as RDF in XML, JSON, Turtle, and N-Triples. 
  • Work together with NCI on enhancing the RDF/OWL representation of BRIDG/ISO21090 model and apply linked data principles to make all BRIDG classes, properties and ISO21090 data types linkable.
  • Extend the MDR schema for CDISC SHARE for linkage to relevant BRIDG classes and properties and to ISO21090 data types.
  • Start exploring semantic web standards and linked data principles also for clinical data, including making individual clinical data points linkable using URI:s and annotating them using existing and emerging clinical standard terminologies and ontologies.
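
To give a feel for the proposal about RDF/SKOS and publishing controlled terminology as Turtle, here is a minimal Python sketch that writes one term as a SKOS concept. The skos:Concept, skos:inScheme and skos:prefLabel terms come from the SKOS vocabulary, but the http://example.org/ URIs and the function itself are assumptions for illustration; this is not a CDISC or NCI publication format.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_turtle(concept_uri: str, scheme_uri: str, pref_label: str) -> str:
    """Return a small Turtle document describing one SKOS concept."""
    # Escape backslashes and double quotes inside the label.
    label = pref_label.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{concept_uri}> a skos:Concept ;",
        f"    skos:inScheme <{scheme_uri}> ;",
        f'    skos:prefLabel "{label}"@en .',
    ])


# Hypothetical URIs for a codelist and one of its terms.
turtle = concept_to_turtle(
    "http://example.org/ct/term/not-recovered-not-resolved",
    "http://example.org/ct/codelist/outcome",
    "NOT RECOVERED/NOT RESOLVED",
)
print(turtle)
```

With stable URIs in a CDISC namespace, sponsors would not each have to mint their own identifiers for the same terms, which is the main reason for asking CDISC to decide on the URI design first.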


Frederik Malfait said...

Couldn't agree more with the proposal. This is the right time to do it, given the need to harmonize the different CDISC standards, the upcoming volume of standards related to therapeutic areas, and the need to put lifecycle management and proper semantics into the standards. And then there is metadata for lab measurements, biomarkers, questionnaires, conformance checks, and much more.

The only way to manage this properly is to model it properly. An excellent candidate is to create a model based on semantic standards (RDF, OWL, SKOS), metadata management standards (ISO 11179), and an established domain analysis model (BRIDG).

If we want to be serious about modeling the standards, then it is time to leave XML, Excel, PDF, and Mindmap pictures behind. These are fine as communication and transport formats, but not to capture the intrinsic qualities of a model. We really need to get professional about this.

Gokce Banu Laleci said...

In an FP7 project named SALUS, we are conducting very similar research. Initial results can be followed from:

As presented in our SIMI 2012 paper, the basis of our Semantic Interoperability approach will be an ISO 11179 based MDR, which will help us to build the SALUS Common Ontology.

DaveG said...

The only way to manage this properly is to model it properly.

This is so true. Let's give up the Procrustean activity of enforcing RDB models on sets of tables that are different.
