Monday, May 2, 2011

Linking Clinical Data Standards

This is a follow-up to an earlier blog post where I outlined the background, audience and intention of three presentations. Two of them have been published on Slideshare.
Here I focus on the second presentation, which I gave at the recent CDISC (Clinical Data Interchange Standards Consortium) conference in Brussels. One of the key people in the CDISC community, Dave Iberson-Hurst, lists the semantic web as one of three themes and kindly refers to my presentation in a recent blog post.

My presentation, and also a very nice presentation from Roche, raised interesting questions, both about what I proposed as pragmatic first steps for linking clinical data standards and about what I see as future opportunities. Below you will find the questions and my "answers", or rather thoughts. In an upcoming blog post I will discuss what all of this could mean for CDISC SHARE (the metadata repository).

In my presentation - the last one on the first day - I urged the CDISC community to consider using semantic web standards and linked data principles for clinical data standards. It was very nice to be able to refer back to two presentations from the earlier sessions.

Pragmatic steps for CDISC
Firstly, to the presentation by Rebecca Kush, President of CDISC, on the value of open and free standards. The key message of my presentation was:




Roche use Semantic Web for clinical data standards
And secondly, to the presentation from Roche on the development of a "Global Data Standard Repository" (GDSR) using semantic web standards and an ontology tool (TopBraid Composer). My first slides, introducing the idea of "Triples" (the RDF standard model) and "Global Identifiers" (URIs), were a recap for the audience, as Frederik Malfait (IMOS Consulting, presenting on behalf of Roche) had already introduced these very well.
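To make the "Triples" and "Global Identifiers" recap a little more concrete, here is a minimal sketch in Python (standard library only) of a few statements about a controlled-terminology codelist, stored as subject-predicate-object tuples and queried by pattern matching. The `http://example.org/cdisc-ct/` namespace, the term URIs and the `match` helper are hypothetical; reusing the W3C SKOS vocabulary for the predicates is an illustrative assumption, and a real triple store would also distinguish URIs from literal values, which this sketch does not.

```python
# A minimal in-memory triple store: each statement is a (subject, predicate, object) tuple.
# All term URIs below are hypothetical examples, not published CDISC or NCI identifiers.
from typing import Optional

Triple = tuple[str, str, str]

EX = "http://example.org/cdisc-ct/"            # hypothetical namespace for terms
SKOS = "http://www.w3.org/2004/02/skos/core#"  # W3C SKOS vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples: set[Triple] = {
    (EX + "Sex", RDF_TYPE, SKOS + "ConceptScheme"),
    (EX + "Sex.Male", SKOS + "inScheme", EX + "Sex"),
    (EX + "Sex.Male", SKOS + "notation", "M"),
    (EX + "Sex.Male", SKOS + "prefLabel", "Male"),
    (EX + "Sex.Female", SKOS + "inScheme", EX + "Sex"),
    (EX + "Sex.Female", SKOS + "notation", "F"),
    (EX + "Sex.Female", SKOS + "prefLabel", "Female"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which terms belong to the Sex scheme? Each answer is a global identifier, not a text string.
members = [t[0] for t in match(p=SKOS + "inScheme", o=EX + "Sex")]
print(members)
```

Because every term is identified by a URI, the submission value ("M") becomes just one property of the term rather than its identity.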

Questions and Answers
Even though it was the last presentation of the day (just before a very nice evening with TinTin at the Brussels Comic Strip Center), many people stayed around, and I got the opportunity to clear up a key question and also to outline two future opportunities:
Q: Do you mean we should publish the actual clinical data openly? 
A: No! Which clinical data should be made publicly available is a separate topic. My key message is that the free and open clinical data standards, as they are currently constructed, should be made available as linked open clinical data standards 1]. This means using semantic web standards (I propose the RDF/XML format as an alternative to Excel and ODM/XML). It also means applying the Linked Data principles (for example, assigning URIs as global identifiers instead of using text strings for the submission values).
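Publishing a codelist as RDF/XML with URIs, rather than as Excel rows keyed by text submission values, could look roughly like the sketch below. It assumes a hypothetical base namespace, mints each term URI from the codelist name and submission value (a simplification; a production scheme would need identifiers that stay stable even if the text changes), and uses SKOS properties as an illustrative choice, not a CDISC or NCI convention.

```python
# Sketch: publish a spreadsheet-style codelist as RDF/XML, giving each term a URI
# instead of identifying it only by its text submission value.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/cdisc-ct/"  # hypothetical namespace, not an official one

# Use the conventional rdf: and skos: prefixes in the serialized output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("skos", SKOS_NS)

# Rows as they might appear in an Excel export: (codelist, submission value, label).
rows = [
    ("Sex", "M", "Male"),
    ("Sex", "F", "Female"),
]


def codelist_to_rdfxml(rows):
    """Serialize codelist rows as an rdf:RDF document with one rdf:Description per term."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for codelist, value, label in rows:
        # Hypothetical URI pattern: base + codelist + "." + submission value.
        term_uri = f"{BASE}{codelist}.{value}"
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": term_uri})
        ET.SubElement(desc, f"{{{SKOS_NS}}}inScheme",
                      {f"{{{RDF_NS}}}resource": BASE + codelist})
        ET.SubElement(desc, f"{{{SKOS_NS}}}notation").text = value
        ET.SubElement(desc, f"{{{SKOS_NS}}}prefLabel").text = label
    return ET.tostring(root, encoding="unicode")


print(codelist_to_rdfxml(rows))
```

The submission value is still carried (as `skos:notation`), so existing systems can keep using it while linked data consumers follow the URIs.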
Q: Does this relate to ontologies for bioinformatics?
A: Yes. The insights from developing, for example, the Gene Ontology are highly applicable when representing and structuring the entities and relations in the clinical reality. In some extra slides to my presentation I propose exploratory work to construct the next generation of clinical data standards using modern ontologies 2] based on the so-called Open Biological and Biomedical Ontologies (OBO) Foundry.
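One core idea borrowed from OBO-style ontologies is that classes are linked by `is_a` relations, so a reasoner can infer that anything classified under a specific class also falls under every more general one. The sketch below computes that transitive closure over a tiny hand-written hierarchy; the class names are illustrative placeholders and are not taken from TMO, the CPR Ontology or any OBO Foundry ontology.

```python
# Sketch: subsumption reasoning over is_a relations, as used in OBO-style ontologies.
# Class names are illustrative placeholders, not terms from any published ontology.
from collections import deque

# Each class maps to its direct parents (a class may have several).
IS_A = {
    "myocardial_infarction": ["ischemic_heart_disease"],
    "ischemic_heart_disease": ["heart_disease"],
    "heart_disease": ["cardiovascular_disease"],
    "stroke": ["cerebrovascular_disease"],
    "cerebrovascular_disease": ["cardiovascular_disease"],
    "cardiovascular_disease": ["disease"],
}


def ancestors(cls: str) -> set[str]:
    """All classes that cls is (transitively) a subclass of, via breadth-first search."""
    seen: set[str] = set()
    queue = deque(IS_A.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def is_a(cls: str, other: str) -> bool:
    """True if cls is the same class as other or is subsumed by it."""
    return cls == other or other in ancestors(cls)


print(sorted(ancestors("myocardial_infarction")))
```

A query for "cardiovascular disease" can then also retrieve records coded as myocardial infarction or stroke without anyone listing those terms by hand.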
Q: Do you mean that this would take away the need for manual transformation of clinical data?
A: Yes and no.
Yes, because the next generation of clinical data standards outlined above (i.e. using semantic web standards, applying linked data principles and being based on modern ontologies) would improve the research utility of clinical datasets. It would provide, firstly, a highly normalized and flexible way to convey clinical data and, secondly, machine-processable clinical data that is ready for automatic transformation and direct querying, as well as for inferencing and reasoning.
No, because existing data would first need to be transformed as described above. And no for quite some time, as there are many things to explore and learn. A highly pragmatic, incremental and stepwise approach is required 3].
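As a rough illustration of "automatic transformation and direct querying", the sketch below converts a small demographics-style table into triples, with coded values pointing at terminology URIs, and then answers a question by filtering the triples directly. The column names mimic SDTM-style variables, but the study data, namespaces and predicates are invented for the example, and the `subjects_where` helper is a stand-in for a real query language such as SPARQL.

```python
# Sketch: automatic transformation of tabular clinical records into triples, then a direct query.
# Column names mimic a demographics-style table; the data, URIs and predicates are hypothetical.
import csv
import io

BASE = "http://example.org/study1/"   # hypothetical study namespace
CT = "http://example.org/cdisc-ct/"   # hypothetical terminology namespace

TABLE = """USUBJID,SEX,AGE
S001,M,54
S002,F,61
S003,F,47
"""


def rows_to_triples(text: str) -> set[tuple[str, str, object]]:
    """Turn each table row into (subject, predicate, object) triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "subject/" + row["USUBJID"]
        # Coded values point to terminology URIs instead of bare text strings.
        triples.add((subject, BASE + "sex", CT + "Sex." + row["SEX"]))
        # Numeric values are kept as typed literals (here a Python int).
        triples.add((subject, BASE + "age", int(row["AGE"])))
    return triples


def subjects_where(triples, predicate, test):
    """Direct query: sorted subjects whose value for predicate satisfies test."""
    return sorted(s for s, p, o in triples if p == predicate and test(o))


data = rows_to_triples(TABLE)
females = set(subjects_where(data, BASE + "sex", lambda o: o == CT + "Sex.F"))
over_50 = set(subjects_where(data, BASE + "age", lambda o: o > 50))
print(sorted(females & over_50))
```

The transformation step is the part that existing data would still need, which is why the answer above is also "no" for quite some time.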

1] See slides 31-36 of my presentation for more details on the pragmatic steps I propose for CDISC and NCI.
2] The two OBO Foundry based ontologies I am referring to are the Translational Medicine Ontology, TMO (a.k.a. the Pharma Ontology), and the Computer-Based Patient Record (CPR) Ontology. See also an excellent article on biomedical ontologies, "More Than Words", in the Clinical and Translational Science Network.
 

Kudos to Frederik Malfait and Jonathan Chainey (Roche), Dave Iberson-Hurst (@Assero_UK), Bron Kisler (@CDISC), Philippe Verplancke and Isabelle de Zegher for great discussions F2F in Brussels.