Monday, October 7, 2013

The future of CDISC CT:s

A poll posted by Lex Jansen (@lexjansen) in the LinkedIn group for CDISC (Clinical Data Interchange Standards Consortium) prompted me to write down some thoughts on the future of CDISC's so-called Controlled Terminologies (CT:s):

When you import CDISC Controlled Terminology from NCI EVS at http://evs.nci.nih.gov/ftp1/CDISC, which format do you use?
  (Excel, Text, ODM XML, or OWL/RDF)

My vote goes to the formats with the best potential for the future, that is, the formats that serialize RDF-modeled data, e.g. Turtle, JSON-LD, N-Triples, and RDF/XML (see the blog post: Understanding RDF serialisation formats).
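To make the point about serializations concrete, here is a minimal sketch in Python (standard library only) that writes one and the same RDF statement in two of these formats, N-Triples and JSON-LD. The subject URI, the predicate URI and the function names are hypothetical and chosen only for illustration; they are not taken from the published CDISC2RDF schema.

```python
import json

# Hypothetical identifiers, used only for illustration.
SUBJECT = "http://example.org/ct/EXAMPLE-TERM"
PREDICATE = "http://example.org/schema#submissionValue"
VALUE = "F"


def escape_literal(text):
    """Escape backslash, double quote, newline and carriage return,
    which must be escaped inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject, predicate, literal):
    """Serialize one statement with a plain string literal as an N-Triples line."""
    return f'<{subject}> <{predicate}> "{escape_literal(literal)}" .'


def to_jsonld(subject, predicate, literal):
    """Serialize the same statement as a JSON-LD node object using full IRIs."""
    return json.dumps({"@id": subject, predicate: {"@value": literal}}, indent=2)


NT_LINE = to_ntriples(SUBJECT, PREDICATE, VALUE)
JSONLD_DOC = to_jsonld(SUBJECT, PREDICATE, VALUE)

if __name__ == "__main__":
    print(NT_LINE)
    print(JSONLD_DOC)
```

The statement itself does not change between the two outputs, only its notation does. In practice an RDF library would handle the serialization; the hand-written version is here just to show how little separates the formats.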

Today's RDF version

The recently published OWL/RDF version of the CT:s (serialized in XML) uses the first version of the CDISC2RDF schema 1), which implements the model behind the existing export of a limited part of the content in NCI Thesaurus (NCIt).

It is modeled to support only today's use of the CT:s, as text strings that populate variables in CDISC-defined data sets (e.g. SDTM domains) with submission values. That is, it provides study-specific clarity, making it easy for humans to read the clinical data and metadata.

Next RDF version

Based on very useful discussions with the terminology expert Julie James (LinkedIn profile), working for HL7, IMI EHR4CR and the FDA/PhUSE Metadata definition project, these are my thoughts for the next RDF version:

To provide cross-study semantic interoperability, making it easy for machines to directly integrate and query clinical data and metadata across health care and clinical research, we need an enhanced model.

That is, a model that fully leverages the content in NCIt and addresses the issues people have experienced when attempting to implement the CT:s in BRIDG / ISO 21090, using the insights from the IMI EHR4CR project and from the development of the IHE DEX (Data Element Exchange) profile.

I think there is also an opportunity to leverage the work on binding value sets to data elements that is part of the HL7 FHIR (Fast Healthcare Interoperability Resources) development 2). Julie also pointed me to a new ISO standard: ISO/CD 17583 3). The next version should also apply both the OID (Object Identifier) standard and the URI (Uniform Resource Identifier) standard to identify each value set and each value.
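As a rough illustration of what dual identification could look like, the Python sketch below gives a value set and each of its values both an OID and a URI. Everything in it is an assumption made for the example: the OID root 2.999 is the arc reserved for examples, the http://example.org/ct/ namespace is hypothetical, and deriving each value's OID as a child arc of its value set is just one possible scheme, not something prescribed by CDISC, HL7 or ISO. The SEX value set with the values F and M is used only as a familiar example.

```python
from dataclasses import dataclass

OID_ROOT = "2.999"                   # example-only OID arc; a registered arc would replace it
URI_BASE = "http://example.org/ct/"  # hypothetical URI namespace


@dataclass(frozen=True)
class Identified:
    """A value set or a value that carries both kinds of identifier."""
    label: str
    oid: str
    uri: str

    @property
    def oid_urn(self):
        """The OID written in the 'urn:oid:' URN form (RFC 3061)."""
        return f"urn:oid:{self.oid}"


def make_value_set(label, arc, values):
    """Build a value set and its members, assigning each an OID and a URI.

    Assumption: each member gets a child arc (1, 2, ...) under the value set's OID.
    """
    vs_oid = f"{OID_ROOT}.{arc}"
    value_set = Identified(label, vs_oid, f"{URI_BASE}{label}")
    members = [
        Identified(v, f"{vs_oid}.{i}", f"{URI_BASE}{label}/{v}")
        for i, v in enumerate(values, start=1)
    ]
    return value_set, members


SEX, SEX_VALUES = make_value_set("SEX", 1, ["F", "M"])

if __name__ == "__main__":
    print(SEX.oid_urn, SEX.uri)
    for member in SEX_VALUES:
        print(member.label, member.oid_urn, member.uri)
```

With identifiers like these, two studies that both record "F" can be matched on the identifier of the value rather than on the text string, which is the kind of machine-level integration the enhanced model is meant to support.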


References:
1) CDISC2RDF poster (presented at DILS 2013, the Data Integration in the Life Sciences conference) and FDA/PhUSE Semantic Technology project
2) http://www.hl7.org/implement/standards/fhir/terminologies.htm
3) ISO/CD 17583: Health informatics -- Terminology constraints for coded data elements expressed in (ISO 21090) Harmonized Data Types used in healthcare information interchange.