Over the coming two weeks I'll be working on presentations for three events I have been given the opportunity to participate in. I will use this blog post to shape my thinking, and follow up with a new blog post while developing the slides and manuscripts.
- Linked Data in Pharma
A brief presentation of a short paper that has been accepted for the first international workshop on linked web data management in Uppsala, 25 March. The title of the paper is: Linked Data, an opportunity to mitigate complexity in pharmaceutical research and development (link to be added). I have written it together with my colleague Bosse Andersson.
- Semantics for Clinical Data
- Linked Clinical Data
I found it hard to start working on this with all the terrible news of what is happening in Japan right now. Kudos to Jim Hendler and Ivan Herman for their tweets today on the power of linked open data, pointing to an interactive map built on open earthquake data.
Background, Audiences and Intentions
Some brief notes on the background to my participation in the three events, on what I know about the audiences, and on my intentions for what I will talk about.
1. Linked Data in Pharma
The first one is an event I learned about on the Twitter feed for #linkeddata. It's a workshop on linked data management arranged in conjunction with a conference on database technology. We saw this as an opportunity to go to a workshop here in Sweden on this interesting topic. We decided to rewrite an article we wrote last year for an internal publication, describing some insights from working in the W3C interest group for Semantic Web in Health Care and Life Sciences (HCLS) and in the Large Knowledge Collider (LarKC) EU project.
The article we started from was intended for colleagues in a pharma company with no knowledge of the standards and principles behind the huge cloud of linked open data, whereas the workshop participants will be highly knowledgeable researchers and practitioners in linked data management. My hope is that during 2011 we will have more internal experience to report in an extended paper, as the linked data idea is now also getting a lot of interest internally.
2. Semantics for Clinical Data
The second event is the result of interactions we have had with Bernard de Bono, who leads the Drug Disease Modeling Resources (DDMoRe) project, one of the projects in the Innovative Medicines Initiative (IMI). I met Bernard at an EBI industry workshop on ontology engineering last year, and we talked about existing metadata standards for clinical data and the opportunities in ontology-based annotation of clinical data.
The list of attendees includes people from many of the European pharma companies and also from research centers such as EBI and INSERM. I assume many of them work in the preclinical/drug discovery phase and have a bioinformatics focus, so together with the people from CDISC I hope to be able to add a clinical perspective.
My contribution will be some reflections on different approaches to providing semantics along with clinical data. First, how it has been done when much of the semantics, that is, the knowledge of what clinical data represents, has been implicit and carried by people and documents. Second, how semantics is now made explicit for humans, both as standardized data exchange containers, e.g. the CDISC SDTM domain for lab test data, and as text strings of standardized codes and labels, so-called controlled terminologies, e.g. the list of lab test procedure codes, to simplify the programming needed to transform, integrate and analyze data. Linking to Bernard's presentation on the RICORDO [2] toolkit for semantic integration of biomedical resources, I will outline how clinical data can be annotated with ontology-based standards, making the semantics explicit in formal, machine-processable formats. I will also briefly talk about how clinical metadata registries could be used to support ontology-based annotation.
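To make the step from controlled terminology to ontology-based annotation a bit more concrete, here is a minimal sketch in Python. It takes one lab test record keyed by SDTM LB domain variable names (STUDYID, USUBJID, LBSEQ, LBTESTCD, LBORRES, LBORRESU) and emits RDF as N-Triples, where the test code string is additionally linked to an ontology class URI. Everything else is an assumption for illustration only: all URIs use the hypothetical example.org namespace, the predicates (subject, testCode, result, unit, measures) are invented, and the code-to-class mapping is a hand-written placeholder rather than a real curated resource such as RICORDO would provide.

```python
# Minimal sketch: one SDTM LB-style record -> RDF N-Triples.
# All namespaces, predicates and ontology classes are HYPOTHETICAL placeholders.

DATA = "http://example.org/study/"      # hypothetical namespace for study data
VOCAB = "http://example.org/vocab/"     # hypothetical namespace for predicates
ONTO = "http://example.org/ontology/"   # hypothetical ontology namespace

# Hypothetical mapping from controlled-terminology test codes (LBTESTCD values)
# to ontology class URIs; in practice this would come from a curated resource.
TESTCD_TO_CLASS = {
    "GLUC": ONTO + "Glucose",
    "HGB": ONTO + "Hemoglobin",
}


def uri(u: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{u}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def lb_record_to_ntriples(rec: dict) -> list:
    """Return N-Triples lines for one lab test record."""
    subject_uri = f"{DATA}{rec['STUDYID']}/{rec['USUBJID']}"
    finding = uri(f"{subject_uri}/LB/{rec['LBSEQ']}")
    triples = [
        (finding, uri(VOCAB + "subject"), uri(subject_uri)),
        (finding, uri(VOCAB + "testCode"), literal(rec["LBTESTCD"])),
        (finding, uri(VOCAB + "result"), literal(rec["LBORRES"])),
        (finding, uri(VOCAB + "unit"), literal(rec["LBORRESU"])),
    ]
    cls = TESTCD_TO_CLASS.get(rec["LBTESTCD"])
    if cls is not None:
        # The code string is now also linked to a formal, machine-processable concept.
        triples.append((finding, uri(VOCAB + "measures"), uri(cls)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Illustrative record; the values are made up.
    record = {"STUDYID": "S001", "USUBJID": "S001-0001", "LBSEQ": "1",
              "LBTESTCD": "GLUC", "LBORRES": "5.4", "LBORRESU": "mmol/L"}
    for line in lb_record_to_ntriples(record):
        print(line)
```

The design choice is deliberately conservative: the original code string is kept as a literal, so existing SDTM-based programming still works, and the ontology link is added alongside it rather than replacing it.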
3. Linked Clinical Data
The abstract I proposed for the third event was triggered by the frustration I sensed from the FDA representatives at the CDISC Interchange US in 2009. It is also a follow-up to the brief discussions I had with some of the CDISC folks on linked data principles and semantic web standards. Here is how Jay Levin expressed it in the FDA panel in November 2009:
We want to separate the analysis view from how clinical data is exchanged. To have a very normalized, flexible way to convey the data as it actually was collected, as it occurred. And then from that create any number of disease area specific views and analysis specific views. You have tremendous options. So, instead of being locked into this difficult dance that I see happening with SDTM then you always try to decide how useful it's going to be for correct analysis vs. how consistent it could be if you free up the potential ways data can be represented for disease specific areas. [1]
My key message will be a set of proposed pragmatic steps for how the CDISC standards can be published according to the 5-star rating scheme for linked open data described in my second blog post.
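As a reminder of how the 5-star scheme works, the stars are cumulative: (1) available on the web under an open license, (2) as machine-readable structured data, (3) in a non-proprietary format, (4) using URIs and W3C standards such as RDF and SPARQL to identify things, and (5) linked to other people's data for context. The short Python sketch below just encodes that cumulative rule. The Publication class and its field names are hypothetical and only illustrate how one might check a published artefact, such as a controlled terminology list, against the scheme; the example inputs are made up and say nothing about how CDISC publishes its standards today.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """HYPOTHETICAL description of how a data artefact is published."""
    on_web_open_license: bool      # 1 star: on the web, open license
    machine_readable: bool         # 2 stars: structured, machine-readable data
    non_proprietary_format: bool   # 3 stars: e.g. CSV rather than a vendor format
    uses_uris: bool                # 4 stars: URIs and W3C standards (RDF, SPARQL)
    links_to_other_data: bool      # 5 stars: linked to other people's data


def star_rating(p: Publication) -> int:
    """Count stars cumulatively: each level requires all previous levels."""
    criteria = [
        p.on_web_open_license,
        p.machine_readable,
        p.non_proprietary_format,
        p.uses_uris,
        p.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing level caps the rating, whatever comes after
        stars += 1
    return stars


if __name__ == "__main__":
    # Made-up examples: an openly licensed spreadsheet vs. linked RDF.
    spreadsheet = Publication(True, True, False, False, False)
    linked_rdf = Publication(True, True, True, True, True)
    print(star_rating(spreadsheet), star_rating(linked_rdf))
```

Seen this way, "pragmatic steps" simply means moving an artefact one criterion at a time up this ladder, since skipping a level does not earn the higher stars.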
The title of the CDISC track is "eHRs and the World Beyond", and Patient Controlled Health Records (PCHR) or Personal Health Records (PHR), e.g. Google Health, could be the next big thing. So, as food for thought, I will also include a slide from our exploratory work on leveraging semantics developed for PCHR for clinical research data as well, namely the Computer-Based Patient Record (CPR) ontology developed by Chimezie Ogbuji at Case Western Reserve University's Center for Clinical Investigation, previously at the Cleveland Clinic.
[1] Jay Levin referred to the HL7 standards as the "normalized, flexible way". Earlier in 2009, he and others from the FDA made some initial statements on moving from CDISC's SDTM standard to HL7's CDA (Clinical Document Architecture) standard for submissions of clinical data. This was not well received by CDISC, nor by the representatives from pharma and CRO companies. During 2010, FDA and CDISC came to a common agreement on CDISC SDTM (that is, the 40+ different containers with standardized variable names, and the evolving controlled terminologies). See two posts on CDISC's blog: "Clear Messages from FDA CDER and CBER" and "FDA CDER Data Standards Plan V 1.0 and PDUFA IV IT Plan Update".
[2] RICORDO: Researching Interoperability using Core Reference Datasets and Ontologies for the Virtual Physiological Human.