Monday, February 16, 2015

Clinical Trial Data Transparency and Linked Data

I have been following the discussions about clinical trial transparency and the sharing of clinical trial data with great interest for the last three years. More precisely, my first tweet about this dates from early 2012.


There has been a lot of debate over these years about how large a share of clinical trial results gets published: is it 50%, or much more? Journal articles vs. trial registries? There are also many issues around summary-level vs. patient-level data, de-identification of data, redaction of documents, and so on.

These are all interesting topics, but my interest lies in the opportunities to make data in, about and related to clinical trials useful by applying semantic web standards and linked data principles. In the spring of 2013 I wrote a post about this on my blog, Talking to Machines, after listening to Ben Goldacre, one of the key people behind the AllTrials initiative, who also acknowledged this.




Here are a few recent events from early 2015 related to clinical trial data transparency and linked data:
  • AAAS Panel on Innovations in Clinical Trial Registry
  • Public consultation EMA Clinical trial database
  • IOM report: Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk

AAAS Panel on Innovations in Clinical Trial Registry

I really liked what I saw in the program for a session held yesterday evening (15 February 2015) at the American Association for the Advancement of Science annual meeting in San Jose (#AAASmtg), a panel on Innovations in Clinical Trial Registers:
Documents relating to trials -- protocols, regulatory summaries of results, clinical study reports, consent forms, and patient information sheets -- are scattered in different places. It is difficult to track the information that is available, in order to audit for gaps in information and for doctors and regulators to be sure they have all the information they need to make decisions about medicines. There is an unprecedented opportunity to refine how clinical trial data are shared and linked.

Public consultation EMA Clinical trial database

This is similar to what I wrote last week when I tried to "act courageously" and responded to the public consultation, launched by the European Medicines Agency (EMA), on how the transparency rules of the European Clinical Trial Regulation will be applied in the new clinical trial database:
Make use of modern data standards and access methods to make access to the clinical trial database developer-friendly, the data machine-processable, and the trials and their components linkable. Leverage initiatives and principles such as CDISC Standards in RDF (under review), which uses modern data standards from the W3C stack of semantic web standards; openFDA, which uses developer-friendly REST APIs and JSON (openFDA API reference); and the linked data principles.
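
To make "linkable" a bit more concrete, here is a minimal sketch in Python of a trial and its documents expressed as subject-predicate-object triples, which is the basic shape of RDF data. Everything under example.org (the trial IDs and the property names such as hasProtocol) is hypothetical and made up for illustration; none of it comes from EMA, CDISC or openFDA, and a real implementation would likely use an RDF library and published vocabularies.

```python
# Hypothetical namespace; not an identifier scheme used by any real registry.
EX = "http://example.org/trial/"

# Each fact is one (subject, predicate, object) triple, as in RDF.
triples = {
    (EX + "T001", EX + "hasProtocol", EX + "T001/protocol"),
    (EX + "T001", EX + "hasStudyReport", EX + "T001/csr"),
    (EX + "T001", EX + "hasConsentForm", EX + "T001/consent"),
    (EX + "T002", EX + "hasProtocol", EX + "T002/protocol"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def missing(predicate):
    """List trials that have no triple with the given predicate."""
    trials = {t[0] for t in triples}
    have = {t[0] for t in match(p=predicate)}
    return sorted(trials - have)


# Which trials lack a linked clinical study report?
print(missing(EX + "hasStudyReport"))
```

Once the trial and its components have identifiers that can be linked, auditing for gaps in the available information becomes a simple query rather than a manual search.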

IOM report: Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk

A couple of weeks ago the Institute of Medicine (IOM) released an excellent report: Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk.

Short summary, as I interpret the core message of the report: Instead of just designing and planning a study, scientists need to plan and document how they're going to share the data from that study so that it's usable by others who may want to re-analyze it.

The report has a well-written section on “legacy trials” and an interesting list of challenges:

Infrastructure challenges—Currently there are insufficient platforms to store and manage clinical trial data under a variety of access models. 
Technological challenges—Current data sharing platforms are not consistently discoverable, searchable, and interoperable. Special attention is needed to the development and adoption of common protocol data models and common data elements to ensure meaningful computation across disparate trials and databases. A federated query system of “bringing the data to the question” may offer effective ways of achieving the benefits of sharing clinical trial data while mitigating its risks. 
Workforce challenges—A sufficient workforce with the skills and knowledge to manage the operational and technical aspects of data sharing needs to be developed. 
Sustainability challenges—Currently the costs of data sharing are borne by a small subset of sponsors, funders, and clinical trialists; for data sharing to be sustainable, costs will need to be distributed equitably across both data generators and users.
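
As I read it, the federated query idea in the technological challenge means that the same question is sent to each data holder, and only aggregate answers travel back. A toy sketch in Python, with two made-up sites and hypothetical field names (arm, event); this is my illustration and not a design taken from the report:

```python
# Hypothetical per-site datasets; in a federated setup each stays with its owner.
SITE_A = [{"arm": "drug", "event": True}, {"arm": "placebo", "event": False}]
SITE_B = [{"arm": "drug", "event": False}, {"arm": "drug", "event": True}]


def run_locally(records, arm):
    """Runs at the data holder and returns aggregate counts only."""
    in_arm = [r for r in records if r["arm"] == arm]
    return {"n": len(in_arm), "events": sum(r["event"] for r in in_arm)}


def federated_count(sites, arm):
    """Send the same question to every site and combine the aggregates."""
    total = {"n": 0, "events": 0}
    for records in sites:
        partial = run_locally(records, arm)
        total["n"] += partial["n"]
        total["events"] += partial["events"]
    return total


print(federated_count([SITE_A, SITE_B], "drug"))
```

The patient-level records never leave the site in this sketch, which is one way such a system could mitigate risk. It only works if every site uses the same field names and meanings.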

And for a “clinical trial data and metadata nerd” like me, this is like music :-)

Just because data are accessible does not mean they are usable. Data are usable only if an investigator can search and retrieve them, can make sense of them, and can analyze them within a single trial or combine them across multiple trials. Given the large volume of data anticipated from the sharing of clinical trial data, the data must be in a computable form amenable to automated methods of search, analysis, and visualization.


To ensure such computability, data cannot be shared only as document files (e.g., PDF, Word). Rather, data must be in electronic databases that clearly specify the meaning of the data so that the database can respond correctly to queries. If data are spread over more than one database, the meaning of the data must be compatible across databases; otherwise, queries cannot be executed at all, or are executable but elicit incorrect answers. In general, such compatibility requires the adoption of common data models that all results databases would either use or be compatible with.
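
The point about queries that are "executable but elicit incorrect answers" can be illustrated with a small Python sketch. The two databases, their field names and the mapping functions are all hypothetical; the assumption is simply that one source records age in years and the other in months, and that both are mapped to one common data element before querying.

```python
# Two hypothetical results databases that record age in different units.
DB_ONE = [{"subject": "A1", "age_years": 70}, {"subject": "A2", "age_years": 40}]
DB_TWO = [{"subject": "B1", "age_months": 816}, {"subject": "B2", "age_months": 480}]

# Mapping from each local schema to one common data element: age in years.
TO_COMMON = {
    "db_one": lambda r: {"subject": r["subject"], "age_years": r["age_years"]},
    "db_two": lambda r: {"subject": r["subject"], "age_years": r["age_months"] / 12},
}


def older_than(threshold_years):
    """Query both databases through the common model."""
    common = [TO_COMMON["db_one"](r) for r in DB_ONE]
    common += [TO_COMMON["db_two"](r) for r in DB_TWO]
    return [r["subject"] for r in common if r["age_years"] > threshold_years]


print(older_than(65))
```

Without the mapping, comparing the raw age_months values against 65 would still run, but it would return both B1 and B2, even though B2 is 40 years old.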