Sunday, March 31, 2013

Talking to machines

Last week, while commuting, I remotely followed two events related to Evidence Based Medicine (EBM), both of which took place in Oxford.

+Ben Goldacre spoke at both events. At the Cochrane event he talked about getting better at talking to the Public, to Policy makers and to Machines. In the last part of his talk, Talking to Machines, he says "That it's odd how we share results of RCTs (Randomised, Controlled Trials) in C19th essay format!" This is also how the Cochrane Collaboration shares reviews and meta-analyses of clinical trial data.


Structured data in RDF

Instead we should use "C21th structured data standards". I was especially pleased to hear him be even more explicit: "Publish in RDF a good, quality standard, nice data format" [at 36.50 mins].

See also what the web development director at Cochrane, +Chris Mavergames, says in his excellent presentation on how linked data can help free content from the 'container of the article'.


This is related to the work we do on linked clinical data standards; see my recent blog post: CDISC2RDF. That is, semantic web versions of data standards for clinical data at the subject/participant level.

Clinical Data Transparency 

Given the recent move towards clinical data transparency (see a good summary in Nature this week: Drug-company data vaults to be opened), I also foresee a discussion on data standards, based on semantic web standards, for the summary-level data in clinical study reports and peer-reviewed papers.

An alternative could be to represent the tables in the reports and papers as RDF using the RDF Data Cube Vocabulary (for multi-dimensional statistical data); see the CSVImport and the CubeViz projects (Representing and browsing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary, previously called Stats2RDF). This EU/FP7 project has used this vocabulary to publish biomedical statistical data, e.g. the WHO's Global Health Observatory dataset (see Publishing and Interlinking the Global Health Observatory Dataset).
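To make the idea a bit more concrete, here is a minimal sketch of how one row of a summary table could become a qb:Observation in Turtle. Only the qb: terms come from the RDF Data Cube Vocabulary; the ex: namespace, the property names (ex:arm, ex:outcome, ex:participants), the helper functions and the numbers are all made up for illustration. A complete cube would also need a qb:DataStructureDefinition, which is left out here.

```python
# Sketch only: the ex: namespace, property names and values are hypothetical.
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/trial/> .
"""

# Declare the dataset plus the dimensions and the measure used by the rows.
DECLARATIONS = """\
ex:summaryTable a qb:DataSet .
ex:arm a qb:DimensionProperty .
ex:outcome a qb:DimensionProperty .
ex:participants a qb:MeasureProperty .
"""


def row_to_observation(index, row):
    """Render one table row (a dict) as a qb:Observation in Turtle."""
    return (
        f"ex:obs{index} a qb:Observation ;\n"
        f"    qb:dataSet ex:summaryTable ;\n"
        f"    ex:arm ex:{row['arm']} ;\n"
        f"    ex:outcome ex:{row['outcome']} ;\n"
        f"    ex:participants {int(row['participants'])} .\n"
    )


def table_to_turtle(rows):
    """Turn a list of table rows into a small Turtle document."""
    observations = "\n".join(
        row_to_observation(i, row) for i, row in enumerate(rows, start=1)
    )
    return PREFIXES + "\n" + DECLARATIONS + "\n" + observations


if __name__ == "__main__":
    # Made-up counts, standing in for a table in a study report.
    table = [
        {"arm": "treatment", "outcome": "responded", "participants": 42},
        {"arm": "placebo", "outcome": "responded", "participants": 17},
    ]
    print(table_to_turtle(table))
```

Each cell of the table ends up as its own addressable observation, which is what lets the numbers escape the 'container of the article' and be linked to other datasets.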

A challenge is to express the clinical trial design and other contextual information as structured data, to make it easier to make informed decisions in trial reviews and cross-trial analyses.