Linked Data for Enterprises: November 2016

I have followed the development of OpenTrials (@opentrials) since Ben Goldacre's (@bengoldacre) first comments about the lack of an open infrastructure to improve the sharing of information about clinical trials. See my 2013 blog post, "Talking to machines".

It was nice to be able to give some initial feedback on the human user interface earlier this year, and I am very happy to see the API for programmatic data access. In this blog post I ask for some clarifications about Study URIs as a key enabler for linking information about studies.

Intro to OpenTrials

Hello #opendata fans! Have you tried https://t.co/0I5xFcGDm1 yet? It brings together all data and documents about all #clinicaltrials pic.twitter.com/E0LwdAUvJN
— OpenTrials (@opentrials) November 2, 2016

I couldn't make it to the recent Hack day in Berlin, held just before the beta version was launched at the World Health Summit, but it was great to follow the two events via the Twitter feed.

For a short intro to OpenTrials, watch this video from the launch with Ben Goldacre.

Human user access and Programmatic data access

Human users can search the 300,000+ trials via https://explorer.opentrials.net/. For programmatic access via APIs, I find the blog post from the hack day excellent. It includes links to the API documentation (in Swagger), to a notebook showing sample code (Python) and to another example using R.

Code from the OpenTrials Hack Day in Berlin (photo by benmeg / CC BY)

I point colleagues in industry to this, and also to OpenFDA, as two great examples of access to data both for humans and for programs. We have a lot to learn from these two open data initiatives, both when we define requirements and when we develop solutions.

I was also glad to see a comment from Ben Meghreblian (@benmeg), OpenTrials community manager, in an interview for the AllTrials initiative the other day: "While API access is very useful, the best way a registry can offer its entire database is as a regular download, similar to what the FDA does with its OpenFDA website."

Study URIs

In the same interview Ben also concluded:

#OpenTrials community manager @benmeg on the change needed for greater #clinicaltrials #transparency: https://t.co/FwlLQ7Ma8S #alltrials pic.twitter.com/QZswPpIdBD
— OpenTrials (@opentrials) November 3, 2016

One thing where we (IMHO), both in open data initiatives and in industry, "need to spend a little on making sure the information is discoverable, machine readable, and impactful" is establishing persistent URIs as identifiers of studies. So, instead of a text string such as "D5135C00001" used as a secondary/sponsor identifier in e.g. CT.gov, I am pushing for HTTP-based study URIs such as:

http://clinicaltrials.astrazeneca.net/study/D5135C00001

A first step is an internal process to assign URIs to both old and new studies, together with an internal study look-up API service. This study look-up API provides basic study descriptions, such as study phase and acronym, which are presented on a study "home" page with the same http address as the URI. On this page we also provide other identifiers for the same study, e.g. CT.gov's NCT number "NCT01732822", and link to it using the URL:

https://clinicaltrials.gov/ct2/show/NCT01732822
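As a rough illustration of the look-up idea above, here is a minimal Python sketch, assuming an in-memory registry instead of a real service. The `StudyRecord` class, the `lookup` function and the `same_as` field are hypothetical names made up for this example. The two base URLs are taken from the example addresses in this post, and nothing here describes an actual AstraZeneca or CT.gov API.

```python
from dataclasses import dataclass, field

# Assumption: base taken from the example study URI in this post.
AZ_STUDY_BASE = "http://clinicaltrials.astrazeneca.net/study/"
# Assumption: CT.gov study page pattern as it appears in this post.
CTGOV_PAGE_BASE = "https://clinicaltrials.gov/ct2/show/"


@dataclass
class StudyRecord:
    """Hypothetical shape of what a study look-up service could return."""
    sponsor_id: str
    acronym: str
    # Maps an identifier scheme (e.g. "nct") to the address used for that study.
    same_as: dict = field(default_factory=dict)

    @property
    def uri(self) -> str:
        # The study URI is the base plus the sponsor identifier.
        return AZ_STUDY_BASE + self.sponsor_id


def lookup(registry: dict, uri: str) -> StudyRecord:
    """Resolve a study URI to its record; raises KeyError if the URI is unknown."""
    return registry[uri]


euclid = StudyRecord(
    sponsor_id="D5135C00001",
    acronym="EUCLID",
    same_as={"nct": CTGOV_PAGE_BASE + "NCT01732822"},
)
registry = {euclid.uri: euclid}

record = lookup(registry, "http://clinicaltrials.astrazeneca.net/study/D5135C00001")
print(record.acronym, record.same_as["nct"])
```

The point of the sketch is only that the URI is the key: everything else about the study, including identifiers from other registries, hangs off it.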

I have argued for study URIs from CT.gov, but my understanding from interactions with some of the people behind it is that they see their URLs as pragmatic, persistent study URIs. So study URIs = study page URLs.

I would also like to include the identifier for the same study as represented in OpenTrials, either as a study URI distinct from the study page URL, or deliberately using the same http schema for both. I think the current ones may be locating study pages (URLs) rather than identifying studies (URIs), for example:

https://explorer.opentrials.net/trials/9b48fd6a-2c6c-4455-bcc2-b1aff574298e

It would be great to have some clarification about this. What I would like to have are namespaces for study identifiers (e.g. azct, nct, and opentrials) so that I can make assertions like these about the same study:

azct:D5135C00001 owl:sameAs nct:NCT01732822 .
azct:D5135C00001 owl:sameAs opentrials:9b48fd6a-2c6c-4455-bcc2-b1aff574298e .
azct:D5135C00001 azct:hasAcronym "EUCLID" .
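To make the role of the namespaces concrete, the following Python sketch expands the three prefixed assertions above into full N-Triples lines using only the standard library. The namespace URIs for azct, nct and opentrials are assumptions derived from the example addresses in this post (whether those addresses are meant as study URIs is exactly the open question), and `expand` and `to_ntriples` are hypothetical helper names. The owl namespace is the standard W3C one.

```python
# Assumed namespace URIs, derived from the example addresses in this post.
PREFIXES = {
    "azct": "http://clinicaltrials.astrazeneca.net/study/",
    "nct": "https://clinicaltrials.gov/ct2/show/",
    "opentrials": "https://explorer.opentrials.net/trials/",
    "owl": "http://www.w3.org/2002/07/owl#",  # standard OWL namespace
}


def expand(term: str) -> str:
    """Turn 'prefix:local' into '<full URI>'; pass quoted literals through unchanged."""
    if term.startswith('"'):
        return term
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriples(triples):
    """Render (subject, predicate, object) tuples as N-Triples lines."""
    return [" ".join(expand(t) for t in triple) + " ." for triple in triples]


triples = [
    ("azct:D5135C00001", "owl:sameAs", "nct:NCT01732822"),
    ("azct:D5135C00001", "owl:sameAs", "opentrials:9b48fd6a-2c6c-4455-bcc2-b1aff574298e"),
    ("azct:D5135C00001", "azct:hasAcronym", '"EUCLID"'),
]

lines = to_ntriples(triples)
for line in lines:
    print(line)
```

Note that taking the azct prefix literally puts the hasAcronym property in the same namespace as the studies themselves; a real vocabulary would probably keep properties in a namespace of their own.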

I have also posted this as an issue (#552) on the OpenTrials GitHub.


Saturday, November 5, 2016

OpenTrials
