Monday, July 29, 2013

Semantic Trilogy report part 1

It's been two very nice summer weeks of vacation after I got home from a week at the Semantic Trilogy events in Montreal, Qc, Canada. (See my previous blog post: Semantic Trilogy preparations.) Here's the first part of my report from seven intensive days of conferences, tutorials, workshops and great discussions with researchers in biomedical ontologies and data integration in life sciences.

It was very nice to meet colleagues from other pharma companies: Sanofi, UCB and Novo Nordisk, and to talk with early adopters at traditional software vendors, such as Siemens, and with experts from niche vendors, such as IO Informatics. It was also nice to discuss common topics, such as the use of semantic web standards and linked data principles on, for example, clinicaltrials.gov, with key individuals such as Olivier Bodenreider, NLM (National Library of Medicine).

Notes
During the two main conferences I used Twitter as my notebook, and in the evenings I gathered tweets and related links in two Storify items:
  • ICBO2013
    Storify: 4th International Conference on Biomedical Ontology (ICBO), 7-9 July
  • DILS2013
    Storify: 9th Conference on Data Integration in the Life Sciences (DILS), 11-12 July 
My poster
On the last evening I presented a poster on our joint AstraZeneca and Roche CDISC2RDF project, now part of the FDA/PhUSE Semantic Technology working group. I really enjoyed the discussions it triggered.

I'll be back in mid-August, after a couple of days of trekking in the Swedish mountains, with more details about the papers, presentations and discussions I found most interesting. (For a first glimpse of two of them see this blog post from HL7 Watch by Barry Smith: An OGMS-Based Model for Clinical Information.)

Monday, June 24, 2013

Semantic Trilogy preparation

The Swedish Midsummer weekend is over and it's time to look forward. Saturday 6th to Friday 12th of July I'll attend the Semantic Trilogy in Montreal, Qc, Canada.

I plan to attend these events during the week:
In 2011 I attended the ICBO 2011 event together with three colleagues (see my three blog posts: Preparations part 1 and part 2, report). So, I look forward to reconnecting with people in the OBO (The Open Biological and Biomedical Ontologies) community.

I also look forward to meeting, face to face, interesting people in the W3C HCLS (Semantic Web Health Care and Life Sciences Interest Group), as well as people interested in ontologies and the semantic web working for e.g. Sanofi, Novo Nordisk and Mayo Clinic.

I'm also very happy that I'll get the opportunity to attend my third semantic web related event in Canada.
  • In 2007 I attended the WWW2007 conference in wonderful Banff.
"During the WWW2007 conference a breakthrough of the Linked Data idea happened in a session where web experts demonstrated the power of a new generation of the web, a web of data. For us attending the session it was hard to imagine the full potential on what this idea would mean for individual scientists and for a pharmaceutical company." 
From Linked Data, an opportunity to mitigate complexity in pharmaceutical research and development, Bo Andersson and Kerstin Forsberg, LWDM 2011

And yes, I do hope to also get some time during the weekend to visit the Jazz Festival.

Tuesday, June 11, 2013

Standards for common aspects

Over the last three years I have been engaged with different groups working on standards, both for data exchange, such as CDISC, and for vocabularies, such as MedDRA MSSO and NCI EVS. They are now starting to see the value of using "standards for standards".


Push Back
From Flickr: bitpuddle (Twitter: @eric_d_hancock)

Standards for standards

So, "I push back" to standards organisations to use semantic web standards and linked data principles to make their standards directly usable for humans and for machines.

A good example is CDISC and their growing interest in using semantic web standards (based on RDF, Resource Description Framework): CDISC2RDF. For some background see Clinical studies and the road to Linked Data. Today FDA, CDISC, pharma companies, CROs and software vendors are working together on this in an FDA working group for Semantic Technology organised by PhUSE.



Standards for common aspects

For the last year or so, I have also tried to keep up to date with groups developing RDF-based standards for common aspects such as:
  • data descriptions (VoID)
  • data provenance and versioning (PROV and PAV)
  • concept based vocabularies and value sets (SKOS)
  • multi-dimensional statistical data (RDF Data Cube)
I try to ensure that we have a good view of the maturity and applicability of these standards so we can use them in our internal “integration factory”. But most of all, I want to “push back” to vendors. I foresee that, in the same way as we started to add requirements on web interfaces for better end-user usability back in the late 90s, we should now start to add requirements on web interfaces for better machine usability. So we need to understand how to incorporate these common aspects in our URSs, RFIs, RFPs, etc.

That is, I would like software vendors to use RDF-based standards for common aspects, for example:
  • Medidata's Rave and Perceptive's IMPACT to describe datasets using VoID.
  • Accelrys' Pipeline Pilot to use W3C PROV.
  • Microsoft's SharePoint to use term sets for tagging in SKOS.
  • SAS Institute's Drug Development to create analysis results using RDF Data Cube.
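
As a rough illustration of the first bullet, here is a minimal sketch of what a VoID description of a dataset could look like when written out as N-Triples. The dataset IRI, title, triple count and the helper names (`iri`, `literal`, `describe_dataset`) are my own assumptions for illustration; only the VoID and Dublin Core terms come from the published vocabularies, and nothing here reflects what any vendor actually produces.

```python
# Sketch: describe a dataset with VoID and Dublin Core terms, as N-Triples.
# The dataset IRI, title and triple count are made-up illustration values.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal as an N-Triples term, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def describe_dataset(dataset, title, triple_count, vocabularies):
    """Return N-Triples lines describing one void:Dataset."""
    subject = iri(dataset)
    triples = [
        (subject, iri(RDF_TYPE), iri(VOID + "Dataset")),
        (subject, iri(DCTERMS + "title"), literal(title)),
        (subject, iri(VOID + "triples"), literal(triple_count, XSD_INTEGER)),
    ]
    # void:vocabulary points at each vocabulary used in the dataset.
    triples += [(subject, iri(VOID + "vocabulary"), iri(v)) for v in vocabularies]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_dataset(
    "http://example.org/dataset/study-123",  # hypothetical dataset IRI
    "Example clinical study dataset",
    42000,
    ["http://www.w3.org/2004/02/skos/core#", "http://www.w3.org/ns/prov#"],
)
print("\n".join(lines))
```

A machine reading this description can tell what the dataset is, how big it is and which vocabularies it uses without any vendor-specific API, which is the kind of machine usability a requirement could ask for.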

So, this interview on the W3C blog with Reza B'Far, Vice President of Development at Oracle, made me very glad: Oracle on Data on the Web
Oracle to use W3C provenance standard to create a single audit time line across systems
"One of the hugest problems we faced was maintaining transaction audit trails in a heterogeneous environment in a standard and compatible way. Audit trails are described with literally millions of different formats in different organizations. This used to mean it was impossible to create a single audit time line. PROV solves this problem. We now provide (and consume) a PROV feed that unifies the audit trails generated by transactions across heterogeneous systems."
See also the Implementation report with 60+ examples of usage of the W3C Provenance specifications.
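
To make the idea of a single audit time line a bit more concrete, here is a small sketch of how audit records in two different local formats could be normalised to W3C PROV terms and sorted into one time line. The two record formats, the user names, the `http://example.org/audit/` base IRI and all function names are invented for illustration; this is not a description of Oracle's implementation.

```python
from datetime import datetime, timezone

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PROV = "http://www.w3.org/ns/prov#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
BASE = "http://example.org/audit/"  # hypothetical base IRI

# Hypothetical audit records in two different local formats.
system_a = [
    {"user": "alice", "action": "update-record", "when": "2013-06-11T09:30:00+00:00"},
    {"user": "bob", "action": "approve-record", "when": "2013-06-11T11:00:00+00:00"},
]
system_b = [
    {"who": "carol", "op": "export-data", "ts": 1370944800},  # seconds since epoch, UTC
]


def from_system_a(record):
    """Normalise a system A record (ISO 8601 timestamp string)."""
    return {"agent": record["user"], "activity": record["action"],
            "started": datetime.fromisoformat(record["when"])}


def from_system_b(record):
    """Normalise a system B record (epoch seconds)."""
    return {"agent": record["who"], "activity": record["op"],
            "started": datetime.fromtimestamp(record["ts"], tz=timezone.utc)}


def to_prov(index, event):
    """Express one normalised event as a prov:Activity in N-Triples."""
    activity = f"<{BASE}activity/{index}>"
    agent = f"<{BASE}agent/{event['agent']}>"
    started = f'"{event["started"].isoformat()}"^^<{XSD_DATETIME}>'
    return [
        f"{activity} <{RDF_TYPE}> <{PROV}Activity> .",
        f'{activity} <{RDFS_LABEL}> "{event["activity"]}" .',
        f"{activity} <{PROV}wasAssociatedWith> {agent} .",
        f"{activity} <{PROV}startedAtTime> {started} .",
    ]


# One time line across both systems, ordered by start time.
events = [from_system_a(r) for r in system_a] + [from_system_b(r) for r in system_b]
timeline = sorted(events, key=lambda event: event["started"])

prov_lines = [line for i, e in enumerate(timeline, start=1) for line in to_prov(i, e)]
print("\n".join(prov_lines))
```

The point of the sketch is that once every system maps its own format to the same PROV terms, merging the trails is just a sort on `prov:startedAtTime`.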

For a nice intro to the W3C Provenance Specifications, see the tutorial by Paul Groth (@pgroth) at the Extended (European) Semantic Web conference.


Saturday, May 25, 2013

Three Linked Data meetings in Sweden

I'm back after two nice days in the south of Sweden. Yesterday, 24 May, I attended the first meetup for Linked Data in Malmö.


This was the third Linked Data meeting in Sweden. They have all been great events with more than 30 attendees each. I do hope these will encourage more friends and colleagues in Sweden across academia, industry, consultancies and government to start applying the Linked Data principles and using the stack of Semantic Web standards.

Links to all three events:
Kudos to Bosse Andersson (@bBalsa), Marie Gustavsson-Friberg (@mariegus)
and Eva Blomqvist (@evabl444) for arranging. I look forward to the next one!

Sunday, March 31, 2013

Talking to machines

Last week, while commuting, I remotely followed two events related to Evidence Based Medicine (EBM), both of which took place in Oxford:

Ben Goldacre spoke at both events. At the Cochrane event he talked about getting better at talking to the public, to policy makers and to machines. In the last part of his talk, Talking to Machines, he says that "it's odd how we share results of RCTs (Randomised, Controlled Trials) in C19th essay format!" This is also how the Cochrane Collaboration shares reviews and meta-analyses of clinical trial data.


Structured data in RDF

Instead we should use "C21th structured data standards". I was especially pleased to hear how he was even more explicit: "Publish in RDF a good, quality standard, nice data format" [at 36.50 mins]

See also what the web development director at Cochrane, Chris Mavergames, says in his excellent presentation on how linked data can help free content from the 'container of the article'.


This is related to the work we do on linked clinical data standards, see my recent blog post: CDISC2RDF. That is, semantic web versions of data standards for clinical data on subject/participant level.

Clinical Data Transparency 

Given the recent move towards clinical data transparency (see a good summary in Nature this week: Drug-company data vaults to be opened), I foresee a discussion also on data standards for the summary-level data in clinical study reports and peer-reviewed papers using semantic web standards.

An alternative could be to represent tables in the reports and papers as RDF using the RDF Data Cube Vocabulary (for multi-dimensional statistical data); see the CSVImport and the CubeViz projects (Representing and browsing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary, previously called Stats2RDF). This EU/FP7 project has used this vocabulary to publish biomedical statistical data, e.g. the WHO's Global Health Observatory dataset (see Publishing and Interlinking the Global Health Observatory Dataset).
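
As a small sketch of what this could mean in practice, the snippet below turns a tiny summary table into RDF Data Cube observations written as N-Triples. The table values, the `http://example.org/trial/` namespace, the dimension and measure names and the function name are all invented for illustration; only the `qb:Observation` class and the `qb:dataSet` property come from the Data Cube vocabulary. A complete cube would also need a data structure definition that declares the dimensions and the measure, which is left out here.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/trial/"  # hypothetical namespace

# Invented summary table: one row per treatment arm and visit.
TABLE = """arm,visit,mean_change
placebo,week4,-0.2
placebo,week8,-0.3
active,week4,-1.1
active,week8,-1.6
"""


def table_to_observations(text, dataset):
    """Turn each table row into one qb:Observation, as N-Triples lines."""
    lines = []
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        obs = f"<{EX}obs/{number}>"
        lines += [
            f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
            f"{obs} <{QB}dataSet> <{dataset}> .",
            # Dimensions: which arm and which visit the value belongs to.
            f"{obs} <{EX}dimension/arm> <{EX}arm/{row['arm']}> .",
            f"{obs} <{EX}dimension/visit> <{EX}visit/{row['visit']}> .",
            # Measure: the summary value reported in the table cell.
            f'{obs} <{EX}measure/meanChange> "{row["mean_change"]}"^^<{XSD_DECIMAL}> .',
        ]
    return lines


observations = table_to_observations(TABLE, EX + "dataset/summary-table-1")
print("\n".join(observations))
```

Each cell of the table becomes an observation that carries its own context (arm and visit), so the numbers stay meaningful when they are lifted out of the report.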

A challenge is to express the clinical trial design and other contextual information as structured data, to make it easier to make informed decisions for trial reviews and cross-trial analyses.

Tuesday, February 12, 2013

CDISC2RDF

In a recent article from semanticweb.com (The Voice of Semantic Web Technology and Linked Data Business) the project CDISC2RDF is nicely described: Clinical Studies And The Road To Linked Data.

The project will be presented at the Conference on Semantics in Health Care & Life Sciences (CSHALS) meeting at the end of February by Charlie Mead, co-chair of the W3C’s Health Care and Life Sciences Interest Group (HCLSIG).

Here is a slide deck describing the first deliverable of the project. A refined slide deck will be presented at the CSHALS meeting, together with a couple of CDISC2RDF blog posts describing the transformation process.

Saturday, December 29, 2012

My MOOCs Spring 2013

Great to see that the news program on SVT (Swedish Television) described MOOCs (Massive Open Online Courses) in a news story the other day.

SVT Nyheter, 27 Dec. 2012: Toppuniversitet ger gratiskurser på nätet (Top universities offer free courses online).

During 2012 I followed a few courses via one of the organisations mentioned in the news program: Coursera. Two of the courses were excellent, Model Thinking and Fundamentals of Pharmacology, and they are on Coursera's list of 211 (!) courses. The course "Software Engineering for Software as a Service (SaaS)", however, was not of the same high quality, and it's not on the list anymore.

For Spring 2013 I have enrolled in three MOOCs. So, now I know what to do while commuting 2 hours per day in the coming months as well :-)

It's great to see how all of this has taken off during 2012, offering courses not only for data nerds like myself but also for many others.

So, I was thinking of my sister when I read these teasers from Coursera:
  • "Ever wonder why people do what they do? This course offers some answers based on the latest research from Social Psychology."
  • "In the course Introductory Human Physiology students learn to recognize and to apply the basic concepts that govern integrated body function (as an intact organism) in the body's nine organ systems."