Sunday, December 11, 2011

Linked Enterprise Data Patterns Workshop

Earlier this week I followed yet another event remotely. This time it was the workshop arranged by W3C on Linked Enterprise Data Patterns, in Cambridge, MA. So, I had some nice hours on the bus in the dark evenings and mornings over here in Sweden when I followed things on the:

Here's a couple of things I did find extra interesting:

An article on IBM developerWorks presented by Martin Nally: Toward a Basic Profile for Linked Data, A collection of best practices and a simple approach for a Linked Data architecture

New role proposed by Tim Berners-Lee (@timberners_lee) "Chief Identity Officer".

IBM DB2 will include RDF support sometime in 2012.

I have followed the work of Eric Prud'hommeaux, W3C, on access controls and policy mediation to enable networks of parties across industry, health care, and academia to share sensitive data such as clinical records.
Two papers on identity and URI:s with interesting people as co-authors that I'll read in more detail:
And, finally, a quite interesting discussion on 'silo folks & data integration folks' between David Wood and Bradley P. Allen captured by Sandro Hawke (@Sandhawke) in the irc channel log/scribe from the first day.

davidw: Where RDF really shines is in crossing silos, connecting things where traditional approaches have left off. 
davidw: Some orgs that have succeeded well (DoD, O'Reilly), they built a new team and hire ontologists if they need them, they get consultants in, they build a skunk works to do that bit between the silos.  They leave the DBAs in place, because the DBA stuff still needs to get done.
davidw: And they have consultants/new team to build out that bridging infrastructure.  You're not going to convert your silo folks -- really good at silos -- into data integration folks. 
Allen: That's what we're doing, with a startup group, showing we can solve this interop problem. 
Allen: When people see this, they perk up, and want to know more.


Other blog posts from the conference:




Sunday, December 4, 2011

Large organisations using Semantic Web

Earlier this week the east version of the Semantic Tech & Biz Conference took place in Washington, DC, and I followed it via the #semtechbiz feed on Twitter. The activity in this feed was lower than at the much larger west version that took place in San Francisco in early June, an event I also followed remotely, see my blog post: SemTech2011 report

Below I highlight one of the many case studies presented at the conference in Washington, DC, on the theme "here is what we did", namely what the U.S. military (DoD) does in its so-called Enterprise Information Web. Further down you find examples of what Chevron and Statoil did in the oil industry. In two side notes I wonder about the use of semantic technologies in Norway, and I am reminded of some exploratory work I did ten years ago on Topic Maps and Published Subject Identifiers (PSI:s).

Enterprise Information Web
One of the many case studies presented in the conference was the U.S. military (DoD Defense Information Systems Agency) Enterprise Information Web. In the recent RFI, Request for Interest, they write "the envisioned EIW is built on semantic web, which will allow better enterprise-wide collection, analysis and reporting of data necessary for managing personnel information and business systems, as well as protecting troops on the ground with crucial intelligence."

A YouTube video with Dennis E. Wisnosky, Chief Technical Officer and Chief Architect at DoD
See also: DoD Turns to Semantic Web To Improve Data Sharing

Being a non-American, I do find it a bit hard to relate to DoD and some of the critical comments to the YouTube video. However, as I wrote in one of my tweets: 30+ years ago the U.S. military needed the Internet - now they use Semantic Web standards and Linked Data principles. And I think this video gives some really nice explanations.

How two large organisations in oil industry use semantic web
This week I also saw another interesting case study, namely how the semantic web standard OWL is used in the oil industry. In an interview with Roger Cutler, published on the W3C blog, he describes the typical situation in most large organisations where information "lives in different forms in number of different systems and is handled separately by different organizations with different data models", and he talks about how this traditionally has been addressed:
People use point-to-point solutions or big data warehouses, but neither approach scales gracefully. Point-to-point solutions become very complex and hard to maintain. Data warehouses create replication issues and tend to be fragile. So, the possibility of a smarter, more agile, more cost-effective way of dealing with integration would have a great deal of value to us. The Semantic Web is not guaranteed to be the solution, but it looks plausible and we’d like to see if it lives up to its promise in practice.
I also noted that Roger Cutler, Research Consultant at Chevron Information Technology Company, talks about the "expressiveness and reasoning achievable with OWL". I like that because I sometimes hear comments along the lines that OWL, and OWL2, is too complex and maybe not so useful in an industrial setting. In the interview Roger says:
We have demonstrated a case in which similar objectives were obtained in the context of an ontology with about fifteen lines of readily comprehensible rules and in a relational database context with over 1000 lines of pretty complex code.
I also see that there exists a W3C Oil, Gas and Chemicals Business Group, which includes a representative from Statoil, Jennifer Sampson. And I now also see an interesting case study presented by Jennifer at the SemTech conference in San Francisco: Semantic Technologies and Statoil's Integration Layer for Plant Information Systems.

Side note: Semantic technologies in Norway
The Statoil presentation looks really interesting and is a trigger for me to catch up with how semantic technologies are used in Norway. I have been thinking about that for some time. I visited Statoil's office in Stavanger a couple of years ago to talk about metadata standards. And I see some interesting signals that semantic technologies have been much more used in Norway than in Sweden.

Side note: Topic Maps and Published Subject Identifiers (PSI:s)
Back in 2002, before the OWL standard existed and the Linked Data principles were defined, I supervised a master thesis with an Evaluation of Topic Maps for information navigation in cardiovascular research. Topic Maps is a semantic technology that has a strong presence in Norway. The master students I supervised worked together with Steve Pepper, the Topic Maps guru. A key learning I took away from some really good discussions back in 2002 with Steve, and also Lars Marius Garshol (@larsga), was the idea of Published Subject Identifiers (PSI:s). In a future blog post I will do a recap of PSI:s and try to relate them to today's http-based URI:s as one of the Linked Data principles.


Kudos to Bernadette Hyland (@BernHyland) and Dave Smith (@DruidSmith)
for their #semtechbiz tweets. And also to @semanticweb for the great news service:
"Voice of Semantic Web Technologies and Linked Data Business" and to the @W3C blog.


Sunday, October 23, 2011

Query Federation and Linked Closed Data

This is a blog post with highlights from the 10th International Semantic Web Conference taking place in Bonn, which I picked up while following the event from a distance.

>> Updated 2 November with a presentation by Peter Haase on Fedbench, see below. And also with this nice blog post by Ivan Herman (the leader for W3C's Semantic Web work): Some notes on ISWC2011…

>> Updated 13 November with a link to a paper on federated search in life science, see below.

Today and tomorrow, Sunday - Monday 23-24 October, are the workshop days with 16 workshops arranged before the main conference. Through the day I have on and off been catching up on the busy Twitter feeds on my iPhone while being out walking in the nice weather on the West Coast of Sweden. And now in the evening I have picked up two things that I did find extra interesting from an enterprise perspective on linked data and URI:s.

Query Federation
Being able to do federated querying of data from different internal and external data sources is a key capability required in an enterprise context. An interesting paper presented in the Consuming Linked Data Workshop describes how this can be done using the VOID standard (Vocabulary of Interlinked Datasets): SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions by Olaf Görlitz and Steffen Staab.

The paper uses a scenario in which researchers in the life science domain have numerous databases at hand which contain detailed information about pathways, genes, proteins, drugs and so forth. It describes and evaluates a framework called SPLENDID including an Index Manager, a Query Optimizer, and a Query Executor.
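To make the idea of using dataset descriptions for federation a bit more concrete, here is a minimal sketch in Python of one step only: picking which endpoints might answer each triple pattern, based on the predicates each dataset says it contains. This is my own toy illustration and not the SPLENDID implementation. The endpoint URLs, the `ex:` predicates and the function name `select_sources` are all made up for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical VoID-style summaries: endpoint URL -> predicates it reports.
# A real index would be built from published VoID descriptions.
VOID_INDEX: Dict[str, Set[str]] = {
    "http://example.org/drugs/sparql": {"ex:drugName", "ex:targets"},
    "http://example.org/proteins/sparql": {"ex:targets", "ex:encodedBy"},
    "http://example.org/genes/sparql": {"ex:encodedBy", "ex:onPathway"},
}

# A triple pattern is (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]


def select_sources(
    patterns: List[TriplePattern], index: Dict[str, Set[str]]
) -> Dict[TriplePattern, List[str]]:
    """Map each triple pattern to the endpoints that may hold matching data."""
    selected: Dict[TriplePattern, List[str]] = {}
    for pattern in patterns:
        predicate = pattern[1]
        if predicate.startswith("?"):
            # Unbound predicate: the index cannot rule out any endpoint.
            selected[pattern] = sorted(index)
        else:
            selected[pattern] = sorted(
                endpoint
                for endpoint, predicates in index.items()
                if predicate in predicates
            )
    return selected


if __name__ == "__main__":
    query = [
        ("?drug", "ex:targets", "?protein"),
        ("?protein", "ex:encodedBy", "?gene"),
    ]
    for pattern, endpoints in select_sources(query, VOID_INDEX).items():
        print(pattern, "->", endpoints)
```

The point of the sketch is simply that the more precisely a dataset is described, the fewer endpoints a federated query has to contact.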

A takeaway highlighted in the Twitter feed from the presentation of the paper is the value of publishing VOID data for Linked Data sets. The paper also includes references to an interesting product that I have spotted in other tweets earlier on: FedX, a framework for transparent access to Linked Data sources through a federation using optimization techniques. See also a recent discussion thread: SPARQL Federated Query Clients, on W3C's Linked Open Data email-list. I also find this paper from 2009 by key people in W3C's interest group for semantic web in life science highly relevant: A journey to Semantic Web query federation in the life sciences.

Later on in the conference a research paper was presented that addresses the topic of central repositories vs. federated querying and processing


Linked Closed Data
In an enterprise context the recognition of the use of transparency and the value of open sharing of data is getting more and more traction. By applying the Linked Data principles corporations can enable meaningful use of data. See my previous blog post on Corporate Transparency and Linked Data. At the same time there are of course datasets for which access to and use of the data is subject to legal, business, data privacy or ethical restrictions which go beyond attribution and share-alike obligations.

A vision paper presented at the Consuming Linked Data Workshop outlines A research agenda for Linked Closed Data by Marcus Cobden, Jennifer Black, Nicholas Gibbins, Les Carr, and Nigel Shadbolt. The authors define Linked Closed Data as Semantic Web datasets which are published in accordance with Linked Data principles, but which include access and licence restrictions.

I was glad to see that Ivan Herman in his blog post also highlights this: "we can and we should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well."


Kudos to @juansequeda and @ivan_herman for your great tweets today.
(While writing this blog post I can see on Twitter that the folks at the conference in Bonn now have a linked data gathering and are getting ready to play "#semanticbeerpong" :) For me it's time for a cup of tea instead ...)

Monday, September 19, 2011

Semantic Interoperability in 4 tweets




Thursday, September 1, 2011

ICBO2011, Disease terminologies and ontologies

This is my fourth blog post from the International Conference on Biomedical Ontology (ICBO) 2011, in Buffalo, NY. This time I will focus on disease vocabularies. In earlier blog posts I have highlighted the differences between two types of vocabularies:  
  • Vocabularies of terms for concepts organized as terminology hierarchies (e.g. SNOMED CT) or classification systems (e.g. ICD and MedDRA) being used as coding nomenclatures for diseases, or rather diagnoses, in EHR, clinical trials and patient safety databases.
  • Vocabularies of terms for types of entities in reality, and of the relationships between such entities, structured in ontologies according to the best current scientific understanding of physiological and pathological processes. 

In my previous blog post from ICBO I listed examples of high quality, "true", ontologies, and also different approaches to manage "Mapping mania" for the legacy of terminologies. See also another blog post that describes very well how terminologies relate to ontologies, and also to information models etc.: Why Do We Need Ontologies in Healthcare Applications.

In this blog post I use a review of a common terminology, that is SNOMED CT, and the Mental Disease Ontology under development, as examples to highlight problems and potentials with these two types of vocabularies. 

Terms for concepts organized as terminology hierarchies
While working on this blog post I saw a posting on Google+ (that is the new social media tool excellent for online discussions) pointing to a recent report  from practical use of SNOMED CT in a commercial clinical system focused on cardiovascular and respiratory diseases, and diabetes mellitus. The Google+ posting came from Alan Ruttenberg, one of the key people in the biomedical ontology (OBO) community and organiser of the ICBO event. The main author of the paper Alan pointed to, Getting the foot out of the pelvis: modeling problems affecting use of SNOMED CT hierarchies in practical applications, is Alan Rector, one of the key people in the biomedical terminology community.

Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT) is now mandated in the USA, UK, and several other countries for coding of clinical problems in EHR. The SNOMED identifiers, codes such as 38341003 for the term 'hypertensive disorder', provide a stable reference point for coding of diagnoses. And it is one of the key terminologies in the EHR4CR IMI-project, for example when querying EHR data for protocol feasibility.
"When doctors apply SNOMED codes to a patient, they are stating that those codes and all their ancestors in the hierarchy apply to that patient. When researchers use codes in queries, they are querying for those codes and all of their descendants." 


Source: Getting the foot out of the pelvis: modeling problems affecting use of SNOMED CT hierarchies in practical applications, J Am Med Inform Assoc. 2011 July; 18(4): 432–440. 
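To illustrate what the quote means in practice, here is a minimal sketch in Python of a toy is-a hierarchy. Only the code 38341003 (hypertensive disorder) comes from the text above. The other codes, the patients and the function names are hypothetical placeholders of mine, not real SNOMED CT content.

```python
from typing import Dict, List, Set

# Toy is-a hierarchy: code -> direct parents.
# Only 38341003 is taken from the text; the rest are made-up placeholders.
IS_A: Dict[str, List[str]] = {
    "38341003": [],                  # hypertensive disorder
    "child-A": ["38341003"],         # hypothetical more specific disorder
    "grandchild-A1": ["child-A"],    # hypothetical even more specific disorder
    "unrelated": [],                 # hypothetical unrelated concept
}


def ancestors(code: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All concepts that a patient coded with `code` is also stated to have."""
    found: Set[str] = set()
    stack = list(is_a.get(code, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


def descendants(code: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All concepts that a query for `code` also asks for."""
    return {other for other in is_a if code in ancestors(other, is_a)}


def query_patients(code: str, coded: Dict[str, str], is_a: Dict[str, List[str]]) -> List[str]:
    """Patients whose recorded code is `code` or one of its descendants."""
    wanted = descendants(code, is_a) | {code}
    return sorted(patient for patient, c in coded.items() if c in wanted)


if __name__ == "__main__":
    patients = {"p1": "grandchild-A1", "p2": "unrelated", "p3": "38341003"}
    print(query_patients("38341003", patients, IS_A))
```

This also shows why errors in the hierarchy matter: a misplaced is-a link silently changes both what is stated about a patient and what a query returns.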

The article lists, and exemplifies, the major types of problems when using SNOMED-CT hierarchies. It also illustrates existing hierarchies, for example for hypertensive disorder (A) and a suggested revised hierarchy for Hypertension (B). 

The authors’ conclusion is quite tough:
“… anyone using SNOMED codes should exercise caution. Errors in the hierarchies, or attempts to compensate for them, are likely to compromise interoperability and meaningful use.”
Terms for types of entities in reality structured in ontologies
In preparation for the conference I studied one of the disease area ontologies under development: Mental Disease Ontology. I do not have any medical insights into this disease area, but became interested in it because it uses the Ontology for General Medical Science (OGMS).
OGMS is a so-called mid-level ontology. The objective for it is to support research on Electronic Health Record (EHR) technology and integration of clinical and research data. My interest in OGMS started at the Clinical Trial Ontology workshop at the NIH Campus in Bethesda, MD, in 2007. When the OBO community took the insights and best practice from developing large biology ontologies (such as the Gene Ontology and the Protein Ontology), within the framework called OBO Foundry, into the clinical space, a couple of things were often confused:

  • The process of observing, the results of the observation and what is being observed
  • Disorders and diseases on the one hand and diagnoses on the other

To address these, and other confusions, the development of OGMS started.
"OGMS comprises representations of highly general universals in the domains of anatomy, physiology and pathology, of diagnosis and treatment, and of information artifacts such as clinical histories and lab test results.” 
From the paper:  Research Foundations for a realist ontology of mental disease, authored by Barry Smith and Werner Ceusters, two of the key people in the biomedical ontology (OBO) community. In this paper the authors describe how the development of an ontology for mental disease addresses the need for acceptable definitions for 'mental disorder', 'disease' and 'illness' as it has been called out in the research agenda for the new edition (DSM-V) of the Diagnostic and Statistical Manual, scheduled for release in May 2013. 

The authors define three different lists of types of entities according to the best current scientific understanding in the domain of mental diseases:

  • Mental health related entities that can exist in the absence of any mental disorder, using terms to denote these entities such as behavior and interpersonal process
  • Mental disorder related core entities, e.g. using terms to denote these entities such as pathological mental process and mental disease course
  • Diagnosis related core entities using terms to denote these entities such as disease picture components and  collection of marker features for disease X (e.g. Diagnostic Criteria for Asperger's Syndrome and for ADHD) 

I find this statement of the authors highly interesting: 
“We do not suggest that all the terms proposed in the above should be used by clinicians, although moves in this direction would help to make medical jargon less ambiguous (while at the same time potentially bringing other costs). What is more important is a broad recognition of the existence of the types of entities denoted by these terms, since without this broad recognition we will not achieve the sort of terminological clarity that is needed for computational purposes such as integration of mental health data with biological and other sorts of data. Finding better terms for the entities in question is, in this light, a secondary issue.”
Some reflections
As outlined in one of my earlier blog posts in preparation for ICBO I hoped to better understand the emerging trend of well designed “true” ontologies. And at the same time understand how legacy terminologies, such as SNOMED CT, and data coded with their aid can be successfully used for information-driven clinical and translational research. By attending ICBO I have got a much better understanding of the problems and potentials of the two different types of vocabularies. However, I still struggle to understand how to combine them short and long term.


Kudos to @alanruttenberg for a great ICBO conference and for the Google+ posting,
and also to @jamoussou for the great blog post on why we need ontologies.

Wednesday, August 31, 2011

Ideas on Linked Open Transportation Data for TravelHack

Earlier this summer I saw some tweets about a nice event here in Gothenburg: West Coast TravelHack 2011, 8-9 October. As I am a daily commuter (with Västtrafik's trams, buses and trains) and an information architect addicted to the linked data idea, and I also have a background as researcher in mobile informatics, I got two ideas and wrote them up as tweets (tweet 1 and tweet 2)

Today, I saw some tweets linking to two articles about the interesting FixMyTransportation:
Looking for hackers
I was reminded of my two ideas and also of my time as a part-time industrial PhD researcher. My research in the Mobile Informatics group, at the Victoria Institute and IT University, concerned the mechanisms needed to provide highly mobile professionals, such as news journalists, with contextualized information using mobile applications: "Mobile Newsmaking" (thesis, presentation)

So, I posted a tweet about FixMyTransportation in Swedish and Karl-Petter Åkesson (@kallep), an old friend from my time as part-time researcher, kindly replied and said in his tweets back (tweet 1 and tweet 2): Why not get together with a couple of hackers and show how your ideas for a linked data infrastructure could enable nice apps and services for commuters. Great, I tweeted back -- but, I don't know that many great hackers as it's ten years since I did my research on mobile applications.

So, now I am looking for some great hackers to potentially explore my ideas on Linked Open Transportation Data at the TravelHack event 8-9 October.


Give every bus stop, tram route and train station etc. a URI
Identify "things" globally by using http based URIs (Uniform Resource Identifiers) - today all public schools, roads, ministers, and many bus stops, in the UK have URIs

For example the URI http://transport.data.gov.uk/id/stop-point/1800SJH1081 identifies a bus stop in Manchester. Assigning a http based URI is what the first two principles of Linked Data say.

The third principle says that you should provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML. So, if you put this URI http://transport.data.gov.uk/id/stop-point/1800SJH1081 in a web browser it will give you a nice html documentation of the metadata describing the bus stop. An app or service could choose between for example a RDF/XML file or a JSON file. See my Linked Data page for some nice videos, books, blogs etc.
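For the hackers out there, here is a minimal sketch in Python of how an app could ask for a specific format when dereferencing the bus stop URI, using the HTTP Accept header (content negotiation). The helper names `build_request` and `fetch` are my own, and I am assuming, not promising, that the server still answers and supports these media types.

```python
from urllib.request import Request, urlopen

# The bus stop URI from the text above.
STOP_URI = "http://transport.data.gov.uk/id/stop-point/1800SJH1081"


def build_request(uri: str, media_type: str) -> Request:
    """Prepare a GET request that asks the server for one representation."""
    return Request(uri, headers={"Accept": media_type})


def fetch(uri: str, media_type: str, timeout: float = 10.0) -> bytes:
    """Dereference the URI and return the raw response body.

    Assumption: the server honours the Accept header. If it does not,
    you will simply get whatever default representation it serves.
    """
    with urlopen(build_request(uri, media_type), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Same URI, two different representations requested.
    for media_type in ("application/rdf+xml", "application/json"):
        request = build_request(STOP_URI, media_type)
        print(request.full_url, "Accept:", request.get_header("Accept"))
```

The nice thing is that the identifier stays the same for humans and machines. Only the Accept header differs.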

Use a common vocabulary for transportation
And all these "things" can also be typed, described and linked using classes, properties and relationships from a range of vocabularies for different domains. 

For the transportation domain I have seen some nice tweets pointing me to TRANSIT: A vocabulary for describing transit systems and routes

There are also many general vocabularies and ontologies that are commonly used to publish linked data. You can 'cherry-pick' from some of the most common, for example Friend-of-a-Friend (FOAF) provides terms for describing people and their social network, SIOC Semantically-Interlinked Online Communities, and Dublin Core defines general metadata attributes.
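As a small illustration of such cherry-picking, here is a sketch in Python that describes the Manchester bus stop with terms from a few vocabularies and prints the result as N-Triples. The RDF, RDFS and Dublin Core property URIs are the standard ones. The TRANSIT class URI is my assumption about that vocabulary's namespace, and the label and description values are invented for the example.

```python
from typing import List, Tuple

STOP_URI = "http://transport.data.gov.uk/id/stop-point/1800SJH1081"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCTERMS_DESCRIPTION = "http://purl.org/dc/terms/description"
# Assumption: class URI in the TRANSIT vocabulary; check the vocabulary itself.
TRANSIT_STOP = "http://vocab.org/transit/terms/Stop"


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize already formatted (subject, predicate, object) terms."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def describe_stop() -> str:
    # The label and description are invented example values.
    triples = [
        (iri(STOP_URI), iri(RDF_TYPE), iri(TRANSIT_STOP)),
        (iri(STOP_URI), iri(RDFS_LABEL), literal("Example bus stop")),
        (iri(STOP_URI), iri(DCTERMS_DESCRIPTION), literal("A bus stop in Manchester")),
    ]
    return to_ntriples(triples)


if __name__ == "__main__":
    print(describe_stop())
```

In a real app you would of course use an RDF library instead of string formatting, but the sketch shows how terms from several vocabularies can be mixed in one description.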

Kudos to @peterkz_swe, @egonwillighagen, @wieselgren, @kallep
for nice interactions on Twitter inspiring me to write this blog post

Thursday, August 18, 2011

A prediction 3-5 years from now

Making predictions can be tricky. However, a former colleague, and actually also my manager for a short time, Jean-Peter Fendrich (@carokanns) recently published a few predictions 3-5 years from now in the LinkedIn group Volvo IT Innovation Centre
  • Inspired by iPad and its competitors there will come a new device that replaces the Laptop as we know it now. 
  • Html5 will make all these app's and app technologies obsolete. 
  • We will finally have standards and infrastructure that support "mobile wallet" - replacing cash, credit cards and other payment systems.

JP asked for feedback and more predictions, so I posted the following:

As JP and I, together with Martin Börjesson (@futuramb), Annika Eriksson, Christian Forsäng  and Else-Marie (Emma) Malmek, were some of the folks introducing the first generation of web technology (Web 1.0) in the Volvo organisation back in the mid 90ies it was nice to highlight the third generation (Web 3.0) in this Volvo IT group.

The focus in my blog postings and tweets the last year or so has been on two of the fundaments for Web 3.0, i.e. the Linked Data principles and in particular the use of http based URIs. For more details, see one of my first blog posts: Corporate Transparency and Linked Data. See also my list on URI Design that I try to keep updated.

"Data is the new electricity. URIs are the conduction mechanism."
Quote by Kingsley Uyi Idehen (@kidehen)
  

Tuesday, August 9, 2011

ICBO2011 Reports

The last week in July I and three colleagues attended the International Conference on Biomedical Ontology (ICBO) 2011, in Buffalo, NY. As I have been a "remote hang-around" on Twitter following other conferences from a distance (see for example my blog post following the SemTech conference earlier this summer) it was great fun this time to be active on Twitter IRL in Buffalo: My #ICBO2011 tweets


And yes, I did see the Niagara Falls again -- this time I did get really close to them on a boat tour with the "Maid of the Mist".


Now, after a long journey home, and a couple of relaxing days on the Swedish west coast and in central London, it's time to use my tweets, the conference presentations and proceedings (pdf) to pull together some of my insights and learnings. Here's my first report with some notes and reflections from the conference and follow up to my previous blog posts in preparation for the conference (part 1 and part 2). See also my fourth blog post from ICBO published 1 September.


High quality, "true", ontologies 
It was nice to see presentations and read papers on ontologies from a broad spectrum of domains, such as:
  • Genes
    See a recent paper: How the Gene Ontology Evolves, describing the ways in which curators of the Gene Ontology (GO) have incorporated new knowledge. 
  • Protein complex and supra-complex
    See the presentation on this topic in the panel the first day: From proteins to diseases, by Bill Crosby (Department of Biological Sciences, University of Windsor)
  • Emotions and Chronic pain
    See the presentation and paper on how to represent emotions based on research in affective disorders such as bipolar, depression and schizoaffective disorder, by Janna Hastings, (European Bioinformatics Institute, UK, and, Swiss Centre for Affective Sciences, University of Geneva, Switzerland). See also the announcement of the development of an ontology for Chronic pain and a nice video: Toward a New Vocabulary of Pain.
  • Demographics
    See the presentation describing how "demographic data in current information systems is ad hoc, and current standards are insufficient to support accurate capture and exchange of demographic data", and the proposed use of the Demographics Application Ontology as a solution.
  • Adverse Events
    In the workshop on representing adverse events we learned about interesting work on adverse event ontologies. (See a video of the workshop organizer Mélanie Courtot: Towards an Adverse Event Reporting Ontology). We also learned about the development of ontologies to represent temporal relationships (e.g. Clinical Narrative Temporal Relation Ontology) which is a key aspect in handling safety issues and regular ongoing pharmacovigilance in pharmaceutical research and development.
All of these are examples of high quality "true"1) and modular ontologies developed beneath the Basic Formal Ontology (BFO) providing formal definitions for types of entities in reality and for the relationships between such entities (so called ontological realism). Such ontologies are designed to allow annotations of experimental and clinical data "to be unified through disambiguation of the terms employed in a way that allows complex statistical and other analyses to be performed which lead to the computational discovery of novel insights"2)


My own reflections: 
So far we have seen none, or very little, uptake of such high quality "true" ontologies for clinical data. Something I also highlighted in my earlier blog post on clinical data standards.  In a coming blog post I will present a demo using the Demographics Application Ontology showing how a high quality "true" ontology can be used to support accurate capture and exchange of demographic data. I will also outline some ideas on how this could be used also for clinical study data (CRF:s and databases). 

"Mapping mania" for the legacy of terminologies
A common theme in several of the presentations, papers and panels was the mappings (matching, alignment) needed between terms and concepts organized as terminologies and coding nomenclatures, such as SNOMED CT, LOINC, ICD, CDISC SDTM CT:s (derived from NCI Thesaurus), and MedDRA. Here are some examples:
  • Extraction of the anatomy value set from SNOMED CT to be reused for the 11th revision of the International Classification of Diseases (ICD-11). See a presentation on the problems and proposed patterns by some well known people (Harold Solbrig and Christopher Chute at Mayo Clinic, Kent Spackman working for IHTSDO, and Alan L. Rector at University of Manchester)
  • The Ontology Alignment Evaluation Initiative (OAEI) was mentioned by several presenters as a forum to discuss the problems of direct matching between different terminological resources.
  • The use of an ontology matching tool called AgreementMaker was presented.
  • In a panel on: National Center for Biomedical Ontology (NCBO) Technology in Support of Clinical and Translational Science, the basic lexical term mappings were mentioned as an example of a service available both via BioPortal's graphical interface and as REST services.
These are all examples of a legacy already in use, or in the process of being used, for the annotations of EHR, clinical trials and patient safety data. For example for the huge US initiative on meaningful use of EHR as highlighted by Roberto Roch in his keynote on Practical Applications of Ontologies in Clinical Systems.


My own reflections:
In my previous blog post preparing for the conference I referred to the mapping problem as comparing "Apples and Oranges" and sometimes I think of it as a "mapping mania". In the conference I did hear the comment "Mappings are hard" several times, and also the question "Who will create, validate and maintain all the mappings?"


After some more days of vacation I will get back later on in August with more notes and reflections from the conference:
  • I will report from the debate on how to accurately connect data from measurements and questionnaires (information entities) to ontologies (real world entities). I think this is a key aspect to get machine-processable clinical data ready for automatic transformation and direct querying, and ready for inferencing and reasoning. 
  • Another theme I would like to cover is referent tracking, i.e. assigning globally unique identifiers to each entity in reality about which information is stored. For example diagnoses, procedures, demographics, encounters, hypersensitivity, and observations as they are reported in EHRs. This is something I think is a key enabler for accurate secondary use of EHRs.

Friday, July 22, 2011

ICBO2011 Preparations, part two

Via the email lists for the Clinical Data Interchange Standards Consortium (CDISC) Terminology team, and for the Electronic Health Records for Clinical Research (EHR4CR), one of the Innovative Medicines Initiative (IMI) projects, I have seen some recent discussions on cross-terminology mapping challenges. These challenges are due to the fact that terminologies and coding nomenclatures, such as SNOMED CT, LOINC, CDISC SDTM CT:s, and MedDRA, all have been developed for different purposes, with disparate approaches and structures.

Together with attendees from NCI, NCBO, FDA, Mayo, SAS, Stanford and other organizations, I and a few colleagues will attend the International Conference on Biomedical Ontology (ICBO) next week. See my previous blog post with some more background.


Photo (Flickr): Automania

Apples and Oranges
In preparation for the workshop the first day, Representing Adverse Events, I did find this paper highly interesting as it compares and contrasts SNOMED CT and MedDRA, and also describes the challenges in mapping between them: Heterogeneous but “standard” coding systems for adverse events: Issues in achieving interoperability between apples and oranges.



OBO Foundry based ontologies as "catalyst"
I hope the adverse event workshop, and the whole ICBO event, will be an opportunity for me to learn more about the Open Biology and Biomedical Ontologies (OBO) Foundry approach, and to discuss the challenges and opportunities in a “common language with which to energize cross-disciplinary research”1)


I hope to better understand “how legacy terminologies, such as SNOMED CT, and the data coded with their aid can be successfully used for information-driven clinical and translational research”2). My understanding is that the approach to be discussed at this event is the use of OBO Foundry based high-level reference ontologies, such as the Ontology for General Medical Science (OGMS), as a kind of catalyst instead of direct terminology-to-terminology mappings.


Yet another "standard", or ...
At the same time, I found this cartoon, circulating on Twitter this week, quite amusing. So, I think it will be a hot and interesting week in Buffalo, NY.
xkcd: Standards
Here's a brief introduction to the use case I and a colleague will present at the adverse event workshop:
"A use case will be presented describing how a query from a regulatory authority is handled as part of the regular ongoing pharmacovigilance in pharmaceutical research and development. It will illustrate how databases and literature are being reviewed manually, exemplify how different databases are structured and highlight some of issues in the coding of data. With this use case, we hope to provide a background to our interest in an ontologically based approach to enable a more automatic way to access, structure and analyze patient safety related data."

Tuesday, June 28, 2011

ICBO2011 Preparations

In a couple of weeks I will attend the International Conference on Biomedical Ontology (ICBO) 2011, in Buffalo, NY. 
“In July, hundreds of international scientists from dozens of biomedical fields will meet at the University at Buffalo seeking a common language with which to energize cross-disciplinary research.” From ICBO News: For the Sake of Research and Patient Care, Scientists Must Find Common Language
And yes, it will be a great opportunity for me to see the Niagara Falls again. This time from the American side. Last time I saw it was in 1999 from the Canadian side when I attended the W3C conference in Toronto. The WWW8 conference where I was absolutely thrilled by the power of the simple and elegant model of RDF triples. At WWW8 I also heard Tim Berners-Lee talk about the Semantic Web for the first time.

The coming weeks I hope to be able to do a re-cap of a couple of ontology related papers and articles, and also read and digest some new ones listed for the events I have signed up for:

I will use one or two forthcoming blog posts to write up my insights and reflections coming to my mind while reading.

Here's a quote I think well captures my motivation to learn more about ontologies and getting my ICBO2011 attendance approved by my managers. It's taken from this great article More than Words: Biomedical Ontologies with references to the work of several of the international scientists who will get together at the ICBO2011.
“… true ontologies are more than just controlled terms. They capture, in a logical, systematic way, what scientists regard as the basic truths about a topic. Like equations in physics or axioms in mathematics, they can even be the basis for computational models. When connected to databases, scientific papers, and software applications, ontologies ‘help cope with the ever-growing, chaotic accumulation of text and facts’ in biomedical and translational research.”

Sunday, June 12, 2011

SemTech2011

The last couple of days the Twitter feeds for #semanticweb and #linkeddata have been very busy, and #semtech peaked with more than one tweet per minute during the Semantic Technology Conference 2011 in San Francisco 5-9 June.
See the #SemTech 2011 Twitterscript for a great overview of all the #semtech tweets sent during the conference, aligned with the sessions going on at the time. Kudos to @glenn_mcdonald and @needlebase.
For me, here over in Sweden, it's been a couple of late evenings and some busy mornings catching up on Twitter while commuting. Below are some of the presentations, discussions, and blogs I found extra interesting.

schema.org
A couple of days before the conference the news came out on Twitter about the announcement from Google, Yahoo and Microsoft (Bing) of their joint schema.org: a single, global vocabulary, together with the use of Microdata to encode structured data in web pages using this vocabulary, so that search engines can do a better job.
A graph centric visualization of the schema.org vocabulary
with "Thing" in the center of it
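
As a rough illustration of what Microdata markup with the schema.org vocabulary looks like, and how a consumer such as a search engine could read it, here is a small Python sketch using only the standard library. The page fragment and the person in it are made up, and the parser is a simplification of my own that handles one flat item only (no nested items, no `itemref`, no `href` or `content` values), so it is not a full Microdata implementation.

```python
from html.parser import HTMLParser

# A made-up page fragment marked up with Microdata and the schema.org vocabulary.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Data Manager</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect the item type and flat itemprop/text pairs from a single item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # The itemtype URL names the schema.org type of this item.
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            # Remember the property name; its value is the element's text.
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None


def extract_item(html):
    """Return (item type, {property: text}) for the first flat item in html."""
    parser = MicrodataParser()
    parser.feed(html)
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(extract_item(SAMPLE_HTML))
```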


The first comment I re-tweeted as an "I Liked" on this topic was a tweet on Friday 5 June by Darin L. Stewart (@darinlstewart) pointing to his posting, with some early critique, on Gartner's blog: Schema.org: Webmaster One-Stop or Linked Data Land Grab? At the same time came the first RDF Schema version of the vocabulary on schema.rdfs.org. Great job done by Michael Hausenblas (@MHausenblas) et al. And I did find it interesting to read the quick, positive comment from Chris Bizer, the Linked Data guru behind DBpedia, on Google's official webmaster blog. During the conference schema.org was also the *hot* topic, and late Wednesday evening my time I followed a heated online IRC discussion from the BOF on structured data in HTML and vocabularies. For more reading on this topic see the link bundle called schema.org is in town compiled by Michael Hasheke (@hashek).

Linked Data Tutorial and Cookbook
Among all the tutorials and presentations at the conference I picked up two great Linked Data resources. First of all Juan Sequeda's (@juansequeda) tutorial series, and also a presentation "I liked, very much": The Joy of Data - A cookbook for publishing and consuming Linked Data by Bernadette Hyland (@BernHylland). These two triggered me to create a separate Linked Data Resource Page with my favorites, including these two.

Linked Health Data
The last day of the conference I spotted some tweets that took me to the presentation I liked most of all: Clinical quality linked data on health.data.gov, presented by George Thomas (@georgethomas). See also his blog post on data.gov with an excellent argument for linking publicly available health data such as hospital compare data:
In addition to making flatfiles available to download on the Web, and providing applications that enable programmatic access to backend databases through the Web, imagine using the Web itself as a database: a massively distributed, decentralized database. This is what Linked Data is about – putting data in the Web.

Two technologies to catch up with
Many tweets talked about Callimachus, a framework for data-driven applications based on Linked Data principles allowing Web authors to quickly and easily create semantically-enabled Web applications. I will have a look at the Callimachus videos they published. And a presentation on Semantic Architecture & Composing Resource Oriented System, by Brian Sletten (@bsletten), made me curious to learn more about the architecture thinking called NetKernel.

Other blog posts 
I look forward to reading several reflective blog posts the coming week when the participants are back home. For example, I look forward to seeing what Darin L. Stewart (@darinlstewart) will report from SemTech 2011 on his Gartner blog. I will update this blog post with links to what I find interesting.

Monday, May 2, 2011

Linking Clinical Data Standards

This is a follow-up to an earlier blog post where I outlined the background, audience and intention of three presentations. Two of them have been published on Slideshare:
Here I focus on the second presentation, one I gave at the CDISC (Clinical Data Interchange Standards Consortium) conference in Brussels recently. One of the key people in the CDISC community, Dave Iberson-Hurst, lists semantic web as one of three themes and kindly refers to my presentation in a recent blog post.

My presentation, and also a very nice presentation from Roche, triggered interesting questions. Questions both on what I proposed as pragmatic first steps for linking clinical data standards, and also on what I see as future opportunities. Below you find the questions and my "answers", or rather thoughts. In a coming blog post I will discuss what all of this could mean for CDISC SHARE (metadata repository).

In my presentation - the last one on the first day - I urged the CDISC community to consider the use of semantic web standards and linked data principles for clinical data standards. It was very nice to be able to refer back to two of the presentations in the earlier sessions.

Pragmatic steps for CDISC
Firstly, to the presentation by Rebecca Kush, President of CDISC, on the value of open and free standards. The key message in my presentation pointed out:




Roche use Semantic Web for clinical data standards
And secondly, to the presentation from Roche on the development of a "Global Data Standard Repository" (GDSR) using semantic web standards and an ontology tool (TopBraid Composer). My first slides introducing the idea of "Triples" (the RDF standard model) and "Global Identifiers" (URI:s) were a recap for the audience, as Frederik Malfait (IMOS Consulting, presenting on behalf of Roche) had already introduced these in a really good way.

Questions and Answers
Even though it was the last presentation of the day (just before a very nice evening with TinTin at the Brussels Comic Strip Center) many people stayed around and I got the opportunity to sort out a key question, and also to outline two future opportunities:
Q: Do you mean we should publish the actual clinical data openly? 
A: No! What should be made publicly available is another topic. My key message is that the free and open clinical data standards as they are currently constructed should be made available as linked open clinical data standards 1]. This means, using semantic web standards. (I propose the use of RDF/XML format as an alternative to Excel and ODM/XML.) And, also applying the Linked Data principles. (For example, assigning URI:s as global identifiers as an alternative to text strings for the submission values.)
Q: Does this relate to ontologies for bioinformatics?
A: Yes. The insights from developing for example the Gene Ontology are highly applicable when representing and structuring the entities and relations in the clinical reality. In some extra slides to my presentation I propose explorative work to construct the next generation of clinical data standards using modern ontologies 2] based on the so called Open Biological and Biomedical Ontologies (OBO) Foundry.
Q: Do you mean that this would take away the need for manual transformation of clinical data?
A: Yes and No.
Yes, because the above outlined next generation of clinical data standards (i.e. using semantic web standards, applying linked data principles and being based on modern ontologies) would improve the research utility of clinical datasets. That is, firstly, a very normalized, flexible way to convey clinical data. And, secondly, machine-processable clinical data ready for automatic transformation and direct querying, and ready for inferencing and reasoning.
No, because existing data needs to be transformed according to the above. And, No for quite some time as there are many things to explore and learn. A highly pragmatic, incremental and stepwise approach is required 3].
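
To illustrate the first answer above, here is a minimal Python sketch of what it could mean to assign URI:s as global identifiers instead of text strings for submission values. The base URI, the property URI and the subject URI are hypothetical and chosen for illustration only; they are not identifiers published by CDISC or NCI. For brevity the sketch writes N-Triples rather than the RDF/XML format I mention above, and it uses a crude rule (anything starting with "http" is a URI) that a real tool would not rely on.

```python
# Hypothetical base URI, chosen for illustration only.
BASE = "http://example.org/cdisc/ct/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def term_uri(codelist, submission_value):
    """Mint a global identifier for a controlled term (illustrative scheme)."""
    return f"{BASE}{codelist}/{submission_value}"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.

    Simplification: objects starting with 'http' are written as URIs,
    everything else as a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# The term gets a label, and a (made-up) study subject points to the term
# by its URI instead of carrying the bare text string "M".
EXAMPLE = [
    (term_uri("SEX", "M"), RDFS_LABEL, "Male"),
    (
        "http://example.org/study/1/subject/001",
        "http://example.org/cdisc/sdtm/sex",
        term_uri("SEX", "M"),
    ),
]

if __name__ == "__main__":
    print(to_ntriples(EXAMPLE))
```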

1]  See my presentation, slides 31-36, for more details on the pragmatic steps I propose for CDISC, and NCI.
2]  The two OBO Foundry based ontologies I am referring to are the Translational Medicine Ontology, TMO (a.k.a. the Pharma Ontology) and the Computer-Based Patient Record (CPR) Ontology. See also an excellent article on biomedical ontologies: More Than Words, in the Clinical and Translational Science Network.
 

Kudos to Frederik Malfait and Jonathan Chainey (Roche), Dave Iberson-Hurst (@Assero_UK), Bron Kisler (@CDISC), Philippe Verplancke and Isabelle de Zegher for great discussions F2F in Brussels.

Monday, March 21, 2011

When will we see the first data.xyz.com?



"http://data.xyz.com is the home of our open linked data"
                   Says the CIO of Corporation XYZ
When will we see such an announcement from a corporation?

I really liked the tweet today from Milton Keynes, UK (@mdaquin) pointing me to data.open.ac.uk, that is the home of open linked data from The Open University.

I would love to see an announcement from a corporation with high ambitions on corporate transparency and an understanding of the value of sharing pre-competitive data, with a CIO with good insights into open data and linked data principles. A corporation that clearly states the applied open license (such as PDDL, ODC-by or CC0), and has also earned a 5 star ranking (see Linked Data star scheme by example).

Or, does this already exist? Let me know if you know of something similar in an enterprise context.

For more information about the benefits of Linked Data, see a nice blog post by Stuart Brown (@stuartbrown) on the blog of the LUCERO Project, Linking University Content for Education and Research Online. See also my previous post on Corporate Transparency and Linked Data.

Sunday, March 13, 2011

Three presentations

The coming two weeks I'll be working on presentations for three events I have got the opportunity to participate in. I will use this blog post as a way to shape my thinking, and write a new blog post when developing the slides and manuscripts.
  1. Linked Data in Pharma
    A brief presentation of a short paper we have got accepted for the first international workshop on linked web data management in Uppsala, 25 March. The title of the paper is: Linked Data, an opportunity to mitigate complexity in pharmaceutical research and development (link to be added). I have written it together with my colleague Bosse Andersson.
  2. Semantics for Clinical Data
    Some reflections on different approaches to provide semantics for clinical data to be discussed in the EBI Industry Workshop on Biomedical Data and Model Interoperability in Cambridge, 28-29 March.
  3. Linked Clinical Data
    An introduction to Linked Data principles and pragmatic examples for the CDISC Interchange Europe 2011 conference in Brussels, 13-14 April.
I did find it hard to start working on this with all the terrible news on what is happening in Japan just now. Kudos to Jim Hendler and Ivan Herman for their tweets today on the power of linked open data with an interactive map using open earthquake data.
See Ivan Herman's blog post

Background, Audiences and Intentions 
Some brief notes on the background to my participation in the three events, and also on what I know about the audiences, and my intentions with what I will talk about.

1.  Linked Data in Pharma
The first one is an event I learned about on the Twitter feed for #linkeddata. It's a workshop on linked data management arranged in conjunction with a conference on database technology. We saw this as an opportunity to go to a workshop here in Sweden on this interesting topic. We decided to re-write an article from last year for an internal publication to describe some insights from working in the W3C interest group for semantic web in Health Care and Life Science (HCLS), and in the Large Knowledge Collider (LarKC) EU-project.

The article we started from had an intended audience of colleagues in a pharma company with no knowledge of the standards and principles behind the huge cloud of linked open data. 
The Linking Open Data cloud diagram
The participants in the workshop, by contrast, will be highly knowledgeable researchers and practitioners in linked data management. My hope is that during 2011 we will have more internal experiences to report in an extended paper, as the linked data idea now also gets a lot of interest internally.

2.  Semantics for Clinical Data
The second event is the result of interactions we have had with Bernard de Bono, leading the Drug Disease Modeling Resources (DDMoRe), one of the projects in the Innovative Medicines Initiative (IMI). I met Bernard in an EBI industry workshop on ontology engineering last year and we talked about existing metadata standards for clinical data and the opportunities in ontology based annotations of clinical data.

The list of attendees includes people from many of the European pharma companies and also from research centers such as EBI and INSERM. I assume many of them work in the pre-clinical / drug discovery phase and have a bioinformatics focus, so together with the people from CDISC I hope to be able to add a clinical perspective.

My contribution will be some reflections on different approaches to providing semantics along with clinical data. First, how this has been done when much of the semantics, that is, the knowledge of what clinical data represents, has been implicit and carried by people and documents. Second, how semantics is now made explicit for humans as standardized data exchange containers, e.g. the CDISC SDTM domain for Lab test data, and as text strings of standardized codes and labels, so-called controlled terminologies, e.g. the list of lab test procedure codes, to simplify the programming needed to transform, integrate and analyze data. By linking to Bernard's presentation on the RICORDO 2] toolkit for semantic integration of biomedical resources, I will outline how clinical data can be annotated with ontology-based standards, making the semantics explicit in formal, machine-processable formats. I will also briefly talk about how clinical metadata registries could be used to support ontology-based annotation.

3.  Linked Clinical Data 
The abstract I proposed for the third event was triggered by the frustration I interpreted from the FDA representatives at CDISC Interchange US in 2009, and is a follow-up to the brief discussions I had with some of the CDISC folks on linked data principles and semantic web standards. Here is how Jay Levin expressed it in the FDA panel in November 2009:
We want to separate the analysis view from how clinical data is exchanged. To have a very normalized, flexible way to convey the data as it actually was collected, as it occurred. And then from that create any number of disease area specific views and analysis specific views. You have tremendous options. So, instead of being locked into this difficult dance that I see happening with SDTM then you always try to decide how useful it’s going to be for correct analysis vs. how consistent it could be if you free up the potential ways data can be represented for disease specific areas. 1]
In my presentation I want to show examples of the RDF data model (triples) as such a "very normalized, flexible way to convey the data" (see also my comments on this blog post Wondering why the FDA hasn't more actively promoted CDISC standards). I'll also share the good news on how linked data principles now are applied by key players such as the UK and US governments, as described in my first blog post on The Open Government Data Movement. And also show what RDF triples of linked data look like in practice, using the payment example from a local authority in UK that I also used in my previous blog post on publishing linked data.
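
A small Python sketch of the idea that one normalized set of triples can feed any number of views. The observations, patients and values are invented, and I use short names instead of full URI:s to keep it readable; the helper functions are my own illustration, not part of any CDISC or RDF tool.

```python
from collections import defaultdict

# Invented observations as (subject, predicate, object) triples.
TRIPLES = [
    ("obs:1", "patient", "pat:001"),
    ("obs:1", "test", "glucose"),
    ("obs:1", "value", "5.4"),
    ("obs:2", "patient", "pat:001"),
    ("obs:2", "test", "sodium"),
    ("obs:2", "value", "140"),
    ("obs:3", "patient", "pat:002"),
    ("obs:3", "test", "glucose"),
    ("obs:3", "value", "6.1"),
]


def records(triples):
    """Fold triples into one dict of predicate -> object per subject."""
    out = defaultdict(dict)
    for s, p, o in triples:
        out[s][p] = o
    return dict(out)


def view_by_test(triples, test):
    """Analysis view: patient -> value for a single test."""
    return {
        r["patient"]: r["value"]
        for r in records(triples).values()
        if r.get("test") == test
    }


def view_by_patient(triples, patient):
    """Patient view: test -> value for a single patient."""
    return {
        r["test"]: r["value"]
        for r in records(triples).values()
        if r.get("patient") == patient
    }


if __name__ == "__main__":
    print(view_by_test(TRIPLES, "glucose"))
    print(view_by_patient(TRIPLES, "pat:001"))
```

Both views are derived from the same triples, so adding a new kind of view does not require changing how the data is conveyed.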
 
five star open Web data




My key message will be some proposed pragmatic steps for how the CDISC standards can be published using the 5-star rating scheme for linked open data described in my second blog post.
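
As a reminder of how the star scheme works, here is a tiny Python sketch. The criteria are my paraphrase of the five steps, each star requires all the previous ones, and the two example publications are hypothetical descriptions, not an assessment of any actual CDISC deliverable.

```python
# Criteria in order, one per star, paraphrased from the 5-star scheme.
CRITERIA = [
    "on_web_open_license",   # 1 star: on the Web under an open license
    "structured",            # 2 stars: machine-readable structured data
    "non_proprietary",       # 3 stars: non-proprietary format
    "uses_uris",             # 4 stars: URIs identify things (e.g. RDF)
    "linked_to_other_data",  # 5 stars: links to other people's data
]


def star_rating(publication):
    """Count stars; stop at the first criterion that is not met."""
    stars = 0
    for criterion in CRITERIA:
        if not publication.get(criterion, False):
            break
        stars += 1
    return stars


# Hypothetical examples of how a standard could be published.
SPREADSHEET_DOWNLOAD = {"on_web_open_license": True, "structured": True}
LINKED_RDF = {name: True for name in CRITERIA}

if __name__ == "__main__":
    print(star_rating(SPREADSHEET_DOWNLOAD), star_rating(LINKED_RDF))
```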

The title of the CDISC track is "eHRs and the World Beyond", and Patient Controlled Health Records (PCHR) or Personal Health Records (PHR), e.g. Google Health, could be the next big thing. So, as food-for-thought I will also include a slide from the explorative work we do on leveraging semantics developed for PCHR also for clinical research data. That is, the Computer-Based Patient Record (CPR) ontology developed by Chimezie Ogbuji, Case Western Reserve University's Center for Clinical Investigation, previously Cleveland Clinics.


1]  Jay Levin referred to the HL7 standards as the "normalized, flexible way". Earlier in 2009, he and others from FDA made some initial statements on moving from CDISC's SDTM standards to HL7's CDA (Clinical Document Architecture) standard for submissions of clinical data. This was not well received by CDISC, nor by the representatives from pharma and CRO companies. During 2010 FDA and CDISC came to a common agreement on CDISC SDTM. (That is, the 40+ different containers with standardized variable names, and the evolving controlled terminologies.) See two posts on CDISC's blog: Clear Messages from FDA CDER and CBER and FDA CDER Data Standards Plan V 1.0 and PDUFA IV IT Plan Update.
2] Researching Interoperability using Core Reference Datasets and Ontologies for the Virtual Physiological Human (RICORDO)