Friday, September 13, 2013

Justifications of Mappings

A common theme in the Semantic Trilogy events in Montreal this summer (see Semantic Trilogy preparations and Semantic Trilogy report part 1) was mappings, such as those provided via the NCBO BioPortal.

For example, the mappings in BioPortal expressed as skos:closeMatch are the result of the LOOM lexical algorithm. Examples of poor mappings, such as this one, were highlighted:

<NCI Thesaurus: Chairperson (subclass to Person)> 
<skos:closeMatch> 
<Int. Classification for Patient Safety: Chair (subclass to Piece of Furniture)>

One view was: “Don’t use them!” (tweet). Another view was: “Give us the justification of the mappings so we can decide when it makes sense to use them.”
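
To make the second view more concrete, here is a minimal Python sketch of a mapping record that carries its justification, so that a consumer can filter on it. The `Mapping` class, the `NCIT:`/`ICPS:`/`EX1:`/`EX2:` prefixes, the justification labels and the second mapping are all assumptions made for illustration; none of them come from BioPortal or LOOM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str         # concept in the source terminology
    predicate: str      # e.g. "skos:closeMatch"
    target: str         # concept in the target terminology
    justification: str  # why the mapping was asserted (hypothetical labels)


mappings = [
    # The mapping quoted above; prefixes and the label are illustrative.
    Mapping("NCIT:Chairperson", "skos:closeMatch", "ICPS:Chair", "lexical match"),
    # A made-up mapping, included only for contrast.
    Mapping("EX1:Myocarditis", "skos:exactMatch", "EX2:Myocarditis", "manually curated"),
]


def usable(mappings, trusted):
    """Keep only mappings whose justification the consumer trusts."""
    return [m for m in mappings if m.justification in trusted]


kept = usable(mappings, trusted={"manually curated"})
for m in kept:
    print(m.source, m.predicate, m.target)
```

With the justification attached, the choice is no longer all or nothing: a consumer who distrusts purely lexical matches can drop the Chairperson/Chair mapping and still use the rest.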

Mappings in chemical informatics

When I came back from the Semantic Trilogy and read about mappings, or linksets as they are called, in the new version of the Open PHACTS specification "Dataset Descriptions for the Open Pharmacological Space", I saw some opportunities to make mappings more explicit and hence more useful.

I think the editor, Alasdair Gray (@gray_alasdair), and the whole team of authors, have done a great job on this specification.
"The Dataset Descriptions for the Open Pharmacological Space is a specification for the metadata to described datasets, and the linksets that relate them, to enable their use within the Open PHACTS discovery platform. The specification defines the metadata properties that are expected to describe datasets and linksets; detailing the creation and publication of the dataset."
I especially liked the part on making the justification of mappings explicit. For example, what is the justification behind stating that there is a close match (skos:closeMatch), or exact match (skos:exactMatch), between what is described in two different chemical datasets, such as the RDF datasets sourced from ChemSpider and ChEMBL?

The figure depicts four distinct linksets: two sourced from ChemSpider
depicted in blue which use different link predicates; one sourced from ChEMBL
depicted in red; and one sourced from a third party depicted in green.
My understanding is that for the chemical informatics community the Open PHACTS specification will establish a vocabulary to express the justifications for links/mappings between chemical entities. This enables them to explicitly state justifications such as "Has isotopically unspecified parent" or "Have the same InChI key" (see B.2 Link Justification Vocabulary Terms to also get the URIs for these terms).
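
As a rough illustration of a justified linkset, the Python sketch below links compounds from two datasets when they share an InChI key and records that reason on every link. The record identifiers and the `KEY-A`-style placeholders are invented, the choice of `skos:exactMatch` as link predicate is my assumption, and the plain text label stands in for the URI that the specification defines for this justification.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (dataset, compound identifier, InChI key placeholder).
records = [
    ("chemspider", "cs:1", "KEY-A"),
    ("chembl", "chembl:7", "KEY-A"),
    ("chemspider", "cs:2", "KEY-B"),
    ("chembl", "chembl:9", "KEY-C"),
]


def build_linkset(records, justification="Have the same InChI key"):
    """Link compounds from different datasets that share an InChI key."""
    by_key = defaultdict(list)
    for dataset, compound, key in records:
        by_key[key].append((dataset, compound))

    links = []
    for members in by_key.values():
        for (ds_a, a), (ds_b, b) in combinations(members, 2):
            if ds_a != ds_b:  # only link across datasets
                links.append((a, "skos:exactMatch", b, justification))
    return links


linkset = build_linkset(records)
for link in linkset:
    print(link)
```

Only `cs:1` and `chembl:7` share a key here, so the linkset contains a single link, and anyone reading it can see which rule produced it.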

Mappings between medical terminologies

Together with members of the EU projects EHR4CR and SALUS, MedDRA MSSO, and W3C HCLS, I am now exploring the idea of establishing a similar approach for the medical terminology community. That is, a vocabulary of terms to express the justifications for different mappings between concepts/terms in terminologies across healthcare and clinical research, such as ICD9, SNOMED CT and MedDRA.

This is part of a broader discussion on the use of terminologies in Semantic Web-focused environments, with formal representations in RDF of both the terminologies themselves and of the mappings between them. Here's an example of a visualization from such a formal representation of MedDRA and SNOMED-CT terms, and the mappings between them, in SKOS/RDF.

The example shows the hierarchy of cardiac disorders in both the MedDRA and
SNOMED-CT concept schemes, expressed using the skos:broader property. Mappings between
similar concepts in both concept schemes are stated using the skos:exactMatch property.
From:
SALUS Harmonized Ontology for Post Market Safety Studies
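
The structure in the figure can be sketched in a few lines of Python: two small concept hierarchies expressed with skos:broader, connected by a skos:exactMatch mapping. The concept identifiers below are invented placeholders, not real MedDRA or SNOMED-CT codes, and the triples are held in a plain list rather than in an RDF store.

```python
# Hypothetical SKOS-style triples; concept identifiers are invented.
triples = [
    ("meddra:Myocarditis", "skos:broader", "meddra:CardiacDisorders"),
    ("snomed:Myocarditis", "skos:broader", "snomed:HeartDisease"),
    ("meddra:Myocarditis", "skos:exactMatch", "snomed:Myocarditis"),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def ancestors(concept):
    """Follow skos:broader transitively within one concept scheme."""
    seen = []
    stack = [concept]
    while stack:
        for parent in objects(stack.pop(), "skos:broader"):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


def mapped_ancestors(concept):
    """For each asserted exact match of `concept`, list its ancestors.

    Only the asserted direction of skos:exactMatch is followed here.
    """
    return {m: ancestors(m) for m in objects(concept, "skos:exactMatch")}


print(mapped_ancestors("meddra:Myocarditis"))
```

Starting from a term in one scheme, the mapping leads to its counterpart in the other scheme, and from there the skos:broader links give the counterpart's place in that hierarchy.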