The Metropolitan Museum of Art's CIDOC Mapping Project
From SEMUSE
Introduction
The Metropolitan Museum of Art is working on an experimental project to evaluate the use of semantic technologies for collection information management.
One major aspect, which I'll cover here, is mapping our existing data structures to an OWL ontology such as the CIDOC CRM (OWL implementation).
Background
Most of our data is stored in an application that is wrapped around a relational database (SQL Server). We have some institutional experience extracting data directly from this database, so it's not strictly a "black box" application.
The Implementation
D2RQ
The first step is getting the data out of the SQL database and into a rough RDF format. We are using D2R Server for this purpose. D2R Server relies on mapping files, which define how to turn tables and columns into RDF classes and properties, and then how to turn table rows and values into instances of those classes and properties. I won't get into the details of how it works (see the D2R pages), but I'll go into our implementation experience here.
- Implementation experience here
- Memory Issues
- uriPattern
- namespaces
For this implementation, we are creating a series of mapping files, one for each source table. I'll be posting those files here as I complete them.
- TODO : post D2R files here
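Until the real mapping files are posted, here is a rough Python sketch of the idea behind a per-table mapping: a class for the table, a URI pattern for its rows, and a property for each column. It is not D2RQ itself and not our actual mapping. The table, the column names, and the example.org namespaces are all made up for illustration. In a real D2RQ mapping file the URI pattern would be written with d2rq:uriPattern and @@Table.Column@@ placeholders.

```python
# Hypothetical sketch of what a table-to-RDF mapping does; this is not D2RQ itself.
BASE = "http://example.org/resource/"   # assumed base URI for generated resources
VOCAB = "http://example.org/vocab/"     # assumed namespace for the rough RDF vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One mapping per source table: a class, a URI pattern, and column-to-property bridges.
# The table and column names below are invented for illustration.
OBJECTS_MAPPING = {
    "class": VOCAB + "Object",
    "uri_pattern": "objects/{ObjectID}",  # filled in from the row's key column
    "properties": {"Title": VOCAB + "title", "Medium": VOCAB + "medium"},
}


def row_to_triples(row, mapping):
    """Turn one table row (a dict) into (subject, predicate, object) triples."""
    subject = BASE + mapping["uri_pattern"].format(**row)
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for column, prop in mapping["properties"].items():
        value = row.get(column)
        if value is not None:  # NULL columns produce no triple
            triples.append((subject, prop, str(value)))
    return triples


row = {"ObjectID": 42, "Title": "Example Vase", "Medium": None}
triples = row_to_triples(row, OBJECTS_MAPPING)
for triple in triples:
    print(triple)
```

Keeping one mapping per source table, as in the sketch, mirrors the one-file-per-table approach described above and makes each mapping easy to test on its own.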
Sesame
We're currently using Sesame beta2.2 with a Mulgara data store as our triple store. This may very well change in the future.
TopBraid Composer
We're using TopBraid Composer, Maestro Edition (TBC) for our knowledge management tasks. TBC is an IDE for RDF-based knowledge/ontology management. In this section I'll report on issues/bugs/workarounds, as well as best practices as I've experienced them, for this particular task.
- Namespaces:
- s2r file (connect to remote repository)
- inferencing
- SPARQLMotion
SPARQL Constructs
Using SPARQL CONSTRUCT statements, you can create new triples based on the presence and values of existing triples. I think this is a pretty clear way to do the mappings from the rough RDF to the CIDOC target ontology.
Fortunately, the CIDOC ontology has been broken down into sections (I'll call them "modules" for fun), which show how to express particular concepts with CIDOC: http://139.91.183.17:81/tiki/tiki-view_faq.php?faqId=13
So, I'll be breaking down the mapping process into corresponding modules. Each mapping module will consist of a series of SPARQL Construct statements. I'll be posting those mapping modules here.
- TODO : post mapping modules.
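To make the CONSTRUCT idea concrete before the real modules are posted, here is a small Python sketch that imitates a single rule: wherever a rough "title" triple exists, it emits the CIDOC-style triples for an object with a title. This is only an illustration, not one of our mapping modules. The vocab namespace, the CRM namespace URI, and the way the title node's URI is minted are all assumptions; in an actual CONSTRUCT statement the title node could just as well be a blank node.

```python
# Hypothetical sketch: a CONSTRUCT-style rule written as plain Python over triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
VOCAB = "http://example.org/vocab/"    # assumed namespace of the rough RDF
CRM = "http://example.org/cidoc-crm/"  # placeholder; use the namespace of your CRM OWL file


def construct_titles(triples):
    """For every (?obj vocab:title ?title) triple, build the CIDOC-style triples.

    Roughly corresponds to:
      CONSTRUCT { ?obj a crm:E22_Man-Made_Object ; crm:P102_has_title ?t .
                  ?t a crm:E35_Title ; rdfs:label ?title . }
      WHERE     { ?obj vocab:title ?title . }
    """
    new_triples = []
    for subject, predicate, obj in triples:
        if predicate == VOCAB + "title":
            title_node = subject + "/title"  # minted URI, an assumption of this sketch
            new_triples += [
                (subject, RDF_TYPE, CRM + "E22_Man-Made_Object"),
                (subject, CRM + "P102_has_title", title_node),
                (title_node, RDF_TYPE, CRM + "E35_Title"),
                (title_node, RDFS_LABEL, obj),
            ]
    return new_triples


rough = [
    ("http://example.org/resource/objects/42", VOCAB + "title", "Example Vase"),
    ("http://example.org/resource/objects/42", VOCAB + "medium", "Terracotta"),
]
mapped = construct_titles(rough)
for triple in mapped:
    print(triple)
```

Triples that match no rule (the "medium" triple here) are simply left out of the result, which is also how CONSTRUCT behaves: only what the template describes is created.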
Other Fun Stuff
- Using Google API to turn city/state/country information into geo:lat and geo:long
- TODO : post SPARQLMotion script
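In the meantime, here is a rough Python sketch of the geocoding step, separate from the SPARQLMotion script itself. It assumes the JSON form of the Google Geocoding web service (an address and key in the query string, coordinates under results[0].geometry.location) and the W3C WGS84 namespace for geo:lat and geo:long. The place URI, the API key placeholder, and the canned response are invented for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C WGS84 vocabulary (geo:lat, geo:long)
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"  # assumed endpoint


def build_request_url(city, state, country, api_key):
    """Join the available place parts into one address and build the request URL."""
    address = ", ".join(part for part in (city, state, country) if part)
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def response_to_triples(place_uri, response_text):
    """Turn a geocoding JSON response into geo:lat / geo:long triples."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return []  # nothing usable came back, so emit no triples
    location = data["results"][0]["geometry"]["location"]
    return [
        (place_uri, GEO + "lat", location["lat"]),
        (place_uri, GEO + "long", location["lng"]),
    ]


# Canned response with made-up coordinates, standing in for a real API call.
sample = (
    '{"status": "OK", "results": '
    '[{"geometry": {"location": {"lat": 40.7794, "lng": -73.9632}}}]}'
)
url = build_request_url("New York", "NY", "USA", "YOUR_API_KEY")
triples = response_to_triples("http://example.org/resource/places/1", sample)
print(url)
for triple in triples:
    print(triple)
```

Splitting URL building from response parsing keeps the parsing testable without network access or an API key.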
