The Metropolitan Museum of Art's CIDOC Mapping Project

Introduction

The Metropolitan Museum of Art is working on an experimental project to evaluate the use of semantic technologies for collection information management.


One major aspect, which I'll cover here, is mapping our existing data structures to an OWL ontology such as the CIDOC CRM (OWL implementation).

Background

Most of our data is stored in an application built around a relational database (SQL Server). We have some institutional experience extracting data directly from the underlying tables, so it's not strictly a "black box" application.

The Implementation

D2RQ

The first step is getting the data out of the SQL database and into a rough RDF format. We are using D2R Server for this purpose. D2R Server relies on mapping files, which define how to turn tables and columns into RDF classes and properties, and then turn table rows and values into instances of those classes and properties. I won't get into the details of how it works (see the D2R pages), but I'll go into our implementation experience here.

  • Implementation experience here
  • Memory Issues
  • uriPattern
  • namespaces

For this implementation, we are creating a series of mapping files, one for each source table. I'll be posting those files here as I complete them.

  • TODO : post D2R files here
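
Until the real mapping files are posted, here is a minimal Python sketch of the idea behind them: one mapping per source table, a URI pattern that mints a subject URI for each row, and one property per column. It is not D2R Server itself and not our actual mapping. The table name, column names, base URI and vocabulary namespace are all made up for illustration; only the @@Table.Column@@ placeholder style is borrowed from D2RQ's uriPattern.

```python
import re
from urllib.parse import quote

# All names below are hypothetical, chosen only for illustration.
BASE = "http://example.org/met/"
VOCAB = "http://example.org/met/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One mapping per source table, loosely modeled on a D2RQ class map.
OBJECTS_MAP = {
    "table": "Objects",
    "uri_pattern": "objects/@@Objects.ObjectID@@",
    "rdf_class": VOCAB + "Object",
    "columns": {
        "Title": VOCAB + "title",
        "ObjectDate": VOCAB + "objectDate",
    },
}


def expand_uri_pattern(pattern, table, row):
    """Replace each @@Table.Column@@ placeholder with the row's value."""
    def fill(match):
        placeholder_table, column = match.group(1), match.group(2)
        if placeholder_table != table:
            raise ValueError("pattern refers to another table: " + placeholder_table)
        # Percent-encode the value so it is safe inside a URI.
        return quote(str(row[column]), safe="")

    return BASE + re.sub(r"@@(\w+)\.(\w+)@@", fill, pattern)


def row_to_triples(mapping, row):
    """Turn one table row into (subject, predicate, object) tuples."""
    subject = expand_uri_pattern(mapping["uri_pattern"], mapping["table"], row)
    triples = [(subject, RDF_TYPE, mapping["rdf_class"])]
    for column, prop in mapping["columns"].items():
        value = row.get(column)
        if value is not None:  # a NULL column produces no triple
            triples.append((subject, prop, str(value)))
    return triples


if __name__ == "__main__":
    sample_row = {"ObjectID": 1234, "Title": "Water Jar", "ObjectDate": None}
    for triple in row_to_triples(OBJECTS_MAP, sample_row):
        print(triple)
```

Keeping one mapping per table mirrors the one-file-per-table approach described above, so each table can be mapped and tested on its own.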


Sesame

We're currently using Sesame beta2.2 with a Mulgara data store as our triple store. This may very well change in the future.

TopBraid Composer

We're using TopBraid Composer, Maestro Edition (TBC) for our knowledge management tasks. TBC is an IDE for RDF-based knowledge/ontology management. In this section I'll report on issues/bugs/workarounds, as well as best practices as I've experienced them, for this particular task.

  • Namespaces:
  • s2r file (connect to remote repository)
  • inferencing
  • SPARQLMotion


SPARQL Constructs

Using SPARQL CONSTRUCT statements, you can create new triples based on the presence and values of existing triples. I think this is a pretty clear way to do the mappings from the rough RDF to the CIDOC target ontology.
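
To make the idea concrete, here is a small Python sketch of what one such CONSTRUCT does: it matches a pattern in the rough triples and emits a new set of triples, leaving the source untouched. The query in the comment is only an illustration, not one of our mapping modules. The vocab: and crm: namespaces are placeholders, the exact class and property names depend on which OWL implementation of the CIDOC CRM is used, and the sketch mints a URI where the query would use a blank node.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
VOCAB = "http://example.org/met/vocab#"   # hypothetical rough vocabulary
CRM = "http://example.org/cidoc-crm#"     # placeholder, not the real CRM namespace

# The function below imitates a query along these lines:
#
#   CONSTRUCT { ?obj a crm:E22_Man-Made_Object ;
#                    crm:P102_has_title _:t .
#               _:t a crm:E35_Title ;
#                   rdfs:label ?title . }
#   WHERE     { ?obj a vocab:Object ;
#                    vocab:title ?title . }


def construct_titles(graph):
    """Return new CRM-style triples for every typed object that has a title."""
    # WHERE clause, part 1: subjects typed as vocab:Object.
    objects = {s for s, p, o in graph if p == RDF_TYPE and o == VOCAB + "Object"}

    # WHERE clause, part 2: their vocab:title values, in a stable order.
    titles = defaultdict(list)
    for s, p, o in sorted(graph):
        if s in objects and p == VOCAB + "title":
            titles[s].append(o)

    # CONSTRUCT template: one title node per matched title.
    result = set()
    for s, values in titles.items():
        result.add((s, RDF_TYPE, CRM + "E22_Man-Made_Object"))
        for index, value in enumerate(values, start=1):
            node = f"{s}/title/{index}"  # minted URI standing in for the blank node
            result.add((s, CRM + "P102_has_title", node))
            result.add((node, RDF_TYPE, CRM + "E35_Title"))
            result.add((node, RDFS_LABEL, value))
    return result


if __name__ == "__main__":
    rough = {
        ("http://example.org/met/objects/1234", RDF_TYPE, VOCAB + "Object"),
        ("http://example.org/met/objects/1234", VOCAB + "title", "Water Jar"),
    }
    for triple in sorted(construct_titles(rough)):
        print(triple)
```

Because the result is a separate set of triples, the rough RDF can stay as it came out of the database while the mapped triples are loaded alongside it or into their own graph.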

Fortunately, the CIDOC ontology has been broken down into sections (I'll call them "modules" for fun), which show how to express particular concepts with CIDOC: http://139.91.183.17:81/tiki/tiki-view_faq.php?faqId=13

So I'll be breaking the mapping process down into corresponding modules. Each mapping module will consist of a series of SPARQL CONSTRUCT statements. I'll be posting those mapping modules here.

  • TODO : post mapping modules.


Other Fun Stuff

  • Using Google API to turn city/state/country information into geo:lat and geo:long
    • TODO : post SPARQLMotion script
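
Until the SPARQLMotion script is posted, here is a rough Python sketch of the same idea: join city/state/country into one address string, ask a geocoder for coordinates, and write the answer as geo:lat and geo:long triples (W3C Basic Geo vocabulary). This is not the script itself. The sketch assumes a JSON response shaped like the one returned by Google's Geocoding web service, which may not be the API version the script uses, and the sample response and subject URI are hand-written for illustration so that nothing here touches the network.

```python
from urllib.parse import urlencode

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
# Assumed endpoint: Google's Geocoding web service (JSON output).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_address(city=None, state=None, country=None):
    """Join whichever location parts are present into one query string."""
    return ", ".join(part for part in (city, state, country) if part)


def build_request_url(address, api_key):
    """Build the request URL; api_key is whatever key you have been issued."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def geo_triples(subject, response):
    """Turn a parsed geocoder response into geo:lat / geo:long triples."""
    if response.get("status") != "OK" or not response.get("results"):
        return []  # no match: emit nothing rather than guess
    location = response["results"][0]["geometry"]["location"]
    return [
        (subject, GEO + "lat", str(location["lat"])),
        (subject, GEO + "long", str(location["lng"])),
    ]


# Hand-written sample in the assumed response shape, for illustration only.
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [
        {"geometry": {"location": {"lat": 40.7127753, "lng": -74.0059728}}}
    ],
}

if __name__ == "__main__":
    address = build_address("New York", "NY", "USA")
    print(build_request_url(address, "YOUR_API_KEY"))
    for triple in geo_triples("http://example.org/met/places/new-york", SAMPLE_RESPONSE):
        print(triple)
```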