Extending Fedora

One of the technology developments of the Project is a set of Java libraries that extend some of Fedora's semantic functionality without modifying the distribution itself. This way, people can keep a standard Fedora installation and simply add the extension libraries. On the other hand, the extension libraries require the out-of-the-box Fedora distribution to be combined with a standalone Mulgara installation, instead of the embedded version shipped with the repository.

Brief overview of Fedora RI

Fedora provides several data indexing mechanisms, such as Dublin Core metadata indexing and full-text indexing with search engines such as Apache Lucene or Solr (which is based on Lucene), among others. We are particularly interested in the Resource Index (RI). The RI uses the Mulgara triplestore as its index database and allows indexes to be created from various types of information in the repository: Dublin Core annotations, object relationship information and even full-text indexes in the form of RDF graphs.

This module makes it possible to express object relationships, based on the Fedora object model, in a machine-readable way so that the relationship information can be indexed in the triplestore. This information builds on RDF standards and vocabularies developed within the Semantic Web community, such as RDF Schema (RDFS), but it also includes its own relationship ontology, developed as an extension of the RDFS ontology, which can express more Fedora-specific collection/resource relationships such as ‘isMemberOf’, ‘hasMember’, ‘isAnnotationOf’, etc.
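To make the idea of a machine-readable relationship concrete, the following minimal Java sketch formats one such relationship as an N-Triples statement. It assumes the usual Fedora conventions of `info:fedora/<PID>` object URIs and the `relations-external` namespace for relationship predicates; the class name and the PIDs `demo:1` and `demo:collection` are hypothetical and used for illustration only.

```java
class RelationshipTriple {

    // Namespace assumed for the Fedora relationship ontology predicates.
    static final String RELS_EXT = "info:fedora/fedora-system:def/relations-external#";

    // Formats one object-to-object relationship as an N-Triples statement.
    static String toNTriple(String subjectPid, String relation, String objectPid) {
        return "<info:fedora/" + subjectPid + "> "
                + "<" + RELS_EXT + relation + "> "
                + "<info:fedora/" + objectPid + "> .";
    }

    public static void main(String[] args) {
        // demo:1 and demo:collection are hypothetical PIDs.
        System.out.println(toNTriple("demo:1", "isMemberOf", "demo:collection"));
    }
}
```

A statement of this shape is what ends up in the triplestore once the relationship has been indexed.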

Fedora includes a search interface built on the RI, through which users can query the triplestore (for example with SPARQL or with simple subject, predicate, object patterns, SPO) and obtain information from the repository in various output formats such as N3 or RDF/XML.
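As a rough illustration of how a client might call this search interface over HTTP, the sketch below builds a query URL in Java. It assumes the Fedora 3.x convention of an endpoint at `<base>/risearch` that accepts `type`, `lang`, `format` and `query` parameters; the class name and the base URL `http://localhost:8080/fedora` are hypothetical and should be adapted to the actual installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RiSearchUrl {

    // Builds a GET URL for the RI search interface.
    // Assumption: the endpoint follows the Fedora 3.x layout
    // (<base>/risearch with type, lang, format and query parameters).
    static String build(String fedoraBase, String lang, String format, String query) {
        return fedoraBase + "/risearch"
                + "?type=triples"
                + "&lang=" + encode(lang)
                + "&format=" + encode(format)
                + "&query=" + encode(query);
    }

    // Percent-encodes a parameter value for use in a query string.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // SPO pattern: every triple whose predicate is dc:title.
        String query = "* <http://purl.org/dc/elements/1.1/title> *";
        System.out.println(build("http://localhost:8080/fedora", "spo", "N-Triples", query));
    }
}
```

The resulting URL can be fetched with any HTTP client; changing the `format` argument selects a different serialization of the same result.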

Example of a simple SPARQL query that retrieves from the RI the RDF statements derived from the Dublin Core metadata annotations in the repository:

	# Creates RDF description records
	# containing dc:title and dc:creator statements
	# for all the objects in the repository
	prefix dc: <http://purl.org/dc/elements/1.1/>
	construct {
		?s dc:title ?o ;
		   dc:creator ?o1 .
	}
	from <http://my-fedora-repository#ri>
	where {
		?s dc:title ?o .
		?s dc:creator ?o1 .
	}

Example using Fedora RI Search Interface

Multiple output formats can be selected: RDF/XML, N3, Turtle, etc.

Double Loop Extension

The Fedora RI is a very good starting point for those interested in using triplestores and exposing their data in semantic-ready formats. It is also a good search mechanism, but one of its limitations is that it only allows Dublin Core metadata records and object relationship information to be indexed. Overcoming this limitation is one of the main aims of our extension libraries, which index the previously mentioned sets of information and can also add other types of information present in the repository, such as metadata annotations from other standards (e.g. DCMI Terms) or even complete datasets available in suitable formats such as RDF.

The approach implemented keeps the significant functionality of the Fedora RI while addressing its current limitations, for example by aggregating semantic-ready data into the triplestore. Furthermore, by using a standalone instance of Mulgara we are able to aggregate new datasets in the triplestore, to query the datasets coming from the repository in combination with data from other sources and, lastly, to use inferencing mechanisms to add new statements to the datastore, using inference engines such as the ones provided by Mulgara or third-party ones like the Jena framework.
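To illustrate what adding new statements through inferencing means, here is a minimal, self-contained Java sketch of a single rule. It is a plain-Java stand-in and does not use the Mulgara or Jena inference APIs. It assumes, for illustration only, that ‘hasMember’ is treated as the inverse of ‘isMemberOf’; the class name, the shortened predicate names and the PIDs are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InverseRuleSketch {

    static final String IS_MEMBER_OF = "isMemberOf";
    static final String HAS_MEMBER = "hasMember";

    // Returns the asserted statements plus one inferred (o hasMember s)
    // statement for every asserted (s isMemberOf o) statement.
    // Each statement is a three-element list: subject, predicate, object.
    static Set<List<String>> infer(Set<List<String>> asserted) {
        Set<List<String>> result = new LinkedHashSet<>(asserted);
        for (List<String> t : asserted) {
            if (IS_MEMBER_OF.equals(t.get(1))) {
                result.add(List.of(t.get(2), HAS_MEMBER, t.get(0)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> asserted = new LinkedHashSet<>();
        // Hypothetical statements standing in for data from the repository.
        asserted.add(List.of("demo:1", IS_MEMBER_OF, "demo:collection"));
        asserted.add(List.of("demo:1", "dc:title", "First object"));
        for (List<String> t : infer(asserted)) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

A real inference engine applies many such rules, typically driven by an ontology rather than hard-coded predicates, but the effect on the datastore is the same: statements that were never asserted become available to queries.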