Muninn Documents Ontology Specification - 0.01

The Muninn ontology to specify and work with documents.

Working Draft — 19 February 2012

This version:
http://rdf.muninn-project.org/ontologies/documents-0.01.html (owl)
Latest version:
http://rdf.muninn-project.org/ontologies/documents.html (owl)
Last Update: 0.01
Date: 19 February 2012
Authors:
Robert Warren, The Muninn Project

Abstract

This specification defines a set of classes and properties used to represent documents and their structure.

Status of this Document

This document is based on the current practices used by The Muninn Project to represent documents and digitally scanned images within the catalog.

Table of Contents

  1. Introduction
    1. Ontology Objectives
    2. Ontology Limitations
  2. Ontological constructs at a glance
  3. Ontology use cases
  4. Cross-reference for classes and properties
  5. Conclusion and future work

Introduction

This project was created as a result of some of the data and modeling problems encountered in the Muninn WW1 Project. The ontology records the bibliographic, provenance and digitization information of archival documents. It provides classes that are subclasses of FOAF classes for better compatibility, while adding some functionality needed to manage documents.

The major differences from the core FOAF classes have to do with properties that support document pages, the digital representation of each page, digital rights and forms.

Ontology Objectives

The initial design objectives for the ontology were:

  1. The ability to mark up a document with the minimum amount of structure required for analysis.
  2. Provide some provenance hooks for the source archive and re-distribution information.
  3. Provide linkages between the original document and its transcribed and transformed form.
  4. The ability to identify certain pages of the document as a filled out form.

Ontology Limitations

The ontology is a limited representation of the full document structure in that it represents the physical parts of the document and not its logical organization. The page properties reflect this through their previous_page and next_page organization.

Ontological constructs at a glance

Class organization revolves around a Document class that is the union of the FOAF Document class and the Creative Commons Works class for maximal compatibility. A Page class provides a container for each page of the document, whose digitized image is in turn provided by an Image class subclassed from the FOAF Image class. The Form class is another subclass of Document (as some forms tend to be full-blown documents themselves) but with additional properties intended to document their use and issuing organization.
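The class organization above can be mirrored with plain Python data classes, as a sketch for readers who want to prototype against the model. This is a hypothetical in-memory mirror, not part of the ontology: field names follow the ontology's property names (content, depiction, format, x_pixels, y_pixels, size_bytes), while the `issuer` field on Form is an assumption standing in for the form's issuing-organization properties, which this specification does not name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Image:
    """Mirrors documents:Image: one digitized image of a page."""
    uri: str
    format: str = "image/png"   # documents:format (MIME type)
    x_pixels: int = 0           # documents:x_pixels
    y_pixels: int = 0           # documents:y_pixels
    size_bytes: int = 0         # documents:size_bytes


@dataclass
class Page:
    """Mirrors documents:Page: one side of a physical sheet."""
    uri: str
    depiction: Optional[Image] = None   # documents:depiction


@dataclass
class Document:
    """Mirrors documents:Document: the modeled physical document."""
    uri: str
    title: str = ""                                      # documents:title
    content: List[Page] = field(default_factory=list)    # documents:content (unordered)


@dataclass
class Form(Document):
    """A Form is itself a Document; 'issuer' is a hypothetical field."""
    issuer: Optional[str] = None
```

Making Form inherit from Document keeps the "forms are full documents" relationship explicit: every function written for a Document also accepts a Form.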

The ontology makes extensive use of subclassing because a document has several layers, and their different representations each require their own meta-data. For example, while some works are clearly in the public domain, some archival organizations claim copyright on the photograph of the document itself. This layering also allows the representation of situations where the contents of the document have been transcribed or OCR'd from a source image that is not publicly available.

Documents, Document Models and Representations.

The markup described by this ontology is itself a document that models a physical document whose pages are represented by a set of images. To avoid confusion, we use the term modeled document to refer to the complete physical document, while a representation refers to a single digital image of a part of the physical document.

Document copyright, access control and licensing

Handling documents from multiple sources sometimes requires different levels of access control for different representations of the same document. It is common for a historical document to be owned by a museum and for its contents to be out of copyright, while the images of the document are copyrighted by the photographer.

This ontology provides the combined properties of FOAF, CC Works and some properties borrowed from Dublin Core. Together they provide a powerful description markup for the different parts of the document. The initial properties on the document are:

These properties are the cornerstone for defining ownership (and authority) over the document. The date information is important because it marks the beginning of the countdown after which the copyright expires.
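As a minimal sketch of that countdown, the functions below add a copyright term to a date_copyrighted value. The term length is a caller-supplied assumption: this specification does not state the term for any of the copyright-act instances it lists, and real terms depend on jurisdiction, the author's lifetime and the act in force.

```python
from datetime import date


def copyright_expiry(date_copyrighted: date, term_years: int) -> date:
    """Return the date on which an assumed copyright term ends.

    term_years is supplied by the caller; it is not defined by the ontology.
    The term is counted from the copyright date itself, although some
    jurisdictions instead run terms to the end of the calendar year.
    """
    try:
        return date_copyrighted.replace(year=date_copyrighted.year + term_years)
    except ValueError:
        # 29 February with no matching leap day in the target year.
        return date(date_copyrighted.year + term_years, 3, 1)


def is_expired(date_copyrighted: date, term_years: int, today: date) -> bool:
    """True once 'today' has reached the computed expiry date."""
    return today >= copyright_expiry(date_copyrighted, term_years)
```

For example, `is_expired(date(1916, 7, 1), 50, date(2012, 2, 19))` is true under an assumed 50-year term.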

Within the Creative Commons Works class the following properties are available:

The CC Work class makes use of these last two properties, which are inverses of each other. The ontology also has an additional property, derived from Dublin Core accessRights, which combines both properties into a single directive.

Currently the access control and rights properties of the ontology are only used to mark up the information. However, a future direction is to use a reasoner on this information to infer the appropriate distribution and access rights based on the date, location and original licensing of the document.
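That inference is not implemented yet. The following is only a hypothetical, rule-based sketch of the kind of decision a reasoner might reach for one layer of a document (the text, or one photograph of it). It returns one of the permission URIs used in this specification (cc:Distribution or the Restricted instance). The rule itself and the `licensed_for_distribution` flag are assumptions.

```python
from datetime import date

CC_DISTRIBUTION = "http://creativecommons.org/ns#Distribution"
RESTRICTED = "http://rdf.muninn-project.org/ww1/2011/11/11/Permission/Restricted"


def infer_access(expiry: date, licensed_for_distribution: bool, today: date) -> str:
    """Hypothetical rule for a single layer of a document.

    Share the content if its copyright has expired or the rights holder has
    licensed redistribution; otherwise restrict it.
    """
    if today >= expiry or licensed_for_distribution:
        return CC_DISTRIBUTION
    return RESTRICTED
```

Applying the rule separately to each layer matches the museum example above: an out-of-copyright text can be distributed while a photographer's image of it stays restricted.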

Classes: Document, Image, Page

Properties: accessRights, author, back_page, content, date_copyrighted, date_created, date_digitized, date_published, date_retrieved, depiction, description, editor, filled_out_form, format, front_page, location, next_page, pages, previous_page, publisher, size_bytes, source, title, url, x_pixels, y_pixels

Instances: Australia_Copyright_Act_1912, Australian_Crown_Copyright, British_Copyright_Act_1911, British_Crown_Copyright, Canadian_Crown_Copyright, ForwardToOriginal, ForwardToPublisher, German_Empire_Copyright_Act, Newfoundland_Crown_Copyright, Restricted, US_Copyright_Act_of_1909

Documents ontology overview

An example is presented here.

3.1. Example

Here is a very basic document describing an image:

        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:documents="http://rdf.muninn-project.org/ontologies/documents#">
         <documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34">
          <documents:size_bytes>39475</documents:size_bytes>
          <documents:x_pixels>256</documents:x_pixels>
          <documents:y_pixels>256</documents:y_pixels>
          <documents:format>image/png</documents:format>
          <documents:thumbnail>
           <documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail"/>
          </documents:thumbnail>
          <!-- Wikimedia isn't on dbpedia -->
          <documents:source rdf:resource="http://commons.wikimedia.org/wiki/File:Crystal_Project_My_documents.png"/>
          <documents:date_retrieved rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2012-02-12</documents:date_retrieved>
          <documents:creator rdf:resource="http://dbpedia.org/page/Everaldo_Coelho"/>
          <documents:license rdf:resource="http://rdf.muninn-project.org/ontologies/documents#lgpl"/>
          <documents:date_published rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2007-06-16</documents:date_published>
          <documents:description>An icon from the Crystal Project icon theme.</documents:description>
          <!-- Webserver will serve copy of picture to anyone. -->
          <documents:accessRights rdf:resource="http://creativecommons.org/ns#Distribution"/>
         </documents:Image>
        </rdf:RDF>
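The same kind of description can also be produced programmatically. This sketch uses only Python's standard xml.etree.ElementTree module. The helper name `image_description` is hypothetical, and it emits only a subset of the properties above (size_bytes, x_pixels, y_pixels, format, description), wrapped in an rdf:RDF element with the namespace declarations an RDF/XML parser needs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOC = "http://rdf.muninn-project.org/ontologies/documents#"

# Serialize with the familiar rdf: and documents: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("documents", DOC)


def image_description(uri, size_bytes, x_pixels, y_pixels, mime, description):
    """Build an RDF/XML string holding one documents:Image description."""
    root = ET.Element(f"{{{RDF}}}RDF")
    image = ET.SubElement(root, f"{{{DOC}}}Image", {f"{{{RDF}}}about": uri})
    for name, value in [
        ("size_bytes", size_bytes),
        ("x_pixels", x_pixels),
        ("y_pixels", y_pixels),
        ("format", mime),
        ("description", description),
    ]:
        ET.SubElement(image, f"{{{DOC}}}{name}").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Generating the markup, rather than writing it by hand, avoids unbalanced tags and misspelled attributes such as rdf:resource.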
      

Muninn Specific Implementation details

The Muninn RDF server makes extensive use of HTTP content negotiation to provide the end user with exactly the type of content that is required. A few extensions are available to force certain behaviours on the part of the server.
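For instance, a client can ask for the RDF description of a resource through the Accept header. This sketch only builds the request with Python's urllib.request and does not send it. It assumes the server honours application/rdf+xml, since this specification does not list the media types it negotiates; `rdf_request` is a hypothetical helper name.

```python
import urllib.request


def rdf_request(uri: str, media_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF via the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


# To actually fetch: urllib.request.urlopen(rdf_request(uri)).read()
```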

RDF representation only

While a SPARQL server will always return the appropriate content to a SPARQL client, it is sometimes convenient to force the server to send only the RDF metadata contents.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/about.rdf
     

Image in original format

The full image, rather than its RDF metadata, can be requested in its original format by requesting this specific name.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/image
     

Thumbnail in original format

A 130x100 thumbnail image can be requested in the original format by requesting this specific name.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail
     

Image in specified format

The image can be requested in a specific graphics format by using the appropriate extension (e.g. image.jpg for JPG, image.png for PNG). This conversion is also done automatically if the HTTP headers request a specific format.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/image.ext
     

Thumbnail in specified format

A 130x100 thumbnail image can be requested in a specific graphics format by using the appropriate extension (e.g. thumbnail.jpg for JPG, thumbnail.png for PNG). This conversion is also done automatically if the HTTP headers request a specific format.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail.ext
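The naming conventions above can be collected into a small helper. This is a hypothetical convenience function, not part of the Muninn server: it only appends the suffixes shown in this section (about.rdf, image, thumbnail, and their .ext forms) to a resource URI.

```python
from typing import Optional

# Suffixes taken from the URL patterns listed in this section.
SUFFIXES = {"rdf": "about.rdf", "image": "image", "thumbnail": "thumbnail"}


def muninn_url(resource_uri: str, variant: str, ext: Optional[str] = None) -> str:
    """Return the URL that forces a given representation of an image resource.

    variant is one of 'rdf', 'image' or 'thumbnail'; ext (e.g. 'jpg', 'png')
    requests a specific graphics format and only applies to images.
    """
    if variant not in SUFFIXES:
        raise ValueError(f"unknown variant: {variant}")
    if ext and variant == "rdf":
        raise ValueError("an extension only applies to image or thumbnail")
    name = SUFFIXES[variant] + (f".{ext}" if ext else "")
    return resource_uri.rstrip("/") + "/" + name
```

For example, `muninn_url("http://rdf.muninn-project.org/ww1/2011/11/11/image/34", "thumbnail", "jpg")` yields the thumbnail.jpg form of the last URL above.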
     

Cross-reference for classes and properties

Class: documents:Document

URI: http://rdf.muninn-project.org/ontologies/documents#Document

A Digital Document - A digital document made up of 1 or more pages.

sub-class-of:
http://xmlns.com/foaf/spec/Document
in-domain-of:
documents:filled_out_form
documents:pages
documents:title
documents:date_published
documents:date_retrieved
documents:date_digitized
documents:date_created
documents:date_copyrighted
documents:publisher
documents:description
documents:author
documents:editor
documents:source
documents:content
documents:accessRights
in-range-of:
documents:source

No detailed documentation for this term.


Class: documents:Image

URI: http://rdf.muninn-project.org/ontologies/documents#Image

A Digital Image - A digital image

sub-class-of:
http://xmlns.com/foaf/spec/Image
in-domain-of:
documents:format
documents:x_pixels
documents:y_pixels
documents:size_bytes
documents:url
documents:location
in-range-of:
documents:depiction

The Image class is a sub-class of both the FOAF Image class and the Documents Document class, which provides the full set of markup properties to each image. This makes the full bibliographic power of the document properties available for images.


Class: documents:Page

URI: http://rdf.muninn-project.org/ontologies/documents#Page

Page - A page of a document, may be double-sided.

sub-class-of:
documents:Document
in-domain-of:
documents:page_number
documents:depiction
documents:back_page
documents:front_page
documents:next_page
documents:previous_page
in-range-of:
documents:content
documents:back_page
documents:front_page
documents:next_page
documents:previous_page

A Page in this context represents one side of a physical sheet of paper. Several properties allow navigation from one page to the next and between the front and back of a sheet. The organization of these properties closely mirrors that of the physical document being digitized in order to support the analysis of the document.
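As a sketch of how these links support navigation, the code below walks the chain of next_page links and finds the other side of a sheet through front_page and back_page. The links are held in a plain Python dictionary keyed by (page, property), and the page identifiers in any data passed to it are hypothetical. Because the cross-reference lists front_page as the inverse of back_page, either direction may be the one actually recorded.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical link store: (page, property) -> page.
Links = Dict[Tuple[str, str], str]


def reading_order(links: Links, first: str) -> List[str]:
    """Follow next_page from the first page until no successor is recorded."""
    order: List[str] = []
    seen = set()
    page: Optional[str] = first
    while page is not None and page not in seen:   # guard against cycles
        order.append(page)
        seen.add(page)
        page = links.get((page, "next_page"))
    return order


def other_side(links: Links, page: str) -> Optional[str]:
    """Return the opposite side of the same sheet, if it was recorded.

    front_page and back_page are inverses, so also search for links that
    point at this page from the other side.
    """
    side = links.get((page, "back_page")) or links.get((page, "front_page"))
    if side:
        return side
    for (subject, prop), obj in links.items():
        if obj == page and prop in ("back_page", "front_page"):
            return subject
    return None
```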


Property: documents:accessRights

URI: http://rdf.muninn-project.org/ontologies/documents#accessRights

Access Rights - Meant as a complement to the rights property

OWL Type:
ObjectProperty
sub-property-of:
dct:accessRights
Domain:
documents:Document
Range:
cc:Permission

No detailed documentation for this term.


Property: documents:author

URI: http://rdf.muninn-project.org/ontologies/documents#author

Original author(s) of Document. Might have more than one. -

OWL Type:
ObjectProperty
sub-property-of:
dc:creator
Domain:
documents:Document
Range:
owl:Thing

No detailed documentation for this term.


Property: documents:back_page

URI: http://rdf.muninn-project.org/ontologies/documents#back_page

Back Page - The side of the physical piece of paper that should be read last.

OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.


Property: documents:content

URI: http://rdf.muninn-project.org/ontologies/documents#content

A page of the document - no ordering on property. -

OWL Type:
ObjectProperty
Domain:
documents:Document
Range:
documents:Page

This property acts as the link between the model of the document and one of its digitized representations. There is no ordering on these properties, and multiple content properties may exist for the same document if more than one representation of it exists.


Property: documents:date_copyrighted

URI: http://rdf.muninn-project.org/ontologies/documents#date_copyrighted

Copyrighted Date - Copyright date is synonymous with creation date in most cases. Used primarily to support copyright status and distribution permissions.

OWL Type:
ObjectProperty
sub-property-of:
dct:dateCopyrighted
Domain:
documents:Document
Range:
time:TemporalEntity

No detailed documentation for this term.


Property: documents:date_created

URI: http://rdf.muninn-project.org/ontologies/documents#date_created

Creation Date - Date this *record* was created. (unstable / non-standard)

OWL Type:
ObjectProperty
sub-property-of:
dct:created
Domain:
documents:Document
Range:
time:TemporalEntity

No detailed documentation for this term.


Property: documents:date_digitized

URI: http://rdf.muninn-project.org/ontologies/documents#date_digitized

Digitization Date - Date that this document was digitized from a physical representation.

OWL Type:
ObjectProperty
Domain:
documents:Document
Range:
time:TemporalEntity

No detailed documentation for this term.


Property: documents:date_published

URI: http://rdf.muninn-project.org/ontologies/documents#date_published

Publication Date - Date of publication of the document being modeled.

OWL Type:
ObjectProperty
Domain:
documents:Document
Range:
time:TemporalEntity

No detailed documentation for this term.


Property: documents:date_retrieved

URI: http://rdf.muninn-project.org/ontologies/documents#date_retrieved

Date Retrieved - Date that the modeled document was retrieved from another source (e.g. downloaded from a web server).

OWL Type:
ObjectProperty
Domain:
documents:Document
Range:
time:TemporalEntity

No detailed documentation for this term.


Property: documents:depiction

URI: http://rdf.muninn-project.org/ontologies/documents#depiction

Imaged copy of the page. -

OWL Type:
ObjectProperty
sub-property-of:
foaf:depiction
Domain:
documents:Page
Range:
documents:Image

No detailed documentation for this term.


Property: documents:description

URI: http://rdf.muninn-project.org/ontologies/documents#description

Description -

OWL Type:
ObjectProperty
sub-property-of:
dc:description
Domain:
documents:Document
Range:
owl:Thing

No detailed documentation for this term.


Property: documents:editor

URI: http://rdf.muninn-project.org/ontologies/documents#editor

Editor - Original editor(s) of Document. Might have more than one.

OWL Type:
ObjectProperty
sub-property-of:
dc:contributor
Domain:
documents:Document
Range:
owl:Thing

No detailed documentation for this term.


Property: documents:filled_out_form

URI: http://rdf.muninn-project.org/ontologies/documents#filled_out_form

Form - This document is a filled-out form.

OWL Type:
DatatypeProperty
Domain:
documents:Document
Range:
owl:Thing

No detailed documentation for this term.


Property: documents:format

URI: http://rdf.muninn-project.org/ontologies/documents#format

Mime-Type -

OWL Type:
DatatypeProperty
sub-property-of:
dct:MediaType
Domain:
documents:Image
Range:
xsd:string

No detailed documentation for this term.


Property: documents:front_page

URI: http://rdf.muninn-project.org/ontologies/documents#front_page

Front Page - The side of the physical piece of paper that should be read first.

Inverse:
documents:back_page
OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.


Property: documents:location

URI: http://rdf.muninn-project.org/ontologies/documents#location

Location -

OWL Type:
ObjectProperty
Domain:
documents:Image
Range:
documents:location

No detailed documentation for this term.


Property: documents:next_page

URI: http://rdf.muninn-project.org/ontologies/documents#next_page

Next Page - The next physical page that a human reader would read, even if blank. This implies that this property is pointing to a page that has no front_page property.

OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.


Property: documents:pages

URI: http://rdf.muninn-project.org/ontologies/documents#pages

Pages - Total number of pages (single- or double-sided) in the document. This count may or may not match the number of content properties if both one-sided and double-sided pages are present.

OWL Type:
DatatypeProperty
Domain:
documents:Document
Range:
xsd:decimal

No detailed documentation for this term.


Property: documents:previous_page

URI: http://rdf.muninn-project.org/ontologies/documents#previous_page

Previous Page - The previous physical page that a human reader would have just read.

OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.


Property: documents:publisher

URI: http://rdf.muninn-project.org/ontologies/documents#publisher

Original publisher of the modeled document; may be different from the publisher of the digital copy of the document's images. -

OWL Type:
ObjectProperty
sub-property-of:
dc:publisher
Domain:
documents:Document
Range:
owl:Thing

No detailed documentation for this term.


Property: documents:size_bytes

URI: http://rdf.muninn-project.org/ontologies/documents#size_bytes

Size of image in 8-bit bytes. -

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:integer

No detailed documentation for this term.


Property: documents:source

URI: http://rdf.muninn-project.org/ontologies/documents#source

Source - Original source of the Document. Most likely a URL or organization.

OWL Type:
ObjectProperty
sub-property-of:
dc:source
Domain:
documents:Document
Range:
documents:Document

No detailed documentation for this term.


Property: documents:title

URI: http://rdf.muninn-project.org/ontologies/documents#title

Title -

OWL Type:
DatatypeProperty
sub-property-of:
dc:title
Domain:
documents:Document
Range:
xsd:string

No detailed documentation for this term.


Property: documents:url

URI: http://rdf.muninn-project.org/ontologies/documents#url

Image URL - Shortcut to the full URL of the image resource.

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:anyURI

No detailed documentation for this term.


Property: documents:x_pixels

URI: http://rdf.muninn-project.org/ontologies/documents#x_pixels

Width of image in Pixels. -

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:integer

No detailed documentation for this term.


Property: documents:y_pixels

URI: http://rdf.muninn-project.org/ontologies/documents#y_pixels

Height of image in Pixels. -

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:integer

No detailed documentation for this term.
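The format, x_pixels, y_pixels and size_bytes values of an Image can usually be read from the image file itself. This sketch handles PNG only. It relies on the PNG layout: an 8-byte signature followed by an IHDR chunk whose first eight data bytes hold the width and height as big-endian 32-bit integers. The function name and the returned dictionary layout are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_image_properties(data: bytes) -> dict:
    """Return documents:Image property values for the bytes of a PNG file."""
    # Bytes 8-11 are the IHDR length, 12-15 the chunk type, 16-23 width/height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return {
        "format": "image/png",    # documents:format
        "x_pixels": width,        # documents:x_pixels
        "y_pixels": height,       # documents:y_pixels
        "size_bytes": len(data),  # documents:size_bytes
    }
```

Typical use is `png_image_properties(open(path, "rb").read())` on a downloaded image.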


Instance: ForwardToOriginal

URI: http://rdf.muninn-project.org/ww1/2011/11/11/Permission/ForwardToOriginal

Forward to original copy - Forward to source URL

RDF Type:
http://creativecommons.org/ns#Permission

No detailed documentation for this term.


Instance: ForwardToPublisher

URI: http://rdf.muninn-project.org/ww1/2011/11/11/Permission/ForwardToPublisher

Forward To Publisher - Webserver will forward requests for content to the publisher or source organization.

RDF Type:
http://creativecommons.org/ns#Permission

No detailed documentation for this term.


Instance: Restricted

URI: http://rdf.muninn-project.org/ww1/2011/11/11/Permission/Restricted

Restricted Access - Webserver will not share content.

RDF Type:
http://creativecommons.org/ns#Permission

No detailed documentation for this term.
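To illustrate how a web server might act on these permission instances, the sketch below maps each one to an HTTP status code and an optional redirect target. The specific status codes (302 for forwarding, 403 for Restricted, 200 for cc:Distribution) and the `respond` signature are assumptions. This specification only describes the intended behaviour in words.

```python
from typing import Optional, Tuple

BASE = "http://rdf.muninn-project.org/ww1/2011/11/11/Permission/"
FORWARD_TO_ORIGINAL = BASE + "ForwardToOriginal"
FORWARD_TO_PUBLISHER = BASE + "ForwardToPublisher"
RESTRICTED = BASE + "Restricted"
CC_DISTRIBUTION = "http://creativecommons.org/ns#Distribution"


def respond(permission: str, source_url: Optional[str] = None,
            publisher_url: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location or None) for a content request."""
    if permission == CC_DISTRIBUTION:
        return 200, None                 # serve the local copy to anyone
    if permission == FORWARD_TO_ORIGINAL and source_url:
        return 302, source_url           # redirect to the source URL
    if permission == FORWARD_TO_PUBLISHER and publisher_url:
        return 302, publisher_url        # redirect to the publisher or source organization
    return 403, None                     # Restricted, or no forwarding target known
```

Falling back to 403 when a forwarding target is missing errs on the side of not sharing content whose rights are unclear.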


Conclusion and Future Work

The ontology is a limited representation of an archival document.