Muninn Documents Ontology Specification - 1.1

The Muninn ontology to specify and work with documents.

Working Draft — 15 March 2012

This version:
http://rdf.muninn-project.org/ontologies/documents-20120315.html (owl)
Latest version:
http://rdf.muninn-project.org/ontologies/documents.html (owl)
Previous version:
http://rdf.muninn-project.org/ontologies/documents-20120219.html (owl)
Last Update: 1.1
Date: 15 March 2012
Authors:
Robert Warren, The Muninn Project

Abstract

This specification defines a set of classes and properties used to represent documents and their structure.

Status of this Document

This document is based on the current practices used by The Muninn Project to represent documents and digitally scanned images within the catalog.

Table of Contents

  1. Introduction
    1. Ontology Objectives
    2. Ontology Limitations
  2. Documents ontological constructs at a glance
  3. Ontology use cases
  4. Cross-reference for Documents classes and properties
  5. Conclusion and future work

Introduction

This project was created as a result of some of the data and modeling problems encountered in the Muninn WW1 Project. The ontology records the bibliographic, provenance and digitization information of archival documents. It provides classes that are subclasses of FOAF classes for better compatibility, while adding some functionality needed to manage documents.

The major differences from the core FOAF classes are the properties that support document pages, the digital representation of each page, digital rights and forms.

Ontology Objectives

The initial design objectives for the ontology were:

  1. The ability to mark up a document with the minimum amount of structure required for analysis.
  2. Provenance hooks for the source archive and redistribution information.
  3. Linkages between the original document and its transcribed and transformed forms.
  4. The ability to identify certain pages of the document as a filled-out form.

Ontology Limitations

The ontology is a limited representation of the full document structure: it represents the physical parts of the document and not its logical organization. The properties of a page reflect this through the previous_page and next_page ordering.
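The physical ordering described here behaves like a linked list of pages. The following is a minimal sketch, assuming the next_page statements of a document have already been loaded into a plain dictionary; the page identifiers and the function name are hypothetical and not part of the ontology.

```python
def reading_order(first_page, next_page):
    """Follow next_page links from first_page and return pages in reading order."""
    order = []
    seen = set()
    page = first_page
    # Stop at the last page, or if malformed data makes the chain loop back.
    while page is not None and page not in seen:
        order.append(page)
        seen.add(page)
        page = next_page.get(page)
    return order


# Hypothetical identifiers for a three-page document.
next_page = {"page/1": "page/2", "page/2": "page/3"}
pages = reading_order("page/1", next_page)

# previous_page can be derived by inverting the next_page links.
previous_page = {nxt: cur for cur, nxt in next_page.items()}
print(pages, previous_page)
```

Because only the physical sequence is modeled, logical units such as chapters or sections would have to be layered on top by another vocabulary.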

Ontological constructs at a glance

Class organization revolves around a Document class that is the union of the FOAF Document class and the Creative Commons Works class for maximal compatibility. A Page class provides a container for each page of the document, whose digitized image is in turn provided by an Image class subclassed from the FOAF Image class. The Form class is another subclass of Document (as some forms tend to be full-blown documents themselves) but with additional properties intended to document their use and issuing organization.

The ontology makes extensive use of subclassing because the different layers of a document, and their different representations, require multiple layers of metadata. For example, while some works are clearly in the public domain, some archival organizations claim copyright on the photograph of the document itself. Subclassing allows the representation of situations where the contents of the document have been transcribed or OCR'd from a source image that is not publicly available.

Documents, Document Models and Representations.

The markup described within this ontology is a document which models a physical document whose physical pages are represented by a set of images. To avoid confusion we use the term modeled document to reference the complete physical document while a representation references a single digital image of a part of the physical document.

The Collection class represents any sort of container of documents, either as a model or as an actual collection.

Document copyright, access control and licensing

Handling documents from multiple sources sometimes requires different levels of access control for different representations of the document. It is common for a historical document to be owned by a museum and its contents to be out of copyright, while the images of the document are copyrighted by the photographer.

This ontology provides the combined properties of FOAF, CC Works and some properties borrowed from Dublin Core. Together they provide a powerful description markup for the different parts of the document. The initial properties on the document are:

These properties are the cornerstone of the definition of ownership of (and authority over) the document. The date information is important in that it defines the beginning of the countdown clock after which copyright can expire.

Within the Creative Commons Works class there exists a license property that references a License class. While this allows someone to instantiate any type of license, the design is not ideal in that the cc:deprecatedOn property references the license and not the application of a license to the work.

Currently the known instances of the License Class are:

A special class of jurisdiction is English Crown Copyright. Generically, the rights lapse after 50 years unless otherwise reserved.
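The countdown clock mentioned above can be illustrated with a little date arithmetic. This is a minimal sketch, assuming the generic 50-year term stated here and a whole-year comparison; the function names are hypothetical, and actual statutes differ on when the clock starts and ends.

```python
from datetime import date

# Generic term stated in the text; individual jurisdictions may differ.
CROWN_COPYRIGHT_TERM_YEARS = 50


def lapse_year(date_copyrighted, term_years=CROWN_COPYRIGHT_TERM_YEARS):
    """Year in which the term that started at date_copyrighted runs out."""
    return date_copyrighted.year + term_years


def has_lapsed(date_copyrighted, today, term_years=CROWN_COPYRIGHT_TERM_YEARS):
    """Simplified whole-year check: lapsed once the lapse year is in the past."""
    return today.year > lapse_year(date_copyrighted, term_years)


# A Great War era document evaluated on the date of this draft.
print(lapse_year(date(1916, 7, 1)))
print(has_lapsed(date(1916, 7, 1), date(2012, 3, 15)))
```

The result is only as good as the date_copyrighted value recorded for the document, which is why the date properties matter for rights handling.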

The Creative Commons License Class contains the following properties.

The CC Work class makes use of these last two properties, which are inverses of each other. The ontology also has an additional property, accessRights, derived from Dublin Core, which combines both properties into a single directive.

Currently the access control and rights properties of the ontology are only used to mark up the information. However, a future direction is to use this information with a reasoner to infer the appropriate distribution and access rights based on the date, location and original licensing of the document.

Classes: Collection, Document, Image, Page, Text_Snippet

Properties: accessRights, authoredBy, back_page, contains, contains_document, custodian, date_copyrighted, date_created, date_digitized, date_published, date_retrieved, depiction, description, document_contained_in, editor, filled_out_form, first_page, format, front_page, hasAuthored, location, next_page, pages, previous_page, publisher, raw_text, size_bytes, source, title, url, x_pixels, y_pixels

Instances: Australia_Copyright_Act_1905, Australia_Copyright_Act_1912, Australia_Copyright_Act_1968, Australian_Crown_Copyright, British_Copyright_Act_1842, British_Copyright_Act_1911, British_Copyright_Act_1956, British_Copyright_Act_1988, British_Crown_Copyright, British_India_Crown_Copyright, British_Indian_Copyright_Act_1914, Canadian_Copyright_Act_1922, Canadian_Copyright_Act_1985, Canadian_Crown_Copyright, ForwardToOriginal, ForwardToPublisher, German_Empire_Copyright, Indian_Copyright_Act_1957, New_Zealand_Copyright_Act_1994, New_Zealand_Crown_Copyright, Newfoundland_And_Labrador_Crown_Copyright, Restricted, US_Copyright_Act_1998, US_Copyright_Act_of_1909

Documents ontology overview

A few examples are presented here.

3.1. Example

Here is a very basic representation of an image:

        <documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34">
         <documents:size_bytes>39475</documents:size_bytes>
         <documents:x_pixels>256</documents:x_pixels>
         <documents:y_pixels>256</documents:y_pixels>
         <documents:format>image/png</documents:format>
         <documents:thumbnail>
          <documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail"/>
         </documents:thumbnail>
         <!-- Wikimedia isn't on dbpedia -->
         <documents:source rdf:resource="http://commons.wikimedia.org/wiki/File:Crystal_Project_My_documents.png"/>
         <documents:date_retrieved rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2012-02-12</documents:date_retrieved>
         <documents:creator rdf:resource="http://dbpedia.org/page/Everaldo_Coelho"/>
         <documents:license rdf:resource="http://rdf.muninn-project.org/ontologies/documents#lgpl"/>
         <documents:date_published rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2007-06-16</documents:date_published>
         <documents:description>An icon from the Crystal Project icon theme.</documents:description>
         <!-- Webserver will serve copy of picture to anyone. -->
         <documents:accessRights rdf:resource="http://creativecommons.org/ns#Distribution"/>
        </documents:Image>
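An image description of this kind can be read with any XML or RDF toolkit. The following is a minimal sketch using only the Python standard library; the wrapping rdf:RDF element and the namespace declarations are assumptions added so that the fragment parses on its own, with the documents namespace taken from the term URIs in the cross-reference section, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOCS_NS = "http://rdf.muninn-project.org/ontologies/documents#"

# Trimmed copy of the image description, wrapped so it is a complete XML document.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:documents="{DOCS_NS}">
 <documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34">
  <documents:size_bytes>39475</documents:size_bytes>
  <documents:x_pixels>256</documents:x_pixels>
  <documents:y_pixels>256</documents:y_pixels>
  <documents:format>image/png</documents:format>
 </documents:Image>
</rdf:RDF>"""


def read_image(xml_text):
    """Return the basic properties of the first documents:Image in xml_text."""
    root = ET.fromstring(xml_text)
    image = root.find(f"{{{DOCS_NS}}}Image")

    def literal(name):
        # Text content of a child property element in the documents namespace.
        return image.findtext(f"{{{DOCS_NS}}}{name}")

    return {
        "uri": image.get(f"{{{RDF_NS}}}about"),
        "width": int(literal("x_pixels")),
        "height": int(literal("y_pixels")),
        "size_bytes": int(literal("size_bytes")),
        "format": literal("format"),
    }


info = read_image(SAMPLE)
print(info)
```

A full RDF parser would be the better choice for real data, since RDF/XML allows several equivalent spellings of the same statements that this element-by-element reading does not handle.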
      

Here is a very basic representation of a Canadian Expeditionary Force attestation form from the Great War:

        <documents:Document rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34">
         <documents:size_bytes>39475</documents:size_bytes>
         <documents:x_pixels>256</documents:x_pixels>
         <documents:y_pixels>256</documents:y_pixels>
         <documents:format>image/png</documents:format>
         <documents:thumbnail>
          <documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail"/>
         </documents:thumbnail>
         <!-- Wikimedia isn't on dbpedia -->
         <documents:source rdf:resource="http://commons.wikimedia.org/wiki/File:Crystal_Project_My_documents.png"/>
         <documents:date_retrieved rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2012-02-12</documents:date_retrieved>
         <documents:creator rdf:resource="http://dbpedia.org/page/Everaldo_Coelho"/>
         <documents:license rdf:resource="http://rdf.muninn-project.org/ontologies/documents#lgpl"/>
         <documents:date_published rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2007-06-16</documents:date_published>
         <documents:description>An icon from the Crystal Project icon theme.</documents:description>
         <!-- Webserver will serve copy of picture to anyone. -->
         <documents:accessRights rdf:resource="http://creativecommons.org/ns#Distribution"/>
        </documents:Document>
      

Muninn Specific Implementation details

The Muninn RDF makes extensive use of content negotiation through HTTP headers to provide the end user with exactly the type of content that is required. A few extensions are available to force certain behaviours on the part of the server.
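Content negotiation of this kind is driven by the Accept header of the request. The following is a minimal sketch that builds such a request with the Python standard library without sending it; the media type application/rdf+xml is the registered type for RDF/XML, but which types the Muninn server actually honours is an assumption, and the function name is hypothetical.

```python
from urllib.request import Request


def negotiated_request(resource_uri, media_type="application/rdf+xml"):
    """Build (but do not send) a request asking for one representation."""
    return Request(resource_uri, headers={"Accept": media_type})


req = negotiated_request("http://rdf.muninn-project.org/ww1/2011/11/11/image/34")
print(req.full_url, req.get_header("Accept"))

# The same resource URI, this time asking for a PNG rendering.
png_req = negotiated_request(
    "http://rdf.muninn-project.org/ww1/2011/11/11/image/34", "image/png"
)
```

Passing either request to urllib.request.urlopen would perform the actual fetch.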

RDF representation only

While a SPARQL server will always return the appropriate content to a SPARQL client, it is sometimes convenient to force the server to send only the RDF metadata contents.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/about.rdf
     

Image in original format

While the server will normally negotiate the appropriate content with the client, it is sometimes convenient to force the server to send only the image in its original format.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/image
     

Thumbnail in original format

A 130x100 thumbnail image can be requested in the original format by requesting this specific name.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail
     

Image in specified format

The image can be requested in a specific graphics format by using the appropriate extension (e.g. image.jpg for JPG, image.png for PNG). This conversion is also done automatically if the HTTP headers request a specific format.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/image.ext
     

Thumbnail

A 130x100 thumbnail image can be requested in a specific graphics format by using the appropriate extension (e.g. thumbnail.jpg for JPG, thumbnail.png for PNG). This conversion is also done automatically if the HTTP headers request a specific format.

      http://rdf.muninn-project.org/ww1/2011/11/11/image/34/thumbnail.ext
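The URL forms listed in this section all derive from the URI of the image resource. The following is a minimal sketch of a helper that builds them; the function name and dictionary keys are hypothetical, and the suffixes are taken from the examples above.

```python
def variant_urls(resource_uri, ext="png"):
    """Build the forced-representation URLs for one image resource."""
    base = resource_uri.rstrip("/")
    return {
        "rdf": f"{base}/about.rdf",               # RDF metadata only
        "image": f"{base}/image",                 # image in original format
        "thumbnail": f"{base}/thumbnail",         # thumbnail in original format
        "image_as": f"{base}/image.{ext}",        # image in a specified format
        "thumbnail_as": f"{base}/thumbnail.{ext}",  # thumbnail in a specified format
    }


urls = variant_urls("http://rdf.muninn-project.org/ww1/2011/11/11/image/34", ext="jpg")
for name, url in urls.items():
    print(name, url)
```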
     

Cross-reference for classes and properties

Class: documents:Collection

URI: http://rdf.muninn-project.org/ontologies/documents#Collection

Collection - A model of a physical Collection of documents or a Collection of digital documents.

in-domain-of:
documents:contains_document
documents:title
documents:date_published
documents:date_retrieved
documents:date_digitized
documents:date_created
documents:date_copyrighted
documents:publisher
documents:custodian
documents:description
documents:authoredBy
documents:hasAuthored
documents:editor
documents:source
documents:accessRights
in-range-of:
documents:document_contained_in

No detailed documentation for this term.

Class: documents:Document

URI: http://rdf.muninn-project.org/ontologies/documents#Document

A Digital Document - A digital document made up of 1 or more pages.

in-domain-of:
documents:document_contained_in
documents:location
documents:filled_out_form
documents:pages
documents:title
documents:date_published
documents:date_retrieved
documents:date_digitized
documents:date_created
documents:date_copyrighted
documents:publisher
documents:custodian
documents:description
documents:authoredBy
documents:hasAuthored
documents:editor
documents:source
documents:contains
documents:accessRights
documents:first_page
in-range-of:
documents:contains_document

No detailed documentation for this term.

Class: documents:Image

URI: http://rdf.muninn-project.org/ontologies/documents#Image

A Digital Image - A digital image

sub-class-of:
http://xmlns.com/foaf/spec/Image
in-domain-of:
documents:format
documents:x_pixels
documents:y_pixels
documents:size_bytes
documents:url
in-range-of:
documents:depiction

No detailed documentation for this term.

Class: documents:Page

URI: http://rdf.muninn-project.org/ontologies/documents#Page

Page - A page of a document, may be double-sided.

sub-class-of:
documents:Document
in-domain-of:
documents:page_number
documents:depiction
documents:back_page
documents:front_page
documents:next_page
documents:previous_page
in-range-of:
documents:contains
documents:back_page
documents:front_page
documents:next_page
documents:first_page
documents:previous_page

No detailed documentation for this term.

Class: documents:Text_Snippet

URI: http://rdf.muninn-project.org/ontologies/documents#Text_Snippet

Text Snippet - A construction of string literals. Not meant to represent a full document.

in-domain-of:
documents:raw_text

No detailed documentation for this term.

Property: documents:accessRights

URI: http://rdf.muninn-project.org/ontologies/documents#accessRights

Access Rights - Meant as a side support to the rights property

OWL Type:
ObjectProperty
sub-property-of:
dct:accessRights
Domain:
documents:Collection
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
cc:Permission

No detailed documentation for this term.

Property: documents:authoredBy

URI: http://rdf.muninn-project.org/ontologies/documents#authoredBy

Original author(s) of Document. Might have more than one. -

OWL Type:
ObjectProperty
sub-property-of:
dc:creator
documents:author
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection

No detailed documentation for this term.

Property: documents:back_page

URI: http://rdf.muninn-project.org/ontologies/documents#back_page

Back Page - The side of the physical piece of paper that should be read last.

OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.

Property: documents:contains

URI: http://rdf.muninn-project.org/ontologies/documents#contains

A page of the document - no ordering on property. -

OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
documents:Page

No detailed documentation for this term.

Property: documents:contains_document

URI: http://rdf.muninn-project.org/ontologies/documents#contains_document

Contains - Links a Document to this Collection.

OWL Type:
ObjectProperty
Domain:
documents:Collection
Range:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work

No detailed documentation for this term.

Property: documents:custodian

URI: http://rdf.muninn-project.org/ontologies/documents#custodian

Custodian - The entity responsible for the documents when the original publisher does not control the works.

OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:date_copyrighted

URI: http://rdf.muninn-project.org/ontologies/documents#date_copyrighted

Copyrighted Date - Copyright date is synonymous with creation date in most cases. Used primarily to support copyright status and distribution permissions.

OWL Type:
ObjectProperty
sub-property-of:
dct:dateCopyrighted
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
time:TemporalEntity

No detailed documentation for this term.

Property: documents:date_created

URI: http://rdf.muninn-project.org/ontologies/documents#date_created

Creation Date - Date this *record* was created. (unstable / non-standard)

OWL Type:
ObjectProperty
sub-property-of:
dct:created
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
time:TemporalEntity

No detailed documentation for this term.

Property: documents:date_digitized

URI: http://rdf.muninn-project.org/ontologies/documents#date_digitized

Digitization Date - Date that this document was digitized from a physical representation.

OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
time:TemporalEntity

No detailed documentation for this term.

Property: documents:date_published

URI: http://rdf.muninn-project.org/ontologies/documents#date_published

Publication Date - Date of publication of the document being modeled.

OWL Type:
ObjectProperty
Domain:
documents:Collection
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
time:TemporalEntity

No detailed documentation for this term.

Property: documents:date_retrieved

URI: http://rdf.muninn-project.org/ontologies/documents#date_retrieved

Date Retrieved - Date that the modeled document was retrieved from another source (eg: downloaded from a web server).

OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
time:TemporalEntity

No detailed documentation for this term.

Property: documents:depiction

URI: http://rdf.muninn-project.org/ontologies/documents#depiction

Imaged copy of the page. -

OWL Type:
ObjectProperty
sub-property-of:
foaf:depiction
Domain:
documents:Page
Range:
documents:Image

No detailed documentation for this term.

Property: documents:description

URI: http://rdf.muninn-project.org/ontologies/documents#description

Description -

OWL Type:
ObjectProperty
sub-property-of:
dc:description
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:document_contained_in

URI: http://rdf.muninn-project.org/ontologies/documents#document_contained_in

Contained in - Links a Collection to this Document.

Inverse:
documents:contains_document
OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
documents:Collection

No detailed documentation for this term.

Property: documents:editor

URI: http://rdf.muninn-project.org/ontologies/documents#editor

Editor - Original editor(s) of Document. Might have more than one.

OWL Type:
ObjectProperty
sub-property-of:
dc:contributor
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:filled_out_form

URI: http://rdf.muninn-project.org/ontologies/documents#filled_out_form

Form - This document is a filled out form.

OWL Type:
DatatypeProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:first_page

URI: http://rdf.muninn-project.org/ontologies/documents#first_page

First Page - Convenience method to find the first page to read.

OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
documents:Page

No detailed documentation for this term.

Property: documents:format

URI: http://rdf.muninn-project.org/ontologies/documents#format

Mime-Type -

OWL Type:
DatatypeProperty
sub-property-of:
dct:MediaType
Domain:
documents:Image
Range:
xsd:string

No detailed documentation for this term.

Property: documents:front_page

URI: http://rdf.muninn-project.org/ontologies/documents#front_page

Front Page - The side of the physical piece of paper that should be read first.

Inverse:
documents:back_page
OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.

Property: documents:hasAuthored

URI: http://rdf.muninn-project.org/ontologies/documents#hasAuthored

Original author(s) of Document. Might have more than one. -

Inverse:
documents:authoredBy
OWL Type:
ObjectProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection

No detailed documentation for this term.

Property: documents:location

URI: http://rdf.muninn-project.org/ontologies/documents#location

Location -

OWL Type:
ObjectProperty
sub-property-of:
dct:spatial
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:next_page

URI: http://rdf.muninn-project.org/ontologies/documents#next_page

Next Page - The next physical page that a human reader would read, even if blank. This implies that this property is pointing to a page that has no front_page property.

OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.

Property: documents:pages

URI: http://rdf.muninn-project.org/ontologies/documents#pages

Pages - Total number of pages (single or double sided) in the document. This count may or may not match the number of content properties if one-sided and double-sided documents are present.

OWL Type:
DatatypeProperty
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
Range:
xsd:decimal

No detailed documentation for this term.

Property: documents:previous_page

URI: http://rdf.muninn-project.org/ontologies/documents#previous_page

Previous Page - The previous physical page that a human reader would have just read.

OWL Type:
ObjectProperty
Domain:
documents:Page
Range:
documents:Page

No detailed documentation for this term.

Property: documents:publisher

URI: http://rdf.muninn-project.org/ontologies/documents#publisher

Original publisher of the modeled document, which may be different from the publisher of the digital copy of the document's images. -

OWL Type:
ObjectProperty
sub-property-of:
dc:publisher
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:raw_text

URI: http://rdf.muninn-project.org/ontologies/documents#raw_text

Raw Text -

OWL Type:
DatatypeProperty
Domain:
documents:Text_Snippet
Range:
xsd:string

No detailed documentation for this term.

Property: documents:size_bytes

URI: http://rdf.muninn-project.org/ontologies/documents#size_bytes

Size of image in 8-bit bytes. -

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:integer

No detailed documentation for this term.

Property: documents:source

URI: http://rdf.muninn-project.org/ontologies/documents#source

Source - Original source of the Document. Most likely a URL or organization.

OWL Type:
ObjectProperty
sub-property-of:
dc:source
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
owl:Thing

No detailed documentation for this term.

Property: documents:title

URI: http://rdf.muninn-project.org/ontologies/documents#title

Title -

OWL Type:
DatatypeProperty
sub-property-of:
dc:title
Domain:
http://xmlns.com/foaf/spec/Document
http://creativecommons.org/ns#Work
documents:Collection
Range:
xsd:string

No detailed documentation for this term.

Property: documents:url

URI: http://rdf.muninn-project.org/ontologies/documents#url

URL - Shortcut to the full URL of the image resource. Use this to avoid content negotiation.

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:anyURI

No detailed documentation for this term.

Property: documents:x_pixels

URI: http://rdf.muninn-project.org/ontologies/documents#x_pixels

Width of image in Pixels. -

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:integer

No detailed documentation for this term.

Property: documents:y_pixels

URI: http://rdf.muninn-project.org/ontologies/documents#y_pixels

Height of image in Pixels. -

OWL Type:
DatatypeProperty
Domain:
documents:Image
Range:
xsd:integer

No detailed documentation for this term.

Instance: ForwardToOriginal

URI: http://rdf.muninn-project.org/ww1/2011/11/11/Permission/ForwardToOriginal

Forward to original copy - Forward to source URL

RDF Type:
http://creativecommons.org/ns#Permission

No detailed documentation for this term.

Instance: ForwardToPublisher

URI: http://rdf.muninn-project.org/ww1/2011/11/11/Permission/ForwardToPublisher

Forward To Publisher - Webserver will forward requests for content to the publisher or source organization.

RDF Type:
http://creativecommons.org/ns#Permission

No detailed documentation for this term.

Instance: Restricted

URI: http://rdf.muninn-project.org/ww1/2011/11/11/Permission/Restricted

Restricted Access - Webserver will not share content, unless authenticated.

RDF Type:
http://creativecommons.org/ns#Permission

No detailed documentation for this term.
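The three permission instances above describe how a web server is expected to treat a request for content. The following is a minimal sketch of that decision, assuming the accessRights value of the requested resource is already known; the function name and the returned action strings are hypothetical, the treatment of cc:Distribution follows the comment in the image example, and refusing unknown values is an assumption.

```python
CC_DISTRIBUTION = "http://creativecommons.org/ns#Distribution"
PERMISSION_BASE = "http://rdf.muninn-project.org/ww1/2011/11/11/Permission/"


def serve_action(access_rights, authenticated=False):
    """Map an accessRights value to a (hypothetical) web server action."""
    if access_rights == CC_DISTRIBUTION:
        return "serve"                  # copy may be served to anyone
    if access_rights == PERMISSION_BASE + "ForwardToOriginal":
        return "redirect-to-source"     # forward to the source URL
    if access_rights == PERMISSION_BASE + "ForwardToPublisher":
        return "redirect-to-publisher"  # forward to publisher or source organization
    if access_rights == PERMISSION_BASE + "Restricted":
        return "serve" if authenticated else "deny"
    return "deny"                       # assumption: unknown values are refused


print(serve_action(PERMISSION_BASE + "Restricted"))
print(serve_action(PERMISSION_BASE + "Restricted", authenticated=True))
```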

Conclusion and Future Work

The ontology is a limited representation of an archival document.

Version History