This is the second in a series of posts summarising discussion at the first ANDS Queensland SWIG (Semantic Web Interest Group) meeting, held on 17 February 2011 at QUT. My previous post catalogued the projects represented at the meeting. This post covers one of the major discussion points at the meeting: how to publish and re-use common vocabularies.
Research data vocabularies created in ANDS projects
The Australian ANDS community have developed ontology and vocabulary representations that are already being re-used beyond their original projects:
- the ANDS-VITRO ontology developed by Griffith University, QUT and Melbourne University
- RDF encodings of the ANZSRC Field of Research (for), Socio-Economic Objective (seo) and Type of Activity (toa) vocabularies developed by USQ, Griffith University and QUT
Most of the projects represented at the SWIG are using terms from these vocabularies to describe their research data holdings. Most of the folks present also raised issues about how these vocabularies could be referenced and accessed.
Vocabulary publishing principles
To understand the issues, it is worth taking a step back and thinking about the requirements for hosting a Linked Data vocabulary. As Duncan Dickinson pointed out previously on this blog, vocabularies are most easily re-used if they are published according to Tim Berners-Lee’s Linked Data principles:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things.
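To make principles 1, 2 and 4 concrete, here is a minimal sketch that writes out a description of one vocabulary term as N-Triples: the term is named by an HTTP URI, and one of its statements links to another URI. Modelling the term with SKOS is my assumption, not a statement about how the ANZSRC RDF vocabularies are actually encoded; the label is illustrative and the "#division_05" parent URI is hypothetical.

```python
# A minimal sketch: describe one vocabulary term as N-Triples.
# Assumptions: SKOS modelling, an illustrative label, and a hypothetical
# parent URI ("#division_05"); none of these are copied from the
# published ANZSRC RDF vocabularies.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(u: str) -> str:
    """Serialise a URI as an N-Triples IRI term."""
    return f"<{u}>"


def literal(text: str, lang: str = "en") -> str:
    """Serialise a string as a language-tagged N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe_term(term: str, label: str, broader: str) -> list:
    """Return N-Triples lines that name a term with an HTTP URI
    (principles 1 and 2) and link it to another URI (principle 4)."""
    return [
        f"{uri(term)} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} .",
        f"{uri(term)} {uri(SKOS + 'prefLabel')} {literal(label)} .",
        f"{uri(term)} {uri(SKOS + 'broader')} {uri(broader)} .",
    ]


if __name__ == "__main__":
    lines = describe_term(
        "http://purl.org/anzsrc/for#group_0501",
        "Ecological Applications",                 # illustrative label
        "http://purl.org/anzsrc/for#division_05",  # hypothetical parent URI
    )
    print("\n".join(lines))
```

Principle 3 is about what a server sends back when someone dereferences that URI, which is where most of the SWIG discussion ended up.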
I think it is also useful to consider a further principle from the W3C Cool URIs for the Semantic Web document:
5. Stability. Once you set up a URI to identify a certain resource, it should remain this way as long as possible.
It turns out that the ANDS-VITRO and ANZSRC RDF vocabularies stack up well against principles 1, 2 and 4. In my opinion, many of the issues raised at the SWIG pertain to support for principles 3 and 5. Let’s go through them one by one and assess the situation …
HTTP URIs and Linking
All terms in the ANZSRC RDF vocabularies and the ANDS-VITRO ontology are assigned HTTP URIs (meeting principles 1 & 2). The definitions in the ANDS-VITRO ontology link extensively to other URIs (meeting principle 4). The ontology re-uses established vocabularies by linking to terms in Dublin Core, FOAF, BIBO, and the base VIVO ontology.
The definitions in the ANZSRC RDF vocabularies have fewer URI linkages, but this makes sense given they are foundational and not built from other vocabularies. While it is technically feasible to make links between vocabulary terms in ANZSRC and equivalents in other subject vocabularies (such as Library of Congress Subject Headings), this would require significant effort, and I’m not aware of this being raised as a requirement. So: ANZSRC meets principle 4 to the extent that it currently makes sense to do so.
Useful information and Stable URIs: ANZSRC
The hosting of the ANZSRC RDF vocabularies supports principle 3 (Useful information). Dereferencing a URI from the ANZSRC FOR RDF vocabulary, like http://purl.org/anzsrc/for#group_0501, takes you to information about the definition. Additionally, the information you get back varies depending on how you ask for it (the web server hosting the vocabulary supports content negotiation). Clicking on the link in a web browser will return an HTML page with information on the vocabulary term. Linked data browsers, on the other hand, can get various RDF encodings of the vocabulary.
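For readers wondering what “supports content negotiation” involves, here is a minimal sketch of how a server might choose a representation from the client’s Accept header. The list of offered media types is an assumption (this post does not say exactly which RDF encodings ADFI serves), and the matching is simplified: the most specific matching media range decides the q-value, and parameters other than q are ignored.

```python
from typing import Optional

# Media types this hypothetical vocabulary server can return, in order of
# preference when the client rates several of them equally.
OFFERED = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header: str) -> list:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return ranges


def specificity(media_range: str) -> int:
    """Rank */* below type/* below type/subtype."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range: str, offered: str) -> bool:
    """True if a media range such as text/* or */* covers an offered type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    o_type, _, o_sub = offered.partition("/")
    return r_type == o_type and r_sub in ("*", o_sub)


def choose_representation(accept: str, offered=OFFERED) -> Optional[str]:
    """Return the offered type with the highest q-value, or None (HTTP 406)."""
    ranges = parse_accept(accept)
    if not ranges:
        return offered[0]  # no preference stated: default to HTML
    best, best_q = None, 0.0
    for candidate in offered:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, candidate)]
        if not hits:
            continue
        q = max(hits)[1]  # the most specific matching range decides
        if q > best_q:    # strict '>' keeps the earlier offer on ties
            best, best_q = candidate, q
    return best
```

With a typical browser header such as "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" this picks text/html, while a Linked Data client sending "application/rdf+xml" gets RDF/XML.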
The URIs in the ANZSRC RDF vocabularies have also been designed to address principle 5 (Stable URIs). The vocabularies use PURLs that redirect to URIs where the vocabularies are actually hosted. Currently the vocabularies are hosted by ADFI at USQ, but if they were moved in the future, the PURLs could simply be updated to redirect to the new location and folks could still refer to the vocabularies using the same PURLs.
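The stability argument can be sketched in a few lines: metadata always cites the PURL, and only the resolver’s redirect table changes when the vocabularies move. This is a toy model, not how purl.org is implemented. The table, the resolve function and the relocated host (vocab.example.org) are hypothetical, and the sketch only handles exact matches, whereas real PURL services may offer other redirect types.

```python
from urllib.parse import urldefrag

# Hypothetical redirect table: PURL -> current hosting location.
# The ADFI target is the Location observed for the FoR PURL in this post.
PURL_TABLE = {
    "http://purl.org/anzsrc/for": "http://namespace.adfi.usq.edu.au/anzsrc/for/",
}


def resolve(purl: str, table: dict) -> str:
    """Return the redirect target for a PURL, as a 302 Location would."""
    # A fragment such as "#group_0501" is never sent to the server,
    # so the resolver only ever sees the part before the '#'.
    base, _fragment = urldefrag(purl)
    try:
        return table[base]
    except KeyError:
        raise LookupError(f"no PURL registered for {base}") from None


if __name__ == "__main__":
    term = "http://purl.org/anzsrc/for#group_0501"
    print(resolve(term, PURL_TABLE))
    # If hosting moves, only the table entry changes; `term` stays the same.
    moved = {"http://purl.org/anzsrc/for": "http://vocab.example.org/anzsrc/for/"}
    print(resolve(term, moved))
```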
The SWIG did discuss whether ADFI was the best place to host these vocabularies. Given ANZSRC is published by the Australian Bureau of Statistics, wouldn’t it make more sense for them to host a machine readable encoding? The ANDS folks suggested that this issue might be addressed by their “Data Connections” projects (see previous blog post).
The SWIG also discussed whether PURLs were the best URI option for identifying vocabularies, because they can get in the way of content negotiation. For example, asking for an RDF/XML serialisation of http://purl.org/anzsrc/for#group_0501 returns an HTTP 302 redirect with an HTML body, rather than RDF:
GET /anzsrc/for#group_0501 HTTP/1.1
Host: purl.org
Accept: application/rdf+xml
returns
HTTP/1.0 302 Moved Temporarily
Location: http://namespace.adfi.usq.edu.au/anzsrc/for/
Content-Type: text/html; charset=iso-8859-1
I suspect the community needs to discuss this issue further. In my experience, it is possible to ignore the returned HTML and still do content negotiation on the redirected URI (http://namespace.adfi.usq.edu.au/anzsrc/for/). Some clients can even do that automatically for you (e.g. curl -L), but your mileage may vary depending on the client you are using. For this reason, some at the meeting thought it might be better to run our own PURL server that could do direct content negotiation. Once again, I think this needs further discussion.
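To illustrate the client-side workaround, the sketch below follows the redirect and then negotiates on the target, much as curl -L does when given an Accept header. It runs against a local mock server rather than purl.org; the /purl/ and /vocab/ paths and the response bodies are hypothetical stand-ins for purl.org and the ADFI host. It relies on Python’s urllib, whose default redirect handler re-issues a GET to the new location carrying the Accept header across, and discards the HTML body of the 302. urllib also keeps the #group_0501 fragment on the client side, as HTTP requires.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class MockHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for both the PURL server and the vocabulary host."""

    def _send(self, status, ctype, body, extra=()):
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        for name, value in extra:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path.startswith("/purl/"):
            # Mimics the PURL response above: a 302 with an HTML body,
            # whatever the Accept header asks for.
            self._send(302, "text/html; charset=iso-8859-1",
                       b"<html><body>Moved</body></html>",
                       [("Location", "/vocab/anzsrc/for/")])
        elif self.path.startswith("/vocab/"):
            # The vocabulary host performs real content negotiation.
            if "application/rdf+xml" in self.headers.get("Accept", ""):
                self._send(200, "application/rdf+xml",
                           b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
                           [("Vary", "Accept")])
            else:
                self._send(200, "text/html", b"<html><body>FoR</body></html>",
                           [("Vary", "Accept")])
        else:
            self._send(404, "text/plain", b"not found")

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch(url, accept):
    """GET url with an Accept header, following redirects like curl -L.
    Returns (final URL, content type, body)."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(Request(url, headers={"Accept": accept})) as resp:
        return resp.url, resp.headers.get_content_type(), resp.read()


def start_mock_server():
    """Serve MockHandler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MockHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_mock_server()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        final_url, ctype, _ = fetch(base + "/purl/anzsrc/for#group_0501",
                                    "application/rdf+xml")
        print(final_url, ctype)
    finally:
        server.shutdown()
        server.server_close()
```

Other HTTP libraries may not carry headers across a redirect in the same way, so it is worth checking the client you use before relying on this.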
Useful information and Stable URIs: ANDS-VITRO
The designers of the ANDS-VITRO ontology gave it a base URI of http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl# with the intent of getting ANDS to host the ontology for others to use. Clicking on the above URI, however, reveals that ANDS have (so far) declined to host the ontology (it returns a 404 Not Found error). ANDS sees the ontology as an output of one of their projects, rather than an official ANDS output. They say that
“for ANDS to host the ANDS Vitro ontology would be to both endorse it and imbue it with a degree of officialness with which we were not comfortable.”
The SWIG resolved to ask ANDS to reconsider its position in light of how many folks are using the ontology as the basis for their ANDS work. For my sins, I’ve been asked to draft a letter to ANDS requesting clarification and reconsideration of their position. I’ll keep this blog informed about progress.
To their credit, ANDS have suggested another possible way forward: they could register a more neutral domain name, such as vitro.org.au, to host the ontology. This solution requires all of us who currently use http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl# in our metadata to update our URIs to something like http://vitro.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#. This change might still be feasible at this early stage, but a decision needs to be made quickly, before too many uses of http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl# are published on the (linked data) web.
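If the namespace does change, updating the metadata we control is mechanical. The sketch below rewrites URIs in the old namespace to a new base in a list of triples held as plain strings. The vitro.org.au base is only the example domain floated by ANDS, and hypotheticalProperty is a made-up term; a real migration would more likely use an RDF library. Copies already harvested or published by others cannot be fixed this way, which is why an early decision matters.

```python
# Old base URI in current use, and the hypothetical new base.
OLD_NS = "http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#"
NEW_NS = "http://vitro.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#"


def migrate_term(term: str, old: str = OLD_NS, new: str = NEW_NS) -> str:
    """Swap the namespace of a URI that starts with the old base;
    leave every other URI (or literal) untouched."""
    return new + term[len(old):] if term.startswith(old) else term


def migrate(triples: list) -> list:
    """Rewrite the subject, predicate and object of each (s, p, o) triple."""
    return [tuple(migrate_term(t) for t in triple) for triple in triples]


if __name__ == "__main__":
    data = [(
        "http://example.org/dataset/1",
        OLD_NS + "hypotheticalProperty",  # hypothetical ontology term
        "http://example.org/person/2",
    )]
    for s, p, o in migrate(data):
        print(s, p, o)
```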
In the next installment …
This post covered the major issue discussed at the SWIG: publishing our community vocabularies. The SWIG also discussed other interesting semantic web topics including best practice for using the vocabularies, and publishing data (as opposed to just metadata) using semantic web technologies. I’ll cover these topics in (yet another) blog post.
Written by Nigel Ward. Copyright The University of Queensland, 2011. Licensed under Creative Commons Attribution-Share Alike 3.0 Australia. <http://creativecommons.org/licenses/by-sa/3.0/au/>. 
The project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.