An informal gathering of folks using semantic web approaches in their ANDS-funded projects met after the ANDS Queensland Community Day on 17 February 2011 at QUT. Dubbing ourselves the ANDS Queensland Semantic Web Interest Group (SWIG), we came together with the aim of understanding what semantic web approaches are being used in ANDS projects, to uncover best practices and common issues, and to identify areas where we might collaborate.
It turns out there is a lot of semantic web activity in ANDS Queensland projects. Even with just high-level summaries from those present we easily used up our allocated hour of discussion time. This post summarises the projects that were discussed at the meeting. Subsequent posts will cover the major discussion points and potential areas for action.
- 2011-02-23: clarified QUT data capture projects, QUT contributions to ANZSRC vocabularies and fixed link to R2RML.
- 2011-03-02: further refined description of QUT data capture project.
A broad definition of “Semantic Web”
The projects represented took a broad view of what “semantic web” meant in their projects. Some are interested in ensuring we all use the same URIs to identify vocabulary terms in our metadata, some in publishing data into the Linked Data Cloud, and some in building systems that use ontologies to represent their data model.
ANDS “In-house” Projects
The ANDS folks at the meeting described “Data Connections” projects being pursued within ANDS.
One project involves creating “linked data endpoints” for information on NHMRC and ARC grants. Research Data Australia (RDA) already allows linkages between collection descriptions and descriptions of the activities (usually grants) that funded the work. Most ANDS projects currently write their own activity descriptions and upload them to RDA alongside their collection descriptions. This means the same overarching activity may be uploaded several times and described slightly differently each time. The project aims to reduce this duplicated effort by creating definitive URIs to identify NHMRC and ARC grants. This would allow collection records to unambiguously reference the (NHMRC and ARC) activities that funded them, and allow RDA users to navigate to definitive descriptions of those activities.
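To make the benefit of a definitive URI concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `example.org` URIs, the grant identifier and the record fields are invented and are not actual RDA or ANDS identifiers. It only shows that when two collection records point at the same activity URI, they can be grouped under one activity without comparing free-text descriptions.

```python
# Hypothetical illustration: all URIs and field names are invented for this sketch.
from collections import defaultdict

GRANT_URI = "http://example.org/grant/arc/DP0000000"  # assumed, not a real identifier

# Two collection records from different projects funded by the same grant.
collections = [
    {"collection": "http://example.org/collection/a", "funded_by": GRANT_URI},
    {"collection": "http://example.org/collection/b", "funded_by": GRANT_URI},
]


def group_by_activity(records):
    """Group collection URIs under the URI of the activity that funded them."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["funded_by"]].append(record["collection"])
    return dict(grouped)


by_activity = group_by_activity(collections)
```

Because both records carry the same URI, `by_activity` holds a single activity with two collections attached, rather than two slightly different descriptions of the same grant.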
ANDS is working with Geoscience Australia to expose the Gazetteer of Australia as Linked Data. The gazetteer provides information on the location and spelling of Australian geographical names. It is already searchable online, but the underlying data isn’t exposed in a machine-readable format. The project is investigating exposing the data in a machine-readable format and linking it with the international geonames ontology. This would allow RDA collection records to unambiguously identify Australian place names and to link collection descriptions with more information about their geographic location.
ANDS also reported that an interest group has been established to examine requirements and use cases for developing Vocabulary Services for e-Research, involving the ABS, CSIRO, Geoscience Australia, and other bodies; this follows a workshop held at Geoscience Australia in November 2010. A number of folks at the meeting requested more information about the project and expressed a desire to contribute.
Tropical Data Hub
James Cook University reported that they are keen to implement a semantic layer within their tropical data hub project. They are interested in pursuing a linked data approach to exposing their data and linking it with other data and facilities. They are also keen to create a portal with a semantic layer for intelligent searches. JCU are just embarking on this project and are still shopping for technology that meets their requirements.
University of Southern Queensland / Newcastle University
As has been previously described on this and other blogs, the USQ / Newcastle RedBox metadata store project is keen to play with the linked data world. The project team reported that one of their key barriers has been the lack of definitive URIs for referencing things in the research space like ANZSRC subject terms, parties, activities or geographic places. While the RedBox team is aware that ANDS projects like those described above are addressing this issue, they couldn’t wait and so have developed some interim solutions:
- The Mint is local infrastructure for identifying and describing parties at Newcastle. It pulls information from various authorities, allows for data merging / de-duplication and publishes party identifiers as URIs.
- They are storing a local (cut down?) copy of the geonames ontology to avoid load time issues they found with using the complete dataset.
- USQ created URIs for every term in the ABS ANZSRC Field of Research (for), Socio-Economic Objective (seo) and Type of Activity (toa) vocabularies. This allows them to unambiguously reference subject terms from the Linked Data collection descriptions. These URIs currently de-reference to either a human readable HTML page or a machine-interpretable SKOS representation of the vocabulary.
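One common way to serve either an HTML page or a machine-readable representation from the same URI is HTTP content negotiation on the `Accept` header. The meeting notes don't say how the USQ service decides, so the Python sketch below is an assumption about the general technique, not a description of their implementation. The function names and the choice of `application/rdf+xml` for the SKOS representation are hypothetical.

```python
# Media types this hypothetical vocabulary server can produce.
SUPPORTED = ("text/html", "application/rdf+xml")


def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_representation(accept_header, default="text/html"):
    """Pick the representation to serve for a vocabulary term URI."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

With this sketch, a browser-style header falls through to the HTML page, while a client that asks only for `application/rdf+xml` gets the SKOS representation.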
USQ also mentioned their work examining how to link datasets to papers in a package, as part of the Beyond the PDF initiative.
University of Queensland
Like USQ, the University of Queensland metadata store is being designed to play well in the Linked Data world. It will re-use URIs for existing entities (where they exist), provide URIs for the new entities it describes, and will be able to expose its data using an RDF serialisation.
Many of the other nationally funded projects within the UQ eResearch Lab have employed semantic web approaches:
- the Scoping Study for W4 Semantic Tagging Service report investigates the availability of suitable technologies for providing a Who/What/Where/When (W4) semantic tagging service.
- Aus-e-lit has implemented a compound object authoring and publishing service based on the OAI-ORE semantic web model. The service is useful for modelling provenance of literary resources and for online learning objects for teaching and research
- early work on the Health-e-reef project used an ontology-driven approach to integrating data from heterogeneous datasets about the Great Barrier Reef
- the Phenomics Ontology Driven Data Management Project (PODD) is developing semantic web data management solutions to meet the needs of researchers working at the Australian Plant Phenomics Facility (APPF) and the Australian Phenomics Network (APN)
Although they weren’t present at the SWIG, the Queensland Facility for Advanced Bioinformatics team at UQ was present at the ANDS community day and reported on some relevant work. Their team is using ontologies to analyse data in the Australian mirror of the European Bioinformatics Institute (EBI) molecular biology database, with the aim of populating RDA with collection descriptions of EBI data that are relevant to Australian species.
Queensland University of Technology
QUT worked with Griffith University on the Metadata Exchange Hub project that adapted the VIVO open source semantic web application to collect metadata from multiple university sources and publish it to Research Data Australia (see Griffith report below).
QUT are also working on an ANDS data capture project in the biodiversity domain that will help to create mappings between relational schemas and ontologies. The QUT team reported that they won’t be developing a sophisticated ontology: the end goal is generating ANDS-compliant collection descriptions, and the research group hasn’t expressed a need for modelling complex data. The project will store the underlying data in a relational database and will develop a relational-to-RDF transformation using a technology like D2RQ or R2RML.
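To give a feel for what a relational-to-RDF transformation does, here is a hand-rolled Python sketch using only the standard library. It is not D2RQ or R2RML, which describe such mappings declaratively, and it is not the QUT project's design. The `specimen` table, its columns, the `example.org` base URI and the sample row are all invented for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # assumed base URI, not a real namespace


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(conn):
    """Map each row of the hypothetical `specimen` table to N-Triples lines."""
    triples = []
    query = "SELECT id, species, site FROM specimen ORDER BY id"
    for row_id, species, site in conn.execute(query):
        # The primary key becomes part of the subject URI;
        # each remaining column becomes one property of that subject.
        subject = f"<{BASE}specimen/{row_id}>"
        triples.append(f'{subject} <{BASE}vocab/species> "{escape_literal(species)}" .')
        triples.append(f'{subject} <{BASE}vocab/site> "{escape_literal(site)}" .')
    return triples


# Invented sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, species TEXT, site TEXT)")
conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", [(1, "Species A", "Site 1")])
ntriples = rows_to_ntriples(conn)
```

Tools like D2RQ and R2RML express the same row-to-subject and column-to-property mapping as configuration rather than code, which keeps the mapping reviewable as the schema changes.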
Griffith University
Griffith University collaborated with QUT on adapting the VIVO semantic web application to publish data to Research Data Australia. The Griffith team reported that VIVO has proven to be a good choice of technology, and will also be used for other in-house applications.
A major concrete output of this work is the ANDS-VITRO ontology: an extension to the VIVO ontology originally developed by the VIVO project. The project team collaborated with Simon Porter from Melbourne University to extend the VITRO ontology with concepts and properties needed to syndicate information to Research Data Australia. Although the ontology URI isn’t currently resolvable online, it can be downloaded directly from Griffith University.
As part of the Metadata Exchange Hub project, Griffith and QUT developed OWL representations of the ANZSRC vocabularies, building on the USQ SKOS representations. They deliberately re-used the vocabulary term URIs defined by USQ to indicate that the two vocabulary representations describe the same concepts. The group discussed that in the future it might be sensible for these URIs to de-reference to the new OWL+SKOS representation.
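The value of re-using the same term URIs can be shown with a small Python sketch that treats each representation as a set of (subject, predicate, object) triples. The term URI and label are hypothetical, and how the Griffith/QUT OWL representation actually types its terms is an assumption here; only the SKOS, OWL and RDF namespace URIs are the standard W3C ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"

TERM = "http://example.org/anzsrc/for/0000"  # hypothetical term URI

# Statements from a SKOS-style representation of the term.
skos_graph = {
    (TERM, RDF_TYPE, SKOS + "Concept"),
    (TERM, SKOS + "prefLabel", "Example term"),
}

# Statements from an OWL-style representation that re-uses the same URI (assumed modelling).
owl_graph = {
    (TERM, RDF_TYPE, OWL + "Class"),
}

# Because the subject URI is shared, a plain set union merges the two descriptions.
merged = skos_graph | owl_graph


def describe(graph, subject):
    """Return all (predicate, object) pairs about a subject, sorted for stable output."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

Had the OWL representation minted new URIs, the union would describe two unrelated subjects and an explicit mapping between them would be needed.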
In the next installment …
Hopefully this summary gives an impression of the breadth of semantic web activity in ANDS-funded projects.
There is also a substantial amount of depth to some of these projects. Many of the SWIG participants commented that the ANDS-VITRO ontology and the SKOS/OWL representations of the ANZSRC vocabularies are fundamental pieces of work that are already being re-used in the community. There was a lot of discussion about the need to host and maintain this work, as well as share emerging best practices. These discussions will be the subject of my next blog post reporting on our first SWIG.
Written by Nigel Ward. Copyright The University of Queensland, 2011. Licensed under Creative Commons Attribution-Share Alike 3.0 Australia. <http://creativecommons.org/licenses/by-sa/3.0/au/>.
The project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.