<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Bites : Ifs, ANDS and buts</title>
	<atom:link href="http://www.ands-partners.org/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ands-partners.org/blog</link>
	<description>Unofficial ANDS-partner blog for ANDS-funded projects</description>
	<lastBuildDate>Wed, 09 Mar 2011 02:57:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>In search of Linked Research Data best practices</title>
		<link>http://www.ands-partners.org/blog/2011/03/in-search-of-linked-research-data-best-practices/</link>
		<comments>http://www.ands-partners.org/blog/2011/03/in-search-of-linked-research-data-best-practices/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 01:36:12 +0000</pubDate>
		<dc:creator>Nigel Ward</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[ontology]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[SWIG]]></category>
		<category><![CDATA[VITRO]]></category>
		<category><![CDATA[VIVO]]></category>
		<category><![CDATA[vocabulary]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/?p=223</guid>
		<description><![CDATA[This third and final post about the ANDS Queensland SWIG (Semantic Web Interest Group) meeting summarises our discussion of best practices for semantic web representation of research data and research metadata. Previous posts catalogued the projects represented at the meeting &#8230; <a href="http://www.ands-partners.org/blog/2011/03/in-search-of-linked-research-data-best-practices/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This third and final post about the ANDS Queensland SWIG (Semantic Web Interest Group) meeting summarises our discussion of best practices for semantic web representation of research data and research metadata.  Previous posts catalogued the <a href="http://www.ands-partners.org/blog/2011/02/our-first-swig/">projects</a> represented at the meeting and documented issues about <a href="http://www.ands-partners.org/blog/2011/02/publishing-research-metadata-vocabularies/">publishing common vocabularies</a>.</p>
<h2>Representing research outputs</h2>
<p>While most of the SWIG discussion focussed on semantic web representation of <em>metadata</em> about research entities (metadata about data collections, projects, researchers etc), the group also discussed approaches to representing actual <em>research outputs</em> (such as datasets and annotations) using semantic web technologies. Projects discussed included:</p>
<ul>
<li>The <strong><a href="http://itee.uq.edu.au/~eresearch/projects/aus-e-lit/">Aus-e-lit project</a></strong> has implemented a compound object authoring and publishing service based on the <a href="http://www.openarchives.org/ore/">OAI-ORE</a> semantic web model.  Jane Hunter from their team discussed the advantages of using semantic web technologies to represent research data. She suggested that semantic web approaches can succinctly model relationships between entities, such as the relationship between a &#8220;cleaned&#8221; dataset and the raw dataset from which it derives.  Semantic web representations also make it easier to create visualisations of these high level relationships and present them to end users.</li>
<li>The Aus-e-lit researchers have recently joined the <strong><a href="http://www.openannotation.org/">Open Annotation Collaboration</a></strong> project. Social science, humanities, and crystallography communities create annotations in the course of their research, but find it difficult to share annotations between software systems. The Open Annotation Collaboration project aims to use semantic web approaches to move annotations across the boundaries of annotation clients, annotation servers, and content collections.</li>
<li>The ADFI team at USQ have <a href="http://ptsefton.com/2011/02/08/beyond-the-pdf-workshop-trip-report.htm">contributed</a> to the <strong><a href="https://sites.google.com/site/beyondthepdf/">Beyond the PDF</a></strong> project, examining semantic web approaches for linking papers to datasets and other related information.</li>
<li>The <strong><a href="http://itee.uq.edu.au/~eresearch/projects/ands/">Health-e-reef</a></strong> project used a high level ontology to unify coral reef survey observations. The ontology provided a way to unify observations from disparate data sources created by multiple independent projects and with different data structures.  The ontology models concepts such as Observations, Actors, Sites, Ecological Processes, and Measurements. James Cook University commented that they might use this ontology as part of their tropical data hub initiative.</li>
</ul>
<p>The SWIG discussed how representing data, as opposed to metadata, using semantic web technologies introduces new scalability challenges (moving from thousands of RDF triples to millions of RDF triples). For example, CSIRO presented a <a href="https://ocs.arcs.org.au/index.php/eraust/2010/paper/view/56">paper</a> at the 2010 eResearch Australasia conference about performance issues when querying and accessing the 30 million RDF triples in the Atlas of Living Australia dataset. CSIRO tested both open source and commercial triple stores, and observed execution times of over 8 hours for some queries, with variations of up to 9,000 times between some systems. In related work, Campbell Allen wrote a <a href="https://espace.library.uq.edu.au/view/UQ:218905">master&#8217;s thesis</a> that benchmarked RDF triple stores against a traditional relational database. He found that the relational solutions outperformed the triple stores, especially for spatial queries.  Campbell did observe, however, that</p>
<blockquote><p>The Semantic Web RDF triple stores were found to be particularly suited to the data integration task at hand due the ability of ontologies to semantically define and link the data</p></blockquote>
<p>The SWIG felt that these RDF triple store scalability issues may iron themselves out over time, observing that relational databases had to overcome similar scalability problems early in their development.</p>
<h2>Representing metadata about research outputs</h2>
<p>Many of the projects at the SWIG use terms from the <a href="http://eresearch.griffith.edu.au/ANDS/vitro/ANDS-VITRO.owl">ANDS-VITRO ontology</a> to describe their research outputs. Some noted, however, that they had difficulty knowing how to apply the ontology and would benefit from documentation of best practice (or at least common practice) for representing research metadata as linked data. </p>
<p>The rest of this section provides an overview of common practice areas that might need documentation.  The SWIG raised the topics below, but I have extended the discussion with more detailed examples to illustrate some points.</p>
<h3>Best practices for using ANDS-VITRO and VIVO terms</h3>
<p>The ANDS-VITRO and VIVO vocabularies can represent some research metadata concepts in multiple ways. For example, the VIVO ontology contains both <tt>vivo:webpage</tt> and <tt>foaf:homepage</tt> properties for describing links to webpages.  Similarly, the ANDS-VITRO to RIF-CS metadata crosswalk document (available from the <a href="http://groups.google.com/group/ands-vitro">ANDS-VITRO google group</a>) contains many properties for describing Agent names:</p>
<ul>
<li><tt>bibo:prefixName</tt></li>
<li><tt>foaf:firstName</tt> / <tt>foaf:givenName</tt></li>
<li><tt>foaf:lastName</tt> / <tt>foaf:familyName</tt></li>
<li><tt>vivo:middleName</tt></li>
<li><tt>foaf:name</tt> / <tt>rdfs:label</tt> for display</li>
</ul>
<p>The SWIG felt that the community would benefit from guidance on using these types of &#8220;overlapping&#8221; vocabulary terms. Peter Sefton from USQ related his experience in the ARROW project where agreeing on common usage early could have avoided interoperability problems down the track.</p>
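<p>As a minimal sketch of what such guidance might look like in practice, the Python below folds overlapping name properties into one preferred term per pair. The choice of preferred terms, the <tt>normalise_properties</tt> helper and the sample record all represent assumptions made for illustration; the SWIG did not agree on any of them.</p>

```python
# Hypothetical policy: for each pair of overlapping properties, pick one
# preferred term. These choices are illustrative assumptions only.
PREFERRED_TERM = {
    "foaf:firstName": "foaf:givenName",
    "foaf:lastName": "foaf:familyName",
}


def normalise_properties(description):
    """Return a copy of a property/value dict that uses preferred terms only.

    When both an alias and its preferred term appear, the value already
    recorded under the preferred term wins.
    """
    normalised = {}
    # First pass: copy values stored under terms that need no translation.
    for prop, value in description.items():
        if prop not in PREFERRED_TERM:
            normalised[prop] = value
    # Second pass: fold aliases in without overwriting preferred values.
    for prop, value in description.items():
        if prop in PREFERRED_TERM:
            normalised.setdefault(PREFERRED_TERM[prop], value)
    return normalised


# Made-up sample record mixing both naming styles.
record = {
    "foaf:firstName": "Jane",
    "foaf:familyName": "Citizen",
    "vivo:middleName": "Q",
}
print(normalise_properties(record))
```

<p>A shared table like <tt>PREFERRED_TERM</tt>, published alongside the crosswalk document, would let each project apply the same choices without changing how it captures metadata.</p>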
<h3>Supporting linked data principles</h3>
<p>Melbourne, Griffith and QUT originally created the ANDS-VITRO ontology to represent ANDS registry objects within the VIVO system. More recently, non-VIVO uses of the ontology have also emerged, with some institutions using the ontology to create linked data representations of their research metadata.  The documentation supporting the ontology, however, has inconsistent support for some of the principles championed by the linked data community. In particular, the <a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabs">How to Publish Linked Data on the Web</a> tutorial suggests the following principle for choosing vocabulary terms: </p>
<blockquote><p>
In order to make it as easy as possible for client applications to process your data, you should reuse terms from well-known vocabularies wherever possible.</p></blockquote>
<p>The ANDS-VITRO to RIF-CS metadata crosswalk document has mixed support for this principle. It supports the principle by recommending re-use of the Dublin Core description element (<tt>dcterms:description</tt>) for narrative descriptions of research data.  On the other hand, the same document breaks the principle by recommending use of <tt>vivo:hasSubjectArea</tt> rather than the much more widely used <tt>dcterms:subject</tt> term for describing the topic of a research data collection.</p>
<p>These inconsistencies probably stem from the original purpose of the ANDS-VITRO ontology as a representation of ANDS registry objects within the VIVO system. The designers presumably did not consider linked data as a primary driver. Given SWIG interest in linked data representations, however, the community would benefit from a discussion about using more common linked data vocabularies and how these align with the ANDS-VITRO ontology.</p>
<h3>Expanding the scope of ANDS-VITRO</h3>
<p>The SWIG discussed expanding the scope of the ANDS-VITRO ontology. </p>
<p>The current version of the ontology only contains a subset of the concepts covered by ANDS registry objects. For example, it does not include properties for describing spatial and temporal coverage of a data collection, or alternative titles for registry objects. Some members of the SWIG reported how they use common linked data properties to include these concepts in their metadata (such as <tt>dcterms:spatial, dcterms:temporal, dcterms:alternative</tt>). </p>
<p>The SWIG also discussed expanding the ontology to cover concepts beyond those needed for the ANDS registry. For example, Newcastle University wish to model derivation relationships between data sets, such as the relationship between a &#8220;cleaned&#8221; dataset and a raw dataset. Many institutions also want to describe record keeping requirements relating to research data, such as information on how long to keep data, and how to dispose of it when appropriate.  </p>
<p>Future work on the ontology might usefully compare these emerging practices, decide on a common approach, and document their use.</p>
<h2>Where to from here?</h2>
<p>This post summarises SWIG discussion of emerging practices for representing Australian research data and research metadata using linked data technologies. I hope it also highlights areas for future work: both in extending the scope of what we can describe, but also in agreeing on common ways to describe it.  </p>
<p>Possible online forums for sharing emerging practice include this blog, the general <a href="http://groups.google.com/group/ands-partners">ANDS partners</a> mailing list and <a href="http://community.ands.org.au/">community bulletin board</a>, and the more detailed <a href="http://groups.google.com/group/ands-vitro">ANDS-VITRO google group</a>. Some of the issues, however, such as nutting out a process for maintaining the ANDS-VITRO ontology, probably require face-to-face discussion. Simon Porter from Melbourne University has suggested the <a href="http://ccaeducause.caudit.edu.au/">CCA-Educause conference in Sydney in April</a> as a possible venue for another SWIG. Any takers?</p>
<hr />
<p>Unrelated aside: this post represents my first attempt at using <a href="http://en.wikipedia.org/wiki/E-prime">E-Prime</a> to improve the clarity of my writing.  I don&#8217;t think the experiment totally succeeded, but I certainly learnt a lot from the experience. </p>
<hr />
<p>Written by Nigel Ward. Copyright The University of Queensland, 2011. Licensed under Creative Commons Attribution-Share Alike 3.0 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/3.0/au/">http://creativecommons.org/licenses/by-sa/3.0/au/</a>&gt;. <a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png"><img class="alignleft size-full wp-image-79" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png" alt="" /></a></p>
<p>The project is supported by the <a href="http://www.ands.org.au/">Australian National Data Service</a> (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2011/03/in-search-of-linked-research-data-best-practices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Publishing Research Metadata Vocabularies</title>
		<link>http://www.ands-partners.org/blog/2011/02/publishing-research-metadata-vocabularies/</link>
		<comments>http://www.ands-partners.org/blog/2011/02/publishing-research-metadata-vocabularies/#comments</comments>
		<pubDate>Wed, 23 Feb 2011 23:55:04 +0000</pubDate>
		<dc:creator>Nigel Ward</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[ANZSRC]]></category>
		<category><![CDATA[ontology]]></category>
		<category><![CDATA[RDF]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[VITRO]]></category>
		<category><![CDATA[VIVO]]></category>
		<category><![CDATA[vocabulary]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/?p=176</guid>
		<description><![CDATA[This is the second in a series of posts summarising discussion at the first ANDS Queensland SWIG (Semantic Web Interest Group) meeting, held on 17 February 2011 at QUT. My previous post catalogued the projects represented at the meeting. This &#8230; <a href="http://www.ands-partners.org/blog/2011/02/publishing-research-metadata-vocabularies/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This is the second in a series of posts summarising discussion at the first ANDS Queensland SWIG (Semantic Web Interest Group) meeting, held on 17 February 2011 at QUT.  My <a href="http://www.ands-partners.org/blog/2011/02/our-first-swig/">previous post</a> catalogued the projects represented at the meeting.  This post covers one of <em>the</em> major discussion points at the meeting: <strong>how to publish and re-use common vocabularies</strong>.</p>
<h2>Research data vocabularies created in ANDS projects</h2>
<p>The Australian ANDS community have developed ontology and vocabulary representations that are already being re-used beyond their original projects:</p>
<ul>
<li>the <a href="http://eresearch.griffith.edu.au/ANDS/vitro/ANDS-VITRO.owl">ANDS-VITRO</a> ontology developed by Griffith University, QUT and Melbourne University</li>
<li>RDF encodings of the ANZSRC Field of Research (<a href="http://purl.org/anzsrc/for">for</a>), Socio-Economic Objective (<a href="http://purl.org/anzsrc/seo">seo</a>) and Type of Activity (<a href="http://purl.org/anzsrc/toa">toa</a>) vocabularies developed by USQ, Griffith University and QUT</li>
</ul>
<p>Most of the projects represented at the SWIG are using terms from these vocabularies to describe their research data holdings.  Most of the folks present also raised issues about how these vocabularies could be referenced and accessed.  </p>
<h2>Vocabulary publishing principles</h2>
<p>To understand the issues, it is worth taking a step back and thinking about requirements for hosting a Linked Data vocabulary. As Duncan Dickinson pointed out previously <a href="http://www.ands-partners.org/blog/2010/08/on-the-subject-of-identifiers-3/#id3">on this blog</a>, vocabularies are most easily re-used if they are published according to <a href="http://www.w3.org/DesignIssues/LinkedData.html">Tim Berners-Lee&#8217;s Linked Data principles</a>:</p>
<blockquote><p>
1. Use URIs as names for things</p>
<p>2. Use HTTP URIs so that people can look up those names.</p>
<p>3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)</p>
<p>4. Include links to other URIs. so that they can discover more things.
</p></blockquote>
<p>I think it is also useful to consider a further principle from the W3C <a href="http://www.w3.org/TR/cooluris/">Cool URIs for the Semantic Web</a> document:</p>
<blockquote><p>
5. Stability.  Once you set up a URI to identify a certain resource, it should remain this way as long as possible
</p></blockquote>
<p>It turns out that the ANDS-VITRO and ANZSRC RDF vocabularies stack up well against principle 1, principle 2, and principle 4. In my opinion, many of the issues raised at the SWIG pertain to support for principle 3 and principle 5.  Let&#8217;s go through them one by one and assess the situation &#8230;</p>
<h2>HTTP URIs and Linking</h2>
<p>All terms in the ANZSRC RDF vocabularies and the ANDS-VITRO ontology are assigned HTTP URIs (meeting principles 1 &amp; 2).  The definitions in the ANDS-VITRO ontology link extensively to other URIs (meeting principle 4). The ontology re-uses established vocabularies by linking to terms in <a href="http://dublincore.org/documents/dcmi-terms/">Dublin Core</a>, <a href="http://xmlns.com/foaf/spec/">FOAF</a>, <a href="http://bibliontology.com/specification">BIBO</a>, and the base <a href="http://vivoweb.org/ontology/core">VIVO ontology</a>. </p>
<p>The definitions in the ANZSRC RDF vocabularies have fewer URI linkages, but this makes sense given they are foundational and not built from other vocabularies.  While it is technically feasible to make links between vocabulary terms in ANZSRC and equivalents in other subject vocabularies (such as <a href="http://authorities.loc.gov/help/subj-auth.htm">Library of Congress Subject Headings</a>), this would require significant effort, and I&#8217;m not aware of this being raised as a requirement. So: ANZSRC meets principle 4 to the extent that it currently makes sense to do so.</p>
<h2>Useful information and Stable URIs: ANZSRC</h2>
<p>The hosting of the ANZSRC RDF vocabularies supports principle 3 (Useful information).  Dereferencing a URI from the ANZSRC FOR RDF vocabulary like <a href="http://purl.org/anzsrc/for#group_0501">http://purl.org/anzsrc/for#group_0501</a> takes you to information about the definition.  Additionally, the information you get back varies depending on how you ask for it (the web server hosting the vocabulary supports <a href="http://en.wikipedia.org/wiki/Content_negotiation">content negotiation</a>).  Clicking on the link in a web browser will return an HTML page with information on the vocabulary term.  Linked data browsers on the other hand can get various RDF encodings of the vocabulary.</p>
<p>The URIs in the ANZSRC RDF vocabularies have also been designed to address principle 5 (Stable URIs).  The vocabularies use <a href="http://purl.oclc.org/docs/index.html">PURL</a>s that redirect to URIs where the vocabularies are actually hosted.  Currently the vocabularies are <a href="http://namespace.adfi.usq.edu.au/anzsrc/for/">hosted</a> by <a href="http://www.usq.edu.au/adfi">ADFI at USQ</a>, but if they were moved in the future, the PURLs could simply be updated to redirect to the new location and folks could still refer to the vocabularies using the same PURLs.</p>
<p>The SWIG did discuss whether ADFI was the best place to host these vocabularies. Given ANZSRC is published by the Australian Bureau of Statistics, wouldn&#8217;t it make more sense for them to host a machine readable encoding?  The ANDS folks suggested that this issue might be addressed by their &#8220;Data Connections&#8221; projects (see <a href="http://www.ands-partners.org/blog/2011/02/our-first-swig/">previous blog post</a>).</p>
<p>The SWIG also discussed whether PURLs were the best URI option for identifying vocabularies because they can get in the way of content negotiation. For example, asking for an RDF+XML serialisation of <tt>http://purl.org/anzsrc/for#group_0501</tt> returns an HTTP 302 redirect with an HTML body, rather than RDF:</p>
<pre>
GET /anzsrc/for HTTP/1.1
Host: purl.org
Accept: application/rdf+xml
</pre>
<p>returns</p>
<pre>
HTTP/1.0 302 Moved Temporarily
Location: http://namespace.adfi.usq.edu.au/anzsrc/for/
Content-Type: text/html; charset=iso-8859-1
</pre>
<p>I suspect the community needs to discuss this issue further. In my experience, it is possible to ignore the returned HTML and still do content negotiation on the redirected URI (<tt>http://namespace.adfi.usq.edu.au/anzsrc/for/</tt>). Some clients can even do that automatically for you (e.g. <tt>curl -L</tt>), but your mileage may vary depending on the client you are using.  For this reason, some at the meeting thought it might be better to run our own PURL server that could do direct content negotiation. Once again, I think this needs further discussion. </p>
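<p>As a rough sketch of the client side of this exchange, the Python below builds (but does not send) a request for the RDF/XML form of the term, using only the standard library. The <tt>build_rdf_request</tt> helper is a name invented for this example, and whether the servers involved still answer as described above is not something the sketch checks.</p>

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_rdf_request(term_uri):
    """Build (but do not send) a request asking for RDF/XML.

    HTTP clients do not send the fragment ("#group_0501") to the server, so
    it is split off here and kept for locating the term in the returned
    document.
    """
    document_uri, fragment = urldefrag(term_uri)
    request = Request(document_uri, headers={"Accept": "application/rdf+xml"})
    return request, fragment


request, fragment = build_rdf_request("http://purl.org/anzsrc/for#group_0501")
print(request.full_url, request.get_header("Accept"), fragment)
```

<p>Passing <tt>request</tt> to <tt>urllib.request.urlopen</tt> would send it and follow the 302 redirect by default, much like <tt>curl -L</tt>. Whether a given client repeats the <tt>Accept</tt> header on the redirected request is worth checking for the client in use.</p>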
<h2>Useful information and Stable URIs: ANDS-VITRO</h2>
<p>The designers of the ANDS-VITRO ontology gave it a base URI of <a href="http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#">http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#</a> with the intent of getting ANDS to host the ontology for others to use.  Clicking on the above URI, however, reveals that ANDS have (so far) declined to host the ontology (it returns a 404 Not Found error). ANDS sees the ontology as an output of one of their projects, rather than an official ANDS output. They say that </p>
<blockquote><p>&#8220;for ANDS to host the ANDS Vitro ontology would be to both endorse it and imbue it with a degree of officialness with which we were not comfortable.&#8221;</p></blockquote>
<p>The SWIG resolved to ask ANDS to reconsider its position in light of how many folks are using the ontology as the basis for their ANDS work. For my sins, I&#8217;ve been asked to draft a letter to ANDS requesting clarification and reconsideration of their position.   I&#8217;ll keep this blog informed about progress.</p>
<p>To their credit, ANDS have suggested another possible way forward: they could register a more neutral domain name such as <tt>vitro.org.au</tt> to host the ontology.  This solution requires all of us who currently use <tt>http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#</tt> in our metadata to update our URIs to something like <tt>http://vitro.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#</tt>.  This change might still be feasible at this early stage, but a decision needs to be made quickly, before too many uses of <tt>http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#</tt> are published on the (linked data) web.</p>
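<p>To give a feel for the size of that change, here is a minimal Python sketch of rewriting the ontology base URI across a set of triples. The new base is only the &#8220;something like&#8221; example from the paragraph above, and the triple, the <tt>exampleProperty</tt> term and the <tt>example.org</tt> URIs are made up for illustration.</p>

```python
OLD_BASE = "http://www.ands.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#"
# Hypothetical replacement base, taken from the "something like" example.
NEW_BASE = "http://vitro.org.au/ontologies/ns/0.1/VITRO-ANDS.owl#"


def migrate_uri(uri):
    """Swap the old ontology base for the new one; leave other URIs alone."""
    if uri.startswith(OLD_BASE):
        return NEW_BASE + uri[len(OLD_BASE):]
    return uri


def migrate_triples(triples):
    """Apply migrate_uri to every position of each (s, p, o) triple."""
    return [tuple(migrate_uri(term) for term in triple) for triple in triples]


# Made-up triple; "exampleProperty" is not a real ANDS-VITRO term.
triples = [
    (
        "http://example.org/collection/1",
        OLD_BASE + "exampleProperty",
        "http://example.org/party/7",
    ),
]
print(migrate_triples(triples))
```

<p>The rewrite itself is mechanical; the harder part is finding every published copy of the old URIs, which is why the decision is easier to make early.</p>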
<h2>In the next installment &#8230;</h2>
<p>This post covered the major issue discussed at the SWIG: publishing our community vocabularies.  The SWIG also discussed other interesting semantic web topics including best practice for using the vocabularies, and publishing data (as opposed to just metadata) using semantic web technologies.  I&#8217;ll cover these topics in (yet another) blog post.</p>
<hr />
<p>Written by Nigel Ward. Copyright The University of Queensland, 2011. Licensed under Creative Commons Attribution-Share Alike 3.0 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/3.0/au/">http://creativecommons.org/licenses/by-sa/3.0/au/</a>&gt;. <a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png"><img class="alignleft size-full wp-image-79" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png" alt="" /></a></p>
<p>The project is supported by the <a href="http://www.ands.org.au/">Australian National Data Service</a> (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2011/02/publishing-research-metadata-vocabularies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Our first SWIG</title>
		<link>http://www.ands-partners.org/blog/2011/02/our-first-swig/</link>
		<comments>http://www.ands-partners.org/blog/2011/02/our-first-swig/#comments</comments>
		<pubDate>Tue, 22 Feb 2011 07:14:28 +0000</pubDate>
		<dc:creator>Nigel Ward</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Queensland]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[SWIG]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/?p=155</guid>
		<description><![CDATA[An informal gathering of folks using semantic web approaches in their ANDS-funded projects met after the ANDS Queensland Community Day on 17 February 2011 at QUT. Dubbing ourselves the ANDS Queensland Semantic Web Interest Group (SWIG), we came together with &#8230; <a href="http://www.ands-partners.org/blog/2011/02/our-first-swig/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>An informal gathering of folks using semantic web approaches in their ANDS-funded projects met after the ANDS Queensland Community Day on 17 February 2011 at QUT. Dubbing ourselves the <strong>ANDS Queensland Semantic Web Interest Group (SWIG)</strong>, we came together with the aim of understanding what semantic web approaches are being used in ANDS projects, to uncover best practices and common issues, and to identify areas where we might collaborate.</p>
<p>It turns out there is a <em>lot</em> of semantic web activity in ANDS Queensland projects.  Even with just high-level summaries from those present we easily used up our allocated hour of discussion time.  This post summarises the projects that were discussed at the meeting. Subsequent posts will cover the major discussion points and potential areas for action.</p>
<blockquote><p>Updates:</p>
<ul>
<li>2011-02-23: clarified QUT data capture projects, QUT contributions to ANZSRC vocabularies and fixed link to R2RML.</li>
<li>2011-03-02: further refined description of QUT data capture project.</li>
</ul>
</blockquote>
<h2>A broad definition of &#8220;Semantic Web&#8221;</h2>
<p>The projects represented took a broad view of what &#8220;semantic web&#8221; meant in their projects. Some are interested in ensuring we all use the same URIs to <a href="http://www.ands-partners.org/blog/2010/08/on-the-subject-of-identifiers-3/">identify vocabulary terms in our metadata</a>, some in publishing data into the <a href="http://linkeddata.org/">Linked Data Cloud</a>, and some in building systems that use <a href="http://www.w3.org/standards/semanticweb/ontology">ontologies</a> to represent their data model.</p>
<p>Interestingly, no-one at the meeting discussed using advanced semantic web technologies like <a href="http://www.w3.org/standards/semanticweb/inference">inferencing</a> or querying data via <a href="http://www.w3.org/standards/techs/sparql">SPARQL</a> endpoints.</p>
<h2>ANDS &#8220;In-house&#8221; Projects</h2>
<p>The ANDS folks at the meeting described &#8220;<a href="http://ands.org.au/dataconnections.pdf">Data Connections</a>&#8221; projects being pursued within ANDS. </p>
<p>One project involves creating &#8220;linked data endpoints&#8221; for information on <a href="http://www.nhmrc.gov.au/">NHMRC</a> and <a href="http://www.arc.gov.au/">ARC</a> grants.  <a href="http://services.ands.org.au/home/orca/rda/index.php">Research Data Australia</a> (RDA) already allows linkages between collection descriptions and descriptions of the activities (usually grants) that funded the work.  Most ANDS projects are currently writing their own activity descriptions and uploading them to RDA alongside their collection descriptions. This means that the same overarching activity may be described and uploaded several times, slightly differently each time.  The project aims to reduce this duplicated effort by creating definitive URIs to identify NHMRC and ARC grants. This would allow collection records to unambiguously reference the (NHMRC and ARC) activities that funded them and allow RDA users to navigate to definitive descriptions of the activities.</p>
<p>ANDS is working with <a href="http://www.ga.gov.au/">Geoscience Australia</a> to expose the <a href="https://www.ga.gov.au/products/servlet/controller?event=GEOCAT_DETAILS&amp;catno=65589">Gazetteer of Australia</a> as Linked Data. The gazetteer provides information on the location and spelling of Australian geographical names. It is already <a href="http://www.ga.gov.au/map/names/">searchable online</a>, but the underlying data isn&#8217;t exposed in a machine-readable format. The project is investigating exposing the data in a machine readable format and linking it with the international geonames ontology. This would allow RDA collection records to unambiguously identify Australian place names and to link collection descriptions with more information about their geographic location.</p>
<p>ANDS also reported that an interest group has been established to examine requirements and use cases for developing Vocabulary Services for e-Research, involving the <a href="http://www.abs.gov.au/">ABS</a>, <a href="http://www.csiro.au/">CSIRO</a>, <a href="http://www.ga.gov.au/">Geoscience Australia</a>, and other bodies; this follows a workshop held at Geoscience Australia in November 2010. A number of folks at the meeting requested more information about the project and expressed a desire to contribute.</p>
<p><a href="mailto:monica.omodei@ands.org.au">Monica Omodei</a> can be contacted for more information about these projects. The <a href="http://ands.org.au/newsletters/newsletter-2011-01.pdf">January 2011 edition</a> of the ANDS newsletter also has an article on the Gazetteer project.</p>
<h2>Tropical Data Hub</h2>
<p><a href="http://www.jcu.edu.au/">James Cook University</a> reported that they are keen to implement a semantic layer within their <a href="http://eresearch.jcu.edu.au/projects/tropical-futures-a-tropical-data-hub">tropical data hub</a> project. They are interested in pursuing a linked data approach to exposing their data and linking it with other data and facilities.  They are also keen to create a portal with a semantic layer for intelligent searches. JCU are just embarking on this project and are still shopping for technology that meets their requirements.</p>
<h2>University of Southern Queensland / Newcastle University</h2>
<p>As has been previously described on <a href="http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/#id2">this</a> and <a href="http://cairss.caul.edu.au/blog/2010/08/05/i-only-have-uris-for-you-vicki-and-peters-adventures-in-linked-data-land/">other</a> blogs, the USQ / Newcastle RedBox metadata store project is keen to play with the linked data world. The project team reported that one of their key barriers has been the lack of definitive URIs for referencing things in the research space like ANZSRC subject terms, parties, activities or geographic places. While the RedBox team is aware that ANDS projects like those described above are addressing this issue, they couldn&#8217;t wait and so have developed some interim solutions:</p>
<ul>
<li>The Mint is local infrastructure for identifying and describing parties at Newcastle. It pulls information from various authorities, allows for data merging / de-duplication and publishes party identifiers as URIs.</li>
<li>They are storing a local (cut down?) copy of the geonames ontology to avoid load time issues they found with using the complete dataset.</li>
<li>USQ created URIs for every term in the <a href="http://www.abs.gov.au/ausstats/abs@.nsf/0/4AE1B46AE2048A28CA25741800044242">ABS ANZSRC</a> Field of Research (<a href="http://purl.org/anzsrc/for">for</a>), Socio-Economic Objective (<a href="http://purl.org/anzsrc/seo">seo</a>) and Type of Activity (<a href="http://purl.org/anzsrc/toa">toa</a>) vocabularies. This allows them to unambiguously reference subject terms from the Linked Data collection descriptions. These URIs currently de-reference to either a human readable HTML page or a machine-interpretable <a href="http://www.w3.org/2004/02/skos/">SKOS</a> representation of the vocabulary. </li>
</ul>
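<p>As a rough illustration of how a client might ask for either representation of these vocabulary URIs, the sketch below builds HTTP requests with different Accept headers. It assumes the representation is chosen by content negotiation on the Accept header, which the report does not confirm; the helper name <code>build_vocab_request</code> and the two media types are illustrative choices, and no network call is made.</p>

```python
from urllib.request import Request

# Media types are assumptions for illustration: the post only says the URIs
# resolve to an HTML page or a SKOS representation.
HTML = "text/html"
RDF_XML = "application/rdf+xml"


def build_vocab_request(uri, machine_readable=True):
    """Return an unsent request for a vocabulary URI (hypothetical helper)."""
    accept = RDF_XML if machine_readable else HTML
    return Request(uri, headers={"Accept": accept})


if __name__ == "__main__":
    # Print the URI and the Accept header that would be sent for each vocabulary.
    for uri in ("http://purl.org/anzsrc/for", "http://purl.org/anzsrc/seo"):
        req = build_vocab_request(uri)
        print(req.full_url, req.get_header("Accept"))
```

<p>Passing the returned request to <code>urllib.request.urlopen</code> would perform the actual lookup; whether the server honours the header is something to verify against the live service.</p>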
<p>USQ also mentioned their work examining how to link datasets to papers in a package, as part of the <a href="http://ptsefton.com/2011/02/08/beyond-the-pdf-workshop-trip-report.htm">Beyond the PDF</a> initiative. </p>
<h2>University of Queensland</h2>
<p>Like USQ, the <a href="http://uq.edu.au">University of Queensland</a> metadata store is being designed to play well in the Linked Data world. It will re-use URIs for existing entities (where they exist), provide URIs for the new entities it describes, and will be able to expose its data using an RDF serialisation.</p>
<p>Many of the other nationally funded projects within the <a href="http://itee.uq.edu.au/~eresearch/">UQ eResearch Lab</a> have employed semantic web approaches:</p>
<ul>
<li>the <a href="http://itee.uq.edu.au/~eresearch/projects/ands/W4SemanticTagging-report-2011-02.pdf">Scoping Study for W4 Semantic Tagging Service</a> report investigates the availability of suitable technologies for providing a Who/What/Where/When (W4) semantic tagging service.</li>
<li><a href="http://itee.uq.edu.au/~eresearch/projects/aus-e-lit/">Aus-e-lit</a> has implemented a compound object authoring and publishing service based on the OAI-ORE semantic web model. The service is useful for modelling provenance of literary resources and for online learning objects for teaching and research</li>
<li>early work on the <a href="http://itee.uq.edu.au/~eresearch/projects/ands/">Health-e-reef</a> project used an ontology-driven approach to integrating data from heterogeneous datasets about the Great Barrier Reef</li>
<li>the <a href="http://itee.uq.edu.au/~eresearch/projects/podd/index.html">Phenomics Ontology Driven Data Management Project</a> (PODD) is developing semantic web data management solutions to meet the needs of researchers working at the Australian Plant Phenomics Facility (APPF) and the Australian Phenomics Network (APN)</li>
</ul>
<p>Although they weren&#8217;t present at the SWIG, the <a href="http://www.qfab.org/">Queensland Facility for Advanced Bioinformatics</a> team at UQ was present at the ANDS community day and reported on some relevant work.  Their team is using ontologies to analyse data in the <a href="http://www.emblaustralia.org/Facilities/EBI_Mirror.aspx">Australian mirror</a> of the European Bioinformatics Institute molecular biology database, with the aim of populating RDA with <a href="http://www.qfab.org/case-studies/qfab-services/research-computing/linking-the-embl-australia-ebi-mirror-with-the-australian-research-data-commons">collection descriptions of EBI data</a> that is relevant to Australian species.</p>
<h2>Queensland University of Technology</h2>
<p><a href="http://www.qut.edu.au/">QUT</a> worked with <a href="http://www.griffith.edu.au/">Griffith University</a> on the Metadata Exchange Hub project that adapted  the <a href="http://vivo.sourceforge.net/">VIVO open source semantic web application</a> to collect metadata from multiple university sources and publish it to Research Data Australia (see Griffith report below).</p>
<p>QUT are also working on an ANDS data capture project in the bio-diversity domain that will help to create mappings between relational schemas and ontologies. The QUT team reported that they won&#8217;t be developing a sophisticated ontology: the end goal is generating ANDS compliant collection descriptions, and the research group hasn&#8217;t expressed a need for modelling complex data. The project will be storing the underlying data in a relational database and will develop a relational to RDF transformation using a technology like <a href="http://www4.wiwiss.fu-berlin.de/bizer/d2rq/">D2RQ</a> or <a href="http://www.w3.org/TR/r2rml/">R2RML</a>.</p>
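<p>To make the idea of a relational to RDF transformation concrete, here is a minimal hand-rolled sketch. It is not D2RQ or R2RML and does not reflect the QUT schema: the <code>specimen</code> table, the <code>example.org</code> namespace and the column-to-predicate mapping are all hypothetical. Each row becomes a subject URI and each mapped column becomes one (subject, predicate, object) triple.</p>

```python
import sqlite3

# Hypothetical namespace and schema; neither comes from the QUT project.
BASE = "http://example.org/specimen/"
COLUMN_TO_PREDICATE = {
    "species": "http://example.org/vocab/species",
    "site": "http://example.org/vocab/site",
}


def rows_to_triples(conn, table="specimen", key="id"):
    """Yield (subject, predicate, object) tuples, one per non-NULL mapped column."""
    columns = ", ".join([key] + list(COLUMN_TO_PREDICATE))
    for row in conn.execute(f"SELECT {columns} FROM {table} ORDER BY {key}"):
        subject = BASE + str(row[0])
        for predicate, value in zip(COLUMN_TO_PREDICATE.values(), row[1:]):
            if value is not None:  # skip NULLs rather than emit empty literals
                yield (subject, predicate, value)


def demo_connection():
    """Build a tiny in-memory table to run the mapping against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimen (id INTEGER PRIMARY KEY, species TEXT, site TEXT)"
    )
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?)",
        [(1, "Acacia aneura", "Site A"), (2, "Eucalyptus crebra", None)],
    )
    return conn


if __name__ == "__main__":
    for triple in rows_to_triples(demo_connection()):
        print(triple)
```

<p>A mapping language such as R2RML expresses the same table-to-subject and column-to-predicate correspondence declaratively, so the mapping can be maintained without changing code.</p>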
<h2>Griffith University</h2>
<p><a href="http://www.griffith.edu.au/">Griffith University</a> collaborated with QUT on adapting the VIVO semantic web application to publish data to Research Data Australia.  The Griffith team reported that VIVO has proven to be a good choice of technology, and will also be used for other in-house applications. </p>
<p>A major concrete output of this work is the ANDS-VITRO ontology: an extension to the <a href="http://vivoweb.org/ontology/core">VIVO ontology</a> originally developed by the VIVO project.  The project team collaborated with Simon Porter from Melbourne University to extend the VITRO ontology with concepts and properties needed to syndicate information to Research Data Australia. Although the ontology URI isn&#8217;t currently resolvable online, it can be downloaded <a href="http://eresearch.griffith.edu.au/ANDS/vitro/ANDS-VITRO.owl">directly</a> from Griffith University.</p>
<p>As part of the Metadata Exchange Hub project, Griffith and QUT developed <a href="http://www.w3.org/2004/OWL/">OWL</a> representations of the ANZSRC vocabularies, building on the USQ SKOS representations.  They deliberately re-used the vocabulary term URIs defined by USQ to indicate that the two vocabulary representations describe the same concepts. The group discussed the idea that, in future, these URIs might sensibly de-reference to this new OWL+SKOS representation.</p>
<h2>In the next installment &#8230;</h2>
<p>Hopefully this summary gives an impression of the <em>breadth</em> of semantic web activity in ANDS funded projects.</p>
<p>There is also a substantial amount of <em>depth</em> to some of these projects.  Many of the SWIG participants commented that the ANDS-VITRO ontology and the SKOS/OWL representations of the ANZSRC vocabularies are fundamental pieces of work that are already being re-used in the community. There was a <em>lot</em> of discussion about the need to host and maintain this work, as well as share emerging best practices.  These discussions will be the subject of my next blog post reporting on our first SWIG. </p>
<hr />
<p>Written by Nigel Ward. Copyright The University of Queensland, 2011. Licensed under Creative Commons Attribution-Share Alike 3.0 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/3.0/au/">http://creativecommons.org/licenses/by-sa/3.0/au/</a>&gt;. <a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png"><img class="alignleft size-full wp-image-79" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png" alt="" /></a></p>
<p>The project is supported by the <a href="http://www.ands.org.au/">Australian National Data Service</a> (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2011/02/our-first-swig/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Don&apos;t let your research data collection become a statistic</title>
		<link>http://www.ands-partners.org/blog/2011/02/dont-let-your-research-data-collection-become-a-statistic/</link>
		<comments>http://www.ands-partners.org/blog/2011/02/dont-let-your-research-data-collection-become-a-statistic/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 05:32:10 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[ReDBox (EIF-040)]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/2011/02/dont-let-your-research-data-collection-become-a-statistic/</guid>
		<description><![CDATA[PDF version This is a short post on what we&#8217;re doing with repository statistics in the ReDBox application. It has to be short as we are not doing much, statistics are not on the list of major features in this &#8230; <a href="http://www.ands-partners.org/blog/2011/02/dont-let-your-research-data-collection-become-a-statistic/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="rendition-links"> <span class="pdf-rendition-link"><a href="http://www.ands-partners.org/blog/wp-content/uploads/2011/02/redbox-stats.pdf.pdf" title="View the printable version of this page">PDF version</a></span></div>
<div class="body">
<div>
<p class="meta-abstract">This is a short post on what we&#8217;re doing with repository statistics in the ReDBox application. It has to be short as we are not doing much: statistics are not on the list of major features in this initial release.</p>
<p>The approach we are taking at the moment is to use either or both of the following:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>Standard web server statistics, from the Apache web server with a package like <a href="http://awstats.sourceforge.net/">AWStats</a></p>
</li>
<li>
<p>Remote stats services like <a href="https://www.google.com/analytics/settings/home">Google Analytics</a> or <a href="http://w3counter.com/">W3Counter</a>, which work via JavaScript code embedded in pages on the repository site, or any other website for that matter.</p>
</li>
</ol>
<p>Both these approaches require human interpretation, or some kind of processing of log files and reports, as they do not take account of the structure and function of the repository <span class="spCh spChx2013">&#8211;</span> they will not distinguish between the various kinds of pages and views in the site, but they are all that resources will permit us to use in the first release of the software.</p>
<p>We looked for a way to provide useful repository-specific statistics without a lot of development. Oliver Lucido read up on the <a href="http://www.cranfieldlibrary.cranfield.ac.uk/pirus2/tiki-index.php">PIRUS2 project</a> which looked at repository statistics that could be collected and reported across repositories, but he found that it is concerned with documents, chiefly journal articles, and not suitable for tracking use of research data collections, which is what we&#8217;re dealing with here.</p>
<p>Looking to the future we note the following:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p><a href="http://fasciator.usq.edu.au/">The Fascinator</a>, which provides the building blocks for ReDBox, has an event logging system, so a future project will be able to add reporting on</p>
</li>
<li>
<p>We would like to work with ANDS on a way to start collecting usage statistics in research data registries/repositories, so that a page view of the metadata about a data set in a university repository and one on Research Data Australia for the same data set could be aggregated in one spot: like the PIRUS project, but for research data.</p>
</li>
</ol>
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/" onclick="javascript:window.top.open(&quot;http://creativecommons.org/licenses/by-sa/2.5/au/&quot;);return false;">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span style="country:US; language:en;"><span class="T1"><img alt="" class="fr1" height="31" src="http://www.ands-partners.org/blog/wp-content/uploads/2011/02/redbox-stats_filesm40ca94ba.png.png" style="border:0px; vertical-align: top" width="88"></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/" onclick="javascript:window.top.open(&quot;http://ice.usq.edu.au/&quot;);return false;">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm" onclick="javascript:window.top.open(&quot;http://fascinator.usq.edu.au/desktop/desktop.htm&quot;);return false;">The Fascinator</a>.</p>
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2011/02/dont-let-your-research-data-collection-become-a-statistic/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ADFI Migration</title>
		<link>http://www.ands-partners.org/blog/2011/01/adfi-migration/</link>
		<comments>http://www.ands-partners.org/blog/2011/01/adfi-migration/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 05:26:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/?p=144</guid>
		<description><![CDATA[All blogs hosted by ADFI (including this one) will not be available on Wednesday, 2nd of February, as they are being moved to a new server. Admin]]></description>
			<content:encoded><![CDATA[<p>All blogs hosted by ADFI (including this one) will not be available on Wednesday, 2nd of February, as they are being moved to a new server.</p>
<p>Admin</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2011/01/adfi-migration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ANDS Project Plan: EIF040</title>
		<link>http://www.ands-partners.org/blog/2011/01/ands-project-plan-eif040/</link>
		<comments>http://www.ands-partners.org/blog/2011/01/ands-project-plan-eif040/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 02:33:49 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[ReDBox (EIF-040)]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/2011/01/ands-project-plan-eif040/</guid>
		<description><![CDATA[1.1 Introduction 1.1.1 Project Purpose 1.1.2 Definition of Terms 1.1.3 Methodology 1.1.4 Deliverables 1.2 Project Tasks 1.3 Project Plan 1.4 Project Budget 1.4.1 Travel 1.5 Risk Factors 1.6 Roles and Responsibilities 1.7 Ongoing Development and Use of Deliverables 1.8 Reporting &#8230; <a href="http://www.ands-partners.org/blog/2011/01/ands-project-plan-eif040/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">
<ul>
<li><a href="#id2"><span>1.1 Introduction</span></a>
<ul>
<li><a href="#id3"><span>1.1.1 Project Purpose</span></a></li>
<li><a href="#id5"><span>1.1.2 Definition of Terms</span></a></li>
<li><a href="#id7"><span>1.1.3 Methodology</span></a></li>
<li><a href="#id11"><span>1.1.4 Deliverables </span></a></li>
</ul>
</li>
<li><a href="#id18"><span>1.2 Project Tasks </span></a></li>
<li><a href="#id19"><span>1.3 Project Plan</span></a></li>
<li><a href="#id20"><span>1.4 Project Budget</span></a>
<ul>
<li><a href="#id21"><span>1.4.1 Travel</span></a></li>
</ul>
</li>
<li><a href="#id22"><span>1.5 Risk Factors</span></a></li>
<li><a href="#id23"><span>1.6 Roles and Responsibilities</span></a></li>
<li><a href="#id24"><span>1.7 Ongoing Development and Use of Deliverables</span></a></li>
<li><a href="#id25"><span>1.8 Reporting Requirements</span></a></li>
<li><a href="#id27"><span>1.9 Declaration </span></a></li>
</ul>
</div>
<div>
<p>This is an edited and updated version of the approved and amended project plan for ANDS project EIF-040. Financial details have been removed and some extra comments have been added in square brackets.</p>
<div class="Table1" style="width: 100%; margin: 0px; padding: 0px; text-align: left;">
<table class="Table1" style="border-spacing: 0; empty-cells: show; width: 14.894cm; border-collapse: collapse;">
<colgroup>
<col style="width: 5.715cm;"></col>
<col style="width: 9.179cm;"></col>
</colgroup>
<tbody>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P24">Project No.</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P17">EIF-040</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P24">Project Title</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P16">Ingect: ANDS Metadata Store for VITAL/Fedora</p>
<p>[Now known as <span class="spCh spChx201c">“</span>ReDBox<span class="spCh spChx201d">”</span>]</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P23"><strong>Organisation responsible for the project</strong><strong><span> </span></strong></p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P16">University of Southern Queensland</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P24">Organisation(s) that will undertake the work</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P16">University of Southern Queensland</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P19">ABN and/or ACN</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P21">
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P18">Name of  Contact Person</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P16">Peter Sefton</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P18">Contact details of Contact Person</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P14">West St Toowoomba, Queensland 4350</p>
<p>+61746311640</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P19">Funding required</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P21">
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P18">Proposed start date</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P16">01/04/10</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P19">Expected project timeframe</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P21">9 months</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P19">Name of the person responsible for contract administration</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P21">Peter Sefton</p>
<p class="P20">sefton@usq.edu.au</p>
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P19">Name of the signatory for the contract</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P21">
</td>
</tr>
<tr>
<td class="Table1_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P18">Names and affiliations of any collaborators</p>
</td>
<td class="Table1_B1" style="vertical-align: top; border: 1.0px solid #000000; padding-bottom: 1.0px; padding-left: 0.191cm; padding-right: 0.191cm; padding-top: 1.0px;">
<p class="P14">Vicki Picasso <span class="spCh spChx2013">–</span> University of Newcastle</p>
<p>vivki.picasso@newcastle.edu.au</p>
<p class="P20">
</td>
</tr>
</tbody>
</table>
</div>
<p class="P22">
<p class="P22">
<h1><a name="id2"></a>1.1 <a name="__RefHeading__5615_2107375272"></a>Introduction</h1>
<p>This project proposal is a deliverable of the project EIF-040. The working title, Ingect, is a mashup of Ingest (which is what the application does) and Inject <span class="spCh spChx2013">–</span> which is what we intend to do with name identities.</p>
<h2><a name="id3"></a>1.1.1 <a name="__RefHeading__5617_2107375272"></a>Project Purpose</h2>
<p><strong>This project aims</strong> to produce some software components, documentation and exemplar configuration files to allow metadata for data collections to be captured, managed and stored in an ARROW/VITAL repository in a way that is compatible with Research Data Australia.</p>
<p><strong>The software to be delivered</strong> will be an add-on for the VTLS Vital + Fedora Commons + Apache Solr combination used at ARROW sites. It will:</p>
<ul class="lib">
<li>Allow data to be ingested into a VITAL + Fedora repository in a variety of ways:
<ul class="lib">
<li>Via web forms.</li>
<li>Via email triggers from the grants database (InfoEd).</li>
<li>Via harvesting an index of the Newcastle Research Storage Facility.</li>
</ul>
</li>
<li>Expose an OAI-PMH feed of RIF-CS metadata to Research Data Australia.</li>
</ul>
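<p>For readers unfamiliar with OAI-PMH, a harvester pulls such a feed by issuing plain HTTP requests with a <code>verb</code> parameter. The sketch below only builds a ListRecords request URL. The endpoint address is hypothetical, and the <code>rif</code> metadata prefix is an assumption about how the RIF-CS format would be named; a provider's ListMetadataFormats response gives the actual value.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; the real feed location would depend on the install.
BASE_URL = "http://repository.example.edu.au/oai"


def list_records_url(base_url=BASE_URL, metadata_prefix="rif", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix="rif" is an assumed prefix for RIF-CS; check the
    provider's ListMetadataFormats response for the actual value.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # optional selective-harvest lower bound
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = list_records_url(from_date="2011-01-01")
    print(url)
    # Show the query parameters a harvester would send.
    print(parse_qs(urlsplit(url).query))
```

<p>Supplying <code>from_date</code> lets a harvester request only records changed since its last visit, which keeps repeated harvests small.</p>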
<p><strong>Exemplar configuration</strong> will be delivered based on the requirements for data description at the University of Newcastle and possibly Swinburne if the project is extended.</p>
<p>Documentation will cover the entire process of installing the application alongside VITAL, configuring all aspects of the system and will give advice on information architecture based on experience we will gain by piloting the system at Newcastle.</p>
<p>Not in scope:</p>
<ul class="lib">
<li>Design of metadata schemas <span class="spCh spChx2013">–</span> these are to be supplied by ANDS/Newcastle.</li>
<li>Installation at Swinburne or other institutions. ANDS will work with their stakeholders to determine if there are other potential consortium members.</li>
</ul>
<h2><a name="id5"></a>1.1.2 <a name="__RefHeading__5619_2107375272"></a>Definition of Terms</h2>
<dl>
<dt>ADFI</dt>
<dd>Australian Digital Futures Institute</dd>
<dt>ANDS</dt>
<dd>Australian National Data Service</dd>
<dt>ARO</dt>
<dd>Australia Research Online. A search service run by the National Library of Australia which harvests Dublin Core metadata from repositories using the OAI-PMH protocol.</dd>
<dt>ARROW</dt>
<dd>Australian Repositories Online to the World</dd>
<dt>Squire</dt>
<dd>Squire is an open-source repository ingest application designed to work with VTLS VITAL. It is copyright Monash University and released under the Mozilla Public License. It was designed to replace the VALET system, which was produced by VTLS using ARROW funding.</dd>
<dt><a href="http://code.google.com/p/valet/"><span class="Internet_20_link"><span class="T2">VALET</span></span></a></dt>
<dd>VALET is an open source repository ingest application which is copyright by the US corporation VTLS and released under the Mozilla Public License.</dd>
<dt>VTLS VITAL</dt>
<dd>VITAL is a proprietary repository solution that works with the open source Fedora Commons repository back-end.</dd>
<dd>
<dl>
<dt>Proprietary components</dt>
<dd>Web portal, Fedora Indexer, object management interface, security model.</dd>
<dt>Open Source components</dt>
<dd>Fedora, Apache Solr text index, OAI-PMH harvesters, VALET ingest.</dd>
</dl>
</dd>
</dl>
<h2><a name="id7"></a>1.1.3 <a name="__RefHeading__5621_2107375272"></a>Methodology</h2>
<p>The system will be developed using ADFI&#8217;s agile software development methodology. This uses a development approach that allows the stakeholders to adjust their expectations and change the product as they go by adjusting the relative priorities of particular features. Notwithstanding this generic flexible approach, the intention at the time of contract signing is to deliver the functionality listed in section 1.4.2 below.</p>
<dl>
<dt>Development cycles</dt>
</dl>
<ul class="lib">
<li>ADFI development is undertaken in weekly cycles.</li>
<li>Stakeholders meet with the development team and our project coordination staff weekly, on a Monday to:
<ul class="lib">
<li>Review progress.</li>
<li>Choose tasks for the forthcoming week, based on a consensus about the most important next-step towards meeting the project&#8217;s aim.
<p>Tasks at this stage are no more than an estimated half a day&#8217;s work (circa 4 hours).</p>
<p>Customers have a choice between:</p>
<ol class="li-upper-roman" style="list-style: upper-roman;">
<li>Prioritizing tasks to be completed each cycle (simplest and cheapest approach which is our current recommendation).</li>
<li>Asking developers for a more detailed time-estimation than <span class="spCh spChx201c">“</span>it&#8217;s less than half a day<span class="spCh spChx201d">”</span> then &#8216;filling up&#8217; the time available with tasks that have been estimated (our legacy approach based on the XP methodology).</li>
</ol>
</li>
</ul>
</li>
</ul>
<h2><a name="id11"></a>1.1.4 <a name="__RefHeading__5623_2107375272"></a>Deliverables<span class="T6"> </span></h2>
<p>The overall architecture of the solution for the University of Newcastle is shown in the following diagram.</p>
<p><a name="graphics1"></a><img class="fr1" style="border: 0px; vertical-align: top;" src="http://www.ands-partners.org/blog/wp-content/uploads/2011/01/7f884adc_640x443.jpeg" alt="graphics1" width="640" height="443" /></p>
<p>The diagram is colour coded <span class="spCh spChx2013">–</span> with the new components to be developed shown in red.</p>
<p>[NicNamesPlus is now known as The Mint Linked Authority Control Service]</p>
<h3><a name="id12"></a>1.1.4.1 Assumptions</h3>
<ol class="lin" style="list-style: decimal;">
<li>Ingect will use <strong>Fedora 3 as an internal storage component <span class="spCh spChx2013">–</span></strong> with data synchronised to VITAL as needed. This meets the requirement that VITAL is functioning as the main repository, but with minimal extra load. Fedora is an obvious choice for ARROW/VITAL sites. We are choosing to work with the latest version for the Ingect component, but it will need to synchronise with Fedora 2, which underpins the VITAL product. This approach also means that, with appropriate configuration, Ingect should be able to work with any Fedora-based repository, either by sharing a repository or by installing side-by-side as with VITAL.</li>
<li>The application will be<strong> developed in Java,</strong> building on The Fascinator platform which was originally sponsored by the ARROW project in 2008 and which has been under development at USQ since then. Benefits include:
<ul class="lib">
<li>Being written in Java, it can sit in the same Tomcat web-server as Fedora and the Apache Solr indexer used by most Fedora repositories these days.</li>
<li>The ingest component USQ is developing for The Fascinator, while incomplete, meets the requirement that the system be configurable in a similar way to VALET <span class="spCh spChx2013">–</span> where extending the forms and integrating them with external systems like CrossRef is trivially easy.</li>
<li>It&#8217;s highly modular and so can be used, for example, without a portal, a role which in this case VITAL will take on. One of the most important modules will be plugin harvest technology to pick up content that&#8217;s on the storage system and provide a view to researchers and data librarians to begin describing it. The Fascinator has an extensible system of plugins, and we already have file-indexers and a framework for extracting metadata from files, which can be extended to work with new kinds of research data as they appear.</li>
<li>Our developers know the system, meaning we can be up and running with this application very quickly.</li>
</ul>
<p>In our work on software and with CAIRSS, ADFI staff are aware of other Fedora software components, but none that meet all of the above criteria. The closest would probably be Muradora <span class="spCh spChx2013">–</span> if there are ANDS contributors using that platform, it could be considered as part of the USQ metadata stores contract.</p></li>
<li>While there was some discussion about using VALET or VITAL as the foundation for the Ingect software early in the project, neither of these systems has an architecture which can work on a university-wide scale if all data sets are to be described. They also lack the ability to harvest metadata about files on the research storage service or interface with other institutional services.</li>
</ol>
<h3><a name="id14"></a>1.1.4.2 Components</h3>
<ol class="lin" style="list-style: decimal;">
<li>The major component will be a <strong>repository ingest tool</strong> that will be configurable as follows:
<ol class="li-lower-roman" style="list-style: lower-roman;">
<li>An <strong>alerting service</strong> will interface with a variety of institutional data sources and show a &#8216;discovery&#8217; view of events and items of interest to both researchers and data curators.
<ol class="li-lower-alpha" style="list-style: lower-alpha;">
<li>Events from the grants database, principally project completions, will be emailed to the system, and data extracted into a workflow so that a data librarian can chase up details of the project and do triage to work out if there is data that can be captured.</li>
<li>The area of the Research Storage Utility (which is a high-end file system) set aside for RDA data will be harvested by the ingest software so that whatever metadata is available in the data collection files themselves will be able to be searched and browsed from the ingest tool.</li>
</ol>
</li>
<li>A <strong>direct deposit service </strong>will allow researchers or data librarians to initiate deposit of collection metadata via two mechanisms:
<ol class="li-lower-alpha" style="list-style: lower-alpha;">
<li>Via a form, similar to the ones used in VALET. These forms will be able to be designed and installed by library and research office staff without having to involve central IT services. This form will be able to describe data on the Research Storage Utility or elsewhere.</li>
<li>Via the &#8216;discovery&#8217; service, by searching or browsing to the collection on the university&#8217;s research storage service and tagging it for inclusion in RDA, which will trigger creation of the metadata form mentioned above.</li>
</ol>
</li>
<li>A <strong>staging service</strong> so that decisions can be made about collection records, allowing for records to be checked/moderated before being made available in the repository. This service will allow an arbitrary number of workflow stages configured with a simple configuration file so that items can be checked as appropriate before deposit to the repository for publication to RDA. This staging will be modeled as far as possible on the simple approach taken by VALET with extensions to deal with a wider range of data as described above.</li>
<li><strong>Configurable Mapping</strong> from ingest data forms to the following as required:
<ol class="li-lower-alpha" style="list-style: lower-alpha;">
<li>RIF-CS.</li>
<li>Dublin Core. (This is a requirement for VITAL Compatibility).</li>
<li>A storage format to be determined in consultation between Newcastle, ANDS and the broader community. (The project&#8217;s principles suggest that as far as possible this should be based on a schema that draws from established ontologies, as in the approach taken at the University of Melbourne).</li>
</ol>
</li>
</ol>
</li>
<li>An <strong>OAI-PMH</strong> service will be provided that can supply RIF-CS metadata to the RDA. There are a number of potential OAI-PMH toolkits that could be used; the project will choose/extend one and document how to configure it.</li>
<li><strong>Configuration for VITAL</strong> in the form of page view templates and index configuration so that data collection records can be displayed.</li>
<li><strong>The technical component of a &#8216;toolkit&#8217;</strong> to be produced jointly with the UoN, including how to install and configure the software components created by and used by this project. The UoN will provide the user-oriented and outreach components of the toolkit.</li>
<li>A linked data authority service (formerly described as <span class="spCh spChx2018">‘</span>NicNames plus<span class="spCh spChx2019">’</span>) running as a discrete module that can be deployed separately.
<ul class="lib">
<li>Provision of name authority services that map party names in a linked data manner (like a locally managed version of People Australia).
<ul class="lib">
<li>APIs in the metadata registry forms interface for talking to authority services using a lightweight protocol such as JSON.</li>
<li>Modifications to existing packaging mechanics to hold authority relationships (i.e. bundling several name variants together).</li>
</ul>
</li>
<li>Provision of linked data services for metadata terms such as resource types and subject headings.</li>
<li>Batch scripts for processing existing client metadata (both from repositories and from other client corporate systems) and creating links.</li>
<li>Documentation of systems configuration/administration, to serve as examples for deployments at other sites.</li>
<li>Provision of this data back into client systems and making available to external parties (again, such as People Australia) as EAC/RIF-CS records over OAI-PMH.</li>
</ul>
</li>
<li><strong>Identity interfaces</strong> <span class="spCh spChx2013">–</span> allowing integration of ingest workflows with:
<ul class="lib">
<li>PIP / NicNames for names.</li>
<li>Handles for persistent identification.</li>
</ul>
</li>
<li><strong>Batch editing</strong>.</li>
<li><span class="T2">More explanation of the requirements and deliverables can be </span><a href="http://ptsefton.com/2010/03/04/more-details-on-a-metdata-store-for-data-inalongside-vital.htm"><span class="Internet_20_link"><span class="T2">found on this blog post</span></span></a><span class="T2">.</span></li>
</ol>
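<p>To make the OAI-PMH component above more concrete, here is a minimal sketch of the requests a harvester such as RDA might send to the service. The <code>ListRecords</code> verb and the <code>metadataPrefix</code>, <code>set</code>, <code>from</code> and <code>resumptionToken</code> arguments come from the OAI-PMH protocol itself; the endpoint URL and the <code>rif</code> prefix value are assumptions made for illustration, not details taken from this plan.</p>

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real URL depends on the toolkit the project picks.
BASE_URL = "http://repository.example.edu.au/oai"


def list_records_url(base_url, metadata_prefix="rif", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    The default prefix "rif" is an assumed name for the RIF-CS format.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # optional selective harvesting by set
    if from_date:
        params["from"] = from_date      # optional lower date bound, e.g. "2010-11-02"
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for the next page of a large result.

    In OAI-PMH the resumptionToken is an exclusive argument, so it is
    sent with the verb only.
    """
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2010-11-02"))
    print(resume_url(BASE_URL, "example-token"))
```

<p>Whichever toolkit is chosen, documenting a couple of requests like these would give other sites a quick way to check that their installation is answering harvesters.</p>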
<h3><a name="id16"></a>1.1.4.3 Deployment</h3>
<p>The Fascinator can work with Fedora (which is part of this project) or with a simple file-based storage layer. The solution will be designed to be able to be deployed in different ways for different use-cases.</p>
<ul class="lib">
<li>As a WAR file that can be dropped in to a Tomcat instance alongside VITAL, using a file-system storage layer.</li>
<li>As a single ready-to-run JAR file with an embedded webserver, for testing and light-weight deployments.</li>
<li>Using Maven for fine control over deployment.</li>
</ul>
<h1><a name="id18"></a>1.2 <a name="__RefHeading__5625_2107375272"></a>Project Tasks</h1>
<p><span class="T2">Project tasks reside on the </span><a href="https://204.236.227.98/projects/metadatastore/trac/wiki/SquireProject"><span class="Internet_20_link"><span class="T2">project wiki</span></span></a><span class="T2">. We will paste a copy of the agreed tasks into this document when agreement is reached; after that the task-list will evolve as per the ADFI development process.</span></p>
<h1><a name="id19"></a>1.3 <a name="__RefHeading__5627_2107375272"></a>Project Plan</h1>
<p>The plan is to use the software development methodology described above, in a series of one-week development cycles to carry out the tasks listed above.</p>
<div class="Table2" style="width: 100%; margin: 0px; padding: 0px; text-align: left;">
<table class="Table2" style="border-spacing: 0; empty-cells: show; margin-left: -0.053cm; width: 16.311cm; border-collapse: collapse; border: 1.0px solid #000000;">
<colgroup>
<col style="width: 3.048cm;"></col>
<col style="width: 3.251cm;"></col>
<col style="width: 4.001cm;"></col>
<col style="width: 2.76cm;"></col>
</colgroup>
<tbody>
<tr>
<td class="Table2_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Milestone</p>
</td>
<td class="Table2_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Actions</p>
</td>
<td class="Table2_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Cost</p>
</td>
<td class="Table2_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Performance measures</p>
</td>
<td class="Table2_E1" style="vertical-align: top; border: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Expected completion</p>
</td>
</tr>
<tr>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P26">1 August, 2010 &#8211; 2 November, 2010</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">Establish basic Ingect application</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">[Removed]</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">Sign off from ANDS rep that code has been released.</p>
</td>
<td class="Table2_E2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">2010-11-02</p>
</td>
</tr>
<tr>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P26">3 November, 2010 &#8211; 15 February, 2011</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">Install test configuration at Newcastle</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">[Removed]</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">Sign off from Newcastle contact that application is running and configured.</p>
</td>
<td class="Table2_E2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">2011-02-15</p>
</td>
</tr>
<tr>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P26">16 February, 2011 &#8211; 14 May, 2011</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">[Removed]</p>
</td>
<td class="Table2_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P28">Sign off from Newcastle and ANDS that application is installed and running</p>
</td>
<td class="Table2_E2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">2011-05-14</p>
</td>
</tr>
</tbody>
</table>
</div>
<p>Note: The last milestone includes an offset completion date to take into account institutional closures over the Christmas period.</p>
<h1><a name="id20"></a>1.4 <a name="__RefHeading__5629_2107375272"></a>Project Budget</h1>
<p>The project budget is based on the estimates on the tasks page.</p>
<p>Human resources</p>
<p>[Removed]</p>
<h2><a name="id21"></a>1.4.1 <a name="__RefHeading__5631_2107375272"></a>Travel</h2>
<p>Travel as required and approved by ANDS &#8211; expected to be one trip per quarter for two ADFI staff to visit Newcastle. Travel is to be booked and paid for by USQ and invoiced to ANDS, to simplify the payment of out-of-pocket expenses.</p>
<h1><a name="id22"></a>1.5 <a name="__RefHeading__5633_2107375272"></a>Risk Factors</h1>
<div class="Table3" style="width: 100%; margin: 0px; padding: 0px; text-align: left;">
<table class="Table3" style="border-spacing: 0; empty-cells: show; width: 16.013cm; border-collapse: collapse; border: 1.0px solid #000000;">
<colgroup>
<col style="width: 7.752cm;"></col>
<col style="width: 8.26cm;"></col>
</colgroup>
<tbody>
<tr>
<td class="Table3_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Risk</p>
</td>
<td class="Table3_B1" style="vertical-align: top; border: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Mitigation Plan</p>
</td>
</tr>
<tr>
<td class="Table3_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P15">The software we develop here might end up being only used at one or a handful of institutions, which would then bear the maintenance load.</p>
</td>
<td class="Table3_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<ul class="lib">
<li>Work to promote this solution to other ARROW sites to build the installed base.</li>
<li>Release all code as open source with tested documentation.</li>
<li>Use components of this solution as part of the standalone solution we have also been asked to look at, broadening the installed base for the Ingect application.</li>
<li>Consider funding a program to port VALET workflows to the new system for ARROW sites to build a sustainable community.</li>
<li>Work with the University of Melbourne and collaborators to see if some of the components developed for the Ingect application could be used at their sites, and to ensure future compatibility with VITRO as a data store.</li>
<li>Document the metadata storage system and batch transformation system so that new Fedora-compatible ingest or portal tools can be swapped-in later.</li>
</ul>
</td>
</tr>
<tr>
<td class="Table3_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P15">Newcastle/ANDS unable to supply full metadata schema on time; there is no clear consensus on best practice for describing research data collections to meet the demands of The Code, while being able to serve RIF-CS to Research Data Australia.</p>
</td>
<td class="Table3_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<ul class="lib">
<li>Stakeholders to vigorously encourage ANDS to produce metadata guides, possibly after workshops.</li>
<li>If all else fails implement RIF-CS in the repository, with the possibility of doing a batch-update later.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<h1><a name="id23"></a>1.6 <a name="__RefHeading__5635_2107375272"></a>Roles and Responsibilities</h1>
<div class="Table4" style="width: 100%; margin: 0px; padding: 0px; text-align: left;">
<table class="Table4" style="border-spacing: 0; empty-cells: show; width: 15.513cm; border-collapse: collapse; border: 1.0px solid #000000;">
<colgroup>
<col style="width: 4.022cm;"></col>
<col style="width: 11.492cm;"></col>
</colgroup>
<tbody>
<tr>
<td class="Table4_A1" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: 1.0px solid #000000; padding: 0.097cm;">
<p class="P25">Project Strategist</p>
</td>
<td class="Table4_B1" style="vertical-align: top; border: 1.0px solid #000000; padding: 0.097cm;">
<p class="P28">Dr Peter Sefton</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Developer</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Ron Ward</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Developer</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Greg Pendlebury</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Developer</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Linda Octalina</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Developer</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Oliver Lucido</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Developer</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Cynthia Wong</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Project coordinator</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Duncan Dickinson</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Project coordinator</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Bronwyn Chandler</p>
</td>
</tr>
<tr>
<td class="Table4_A2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: none; border-top: none; padding: 0.097cm;">
<p class="P25">Admin</p>
</td>
<td class="Table4_B2" style="vertical-align: top; border-bottom: 1.0px solid #000000; border-left: 1.0px solid #000000; border-right: 1.0px solid #000000; border-top: none; padding: 0.097cm;">
<p class="P28">Various ADFI staff <span class="spCh spChx2013">–</span> systems administration and office duties.</p>
</td>
</tr>
</tbody>
</table>
</div>
<h1><a name="id24"></a>1.7 <a name="__RefHeading__5637_2107375272"></a>Ongoing Development and Use of Deliverables</h1>
<p>The deliverables in this project will be released under an open source license, GPL V2.</p>
<p>ADFI intends to continue development of The Fascinator software suite. It is under consideration for some USQ systems (a policies and procedures library, a media repository for courseware and an arts repository capable of storing original artworks such as exhibitions of photographs). It is also being used in an active pilot for public memory research.</p>
<h1><a name="id25"></a>1.8 <a name="__RefHeading__5639_2107375272"></a>Reporting Requirements</h1>
<p>Project reporting will be via:</p>
<ul class="lib">
<li>Weekly milestone reporting via project Trac site.</li>
<li>Quarterly reports to ANDS on project progress.</li>
<li>[Added <span class="spCh spChx2013">–</span> the ANDS-Partner blog]</li>
</ul>
<h1><a name="id27"></a>1.9 <a name="__RefHeading__5641_2107375272"></a>Declaration</h1>
<p>[Removed]</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2011/01/ands-project-plan-eif040/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ReDBox world tour, December 2010</title>
		<link>http://www.ands-partners.org/blog/2010/12/redbox-world-tour-december-2010-2/</link>
		<comments>http://www.ands-partners.org/blog/2010/12/redbox-world-tour-december-2010-2/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 04:04:36 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[ReDBox (EIF-040)]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/2010/12/redbox-world-tour-december-2010-2/</guid>
		<description><![CDATA[PDF version UQ ANU ARC / NHMRC JCU [2010-12-22 Update - fixed an encoding problem] Last week I took to the skies again, via the dreaded Warrego Highway for the last time in 2010, to show ReDBox to folks from &#8230; <a href="http://www.ands-partners.org/blog/2010/12/redbox-world-tour-december-2010-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="rendition-links"><span class="pdf-rendition-link"><a title="View the printable version of this page" href="http://www.ands-partners.org/blog/wp-content/uploads/2010/12/ReDBox-dec-tour1.pdf1.pdf">PDF version</a></span></div>
<div class="page-toc">
<ul>
<li><a href="#id2">UQ</a></li>
<li><a href="#id3">ANU</a></li>
<li><a href="#id4">ARC / NHMRC</a></li>
<li><a href="#id5">JCU</a></li>
</ul>
</div>
<div class="body">
<div>
<p>[2010-12-22 Update - fixed an encoding problem]</p>
<p>Last week I took to the skies again, via the dreaded Warrego Highway for the last time in 2010, to show ReDBox to folks from a bunch of institutions: The University of Queensland (UQ), The Australian National University (ANU), the Australian Research Council (<a href="http://arc.gov.au/">ARC</a>), the National Health and Medical Research Council (<a href="http://www.nhmrc.gov.au/">NHMRC</a>), the National Library of Australia (NLA), my friends at the Australian National Data Service (<a href="http://ands.org.au/">ANDS</a>) and last but not least James Cook University (JCU). I was showing off both of the components we&#8217;re working on in the EIF-040 project, with different emphasis for different audiences:</p>
<ol class="lin" style="list-style: decimal;">
<li>The ReDBox research data management registry/repository.</li>
<li>The Mint Linked Authority Control Service.</li>
</ol>
<p>The Canberra (ANU/ANDS) leg of my trip was funded by ANDS directly and organised by Monica Omodei &#8211; thanks Monica. The other bits were funded by the University of Southern Queensland, my employer, using money from the EIF-040 project, which came from ANDS.</p>
<p>And last month, there was a metadata stores meeting hosted by ANDS at the University of Melbourne, held in conjunction with our annual <a href="http://cairss.caul.edu.au/www/events/cairss_community_day_2010.htm">CAIRSS repository manager&#8217;s meeting</a>. This half-day looked at all three ANDS-funded metadata stores and featured an intro from Andrew Treloar. That event deserves its own blog post, but I will make do with this <a href="http://cairss.caul.edu.au/www/events/cairss_ands_metadata_stores_briefing_2010.htm">link to the presentations</a> and a couple of mentions of the other solutions here.</p>
<h1><a name="id2"></a>UQ</h1>
<p>I dropped in to see Nigel Ward and Abdul Alabri at UQ on December 15<sup>th</sup> on my way to the airport.</p>
<p>The purpose was to explain to them how the forms part of ReDBox works. (That&#8217;s explain in a hand-waving sense, they&#8217;ll need to talk to the actual programmers to find out the real story.) The basic design goal of the ReDBox forms is to work in a similar way to the simple but well liked VALET application that comes with VITAL Fedora repositories. Here&#8217;s how it works, which is <a href="http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/">also covered in this other post.</a></p>
<ol class="lin" style="list-style: decimal;">
<li>A repository techie can design a web form using standard HTML, with some special widgets supplied by the ReDBox toolkit for things like dates, and looking up names in an authority control service. In our case that&#8217;s The Mint, but it could be an HR system or People Australia. My group has a proposal in with ANDS which involves working out some standard protocols for this.</li>
<li>The repository techie designs simple workflows by setting up a config.json file with a number of workflow steps and rules for who can do what to what sort of object.</li>
<li>The form data is automatically linked-in to the repository workflow. If someone fills in a form, the data is pushed into a JSON data structure that maps directly to and from the form-set.</li>
<li>The repository techie supplies simple scripts (or adapts ours) to map the form data to the storage and dissemination formats they want; these can be kept private or broadcast using OAI-PMH.</li>
</ol>
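<p>The steps above can be sketched in a few lines of Python. Everything here is illustrative: the key names in the configuration, the role names, the form fields and the Dublin Core mapping are all assumptions invented for this example, and the real ReDBox config.json and mapping scripts will differ.</p>

```python
import json

# Hypothetical workflow configuration, in the spirit of the config.json
# described above. The key names are invented for illustration.
CONFIG_JSON = """
{
  "stages": [
    {"name": "draft",     "can_edit": ["researcher", "librarian"]},
    {"name": "review",    "can_edit": ["librarian"]},
    {"name": "published", "can_edit": []}
  ]
}
"""


def next_stage(config, current, role):
    """Return the stage an item moves to when `role` submits it.

    Returns None if the role may not act on the current stage, or if the
    item is already at the final stage.
    """
    stages = config["stages"]
    names = [stage["name"] for stage in stages]
    idx = names.index(current)
    if role not in stages[idx]["can_edit"] or idx + 1 == len(names):
        return None
    return names[idx + 1]


def form_to_dublin_core(form):
    """Map flat form data to Dublin Core-style keys (field names assumed)."""
    mapping = {
        "title": "dc:title",
        "description": "dc:description",
        "creator": "dc:creator",
    }
    # Skip fields the depositor left empty.
    return {dc: form[field] for field, dc in mapping.items() if form.get(field)}


if __name__ == "__main__":
    config = json.loads(CONFIG_JSON)
    form = {"title": "Rainfall observations", "creator": "A. Researcher", "description": ""}
    print(next_stage(config, "draft", "researcher"))
    print(json.dumps(form_to_dublin_core(form), indent=2))
```

<p>The point of keeping both the workflow and the mapping this small is that a repository techie can change either without touching the core application.</p>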
<p>You can&#8217;t talk about the ReDBox forms without talking about how the look-up works, so I showed them the Mint, which stores reference data about people, organisational units, subject codes and grants: all stuff that is sourced from other places and needs to be referred to using linked data principles. It seems like the UQ team is interested in finding out more about the Mint; we&#8217;ll assist them with that in January (<a href="https://fascinator.usq.edu.au/trac/ticket/1345">#1345</a>).</p>
<h1><a name="id3"></a>ANU</h1>
<p>On the 16<sup>th</sup> I had a morning meeting/demo with ANU staff. I ran through the ReDBox presentation, and there was a lively discussion about the challenges of managing anything shared in a big institution like ANU (are there any like ANU?), but I got the impression that there is going to be a centralised view of what data lives where, similar to the ANDS-VITRO and the Newcastle/ReDBox approaches. This contrasts with the <a href="http://cairss.caul.edu.au/www/events/cairss_ands_metadata_stores_briefing_2010.htm">approach being taken at Monash</a>, where they are working with research teams on an individual basis to manage their data and advertise it to the world via <a href="http://services.ands.org.au/home/orca/rda/">Research Data Australia</a>.</p>
<h1><a name="id4"></a>ARC / NHMRC</h1>
<p>The afternoon of the 16<sup>th</sup> was interesting. I was there at the ANDS office to talk about the need for us all to be able to refer to grants in the same way. It was encouraging that we were able to have a lively, informed discussion about Linked Data and the best way to construct URLs for research grants from our two big funding agencies. But it was frustrating as well: government departments don&#8217;t just <em>get stuff done</em> because a bunch of the best informed people in the country, including their own staff, <em>know</em> it&#8217;s a good idea. Canberra.</p>
<p>The good news is that the rest of us can decide what we&#8217;d like to call a research grant  — using Linked Data URIs  — and get on with it. Nothing stopping us setting up PURL URLs that point to Research Data Australia records about public grants data  — management of the PURLs could be handed to the government later, if they want. I think we should get on with this, which is pretty much what Duncan Dickinson from my team did, in consultation with the ANDS-VITRO group with ANZSRC subject codes, also owned by gov.au. That way we will have nice quality well linked data in RDA  — you&#8217;ll be able to find data and project descriptions from different sites that pertain to the same grant. If we don&#8217;t do this work then that simply won&#8217;t happen.</p>
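<p>As a rough illustration of what agreeing on grant URIs could look like, here is a small sketch. The PURL prefix, the funder codes and the grant ID are all made up for the example; no such scheme had been agreed at the time of writing. The idea is simply that every site derives the same URI from the same funder and grant ID, so records from different sites can be matched on it.</p>

```python
from urllib.parse import quote

# Hypothetical PURL prefix: nothing like this has actually been registered.
PURL_BASE = "http://purl.example.org/au-research/grants"


def grant_uri(funder, grant_id):
    """Build one canonical URI for a grant from a funder code and a grant ID."""
    funder_key = funder.strip().lower()
    # Normalise the ID so that "dp 1234567" and "DP1234567" give the same URI.
    id_key = "".join(grant_id.split()).upper()
    return "/".join([PURL_BASE, quote(funder_key, safe=""), quote(id_key, safe="")])


def same_grant(record_a, record_b):
    """Two records pertain to the same grant when their grant URIs match."""
    return record_a["grant"] == record_b["grant"]


if __name__ == "__main__":
    # Two sites describe outputs of the same (invented) grant slightly differently.
    data_record = {"site": "Site A", "grant": grant_uri("ARC", "dp 1234567")}
    project_record = {"site": "Site B", "grant": grant_uri("arc ", "DP1234567")}
    print(data_record["grant"])
    print(same_grant(data_record, project_record))
```

<p>If the PURLs were later handed over to the funders, the URIs in everyone&#8217;s records would stay the same; only the redirect target would change.</p>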
<p>There was an eye-opening discussion about privacy and data   — about what ANDS could and could not do with data from one of the agencies in terms of identifying researchers not just by ambiguous strings, but using IDs. People have gone off to start working towards getting a ruling from the Privacy Commissioner. Apparently that&#8217;s the done thing. Again, Canberra.</p>
<h1><a name="id5"></a>JCU</h1>
<p>On the 17<sup>th</sup> I went tropical and visited Townsville for the day. I talked to a group of eResearch (with strong IT background), library and research office people. I showed the ReDBox / Mint combo, and we talked about how it might fit with JCU&#8217;s infrastructure.</p>
<p>I was impressed by a couple of things:</p>
<ol class="lin" style="list-style: decimal;">
<li>The research-office has a data-warehouse which has authority data for lots of the data needed for managing research data. They have lots of options including these three:
<ol class="li-lower-roman" style="list-style: lower-roman;">
<li>they could feed it to VIVO/VITRO to be fed on into RDA,</li>
<li>feed it to the Mint for forwarding to RDA and to provide lookup services for a forms interface</li>
<li>or write their own lookup/feed services.</li>
</ol>
</li>
<li>The high performance computing stack at JCU includes a data-drop service with a web front end. An evolution of this service, or a new service based on it, would be a great adjunct to the things we have built so far. At Newcastle, as far as I am aware, the research data store will be pretty much a bare file-server as far as researchers are concerned, so a management interface would be worth looking at.</li>
</ol>
<p>Thanks to all at JCU for having me and particularly to Trina Myers who made sure I got to have a quick tour of Townsville on my way back to the airport.</p>
<p>Trina is also interested in whether The Fascinator could use the triple store in Fedora Commons to provide semantic web RDF services for JCU&#8217;s forthcoming tropical data hub. It sounds very possible: we already have all the queuing and indexing code, so it would just be a matter of adding an interface to the triple store. Alternatively, we could interface with the <a href="http://vitro.mannlib.cornell.edu/vitroDeploymentGuide.html">VITRO</a> application and use it as an index to complement the high-performance Apache Solr faceted text index that we&#8217;re using now.</p>
<p class="Caption" style="width: 643px;"><span style="display: block;"><a name="graphics1"></a><img class="fr3" style="border: 0px; vertical-align: top;" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/12/ReDBox-dec-tour_files9caf034_643x482.jpg1.jpeg" alt="graphics1" width="643" height="482" /></span>The view from Castle Hill &#8211; transport by Trina Myers</p>
<p class="center">Copyright</p>
<p><span><a href="http://ontologize.me/?tl_p=http://purl.org/dc/terms/creator&amp;triplink=http://purl.org/triplink/v/0.1&amp;tl_o=http://trove.nla.gov.au/people/541658">Peter Sefton</a></span>, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a onclick="javascript:window.top.open(&quot;http://creativecommons.org/licenses/by-sa/2.5/au/&quot;);return false;" href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span style="country: US; language: en;"><span class="T1"><img class="fr2" style="border: 0px; vertical-align: top;" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/12/ReDBox-dec-tour_filesm40ca94ba1.png1.png" alt="" width="88" height="31" /></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a onclick="javascript:window.top.open(&quot;http://ice.usq.edu.au/&quot;);return false;" href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a onclick="javascript:window.top.open(&quot;http://fascinator.usq.edu.au/desktop/desktop.htm&quot;);return false;" href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2010/12/redbox-world-tour-december-2010-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ingest (or how do you get things in there?)</title>
		<link>http://www.ands-partners.org/blog/2010/10/ingest-or-how-do-you-get-things-in-there/</link>
		<comments>http://www.ands-partners.org/blog/2010/10/ingest-or-how-do-you-get-things-in-there/#comments</comments>
		<pubDate>Mon, 18 Oct 2010 03:48:48 +0000</pubDate>
		<dc:creator>Nigel Ward</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[RIF-CS]]></category>
		<category><![CDATA[atom]]></category>
		<category><![CDATA[AtomPub]]></category>
		<category><![CDATA[ingest]]></category>
		<category><![CDATA[Institutional Repository]]></category>
		<category><![CDATA[oai-pmh]]></category>

		<guid isPermaLink="false">http://www.ands-partners.org/blog/?p=95</guid>
		<description><![CDATA[In an earlier post on this blog, Peter Sefton discussed a forms based data entry interface for getting data into the USQ / Newcastle ReDBox data registry. While this sort of metadata editing functionality is something every (decent) registry should &#8230; <a href="http://www.ands-partners.org/blog/2010/10/ingest-or-how-do-you-get-things-in-there/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/">earlier post</a> on this blog, Peter Sefton discussed a forms-based data entry interface for getting data into the USQ / Newcastle ReDBox data registry.  While this sort of metadata editing functionality is something every (decent) registry should support, it&#8217;s not the only scenario for getting metadata into a registry. Sometimes it is appropriate to directly <em>ingest</em> pre-existing metadata sourced from other systems. In this post I&#8217;ll discuss the UQ Seeding the Commons team&#8217;s early thinking on how we might ingest pre-existing metadata descriptions into our registry.</p>
<p><span style="color: #808080">[Content warning: this post contains technical details about OAI-PMH and the Atom Publication Protocol. Where possible, the technical bits have been contextualised with business requirements and illustrated with pretty pictures.]</span></p>
<h2>An ingest scenario</h2>
<p>First the ingest scenario:<br />
At UQ we have a few ANDS funded <a href="http://ands.org.au/funded/funding-overview.html">data capture projects</a> that aim to &#8220;create infrastructure … to collect and manage data, and to improve the way metadata about it is managed.&#8221;   Each of these projects will create data repositories that manage both data <em>and</em> metadata about that data. The vision is that metadata descriptions from these repositories will be shared with the UQ data registry, where they may be augmented with new metadata (e.g. linked to publications and grants information). Finally, the augmented descriptions will be syndicated to the ANDS collection registry. The scenario looks something like this:<br />
<a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/InternalSyndication.png"><img class="alignleft size-full wp-image-106" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/InternalSyndication.png" alt="" width="751" height="458" /></a></p>
<p>What technologies could aid the ingest of data into the UQ registry from the data capture systems?  This post examines how <a href="http://www.openarchives.org/pmh/">OAI-PMH</a> and <a href="http://tools.ietf.org/html/rfc5023">AtomPub</a> could support this scenario.</p>
<h2>Ingest via OAI-PMH</h2>
<p><strong><a href="http://ands.org.au/resource/rif-cs.html">RIF-CS</a> over <a href="http://www.openarchives.org/pmh/">OAI-PMH</a></strong> seems like an obvious choice for supporting the scenario.  It is what ANDS are using to syndicate data from institutional registries into the national Research Data Australia registry.  UQ are already implementing an OAI-PMH server in our data collections registry to support syndication with ANDS. In theory, we could also build an OAI-PMH client into our registry and ask our new data capture projects to syndicate their metadata by implementing OAI-PMH servers on top of their repositories.  (In fact it is more than a theory: we have prototyped this solution as part of our Health-e-reef <a href="http://ands.org.au/funded/eif-fast-start.html">EIF</a> project).</p>
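<p>To give a feel for what the client side of such a harvest involves, here is a minimal sketch in Python that only builds the request URL for the standard OAI-PMH ListRecords verb. The endpoint address is hypothetical, and the metadata prefix "rif" is an assumption about how a repository might advertise RIF-CS; a real client would also need to follow resumption tokens and parse the XML response.</p>

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a selective harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec  # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


# Hypothetical data capture repository endpoint, for illustration only.
# The "rif" prefix is an assumed name for the RIF-CS format.
url = list_records_url(
    "http://repository.example.org/oai", "rif", from_date="2010-10-01"
)
print(url)
```

<p>The request itself is the easy part. The implementation effort sits on the server side, which has to answer requests like this reliably whenever a harvester calls.</p>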
<p>While OAI-PMH seems like an attractive option for transferring metadata from data capture repositories to the UQ data registry, it isn&#8217;t perfect.  Preliminary discussions with UQ data capture projects revealed that OAI-PMH actually imposes a higher implementation barrier than they&#8217;d like. The data capture projects are focussed on capturing and managing a community&#8217;s data. Syndicating metadata descriptions about that data is important, but needs to be as simple as possible so that they can focus on data management. Building OAI-PMH servers, ensuring they are always up, and ensuring they can deal with the load created by harvesting takes quite a bit of implementation effort. OAI-PMH also only supports XML based data formats. Some of our data capture communities are familiar with, and want to use simpler formats such as <a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> and <a href="http://www.json.org/">JSON</a>.</p>
<p>For these reasons, RIF-CS (or an alternate XML format) over OAI-PMH might be a step too far for some of our data capture projects.</p>
<h2>Publication workflows</h2>
<p>Ingest via OAI-PMH also doesn&#8217;t support some of the publication workflows we have in mind for our registry.  Taking a lead from the institutional repository community, we&#8217;d like the data registry to support versioning of metadata records, and to support publication workflows where only selected versions of metadata records are syndicated to ANDS.</p>
<p>A <em>first-cut</em> at a publication workflow for our data repository looks something like this:<br />
<a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/EntityStates.png"><img class="alignnone size-full wp-image-99" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/EntityStates.png" alt="" width="610" height="421" /></a></p>
<p>This workflow says that items in the repository are either <em>unpublished</em> (only internally accessible), <em>under review</em> (on the road to being published), or <em>published</em> (accessible to the outside world and harvestable by ANDS).  Ideally, we&#8217;d like the ingest protocol to support all of the transitions between these states (represented as arrows in the diagram).  OAI-PMH can deal with create, update (version) and delete. It doesn&#8217;t, however, support notions of submit (for review), update based on review, or selective publication.</p>
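<p>One way to think about the workflow is as a small state machine. The sketch below is in Python and is only an illustration: the three state names come from the description above, but the action names and the exact set of allowed transitions are assumptions made for the example, not a transcription of the diagram.</p>

```python
# Allowed transitions, keyed by (current state, action). The action names
# and the transitions themselves are illustrative assumptions.
TRANSITIONS = {
    ("unpublished", "update"): "unpublished",
    ("unpublished", "submit"): "under review",
    ("under review", "revise"): "under review",
    ("under review", "publish"): "published",
    ("under review", "reject"): "unpublished",
    ("published", "withdraw"): "unpublished",
}


def next_state(state, action):
    """Return the state reached by applying action, or raise ValueError."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError("cannot %s a record that is %s" % (action, state))


# Walk one record from first draft through review to publication.
state = "unpublished"
for action in ("update", "submit", "revise", "publish"):
    state = next_state(state, action)
print(state)
```

<p>Whatever the final set of arrows turns out to be, an ingest protocol that fits this workflow needs a way to express each of them, not just create, update and delete.</p>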
<h2>AtomPub as a possible technology</h2>
<p>In an early discussion with one of the data capture projects, they suggested syndicating their metadata to our registry using the <a href="http://tools.ietf.org/html/rfc4287">Atom</a> format over HTTP. This discussion led us to investigate the related <strong><a href="http://tools.ietf.org/html/rfc5023">Atom Publishing Protocol</a></strong> (AtomPub to its friends), which looks to be a <em>very</em> nice fit for the workflow shown above.</p>
<p>AtomPub is a protocol for publishing and editing Web resources that was originally developed to support posting blog entries from a web authoring tool. Since then it has been adapted to support publication and updates to other types of data. Most notably, it is used in</p>
<ul>
<li><a href="http://code.google.com/apis/gdata/">GData (Google Data Protocol)</a> that forms the basis for many of the Google APIs</li>
<li><a href="http://www.odata.org/">OData (Open Data Protocol)</a> that Microsoft are using to &#8220;provide access to information from a variety of applications, services, and stores&#8221;</li>
<li><a href="http://swordapp.org/">SWORD (Simple Web-service Offering Repository Deposit)</a> that was developed by the institutional repository community to ingest complex data objects</li>
</ul>
<p>The AtomPub protocol is pretty simple: it uses the basic HTTP verbs to create and manage information resources (look away now and skip past the next diagram if you want to avoid the inner workings of HTTP):</p>
<ul>
<li>GET is used to retrieve a representation of a known Resource.</li>
<li>POST is used to create a new, dynamically named, Resource.</li>
<li>PUT is used to edit a known Resource.</li>
<li>DELETE is used to remove a known Resource.</li>
</ul>
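<p>The four verbs above can be sketched with nothing more than the Python standard library. This is an illustration rather than our implementation: the collection address and the member identifier are hypothetical, the entry body is a placeholder, and the requests are built but never sent.</p>

```python
import urllib.request

# Media type for a single Atom entry.
ATOM_ENTRY = "application/atom+xml;type=entry"

# Hypothetical collection URI on a registry; not a real endpoint.
COLLECTION = "http://registry.example.org/collections"


def build_request(method, url, body=None):
    """Build (but do not send) an HTTP request for one AtomPub operation."""
    headers = {"Content-Type": ATOM_ENTRY} if body is not None else {}
    return urllib.request.Request(url, data=body, headers=headers, method=method)


entry = b"...serialised Atom entry goes here..."
# In practice the member URI comes from the Location header returned by
# the server when the POST succeeds; "/42" is made up for this sketch.
member = COLLECTION + "/42"

planned = [
    build_request("POST", COLLECTION, entry),  # create a new member
    build_request("GET", member),              # retrieve it
    build_request("PUT", member, entry),       # edit it
    build_request("DELETE", member),           # remove it
]
for req in planned:
    print(req.get_method(), req.full_url)
```

<p>Note that the client never chooses the name of a new resource: it POSTs to the collection and the server decides where the new member lives.</p>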
<p>AtomPub distinguishes between draft and published resources, and with one minor extension can support the notion of resources under review.  Mapping our publication workflow to AtomPub looks something like this:<br />
<a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/EntityStatesHTTP.png"><img class="alignnone size-full wp-image-100" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/EntityStatesHTTP.png" alt="" width="610" height="445" /></a></p>
<p>(The review=yes part is a minor extension to the protocol to support transition to a review state. It was inspired by the work on <a href="http://swordapp.org/sword-v2/">SWORD v2</a> to support the full deposit lifecycle.)</p>
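<p>The draft flag itself is part of the AtomPub specification: an entry can carry an app:control element containing app:draft set to yes or no. The Python sketch below builds such an entry with the standard library. The title is made up, the other elements a complete Atom entry needs (such as id and updated) are left out for brevity, and the review extension is not shown.</p>

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
APP = "http://www.w3.org/2007/app"
ET.register_namespace("atom", ATOM)
ET.register_namespace("app", APP)


def draft_entry(title, is_draft):
    """Return an Atom entry carrying an app:control/app:draft flag."""
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = title
    control = ET.SubElement(entry, "{%s}control" % APP)
    ET.SubElement(control, "{%s}draft" % APP).text = "yes" if is_draft else "no"
    return entry


# Hypothetical collection title, for illustration only.
entry = draft_entry("Reef water quality collection", is_draft=True)
print(ET.tostring(entry, encoding="unicode"))
```

<p>Under this reading, moving a record from unpublished to published is just a PUT of the same entry with the draft flag switched to no.</p>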
<h2>Sure, but will it fly?</h2>
<p>We think this approach has legs: AtomPub is a great fit for our workflow and information model; and there is growing support for it in the institutional repository community (via SWORD). We have already built a simple AtomPub interface on top of a repository storing metadata about research collections and parties (using the <a href="http://abdera.apache.org/">Apache Abdera</a> AtomPub implementation).  We can create, get, edit and delete metadata resources using both an HTML/JavaScript user interface and command line tools.  Our system currently only supports published resources, but the next steps are to add under review and unpublished states.</p>
<p>As well as fitting our workflow and information model, we think AtomPub can simplify our data capture projects in two main ways.  Firstly, AtomPub supports simple non-XML metadata formats out of the box. We have already prototyped ingesting metadata represented as JSON, and may support CSV if the requirement arises. Secondly, AtomPub doesn&#8217;t require the data capture repositories to implement OAI-PMH servers.  At its simplest,  a data capture repository can submit data to our registry via a basic HTML form. We think that is a very low barrier to entry.</p>
<p>We will be continuing to implement our AtomPub interface over the coming months, and will share our experience in case others also want to play. We are keen to get feedback on our approach, particularly from those familiar with institutional repository ingest systems who can tell us what we are missing.</p>
<hr />
<p>Written by Nigel Ward. Copyright The University of Queensland, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt;. <a href="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png"><img class="alignleft size-full wp-image-79" src="http://www.ands-partners.org/blog/wp-content/uploads/2010/10/m40ca94ba1.png" alt="" /></a></p>
<p>The project is supported by the <a href="http://www.ands.org.au/">Australian National Data Service</a> (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2010/10/ingest-or-how-do-you-get-things-in-there/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>ReDBox research-data-metadata registry/repository – who might want it?</title>
		<link>http://www.ands-partners.org/blog/2010/10/redbox-research-data-metadata-registryrepository-who-might-want-it-2/</link>
		<comments>http://www.ands-partners.org/blog/2010/10/redbox-research-data-metadata-registryrepository-who-might-want-it-2/#comments</comments>
		<pubDate>Thu, 14 Oct 2010 23:52:38 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[ReDBox (EIF-040)]]></category>

		<guid isPermaLink="false">http://adfi.usq.edu.au/ands-partners/2010/10/redbox-research-data-metadata-registryrepository-who-might-want-it-2/</guid>
		<description><![CDATA[On Wednesday this week I was invited to Flinders University in South Australia by Amanda Nixon, to talk to a group of people from Flinders and other SA institutions about the work we’re doing on ReDBox, an application for managing … <a href="http://www.ands-partners.org/blog/2010/10/redbox-research-data-metadata-registryrepository-who-might-want-it-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc"></div>
<div>
<p>[Update: Fixed some formatting issues]</p>
<p>On Wednesday this week I was invited<span class="footnote" style="vertical-align: super;"><a name="ftn1-text"></a></span> to Flinders University in South Australia by Amanda Nixon, to talk to a group of people from Flinders and other SA institutions about the work we&#8217;re doing on ReDBox, an application for managing metadata about research data. Also in attendance was Simon Porter of the University of Melbourne, talking about the ANDS-VITRO solution. Simon talked first, and provided a good overview of the institutional drivers for research data management: compliance, data re-use, etc.</p>
<p>For the benefit of those in attendance, and others, here is my cut-price presentation. Please, if you were there and you&#8217;d like to add something or argue, use the comments. And for those who weren&#8217;t, you can join in too.</p>
<p class="P5">
<h1><a name="id2"></a>Purpose: to help you choose a research-data-metadata store</h1>
<p class="center">Peter Sefton :: sefton@usq.edu.au<br />
Manager, Software Research and Development Laboratory,<br />
Australian Digital Futures Institute,<br />
University of Southern Queensland<br />
Toowoomba Queensland</p>
<p class="P2">This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.</p>
<p class="P3">
<h1><a name="id3"></a>Architecture based on work by Vicki Picasso</h1>
<p>The architecture of the software system is very much driven by the model that Vicki Picasso and team developed at Newcastle, for an institution-wide research data management system involving the <strong>library</strong> as the curators of data that are deposited by <strong>researchers</strong> in a research data facility provided by <strong>IT</strong>, with policies developed by the <strong>research office,</strong> and data feeds from the university research management system(s) with responsibility for data retention and disposal resting with <strong>records management</strong>. We <a href="http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm"><span>wrote up the initial modelling on my blog</span></a>.</p>
<p>This is an updated version of the overall architecture from the project plan.</p>
<p><a name="graphics1"></a><a href="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/52360b52_610x432.jpeg"><img class="alignnone size-full wp-image-78" title="52360b52_610x432.jpeg" src="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/52360b52_610x432.jpeg" alt="" /></a></p>
<h1><a name="id4"></a>Authorities</h1>
<p>As discussed in my <a href="http://adfi.usq.edu.au/ands-partners/2010/10/but-what-format-are-you-storing-in-that-thing/"><span>last post here, about what data is stored where,</span></a> the main focus of the work with Newcastle and partner Swinburne is on describing and pointing to data collections. Other entities such as people and projects are important, but are very often described elsewhere, including (eventually) by centralised authorities such as the National Library&#8217;s People Australia service. So we are working with a two-part model: data collections are stored and managed in ReDBox, and a second system (The Mint) is used to marshal data from other sources and assign (mint) URIs for it where none exist.</p>
<p><a name="Object1"></a><a href="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/m56a61ed3.gif"><img class="alignnone size-full wp-image-80" title="m56a61ed3.gif" src="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/m56a61ed3.gif" alt="" /></a></p>
<h1><a name="id5"></a>The approach</h1>
<p>Vicki:  ReDBox mimics the IR processes and workflows that libraries are already used to</p>
<ul class="lib">
<li>Uses Fedora for data storage <span class="spCh spChx201c">“</span>Nobody ever got fired for using Fedora<span class="spCh spChx201d">”</span> <span class="spCh spChx2026">…</span> yet
<p>(Fedora provides robust data storage but it does have a cost, mainly in speed for accessing objects. In this project that cost will be minimal as almost all transactions will be with the Solr index, and that&#8217;s fast.)</li>
<li>Uses very simple customisation via web forms.</li>
<li>Treads a pragmatic line between using standards (the ANDS-VITRO ontology, for example) and maintaining the flexibility to add locally-required data. I will report soon on a meeting with Griffith, one of the ANDS-VITRO partners, about the potential for standardization.</li>
</ul>
<h1><a name="id7"></a>Demos</h1>
<p>I gave some demos <span class="spCh spChx2013">–</span> these are not publicly available, contact me if you want one.</p>
<h1><a name="id8"></a>Readiness &amp; Sustainability</h1>
<p>ReDBox is ready to install and start working on configuration.</p>
<ul class="lib">
<li>Vicki:  the message people need to have is that the software is easy to use, user friendly, flexible, and ready to go</li>
<li>There is a community around the software: USQ, Newcastle &amp; Swinburne</li>
<li>ADFI @ USQ has a track record of facilitating communities (RUBRIC / CAIRSS / ANDS). (And we can take on consulting work)</li>
</ul>
<p class="P4">
<h1><a name="id9"></a>Other issues / Questions</h1>
<p>A quick summary of the discussion.</p>
<dl>
<dt>Q: Ah, RDF <span class="spCh spChx2013">–</span> are you building the worldwide brain?</dt>
<dd>No, we want to <span class="spCh spChx201c">“</span>do linked data<span class="spCh spChx201d">”</span> and use an extensible standard for metadata. We&#8217;ll go <span class="spCh spChx201c">“</span>Semantic web<span class="spCh spChx201d">”</span> when users start asking for SPARQL interfaces.</dd>
<dt>Q: How much is it?</dt>
<dd>It&#8217;s free software.</dd>
<dt>Q: (from the floor) Why don&#8217;t you have forms for researchers to self-deposit?</dt>
<dd>In both the ANDS-VITRO and ReDBox worlds we are focussing on administrator and library access first because (a) we want to get things right before opening them to the researcher community and (b) we&#8217;re not really expecting a lot of researchers to want to use the system until we put in place campaigns to persuade them, and start changing administrative systems to foreground data management compliance.</dd>
</dl>
<blockquote class="bq"><p>But, in ReDBox you could adapt the forms and make the system available to researchers to self-deposit, most likely with review by data librarians or research-support staff.</p></blockquote>
<dl>
<dt>Q: (from the floor) These things (RedBox / Vitro) look very similar&#8230;</dt>
<dd>There are some differences in approach between our two applications but as Simon points out, we actually need each other as a risk-management strategy. Always look for the exits! Given our nascent collaboration on using a common data storage standard, there should be opportunities to migrate between systems reducing the risk of choosing either one.</dd>
<dd>The way I look at it is that ReDBox is built around a data-curation repository model while the ANDS-VITRO work has initially looked at doing as much machine-to-machine data routing as possible; two of the ANDS VITRO partners have focussed on describing projects rather than data collections, too, which has meant a different emphasis in what has been developed so far.</dd>
<dd>ReDBox is built on repository foundations. This means that it can be used in data-capture projects, and could be used to store some of the data itself. VITRO is built around an index of data, and does not come with a storage-layer, although there has been some work done on Fedora integration.</dd>
<dd>VITRO grew out of researcher-centric directory information systems, so if that&#8217;s an institutional goal, it&#8217;s worth looking at, although, with some more effort, it would also be possible to expose the authority control server we&#8217;ve developed, the Mint.</dd>
<dd>(There are also some possibilities for hybrid deployment to get the best of both worlds, but that&#8217;s not recommended ATM.)</dd>
</dl>
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a onclick="javascript:window.top.open(&quot;http://creativecommons.org/licenses/by-sa/2.5/au/&quot;);return false;" href="http://creativecommons.org/licenses/by-sa/2.5/au/"><span>http://creativecommons.org/licenses/by-sa/2.5/au/</span></a>&gt;</p>
<p class="center">
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a onclick="javascript:window.top.open(&quot;http://ice.usq.edu.au/&quot;);return false;" href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project and published to WordPress using <a onclick="javascript:window.top.open(&quot;http://fascinator.usq.edu.au/desktop/desktop.htm&quot;);return false;" href="http://fascinator.usq.edu.au/desktop/desktop.htm"><span>The Fascinator</span></a>.</p>
<hr />
<div style="font-size: .9em;"><span class="footnote-defined"><a name="ftn1"></a>My trip was funded by USQ, using the proceeds of our contract with ANDS.</span></div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2010/10/redbox-research-data-metadata-registryrepository-who-might-want-it-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>But what FORMAT are you STORING in that thing?</title>
		<link>http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/</link>
		<comments>http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/#comments</comments>
		<pubDate>Fri, 08 Oct 2010 02:57:14 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[ReDBox (EIF-040)]]></category>

		<guid isPermaLink="false">http://adfi.usq.edu.au/ands-partners/2010/10/but-what-format-are-you-storing-in-that-thing/</guid>
		<description><![CDATA[Aiming for linked data What will be stored? What is this ANDS-VITRO stuff? Get to the point! What are you storing? By: Peter Sefton, University of Southern Queensland &#8211; with assistance from other members of the ReDBox team. This is &#8230; <a href="http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">
<ul>
<li><a href="#id2"><span>Aiming for linked data</span></a></li>
<li><a href="#id3"><span>What will be stored?</span></a></li>
<li><a href="#id4"><span>What is this ANDS-VITRO stuff?</span></a></li>
<li><a href="#id5"><span>Get to the point! What are you storing?</span></a></li>
</ul>
</div>
<div>
<p>By: Peter Sefton, University of Southern Queensland <span class="spCh spChx2013">&#8211;</span> with assistance from other members of the ReDBox team.</p>
<p>This is another post about the ReDBox data registry application we <a href="http://adfi.usq.edu.au/ands-partners/2010/08/colour-me-red-the-ingect-system-for-research-data-collections/"><span class="Internet_20_link">introduced in our last post</span></a>. ReDBox is going to be a <a href="http://fedora-commons.org/"><span class="Internet_20_link">Fedora-commons</span></a>-backed registry of research data which will be able to be deployed in a number of different modes, one of which is the model proposed for the University of Newcastle, in which descriptions of data will be prepared in ReDBox, then when they have been approved by a data librarian, will be published-to and stored-in Newcastle&#8217;s Fedora-commons-backed repository, <a href="http://nova.newcastle.edu.au/vital/access/manager/Index"><span class="Internet_20_link">Nova</span></a>. </p>
<p>In this post we want to quickly describe our current thinking on what is being stored and published and we invite comment about what the community thinks of this approach, and what other ANDS-funded and international projects are doing.</p>
<h1><a id="id2" name="id2"><span /></a>Aiming for linked data</h1>
<p>Vicki Picasso and I wrote, over on the CAIRSS blog, about one of the basic principles of the metadata work we&#8217;re doing: that we want to take a <a href="http://cairss.caul.edu.au/blog/2010/08/05/i-only-have-uris-for-you-vicki-and-peters-adventures-in-linked-data-land/"><span class="Internet_20_link">linked-data approach to all metadata</span></a>, so all the terms used to describe things have URIs that we use in the same way as others describing <i>their</i> research data. For entities like People (parties in ANDS-speak) and research projects (activities) we are building an authority control service to help ensure that people and projects are not just described by strings, which might not match the same entity in another registry, but by well-defined persistent IDs.</p>
<p>This authority control service will be a kind of clearinghouse for data such as people&#8217;s IDs and research project codes, and a local lookup service for vocabularies and ontologies, such as the subject codes and geo-location information. We&#8217;ll write more about that soon.</p>
<p>
<p style="width:642px;"><a name="Object1"><span /></a><img alt="Object1" class="fr4" height="386" src="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/m2ac72d5d.gif" style="border:0px; vertical-align: top" width="619" />Figure 1: Simple architectural diagram of what will be stored where</p>
</p>
<h1><a id="id3" name="id3"><span /></a>What will be stored?</h1>
<p>The question <span class="spCh spChx201c">&#8220;</span>what format are you storing<span class="spCh spChx201d">&#8221;</span> might not make sense in all metadata registry projects, some of which are not so much storing metadata as <i>routing </i>it, but in a Fedora-commons-backed repository and for the ReDBox project, where the system we&#8217;re building will be used as <i>the</i> place to collect data about research data that can&#8217;t be sourced from other systems, it&#8217;s a useful question. </p>
<p>Fedora commons is a kind of database-ish repository component which has digital objects, each one described by some basic Dublin Core metadata, with some optional &#8216;datastreams&#8217; which can be either more, richer, metadata, or stuff like PDF files, or data. In all the Fedora-commons-backed institutional repositories we know about in Australia the main metadata for research documents is either MARCXML or MODS which are essentially two different serialisations of pretty much the same thing <span class="spCh spChx2013">&#8211;</span> the MARC metadata standard.  MARC is a bibliographic standard, not really suited to describing research data collections (although there are some overlaps with the kind of metadata you want for data).</p>
<p>Instead of MARC-based data, the starting point for this project is to use a schema that&#8217;s being used by a number of Australian universities <span class="spCh spChx2013">&#8211;</span> the ANDS-VITRO ontology.</p>
<h1><a id="id4" name="id4"><span /></a>What is this ANDS-VITRO stuff?</h1>
<p>The ANDS-VITRO ontology is being developed by the ANDS-funded metadata stores behemoth; Melbourne, QUT and Griffith. They have:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>Taken the <a href="http://www.dlib.org/dlib/july07/devare/07devare.html"><span class="Internet_20_link">VIVO ontology</span></a> which is maintained by Cornell for describing researchers and related data.</p>
</li>
<li>
<p>Added extra stuff to deal with local requirements for data management, and to make sure that all of RIF-CS is covered.</p>
</li>
</ol>
<p>Our project has picked up on this ANDS-VITRO ontology, and we&#8217;re working on mapping it to HTML forms so that humans can enter it (the MQG group have concentrated on machine-to-machine metadata-routing pipelines rather than human-web interfaces). You can <a href="https://fascinator.usq.edu.au/trac/wiki/tf2/DeveloperNotes/investigations/Ontologies/DataSet"><span class="Internet_20_link">follow our work building on ANDS-VITRO on the project Wiki.</span></a> </p>
<p>(I hope I don&#8217;t have my VIVOs and VITROs mixed up. As far as I can tell, <a href="http://vitro.mannlib.cornell.edu/"><span class="Internet_20_link">VITRO</span></a> is a bit of software for managing semantic-web sites, while VIVO refers to both Cornell&#8217;s <a href="http://vivo.cornell.edu/"><span class="Internet_20_link">VIVO</span></a> site for its research and the underlying <a href="http://www.vivoweb.org/download"><span class="Internet_20_link">VIVO ontology</span></a>. ANDS-VITRO is an ontology, and the name of a mailing list.) </p>
<p>
<p style="width:642px;"><a name="graphics1"><span /></a><img alt="graphics1" class="fr3" height="241" src="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/m1a493580_642x241.jpeg" style="border:0px; vertical-align: top" width="642" />Figure 2: An early-draft forms interface with behind-the-scenes linked-data goodness</p>
</p>
<p>Behind the forms interface is a web-developer friendly approach to forms design and data-wrangling. One of the design goals was to make a very flexible system that library-tech-staff could configure using standard tools so the forms system is based on simple HTML forms with some simple hooks to make it easy to connect a form to a linked-authority control such as the Field of Research subject codes above.</p>
<p>Now, if you are a web developer this format will seem friendly and comforting; others may want to look away:</p>
<blockquote class="bq"><p><code>{</code></p>
<p><code><span class="spCh spChx2026">&#8230;</span> </code></p>
<p><code>"vivo:hasSubjectArea.1" : "http://purl.org/anzsrc/for#0601",</code></p>
<p><code> "vivo:hasSubjectArea.1.skos:prefLabel" : "0601 - BIOCHEMISTRY AND CELL BIOLOGY",</code></p>
<p><code><span class="spCh spChx2026">&#8230;</span> </code></p>
<p><code>}</code></p>
</blockquote>
<p>We have stuff like this for every bit of the form data, which is <b>Linked Data Compliant</b> because it has a URL for the subject code and a human-readable label.</p>
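<p>To make the key convention a bit more concrete, here is a rough sketch (not ReDBox code) of how a script might group those flat keys back into one entry per subject code. The <code>group_entries</code> function and the <code>"uri"</code> slot name are made up for illustration, and the sketch assumes every repeatable field follows the <code>property.index</code> and <code>property.index.subproperty</code> pattern seen in the snippet above.</p>

```python
import json
from collections import defaultdict

# Sample modelled on the snippet above; the elided fields are simply left out.
form_data = json.loads("""
{
  "vivo:hasSubjectArea.1": "http://purl.org/anzsrc/for#0601",
  "vivo:hasSubjectArea.1.skos:prefLabel": "0601 - BIOCHEMISTRY AND CELL BIOLOGY"
}
""")


def group_entries(flat):
    """Group flat 'property.index[.subproperty]' keys into one dict per entry.

    Hypothetical helper: the key pattern is inferred from the sample only.
    """
    entries = defaultdict(dict)
    for key, value in flat.items():
        parts = key.split(".", 2)
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # not a repeatable field; ignored in this sketch
        prop, index = parts[0], int(parts[1])
        # A bare 'property.index' key holds the URL; anything after it is a
        # sub-property such as the human-readable label.
        slot = parts[2] if len(parts) == 3 else "uri"
        entries[(prop, index)][slot] = value
    return dict(entries)


grouped = group_entries(form_data)
subject = grouped[("vivo:hasSubjectArea", 1)]
print(subject["uri"], "|", subject["skos:prefLabel"])
```

<p>Each grouped entry then carries both halves of the linked-data pair: the URL and its label.</p>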
<h1><a id="id5" name="id5"><span /></a>Get to the point! What are you storing?</h1>
<p>So, enough background. What will be stored in ReDBox then pushed to VITAL?</p>
<p>The primary metadata format will be based on each institution&#8217;s requirements. Essentially, this will be defined by the set of forms they come up with. We hope to make sure that everything we need to capture can be stored in the ANDS-VITRO framework, with as few extensions as possible. To that end, the people working on metadata for ReDBox will be very active in working with the ANDS-VITRO team to come up with an emerging standard for research-data-metadata in the Australian university context.</p>
<p><b>For internal use</b>, the forms data will be stored in JSON as this makes it very easy to pull a collection description back out of ReDBox or VITAL and turn it back into a form, for editing.</p>
<p><b>For preservation purposes</b>, we will also map the JSON forms data to RDF, conforming to the ANDS-VITRO schema, plus any necessary extensions. We hope there will not be any extensions, and that the format can be made to accommodate all the requirements for the sector. But if local extensions are needed, that&#8217;s quite simple to manage in RDF, as it is inherently extensible<span class="Footnote_20_Reference"><span class="footnote" style="vertical-align: super;"><a class="footnote" href="#ftn1" name="ftn1-text" title="1 Every time I use that word I remember the agent who told me to take it out of my CV for a technical writing job on account of it is &#8220;not a proper word&#8221;, just a couple of years before the eXtensible markup language came along and made us both rich. Well him, anyway."><span>1</span></a></span></span>. RDF can be stored in a variety of ways, all interchangeable, so it might be XML, or some other serialization format.</p>
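<p>As a very rough illustration of what that JSON-to-RDF mapping could look like (this is a sketch, not the actual ReDBox mapping), the subject-code entry shown earlier can be turned into two subject, predicate, object triples. Several things here are assumptions: the record URL is invented, the <code>to_triples</code> and <code>expand</code> helpers are hypothetical, and we assume the <code>vivo:</code> prefix means the VIVO core namespace, which ANDS-VITRO may well bind differently.</p>

```python
# Namespace URLs for the two prefixes in the sample. The skos one is the
# standard SKOS namespace; treating "vivo" as VIVO core is an assumption.
PREFIXES = {
    "vivo": "http://vivoweb.org/ontology/core#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

form_data = {
    "vivo:hasSubjectArea.1": "http://purl.org/anzsrc/for#0601",
    "vivo:hasSubjectArea.1.skos:prefLabel": "0601 - BIOCHEMISTRY AND CELL BIOLOGY",
}


def expand(curie):
    """Turn a prefixed name such as 'skos:prefLabel' into a full URL."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_triples(record_url, flat):
    """Map flat forms data to (subject, predicate, object) tuples."""
    triples = []
    for key in sorted(flat):
        parts = key.split(".", 2)
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # fields outside the sample's pattern are skipped
        if len(parts) == 2:
            # record --property--> linked resource (the object is a URL)
            triples.append((record_url, expand(parts[0]), flat[key]))
        else:
            # linked resource --sub-property--> label (the object is a literal)
            target = flat.get(parts[0] + "." + parts[1])
            if target is not None:
                triples.append((target, expand(parts[2]), flat[key]))
    return triples


# Hypothetical identifier for the collection record being described.
record = "http://example.org/collection/1"
for triple in to_triples(record, form_data):
    print(triple)
```

<p>Because the result is just a list of triples, a local extension would only mean another entry in the prefix table, and any RDF serialization could be written out from the same list.</p>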
<p>We&#8217;ll keep the JSON view of the forms data (coloured red because it is really the ReDBox native format), for pragmatic reasons. This will be mapped to ANDS-VITRO RDF, which we are backing as the emerging standard in this area (really, this is a community-driven format which reflects user needs). </p>
<p>For interchange, and harvesting to Research Data Australia, there will be RIF-CS. For sending data to the NLA&#8217;s People Australia service there will likely be EAC. And for very general discovery, Dublin Core. </p>
<p>
<p style="width:649px;"><a name="Object2"><span /></a><img alt="Object2" class="fr4" height="436" src="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/m61f1d2f9.gif" style="border:0px; vertical-align: top" width="629" />Figure 3: Four formats stored: JSON, ANDS-VITRO RDF, RIF-CS and Dublin Core (for the purposes of this diagram ReDBox and VITAL are shown as a single system).</p>
</p>
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/" onclick="javascript:window.open(&quot;http://creativecommons.org/licenses/by-sa/2.5/au/&quot;);return false;"><span class="Internet_20_link">http://creativecommons.org/licenses/by-sa/2.5/au/</span></a>&gt;</p>
<p class="center"><span class="WW-Default_20_Paragraph_20_Font"><span style="country:US; language:en; "><span class="T3"><a name="graphics2"><span /></a><img alt="graphics2" class="fr3" height="31" src="http://adfi.usq.edu.au/ands-partners/wp-content/uploads/2010/10/m40ca94ba.png" style="border:0px; vertical-align: top" width="88" /></span></span></span></p>
<p class="center" />
<hr />
<div style="font-size: .9em;"><span class="footnote-defined"><a href="#ftn1-text" name="ftn1"><span>1</span></a> Every time I use that word I remember the agent who told me to take it out of my CV for a technical writing job on account of it is <span class="spCh spChx201c">&#8220;</span>not a proper word<span class="spCh spChx201d">&#8221;</span>, just a couple of years before the eXtensible markup language came along and made us both rich. Well him, anyway.</span></div>
</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ands-partners.org/blog/2010/10/but-what-format-are-you-storing-in-that-thing/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

