Register and pay for one or more of these tutorials via online registration.

Monday July 22 Morning (9:00 - 12:00)

The Europeana Data Model and Collections

  • Karen Wickett
  • Valentine Charles
  • Katrina Fenlon

Recent developments in digital library aggregations can support the representation and description of collections from a variety of sources and domains, and can even allow users to curate and manage new collections. Collection descriptions help users find collections and determine whether a given collection contains resources relevant to their purpose, while also providing an important kind of context that can be used to interpret the significance of individual items. The Europeana Data Model (EDM) was developed to support Europeana, the largest aggregation of cultural heritage resources in Europe. The EDM addresses issues related to domain-specific metadata standards, participation in a Linked Open Data environment, and adding value to digital cultural heritage objects through data enrichment. Research conducted for the IMLS Digital Collections and Content (DCC) project through the Center for Informatics Research in Science and Scholarship indicates that extending the EDM to include collection records, and to maintain relationships between items and collections within an aggregation, would help address the challenges of meaningful organization and access. Together, researchers from Europeana and DCC have produced a white paper that provides recommendations for modeling collections in digital library aggregation and exchange environments like Europeana and The European Library.

This tutorial provides a technical introduction to the EDM and explores the role that collections play in adding value to digital libraries by 1) supporting the information seeking activities of system users, 2) allowing users to build and curate their own collections of resources, and 3) supporting administrative management of resources and metadata. Participants will gain a better understanding of conceptual data modeling, structured collection description, and collection metadata. The tutorial will conclude with a discussion of practitioners’ experience with items and collections in a digital library context and next steps for collection modeling research.
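
As a rough illustration of the item, aggregation, and collection relationships described above, the following Python sketch emits a few EDM-style statements as N-Triples using only the standard library. It assumes the core EDM pattern of an edm:ProvidedCHO described by an ore:Aggregation via edm:aggregatedCHO, and it links the item to a collection with dcterms:isPartOf purely as an illustrative assumption; this is not the modeling recommended in the Europeana/DCC white paper. The function names and all example.org URIs are hypothetical.

```python
# Sketch: emit EDM-style statements for one item as N-Triples (stdlib only).
# Assumption: dcterms:isPartOf links the item to its collection; this is an
# illustrative choice, not the white paper's recommended model.
# All example.org URIs are hypothetical.

EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_item(cho: str, aggregation: str, collection: str,
                  title: str, provider: str) -> list[str]:
    """Return N-Triples lines for a provided object, its aggregation,
    and the (hypothetical) collection it belongs to."""
    triples = [
        (iri(cho), iri(RDF_TYPE), iri(EDM + "ProvidedCHO")),
        (iri(cho), iri(DC + "title"), literal(title)),
        (iri(cho), iri(DCTERMS + "isPartOf"), iri(collection)),
        (iri(aggregation), iri(RDF_TYPE), iri(ORE + "Aggregation")),
        (iri(aggregation), iri(EDM + "aggregatedCHO"), iri(cho)),
        (iri(aggregation), iri(EDM + "dataProvider"), literal(provider)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in describe_item(
        cho="http://example.org/item/42",
        aggregation="http://example.org/aggregation/42",
        collection="http://example.org/collection/maps",
        title="Map of Lisbon",
        provider="Example Library",
    ):
        print(line)
```

Writing N-Triples by hand keeps the sketch dependency-free; a real implementation would more likely use an RDF library and the full set of EDM properties.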

Introduction to Digital Libraries

  • Edward Fox

This tutorial is a thorough introduction to the digital library (DL) field, providing a firm foundation that covers key concepts and terminology, as well as services, systems, technologies, methods, standards, projects, issues, and practices. It builds upon a theoretical framework (starting with the '5S' set of intuitive aspects: Streams, Structures, Spaces, Scenarios, Societies), giving careful definitions and explanations of all the key parts of a 'minimal digital library', and expanding from that basis to cover key DL issues. Illustrations will come from a well-chosen set of case studies. Attendees will receive a copy of the combined DL book (Dec. 2011, over 500 pages) that the presenter used in teaching a graduate DL course, as well as portions of the four Morgan & Claypool books that extend that work (Theoretical Foundations, Key Issues, Technologies, Applications), all expected to be out before the conference; see https://sites.google.com/a/morganclaypool.com/dlibrary/ and http://www.morganclaypool.com/doi/abs/10.2200/S00434ED1V01Y201207ICR022. Complementing the coverage of '5S' will be an overview of key aspects of the DELOS Reference Model and the DL.org activities based in Europe.

ResourceSync: The Resource Synchronization Framework 

  • Martin Klein

This tutorial will provide an overview of, and a practical introduction to, ResourceSync, a synchronization framework consisting of multiple modular capabilities that a server can selectively implement to enable third-party systems to remain synchronized with the server's evolving resources. The tutorial will motivate the ResourceSync approach by outlining several synchronization use cases, including scholarly article repositories, linked data knowledge bases, and resource aggregators. It will detail the concepts behind the ResourceSync capabilities, their discovery mechanisms, and their serialization based on the widely adopted Sitemap protocol. The tutorial will also touch on the extensibility of the framework, for example to provide references to mirror locations of synchronization resources, to transfer partial content, and to offer historical data.
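
A minimal Python sketch of the Sitemap-based serialization, under stated assumptions: the source publishes a Resource List whose rs:md elements carry an MD5 hash and a length for each resource, and a destination compares those hashes with its local copies to decide what to fetch. Element and attribute names (urlset, url, loc, rs:md, capability="resourcelist", at, hash, length) follow the ResourceSync specification as later published and may differ from the draft covered in the tutorial; lastmod and the capability discovery documents are omitted for brevity, and the function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("rs", RS_NS)


def build_resource_list(resources: dict[str, bytes], at: str) -> str:
    """Source side: describe each resource by URI, MD5 hash and length."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # Capability metadata: this Sitemap is a Resource List as of time `at`.
    ET.SubElement(urlset, f"{{{RS_NS}}}md", capability="resourcelist", at=at)
    for uri, content in sorted(resources.items()):
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{RS_NS}}}md",
                      hash="md5:" + hashlib.md5(content).hexdigest(),
                      length=str(len(content)))
    return ET.tostring(urlset, encoding="unicode")


def resources_to_fetch(resource_list_xml: str,
                       local_hashes: dict[str, str]) -> list[str]:
    """Destination side: URIs that are missing locally or whose hash differs."""
    root = ET.fromstring(resource_list_xml)
    stale = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        md = url.find(f"{{{RS_NS}}}md")
        remote_hash = md.get("hash") if md is not None else None
        if local_hashes.get(loc) != remote_hash:
            stale.append(loc)
    return stale


if __name__ == "__main__":
    source = {"http://example.com/res1": b"hello",
              "http://example.com/res2": b"world"}
    xml_text = build_resource_list(source, at="2013-07-22T09:00:00Z")
    local = {"http://example.com/res1": "md5:" + hashlib.md5(b"hello").hexdigest()}
    print(resources_to_fetch(xml_text, local))  # ['http://example.com/res2']
```

A destination that has never synchronized passes an empty dictionary and therefore fetches everything, which corresponds to a baseline synchronization; later runs fetch only new or changed resources.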

Workflows for Automating Data Curation CANCELLED

  • Stacy Kowalczyk
  • Scott Jensen
  • Beth Plale
  • Kavitha Chandrasekar

Scientific workflow systems present both new opportunities and new challenges for digital repositories. Domain scientists often use these systems to “plug together” components for data acquisition, transformation, analysis, and visualization, building complex data-analysis frameworks from existing building blocks, including algorithms available as locally installed software packages or globally accessible web services. We believe digital repositories can also leverage these workflow systems for their own data ingest, curation, and preservation activities. At the same time, these workflow systems are generating a deluge (or bonanza) of scientific data. In addition to their existing responsibilities, institutional repositories are increasingly being asked to provide archival and discovery services for scientific data generated by these workflows, as well as by simulations and scientific instruments. This tutorial explores these opportunities and challenges.

Workflow systems can help librarians, technologists, and repository managers streamline processing by integrating existing, well-known software (such as JHOVE) with repository-specific needs. At ingest, workflow systems can orchestrate processes to verify and validate formats, create derivative files, and create structural and administrative metadata. Workflows can automate a wide variety of preservation processes, such as format normalization, metadata extraction from Excel and CSV data files, and generation of provenance and preservation metadata. Curation processes such as fixity verification, format migration, and preservation metadata updates can be automated with workflows as well. Beyond the administrative functions of ingest, preservation, and curation, workflows can also support dissemination, for example by converting file formats on demand for rendering, providing “snippets” of data files for discovery, and packaging numerous large files into a single compressed file, such as zip or tar, for faster and more efficient delivery.
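
The ingest, curation, and dissemination steps listed above can be sketched as plain Python functions: record a checksum manifest at ingest, re-verify it later as a fixity check, and package the files into one gzip-compressed tar for delivery. This is a standalone illustration, not code for any particular workflow system, and it does not call JHOVE; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(directory: Path) -> dict[str, str]:
    """Ingest step: map each file's relative path to its SHA-256 checksum."""
    return {
        p.relative_to(directory).as_posix(): sha256_of(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def verify_fixity(directory: Path, manifest: dict[str, str]) -> list[str]:
    """Curation step: return files that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = directory / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel_path)
    return problems


def package_for_delivery(directory: Path, archive: Path) -> None:
    """Dissemination step: bundle the directory into one gzip-compressed tar."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(directory, arcname=directory.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "dataset"
        data_dir.mkdir()
        (data_dir / "readings.csv").write_text("time,value\n0,1.5\n")
        manifest = record_fixity(data_dir)
        print(verify_fixity(data_dir, manifest))  # [] while nothing has changed
        package_for_delivery(data_dir, Path(tmp) / "dataset.tar.gz")
```

In a workflow system, each of these functions could become a separate, reusable step, with the manifest stored alongside the preservation metadata.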

An increased focus on sharing and reusing scientific data has led scientists and institutions to turn to digital repositories, given their experience preserving other forms of digital data, to take a greater role in preserving this data for future users. In this tutorial we will explore the wide variety of metadata formats being used for scientific data, ranging from self-describing data to well-structured metadata based on XML schemata, and even unstructured name/value pairs or no metadata at all. The increasing volume of scientific data generated by workflow systems, simulations, and large instruments demands detailed metadata to enable discovery and reuse, while also requiring that metadata capture be automated to the extent possible.
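
To make automated metadata capture concrete, the following sketch derives a few name/value pairs from CSV text with a header row: the column names, column and row counts, and which columns look numeric. The keys in the output dictionary are hypothetical, and Excel files, which would need a third-party reader, are not handled.

```python
import csv
import io


def looks_numeric(value: str) -> bool:
    """True if the string parses as a floating-point number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def extract_csv_metadata(text: str) -> dict[str, str]:
    """Derive simple name/value metadata from CSV text with a header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = list(reader)
    numeric_columns = []
    for index, name in enumerate(header):
        # Ignore short rows and empty cells when judging a column's type.
        values = [row[index] for row in rows if index < len(row) and row[index] != ""]
        if values and all(looks_numeric(v) for v in values):
            numeric_columns.append(name)
    return {
        "format": "text/csv",
        "columns": ";".join(header),
        "column_count": str(len(header)),
        "row_count": str(len(rows)),
        "numeric_columns": ";".join(numeric_columns),
    }


if __name__ == "__main__":
    sample = "station,temp\nA,12.5\nB,13\n"
    for name, value in extract_csv_metadata(sample).items():
        print(f"{name}={value}")
```

Flat name/value pairs like these sit at the unstructured end of the range described above; mapping them into an XML schema would be a separate step.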

Lunch (12:00 - 13:30)

Afternoon (13:30 - 16:30)

Building Digital Library Collections with Greenstone 3 and Interoperability with Open Source Software

  • David Bainbridge

This tutorial is designed for those who want an introduction to building a digital library using an open source software program. The course will focus on the Greenstone digital library software, with emphasis on interoperability with other open source DL systems such as Fedora and DSpace. In particular, participants will work with the Greenstone Librarian Interface, a flexible graphical user interface designed for developing and managing digital library collections. Attendees do not need programming expertise; however, they should be familiar with HTML and the Web, and be aware of representation standards such as Unicode, Dublin Core, and XML.

The Greenstone software has a pedigree of over a decade, with, for example, over 750,000 downloads from SourceForge. Until now, the premier version of the software has been Greenstone 2 (version 2.85); this tutorial will be the first ever given on Greenstone 3. The new version is a complete redesign and reimplementation of the original, taking better advantage of the standards and web technologies developed since Greenstone was first implemented. Written in Java, it is more modular in design, which increases the flexibility and extensibility of Greenstone. We will illustrate this throughout the tutorial, including examples of interoperability with other DL systems as well as integration with other web-based resources, such as OpenStreetMap.

From Ingest to Access using the Archivematica Open-Source Digital Preservation System CANCELLED

  • Courtney Mumma

Archivematica is a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects. It uses a micro-services design pattern to provide an integrated suite of software tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model.

This tutorial will be an opportunity for hands-on experience processing digital objects. Attendees will use their own laptops to access cloud-hosted copies of the web-based Archivematica system and complete the tutorial steps. The instructor will also answer questions about installation, integration with other tools (both open-source and proprietary), software features, tools used and the open-source project management model.

Mining Data Semantics CANCELLED

  • Ying Ding
  • Jie Tang
  • Erjia Yan

The tutorial aims to discuss key issues and practices of mining semantics in heterogeneous information networks. Social, information, and biological systems usually consist of a large number of interacting, multi-typed components connected via various types of links, which makes heterogeneous networks ubiquitous. Mining semantics from such networks can address several important questions. The tutorial will provide hands-on experience applying data integration and data discovery to large patent, scholarly publication, and open biomedical datasets.
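
As one illustration of mining a heterogeneous network, the sketch below builds a tiny author, paper, and venue network and scores author similarity along the meta-path Author-Paper-Venue-Paper-Author using the PathSim measure, 2 * paths(x, y) / (paths(x, x) + paths(y, y)). PathSim is a published measure from the heterogeneous-network literature; whether this tutorial covers it is not stated, and the network data and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy network: authors write papers, papers appear in venues.
WRITES = {
    "alice": ["p1", "p2"],
    "bob": ["p2", "p3"],
    "carol": ["p3"],
}
VENUE_OF = {"p1": "JCDL", "p2": "JCDL", "p3": "TPDL"}


def author_venue_paths(writes, venue_of):
    """Count Author-Paper-Venue path instances from each author to each venue."""
    counts = defaultdict(lambda: defaultdict(int))
    for author, papers in writes.items():
        for paper in papers:
            counts[author][venue_of[paper]] += 1
    return counts


def apvpa(av, x, y):
    """Count Author-Paper-Venue-Paper-Author path instances between x and y."""
    return sum(n * av[y].get(venue, 0) for venue, n in av[x].items())


def pathsim(av, x, y):
    """PathSim similarity along the symmetric meta-path APVPA."""
    denominator = apvpa(av, x, x) + apvpa(av, y, y)
    return 2 * apvpa(av, x, y) / denominator if denominator else 0.0


if __name__ == "__main__":
    av = author_venue_paths(WRITES, VENUE_OF)
    for x, y in [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]:
        print(x, y, round(pathsim(av, x, y), 3))
```

Because the score is normalized by each author's self-paths, a prolific author does not automatically appear similar to everyone, which is the main motivation for PathSim over raw path counts.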

Using Open Annotation

  • Timothy Cole
  • Robert Sanderson
  • Thomas Habing
  • Jacob Jett

The practice of annotating documents and resources is a time-honored tradition across all scholarly domains and in the day-to-day lives of academicians and non-academicians alike. The W3C Open Annotation (OA) Community Group was formed in late 2011, joining the efforts of the Open Annotation Collaboration and the Annotation Ontology initiative. The Community Group has recently released a developer-ready, RDF-based annotation data model specification to foster the development of interoperable annotation tools and services and to facilitate the sharing of annotations across repository and Web application boundaries.

This tutorial introduces digital library, humanities and science computing, and semantic web developers, project managers, and experimenters to:

  • The essential components of the OA data model and specification,
  • Working exemplars of how the OA data model has been applied to various annotation use cases, and
  • Helpful links, resources, guides, and libraries for implementing the OA data model.
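
The following sketch builds a commenting annotation as JSON-LD with Python's json module, using terms from the OA vocabulary (oa:Annotation, oa:hasBody, oa:hasTarget, oa:motivatedBy, oa:SpecificResource, oa:TextQuoteSelector) and cnt:ContentAsText for an embedded textual body. The inline @context, the example.org URIs, and the function name are assumptions for illustration; the OA specification remains the normative source for serialization details.

```python
import json
from typing import Optional

OA = "http://www.w3.org/ns/oa#"
CNT = "http://www.w3.org/2011/content#"


def make_comment_annotation(anno_id: str, target_uri: str, comment: str,
                            exact: Optional[str] = None) -> dict:
    """Build a commenting annotation as JSON-LD using OA vocabulary terms."""
    if exact is None:
        # Target the whole resource.
        target = {"@id": target_uri}
    else:
        # Target only the quoted text span within the resource.
        target = {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": target_uri},
            "oa:hasSelector": {"@type": "oa:TextQuoteSelector", "oa:exact": exact},
        }
    return {
        "@context": {"oa": OA, "cnt": CNT},
        "@id": anno_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:commenting"},
        # Embedded textual body, expressed with the Content in RDF vocabulary.
        "oa:hasBody": {"@type": "cnt:ContentAsText", "cnt:chars": comment},
        "oa:hasTarget": target,
    }


if __name__ == "__main__":
    annotation = make_comment_annotation(
        anno_id="http://example.org/anno/1",
        target_uri="http://example.org/letters/1862-03-04",
        comment="The date in this letter appears to be misread.",
        exact="March 4th",
    )
    print(json.dumps(annotation, indent=2))
```

Keeping the body, target, and motivation as separate nodes is what lets the same annotation be shared across tools that only understand some of its parts.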