Introduction to (Teaching/Learning about) Digital Libraries
Evaluating Digital Libraries
Thesauri and ontologies in digital libraries (parts 1 & 2)
Using Standards in Digital Library Design & Development
Practical Digital Library Interoperability Standards

Introduction to (Teaching/Learning about) Digital Libraries

This tutorial will provide a thorough and deep introduction to the DL field, introducing and building upon a firm theoretical foundation (starting with “5S”: Streams, Structures, Spaces, Scenarios, Societies), giving careful definitions and explanations of all the key parts of a “minimal digital library”, and expanding from that basis to cover key DL issues, illustrated with a well-chosen set of case studies.

Attendees will receive a first draft copy of a new book under development by the co-presenters, with tentative title “Foundations for Information Systems: Digital Libraries and the 5S Framework”, based in part on ideas explored in Dr. Gonçalves' dissertation.

Goals are to:

  • aid those with CS, library, or info. science backgrounds to enter the DL field
  • clarify key terms and concepts to provide a basis to understand JCDL
  • explain how DL services fit into a simple taxonomic framework
  • enhance concern for quality in DLs by providing a contextual setting in the Information Life Cycle, and precisely specifying popular indicators

  • show those teaching a DL course how to use the forthcoming book
  • personalize the tutorial based on a list of top priority goals from each attendee, making use of having 2 presenters who can switch off or handle different groups

The presenters are:

  • Dr. Edward A. Fox, Professor of Computer Science and Director of the Digital Library Research Laboratory at Virginia Tech. He serves as Chair of the IEEE Technical Committee on Digital Libraries and as Executive Director of the Networked Digital Library of Theses and Dissertations. He has taught courses and many tutorials on digital libraries, as well as on information retrieval, multimedia, electronic publishing, etc. He was one of the first to work in the area of digital libraries.
  • Dr. Marcos André Gonçalves, a former PhD student at Virginia Tech (2004), now teaching at the Federal University of Minas Gerais, Belo Horizonte, Brazil. His dissertation explains one of the first comprehensive formal theories for digital libraries: the 5S framework. He has been working in the DL field since 1997. His research interests include Digital Libraries, Information Retrieval, and Databases. He was lead author of the JCDL 2004 Best Student Paper. He was funded in 2003-2004 by an AOL Fellowship, and is now supported by CNPq.

Description / topical outline:

This tutorial will be based on the new book being prepared by the co-presenters, which will be well along by the time of the conference. We plan for all the figures in the book to be done, and those will be used to illustrate most of the ideas of the tutorial. The tentative book outline is as follows:

1. Motivation, Synopsis

  • Why do we need this book? What are DLs? Why 5S?
  • History; Related Areas: LIS, prob/statistics, linguistics, AI, databases, knowledge management, content management, …

Part 1 – The “Ss”

2. Streams: Text, Images, Audio, Video

3. Structures

3.1 Digital Objects and Metadata

  • DOs: Digitization, packaging, interchange, standards, genre, organization
  • Metadata: Standards, Markup

3.2 Knowledge Structures

  • Databases
  • Ontology, Thesauri, Dictionary/Lexicon/Authority Files
  • Indexes, Clusters/Classification schemes

4. Spaces: Retrieval Models, User interfaces and Visualization

5. Scenarios: Scenario-Based Design, Events, Services

6. Societies

  • Economic issues: IP, Rights, Publishing, Sustainability
  • Research Community: Associations, Conferences, Laboratories

Part 2 – Higher Level DL Constructs

7. Collections

  • Large and Distributed Collections
  • Federation and Harvesting

8. Repositories/Archives

  • Naming, Identifiers
  • Architectures, Interoperability
  • Preservation
  • Scalability, Storage

9. Catalogs: OPACs, etc.

10. Services

  • Ontology, Composition, Reuse
  • Evaluation

11. Systems (Greenstone, Fedora, EPrints, DSpace, Kepler)

Part 3 – DL Case Studies


  • Appendix A – Mathematical Preliminaries
  • Appendix B – Formal definitions of the Ss and DL
  • Appendix C – Glossary of terms and mapping CS <-> LIS

Target audience, including level of experience:

  • Audience 1: Those attending JCDL for the first time, to become oriented.
  • Audience 2: Those interested in DL theory in general, or 5S in particular.
  • Audience 3: Those teaching DL courses, so as to be prepared to use the new book.
  • Level of experience required: introductory. Those at intermediate or advanced levels could benefit as well, since the 5S framework has broad applicability for planners, designers, implementers, and evaluators.

Learning objectives:

Be able to:

  • Explain 5S.
  • Define key DL concepts.
  • Define “digital library”.
  • List common DL services and organize them into a useful taxonomy.
  • Analyze an existing or planned DL using the 5S framework.
  • Explain how quality fits into the Information Life Cycle.
  • Describe how to compute a variety of DL quality indicators.
  • Organize and begin to teach a DL course based on the new book.

Evaluating Digital Libraries

(Introductory, intermediate, advanced levels could all benefit.)

A comprehensive evaluation of a digital library requires a "triangulation" approach in which multiple models, procedures, and tools are applied. Conducting valid evaluations of digital libraries in a timely and efficient manner is the focus of this tutorial. Why is evaluation of digital libraries so important? Each year sees the introduction of more and more digital libraries promoted as valuable resources for education and other needs. Yet systematic evaluation of the implementation and efficacy of these digital library systems is often lacking.

This tutorial is specifically designed to establish evaluation as a key strategy throughout the design, development, and implementation of digital libraries at all levels of education. A decision-oriented model for the evaluation of digital libraries will be the focus of the tutorial. Within this model, methods used include: service evaluation, usability evaluation, information retrieval, biometrics evaluation, transaction log analysis, survey methods, interviews and focus groups, observations, and experimental methods. Participants will be provided with a range of resources including an online evaluation toolkit.

Topical Outline:

Participants in this tutorial will learn how to implement models and procedures for evaluating digital libraries at all levels of education. The tutorial includes presentations with actual case studies that are focused on a variety of digital library evaluation strategies. Tutorial participants will learn to develop, implement, and report specific plans, strategies, and tools for a decision-oriented approach to the evaluation of digital libraries. Key evaluation strategies emphasized in the tutorial include:

  1. service evaluation,
  2. usability evaluation,
  3. information retrieval,
  4. biometrics evaluation,
  5. transaction log analysis,
  6. survey methods,
  7. interviews and focus groups,
  8. observations, and
  9. experimental methods.

Target Audience:

Anyone involved in the development, implementation, or use of digital libraries.

Learning Objectives:

After attending this tutorial, the participants will be able to perform the following tasks:

  1. Describe different paradigms for digital library evaluation.
  2. Distinguish between:
    • assessment and evaluation;
    • internal and external evaluation;
    • intrinsic and extrinsic evaluation; and
    • formative and summative evaluation.
  3. Implement six facets of evaluation for digital libraries:
    • review;
    • needs assessment;
    • formative evaluation;
    • effectiveness evaluation;
    • impact evaluation; and
    • maintenance evaluation.
  4. Prepare an evaluation plan for various decision-making needs, including clarification of key questions and methods.
  5. Recognize the advantages and limitations of evaluation in the context of digital libraries.

Thomas C. Reeves, Ph.D.,
Professor of Instructional Technology,
The University of Georgia, USA

Dr. Susan Buhr
Director of Outreach for the Cooperative Institute for Research in Environmental Sciences (CIRES),
University of Colorado, USA

Dr. Lecia Barker
Director of the Evaluation & Research Group of the Alliance for Technology, Learning, and Society (ATLAS)
University of Colorado, USA

Thesauri and ontologies in digital libraries

Part 1: Structure and use in knowledge-based assistance to users


This introductory tutorial is intended for anyone concerned with subject access to digital libraries. It provides a bridge by presenting methods of subject access as treated in an information studies program for those coming to digital libraries from other fields. It will elucidate through examples the conceptual and vocabulary problems users face when searching digital libraries. It will then show how a well-structured thesaurus / ontology can be used as the knowledge base for an interface that can assist users with search topic clarification (for example, through browsing well-structured hierarchies and guided facet analysis) and with finding good search terms (through query term mapping and query term expansion: synonyms and hierarchic inclusion). It will touch on cross-database and cross-language searching as natural extensions of these functions. It will also mention the use of more richly structured ontologies, including Semantic Web applications. The tutorial will cover the thesaurus structure needed to support these functions: concept-term relationships (for vocabulary control and synonym expansion) and conceptual structure (semantic analysis, facets, and hierarchy, for topic clarification and hierarchic query term expansion). It will introduce a few sample thesauri and some thesaurus-supported digital libraries and Web sites to illustrate these principles.
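The query term mapping and expansion functions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy thesaurus, its terms, and the function names `map_term` and `expand_term` are invented here and do not come from the tutorial or from any particular thesaurus.

```python
# Toy thesaurus: every term and relationship here is invented for illustration.
# Each preferred term lists its synonyms (entry terms) and its narrower concepts.
THESAURUS = {
    "automobiles": {
        "synonyms": ["cars", "motor cars"],
        "narrower": ["sports cars", "electric cars"],
    },
    "sports cars": {"synonyms": ["roadsters"], "narrower": []},
    "electric cars": {"synonyms": ["EVs"], "narrower": []},
}

# Vocabulary control: map each entry term (synonym) to its preferred term.
ENTRY_TERMS = {
    synonym.lower(): preferred
    for preferred, record in THESAURUS.items()
    for synonym in record["synonyms"]
}


def map_term(user_term):
    """Query term mapping: return the preferred term, or None if unknown."""
    key = user_term.strip().lower()
    if key in THESAURUS:
        return key
    return ENTRY_TERMS.get(key)


def expand_term(user_term):
    """Expand a term with synonyms and all narrower concepts (hierarchic inclusion)."""
    preferred = map_term(user_term)
    if preferred is None:
        return [user_term]  # no thesaurus support; search the term as typed
    expanded, queue, seen = [], [preferred], set()
    while queue:  # breadth-first walk down the hierarchy
        concept = queue.pop(0)
        if concept in seen:
            continue
        seen.add(concept)
        expanded.append(concept)
        expanded.extend(THESAURUS[concept]["synonyms"])
        queue.extend(THESAURUS[concept]["narrower"])
    return expanded
```

In this sketch a user who types "Cars" is first mapped to the preferred term "automobiles", and the expanded query then also covers the synonyms and the narrower concepts beneath it. A real interface would let the user review the expansion rather than apply it silently.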

Part 2: Design, evaluation, and development


This tutorial is intended for people who have a basic familiarity with the function and structure of thesauri and ontologies. It will introduce criteria for the design and evaluation of thesauri and ontologies and then deal with methods and tools for their development: locating sources; collecting concepts, terms, and relationships to reuse existing knowledge; developing and refining thesaurus/ontology structure; software and database structure for the development and maintenance of thesauri and ontologies; collaborative development of thesauri and ontologies; developing crosswalks / mappings between thesauri/ontologies. In summing up, the tutorial will address the question of the resources needed to develop and maintain a thesaurus or ontology.

Dagobert Soergel
College of Information Studies
Univ. of Maryland
College Park, MD 20742
Office: (301) 405-2037, Fax: (301) 314-9145, Cell: 703-585-2840

Using Standards in Digital Library Design & Development

This tutorial will cover a set of Standards or de facto Standards that can play a role in the design and development of Digital Library applications. The Standards that will be discussed are the ISO MPEG-21 Digital Item Declaration, the ISO MPEG-21 Digital Item Identification, the ISO MPEG-21 Digital Item Processing, the Open Archives Initiative Protocol for Metadata Harvesting, the Internet Archive ARC file format, the NISO OpenURL Framework for Context-Sensitive Services, and the proposed info URI scheme. The tutorial will discuss these Standards by illustrating how they have been used in the context of the aDORe Digital Object repository. aDORe [8] has been designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory. Since aDORe is not a product, the tutorial is not a product advertisement. Rather, it is an opportunity for designers and developers to learn about Standards that can help address real-life challenges in DL design and development, and help increase interoperability across systems. The presenters are actively involved in all of the standardization efforts that are discussed.


Using MPEG-21 DID to represent Digital Objects

The tutorial examines the use of the MPEG-21 Digital Item Declaration [3] (ISO/IEC 21000-2) to represent Digital Objects as XML-based MPEG-21 DIDL documents.

Using MPEG-21 DII to identify Digital Objects

The tutorial examines the use of the MPEG-21 Digital Item Identification [4] (ISO/IEC 21000-3) to convey the identifier(s) of a Digital Object and of its constituent datastreams.

Using MPEG-21 DIP to process Digital Objects

The tutorial examines the use of the MPEG-21 Digital Item Processing Standard [1] (ISO/IEC 21000-10) to facilitate the delivery of various disseminations of Digital Objects.

Using OAI-PMH to harvest resources represented as Digital Objects

The tutorial examines how digital resources, not just metadata about resources, can be harvested using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [5].

Using Internet Archive ARC files to store constituent datastreams of a Digital Object

The tutorial examines how constituent datastreams of the Digital Object can be provided by reference and physically stored in Internet Archive ARC files [1].

Using info URI to facilitate the referencing of information assets under the URI allocation

The tutorial examines the use of the info URI scheme [7], proposed to allow referencing by means of URIs those resources that have identifiers in public namespaces but have no representation within the URI allocation.
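As a rough illustration of the idea, an info URI has the general shape `info:<namespace>/<identifier>`, optionally followed by `#<fragment>`. The sketch below splits such a URI into its parts. The function name `parse_info_uri` and the sample value are assumptions made for this example, and the sketch performs no percent-encoding normalization and no check that the namespace is actually registered.

```python
def parse_info_uri(uri):
    """Split an info URI into namespace, identifier, and optional fragment.

    Hypothetical helper for illustration only; it checks the overall shape
    info:<namespace>/<identifier>[#<fragment>] and nothing more.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        raise ValueError(f"not an info URI: {uri!r}")

    # Separate the optional fragment before looking at the identifier.
    rest, _, fragment = rest.partition("#")

    # The first slash separates the namespace from the identifier;
    # any later slashes belong to the identifier itself.
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        raise ValueError(f"expected info:<namespace>/<identifier>: {uri!r}")

    return {
        "namespace": namespace.lower(),  # scheme and namespace are case-insensitive
        "identifier": identifier,
        "fragment": fragment or None,
    }


# Sample value using a Library of Congress Control Number namespace.
parsed = parse_info_uri("info:lccn/2002022641")
```

Here `parsed` holds the namespace `lccn` and the identifier `2002022641`, which a repository could then use as a lookup key without needing a resolvable URL for the resource.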

Using the OpenURL Framework to convey Context-Sensitive dissemination requests

The tutorial examines the use of the NISO OpenURL Standard [6], which provides a generic framework for the delivery of Context-Sensitive services pertaining to resources referenced in a networked environment.

Jeroen Bekaert
Research Library
Los Alamos National Laboratory
Ghent University
Faculty of Engineering

Xiaoming Liu
Research Library
Los Alamos National Laboratory

Herbert Van de Sompel
Research Library
Los Alamos National Laboratory

Practical Digital Library Interoperability Standards


As the field of digital libraries matures and new systems and standards develop, the ability to interoperate between systems becomes paramount. This tutorial gives a practical introduction to many recent standards and de facto standards for interoperability, and illustrates them using open source digital library software—including online demonstrations of interoperation issues and solutions. Core standards that are discussed include Dublin Core, OAI-PMH, METS, and MODS. We use interoperation between Greenstone and DSpace as a motivating case study.

For those demonstrations that involve Greenstone, attendees who wish to may bring their laptops, install Greenstone from a CD-ROM that we will provide, along with various sample files, and follow along with the demonstrations on their own machine.

Subject Matter

To set the context we briefly overview the traditional library standards MARC and Z39.50. Then we focus on Dublin Core and consider how crosswalk files are built in practice to map MARC metadata to Dublin Core, a form readily supported by digital library systems. Dublin Core variants will be discussed, including national extensions and the qualified Dublin Core standard, along with the LOM standard for educational metadata. The recent MODS standard from the Library of Congress will be introduced, along with some practical difficulties that its very general structure raises. All of these will be illustrated, and demonstrated online, in the application context of importing and exporting collections from and to different standard representations using open-source digital library software.
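To make the notion of a crosswalk concrete, the following sketch treats it as a lookup table from MARC tag and subfield to an unqualified Dublin Core element. The table is a deliberately simplified subset chosen for illustration, not an official mapping, and the sample record, the function name `marc_to_dc`, and the punctuation clean-up rule are all assumptions of this example.

```python
# Illustrative crosswalk only: a simplified subset, not an official mapping.
# Keys are (MARC tag, subfield code); values are unqualified Dublin Core elements.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("520", "a"): "description",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def marc_to_dc(marc_fields):
    """Convert (tag, subfield, value) triples into a Dublin Core dictionary.

    Dublin Core elements are repeatable, so values are collected in lists.
    Fields without a crosswalk entry are dropped, which is where information
    is lost when a rich format is mapped onto a simpler one.
    """
    record = {}
    for tag, subfield, value in marc_fields:
        element = CROSSWALK.get((tag, subfield))
        if element is not None:
            # Assumed clean-up rule: trim trailing MARC/ISBD punctuation.
            record.setdefault(element, []).append(value.strip(" /:,;"))
    return record


# Invented sample record for demonstration purposes.
SAMPLE_MARC = [
    ("245", "a", "An example title /"),
    ("100", "a", "Author, Example,"),
    ("650", "a", "Digital libraries"),
    ("650", "a", "Metadata"),
    ("260", "c", "2005"),
    ("300", "a", "xii, 200 p."),  # physical description: no entry, so dropped
]

dc_record = marc_to_dc(SAMPLE_MARC)
```

The 300 field in the sample has no counterpart in the table and is silently discarded. Real crosswalk files have to decide explicitly what happens to such fields, which is one of the practical difficulties of mapping between standards.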

We then proceed to discuss OAI-PMH, and practical issues of ingesting collections and serving them using standard digital library software. OAI-PMH is based on metadata harvesting, and although metadata records may contain a link to the resource to which they refer, this is beyond the scope of the protocol. Nevertheless, informal conventions can often be exploited to access the document's full content, allowing a practical digital library to form richer collections. Again, this will be supported by actual implementations and demonstrations.
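For orientation, a harvester talks to an OAI-PMH repository through plain HTTP requests carrying a `verb` argument, and large result lists are paged with a `resumptionToken`. The sketch below builds `ListRecords` request URLs and extracts the token from a response. The base URL `http://example.org/oai`, the helper names, and the sample response are hypothetical; no network access is performed, and error handling is omitted.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken in a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # absent or empty token: nothing more to fetch
    return token.text.strip()


# Invented, heavily abbreviated response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

first_url = list_records_url("http://example.org/oai")
follow_up_url = list_records_url("http://example.org/oai",
                                 resumption_token=next_token(SAMPLE_RESPONSE))
```

A harvesting loop would fetch `first_url`, process the records, and keep requesting follow-up URLs until `next_token` returns `None`. Retrieving the full documents that the harvested records point to is a separate step outside the protocol.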

The METS standard uses a meta-description approach to describe what constitutes a “work” in a digital library. Although very flexible, this has the disadvantage that different systems may use different, and logically incompatible, structures. The notion of METS “profiles” helps to define the dialect of METS that particular systems use, and we show how open source digital library systems can utilize general XSLT modules to ingest foreign METS profiles.

As an example, we discuss METS-level transfer of collections between a standard METS implementation and Greenstone. We also discuss options for bridging between DSpace and Greenstone, and demonstrate alternative approaches.


  • Traditional library standards: MARC and Z39.50
  • Dublin Core and its variants; crosswalk files
  • OAI-PMH protocol: ingesting from and serving over OAI-PMH
  • Greenstone architecture: plugins for different metadata formats: MARC, ProCite, CDS/ISIS, BibTeX, Refer
  • Subfields: qualified DC, and MODS
  • METS as intermediate document representation; METS profiles
  • DSpace, Fedora and Greenstone
  • Web services
  • Concluding discussion

Target Audience

The tutorial is designed for those who want to learn about digital library standards and interoperability in the context of actual digital library software and digital library collections. Interoperability issues that seem abstract when discussed in isolation become immediate and concrete when set in the context of particular practical problems.

The tutorial is intended for digital library students, researchers, and practitioners who are interested in practical issues of interoperability. It will also be useful for those seeking to further their knowledge of what existing open source digital library software can do and how to work with it.


Participants will receive a handout that includes PowerPoint slides for the tutorial.

In addition, participants who wish to will receive a copy of the material for the Greenstone tutorial, which includes a CD-ROM containing the Greenstone software, full documentation, and sample collections.

David Bainbridge and Ian H. Witten


Ian H. Witten, Department of Computer Science, University of Waikato
Hamilton, New Zealand
Phone (+64 7) 838-4246, fax (+64 7) 838-4155
Electronic mail: ihw@cs.waikato.ac.nz.