A Digital Library for Water Main Break Identification and Visualization
Sunshin Lee; Noha Elsherbiny; Edward Fox

ABSTRACT

This paper describes a prototype of a digital library for water main break identification and visualization. Because water main breaks are difficult to predict, many utilities rely on emergency calls to detect them. Collecting this information through calls requires time-consuming human effort, and the information is neither archived nor shared with others. Collecting and archiving information from tweets, news, and other web resources helps users identify relevant water main breaks efficiently. In developing this prototype, we extracted location information from text instead of using GPS data. We also describe the importance of visualizing tweets by location, and how we display tweets on a map.
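
The step of extracting location information from text can be illustrated with a minimal gazetteer-lookup sketch. It assumes a small hand-made gazetteer with approximate coordinates and a keyword filter; `GAZETTEER`, `BREAK_KEYWORDS`, and all function names are hypothetical, and the prototype's actual extraction method may differ.

```python
import re

# Hypothetical gazetteer: lower-cased place names mapped to approximate
# (latitude, longitude) pairs. A real system would use a far larger gazetteer.
GAZETTEER = {
    "blacksburg": (37.2296, -80.4139),
    "christiansburg": (37.1299, -80.4089),
    "roanoke": (37.2710, -79.9414),
}

# Hypothetical keywords used to keep only tweets that mention a break.
BREAK_KEYWORDS = ("water main", "main break", "pipe burst")


def mentions_break(text: str) -> bool:
    """Return True if the text contains one of the break keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in BREAK_KEYWORDS)


def extract_locations(text: str) -> list[tuple[str, tuple[float, float]]]:
    """Return gazetteer places named in the text, in order of first mention."""
    found = []
    seen = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in GAZETTEER and token not in seen:
            seen.add(token)
            found.append((token, GAZETTEER[token]))
    return found


def geolocate_tweets(tweets: list[str]) -> list[tuple[str, tuple[float, float]]]:
    """Return (tweet, coordinates) pairs for break-related tweets that name a place."""
    results = []
    for tweet in tweets:
        if mentions_break(tweet):
            for _, coords in extract_locations(tweet):
                results.append((tweet, coords))
    return results
```

Matching single tokens misses multi-word place names and street addresses; a fuller version would match n-grams or run a named-entity recognizer before geocoding.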

A Preliminary Analysis of FRBR's Bibliographic Relationships for Path Based Associative Rules
Ya-Ning Chen; Hui-Pin Chen; Fei-Yen Tu

ABSTRACT

The Functional Requirements for Bibliographic Records (hereafter FRBR) has been adopted to address the relationships between bibliographic records and related aggregate works. However, an approach to transforming FRBR-based bibliographic relationships and their patterns into path-based rules for retrieval, navigation, display, and data mining in the bibliographic space is still lacking. This study used FRBR as a basis to analyze bibliographic relationships and their path-based rules, taking the novel “Harry Potter and the Philosopher’s Stone” as a case study. To date, 87 unique records have been retrieved from OCLC’s Open WorldCat for analysis. Two specialists in library and information science who are familiar with FRBR conducted an in-depth analysis to achieve inter-rater agreement. This study generalizes several patterns of path-based rules for associating bibliographic records and outlines related issues for future study.

A Qualitative Analysis of Information Dissemination through Twitter in a Digital Library
Hae Min Kim; Christopher Yang; Eileen Abels; Mi Zhang

ABSTRACT

This study examines the use of Twitter in a digital library, the Internet Public Library (ipl2), to understand the content and dissemination patterns of Twitter messages posted by the ipl2. We conducted a content analysis of ipl2’s messages on Twitter to develop a categorization of tweet types, and examined retweets and the active users who retweeted ipl2 tweets. We present our analysis of four aspects of the tweets: motivation, content, audience, and sources. Active users are categorized into eight groups. The research findings contribute to a better understanding of how Twitter is actually used in a digital library.

A Study of Automation from Seed URL Generation to Focused Web Archive Development: The CTRnet Context
Seungwon Yang; Kiran Chitturi; Gregory Wilson; Mohamed Magdy Gharib Farag; Edward Fox

ABSTRACT

In the event of emergencies and disasters, massive amounts of web resources are generated and shared. Due to the rapidly changing nature of those resources (to deliver the latest news), it is important to start archiving them as soon as a disaster occurs. This led us to develop a prototype system for constructing archives with minimum human intervention using the seed URLs extracted from tweet collections.

We present the details of our prototype system including its components and operation. To study the capability and limitations of our prototype, we applied it to five tweet collections that had been developed in advance. The resulting archives of HTML files were examined to compute precision. We also identified five categories of non-relevant files. We conclude with a discussion of findings from the evaluation.
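
Extracting seed URLs from a tweet collection might look like the following sketch, which ranks the URLs shared in the tweets by frequency. The function name and the `min_count` threshold are illustrative assumptions, not details of the CTRnet prototype. In practice, shortened links (for example, t.co URLs) would need to be resolved to their final targets before counting.

```python
import re
from collections import Counter

# Match http(s) URLs up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def extract_seed_urls(tweets: list[str], min_count: int = 2) -> list[str]:
    """Return URLs shared at least `min_count` times, most frequent first."""
    counts = Counter()
    for tweet in tweets:
        for url in URL_PATTERN.findall(tweet):
            # Drop punctuation that ends the sentence rather than the URL.
            counts[url.rstrip(TRAILING_PUNCTUATION)] += 1
    # most_common() orders equal counts by first appearance, so output is deterministic.
    return [url for url, n in counts.most_common() if n >= min_count]
```

Raising `min_count` trades recall for precision: rarely shared URLs are more likely to be off-topic, which is one of the concerns the precision evaluation above addresses.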

A System for Indexing Tables, Algorithms and Figures
Pradeep Teregowda; Madian Khabsa; Clyde Giles

ABSTRACT

Indexing diverse objects such as documents, figures, tables, and algorithms within a single system presents several challenges. These include identifying a schema that encompasses these objects, identifying overlapping objects, and building a suitable user interface for viewing results from the index. We propose a federated system for indexing and retrieval of objects embedded in academic papers, utilizing a single interface and a federated index.
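
One common way to merge results in a federated setup is to normalize each index's scores before combining them, since raw scores from different indexes are not directly comparable. The sketch below assumes each index (for example, tables, figures, or algorithms) is exposed as a hypothetical search function returning `(doc_id, score)` pairs; min-max normalization is an illustrative choice, not necessarily the one used in the proposed system.

```python
from typing import Callable

# Each hypothetical index is a function from a query string to (doc_id, score) hits.
SearchFn = Callable[[str], list[tuple[str, float]]]


def federated_search(query: str, indexes: dict[str, SearchFn], top_n: int = 10):
    """Query every index, min-max normalize its scores, and merge the hits."""
    merged = []
    for name, search in indexes.items():
        hits = search(query)
        if not hits:
            continue
        scores = [score for _, score in hits]
        low, high = min(scores), max(scores)
        for doc_id, score in hits:
            # Rescale each index's scores to [0, 1] so they can be compared.
            norm = (score - low) / (high - low) if high > low else 1.0
            merged.append((norm, name, doc_id))
    # Highest normalized score first; ties broken by index name, then doc id.
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return [(name, doc_id, norm) for norm, name, doc_id in merged[:top_n]]
```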

A Technique for Suggesting Related Wikipedia Articles Using Link Analysis
Christopher Markson; Min Song

ABSTRACT

With more than 3.7 million articles, Wikipedia has become an important social medium for sharing knowledge. However, in such an enormous repository of information, it can often be difficult to locate the fundamental topics that support lower-level articles. Currently, users must rely on author-generated categories to navigate to related articles. We propose that, by exploiting the information stored in the links between articles, related companion articles can be generated automatically to further the reader’s understanding of a given topic. This recommendation approach uses established link analysis techniques to give users a clear path to related high-level articles, furthering their understanding of low-level topics.
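
As an illustration of a link-analysis approach, the sketch below runs a basic PageRank power iteration over an article link graph and ranks the articles linked from a given page by their global PageRank, on the assumption that widely linked articles tend to be the more fundamental ones. PageRank is a stand-in chosen for this example; the abstract does not say which link analysis techniques are used, and the function names are hypothetical.

```python
def pagerank(links: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a directed article link graph."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, set())
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling article with no out-links: spread its rank over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


def suggest_related(links: dict[str, set[str]], article: str, top_n: int = 3) -> list[str]:
    """Rank the articles linked from `article` by their global PageRank."""
    rank = pagerank(links)
    candidates = links.get(article, set())
    return sorted(candidates, key=lambda a: (-rank[a], a))[:top_n]
```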

An Exploration of the Research Trends in the Digital Library Evaluation Domain
Giannis Tsakonas; Angelos Mitrelis; Leonidas Papachristopoulos; Christos Papatheodorou

ABSTRACT

Evaluation is a vital research area in the digital library domain, demonstrating a growing literature in conference and journal papers. In this paper we explore the directions and the evolution of evaluation research within the decade 2001-2010. For this purpose we studied the evaluation initiatives presented in two main conferences of the digital library domain in this period. The literature is annotated using a domain ontology, named DiLEO, which defines explicitly the main concepts of the digital library evaluation field and their correlations. The ontology instances constitute a semantic network that enables the uniform and formal representation of the critical evaluation constructs in both conferences, untangles their associations and supports the investigation of their evolution. Several findings from this study underline the persistent character of quantitative research in evaluation initiatives.

An Iterative Reliability Measure for Semi-anonymous Annotators
Peter Organisciak

ABSTRACT

This study addresses problems of reliability in the creation of tagged corpora by self-selected semi-anonymous raters. In order to account for both strong and weak raters, this paper contributes a recursive technique for scoring rater reliability. By assigning raters trust scores in the proposed method, candidate labels can be weighted by a confidence score, and low-confidence ratings can be routed to an expert rater or additional amateur raters for further action.
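
A minimal sketch of an iterative trust-weighting scheme follows. It alternates between a trust-weighted vote for each item and re-scoring each rater by how often their labels agree with that vote. This is one plausible instance of recursive reliability scoring, not necessarily the paper's exact formulation; the data layout and names are assumptions.

```python
def iterative_reliability(ratings, iterations=20):
    """Score rater trust and item labels by alternating two steps.

    ratings: dict mapping item -> {rater: label}; every item needs >= 1 rating.
    Returns (trust, consensus), where consensus maps item -> (label, confidence).
    """
    raters = {rater for votes in ratings.values() for rater in votes}
    trust = {rater: 1.0 for rater in raters}  # start by trusting everyone equally
    consensus = {}
    for _ in range(iterations):
        # Step 1: trust-weighted vote for each item.
        for item, votes in ratings.items():
            weights = {}
            for rater, label in votes.items():
                weights[label] = weights.get(label, 0.0) + trust[rater]
            best = max(weights, key=weights.get)
            total = sum(weights.values())
            confidence = weights[best] / total if total else 0.0
            consensus[item] = (best, confidence)
        # Step 2: a rater's trust is the share of their labels matching consensus.
        for rater in raters:
            rated = [(item, votes[rater]) for item, votes in ratings.items()
                     if rater in votes]
            agreed = sum(1 for item, label in rated if consensus[item][0] == label)
            trust[rater] = agreed / len(rated)
    return trust, consensus
```

Items whose confidence stays below a chosen threshold (0.7, say, an arbitrary value) are the natural candidates for routing to an expert or to additional raters.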

An Unsupervised Technical Difficulty Ranking Model Based on Conceptual Terrain in the Latent Space
Shoaib Jameel; Wai Lam; Xiaojun Qian; Ching-Man Au Yeung

ABSTRACT

Search results of existing general-purpose search engines usually do not satisfy domain-specific information retrieval tasks, because there is a mismatch between the technical expertise of a user and the results returned by the search engine. Users have to sift through many documents or refine their queries several times to find documents that suit their technical expertise. In this paper, we investigate the problem of ranking domain-specific documents based on their technical difficulty. We argue that computing technical expertise is different from simply measuring the readability of the text. We propose an unsupervised conceptual terrain model using Latent Semantic Indexing (LSI) for re-ranking search results based on the conceptual difficulty of the documents. We connect the sequences of terms in a latent space by the semantic distances between the terms and compute the traversal cost, which indicates technical difficulty. The resulting geometry can be visualized as a high-dimensional conceptual terrain. We have conducted extensive experiments in three domains and have shown the effectiveness of our proposed model.
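
The traversal-cost idea can be sketched as follows, assuming latent term vectors have already been obtained from LSI (for example, by a truncated SVD of a term-document matrix). The toy two-dimensional vectors, the choice of cosine distance, and averaging over steps so that documents of different lengths are comparable are all assumptions made for this illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def traversal_cost(terms, term_vectors):
    """Average latent-space distance between consecutive known terms."""
    path = [term_vectors[t] for t in terms if t in term_vectors]
    if len(path) < 2:
        return 0.0
    steps = [cosine_distance(a, b) for a, b in zip(path, path[1:])]
    return sum(steps) / len(steps)


# Toy 2-D vectors standing in for LSI term vectors (hypothetical values).
TERM_VECTORS = {
    "cell": (1.0, 0.0),
    "tissue": (0.9, 0.1),
    "kinase": (0.0, 1.0),
    "phosphorylation": (0.1, 0.9),
}
```

A document that keeps jumping between distant concepts accumulates a higher cost than one that stays within a neighbourhood of related terms.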

Analyzing Current Readability of Historical English Texts
Adam Jatowt; Katsumi Tanaka

ABSTRACT

Human language is subject to constant evolution, driven by the need to reflect ongoing changes in the world and to become a more efficient means of communication. In this paper we report the results of our studies on the readability of historical documents from the viewpoint of present-day users. We demonstrate a correlation between readability measurements and the publication dates of documents on two datasets, the Victorian Women’s Writers Project and the Corpus of Late Modern English Texts. As a second contribution, we perform an extensive analysis of the many lexical factors that affect the readability of historical texts. For this purpose we study changes in word usage over time in two large-scale lexical corpora: the Corpus of Historical American English and Google Books 1-gram. Our ultimate objective is to quantify diachronic change in readability, that is, in the ease with which present-day users can read and comprehend a text.
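
The abstract does not name the readability measure used. As one classic example, the Flesch Reading Ease score is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), where higher scores mean easier text. The sketch below computes it with a rough vowel-group syllable counter, which is an approximation and is not part of the original study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))
```

Applied to historical texts, such a formula captures only surface features (sentence and word length); it does not reflect obsolete vocabulary, which is why the lexical analysis above is needed as well.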

Bi2SoN - A Digital Library for Supporting Biomedical Research
Benjamin Köhncke; Sascha Tönnies; Wolf-Tilo Balke

ABSTRACT

In the domain of biology, a huge number of different data sources are available, which makes information gathering and searching challenging tasks. To avoid manually assessing all relevant data sources, their knowledge has to be integrated. The presented system addresses all aspects needed for suitable data integration and retrieval for domain experts in biology. The knowledge from different data sources is combined and used, for example, for synonym enrichment of the query term. The resulting prototype was presented to a group of domain experts, who confirmed that the system delivers suitable results that support scientists in their literature search.
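
Synonym enrichment of a query term can be sketched as merging synonym lists from several sources into one dictionary and expanding the query with them. The sources, entries, and function names here are hypothetical examples, not Bi2SoN's actual data or interface.

```python
def merge_synonym_sources(*sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Combine several term -> synonyms mappings, case-insensitively."""
    merged: dict[str, set[str]] = {}
    for source in sources:
        for term, synonyms in source.items():
            merged.setdefault(term.lower(), set()).update(s.lower() for s in synonyms)
    return merged


def expand_query(term: str, synonyms: dict[str, set[str]]) -> list[str]:
    """Return the original term followed by its known synonyms, sorted."""
    term = term.lower()
    return [term] + sorted(synonyms.get(term, set()) - {term})


# Hypothetical entries from two different sources.
GENE_SOURCE = {"p53": ["TP53", "tumor protein p53"]}
MOUSE_SOURCE = {"P53": ["Trp53"], "EGFR": ["ErbB1"]}
```

The expanded term list can then be combined with OR in the search backend, so that a document using any of the variants is retrieved.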

CADAL Digital Calligraphy System
Pengcheng Gao; Jiangqin Wu; Yang Xia; Yuan Lin

ABSTRACT

CADAL (China Academic Digital Associate Library) plays a primary role in the Universal Digital Library. By the end of 2011, CADAL had digitized 1.85 million books. Chinese calligraphy occupies an important place in Chinese culture, and digitized Chinese calligraphy makes up a large part of the CADAL resources. Services that make full use of these collections are therefore required for diverse users, such as art historians, students, and the public. Here we propose the CADAL Digital Calligraphy System, which includes over 1,100 works and 70,000 characters and provides multi-level metadata-based search (book search, works search, and character search) and multi-grain calligraphic character search (content-based search and radical-based search). Finally, some search-related applications of the CADAL Digital Calligraphy System are discussed.

Characterize Scientific Domain and Domain Context
Jinsong Zhang; Chun Guo; Xiaozhong Liu

ABSTRACT

Domain knowledge map construction is an important method for describing the significant characteristics of a selected domain. In this research, we will address three problems in knowledge graph generation. First, this paper will construct domain (core journals and conference proceedings) knowledge graphs and domain context (domain citation) knowledge graphs, and propose a novel method to integrate them. Second, two different methods for associating keywords on the graph will be investigated: Co-occur Domain Distance and Citation Probability Distribution Distance. Finally, the paper will propose an innovative method to evaluate the accuracy and coverage of knowledge graphs by training a keyword-oriented Labeled-LDA model, and will validate different domain and domain context graphs.

Collaboration and Communication Tools used by the Biodiversity Heritage Library: Refining Strategies for Success
Trish Rose-Sandler; Keri Thompson; Constance Rinaldo; William Ulate; Martin Kalfatovic

ABSTRACT

Through the application of multiple strategies and tools, the Biodiversity Heritage Library has created an effective and collaborative multi-institutional virtual organization. The purpose of this paper is to explore the communication and collaboration strategies used by the BHL to create, maintain, and provide open access to its corpus of biodiversity literature. BHL, in its seventh year, is a mature service and no longer a pilot project. Largely driven from the ground up, and without any institutional mandate, the BHL has successfully and organically fostered an organizational model that has encouraged innovation, user engagement, and global expansion.

Data Determination, Disambiguation, and Referencing in Molecular Biology
Shuheng Wu; Besiki Stvilia; Dong Joon Lee

ABSTRACT

Entity and instance determination, disambiguation, and referencing, referred to as authority control in libraries, are essential for scientific research. This study examines the authority control practices and issues in molecular biology using literature and scenario analyses. The analyses imply that the concept of authority control in molecular biology is associated with three tasks: named entity recognition, disambiguation, and unification. The identified authority control issues are conceptualized as quality problems caused by four sources: inconsistent or incomplete mapping, context change, entity changes, and changes in entity metadata. This study can inform librarians and repository curators of the needs and issues of authority control in molecular biology and other related disciplines (e.g., biomedicine, biochemistry).

Digital Libraries for Computational Journalism: Supporting Human Computation of Collections of Memes
Luis Francisco-Revilla

ABSTRACT

Computational journalism is driving the evolution of news media. As digital artifacts such as tweets and memes proliferate, computational journalism must devise new ways of collecting and analyzing them. This paper presents Breadcrumbs PDL, a specialized Personal Digital Library system that helps readers and journalists use a collection of user-detected memes. PDL is part of Project Breadcrumbs, which aims to capitalize on public participation in the news media cycle. PDL supports browsing and exploration of a personal workspace, and it provides recommendations for workspace organization as well as alternative memes to read. Based on the clipping and organizational behaviors of users and on textual similarities between clips, PDL can infer relationships between memes that computers alone cannot easily detect.

Digital Library Interfaces: Comparison of Three Systems
Gilok Choi

ABSTRACT

Digital libraries often require very specialized interfaces in order to present various types of digital content. It is therefore critical to create interfaces that improve the presentation of digital information and maximize users’ experience with digital collections. In this respect, this research examines the interfaces of three digital libraries that provide collections of digital text: Open Library, Google Books, and Hathi Trust. An evaluation matrix was developed to measure usability, aesthetics, and interface components.

The overall findings of the study showed that the majority of participants preferred the Open Library interface, followed by Google Books. The statistical analysis indicates that Open Library is significantly better than Google Books and Hathi Trust in terms of usability, aesthetics, and interface components. The preference for Open Library stemmed largely from aesthetic choices. Participants also appreciated the use of elements analogous to their physical counterparts.

Digital Preservation in a Box: Outreach Resources for Digital Stewardship
Butch Lazorchak; Susan Manus; Dever Powell; Jane Zhang

ABSTRACT

In this poster we describe "Digital Preservation in a Box," a major activity of the National Digital Stewardship Alliance Outreach Working Group. This toolkit of digital stewardship outreach resources can be used by a diverse set of communities as a gentle introduction to the concepts of preserving digital information.

Distinguishing Venues by Writing Styles
Zaihan Yang; Brian D. Davison

ABSTRACT

A principal goal for most research scientists is to publish. There are different kinds of publication venues, such as journals, conferences, and workshops, covering different topics and requiring different writing formats. It has been demonstrated that authors tend to have unique personal writing styles; however, no work has been carried out to determine whether publication venues are distinguishable by their writing styles. Our work takes the first step toward exploring this problem. Moreover, when researchers who are new to a research domain finish their work, it is sometimes difficult for them to find a proper venue to submit their paper. To address this problem, we provide a collaborative-filtering-based recommendation system that provides venue recommendations to researchers. In particular, we consider both topic and writing-style information, and differentiate the contributions of different neighboring papers when making recommendations. Experiments based on real data from the ACM and CiteSeer digital libraries demonstrate that venues also have distinct writing styles, and that our approach can provide effective recommendations.
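
Differentiating the contributions of neighboring papers can be sketched as a weighted k-nearest-neighbour vote: each paper's similarity to the query combines topic and writing-style similarity, and closer neighbours contribute more to their venue's score. The feature vectors, the mixing weight `alpha`, and the function names are assumptions for illustration; the paper's actual features and weighting may differ.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_venues(query, papers, alpha=0.5, k=3, top_n=2):
    """Weighted k-nearest-neighbour venue recommendation.

    query:  dict with "topic" and "style" feature vectors.
    papers: list of dicts with "topic", "style", and "venue" keys.
    alpha:  weight of topic similarity versus writing-style similarity.
    """
    scored = []
    for paper in papers:
        sim = (alpha * cosine(query["topic"], paper["topic"])
               + (1 - alpha) * cosine(query["style"], paper["style"]))
        scored.append((sim, paper["venue"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, venue in scored[:k]:
        # Closer neighbours add more weight to their venue's score.
        votes[venue] += sim
    return sorted(votes, key=lambda venue: (-votes[venue], venue))[:top_n]
```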

Do Public Library Websites Consider the Disabled or Senior Citizens?
Yong Jeong Yi; Ji Hei Kang

ABSTRACT

For people with mobility and sight impairments, virtual accessibility is as important as physical accessibility when it comes to using public library services. However, few studies have discussed public library website accessibility from the perspective of underrepresented user groups. The purpose of this study is to evaluate the accessibility of public library websites and to identify the association between accessibility and public libraries’ budgets. The study selected 20 public library systems that have the highest percentages of disabled or senior citizen patrons, and employed the Pearson correlation test to investigate the correlation between the accessibility and the budgets of public libraries. Preliminary findings show that most current public library websites do not comply with Section 508. The findings indicate that public libraries did not consider their users or potential users with physical disabilities when designing their websites, and suggest that public library websites are not suited to delivering effective information services to underrepresented user populations who need special assistance. Furthermore, this study finds no significant association between the accessibility of public library websites and their budgets.
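
For reference, the Pearson correlation coefficient used in such a test is the covariance of the two variables divided by the product of their standard deviations. A minimal implementation follows; the inputs would be, for example, an accessibility score and a budget figure for each library system, both hypothetical here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant.
    return cov / math.sqrt(var_x * var_y)
```

Testing significance additionally requires the statistic t = r × sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom; with only 20 library systems, a moderate correlation may not reach significance.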

eDeposit: Current Activities and Future Plans at The Library of Congress
Erik Delfino; Jane Mandelbaum

ABSTRACT

The Library of Congress has been engaged in an initiative to acquire electronic serials for the Library of Congress through the Copyright Office. The Library plans to build on the lessons learned to provide a framework for scaling existing capabilities as well as addressing and integrating future capabilities for digital acquisition streams. Opportunities exist to take advantage of best practices and trends in the communities (such as publishers) from which the Library obtains digital content, as well as the communities which the Library serves.  

Electronic Records Processing: It's a CINCH!
Amy Rudersdorf; Dean Farrell; Lisa Gregory

ABSTRACT

In August 2011, five project partners (the State Library of North Carolina, the North Carolina State Archives, North Carolina Libraries for Virtual Education, Elon University, and the University of North Carolina at Charlotte) began a collaboration to develop a computer application that collects, ingests, and authenticates the electronic records that libraries and archives are often mandated to maintain. The application, called "CINCH," incorporates existing digital curation technologies, but adds to their functionality by creating a pull-down (or capture) utility to gather content made available through Internet file sharing. The final product will be a lightweight, open-source software tool that libraries, archives, and agencies with similar requirements – to collect and authenticate records on ingest – can employ to retrieve and process their digital content.

Enhancing Digital Libraries and Portals with Canonical Structures for Complex Objects
Scott Britell; Lois Delcambre; Lillian Cassel; Edward Fox; Richard Furuta

ABSTRACT

Individual digital library resources are of interest in their own right, but in some domains, resources can be part of (perhaps multiple) complex objects. We focus on domains with complex objects where a digital library user can benefit from seeing and browsing a resource in the context of its structure(s). The first contribution of our work is the definition of canonical structures that can represent local digital library structures; the canonical structures allow us to provide sophisticated browsing/navigation aids in a generic way. We demonstrate our approach by mapping curricula of varying structure in an educational repository to canonical structures. We exploit these structures in a navigation widget that shows a resource within the current curriculum and also shows other resources that refer to this resource. The second contribution of our work is the evaluation of a means to transfer the structure of our resources to a digital library portal. We implement and evaluate approaches based on OAI-PMH and OAI-ORE using Dublin Core, with and without a custom namespace. We also transfer the canonical structure to a portal where our navigation widget is implemented. We implemented these ideas in the Ensemble portal for computing education, and we formalize all of the necessary data structures and mappings.

Global Web Archive Integration with Memento
Robert Sanderson

ABSTRACT

In this poster, we describe the approach taken to designing and implementing a tera-scale multi-repository index of archived web resources using massively parallel processing.

GROTOAP: GROund Truth for Open Access Publications
Dominika Tkaczyk; Artur Czeczko; Krzysztof Rusek; Łukasz Bolikowski; Roman Bogacewicz

ABSTRACT

The field of digital document content analysis includes many important tasks, for example page segmentation, whose goal is to find all geometric components (zones, lines, and words) of the document, or zone classification, which attempts to associate each zone of the document with a label denoting the zone's specific role (e.g., title, authors, abstract, bibliographic reference). It is impossible to build effective solutions for such problems and evaluate their performance without a reliable test set that contains both input documents and the expected results of segmentation and classification.

Has It Been Already Digitized? How to Find Information about Digitized Documents
Tomas Foltyn

ABSTRACT

The Digitization Registry of the Czech Republic is a research project whose aim is to create a national registry of digitized documents that helps avoid unwanted duplication in digitization and allows digitization results to be shared across the Czech Republic. This could make digitization more effective and also save financial resources.

How Can Spreaders Affect the Indirect Influence on Twitter?
Xin Shuai; Ying Ding; Jerome Busemeyer

ABSTRACT

Most studies on social influence have focused on direct influence. Another interesting question is whether indirect influence exists between two users who are not directly connected in the network, and what affects such influence. In addition, the theory of complex contagion predicts that more spreaders will enhance the indirect influence between two users. Our observation of the intensity of indirect influence, propagated by n parallel spreaders and quantified by retweeting probability on Twitter, shows that complex contagion is validated globally but violated locally. In other words, the retweeting probability increases non-monotonically, with some local drops.
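
The measurement described above can be sketched as grouping exposure events by the number of parallel spreaders n, computing the empirical retweeting probability for each n, and flagging the places where it drops as n grows. The observation format and the toy data in the usage example are hypothetical.

```python
from collections import defaultdict


def retweet_probability_by_spreaders(observations):
    """Empirical P(retweet | n parallel spreaders).

    observations: iterable of (n_spreaders, retweeted) pairs, one per exposure.
    """
    exposures = defaultdict(int)
    retweets = defaultdict(int)
    for n, retweeted in observations:
        exposures[n] += 1
        retweets[n] += int(retweeted)
    return {n: retweets[n] / exposures[n] for n in sorted(exposures)}


def local_drops(probabilities):
    """Return (n, n_next) pairs where the probability falls as n increases."""
    ns = list(probabilities)
    return [(a, b) for a, b in zip(ns, ns[1:]) if probabilities[b] < probabilities[a]]
```

Under strict complex contagion, `local_drops` would return an empty list; a rising overall trend with a few flagged pairs corresponds to the "validated globally but violated locally" pattern.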

Improving a Hybrid Literary Book Recommendation System through Author Ranking
Paula Cristina Vaz; David Martins de Matos; Bruno Martins; Pavel Calado

ABSTRACT

Literary reading is an important activity for individuals, and choosing to read a book can be a long-term commitment, making book choice an important task for book lovers and public library users. In this paper we present a hybrid recommendation system to help readers decide which book to read next. We study book and author recommendation in a hybrid recommendation setting and test our approach on the LitRec data set. Our proposed hybrid book recommendation approach combines two item-based collaborative filtering algorithms to predict books and authors that the user will like. Author predictions are expanded into a book list that is subsequently aggregated with the list generated by the initial collaborative recommender. Finally, the resulting book list is used to yield the top-n book recommendations. Through various experiments, we demonstrate that author recommendation can improve overall book recommendation.
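
The hybrid pipeline can be sketched as two item-based collaborative filtering passes (one over books, one over authors), with author scores expanded to those authors' unread books and combined with the book scores by a weighted sum. Cosine similarity on binary preferences, the weight `beta`, the linear aggregation, and the toy data are assumptions; the paper's aggregation method may differ.

```python
import math


def item_cf_scores(user_items, target_user):
    """Item-based CF on binary data: score items the target has not yet liked.

    user_items: dict user -> set of liked items.
    """
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)
    liked = user_items[target_user]
    scores = {}
    for candidate, users_c in item_users.items():
        if candidate in liked:
            continue  # already-liked items (or authors) are not re-scored
        # Sum of cosine similarities between the candidate and each liked item.
        score = sum(len(users_c & item_users[i]) / math.sqrt(len(users_c) * len(item_users[i]))
                    for i in liked)
        if score > 0:
            scores[candidate] = score
    return scores


def hybrid_recommend(user_books, book_author, target_user, beta=0.5, top_n=3):
    """Blend book-level CF scores with author-level CF scores expanded to books."""
    book_scores = item_cf_scores(user_books, target_user)
    user_authors = {u: {book_author[b] for b in books} for u, books in user_books.items()}
    author_scores = item_cf_scores(user_authors, target_user)
    read = user_books[target_user]
    combined = {b: (1 - beta) * s for b, s in book_scores.items()}
    for book, author in book_author.items():
        if book not in read and author in author_scores:
            combined[book] = combined.get(book, 0.0) + beta * author_scores[author]
    ranked = [b for b in combined if combined[b] > 0]
    return sorted(ranked, key=lambda b: (-combined[b], b))[:top_n]


# Hypothetical toy data: "b5" has no readers yet, so only author CF can reach it.
USER_BOOKS = {"u1": {"b1", "b2"}, "u2": {"b1", "b2", "b3"}, "u3": {"b2", "b4"}}
BOOK_AUTHOR = {"b1": "A", "b2": "A", "b3": "B", "b4": "C", "b5": "B"}
```

With `beta = 0`, the result reduces to plain item-based book recommendation; raising `beta` lets books reachable only through author predictions, such as `b5` here, enter the list.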

Introducing High Performance Computing in Digital Library Processing Workflows
Bill Barth; Maria Esteva; Jon Gibson; Ladd Hanson; Christopher Jordan

ABSTRACT

As larger collections need to be processed for digital library projects, libraries have to adopt technologies of scale. We present a case that involved creating image derivatives using High Performance Computing (HPC) resources. This experience opens up possibilities for conducting various processing tasks effectively and in reasonable time frames. Most importantly, it gives library IT staff access to cyberinfrastructure that can address the computing challenges of large-scale digital library projects.

Investigating User Perceptions of Engagement and Information Quality in Mobile Human Computation Games
Dion Goh; Khasfariyati Razikin; Chei Sian Lee; Alton Chua

ABSTRACT

Mobile Human Computation Games (HCGs) are developed with the objective of generating useful information as a byproduct of gameplay, and the information generated in the games can then be used for other purposes. As user participation is critical for HCGs to succeed, the game mechanics in mobile HCGs should ideally engage users continually to contribute location-based information and generate useful information for others. In this paper, we investigate user perceptions of engagement and information quality in a mobile HCG by comparing those qualities with a mobile content sharing application. Results suggest that the mobile HCG enabled participants to occupy their leisure time, but the information contributed was not as relevant as that contributed in the mobile content sharing application. Additionally, the variables of control and information completeness influence intention to use the mobile HCG. Implications of this study are discussed.

Lessons Learned from Developing and Evaluating a Comprehensive Digital Library for Engineering Education
Yunlu Zhang; Alice Agogino; Andrea Niess

ABSTRACT

Educating the engineering education community in today’s digital world requires straightforward yet flexible access to high-quality educational resources. The Teach Engineering and NEEDS (National Engineering Education Delivery System) digital libraries collaborated in 2005 to create and steward the K-Gray Engineering Pathway (EP), a premier portal to comprehensive engineering and computing education resources within the greater National Science Digital Library (NSDL). We collaborated to design navigation, implement features, and find imagery that could effectively address both K-12 and higher education audiences. A system was designed to serve both target audiences, including a simple search on every page, expanded to include grade/audience-level search fields. This search, available on all main pages, also includes a choice of learning resource type and a link to the Advanced Search with expanded search fields. EP tailored many features, such as community pages and cataloging, to be distinguishable for K-12 versus higher education users. We also added disciplinary/interdisciplinary pages to further tailor the search and resources for different teaching and learning communities. Evaluation studies show that our current strength is a consistent interface with strong usability features. In this paper, we provide a retrospective and summarize our lessons learned and evaluation results, along with our directions for future research and development.

Meta-Line: Lineage Information for Improved Metadata Quality
Sascha Tönnies; Benjamin Köhncke; Wolf-Tilo Balke

ABSTRACT

Controlled content quality, including the quality of indexing, is one of the major advantages of digital libraries over general Web sources or Web search engines. However, given today’s information flood, the mostly manual effort of acquiring new sources and creating suitable (semantic) metadata for content indexing and retrieval is already prohibitive. A recent solution is the automatic generation of metadata, for which various methods are becoming more widespread. In this case, however, neglecting quality assurance is even more problematic, because heuristic generation often fails and the resulting low-quality metadata directly diminishes the quality of service that a digital library provides. To address this problem, we propose a metadata quality model to determine the overall quality of a metadata set and to validate individual requirements imposed on that metadata set. Furthermore, lineage information is provided to trace the quality evolution of a metadata set. This model is incorporated into a novel architecture for metadata quality control, similar to data warehousing. It is enhanced by an online repository shared between content and service providers, which makes the metadata generation process and the respective quality assessment transparent and reusable. Our experiments based on documents from the field of chemistry show that quality depends strongly on data formats and on the semantic interpretation of different metadata fields. Therefore, transparent generation processes are mandatory for quality-controlled digital libraries, in both harvesting and indexing content.
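
A weighted-requirements quality model with a lineage log can be sketched as follows. Each requirement is a predicate with a weight, overall quality is the weighted share of satisfied requirements, and every assessment is appended to the record's lineage so that quality can be traced across processing steps. The requirement names, weights, and fields are hypothetical and are not Meta-Line's actual model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MetadataRecord:
    metadata: dict[str, str]
    # Lineage: (processing step, quality score) in the order they happened.
    lineage: list[tuple[str, float]] = field(default_factory=list)


# Hypothetical requirements: name -> (check, weight).
REQUIREMENTS: dict[str, tuple[Callable[[dict[str, str]], bool], float]] = {
    "has_title": (lambda m: bool(m.get("title")), 2.0),
    "has_creator": (lambda m: bool(m.get("creator")), 1.0),
    "valid_year": (lambda m: m.get("year", "").isdigit() and len(m["year"]) == 4, 1.0),
}


def assess(record: MetadataRecord, step: str, requirements=REQUIREMENTS):
    """Score a record against weighted requirements and log the result."""
    results = {name: check(record.metadata) for name, (check, _) in requirements.items()}
    total = sum(weight for _, weight in requirements.values())
    satisfied = sum(weight for name, (_, weight) in requirements.items() if results[name])
    quality = satisfied / total
    record.lineage.append((step, quality))
    return quality, results
```

The per-requirement `results` support validating individual requirements, while the lineage list records how quality changed from, say, automatic extraction to manual correction.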

Multiple Views of the ACM Classification System
Xia Lin; Mi Zhang; Haozhen Zhao; Jan Buzydlowski

ABSTRACT

The ACM Computing Classification System (CCS) is a hierarchical classification system used to index and classify all of ACM's published literature. The terms and hierarchy of CCS are selected and organized by experts in the field. They reflect the major areas and topics of computing and often serve as an overview and navigational guide to the field. However, like all traditional classification systems and subject domain thesauri, such an overview and navigational guide is static and sketchy, representing an idealistic, top-down view of a domain. It does not reflect how well the classification system covers the literature or how well users can apply it to index and classify their documents. In this paper, we look into a 10-year period of ACM literature and examine how the CCS terms are actually used in the ACM digital library, and how the patterns of term usage reveal term relationships that differ from those defined in the CCS. By comparing the dynamic statistical patterns of term usage with the static hierarchical structure of the terms, we show that much can be gained by integrating both into an interactive interface that provides better overview maps and navigational guides to the domain of computing.

National Digital Newspaper Program: A Case Study in Sharing, Linking, and Using Data
Nathan Yarasavage; Robin Butterhof; Christopher Ehrman

ABSTRACT

This poster presents a case study describing how the National Digital Newspaper Program’s (NDNP) metadata specification and public website, Chronicling America, have been designed to promote a wide range of data sharing. Through the website’s extensive application programming interface (API) and its open-source software counterpart, several institutions are benefiting from the publicly funded program’s data.

Responsibility for Research Data Quality in Open Access: A Slovenian Case
Janez Stebe

ABSTRACT

In the framework of a project aiming to realize a strategy of open access to research data in Slovenia in accordance with OECD principles, we conducted a series of interviews with different target audiences to assess the initial conditions in the area of data handling. The emphasis of this paper is on aspects of data quality curation. It reports anticipated problems and suggested solutions for overcoming barriers, along with the accountability and mutual expectations of various stakeholders and institutions, such as data creators, specialized services (e.g., research libraries), and policymakers. Data creators and data services expressed a high level of awareness of data quality issues, especially with regard to good publication potential. Barriers to greater accessibility of data in the future include the limited recognition and reputation gained for the extra work involved in preparing data and documentation, the need for financial rewards for such additional work, and an undeveloped culture of data exchange in general. Concern for quality in the data creation phase rests on narrow professional competencies, and possessing those competencies is a condition for enabling wider open data access. Motivating researchers to provide open access to such data will require a combination of prescribed requirements for data delivery, support services, and financial rewards, and in particular a change in the views held by the scientific community about the benefits of open data for research.

Scientific Cyberlearning Resources Referential Metadata Creation Via Information Retrieval
Xiaozhong Liu; Han Jia

ABSTRACT

The goal of this research is to describe an innovative method of creating scientific referential metadata for a cyberinfrastructure-enabled learning environment to enhance the learning experiences of students and scholars. Using information retrieval and metasearch approaches, different types of referential metadata, such as related Wikipedia pages, datasets, source code, video lectures, presentation slides, and (online) tutorials, are automatically retrieved, associated, and ranked for an assortment of publications and scientific topics. To test our method of automatic cyberlearning referential metadata generation, we designed a user experiment evaluating the quality of the metadata for each scientific keyword and publication, as well as the resource ranking algorithms. Evaluation results show that cyberlearning referential metadata retrieved via metasearch and statistical relevance ranking can effectively help students better understand the essence of scientific keywords and publications.
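The abstract does not specify how results from different sources are merged. As one common way of combining ranked lists in metasearch, the sketch below uses reciprocal rank fusion (RRF), swapped in here as an assumption rather than the authors' statistical relevance ranking; the engine names and resource identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Merge per-source rankings: each item scores sum(1 / (k + rank)) over the
    sources that return it. k=60 is the constant commonly used with RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for results in ranked_lists.values():
        for rank, item in enumerate(results, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-engine results for the keyword "latent semantic analysis".
ranked = {
    "engine_a": ["lsa-article", "svd-article", "tfidf-article"],
    "engine_b": ["svd-article", "lecture-7"],
    "engine_c": ["svd-article", "lsa-article"],
}
fused = reciprocal_rank_fusion(ranked)
print(fused)
```

RRF needs only ranks, not comparable scores, which suits merging engines whose relevance scores live on different scales; a resource returned near the top by several engines rises above one returned by a single engine.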

Semantic Digital Libraries for Experimental Data and Provenance
Mark Hedges; Tobias Blanke

ABSTRACT

This paper describes an environment for the ‘sheer curation’ of the experimental data and provenance of a group of researchers in the life sciences. The approach embeds data capture and interpretation within researchers’ working practices, so that curation is automatic and invisible to the researcher. The environment captures not just the individual datasets generated by an experiment but the entire workflow that represents the ‘story’ of the experiment, including intermediate files and provenance metadata, so as to support the verification and reproduction of published results. Because the curation environment is decoupled from the researchers’ processing environment, provenance is inferred from a variety of domain-specific contextual information, using software that implements the knowledge and expertise of the researchers. We also present an approach to publishing the data files and their provenance according to linked data principles, using OAI-ORE (Open Archives Initiative Object Reuse and Exchange) and OPMV (Open Provenance Model Vocabulary).
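As a rough illustration of publishing an experiment's files as linked data, the sketch below emits a Turtle serialization of an OAI-ORE resource map whose aggregation lists the files, plus OPMV `wasDerivedFrom` links between them. The URIs and the `resource_map` helper are hypothetical, and a complete resource map would typically also carry metadata such as `dcterms:creator` and `dcterms:modified`, omitted here for brevity.

```python
from typing import Dict, List

PREFIXES = (
    "@prefix ore: <http://www.openarchives.org/ore/terms/> .\n"
    "@prefix opmv: <http://purl.org/net/opmv/ns#> .\n"
)


def resource_map(rem_uri: str, agg_uri: str, files: List[str],
                 derived_from: Dict[str, List[str]]) -> str:
    """Serialize a minimal ORE resource map plus OPMV derivation links as Turtle."""
    parts = [PREFIXES]
    # The resource map describes the aggregation.
    parts.append(f"<{rem_uri}> a ore:ResourceMap ;\n    ore:describes <{agg_uri}> .\n")
    # The aggregation gathers every file produced along the experiment's workflow.
    members = " ,\n        ".join(f"<{uri}>" for uri in files)
    parts.append(f"<{agg_uri}> a ore:Aggregation ;\n    ore:aggregates {members} .\n")
    # Each derived file points back to the file(s) it was computed from.
    for output, inputs in derived_from.items():
        for source in inputs:
            parts.append(f"<{output}> opmv:wasDerivedFrom <{source}> .\n")
    return "\n".join(parts)


base = "http://example.org/experiment/42"
ttl = resource_map(
    rem_uri=f"{base}/rem",
    agg_uri=f"{base}/aggregation",
    files=[f"{base}/raw.dat", f"{base}/normalised.csv", f"{base}/figure1.png"],
    derived_from={
        f"{base}/normalised.csv": [f"{base}/raw.dat"],
        f"{base}/figure1.png": [f"{base}/normalised.csv"],
    },
)
print(ttl)
```

Loading this output into an RDF toolkit such as rdflib would let the provenance chain from raw.dat to figure1.png be queried with SPARQL.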

Sharing and Using STEM Digital Content in School Libraries
Marcia Mardis; Casey McLaughlin; Grant Gingell

ABSTRACT

Digital content can benefit K-12 science, technology, engineering, and mathematics (STEM) teaching and learning, but it is not widely integrated. Many school librarians are not sure how to build on their expertise to share and link digital learning resources in their roles as resource providers and instructional collaborators. This paper introduces early results of a survey of school librarians’ digital collection practices and then presents Web2MARC, a web-based application for integrating digital resources into school library collections. Further work on the state of school library STEM collections, the survey analysis, and Web2MARC is slated for completion in 2012.
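The abstract does not describe Web2MARC's internals. Purely as an illustration of turning a web resource description into a catalog record, the Python sketch below renders one in the MARC mnemonic (.mrk) text format used by tools such as MarcEdit, with a title (245), summary note (520), uncontrolled subject terms (653), and URL (856). The input dictionary, the hypothetical `to_mrk` helper, and the field choices are assumptions; a real record would also need a leader and control fields.

```python
from typing import Dict, List, Union

Resource = Dict[str, Union[str, List[str]]]


def to_mrk(resource: Resource) -> str:
    """Render a web resource as MARC mnemonic lines.
    In mnemonic format, a backslash stands for a blank indicator."""
    lines = [f"=245  00$a{resource['title']}"]                 # title statement
    if resource.get("summary"):
        lines.append(f"=520  \\\\$a{resource['summary']}")     # summary note
    for subject in resource.get("subjects", []):
        lines.append(f"=653  \\\\$a{subject}")                 # uncontrolled index term
    lines.append(f"=856  40$u{resource['url']}")               # electronic location via HTTP
    return "\n".join(lines)


record = to_mrk({
    "title": "Phases of the Moon simulation",
    "summary": "Interactive model of lunar phases for middle school science.",
    "subjects": ["Astronomy", "Moon"],
    "url": "https://example.org/moon-phases",
})
print(record)
```

Generating a text record like this lets a librarian review it and then load it into an existing catalog with standard MARC tools, rather than keying each web resource by hand.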

Social Network-based Recommendation: A Graph Random Walk Kernel Approach
Xin Li; Xin Su; Mengyue Wang

ABSTRACT

Traditional recommender system research often draws on customer demographics, product characteristics, and transactions to provide recommendations. With the development of Web 2.0, social networks are now an inherent feature of many websites. Such social relationships reflect social influence and are related to individuals’ preferences. This study investigates the recommendation problem based solely on people’s social network information. Taking a kernel-based approach, we capture similarities in consumers’ social capital in a graph random walk kernel and build support vector regression (SVR) models to predict consumer opinions on products. We evaluate the approach on a dataset from a movie review website, where our proposed model outperforms trust-based (i.e., social influence) models and other state-of-the-art graph kernels.
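The sketch below shows the general idea behind walk-based graph kernels on a social network: the similarity of two users is the decay-weighted number of walks of length 1 to L between them, i.e. K = sum over l = 1..L of λ^l A^l for adjacency matrix A. It is an assumption-laden illustration, not the authors' kernel: the truncation length, decay value, toy network and ratings are made up, and a kernel-weighted average of other users' ratings stands in for their SVR models (a precomputed kernel matrix could, for instance, be passed to scikit-learn's `SVR(kernel="precomputed")`).

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def random_walk_kernel(adj: Matrix, decay: float = 0.5, max_len: int = 3) -> Matrix:
    """K[i][j] = sum over l = 1..max_len of decay**l * (number of l-step walks i -> j)."""
    n = len(adj)
    kernel = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # A^1
    for length in range(1, max_len + 1):
        weight = decay ** length
        for i in range(n):
            for j in range(n):
                kernel[i][j] += weight * power[i][j]
        power = matmul(power, adj)   # advance to A^(length + 1)
    return kernel


def predict_rating(kernel: Matrix, ratings: Dict[int, float], user: int) -> Optional[float]:
    """Kernel-weighted average of other users' ratings for one product
    (a simple stand-in for the SVR models described in the abstract)."""
    num = den = 0.0
    for other, rating in ratings.items():
        if other != user:
            num += kernel[user][other] * rating
            den += kernel[user][other]
    return num / den if den > 0 else None


# Hypothetical friendship path 0 - 1 - 2 - 3; users 1 and 3 rated a movie.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
K = random_walk_kernel(adj, decay=0.5, max_len=3)
print(predict_rating(K, {1: 4.0, 3: 2.0}, user=0))
```

User 0 is much closer to user 1 (rating 4.0) than to user 3 (rating 2.0), so the prediction leans toward 4.0. K is symmetric, but this truncated sum is not positive semi-definite in general (odd powers of A can contribute negative eigenvalues), so an SVR-based implementation would more likely use a PSD variant such as an exponential or von Neumann diffusion kernel.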

The David Livingstone Spectral Imaging Project
Stephen Davison; Adrian Wisnicki; Elizabeth McAulay

ABSTRACT

The David Livingstone Spectral Imaging Project is a collaborative, international effort to use spectral imaging technology and digital publishing to make available a series of faded, illegible texts that the famous Victorian explorer produced when he was stranded without ink or writing paper in Central Africa. The poster describes the project’s achievements to date and its preservation challenges.

The Logical Form of the Proposition Expressed by a Metadata Record
Karen Wickett; Allen Renear

ABSTRACT

Metadata records are a ubiquitous and foundational feature of contemporary information systems. However, while their simple surface structure may lead us to think that the semantics of a metadata record is unproblematic and easily discerned, our analysis of an example record suggests otherwise. We show that there are at least three plausible possibilities for the logical form of the proposition expressed by a metadata record. Not only are all three substantially different in the first-order constructs they use, but no two can be recognized as equivalent for the purposes of information organization. The semantics of the common metadata record is elusive. The main source of this problem appears to be the identifier attribute. Although identifier attributes have the syntactic appearance of any other attribute in the metadata vocabulary, this uniformity conceals their potential for assuming a distinctive semantic role, one that appears to cross the traditional object-language/metalanguage boundary. This suggests that translating colloquial metadata records into logic-based knowledge representations does not take place entirely at a first-order level.

Toponym Extraction and Resolution in a Digital Library
James Creel; Katherine Weimer

ABSTRACT

Geospatial metadata enable rich and varied interfaces to digital collections. The promise and power of such interfaces have entered the popular imagination with the advent of interactive maps on the Web. While the extraction of geospatial metadata has much in common with the extraction of traditional bibliographic metadata such as titles, authors, and subject headings, geospatial metadata present unique challenges and affordances. Digital libraries present a compelling medium for geospatial metadata. We describe our findings from applying the Texas A&M University (TAMU) Libraries geoparser’s toponym extraction and name disambiguation techniques to the University's Institutional Repository.
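Geoparsing typically has two steps: spotting place names in text (extraction) and choosing which real place each name refers to (resolution). The sketch below does both with a tiny hypothetical gazetteer and a common heuristic: prefer candidates in the same region as the document's unambiguous place names, then fall back to population. It is not the TAMU geoparser's algorithm, and the gazetteer entries use approximate, illustrative values.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# (region, latitude, longitude, population): approximate, illustrative values.
Place = Tuple[str, float, float, int]

GAZETTEER: Dict[str, List[Place]] = {
    "College Station": [("TX", 30.63, -96.33, 120000)],
    "Austin": [("TX", 30.27, -97.74, 960000)],
    "Paris": [("FR", 48.86, 2.35, 2100000), ("TX", 33.66, -95.56, 25000)],
}


def extract_toponyms(text: str) -> List[str]:
    """Find gazetteer names in the text, trying longer names first."""
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.findall(text)


def resolve(toponyms: List[str]) -> Dict[str, Place]:
    """Pick one candidate per name: prefer regions shared with unambiguous
    names in the same document, then the larger population."""
    context = Counter(GAZETTEER[t][0][0] for t in toponyms if len(GAZETTEER[t]) == 1)
    return {t: max(GAZETTEER[t], key=lambda place: (context[place[0]], place[3]))
            for t in toponyms}


text = "Samples were collected near College Station and Paris before shipping to Austin."
found = extract_toponyms(text)
print(found, resolve(found)["Paris"][0])
```

Here "Paris" resolves to Paris, Texas only because the other places in the sentence are in Texas; in a sentence with no other toponyms it resolves to the more populous Paris, France.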

Towards Social Collecting for #linking #using #sharing
Michael Zarro; Catherine Hall

ABSTRACT

In this paper, we describe the concept of social collecting by examining the website Pinterest.com. Social collecting occupies a space between tagging, social media, and private collecting or Web-clipping services. In this model, patrons (the users of social collecting sites) are curators, linking, sharing, and using digital images in collections they create and maintain. In this work, social collecting is compared to traditional library services and social tagging models.

YADDA2 - Assemble Your Own Digital Library Application from Lego Bricks
Wojtek Sylwestrzak; Tomasz Rosiek; Łukasz Bolikowski

ABSTRACT

YADDA2 is an open software platform that facilitates the creation of digital library applications. It consists of versatile building blocks providing storage, relational and full-text indexing, process management, and asynchronous communication, to name a few. Its loosely coupled, service-oriented architecture enables the deployment of highly scalable, distributed systems. Drawing on extensive experience in developing digital library infrastructure systems, the framework's authors have captured common patterns and requirements and created a set of building blocks for the seamless assembly of a tailor-made, future-proof digital library. The platform's openness-by-design allows its existing components to be easily modified to meet specific applications' requirements, or to be extended with additional components that complement the existing ones.