Relational Database Design and Implementation for Biodiversity Informatics
Paul J. Morris
The Academy of Natural Sciences, 1900 Ben Franklin Parkway, Philadelphia, PA 19103 USA
Received: 28 October 2004 - Accepted: 19 January 2005
Abstract
The complexity of natural history collection information and similar information within the scope of biodiversity informatics poses significant challenges for effective long-term stewardship of that information in electronic form. This paper discusses the principles of good relational database design and how to apply those principles in the practical implementation of databases, and examines how good database design is essential for long-term stewardship of biodiversity information. Good design and implementation principles are illustrated with examples from the realm of biodiversity information, including an examination of the costs and benefits of different ways of storing hierarchical information in relational databases. This paper also discusses typical problems present in legacy data, how they are characteristic of efforts to handle complex information in simple databases, and methods for handling those data during data migration.
Introduction
The data associated with natural history collection materials are inherently complex. Management of these data in paper form has produced a variety of documents such as catalogs, specimen labels, accession books, stations books, map files, field note files, and card indices. The simple appearance of the data found in any one of these documents (such as the columns for identification, collection locality, date collected, and donor in a handwritten catalog ledger book) masks the inherent complexity of the information. This appearance of simplicity overlying highly complex information poses significant challenges for the management of natural history collection information (and other systematic and biodiversity information) in electronic form.