Journal:No specimen left behind: Industrial scale digitization of natural history collections

From LIMSWiki

Revision as of 23:43, 3 March 2016

Full article title No specimen left behind: Industrial scale digitization of natural history collections
Journal ZooKeys
Author(s) Blagoderov, V.; Kitching, I.J.; Livermore, L.; Simonsen, T.J.; Smith, V.S.
Author affiliation(s) Natural History Museum - London
Primary contact E-mail: v.blagoderov@nhm.ac.uk
Editors Penev, L.
Year published 2012
Volume and issue 209
Page(s) 133-146
DOI 10.3897/zookeys.209.3178
ISSN 1313-2970
Distribution license Creative Commons Attribution 3.0 Unported
Website http://zookeys.pensoft.net/articles.php?id=2916
Download Click "PDF" button on website to generate


Traditional approaches to digitizing natural history collections, which include both imaging and metadata capture, are labour- and time-intensive. Mass-digitization can only be completed if resource-intensive steps, such as specimen selection and the databasing of associated information, are minimized. Digitization of larger collections should employ an “industrial” approach based on the principles of automation and crowd-sourcing, with minimal initial metadata collection that includes a mandatory persistent identifier. A new workflow for the mass-digitization of natural history museum collections is described. It is based on these principles and uses the SatScan® tray scanning system.

Keywords: Digitization, imaging, specimen metadata, natural history collections, biodiversity informatics


Natural history collections are of immense scientific and cultural importance. Specimens in public museums and herbaria, together with their associated data, represent a potentially vast repository of information on biodiversity, ecosystems and natural resources. This information serves a wide range of stakeholders, from governments and NGOs to schools and private individuals. Biodiversity data derived from natural history collections have been used in research on evolution and genetics, in nature conservation and resource management, in public health and safety, and in education, and numerous examples are widely available (summarized in Chapman 2005, Baird 2010).[1][2] The universe of natural history collection data has been estimated at between 1.2 and 2.1 × 10⁹ units (specimens, lots and collections) (Ariño 2010).[3] Efficient access to, dissemination of and exploitation of such an immense wealth of biodiversity-relevant data clearly require a well-coordinated and streamlined approach to global digitization. This is all the more important because the scientific value of the generated data depends on the outputs (images, metadata, etc.) being linked to each other and back to the original specimens via unique identifiers (uIDs).
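The following minimal Python sketch makes the role of uIDs concrete. It assumes a simple in-memory model in which each digitization output carries the uID of the specimen it derives from. Outputs whose uID does not resolve to a known specimen are flagged as orphans. The class and function names, file names and uIDs are hypothetical; they are not taken from any real collection database.

```python
from dataclasses import dataclass, field


@dataclass
class Specimen:
    uid: str                                          # persistent unique identifier (uID)
    outputs: list[str] = field(default_factory=list)  # images, metadata records, ...


def link_outputs(specimens: dict[str, Specimen], outputs: dict[str, str]) -> list[str]:
    """Attach each digitization output to its specimen via the uID it carries.

    `outputs` maps an output name (e.g. an image file) to a specimen uID.
    Returns the names of outputs whose uID matches no known specimen; such
    "orphan" outputs cannot be traced back to a physical specimen.
    """
    orphans: list[str] = []
    for name, uid in outputs.items():
        specimen = specimens.get(uid)
        if specimen is None:
            orphans.append(name)
        else:
            specimen.outputs.append(name)
    return orphans


# Hypothetical example data.
specimens = {uid: Specimen(uid) for uid in ("SPEC-0001", "SPEC-0002")}
orphans = link_outputs(specimens, {
    "tray12_crop01.tif": "SPEC-0001",
    "label_record_01.json": "SPEC-0001",
    "tray12_crop02.tif": "SPEC-0003",  # no specimen with this uID
})
print(orphans)                         # ['tray12_crop02.tif']
print(specimens["SPEC-0001"].outputs)  # ['tray12_crop01.tif', 'label_record_01.json']
```

The orphan check shows what is lost without reliable identifiers: the image still exists, but nothing ties it to a physical specimen.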

In recent years, substantial effort and resources have been invested in the digitization of natural history collections, with museums and herbaria routinely employing specimen-level collection databases to replace older, paper-based card indexes and ledgers. In theory, this should make dissemination of specimen data through biodiversity informatics portals such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org/) simple and straightforward. In practice, however, natural history collections are almost as far from complete digitization as they were 20 years ago. Ariño (2010)[3] estimated that no more than 3% of biological specimen data is web-accessible through GBIF, the largest source of biodiversity information. Consequently, users have access to neither a central database of collection holdings nor a complete collection index. This deficiency is partly due to the immense effort it would take to digitize the vast number of collection units involved (Vollmar et al. 2010).[4] The cost of traditional digitization workflows is vast in both financial and human terms. Our simple calculations show that complete databasing of the ~30 million insect specimens housed in the entomological collection of the Natural History Museum, London, would require 23 years of continuous work by the entire departmental staff (65 people). Depending on the particular collections and curatorial practices involved, estimates for capturing full label data range from US$0.50 to several dollars per specimen (Heidorn 2011).[5] The cost of traditional imaging and databasing of every natural history object in all European museums was recently estimated at €73.44 per object (Poole 2010).[6] Thus, the complete digitization of all natural history collections may cost as much as €150,000 million and take as long as 1,500 years.
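The arithmetic behind these figures can be checked with a short Python sketch that uses only the numbers quoted above. One assumption is labelled in the code: the per-object cost from Poole (2010) is applied uniformly to every collection unit in Ariño's (2010) estimate.

```python
# Back-of-envelope check of the figures quoted above.
NHM_INSECT_SPECIMENS = 30_000_000            # ~30 million insect specimens, NHM London
DEPT_STAFF = 65                              # entire departmental staff
YEARS = 23                                   # years of continuous work
COST_PER_OBJECT_EUR = 73.44                  # Poole (2010), imaging + databasing per object
UNIVERSE_LOW, UNIVERSE_HIGH = 1.2e9, 2.1e9   # Ariño (2010), collection units


def specimens_per_person_year(specimens: float, staff: int, years: int) -> float:
    """Throughput implied by databasing `specimens` with `staff` people in `years`."""
    return specimens / (staff * years)


def total_cost_eur(units: float, cost_per_unit: float) -> float:
    """Total cost if every unit is digitized at the same per-object cost (an assumption)."""
    return units * cost_per_unit


rate = specimens_per_person_year(NHM_INSECT_SPECIMENS, DEPT_STAFF, YEARS)
low = total_cost_eur(UNIVERSE_LOW, COST_PER_OBJECT_EUR)
high = total_cost_eur(UNIVERSE_HIGH, COST_PER_OBJECT_EUR)

print(f"~{rate:,.0f} specimens per person-year")             # ~20,067 specimens per person-year
print(f"EUR {low / 1e6:,.0f} to {high / 1e6:,.0f} million")  # EUR 88,128 to 154,224 million
```

The NHM estimate implies roughly 20,000 specimens databased per person per year. The upper cost bound of about €154,000 million is consistent with the "as much as €150,000 million" quoted above. The 1,500-year figure depends on throughput assumptions that are not stated in the text, so this sketch does not reproduce it.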

The most common solution proposed to overcome the enormous cost of digitization is prioritization based on user demand (Berents et al. 2010).[7] Currently, most digitization projects concentrate on obtaining high-quality images of selected specimens, accompanied by high-quality data (e.g., comprehensive and expertly interpreted label information), rather than on coverage of entire collections. Such specimen-centric digitization efforts are thus inevitably fragmented into numerous small-scale, labour-intensive projects that usually image single specimens, one at a time.

New, industrial-scale approaches to digitization are clearly needed to solve the problem of cost and the inherent fragmentation of collection-based biodiversity informatics. The larger a digitization project becomes, the lower its transaction costs and thus its cost per specimen. To be useful to, and adopted by, a wide spectrum of natural history collections, such an industrial-scale process must meet the following criteria:

  • As much of the procedure as possible must be automated. Manual work should be limited to steps that require physical handling of specimens.
  • The approach should, whenever possible, focus on “wall-to-wall” total digitization of entire collections, because it is faster to digitize an entire collection than to select individual specimens or drawers of particular interest.
  • Complicated, labour-intensive procedures must be divided into a series of separate, shorter steps, each with a distinct outcome. For example, preparation of specimens for imaging should be a separate step from the imaging itself, and unique specimen identifiers can be assigned simultaneously to all specimens in a drawer rather than individually and sequentially. Such a modularised process can then be more easily crowd-sourced among the professional and volunteer communities. Properly organized crowd-sourcing projects would be able to mobilise the efforts of thousands of enthusiasts around the world (Hill et al. 2012).[8]
  • Collection of metadata must be simplified and standardized. In most cases, a digital representation of the specimen plus minimal metadata (uID and specimen location in the collection) is sufficient for collection management purposes. Only minimal information should be collected when an entire collection is first digitized. That information should be recorded so that it can be amended and expanded later.
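The scale argument made before the list above holds that transaction costs, and therefore the cost per specimen, fall as a project grows. A simple fixed-plus-marginal cost model illustrates this. In the model, one-off costs such as equipment and workflow set-up are spread over all specimens processed. This is a minimal Python sketch only: the model and both cost parameters are hypothetical illustrations, not figures from the article.

```python
def cost_per_specimen(n_specimens: int, fixed_cost: float, marginal_cost: float) -> float:
    """Average cost per specimen when a one-off fixed cost is spread over n specimens."""
    if n_specimens <= 0:
        raise ValueError("n_specimens must be positive")
    return fixed_cost / n_specimens + marginal_cost


# Hypothetical parameters for illustration only (not from the article).
FIXED_EUR = 500_000.0   # one-off costs: scanner, software, workflow design
MARGINAL_EUR = 0.50     # per-specimen handling cost

for n in (10_000, 1_000_000, 30_000_000):
    print(f"{n:>12,} specimens: EUR {cost_per_specimen(n, FIXED_EUR, MARGINAL_EUR):.2f} each")
```

With these made-up parameters, the average cost falls from EUR 50.50 per specimen for 10,000 specimens to about EUR 0.52 for 30 million. As the project grows, the average approaches the marginal cost. This is the intuition behind favouring "wall-to-wall" digitization of whole collections over many small, selective projects.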

Here we describe a new method for “wall-to-wall” mass-digitization of natural history museum collections based on the SatScan® tray scanning system. The method allows standardized scanning of museum collection trays at the highest possible image quality, followed by simplified (and easily expandable) collection of metadata.
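The following Python sketch shows one way the bookkeeping for the "assign uIDs to a whole drawer at once, capture minimal metadata, expand later" idea could look. It assumes a simple sequential identifier scheme with a hypothetical prefix, and a record that holds only the uID and the specimen's location until further data are added. It does not represent the SatScan® software or any institution's actual identifier scheme. All names in it are hypothetical.

```python
import itertools
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class MinimalRecord:
    """Minimal metadata captured at scanning time; everything else comes later."""
    uid: str        # mandatory persistent identifier
    tray: str       # location in the collection (tray/drawer identifier)
    position: int   # position of the specimen within the tray image
    extra: dict[str, str] = field(default_factory=dict)  # label data etc., added later


def assign_tray_uids(tray: str, n_specimens: int,
                     counter: Iterator[int], prefix: str) -> list[MinimalRecord]:
    """Give every specimen in one tray a uID in a single batch step.

    `counter` is shared across trays so identifiers never repeat;
    `prefix` is a hypothetical namespace for the institution.
    """
    return [
        MinimalRecord(uid=f"{prefix}{next(counter):08d}", tray=tray, position=i)
        for i in range(1, n_specimens + 1)
    ]


ids = itertools.count(1)
tray_a = assign_tray_uids("tray-A", 3, ids, prefix="HYPO-")
tray_b = assign_tray_uids("tray-B", 2, ids, prefix="HYPO-")
print([r.uid for r in tray_b])   # ['HYPO-00000004', 'HYPO-00000005']

# Later, e.g. after crowd-sourced transcription, a record is expanded in place.
tray_a[0].extra["label_text"] = "(transcribed later)"
```

Sharing one counter across trays keeps identifiers unique without any per-specimen decision at scanning time. The open-ended `extra` field reflects the principle that initial metadata should be minimal but expandable.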


  1. Chapman, A. (2005). "Uses of Primary Species-Occurrence Data, version 1.0". Report for the Global Biodiversity Information Facility, Copenhagen. Global Biodiversity Information Facility. pp. 100. http://www.gbif.org/resource/80545. 
  2. Baird, R. (2010). "Leveraging the fullest potential of scientific collections through digitization". Biodiversity Informatics 7 (2): 130–136. doi:10.17161/bi.v7i2.3987. 
  3. Ariño, A.H. (2010). "Approaches to estimating the universe of natural history collections data". Biodiversity Informatics 7 (2): 81–92. doi:10.17161/bi.v7i2.3991. 
  4. Vollmar, A.; Macklin, J.A.; Ford, L. (2010). "Natural history specimen digitization: Challenges and concerns". Biodiversity Informatics 7 (2): 93–112. doi:10.17161/bi.v7i2.3992. 
  5. Heidorn, P.B. (2011). "Biodiversity informatics". Bulletin of the American Society for Information Science and Technology 37 (6): 38–44. doi:10.1002/bult.2011.1720370612. 
  6. Poole, N. (November 2010). "The Cost of Digitising Europe's Cultural Heritage". Collections Trust. pp. 82. http://www.collectionstrust.org.uk/item/739-the-cost-of-digitising-europe-s-cultural-heritage. 
  7. Berents, P.; Hamer, M.; Chavan, V. (2010). "Towards demand driven publishing: Approaches to the prioritisation of digitisation of natural history collections data". Biodiversity Informatics 7 (2): 113–119. doi:10.17161/bi.v7i2.3990. 
  8. Hill, A.; Guralnick, R.; Smith, A. (2012). "The notes from nature tool for unlocking biodiversity records from museum records through citizen science". ZooKeys 209: 219–233. doi:10.3897/zookeys.209.3472. 


This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Additionally, a missing reference (Vollmar et al. 2010) was added.