TBRC – Harvard Partnership to Support the Long-Term Preservation of Tibetan Texts

June 07, 2014

Long-term preservation

I am thrilled to announce a historic partnership between the Tibetan Buddhist Resource Center (TBRC) and Harvard University. The Harvard Library and Harvard Digital Repository Services will support the long-term preservation of our entire digital collection. Please see the article in Harvard Magazine.

Scope

The Harvard Library and Harvard Digital Repository Services (DRS) will receive TBRC's entire body of work to date, some 9.4 million pages, spanning an enormous range of Tibetan literature. Read more about the scope of the Tibetan literary heritage here. In addition to our scanned corpus, we will deposit our entire framework of archiving methodologies, cultural registries, metadata and source code for the digital library. Depositing such a complete package is possible because we developed a modular archival system that provides the technical basis for corpus dissemination. At the same time, a complete deposit is necessary to provide a comprehensive bibliographic and cultural context for the texts in the long term.

TBRC's Tibetan collection is the largest collection of Tibetan literature in the world – 20 times the size of Derge Parkhang, the largest printing house in Tibet. In addition to Tibetan, TBRC's digital collection includes Sanskrit and Mongolian Buddhist texts, as well as some Chinese and English language texts important in the study of Tibet. This collection was created and curated by our founder, the eminent scholar of Tibet, E. Gene Smith.

Scanned texts

Harvard Library will steward a comprehensive cache of our corpus for preservation. This includes:

Archival scans – the high-resolution versions that come right out of the scanner.
Web derivative scans – the page images that are compressed and resized for web delivery.
Print masters based on archival scans for printing paper texts.
eBooks – digital-born electronic files from publication projects around the world.

The total cache is upwards of 20 million files.

Metadata

In addition to scanned texts, Harvard Library will archive the Tibetan metadata generated by TBRC's extensive cataloging effort, which began in 1999. This includes our entire collection of metadata related to Persons, Places, Subjects and detailed Outlines (dkar chag), as well as other data collections, such as lineage transmissions, that are important in Tibetan culture. The total output of metadata is upwards of 300,000 records. Preserving this metadata is essential both for preserving the scanned pages and for keeping them accessible into the distant future.

Electronic Files

In addition to scanned texts and metadata, the TBRC digital collection also includes a growing collection of electronic files – digitized scanned texts. These files have been input by preservation centers and submitted to TBRC for access and preservation. Electronic files were submitted by Thrangu Rinpoche, Drigung Chetsang, Karma Delek, Sechen Monastery, Karma Lekshey Ling and Larun Gar, among others.

Source Code

In addition to the metadata and scanned corpus, we will also deposit our digital library software source code. This will make it possible to easily build a working digital library in the future.

State of the Art Preservation

To execute this project, we have had the pleasure of working with amazing individuals at Harvard and their state-of-the-art library. In simplified form, the preservation system includes two components: (1) Hollis cataloging records and (2) the DRS preservation framework.

Hollis Records

Hollis records are essentially Library of Congress MARC records. These provide the primary bibliographic context for the preservation resources in DRS. Hollis records will be created for each TBRC digital work (nearly 8,000), including scanned texts and digital-born works.

Digital Repository Services

We are very fortunate to be one of the first depositors into DRS2, the second generation DRS system. The system is designed to handle a wide variety of digital resources. The types of digital resources are described according to content models, and each content model has a specific deposit workflow. Here is an inventory of the types of content we are depositing (TBRC DRS HOLLIS Content Inventory) along with the DRS content models we are using.
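The idea that each content model has its own deposit workflow can be pictured as a simple lookup from model to handler. The sketch below is only an illustration under stated assumptions: the content model names (`page_images`, `document`) and the workflow functions are hypothetical placeholders, not the actual DRS2 content models or Harvard's deposit tooling.

```python
from typing import Callable, Dict


def deposit_page_images(path: str) -> str:
    # Hypothetical workflow for scanned page images.
    return f"image workflow: {path}"


def deposit_document(path: str) -> str:
    # Hypothetical workflow for digital-born documents such as eBooks.
    return f"document workflow: {path}"


# Hypothetical content model names mapped to their deposit workflows.
WORKFLOWS: Dict[str, Callable[[str], str]] = {
    "page_images": deposit_page_images,
    "document": deposit_document,
}


def deposit(content_model: str, path: str) -> str:
    """Route a file to the workflow registered for its content model."""
    try:
        workflow = WORKFLOWS[content_model]
    except KeyError:
        raise ValueError(
            f"no deposit workflow for content model {content_model!r}"
        ) from None
    return workflow(path)
```

Keeping the mapping in one table means a new type of resource only needs a new entry and handler, while an unknown type fails loudly instead of being deposited under the wrong workflow.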

We are indebted to the excellent and professional team at Harvard Library and DRS.

What does this mean?

What does this arrangement mean? How will access to the literature change? Has TBRC been incorporated into Harvard?

TBRC will continue to do its work scanning, preserving, organizing and disseminating Tibetan texts. The TBRC digital collection will essentially be dark archived in DRS. If at some point TBRC ceases to exist, the DRS links would be turned on and made available to the public. As long as TBRC exists, access will be through tbrc.org.
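The access rule described above can be sketched in a few lines. This is a minimal illustration, not how Hollis or DRS actually resolve links: the `ArchivedWork` type, the URL layout under tbrc.org, and the `drs.example.org` address are all hypothetical.

```python
from dataclasses import dataclass

TBRC_BASE = "https://tbrc.org"  # primary access point named in the text


@dataclass(frozen=True)
class ArchivedWork:
    work_id: str   # hypothetical work identifier
    drs_link: str  # hypothetical DRS link, kept dark while TBRC operates


def resolve_access(work: ArchivedWork, tbrc_active: bool) -> str:
    """Return the link a reader should follow for a work."""
    if tbrc_active:
        # As long as TBRC exists, readers are sent to tbrc.org.
        # The path layout here is an assumption for illustration.
        return f"{TBRC_BASE}/{work.work_id}"
    # Only if TBRC ceases to exist is the dark-archived copy exposed.
    return work.drs_link


work = ArchivedWork("W0001", "https://drs.example.org/W0001")
print(resolve_access(work, tbrc_active=True))
print(resolve_access(work, tbrc_active=False))
```

The point of the sketch is that the preserved copy is always present, and only the switch deciding which link is public changes.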

Long-term stability of Tibetan literature

The primary focus of this arrangement is long-term preservation. Digital files are ubiquitous but also perishable. Having what the DRS website describes as "a set of professionally managed services to ensure the usability of securely stored digital objects over time" is critical to the long-term stability of Tibetan literature. For TBRC, this is mission-fulfilling, as it means that all our work will be available for a very long time!

tbrc.org

The primary means of access to TBRC will continue to be its multi-language online library at tbrc.org. This resource is relied on by thousands of people every day, and it will remain the primary access point. Readers will see records in Hollis, but the links to the digital texts will point to tbrc.org.

Acknowledgements

There is a series of individuals who should be acknowledged, without whose support this would not have been possible. In particular, I'd like to acknowledge:

Dan Hazen for his support and encouragement.
Richard Lesage for his dedication and work making this happen in the Harvard Library.
Andrea Goethels, who was very encouraging at the early stages and offered technical vision for making the metadata available.
Corinna Baksik, who has worked tirelessly to improve the TBRC MARC service.
Lauran Hartley and her team at Columbia, who provided enormous support for the creation of the TBRC MARC service.
Vitaly Zakuta, who has guided us through the DRS content models.
Peter Suber who was and continues to be enormously helpful in advising us on access issues.
Kyle Courtney, the spirited librarian/attorney who helped us understand copyright issues.
Michael Sheehy of TBRC for guiding our staff on normalizing our bibliographic metadata.
Paldor Zagatse of TBRC for normalizing our titles and his editorial work on outlines and the scanned corpus.
Jigme Tashi of TBRC for organizing our digital-born text corpus.
Karma Gongde of TBRC for his tireless cataloging efforts.
Christine Tomlinson of TBRC for her incredible technical stamina and precision implementing a wide range of technical issues.
Our board of directors: Tulku Thondup Rinpoche, Alak Zenkhar Rinpoche, Leonard van der Kuijp, Janet Gyatso, Tim McNeill, Patricia Gruber, Richard Lanier, Cangioli Che, Michele Martin, Gray Tuttle and Kurt Keutzer. And our former directors, Shelley Rubin and David Lunsford.
And all those who either directly or indirectly supported TBRC.

Dedicated to our founder E. Gene Smith

Through all our efforts, we have, as a community, created the conditions for this work to endure. This work is dedicated to the brilliant vision of our founder, E. Gene Smith.