The last 36 hours have been a time of many celebrations, including Holi, Purim, St. Patrick's Day, and the Buddhist holiday of the Display of Miracles, "Chotrul Duchen." In keeping with the spirit of this auspicious time, BDRC is excited to announce an app that many of our users have requested for years. Our Optical Character Recognition (OCR) App can "recognize" Tibetan writing that appears in BDRC scans and convert it into text that you can use in translation tools, word processors, digital media, and so on. This is the first publicly available desktop app for Tibetan OCR, and it represents another breakthrough for BDRC.
The app is easy to use: you upload images of Tibetan texts, select the script type, and then run the OCR process. It takes only about 15 seconds to run OCR on a single image, and it can work on entire volumes at once. Scholars and translators embrace this technology because it cuts down on the time it takes to transcribe long passages. Text that has been OCRed can also be searched, enabling discoveries that might take a reader days to make the old-fashioned way. The app is free, and the download link and instructions for installation and use are found at:
https://github.com/buda-base/tibetan-ocr-app (please scroll halfway down this page to find the right section).

As mentioned above, this app can recognize a variety of Tibetan scripts. It does best with Uchen scripts (especially in computer font and woodblock print), although it also recognizes the two major Ume scripts found in Tibetan Buddhist manuscripts. The OCR models that are the brains of the app were developed by Eric Werner in partnership with Élie Roux, Pentsok W Rtsang, and the Monlam AI team, especially Tashi Tsering. You can read the full story of how the app was developed in a previous blog post.
This is a beta release, and we hope to refine the OCR models further and add more functionality. Please try it today and send your questions and feedback to help@bdrc.io.