by Henrike Lähnemann
For the Digital Community of Practice, organised biweekly by Emma Huber in the Taylorian Library, I had promised to speak on Transkribus on Thursday 2 November 2023 but have a date clash. Hence this short report, which is meant to be supplemented by two practical demonstrations: by Karen Wenzel on transcribing vernacular Bible manuscripts for the project Österreichischer Bibelübersetzer (here a report on her work with Transkribus, see also the slides she used), and by Lena Vosding on setting up the second volume of the Lüne letters for a workflow from hyper-diplomatic transcription to critical edition.
Transkribus is, as their own description at http://transkribus.eu says, an “AI-powered platform for text recognition, transcription and searching of historical documents – from any place, any time, and in any language”. It can be used either as a browser-based “lite” version or in a desktop version. It started as an EU project but has now become commercial software; limited usage is still free, but for large-scale projects “credits” have to be bought.
1. First Impressions
I first encountered Transkribus while it was still being developed, when in 2018 I was evaluating a research project in Augsburg, the above-mentioned ‘Österreichischer Bibelübersetzer’, a large-scale project involving dozens of German manuscripts. One part of the project team was working with Transkribus, the other was transcribing manually. I thought the idea interesting but was not fully convinced: it involved manually transcribing hundreds of pages, followed by multiple rounds of correcting. What already impressed me was the structuring of pages, which made it easier to transcribe text manually because there was less risk of skipping lines or making similar slips. But it was definitely not the tool of choice for normalised editions, since the effort to turn a line-by-line, semi-accurate diplomatic transcription into a readable text was much higher than what a practised editor would need working directly from the manuscript. When I returned in 2023 for the next evaluation, the situation had really improved, not least because of the massive training data which the project had amassed; they are a recognised partner and use the service for free in recognition of their input.
2. Transkribus as Teaching Tool
My first real encounter with Transkribus came in 2021 when I was supervising two research interns, Agnes Hilger and Eva Neufeind, who worked on digital edition, Transkribus and encoding with the colourful Arnold von Harff manuscript at the Bodleian, which led to the edition at https://editions.mml.ox.ac.uk/editions/harff/
I found that it was decidedly quicker for me to correct pages transcribed by the programme than those done by the students themselves: the programme made more mistakes, but they were much more obvious and easy to spot (e.g. a high proportion concerned capital letters: Lerusalem instead of Jerusalem). With their training data, Agnes and Eva reached an accuracy of over 97%, which is particularly impressive for a very idiosyncratic hand with a high degree of flourish that takes a while to get used to as a human reader. You can watch their project presentation here from minute 7:23.
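To make a figure such as "over 97%" more concrete, here is a minimal sketch of one common way to measure it, assuming the figure is a character-level accuracy, i.e. one minus the character error rate (edit distance divided by the length of the corrected text). How the interns' figure was actually calculated is not stated here, and the function names below are hypothetical, not part of Transkribus.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - (edit distance / length of the reference),
    where the reference is the corrected transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1 - levenshtein(reference, hypothesis) / len(reference)


# The capital-letter slip mentioned above: one wrong character out of nine.
print(round(character_accuracy("Jerusalem", "Lerusalem"), 3))
```

On a single short word one slip weighs heavily; measured over a whole page, 97% on this definition would mean roughly three characters in every hundred need correcting.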
3. The ScanTent
Looking more like a gimmick than a scholarly instrument, the ScanTent was one fun side product of Transkribus. Here is a test directly after buying it:
In my experience, it can be a pain to set up (the rubber ends for the sticks did not prove too reliable and bits of the plastic support broke off) but once set up, it works well to keep shadows off the book and to speed up the process for bulk jobs. The DocScan app could be improved but it makes DIY digitisation portable and fast. The LED light is a nice add-on but not necessary in most situations.
4. Transkribus as Quick Fix
I’ll finish with a survey of different projects where I tested it or observed somebody testing it.
Transkribus worked well for
- Serial archival documents in Kurrentschrift (a final-year undergraduate student in History wanted to do a quantitative analysis of documents from the Franco-Prussian war which he could not have processed manually)
- Medieval prayerbook by one scribe for which a highly diplomatic transcription is needed.
- Complex birth certificate with marginal additions, half-printed formula, different hands (provided a structured transcription which, even though it was only ca. 70% correct, considerably reduced the time for compiling the different information from the document)
Transkribus worked less well (or rather was over-complex) for
- Early modern printed material: the text was very easy to read and I wanted it in standardised form, so it was quicker to type from scratch than to correct
- Short irregular notes in different hands: not enough material to come to a reliable reading
- OCRable text in modern typeface – any scan app produces results more quickly
It is not always obvious what will work well – so give it a try and let me know if you find it useful!