Digital Humanities Student Projects

Travelling to the internet: A digital edition of MS. Bodley 972

During their time as interns in Oxford, Eva Neufeind and Agnes Hilger started working on a digital edition of MS. Bodley 972, a manuscript of the travelogue of the knight Arnold von Harff from the 16th century. In this blog post, they report on the developments of the project ahead of the official launch of the edition.

by Eva Neufeind and Agnes Hilger

The many faces of Arnold von Harff

Arnold von Harff’s travelogue is a treasure trove for very different fields of interest: as a source for the topic of travelling and pilgrimage in the late Middle Ages; as the adventure journey of a young knight; as part of a discourse on fictional beings and places; as a collection of many examples of different languages and their alphabets; and, last but not least, as a story of encounters with people from other cultures and their depiction in images – which sometimes seem problematic from our modern point of view. The Medieval German Graduate Seminar held in Oxford last Hilary Term was not the only example of how productive an examination of Arnold von Harff’s travelogue can be. The recent blog posts by Aysha Strachan (drawing on a History of the Book project from a few years ago), Mary Boyle (based on her recently published book Writing the Jerusalem Pilgrimage in the Late Middle Ages), and Marlene Schilling/Josephine Bewerunge (inspired by their Master’s seminar on ‘Medieval Women’s Writing’) also point to interesting perspectives for engaging with this text or its Oxford manuscript.

Arnold von Harff imprisoned in Gaza. (fol. 106r)

An adequate textual basis is needed to deal with all these topics and questions. However, the only edition available until now is the first edition by Eberhard von Groote from 1860.[1] Problems with this edition have been pointed out for more than 30 years.[2] A particular problem is that the manuscripts Groote used are all lost today. He referred to them as A, B, and C and took the text mainly from A. When something was missing or he suspected an error, however, he intervened and used B or C, but did not indicate his corrections.

The title page of the Groote edition of 1860.

In 1992, Volker Honemann and Hartmut Beckers announced a new edition of the travelogue.[3] This was also to be based on Groote’s edition, but would correct ‘obvious’ errors on his part. Groote’s text was to be compared with that of the Maria Laach manuscript and translated. A German translation was published in 2009 (by Helmut Brall-Tuchel and Folker Reichert), but Volker Honemann died in 2019 and the announced edition has not been published. In any case, for the Taylor Edition series the aim is to make versions available which are close to the digitised manuscripts.

Our project

This brings us to the project that we started in Hilary Term 2021 at the University of Oxford: a digital edition of MS. Bodley 972, a paper manuscript of the travelogue from the 16th century. The manuscript is not as old as the manuscript A used by Groote, but close to the text of Groote’s edition. Exactly how close can only be determined once the transcription is finished. In a first step, this transcription will be published as part of the Taylor Editions alongside a facsimile – the high-resolution IIIF images were provided by the Bodleian Library on digital.bodleian through the Polonsky German project. Later, we hope to add a normalised version that at least includes modern punctuation and an English translation.

Brief description of the editorial guidelines

The transcription is as diplomatic as possible. Only different spellings of a letter, such as different versions of the letter s, as well as the capital letters I and J, which are difficult to differentiate, are combined into one letter. The combination ij and the letter y are differentiated according to whether they have dots above them or not. On the other hand, the letters u and v, for example, are transcribed as in the original. Slashes and paragraph marks are also adopted. The numerous abbreviations for which we could not find a suitable Unicode character are annotated in XML with the abbreviation element.

Text Recognition (HTR)

Since Arnold’s travelogue is quite extensive, comprising 158 pages of text in MS. Bodley 972, we decided to partially automate the transcription process. Automatically recognising text from modern printed pages is now part of our everyday university life, whether at the scanner in the library or in the PDF program on our notebooks. But all of this is much more difficult with older texts and especially with manuscripts. Not only do manuscripts differ more from one another; even within a single manuscript, letters can be written differently, depending on the mood of the scribe, the surrounding words, the ink used, etc.

The transcription software Transkribus, which we used for our edition, therefore works with machine learning. If you will, the program learns to read a specific handwriting. Transkribus already provides prepared models that can read different handwritings. However, the manuscripts used for these models differed too much from MS. Bodley 972. Therefore, we had to train our own model on our manuscript. This model is continuously getting better at reading our manuscript while respecting the editorial guidelines we have set. Currently, it has an error rate of 3.03%. Roughly speaking, that means that out of every 100 characters, three are transcribed incorrectly. By automatically recognising more pages, correcting them, and re-training our model, we can improve this error rate step by step.
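
To make the error rate more tangible, here is a minimal sketch of how a character error rate is conventionally computed: the edit distance between the model's output and the corrected transcription, divided by the length of the corrected transcription. This is an illustration under that assumption and not a description of how Transkribus calculates its figure internally; the function names and the sample strings are made up for the example.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Smallest number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the corrected text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Placeholder strings, not text from the manuscript.
    corrected = "abcdefghij"
    recognised = "abcdefghix"
    print(f"{character_error_rate(corrected, recognised):.2%}")
```

Under this definition, a rate of 3.03% on a page of 1,000 characters corresponds to about 30 character edits during post-correction.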

However, the computer cannot do all the work for us. This is true not only for the post-correction, but also for the actual encoding, which, in addition to conventional problems, has to take into account some specific peculiarities of MS. Bodley 972.

A polyglot knight: Arnold and his language collection 

Throughout his travelogue, Arnold von Harff enthusiastically documented a number of extraordinary things he had learned during his travels to the Holy Land and beyond. Besides showing numerous illustrations of exotic animals to his readership, Arnold also compiled information about other cultures and languages. He gave an overview of different scripts and put together lists of the most important words and phrases.

To encode the alphabets, we created a vector image for every letter and encoded them in a table. (Many thanks to Sebastian Dows-Miller for coming up with this brilliant idea!) By mirroring the alphabet this way, the special features of the alphabets in Bodl. 972 do not get lost and they can be easily compared to other manuscripts or editions.

To achieve this, we imported the manuscript pages with the alphabets into a picture editor (e.g. GIMP) and used Fuzzy Select to isolate areas of the image based on colour similarity. The selected area can then be exported. (Nota bene: It is important to manually save them as .svg files, otherwise they will not be processed properly. In a second step, they have to be saved as .png to use them in the XML file.) To ensure that the letters are all recognisable and coloured evenly, we touched up the colour scheme by using the virtual paint brush.

The code of a table to encode the first row of the Hebrew alphabet. We named the images according to their row and their place in the row.
The word lists as tables in the digital editions

All of the cut-out letters had to be saved in the same folder as the XML file so that they could easily be linked to the edition. In the upper row of the table, we put the name of each letter, using Unicode space characters for spacing. In the row below, we linked the pictures one by one. It is crucial to name the files systematically to avoid confusion at a later point. The word lists are also encoded in a table to mimic the layout of Bodl. 972.
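
The two-row layout described above can be sketched as follows. This is only an illustration: it assumes TEI-style `table`, `row`, `cell` and `graphic` elements, and the file-name pattern, the function names and the modern letter names are hypothetical placeholders rather than the ones used in the edition.

```python
import xml.etree.ElementTree as ET


def image_name(script: str, row: int, position: int) -> str:
    """Hypothetical naming scheme: script, row number, place in the row."""
    return f"{script}_row{row}_{position:02d}.png"


def alphabet_table(script: str, row: int, letter_names: list[str]) -> ET.Element:
    """Build a table with the letter names in the upper row and the
    linked letter images in the row below."""
    table = ET.Element("table")
    names_row = ET.SubElement(table, "row")
    images_row = ET.SubElement(table, "row")
    for position, name in enumerate(letter_names, start=1):
        ET.SubElement(names_row, "cell").text = name
        image_cell = ET.SubElement(images_row, "cell")
        # The images sit in the same folder as the XML file,
        # so the bare file name is enough as a link target.
        ET.SubElement(image_cell, "graphic", url=image_name(script, row, position))
    return table


if __name__ == "__main__":
    # Modern letter names as placeholders, not the manuscript's spellings.
    table = alphabet_table("hebrew", 1, ["aleph", "beth", "gimel"])
    print(ET.tostring(table, encoding="unicode"))
```

Generating the file names from the row and position, instead of typing them by hand, is one way to keep the naming systematic across all alphabets.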

The first 13 locations of Arnold’s pilgrimage and their GPS data in a table.

Item van... : Mapping Arnold’s Pilgrimage

Parts of the journey from Cologne to Jerusalem, visualized with Palladio.

The pilgrimage report is clearly structured, which made it an intriguing idea to track Arnold’s supposed itinerary and visualize it on a map. To achieve this, we needed two components: all the place names Arnold mentioned, and their coordinates. We began by filling a table with all the place names and, despite the strong dialect, researching what these places are called today. In another column, we then collected GPS data, which can easily be looked up via Google Maps or other GPS pages.

Finding out which places Arnold visited became increasingly hard the further he moved away from his hometown Cologne. It is especially difficult to pinpoint smaller villages in Eastern Europe, Africa or Asia. The table consists of roughly 320 places at the moment and is still a “work in progress”. I am certain that looking at other travel accounts that served as Arnold’s templates will reveal insightful evidence to close the gaps and allow us to create a very detailed map.

We create the map with Palladio. It is a great tool to visualize data as networks or maps. By uploading a spreadsheet filled with place names and coordinates, Palladio can quickly generate a detailed map.
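
As a rough sketch of the spreadsheet step, the following turns a list of places into CSV text that could be loaded into Palladio. It assumes that Palladio expects latitude and longitude together in one field, separated by a comma; the column names and the function name are our own choice for the example, and the two sample rows use approximate modern coordinates for the endpoints of the journey, not entries from the project's table.

```python
import csv
import io


def places_to_csv(places: list[tuple[str, float, float]]) -> str:
    """Write (modern name, latitude, longitude) tuples as CSV text,
    keeping the order of the list, i.e. the order of the itinerary."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["place", "coordinates"])
    for name, latitude, longitude in places:
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            raise ValueError(f"implausible coordinates for {name!r}")
        # Latitude and longitude share one field; the csv module quotes
        # the field automatically because it contains a comma.
        writer.writerow([name, f"{latitude},{longitude}"])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        ("Cologne", 50.9375, 6.9603),     # approximate, for illustration
        ("Jerusalem", 31.7683, 35.2137),  # approximate, for illustration
    ]
    print(places_to_csv(sample))
```

The range check catches a common slip when copying coordinates by hand, namely swapped or mistyped latitude and longitude values.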


Thanks to Emma Huber, Henrike Lähnemann, and Mary Boyle for their help and for supporting this project.

Agnes Hilger studies German literature, History, and Digital Humanities at the Julius-Maximilians-Universität Würzburg, where she has worked as a student research assistant in the project on the beginnings of modern poetry, in the Narragonien Digital project, and in a digital edition project on Konrad von Fußesbrunnen’s Kindheit Jesu. In Hilary term 2021, Agnes Hilger was an Erasmus intern at the Faculty of Medieval and Modern Languages, University of Oxford.

Eva Neufeind studies History at the Heinrich-Heine-Universität Düsseldorf, where she is working as a student research assistant for the Department for Medieval History. She mostly engages in historiography and Hussite history. In Hilary term 2021, Eva Neufeind was an intern at the Faculty of Medieval and Modern Languages, University of Oxford.

Image Permissions

Digital.Bodleian (images from MS Bodl. 972): Creative Commons non-commercial license, with attribution (CC-BY-NC 4.0). Images © Bodleian Libraries, University of Oxford.

[1] Groote, Eberhard von: Die Pilgerfahrt des Ritters Arnold von Harff von Cöln durch Italien, Syrien, Aegypten, Arabien, Aethiopien, Nubien, Palästina, die Türkei, Frankreich und Spanien, wie er sie in den Jahren 1496 bis 1499 vollendet, beschrieben und durch Zeichnungen erläutert hat. Nach den ältesten Handschriften und mit deren 47 Bildern in Holzschnitt, Cöln 1860. The two translations of the travelogue are based on the edition of Groote:  Helmut Brall-Tuchel, Rom – Jerusalem – Santiago. Das Pilgertagebuch des Ritters Arnold von Harff (1496-1498), mit den Abbildungen der Handschrift 268 der Benediktinerabtei Maria Laach und zahlreichen anderen Abbildungen, Köln 2007 and Malcolm Henry Ikin Letts: The pilgrimage of Arnold von Harff, knight. From Cologne through Italy, Syria, Egypt, Arabia, Ethiopia, Nubia, Palestine, Turkey, France, and Spain, which he accomplished in the years 1496 to 1499, London 1946.

[2] Jorgensen, Peter A.: Die Bodleian Handschrift der Reisebeschreibung des Ritters Arnold von Harff, in: Rheinische Vierteljahrsblätter 52 (1988), p. 221–225; Honemann, Volker and Hartmut Beckers: Zu einer Neuausgabe der Reisebeschreibung des Arnold von Harff, in: ZfdPh 111 (1992), p. 392–396.

[3] Honemann, Volker and Hartmut Beckers: Zu einer Neuausgabe der Reisebeschreibung des Arnold von Harff, in: ZfdPh 111 (1992), p. 392–396.
