Transcription | Georgian Papers Programme

Transcribing the Georgian Papers

The Georgian Papers Programme approaches transcription with a simple objective: to make the content of the manuscript documents usable and accessible to anyone for research and interpretation.

AIM & INTENT

The Programme aims to produce plain-text transcripts of the documents. This approach places the emphasis on producing usable transcript text (or data) that enhances the documents, enabling them to be searched and used in creative ways by researchers and scholars around the globe. On a broader scale, our approach is part of an important shift in archival and cultural heritage practices, with transcription projects steering away from full-scale documentary editing and the production of authoritative scholarly editions. Instead, by presenting simple, non-authoritative transcripts, the project is producing a body of information that is available and accessible to anyone for interpretation and research.

METHODS & TOOLS

While it may seem like the digitisation and transcription of the Georgian Papers documents is removed from the academic side of the project, researchers and academics play an important role in the process. With their guidance, we develop transcription guidelines that help us reach the overarching project goals.

One thing that emerged early on in this collaborative process is the realization that a project of this scope, with such a wide range of documents, cannot be approached using only one transcription method.

To tackle this issue, we have chosen two methods of transcription: traditional, manual crowd-sourced transcription, and automated handwritten text recognition (HTR) using Transkribus, a tool developed by the READ Project.

Crowd-Source Transcription
Transcribe Georgian Papers (transcribegeorgianpapers.wm.edu) is our online crowd-source transcription site, which is built on an Omeka + MediaWiki platform. It gives digital volunteers an opportunity to engage with and contribute directly to the Programme by transcribing Georgian Papers documents.

Transkribus – Handwritten Text Recognition
Georgian Papers collections that span volumes or are complex tabular manuscripts are managed by a team of library professionals and student transcribers at William & Mary using Transkribus. William & Mary was recently awarded a grant by the U.S. National Endowment for the Humanities to support a collaborative project between the University’s libraries and computer science department to improve the capabilities of Transkribus to process tabular data.

CONVENTIONS & MODELS: WHO ARE WE LEARNING FROM?

The Georgian Papers Programme is one of many large-scale transcription projects underway, and it is fundamental to all such digital library and digital humanities endeavors that we learn from one another. Our Programme’s transcription model and conventions are largely informed by the following two projects.

Smithsonian Digital Volunteers: Transcription Center
https://transcription.si.edu/

Transcribe Bentham
Bentham Project at University College London
https://blogs.ucl.ac.uk/transcribe-bentham/

COLLABORATORS

Collections within the Georgian Papers may intersect with other transcription projects that share the same subject matter or focus on the individual who created the materials. In these cases we strive to collaborate with those projects in the most mutually beneficial manner.

Image to Text: Mary Hamilton papers, at the University of Manchester, is our first transcription collaborator.
The Mary Hamilton papers team is transcribing the series Correspondence between George, Prince of Wales, and Mary Hamilton, from the collection Additional papers of George IV, as Prince, Regent and King. Though their transcription protocol and aims differ from those of the Georgian Papers Programme (GPP), the cooperation between the projects furthers the goals of each. The Mary Hamilton papers team retains use of the document images and their transcribed text. In exchange, the GPP receives the transcripts, which will be transformed to match GPP guidelines and made available as part of the GPP corpus.