About the Project

Project Partners and Funders

Manuscripts Online was created by a partnership between the Universities of Sheffield, Leicester, Birmingham, York, Glasgow and Queen's University Belfast. Natural language processing, indexing, the development of the search engine and implementation of the website front end were carried out by the Humanities Research Institute (University of Sheffield). The website front end was implemented using designs provided by Mickey and Mallory. See below for Project Staff.

The project was made possible by a generous grant from the JISC e-Content Capital Programme.

Manuscripts Online is a sister site to the JISC-funded Connected Histories, which provides integrated access to online historical resources for the period 1500 to 1900. Much of Manuscripts Online builds on the methods and expertise developed by the creators of Connected Histories: the University of Sheffield, the Institute of Historical Research and the University of Hertfordshire.

Technical Methods

Manuscripts Online has not created any new digital content. Instead, it provides integrated access to electronic content already available on distributed websites. Our search engine does not search these resources directly. Instead, it searches indexes we have created from the full content of each resource. Our approach to indexing depends on the nature of the electronic resource available:

  1. Databases, such as the Cause Papers, and semi-structured sources where the text is marked up with XML tags, such as the Production and Use of English Manuscripts, were processed by extracting identified information on names, places, dates, document references and languages into indexes.
  2. Text which is largely unstructured, such as The Norman Blake Editions of The Canterbury Tales, was processed using natural language processing in order to identify names, places, dates, document references and languages in the original texts. ANNIE, an open source information extraction system, was used in conjunction with custom-scripted pattern-action rules to apply named entity recognition to the texts. Gazetteers were constructed from a range of sources, including the Cause Papers, Taxatio and the Gough Map. This methodology is subject to a degree of error.
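
The gazetteer lookup in the second approach can be illustrated with a minimal Python sketch. It is not ANNIE and not the project's code: the gazetteer entries, the tag_entities function and the longest-match strategy are all assumptions made for the example.

```python
import re

# Hypothetical gazetteer: lowercased phrase -> entity type.
# The project's real gazetteers are not reproduced here.
GAZETTEER = {
    "york": "place",
    "st albans": "place",
    "lincoln": "place",
    "alice": "name",
}

# Longest phrase (in words) that the lookup needs to try.
MAX_PHRASE_WORDS = max(len(key.split()) for key in GAZETTEER)


def tag_entities(text):
    """Return (matched text, entity type) pairs, preferring longer matches."""
    words = re.findall(r"\w+", text)
    found = []
    i = 0
    while i < len(words):
        for n in range(MAX_PHRASE_WORDS, 0, -1):
            if i + n > len(words):
                continue  # not enough words left for a phrase this long
            phrase = " ".join(words[i:i + n])
            label = GAZETTEER.get(phrase.lower())
            if label:
                found.append((phrase, label))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this word
    return found


print(tag_entities("Alice rode from St Albans to York"))
```

A lookup of this kind has no sense of context, so a surname such as "Lincoln" would also be tagged as a place. That is one simple way in which the degree of error mentioned above can arise.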

Spelling variation during the medieval period makes keyword searching an imprecise activity. We have attempted to improve this by providing an 'include variant spellings' option. If this is selected then any search for a keyword will simultaneously search a range of spelling variants. The spelling variants are calculated in two ways: 1) using variant data which has been generously made available to us by the Middle English Dictionary and 2) computationally guessing common character substitutions, such as "oun" for "on".
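
The second, computational method can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's algorithm: only the "on"/"oun" pair comes from the description above, while the "i"/"y" pair, the spelling_variants function and the choice to apply one substitution at a time are hypothetical. The dictionary-based method is not shown.

```python
# Substitution pairs, applied in both directions.
# ("on", "oun") is taken from the text; ("i", "y") is a hypothetical extra.
SUBSTITUTIONS = [("on", "oun"), ("i", "y")]


def spelling_variants(word):
    """Return the word plus every variant that differs by one substitution."""
    variants = {word}
    for a, b in SUBSTITUTIONS:
        for old, new in ((a, b), (b, a)):
            start = word.find(old)
            while start != -1:
                # Replace only the occurrence found at this position.
                variants.add(word[:start] + new + word[start + len(old):])
                start = word.find(old, start + 1)
    return variants


print(sorted(spelling_variants("nation")))
```

A search for a keyword would then be run against every member of the returned set. Applying a single substitution at a time keeps the set small, at the cost of missing variants that differ in more than one place.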

Another technical challenge is the use of non-Latin characters such as thorn, yogh and eth. There are two challenges here: 1) determining whether a text uses a non-Latin character or its Latin equivalent and 2) establishing how different content creators have chosen to represent non-Latin characters digitally. We have addressed the first challenge by computationally guessing character substitutions, so a search for "therfor" with 'include variant spellings' selected will also search for "þerfor". We have addressed the second challenge by auditing each dataset at the outset, identifying the individual transcription methods and then replacing thorn, yogh and eth representations with Unicode entities.
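
Both steps can be illustrated with a short Python sketch. The transcription markers in TRANSCRIPTION_MAP are hypothetical examples of how a dataset might encode these characters, and the normalise and expand_query functions are assumptions made for the illustration, not the project's code.

```python
# Hypothetical dataset-specific markers mapped to Unicode characters.
TRANSCRIPTION_MAP = {
    "&thorn;": "\u00fe",  # thorn (þ)
    "&eth;": "\u00f0",    # eth (ð)
    "[yogh]": "\u021d",   # yogh (ȝ)
}


def normalise(text):
    """Replace each known transcription marker with its Unicode character."""
    for marker, char in TRANSCRIPTION_MAP.items():
        text = text.replace(marker, char)
    return text


def expand_query(term):
    """Return the term plus a form in which every 'th' is written as thorn."""
    return {term, term.replace("th", "\u00fe")}


print(normalise("&thorn;erfor"))
print(sorted(expand_query("therfor")))
```

Normalising the data once, at indexing time, means the query side only has to guess between "th" and thorn, rather than between every encoding used by the individual resources.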

The search engine uses the Apache Lucene text search engine, within a Java environment.
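
Lucene is built around an inverted index, which maps each term to the documents that contain it. The toy Python sketch below shows that idea only. It does not use Lucene or its API, and the build_index and search functions and the document identifiers are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


documents = {
    "doc1": "Whan that Aprill with his shoures soote",
    "doc2": "The droghte of March hath perced to the roote",
}
index = build_index(documents)
print(search(index, "aprill shoures"))
```

Because a query consults only the index, the original resources are never contacted at search time, which is the arrangement described at the start of this section.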

You can read more about the project's technical challenges here.

Access to Subscription Sources

Some of the resources searched by Manuscripts Online are only accessible via subscription. While Manuscripts Online allows users to search these resources and examine snippet results free of charge, we do not and cannot provide non-subscribers with full access to these resources. To arrange such access, it is necessary to contact the proprietors of the relevant resource directly.

If you do have subscription access to a resource and encounter a login page you cannot get through, you should first log in to that resource using your normal access procedure before clicking on links in Manuscripts Online.

Future Sources to be Included

The next major update of Manuscripts Online will take place in the summer of 2013, when several new resources will be added.

We welcome proposals for the inclusion of additional resources. If you are responsible for an electronic resource which you believe is appropriate for Manuscripts Online, please consult our Participate: Contributing New Resources page.

Project Staff

  • The Directors of this project are Michael Pidd (Humanities Research Institute, University of Sheffield) and Dr Orietta Da Rold (University of Leicester).
  • The project's Editorial Board members are Professor Wendy Scase (University of Birmingham), Professor Linne Mooney (University of York), Dr Estelle Stubbs (University of Sheffield), Professor John Thompson (Queen's University Belfast) and Professor Jeremy Smith (University of Glasgow).
  • The Project Manager is Dr Sharon Howard.
  • Katherine Rogers (Humanities Research Institute, University of Sheffield) is the principal Developer, in charge of data processing and development of the search engine.
  • Matthew Groves (Humanities Research Institute, University of Sheffield) is the second Developer, in charge of implementing the website front-end.

Acknowledgements

We are grateful to the following for their help in bringing this project to completion:

  • Peter Findlay, Programme Manager, Digitisation at JISC, for helpful advice at every stage of the project.
  • Our advisory panel, for providing helpful feedback throughout the project. The panel included: Professor Peter Ainsworth (University of Sheffield), Stephen Brooks (ProQuest), Aleks Drozdov (The National Archives), Dr Adam Farquhar (The British Library), Dr Ian Johnson (University of St Andrews), Professor Robert Shoemaker (University of Sheffield) and Dr Jane Winters (Institute of Historical Research).
  • Sarah Charlton and the design staff of Mickey and Mallory.
  • And above all, the creators of the resources included in Manuscripts Online for agreeing to participate in this project, and for providing us with copies of their data so that we could create the indexes which are searched by this website.

Cite this page:

"About the Project" Manuscripts Online (www.manuscriptsonline.org, version 1.0, 29 May 2017), https://www.manuscriptsonline.org/about