Manuscripts Online was created by a partnership between the Universities of Sheffield, Leicester, Birmingham, York, Glasgow and Queen's University Belfast. Natural language processing, indexing, the development of the search engine and implementation of the website front end were carried out by the Digital Humanities Institute (University of Sheffield). The website front end was implemented using designs provided by Mickey and Mallory. See below for Project Staff.
The project was made possible by a generous grant from the JISC e-Content Capital Programme.
The project is a sister site to the JISC-funded Connected Histories, which provides integrated access to online historical resources for the period 1500 to 1900. Much of Manuscripts Online builds on the methods and expertise developed by the creators of Connected Histories: the University of Sheffield, the Institute of Historical Research and the University of Hertfordshire.
Manuscripts Online has not created any new digital content. Instead, it provides integrated access to electronic content already available on distributed websites. Our search engine does not search these resources directly. Instead, it searches indexes we have created from the full content of each resource. Our approach to indexing depends on the nature of the electronic resource available:
Spelling variation during the medieval period makes keyword searching an imprecise activity. We have attempted to improve this by providing an 'include variant spellings' option. If this is selected then any search for a keyword will simultaneously search a range of spelling variants. The spelling variants are calculated in two ways: 1) using variant data which has been generously made available to us by the Middle English Dictionary and 2) computationally guessing common character substitutions, such as "oun" for "on".
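The two sources of variants described above can be sketched as follows. This is a minimal illustration only, not the project's actual implementation: the function name `expand_variants`, the `known_variants` lookup (standing in for dictionary-derived variant data) and the `("i", "y")` rule are assumptions made for the example; only the "oun" for "on" substitution comes from the text.

```python
# Hypothetical substitution rules. Only ("on", "oun") is taken from the text;
# ("i", "y") is an assumed extra rule added purely for illustration.
RULES = [("on", "oun"), ("i", "y")]


def expand_variants(keyword, rules=RULES, known_variants=None):
    """Return the keyword together with its guessed spelling variants.

    known_variants is an optional dict mapping a keyword to a collection of
    attested variants (a stand-in for dictionary-supplied variant data).
    """
    known_variants = known_variants or {}
    variants = {keyword} | set(known_variants.get(keyword, ()))
    # Apply each rule once, to every form collected so far, replacing one
    # occurrence at a time so that the unchanged spelling is also kept.
    for old, new in rules:
        for form in list(variants):
            start = form.find(old)
            while start != -1:
                variants.add(form[:start] + new + form[start + len(old):])
                start = form.find(old, start + 1)
    return variants


print(sorted(expand_variants("nacion")))
# ['nacion', 'nacioun', 'nacyon', 'nacyoun']
```

Each rule is applied in a single pass, so the expansion always terminates, at the cost of missing variants that would need the same rule applied at two positions at once. A search would then query the index for every form in the returned set.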
Another technical challenge is the use of non-Latin characters such as thorn, yogh and eth. This raises two problems: 1) determining whether a text uses a non-Latin character or its Latin equivalent and 2) establishing how different content creators have chosen to represent non-Latin characters digitally. We have addressed the first problem by computationally guessing character substitutions, so a search for "therfor" with 'include variant spellings' selected will also search for "þerfor". We have addressed the second problem by auditing each dataset at the outset, identifying the individual transcription methods and then replacing thorn, yogh and eth representations with Unicode entities.
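Both steps can be illustrated with a short sketch. It is an assumption-laden example rather than the project's code: the names `REPRESENTATIONS`, `normalise` and `thorn_variants` are hypothetical, and the `{yogh}` marker is an invented transcription convention (`&thorn;` and `&eth;` are standard HTML entities). A real mapping would come from the audit of each dataset.

```python
# Hypothetical mapping from dataset-specific markers to Unicode characters.
# "{yogh}" is an invented convention used only for this example.
REPRESENTATIONS = {
    "&thorn;": "\u00fe",  # thorn
    "&eth;": "\u00f0",    # eth
    "{yogh}": "\u021d",   # yogh
}


def normalise(text, table=REPRESENTATIONS):
    """Replace each known marker in the text with its Unicode character."""
    for marker, char in table.items():
        text = text.replace(marker, char)
    return text


def thorn_variants(keyword):
    """Return the keyword plus the form with every 'th' written as thorn."""
    return {keyword, keyword.replace("th", "\u00fe")}


print(normalise("&thorn;erfor"))
print(sorted(thorn_variants("therfor")))
```

Normalising at indexing time means the index holds one consistent representation, while the query-side substitution lets a search typed with Latin letters match texts transcribed with thorn.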
The search engine is built on the Apache Lucene text search library, running in a Java environment.
You can read more about the project's technical challenges here.
Some of the resources searched by Manuscripts Online are only accessible via subscription. While Manuscripts Online allows users to search these resources and examine snippet results free of charge, we do not and cannot provide non-subscribers full access to these resources. To arrange such access, it is necessary to contact the proprietors of the relevant resource directly.
If you do have subscription access to a resource and encounter a login page you cannot get through, you should first log in to that resource using your normal access procedure before clicking on links in Manuscripts Online.
The next major update of Manuscripts Online will take place in the summer of 2013, when several new resources will be added.
We welcome proposals for the inclusion of additional resources. If you are responsible for an electronic resource which you believe is appropriate for Manuscripts Online, please consult our Participate: Contributing New Resources page.
We are grateful to the following for their help in bringing this project to completion: