Update: I revised this post on December 2, 2010 to incorporate suggestions by Will Fitzgerald and additional examples of digitized books found since I wrote this piece.
In my seminar in digital scholarship and media studies at Emory this fall, I’m embarking on a project that involves the digitization and presentation of a few books in the Sacred Harp tradition. Searching for the best platform for presenting these books alongside original research has led me to look into various technical solutions for displaying digitized books on the web.
Some approaches focus on text encoded according to TEI specifications. Such sites may include digital images of pages from the printed book, but focus on the presentation of the text as easily readable markup, and utilize the advantages of hyperlinks for footnotes or annotations. An example of this approach is The Emory Women Writers Resource Project:
- The Emory Women Writers Resource Project.
- Here is a page from the site that demonstrates how the interface displays the scanned page alongside the TEI-encoded text.
- The 1860 printing of The Sacred Harp is presented in a text-centric fashion by the Michigan State University Library (the text, while formatted, is not TEI-encoded and does not feature footnotes or annotations).
For tunebooks in the shape note tradition, a text-centric approach makes less sense. Some publishers of online editions of shape note songbooks have taken a multimedia approach, including an image file for each page in JPEG format alongside audio files in MIDI format or MP3, and perhaps including the text of each song as well.
- One example of this approach is the On-line Southern Harmony, hosted by the CCEL, which features JPEG scores, text, MIDI, and MP3 recordings.
- A more recent example is the beautifully designed web edition of the Harmonia Sacra, created by Will Fitzgerald and James Nelson Gingerich. This web site presents JPEGs and downloadable PDFs in two shape note formats for each song as well as MIDI files. The site also features extensive indices ranging from tune name to meter to incipit.
Other web sites focus on the scanned image without presenting the text or merely present the scanned book for download.
- Emory’s Yellowback fiction project is one such collection.
- Several web sites make oblong tunebooks available in this fashion, generally as a PDF file for download. For example, see A Supplement to the Kentucky Harmony on BostonSing.com or The Federal Harmony, hosted by IMSLP.
- Other web sites present an index of the songbook in question with links to JPEGs of each song. For example, see The Hesperian Harp or A Supplement to the Kentucky Harmony (again!) on Berkley Moore’s Out of Print Shape Note Books Site.
It strikes me that the best approach preserves the individual pages and presents them in a format that is user-friendly and may be browsed through a book-like interface while also retaining the advantages of search and accessibility gained by OCR. Such a site should also provide downloadable files in accessible formats. A site that does much of this, but that is closed to user-submitted books, is Google Books. The BookReader developed by the Internet Archive and the Open Library features a similar design and is an open source project.
- Google Books.
- This page demonstrates the browsing experience using Google Books.
- The Internet Archive BookReader.
- An example of how the BookReader mimics the page turning experience.
- An alternate view of Michigan State’s 1860 Sacred Harp allows easy browsing of the page images, though without the more advanced technological structure of the applications listed above.
BookReader seems like a promising format, and may be open to enhancements through plugins (another desirable feature for the purposes of my work). Are there other, more attractive options?
5 thoughts on “Presenting Digitized Books on the Web”
Looking forward to seeing what your efforts bring forth.
I’m sure you’re aware of the Michigan State edition of the Sacred Harp at http://digital.lib.msu.edu/collections/index.cfm?action=view&TitleID=610 which seems important to acknowledge.
As well as the CCEL version of the Southern Harmony. http://www.ccel.org/ccel/walker/harmony/files/harmony.html
Oh, and my and James Gingerich’s version of the Harmonia Sacra.
Thanks for reminding me of these examples, Will. All three are important reference points.
I think the inclusion of MIDI files on your Harmonia Sacra site (as well as on the online Southern Harmony site) is a nice touch.
I see from your post around the time you published the online Harmonia Sacra that the site is generated with Ruby. I gather this means it is database driven, though static? Was the code you employed written from scratch and particularly for this project? Would it be easily generalized? Do you have any thoughts on the advantages or drawbacks of your approach?
The ‘database’ is/was just a flat file that James Gingerich exported from some Mac-based database (maybe Filemaker). The code was pretty hacky; I wasn’t particularly proud of it. And thus I would not particularly recommend it. Regenerating all the static pages was not particularly onerous, and so I tended to do so based on the slightest changes.
One could imagine, though, having something approaching a standard set of schemata for tunebooks and associated data, and having some standard code/templates for managing these. I started some of this once upon a time (we’ve talked about this, I think).
Managing the ‘resources’ (the PDFs of the tunes, the MIDI files) was actually a bigger problem, since it’s important to have standard mappings from a tune/page location to the resource. For example, ensuring that Harmonia Sacra’s Bethany (188t) knows how to map to the 4-shape file (hs_4_pdf/188a.pdf), or even the base HTML page (188t.html).
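A mapping like the one described here can be sketched in a few lines of Python. This is only an illustration, not the code behind the Harmonia Sacra site: the function names are hypothetical, and the rules that 't' maps to 'a', that 'b' maps to 'b', and that whole-page tunes carry no letter are assumptions extrapolated from the single example above (188t, hs_4_pdf/188a.pdf, 188t.html).

```python
import re

# Assumption: top-of-page tunes use the letter 'a' in resource file names and
# bottom-of-page tunes use 'b'. Only the 188t -> 188a case appears in the post.
POSITION_TO_LETTER = {"t": "a", "b": "b", "": ""}


def parse_location(location):
    """Split a singer-style location such as '188t' into (188, 't')."""
    match = re.fullmatch(r"(\d+)([tb]?)", location)
    if match is None:
        raise ValueError(f"unrecognized tune location: {location!r}")
    return int(match.group(1)), match.group(2)


def resource_paths(location):
    """Return the (hypothetical) resource paths for one tune location."""
    page, position = parse_location(location)
    letter = POSITION_TO_LETTER[position]
    return {
        "four_shape_pdf": f"hs_4_pdf/{page}{letter}.pdf",
        "html": f"{location}.html",
    }


print(resource_paths("188t"))
# {'four_shape_pdf': 'hs_4_pdf/188a.pdf', 'html': '188t.html'}
```

Keeping the rule in one function means every page template asks the same question ("where is the resource for 188t?") and gets the same answer.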
There are some hints there, too, that dealing with collation of tunes/pages was a bit of a hack. 188t has to come before 188b, but of course this is not a standard sort order (James solved this by using ‘a’ and ‘b’, but I wanted the pages to reflect the normal singer’s practice).
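One way to get 188t to sort before 188b while keeping the singers' labels is a custom sort key. The sketch below is an assumption about how this could be done, not the approach used on the site; `collation_key` is a hypothetical name, and placing unsuffixed whole-page tunes first is an arbitrary choice.

```python
import re

# 't' (top) must precede 'b' (bottom), which plain alphabetical order gets wrong.
# Assumption: a tune filling the whole page has no suffix and sorts first.
POSITION_ORDER = {"": 0, "t": 1, "b": 2}


def collation_key(location):
    """Sort by numeric page first, then by position on the page."""
    match = re.fullmatch(r"(\d+)([tb]?)", location)
    if match is None:
        raise ValueError(f"unrecognized tune location: {location!r}")
    return int(match.group(1)), POSITION_ORDER[match.group(2)]


pages = ["188b", "27", "188t", "100", "99b", "99t"]
print(sorted(pages, key=collation_key))
# ['27', '99t', '99b', '100', '188t', '188b']
```

Parsing the page number as an integer also avoids the other common surprise, where "100" sorts before "27" as a string.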
It also took a few iterations to get the collation on meters correct. There are some (loose?) standards for this (LM first, then CM, etc.) but at some point I ended up just using an alphabetic sort, I think.
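A middle ground between the two approaches is to list the conventional meters explicitly and fall back to an alphabetic sort for everything else. The following is a sketch under stated assumptions: only "LM first, then CM" comes from the comment above, while putting SM third and the meter spellings used here are guesses made for illustration.

```python
# Assumption: LM and CM lead (as stated above); SM in third place is a guess.
CONVENTIONAL_ORDER = ["LM", "CM", "SM"]


def meter_key(meter):
    """Conventional meters first in their listed order, then the rest alphabetically."""
    if meter in CONVENTIONAL_ORDER:
        return (0, CONVENTIONAL_ORDER.index(meter), "")
    return (1, 0, meter)


meters = ["8.7.8.7", "CM", "7s", "LM", "SM"]
print(sorted(meters, key=meter_key))
# ['LM', 'CM', 'SM', '7s', '8.7.8.7']
```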
I wish I had the text of the tunes available, but these were too hard to collect from the original sources. Of course, James typed them in, but they get interspersed in various ways. One advantage of a system like LilyPond is that you can define the lyrics ‘declaratively’ and thus get these exported somewhat more easily, and in the right order.
If you have further questions, feel free to drop me a line.
Sounds like you’re on the right track with a UI that presents page images plus an encoded layer to support search. Depending on how sophisticated your search needs are (keyword only?), it might turn out that aiming for SGML for the encoding, TEI or otherwise, would be overkill. Though if you’re digitizing a small enough body of work, you can afford to lavish more time on it, and maybe you’ll end up wanting the songs tagged with <stanza> and whatnot so you can display the lyrics usefully by themselves.
Thanks for the advice. You may be right that TEI is overkill. Though I am digitizing a small body of work at this point, I do want my process and tools to be scalable. It might be nice to have the capacity to display lyrics by themselves, though.
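To make the idea of displaying lyrics by themselves concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The markup is hypothetical: the `<stanza>` element follows the suggestion in the comment above, while `<song>`, `<line>`, and the sample text are invented for illustration and are not TEI (TEI itself encodes verse with different elements).

```python
import xml.etree.ElementTree as ET

# Hypothetical lightweight encoding of one song; not a TEI document.
SAMPLE = """<song title="Example Tune" page="1">
  <stanza n="1">
    <line>First line of the first stanza</line>
    <line>Second line of the first stanza</line>
  </stanza>
  <stanza n="2">
    <line>First line of the second stanza</line>
  </stanza>
</song>"""


def lyrics_only(xml_text):
    """Return the song's lyrics as plain text, stanzas separated by blank lines."""
    root = ET.fromstring(xml_text)
    stanzas = []
    for stanza in root.iter("stanza"):
        lines = [(line.text or "").strip() for line in stanza.iter("line")]
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas)


print(lyrics_only(SAMPLE))
```

Even this small amount of structure is enough to drive both keyword search and a lyrics-only view, which may be a reasonable starting point before committing to a fuller encoding.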