Working with In-Copyright Materials for Digital Humanities Research: Legal, Ethical, and Practical Issues

McLaughlin, Stephen Reid

dc.contributor.author	McLaughlin, Stephen Reid
dc.date.accessioned	2016-06-16T22:12:52Z
dc.date.available	2016-06-16T22:12:52Z
dc.date.issued	2015-04-10
dc.identifier.uri	http://hdl.handle.net/10106/25722
dc.description	Twenty-Minute Presentation	en_US
dc.description.abstract	To date, a significant chunk of digital humanities research projects have focused on analysis of works in the public domain, virtually all of them published prior to 1923. Greater access to recent publications would be a boon to the field, and in fact legal access to a large corpus of in-copyright works may be coming soon. A 2014 ruling by the Second Circuit Court of Appeals found that book digitization for the purpose of full-text search falls under fair use protection (Parker, 2014), leading the HathiTrust Research Center (HTRC) to announce that it will soon make its corpus of in-copyright works available for remote analysis on its own servers (“2014 Mid-Year Review,” 2014). It is so far unclear, however, when this project will go live and to whom access will be granted. In the meantime, there are several alternatives available to DH researchers. An unambiguously legal method for working with protected material is the manual digitization of physical books, either via typing or scanning. This practice clearly falls under fair use, provided such copies aren’t distributed publicly. However, the time and effort required to produce an acceptably clean version of a text limits the efficacy of this approach for all but the smallest-scale projects. Stepping into a legal gray area, one can use free tools such as Calibre to strip digital rights management (DRM) protection from ebooks purchased through online stores such as Amazon. The Digital Millennium Copyright Act (1998) prohibits DRM removal, but a recent ruling by a federal judge in New York suggests that the practice may in fact be legally acceptable for personal use (Cote, 2014). In any case, when carried out for the purpose of research, removal of DRM appears clearly ethically justified. In practical terms, commercially formatted ebooks are ideal for use in digital humanities research. 
An EPUB file is simply a ZIP-compressed directory containing XHTML-formatted text and XML-encoded metadata (“EPUB 3 Overview,” 2014). Unlike in a plain text document, each chapter of an EPUB is clearly delimited, as are a book’s frontmatter and backmatter. And unlike with PDFs, there is no need to correct gaps introduced by page breaks. With a bit of up-front work, then, many if not most recently published books can be quickly re-formatted for textual analysis, provided one is willing to purchase a copy. Stretching the limits of propriety, ebook piracy is a convenient (if ethically questionable) alternative available to contemporary DH scholars. Over the past half decade, websites offering illicit copies of ebooks have grown significantly in scope and comprehensiveness. Library Genesis (http://gen.lib.rus.ec) is an ad-free site based in Russia offering nearly two million ebooks and thirty-six million academic articles. AAAAARG (http://aaaaarg.org) hosts a smaller collection, clustered around a core of critical theory, art history, and philosophy. Finally, Ebook.farm (http://ebook.farm) is a very large private site which, unlike the others listed here, charges its users a small fee for each download.	en_US
dc.language.iso	en_US	en_US
dc.subject	Corpus building -- Digital Humanities	en_US
dc.subject	Digital Rights Management	en_US
dc.subject	DRM	en_US
dc.subject	Ebooks -- Piracy	en_US
dc.title	Working with In-Copyright Materials for Digital Humanities Research: Legal, Ethical, and Practical Issues	en_US
dc.type	Presentation	en_US
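
The abstract's description of an EPUB as a compressed directory of XHTML text and XML metadata can be illustrated with a short sketch. It is not the presenter's own method; it assumes a DRM-free EPUB, UTF-8 encoded XHTML, and manifest hrefs that are not percent-encoded. The names epub_chapters and _TextExtractor are hypothetical and introduced only for illustration.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML namespaces used by the EPUB container file and package (OPF) document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}

# Tags after which a space is inserted so adjacent blocks do not run together.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "br", "blockquote", "title"}


class _TextExtractor(HTMLParser):
    """Collect text nodes from an XHTML document, dropping the markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")


def epub_chapters(source):
    """Return (path, plain_text) pairs in reading (spine) order.

    `source` is a filename or file-like object for a DRM-free EPUB.
    Assumes UTF-8 content; text inside <title>, <style>, or <script>
    is not filtered out in this sketch.
    """
    with zipfile.ZipFile(source) as zf:
        # META-INF/container.xml names the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)

        # The manifest maps item ids to files; the spine lists ids in order.
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        chapters = []
        for ref in package.findall("opf:spine/opf:itemref", NS):
            href = manifest[ref.get("idref")]
            path = posixpath.normpath(posixpath.join(base, href))
            parser = _TextExtractor()
            parser.feed(zf.read(path).decode("utf-8"))
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(parser.parts).split())
            chapters.append((path, text))
        return chapters
```

Calling epub_chapters("book.epub") (a placeholder filename) would return one entry per spine item, which typically corresponds to a chapter or a section of front or back matter. Reading the spine rather than listing the archive's files keeps the text in the publisher's intended reading order.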

Files in this item

Name: McLaughlin.jpg
Size: 530.5Kb
Format: JPEG image
Description: JPEG

This item appears in the following Collection(s)

TXDHC 2015 Presenter Abstracts
