Google Books and Fair Use: From Implausible to Inevitable?

Prof. Jane C. Ginsburg, Columbia University School of Law*
October 19, 2015

A for-profit corporation scans millions of in-copyright books and permanently stores their full contents in its database, all without seeking permission or paying the books’ authors or publishers.  Over 10 years ago, when Google began its massive digitization and storage program, with the cooperation of the University of Michigan library, which supplied the books that Google scanned, not many copyright scholars would have thought that the systematic copying of immense volumes of full text for commercial purposes, without the creation of new copyrightable expression building on the copied content, could plausibly assert the mantle of fair use.  By the time the Second Circuit upheld Google’s fair use defense last week,1 that result seemed inevitable.

How did the fair use doctrine go from a safety valve to enable second authors to create new works that productively incorporate reasonable portions of prior works, to a free (in both senses of the word) pass for mass commercial digitization – at least so long as the outputs from the commercial database communicate no expression or insufficient expression to infringe?  And, perhaps, so long as the compiler of that database can keep the contents safe from hacking.  In this column, I will set the decision in light of its forebears, and then consider its impact on future fair use defenses.

First, a summary, in the court’s words, of the Google Books holding:

(1) Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses.  The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals.  Google’s commercial nature and profit motivation do not justify denial of fair use.  (2) Google’s provision of digitized copies to the libraries that supplied the books, on the understanding that the libraries will use the copies in a manner consistent with the copyright law, also does not constitute infringement.  Nor, on this record, is Google a contributory infringer.

This column will address only the first holding.  The key determination focused on the “highly transformative” purpose of the copying.  In modern fair use jurisprudence, particularly in the Second Circuit, once a court rules the “nature and purpose of the use” (statutory fair use factor (1)) to be “transformative,” a successful outcome for the defense is almost assured.  In the last 20 years of fair use caselaw, the meaning of “transformative” has itself become transformed.  In an influential article written in 1990, Judge Pierre Leval of the U.S. Court of Appeals for the Second Circuit – who also authored the Google Books opinion – coined “transformative use” to describe what used to be called “productive uses,” through which the fair use doctrine traditionally allowed follow-on authors to bestow their intellectual labor in reworking selections from a prior work, provided the borrowings did not prejudice the profits or prospects of that work.2  The Supreme Court, four years later, for the first time recognizing parody as a potential fair use, adopted the label.3  In Campbell v. Acuff-Rose Music, Inc., the Supreme Court inquired whether the defendants’ musical parody had made a “transformative” use: not one that merely supersedes the objects of the earlier work by copying it, but one that “adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message.”4  In the context of the Supreme Court’s Campbell decision, the Two Live Crew rap parody “transformed” Roy Orbison’s “Pretty Woman” by creating a new (and rather raunchy) work.5  But courts came to interpret Campbell’s reference to “something new, with a further purpose”6 to encompass copying that does not add “new expression,” so long as the copying gives the prior work “new meaning.”7  Fair use cases began to drift from “transformative work” to “transformative purpose”; in the latter instance, copying of an entire work, without creating a new work, could be excused, particularly if the court perceived a sufficient public benefit in the appropriation.

In the initial shift from “transformative work” to “transformative purpose,” the defendant had in fact created an independent work of authorship, even though that work did not significantly alter the copied work.  Thus, in Bill Graham Archives v. Dorling Kindersley Ltd. (which did not concern digital technologies), the Second Circuit held that a coffee-table-book biography’s reduced-size complete images of posters of the legendary rock band The Grateful Dead were “transformative” because the book used the images of the posters as “historical artifacts” to document the Dead’s concerts, rather than for the posters’ original aesthetic purpose.8  But the documentary/aesthetic distinction also significantly expanded the application of the fair use exception to new technological uses that did not yield new works.  The search engine practice of permanent storage of works for the purpose of “indexing” has been the principal digital beneficiary of the “documentary” or “new purpose” brand of transformativeness.9

Other applications of the aesthetic/documentary distinction – more broadly characterized as a distinction between expression and information – to the inputting of copyrighted works into databases then emerged.  In A.V. ex rel. Vanderhye v. iParadigms, LLC, the Fourth Circuit ruled that the creation of a commercial database of student papers by the “Turn It In” plagiarism detection service was a “fair use”: “the archiving of plaintiffs’ papers was transformative and favored a finding of ‘fair use.’  iParadigms’ use of these works was completely unrelated to expressive content and was instead aimed at detecting and discouraging plagiarism.”10  In a decision that in many ways presaged Google Books, the Second Circuit in Authors Guild v. HathiTrust, concerning libraries’ uses of their holdings, as digitized by Google, found the scanning and permanent storage of full copies of in-copyright books to further the “transformative use” of allowing “data mining” of the contents of the books.11  Such uses are non-expressive in two senses: They produce no new expression by the copying and storage entities, and the “mining” of the scanned book seeks not to expose its expression, but rather to extract information.12

In light of this progression, the decision in Google Books probably surprised no one, even though Google Books presented two ultimately non-salient differences with HathiTrust.  While the University of Michigan is a nonprofit educational institution, and (apart from a limited program for the visually impaired) it confined its use of the database to data mining so that it conveyed none of the scanned works’ contents to the public, Google is not a charitable entity, and its book search program communicated to the public “snippets” of the books.  “A snippet is a horizontal segment comprising ordinarily an eighth of a page”; on a 24-line page, a snippet works out to three lines.  The court accorded scant weight to the commercial nature of Google’s enterprise, stressing that the Second Circuit has “repeatedly rejected the contention that commercial motivation should outweigh a convincing transformative purpose and absence of significant substitutive competition with the original.”  Distinguishing outputs that convey information about the scanned book from outputs that convey its expression, the court ruled that neither the data mining uses nor the snippet views exploited the copied works for their expressive value.  Hence “the creation of complete digital copies of copyrighted works [results in] transformative fair uses when the copies ‘served a different function from the original.’”

With respect to the data mining uses, there is a powerful argument that exploiting a work for its non-expressive information (bibliographic or bean-counting – how many times and in what works a given word or phrase appears) is not even prima facie infringing, and that the digitization of lawfully possessed copies (loaned from the University of Michigan library) to create a database that enables non-expressive but progress-of-knowledge-enhancing outputs must therefore be equally free.  By contrast, the snippet views did convey limited amounts of expression, but the court repeatedly emphasized the very constrained and controlled, “fragmentary and scattered,” “cumbersome, disjointed, and incomplete nature of the aggregation of snippets made available through snippet view.”  As a result, “at least as presently structured by Google, the snippet view does not reveal matter that offers the marketplace a significantly competing substitute for the copyrighted work” (emphasis supplied).  The court appears to be endeavoring to avoid slippery-slope expansion of the content or presentation of fair use-permissible snippets.

As currently constituted, the snippet views may well provide too little copyrightable expression to substitute for purchase of the full book, or for licenses of smaller, expression-bearing, increments of the book.  But, as Appendix A to the court’s decision (reproducing a snippet view of one book) demonstrates, it is not clear that the snippets in fact offer sufficient “information about” the books to perform their “transformative” fair use function of allowing the user to ascertain whether the book is relevant to her purposes.  More expression might better promote that objective, but might also risk displacing the potential licensing market for expressive excerpts.  The decision upholds a status quo that may in fact satisfy neither authors nor users; the slippery slope thus may yet loom.

Similarly, in response to the authors’ concern that the database might be vulnerable to hacking, the court observed that “Google has documented that Google Books’ digital scans are stored on computers walled off from public Internet access and protected by the same impressive security measures used by Google to guard its own confidential information.”  Less “impressive security” might doom a fair use defense, given the devastating consequences of unfettered access to reproduce and further communicate the full text of digitized works.  Thus, while the court found that Google’s program did not present a sufficiently credible risk of harm, it is not clear who else’s programs could clear the decision’s high security bar.

The court’s cautious circumscription thus suggests that the Google Books decision does not herald a new extension of an already-expanded fair use defense, but (at least until a competitor with equivalent resources appears) is instead sui generis.  The Second Circuit’s abstention from addressing some of the district court’s fair use analyses similarly betokens the decision’s modest scope.  For example, the district court embraced the long-spurned argument that the defendant’s copying does the plaintiff a favor by bringing the work to greater public attention,13 but the Second Circuit’s opinion forgoes such contentious flourishes.

That said, in cases involving complete copying of entire works, questions undoubtedly remain regarding the reach of the recasting of “transformative use” as a dichotomy between the provision of “information about” a copied work (likely fair use) and the communication of “expressive content” (less likely fair use, though defendant may still show that its use does not substitute for an actual or potential market for the plaintiff’s expression).  Indeed, if fair use turns on nondisclosure to the public of a digitized work’s copyrightable expression, then, if Google had scanned and stored millions of books, and used the content for internal purposes without permission but also without disclosing any outputs of any kind to the public, would its use still be “fair”?

