Google continues to send shock waves through the publishing and library establishments with big and bold moves. The latest is a plan to collaborate with major research institutions such as Harvard, the University of Michigan, Stanford, Oxford, and the New York Public Library to scan books and other content and make them available through Google’s search engine. This move puts us one step closer to a world where all written knowledge is universally available electronically, but it doesn’t come without a host of questions.
Some details: Google will foot the bill for scanning the texts using its own nondestructive technology. Google will make available the full text of older material that is out of copyright; newer material will be scanned and indexed but made available online only in short excerpts. Michigan and Stanford are opening their entire collections to the project; agreements with the other institutions focus on certain types of content. At Michigan alone, the goal is an amazing 7 million volumes. All of the digitized materials will be available as part of the Google Print program, which also includes current works made available under agreements with publishers. When a user finds a work in this collection, Google Print uses the OCLC WorldCat library holdings data to refer the user to local libraries that hold the book. (More details are in a good Gary Price article at Search Engine Watch.)
There are other book digitization projects around, but this one is the Big Kahuna simply because it is Google and because the scope is so large.
Some Outsell perspectives on this move:
- There is no shortage of consortia and library groups that have been working on digitization issues in libraries for years. However, it took an outside party, Google, to pull this off. In part, that’s because it is probably the only entity with the financial resources to fund this huge undertaking, but it is also because Google is the only player with the audacity to act on the grand vision. Books have been a hole in the online universe – article-length works (news, magazines, scholarly publications) have been available online for years, but it took an outsider to really go after the content buried in books.
- Are the libraries getting enough out of the deal? The agreements provide participating libraries with a copy of the digitized content (which raises the question “who cares?” if it’s all on Google anyway) – but initial reports don’t mention any money changing hands. Google does plan to earn advertising revenue from the content it digitizes, so an opportunity for revenue-sharing may have been missed.
- Where does this leave the role of the library? A common apocalyptic vision is reflected in the Washington Post this morning (“Google – 21st Century Dewey Decimal System: College kids may never darken the library’s door again”). But the evolution of libraries was underway well before Google came along. For some time now we have been writing about the move away from the library as a “content warehouse.” Those who focus on the end of the warehousing era are missing the point; we said around this time last year that “the future of the library is that there is no library - at least not as we know it today.” This isn’t a death knell for libraries; it’s another shove to get librarians out from behind the stacks, to harness their expertise (including subject-matter expertise), and to enhance users’ ability to find, access, and use information in any format. Getting out of the business of simply storing books should be a welcome goal, and it is increasingly the goal among forward-looking libraries.
- Be careful what you wish for. As long as Google indexed only Web pages and article-length material, its simple search interface was an asset. With the inclusion of multiple content types, the interface issues become a bit more gnarly. Google already has special search tricks for isolating Google Print materials, and the evolution of its interface will have to keep up with an increasingly complex content set.
- The other 800-pound gorilla in this game is Amazon. Google has some publishers cooperating in its Google Print program, but Amazon is really the other key player here. We’re moving toward a world in which Google dominates the archival and out-of-copyright works, and Amazon masters the in-print book world. How long before they join forces, or one vanquishes the other in order to fully rule the book world? And in the end, the publishing and library worlds will wonder how this all happened – how did two companies that did not exist a decade ago come to so dominate the information environment?