2/4/07

Google will "scan every book ever published"

The New Yorker's Annals of Law column has an article that describes the tribulations that Google faces in reaching its no-longer-unimaginable goal of digitizing every book ever published.

I've said before that Google's plans were quite transparent. They started out planning to copy and index every page on the Internet. Now they are doing the same for printed books. The will do the same for images, maps, email, data. Copy everything and index it.

But is that legal? Here is the beginning of the article:
GOOGLE’S MOON SHOT: The quest for the universal library
by JEFFREY TOOBIN
Issue of 2007-02-05; Posted 2007-01-29

Every weekday, a truck pulls up to the Cecil H. Green Library, on the campus of Stanford University, and collects at least a thousand books, which are taken to an undisclosed location and scanned, page by page, into an enormous database being created by Google. The company is also retrieving books from libraries at several other leading universities, including Harvard and Oxford, as well as the New York Public Library. At the University of Michigan, Google’s original partner in Google Book Search, tens of thousands of books are processed each week on the company’s custom-made scanning equipment.

Google intends to scan every book ever published, and to make the full texts searchable, in the same way that Web sites can be searched on the company’s engine at google.com. At the books site, which is up and running in a beta (or testing) version, at books.google.com, you can enter a word or phrase—say, Ahab and whale—and the search returns a list of works in which the terms appear, in this case nearly eight hundred titles, including numerous editions of Herman Melville’s novel. Clicking on “Moby-Dick, or The Whale” calls up Chapter 28, in which Ahab is introduced. You can scroll through the chapter, search for other terms that appear in the book, and compare it with other editions. Google won’t say how many books are in its database, but the site’s value as a research tool is apparent; on it you can find a history of Urdu newspapers, an 1892 edition of Jane Austen’s letters, several guides to writing haiku, and a Harvard alumni directory from 1919.

No one really knows how many books there are. The most volumes listed in any catalogue is thirty-two million, the number in WorldCat, a database of titles from more than twenty-five thousand libraries around the world. Google aims to scan at least that many. “We think that we can do it all inside of ten years,” Marissa Mayer, a vice-president at Google who is in charge of the books project, said recently, at the company’s headquarters, in Mountain View, California. “It’s mind-boggling to me, how close it is. I think of Google Books as our moon shot....”

1 comment:

Anonymous said...

I think Google will run out of memory before they even finish scanning Jacob Neusner's books.