Six Lines

The Google Books Settlement and the NSA

Posted by Aaron Massey on 15 Nov 2013.

This week Google won a lawsuit brought against it by the Authors Guild. The lawsuit has been dragging on for some time, and I’m not going to be able to summarize it thoroughly here. Instead, I want to compare and contrast some of the reasoning used in the Google Books case with the rationale that could be used to justify mass collection of communications information, because I believe this may be a teachable moment for software engineering ethics.

Obviously, the Google Books settlement is not actually equivalent to any mass surveillance program. However, both programs are similar at a basic technical level. The idea behind Google’s book scanning project is to digitize every book (yes, all of them) and make them searchable. The problem, and the reason for the lawsuit, is that authors have some rights over the works they have created. The idea behind mass surveillance programs is to collect every communication (yes, all of them) and make them searchable. The problem is that citizens have some rights over their communications. At a high level, if you abstract away the details, these are pretty similar programs.

Here’s the basic outline of the arguments that won the Google Books lawsuit, as quoted from the Reuters article mentioned earlier:

The decision, if it survives an expected appeal, would let Google continue expanding the library, which it said helps readers find books they might not otherwise locate.

One of the key ideas of mass surveillance is that it would be useful in finding terrorists that we might not otherwise locate.

He also said Google’s digitization was “transformative,” meaning it gave the books a new purpose or character, and could be expected to boost rather than reduce book sales.

If digitizing books is ‘transformative,’ rather than the more banal ‘copying,’ then couldn’t you argue that mass surveillance is a protective measure rather than a civil rights violation? Perhaps we should expect it to boost our privacy rather than violate it.

The judge noted that Google takes steps to keep people from viewing complete copies of books online, including by keeping some snippets from being shown.

Aha! So that’s how we are protected. The government can’t actually view our complete communications, just parts of them, like the metadata.

“In my view, Google Books provide significant public benefits,” Chin wrote. “Indeed, all society benefits.”

Yeah, overall, the public benefits. You’re only hurt by surveillance if you’re an author… err, I mean, if you have something to hide.

Google began creating the library after the company agreed in 2004 with several major research libraries to digitize current and out-of-print works.

Among the libraries that have had works scanned are Harvard University, Oxford University, Stanford University, the University of California, the University of Michigan and the New York Public Library.

The government can’t surveil society on its own. It needs help from ISPs, telecommunications companies, and other organizations…

As I said before, I don’t actually think these two cases are equivalent. Reasonable people can agree with the basic arguments in favor of allowing Google’s book scanning to continue and disagree with the basic arguments in favor of allowing government surveillance to continue. Or vice versa. Or any other combination of views on these basic arguments. These aren’t trivial issues, and the context matters. That is, of course, the point of the comparison.

Basic technologies, like those used to store or search huge quantities of digital information, are context-neutral. The building blocks for both programs are the same: a giant set of servers, mechanisms for performing searches under certain circumstances, and mechanisms for ensuring no one sees too much of the data. Databases don’t care whether they are full of books or phone calls. Neither set of arguments is technical in nature; they are policy arguments.
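To make that context-neutrality concrete, here is a minimal, purely illustrative sketch (hypothetical names throughout, not any real system’s code) of a tiny inverted index that returns only short snippets of matching documents. Nothing in it knows or cares whether the documents are scanned book pages or call transcripts; the snippet limit is the only “don’t show too much” safeguard, and it is a policy choice layered on top of neutral machinery.

```python
# index.py -- illustrative only: a toy inverted index with a snippet limit.
# The same code indexes book pages or call transcripts; nothing below is
# specific to either. All names here are hypothetical.
from collections import defaultdict

SNIPPET_CHARS = 80  # policy knob: how much of a matching document a query may reveal

class TinyIndex:
    def __init__(self):
        self.docs = {}                    # doc_id -> full text (books, calls, anything)
        self.postings = defaultdict(set)  # term -> set of doc_ids containing it

    def add(self, doc_id, text):
        """Store the document and index every whitespace-separated term."""
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        """Return (doc_id, snippet) pairs; snippets are truncated by policy."""
        results = []
        for doc_id in sorted(self.postings.get(term.lower(), ())):
            text = self.docs[doc_id]
            pos = max(0, text.lower().find(term.lower()))
            start = max(0, pos - SNIPPET_CHARS // 2)
            results.append((doc_id, text[start:start + SNIPPET_CHARS]))
        return results

if __name__ == "__main__":
    idx = TinyIndex()
    idx.add("book:moby-dick:p1", "Call me Ishmael. Some years ago, never mind how long precisely...")
    idx.add("call:555-0100:2013-11-15", "transcript: the call lasted four minutes and mentioned Ishmael")
    print(idx.search("ishmael"))
```

The point of the toy is that the “keep some snippets from being shown” step the judge credited lives in a single constant; the storage and search underneath are indifferent to what they hold.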

These technologies also share another important detail: they require engineers to build and maintain them. Software rots, and data does too. The engineers who build and maintain these technologies have an ethical responsibility to understand the policy arguments used to justify whatever they are building. Engineers cannot claim that they were ‘just following orders’ and abdicate their ethical responsibilities.