Author's Guild Inc. v Google Inc

Gaffen · January 4, 2023, 5:53pm

Author’s Guild Inc. v Google Inc. is a case that turned up whilst I was crafting a think piece. It was ruled on in 2013 and then again, on appeal, in 2015. It has been cited with regard to data mining and copyrighted works, and is part of what lets the Google Books platform operate.

I’ve done some initial skimming, and Google won the case as an example of fair use due to the “transformative nature” of the digitised works. This was appealed on multiple grounds. The appeal wasn’t successful, but I think its substance bears revisiting. To do this we need some understanding of fair use and copyright law:

Fair use rulings rely on four criteria:

the purpose and character of the use (it has to be transformative)
the nature of the copyrighted work (fiction is more protected than nonfiction, unpublished is more protected than published)
the amount and substantiality of the portion taken (how much of it was used)
the effect of the use upon the potential market (does it undermine the value of the original work)

The decision was granted only on the grounds of its transformative nature - i.e. that the entire content of a book had been scanned and committed to an .epub or the like. The definition of “transformative” in this regard usually entails creating new insights, viewpoints, meaning or expression. In other words, has the usage ‘transformed’ the meaning of the work into something that is, arguably, entirely new or distinct?

I would also be interested in seeing further exploration of how digitising the entirety of a work and selling it affects the value of the original work. On a wholesale level this seems very murky. There is definitely more reading to be done here, but on the whole it feels a little suspect that Google won this case, and I wonder if there is space to push back on the ruling?

PrivacyDingus · January 4, 2023, 6:37pm

Wait no; just to aid my workday-addled brain, this is the same as me taking a VHS, turning it into an .mp4, and then distributing it?

I worry here that there are two possible fallouts, regarding:

https://discourse.leagueofconcernedusers.org/t/common-crawl-a-large-scale-theft-of-intellectual-property/

a fine that is too small, or
a looking of the other way for a company that is too big…

Gaffen · January 4, 2023, 6:44pm

You’re not far off. I guess the ‘extra’ transformation is maybe encoding subtitles and chapter timestamps? Basically the scanned text is searchable and manipulable in a way a print document isn’t. Which you can argue is ‘transformational’, but arguing that it ‘adds meaning’ feels like a big fuckin stretch to me

Gaffen · January 4, 2023, 6:46pm

The interesting thing here is that Google attempted a settlement but the Author’s Guild tried to push back and failed… I wonder if there is a string to pull on here? But yes, you’re right - there is definitely a big company/small problem issue here. I also wonder whether extralegal action is appropriate here too. Just because something is ‘technically’ legal doesn’t mean we can’t make a stink about it.

I find looking at this issue through the lens of a YouTuber who’s a single copyright strike away from complete channel deletion veeeeeeeery interesting.

Gaffen · January 7, 2023, 5:29pm

Some more relevant research trawled from the net:

Nice to see others are thinking similar thoughts.

It’s worth noting that they reference this paper. I haven’t read the full thing because it’s long, but it seems to be arguing from the perspective of comparing human rights to robot rights. It’s used as justification for “fair learning”, i.e. allowing AI training data to be considered fair use. As mentioned, I haven’t read it, but my gut reaction is that there’s a difference between a lone person/consciousness learning from copyrighted material and a productized service built by a company with a lot of funding doing the same.

Reading the rest of the Verge article, this seems like a clear example of Zuboffian “incursion”. Midjourney is here now; I can’t imagine destroying GPT-3 and starting again is an option for these companies.

“AI Data Laundering” is a stroke of genius. I think it really sums up what is happening here.

PrivacyDingus · January 23, 2023, 4:14pm

Sploosh. Apart from the obvious horribles here, it looks like Microsoft might have found an answer to the question “why does Bing exist?” which isn’t “so that we can have API customers (such as DuckDuckGo, Qwant, Ecosia etc.) and make money through advertisements”.

@Gaffen to your point on Common Crawl, it is very possible that these figures are massively off, but CC’s last crawl came to 3.15bn pages.