The IPKat has received and is pleased to host the following post by Katfriend Georgia Jenkins (University of Liverpool) on the recent Anthropic settlement. Here’s what Georgia writes:
What is the value of a pirated book copied to train LLMs? Apparently only USD$ 3,000
by Georgia Jenkins
[Image: Anthropic's Claude ...]
It is Bartz that has sent copyright into overdrive, particularly the colourful nuggets of Judge Alsup – for example: “If Anthropic loses big it will be because what it did wrong was also big.”
In August 2024, authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson commenced copyright infringement proceedings against Anthropic, a software firm that offers an AI software service called Claude. The case hinged on Anthropic’s unauthorised use of pirated and purchased copies of books to create a central library for the purpose of training the large language models (LLMs) that underlie Claude. The library comprised both ‘traditional’ copies and versions called ‘data mixes’ which optimise training data and improve LLM performance.
At the height of the summer, Anthropic moved for summary judgment, citing fair use in relation to the pirated and purchased copies used for training LLMs and creating a permanent library.
Training copies
The authors’ first argument turned on the similarity of training LLMs to the creative process. They argued that the aim of training was to memorize their works’ creative elements or, put differently, to teach the model to read and write. In short, they cast it as an inherently human process that ‘should’ fall outside of the first factor of fair use, purpose and character. In stark contrast, Judge Alsup found that training is a ‘quintessentially transformative’ use, stating that: “The technology at issue was amongst the most transformative many of us will see in our lifetimes.”
The authors also argued that, even if this use was transformative, Anthropic engaged in extensive copying that was not strictly necessary. While entire books were copied, Judge Alsup found that the use of the training copies differs from the works’ ordinary use (e.g. reading). Additionally, although Anthropic demonstrated that it could have used a smaller set of books, in terms of output no portion of the works was exposed to the public. Though this process generates potentially competing works and could foreclose future licensing opportunities for authors, Judge Alsup pointed out that protection from competition is not a justification for copyright.
The central library
Anthropic’s library comprises digital copies of lawfully acquired print books, pirated digital books, and copies of each through data mixing:
1. Purchased library copies
Here the authors complained that Anthropic “destructively” changed the format from print to digital. However, as Anthropic destroyed the print versions, no new copies were created, and the process also eased storage and enabled searchability. This echoed cases like Google Books, Sony and Napster, which affirm that digitization falls outside the remit of the copyright holder’s interest, at least in certain specific cases. There was no issue as to the amount taken, as format shifting required the whole work.
2. Pirated library copies
Unsurprisingly, the pirated copies did not benefit from Anthropic’s argument that they had future potential to train LLMs. This is something that Anthropic, confusingly, also hinted at, stating that:
You can’t just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want. That would destroy the academic publishing market if that were the case.
[Image: ... and Claude Kat]
A (brat-ish) outcome
The case hung on the pirated copies and copies that were not (yet?) used to train LLMs. The former were not transformative, and the position on the latter was inconclusive due to lack of evidence. Both would require a trial.
But in a twist worthy of being deemed “AI copyright autumn”, Anthropic soon agreed to pay at least $USD 1.5 billion plus interest to authors (now a class action). As there are approximately 500,000 works, it amounts to $USD 3,000 per work. Anthropic has also agreed to destroy the pirated datasets and, in exchange, avoids litigation relating to conduct up to 25 August 2025.
Not one to be left out, Judge Alsup commented that he was “disappointed that counsel left important questions to be answered in the future”, and ended up postponing the class settlement and ordering the parties to address 34 questions (here and here) relating to the settlement. Many questions centre upon unpacking the approach to the settlement, particularly multiple claim scenarios (authors and/or publishers) and the potential “gamesmanship” of the process.
The parties’ joint response swayed Judge Alsup, as two weeks later he reportedly approved the settlement. Described as “the largest copyright recovery of all time”, anyone can register for the class action here if they believe that Anthropic may have downloaded their books from the pirated sources. However, to qualify, a book must have been downloaded before August 2022, have an ISBN or ASIN, and have been registered with the US Copyright Office before it was downloaded.
Some have already speculated about what authors will eventually receive, post legal fees, given the narrow qualifying criteria and multiple claim scenarios:
[T]raditionally published authors might see around $1,000-1,500 per book. Self-published authors who own their rights would keep more. Academic authors or others who signed away their rights might get nothing.
While this saga underlines the importance of licensing as a departure point from sticky copyright questions, one can’t help but think we have been launched into more chaos. It is worth highlighting that Anthropic is classed as a startup whose valuation has steadily increased following Claude’s release in March 2023 and which is backed by Amazon. It also raised $USD 13 billion in funding at a $USD 183 billion post-money valuation while nutting out the details of the settlement.
Bartz evidences a turning point in a post-AI world for copyright enthusiasts but, for Anthropic, alongside its supporters and competitors, perhaps this is simply the cost of doing business. Some have quoted former Google CEO Eric Schmidt’s comment last year that:
[I]f your product takes off, you “hire a whole bunch of lawyers to go clean the mess up,” because “if nobody uses your product, it doesn’t matter that you stole all the content.”
But, for this Katfriend, Mark Zuckerberg’s motto “move fast and break things” seems more appropriate. Only the thing that has been broken is the social and cultural value of human creativity, priced at $USD 3,000 per work (and only for copying pirated books).
So, Judge Alsup spoke for many when he stated while adjourning the hearing: “I’ve learned a lot.”
Reviewed by Eleonora Rosati on Saturday, September 27, 2025