A few days ago, the District Court of Hamburg delivered what appears to be the first judgment in Europe on the construction and application of the national transpositions of the text and data mining (TDM) exceptions found in Arts. 3 and 4 of the DSM Directive (310 O 227/23).
As reported on The IPKat and elsewhere, the Hamburg court ruled that LAION could rely on the exception found in Section 60d UrhG (TDM for scientific research purposes), the provision through which Germany transposed Art. 3 of the DSM Directive into national law.
So far, commentary on the decision has mostly focused on: (1) the court’s construction of relevant notions in the EU/German provision, notably ‘text and data mining’, ‘scientific research’ and the possibility for ‘research organizations’ (like LAION) to collaborate with commercial partners; (2) the court’s remarks on the possibility of reserving rights under the German equivalent of Article 4(3) of the DSM Directive; and (3) the interplay between TDM and Artificial Intelligence (AI) training, including in light of the AI Act.
A fundamental aspect of the decision that deserves greater attention is that the court’s analysis is incomplete. As such, it may not provide good guidance for either the stakeholders concerned or other courts in Europe faced with questions of unlicensed TDM and subsequent AI training.
Specifically (and likely because of how the plaintiff photographer pleaded the case), the court failed to consider that the TDM exception for scientific research does not cover all of LAION’s activities as described in the judgment itself. Notably, once its TDM activities were complete, LAION made the resulting dataset publicly available for anyone to use for any purpose, including commercial AI training.
The scope of TDM exceptions: extraction and reproduction
Like the corresponding EU provisions, the German sections considered by the court are exceptions to specified restricted acts under copyright and other rights:
- Like Art. 5(1) of the InfoSoc Directive, Section 44a UrhG (Temporary acts of reproduction), which was held inapplicable, only encompasses acts of reproduction;
- Like Art. 4 of the DSM Directive, Section 44b UrhG (TDM), whose applicability in the case at hand (while doubtful) the court found it unnecessary to decide, only encompasses acts of reproduction and, insofar as the sui generis database right is concerned, extraction;
- Like Art. 3 of the DSM Directive, Section 60d UrhG (TDM for scientific research purposes) is limited to acts of extraction and reproduction.
As explained in an earlier IPKat post, the LAION dataset is a table document that contains nearly 6 billion seemingly non-curated hyperlinks to publicly accessible images or image files on the internet, together with further information about the corresponding images, including a description that conveys the content of each image in text form. The dataset can be used for various purposes, including AI training, by downloading the images referenced in the whole dataset or in a subset of it.
The image below shows the results of a LAION dataset search for ‘blue cat’:
The copyright relevance of subsequent restricted acts
Even assuming that the German court was correct in holding that the German equivalent of Art. 3 of the DSM Directive applied to LAION’s own TDM activities, once those activities were concluded LAION performed further restricted acts that fall outside the scope of any TDM exception (whether under Art. 3 or Art. 4 of the DSM Directive).
By creating a dataset and making it available to the public, LAION performed (1) a potential new act of reproduction (by uploading copies of protected content to the dataset) and (2) an act of communication/making available to the public (by making the dataset publicly accessible on the internet).
(1) New act of reproduction
Insofar as reproduction is concerned, the fact that it appears possible to search the LAION dataset and retrieve full-size pictures means that new actionable acts of reproduction under Art. 2 of the InfoSoc Directive (as transposed into German law) might have been performed.
It is worth recalling that the specific type of content at issue here, that is, photographs, can be protected through copyright (if the pictures are sufficiently original) but also through national related rights (irrespective of their originality), in accordance with the freedom afforded to EU Member States under Art. 6 of the Term Directive, which Germany (like several other EU Member States) exercised. As a result, copying even a seemingly simple picture may trigger the right of reproduction.
(2) Act of communication/making available to the public
Turning to communication/making available to the public, this right is engaged both by the public display of the pictures on the LAION dataset and by the provision of links to the third-party websites where the pictures are hosted.
Under EU copyright law, since the 2014 Svensson decision and subsequent case law [IPKat here], the provision of a link to protected content can be actionable under Art. 3 of the InfoSoc Directive, even if the link provider does not pursue a profit-making aim. Think of GS Media [IPKat here] and the relevance of the link provider’s own knowledge, even where the link in question might have been provided for non-profit purposes.
More recently, in VG Bild-Kunst [IPKat here], the CJEU further confirmed that linking to protected content may be restricted not only through technical means (e.g., a paywall), but also, under certain conditions, through contractual terms.
In sum
In light of the foregoing, performing restricted acts covered by applicable TDM exceptions is one thing; performing further restricted acts, including those preparatory to the subsequent training and offering of AI models, is another.
Hence, statements of the court such as “whether the dataset … is also used by commercial companies for the training or further development of their AI systems is irrelevant because the research of commercial companies is still research” are problematic because they oversimplify, and ultimately misunderstand, both (i) the steps needed to move from LAION’s own TDM activities to the subsequent training and offering of AI models and (ii) the relevance and treatment of those steps under copyright.
While the AI Act, as the court noted, recognizes the relevance of TDM to AI training, it does not say – as the court appeared to imply instead – that TDM is synonymous with AI training or that everything in-between TDM and AI training is covered by Arts. 3 or 4 of the DSM Directive.
Other aspects
The judgment is problematic for other reasons too.
Conflating the TDM exception with AI training reveals another problem in the reasoning. While the court correctly considered the need to construe the TDM exception in light of the three-step test (as per Art. 7(2) of the DSM Directive), it erred in using the test to interpret the TDM exception so as to allow the blanket training of AI models, on the view that the TDM exception in Art. 3 would otherwise be devoid of meaning. The three-step test in Art. 7(2) serves to construe, inter alia, Arts. 3 and 4 of the DSM Directive, not to extend their scope to additional acts restricted by copyright that a research organization or others may perform.
While there is little doubt, as the court also noted, that AI training was already understood to be a possible application of TDM when the TDM exceptions were adopted in 2019, the court oversimplified matters by failing to consider that restricted acts other than those covered by the TDM provisions would be performed in the scenario at hand.
An additional problematic aspect of the ruling is that, despite its remarks on rights reservation under Art. 4(3) of the DSM Directive, the court appeared to treat the concept of ‘lawful access’ (a requirement under, inter alia, the TDM exceptions) as synonymous with ‘public accessibility’. These are different things: for example, posting an image on a website for everyone to see makes that image publicly accessible, but not necessarily lawfully accessible, since the person who posted it might have done so without the consent of the relevant rightholder. Cases like GS Media (linking to publicly accessible yet unlawfully posted Playboy pictures) or Renckhoff (use of a photograph that had been posted on a website only for use there) come to mind.
The fact that the plaintiff’s watermarked preview photograph was publicly available on a website does not mean that anyone could use it for any purpose. Holding otherwise would also run counter to the characterization of copyright’s exclusive rights as preventive in nature, and to the rule that the right of communication/making available to the public is not exhausted (Art. 3(3) of the InfoSoc Directive).
Conclusion
The Hamburg decision offers some valuable guidance regarding TDM and AI training, including observations – like those concerning rights reservation – that might prove helpful in future cases.
That said, the judgment appears incomplete in some key respects, notably in failing to address, let alone answer, the question of the relationship between TDM and AI training. Above all, it is flawed insofar as it does not acknowledge the limits on the scope of Art. 3 of the DSM Directive (as well as of Art. 4). Further guidance is therefore needed to address the interplay between TDM and AI training correctly.
Comments
As far as I know, LAION doesn't host the pictures, so I don't understand how whatever interface they offer on their site could possibly contain a reproduction by LAION. To the extent that the pictures are reproduced while the dataset is being created, the reproductions are immediately and irrevocably deleted once the content extraction process is finished (according to LAION).
I think what happens is that their demo web interface embeds (embedded? I can't get it to work) pictures from their original locations to showcase how nicely their dataset works. As far as the "actual purpose" of their dataset goes, however, LAION (only) provides a toolbox for mass-downloading the files (e.g. the "img2dataset" tool) to enable the training process; it never supplies the images, only the URL/content pairs. Unlike the author of this post, I don't think that any "uploading [of] copies of protected content on the dataset" ever actually occurred.
Therefore any infringements that may have potentially occurred in the demo interface seem to be logically confined to the realm of the communications right. However, the only thing plaintiff asked for was to enjoin LAION from "reproducing or causing others to reproduce the following photograph [...] for the purpose of creating AI training data sets" (paras 30f of the decision on openjur). So it makes sense to me that the court wouldn't get into any such issue.
All anyone could learn from the "Act of communication/making available to the public" arguments, and from the relevant articles even on this site, is that the decisions holding that *linking* can infringe rights created a veritable legal minefield. Most people publishing on a functional internet plow right through it anyway, and only when a special interest pops up is it wielded as a surgical blade, with little regard for where or how it will end up cutting.
Quote:
"While the AI Act, as the court noted, recognizes the relevance of TDM to AI training, it does not say – as the court appeared to imply instead – that TDM is synonymous with AI training or that everything in-between TDM and AI training is covered by Arts. 3 or 4 of the DSM Directive."
That's not what the court implied. The court does know the difference between TDM and AI training. But the court also knows that the EU AI Act sets rules on how TDM can be used; it specifically mentions Article 53(1)(c) of the EU AI Act:
"Providers of general-purpose AI models shall put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;"
This part of the EU AI Act would not make sense if the TDM exceptions in the DSM Directive did not apply to the use of a dataset intended for future AI training. It would not be mentioned in the EU AI Act at all unless it was meant to give guidance to providers of general-purpose AI models, and the very first sentence of that provision is addressed precisely to those providers.
It seems that many of the concerns raised in this post fall outside the scope of the judgment. LAION doesn’t host the images, meaning there’s no additional reproduction beyond the initial processing. The image in question was hosted on a site with the right holder’s consent and was made publicly available without any restrictions. Therefore, LAION had “legal access” to the image, and linking to it was legal under the GS Media ruling.
It’s difficult to see why the judgment would be deemed so “problematic” when the issues highlighted by the author extend beyond the case.
The point on "TDM is not AI training" is of course strictly speaking correct, as TDM is a precursor to AI training. However, the TDM exception makes little sense on its own: you don't mine data only to do nothing with it. Back during the copyright directive debate we always spoke about "big data analysis", the buzzword at the time, but ultimately it comes down to the same thing: identifying useful patterns in the data, which is exactly what happens during AI training. What you then do with the knowledge of these patterns is secondary: you can use them as a pattern for the wallpaper in your living room or feed them into an image or text generation tool. The only question then is whether the purpose is commercial or not.