When does AI infringe copyright?

Rise of the MaKatchines
IBM’s research lab recently announced that it had trained facial recognition software using a huge database of images taken from the photo-sharing website Flickr. The press coverage focuses on the data protection implications. But what about the copyright ones? 

Katfriend Oliver Fairhurst (Stewarts) investigates.

Here's what Oliver writes:

When does AI infringe copyright?
by Oliver Fairhurst

There has been much commentary on the ownership of works created by so-called “AI”. This post focuses instead on the risk of infringement, particularly during the AI ‘training’ process. 

The study of artificial neural networks (“ANNs”), which are a part of the broader concept of AI, is a complex and evolving area. ANNs imitate the way that the human brain operates, learning from examples, e.g. learning to identify pictures of cats (or even Kats) by showing the ANN lots of examples of pictures with and without cats / Kats. One key challenge in developing ANNs is ‘training’ them. This is typically done through the use of large data sets or other materials from which the ANN learns. 
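The “learning from examples” loop can be sketched in code. The following is a deliberately minimal illustration, a single artificial neuron (a perceptron) nudging its weights until it separates toy “cat” feature vectors from “not cat” ones; real ANNs stack many such units in layers, and the feature names here are invented for the example.

```python
# Minimal sketch of learning from labelled examples: one artificial neuron
# adjusts its weights whenever it misclassifies an example. Illustrative only.

def train(examples, labels, epochs=50, lr=0.1):
    """Perceptron-style training: nudge weights towards the correct labels."""
    n = len(examples[0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            pred = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            error = y - pred  # 0 if correct, +1 or -1 if wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Toy features, e.g. [whiskers, pointy_ears]: 1 = present, 0 = absent
cats = [[1, 1], [1, 0]]
not_cats = [[0, 0], [0, 1]]
weights, bias = train(cats + not_cats, [1, 1, 0, 0])
```

The important point for the copyright analysis is that the training data itself (here, the toy feature vectors; in practice, copied images) must be held and repeatedly processed for this loop to run at all.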

Your face or mine? 

The potential applications of machine learning are varied. Whether they know it or not, lawyers are already getting hands-on with machine learning. Litigators may know of “Technology Assisted Review”, an iterative process in which software is ‘taught’ to identify potentially relevant documents in disclosure. If the lawyer tags one type of document as relevant, and another as not relevant, the software learns from those choices.
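The iterative “teach by tagging” loop behind Technology Assisted Review can be sketched as follows. This is a hypothetical word-count scorer written for illustration, not any vendor’s actual method:

```python
# Illustrative sketch of Technology Assisted Review: a lawyer tags documents
# as relevant or not, and the model scores new documents based on those tags.
# The scoring scheme (word-frequency difference) is a simplification.
from collections import Counter

class RelevanceModel:
    def __init__(self):
        self.relevant_words = Counter()
        self.irrelevant_words = Counter()

    def tag(self, document, relevant):
        """The lawyer tags a document; the model learns from that choice."""
        target = self.relevant_words if relevant else self.irrelevant_words
        target.update(document.lower().split())

    def score(self, document):
        """A positive score suggests relevance, based on the tags seen so far."""
        words = document.lower().split()
        return sum(self.relevant_words[w] - self.irrelevant_words[w] for w in words)

model = RelevanceModel()
model.tag("supply agreement termination clause", relevant=True)
model.tag("office party invitation", relevant=False)
```

Each round of tagging refines the scores, which is what makes the process iterative.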

One of the most advanced and widely deployed applications is in facial recognition. While standing at passport control waiting for the gate to recognise your face can be frustrating, the advancements over the past decade are astonishing. 

Much of this development has come as a result of what the IBM paper calls “data-driven deep learning methods”, i.e. the feeding of vast amounts of data into ANNs and letting them learn from what they see. 

The IBM researchers point out the major flaw in this process. The machine learns based on what it is fed. Data sets are often biased, reflecting the data gathering choices of the authors and the availability of data. They are also often not representative. Rubbish in, rubbish out. So the researchers decided to find the biggest source of publicly available images they could, and turned to Flickr. 

Flickr allows users to permit third parties to use their works under the “Creative Commons” licence. Users can grant a free licence with various restrictions, e.g. requiring attribution, not permitting use for commercial purposes, and prohibiting the creation of derivative works. 

According to the paper, the researchers created a data set of images from Flickr that featured faces. They then “proceeded with the download only if the license type was Creative Commons”. Download would appear to mean copying in the copyright sense. 
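The kind of licence gate the paper describes can be sketched as a simple filter over image metadata. The metadata fields below are hypothetical (real photo-sharing APIs expose licence information differently), and note that passing this check does not settle the copyright question: a Creative Commons licence may still carry non-commercial or no-derivatives restrictions that bar the intended use.

```python
# Sketch of "proceed with the download only if the license type was
# Creative Commons". Metadata structure is invented for illustration.

def should_download(image_metadata):
    """Proceed only if the licence string identifies a Creative Commons licence."""
    licence = image_metadata.get("license", "")
    return licence.startswith("CC")

images = [
    {"id": "a1", "license": "CC BY-NC 2.0"},
    {"id": "a2", "license": "All rights reserved"},
]
to_fetch = [img["id"] for img in images if should_download(img)]
```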

To be clear, I am not querying the legality of the IBM project itself for a number of reasons, including that the researchers relied on the Creative Commons licence, the paper does not go into detail on the technical methodology required to assess the copyright position, and the research took place in the USA. But what is the position with regard to training AI using third party copyright works in the UK generally? 


To start, s.16(2) and (3) of the Copyright, Designs and Patents Act 1988 (“CDPA”) provide that copyright in a work is infringed by a person who without authorisation does any of the restricted acts contained in the CDPA. One of those restricted acts is copying: s.17(2) CDPA provides that “Copying in relation to a literary, dramatic, musical or artistic work means reproducing the work in any material form. This includes storing the work in any medium by electronic means.” Section 17(6) states that, “Copying in relation to any description of work includes the making of copies which are transient or are incidental to some other use of the work.” 

There appear to be three main ways in which training an ANN could infringe copyright through copying:
  1. By copying the works onto a memory drive in preparation for training the ANN; 
  2. By processing the works on computer hardware during the training process; and 
  3. Potentially, through the creation by the ANN of a derivative work that reproduces elements of an original work. 

In relation to (3), the question will come down to whether the ANN-created work reproduces a substantial part of the earlier work on which the ANN was trained. This is a qualitative test: does the new work reproduce elements that are an expression of the intellectual creation of the original work’s author (see, e.g., Infopaq C-5/08)? 

Liable ... or not?
There are three potentially relevant defences to (1) and (2): 

First, s.28A CDPA provides that “Copyright in a literary work, other than a computer program or a database, or in a dramatic, musical or artistic work, the typographical arrangement of a published edition, a sound recording or a film, is not infringed by the making of a temporary copy which is transient or incidental, which is an integral and essential part of a technological process and the sole purpose of which is to enable— (a) a transmission of the work in a network between third parties by an intermediary; or (b) a lawful use of the work; and which has no independent economic significance.” 

The logic behind this defence (which originates in Article 5(1) of Directive 2001/29/EC, i.e. the “InfoSoc Directive”) is that otherwise the reproduction right could prevent reasonable uses of works, such as transmission through computer networks or browsing the internet. 

Second, s.29(1) CDPA provides that “Fair dealing with a work for the purposes of research for a non-commercial purpose does not infringe any copyright in the work provided that it is accompanied by a sufficient acknowledgement”. 

Third, s.29A CDPA provides that copyright in a work is not infringed by a person who has “lawful access to the work” and performs “computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose”. 

Is training an ANN non-commercial? Where the use is research for research’s sake, the answer is probably yes. It is less clear cut where the output of the research is a commercially valuable product. In some cases, ANNs may compete with the authors of the original works, e.g. artists and musicians. The authors of Copinger also doubt the application of the defences that rely on the purpose being non-commercial where it “is contemplated or intended [that the research] should be ultimately used for a purpose which has some commercial value”. 

Is the use lawful? Recital 33 to the InfoSoc Directive explains that “A use should be considered lawful where it is authorised by the rightholder or not restricted by law.” With the IBM example, and putting aside the reliance on the Creative Commons licence, some have queried whether the use is in accordance with data protection legislation. One might also question whether the “sole purpose” of the copying is to enable a lawful use of the work, or whether enabling that use is merely one purpose among others. And does the use lack independent economic significance, or could the body of works have been licensed for the purpose? These questions are yet to be answered directly by the case law, and they underline how important it is for those developing these new technologies to consider the legal implications of their proposals. 


Whether you fall into the “computers will rule the world” camp or think that all this talk of AI is overblown, the development of machine learning is here to stay. It has incredible potential, from assisting healthcare practitioners in making better decisions to helping lawyers and their clients keep a handle on costs. 

The law must set out tolerably clear lines that researchers and businesses can understand and navigate. For my part, it does seem that the UK position at least is reasonably clear. However, the dial is set firmly in favour of copyright owners and against commercial entities looking to develop machine learning. Is this right? 

There is no easy answer. On one hand, the risk of IP infringement claims hampers the successful development of machine learning. On the other hand, the developers of such applications cannot ignore the rights of those who have toiled in the creation of the very works they wish to use. Perhaps copyright law should protect human authors from computers that can gobble up their creative works and spit out an infinite number of new works. Or perhaps copyright law should foster innovation by limiting the exclusive rights granted to those who create. 

These are arguably value judgements rather than legal ones. However, absent any significant tinkering with copyright defences (e.g. by permitting research for commercial purposes), the only chance of a loosening up in this area will come from court decisions. So far there have been few (if any) copyright infringement claims relating to the use of machine learning. However, as the industry grows, this is unlikely to remain the case.
Reviewed by Eleonora Rosati on Wednesday, March 20, 2019


  1. Kind of strange that this post doesn't mention the proposed defences relating to text and data mining in Articles 3 and 3A of the final draft Copyright Directive. It is not just Articles 11 and 13 of the Directive that contain important provisions! I guess that may be academic in terms of UK law, as the Directive is unlikely to be implemented pre-Brexit, but it does address the balance (and offers greater freedom for non-commercial research than for commercial projects, which, to my view, must be right).

    1. I take your point. Though the benefit of the new exemptions will be limited to non-profits, which does not address the point I am making about whether this gives enough freedom for the private sector. There are also lots of uncertainties over exactly what impact the directive will have (if it gets through), when/how/if it is implemented in the UK and how some of the concepts are interpreted by the CJEU. Oliver

