Archiving online citations: are we all Americans now?

As an occasional academic himself, the IPKat takes great interest in the issue raised in the following request for information from his friend Susan Hall (Cobbetts), who writes to him as follows:

"As citing online sources in academic papers becomes more acceptable, one issue which is becoming more important is the fragile nature of online materials and the obvious concern when reviewing or answering papers which cite online sources, that such sources remain accessible for verification and follow-up. A tool called WebCite, offered here, purports to offer an answer, allowing people who cite online sources to cache them on the WebCite servers, on the terms set out on the site in question.

However, this seems to produce a number of interesting IP implications on both sides of the Atlantic (incidentally, the examples of archiving of site uses are drawn from the Guardian).

The advertised service appears intended to protect people using online sources in academic research from dead links and future take-downs, but there seem to be difficulties in fitting the WebCite model into a UK copyright framework. Further, although the WebCite FAQs put forward a justification based on fair use under US law, this is itself not devoid of problems:

"Caching and archiving webpages is widely done (e.g. by Google, Internet Archive etc.), and is not considered a copyright infringement, as long as the copyright owner has the ability to remove the archived material and to opt out. WebCite® honors robot exclusion standards, as well as no-cache and no-archive tags. Please contact us if you are the copyright owner of an archived webpage which you want to have removed.

A U.S. court has recently (Jan 19th, 2006) ruled that caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL, see also news article on Government Technology). Implied license refers to the industry standards mentioned above: If the copyright holder does not use any no-archive tags and robot exclusion standards to prevent caching, WebCite® can (as Google does) assume that a license to archive has been granted. Fair use is even more obvious in the case of WebCite® than for Google, as Google uses a “shotgun” approach, whereas WebCite® archives selectively only material that is relevant for scholarly work. Fair use is therefore justifiable based on the fair-use principles of purpose (caching constitutes transformative and socially valuable use for the purposes of archiving, in the case of WebCite® also specifically for academic research), the nature of the cached material (previously made available for free on the Internet, in the case of WebCite® also mainly scholarly material), amount and substantiality (in the case of WebCite® only cited webpages, rarely entire websites), and effect of the use on the potential market for or value of the copyrighted work (in the case of Google it was ruled that there is no economic effect, the same is true for WebCite®)." (FAQs)

Asks Susan, "Is anyone doing any work on archiving of online sources and the legal issues entailed?" If so, she -- and, out of sheer curiosity, the IPKat -- would love to know.

7 comments:

Anonymous, Tuesday, 29 June 2010 at 18:22:00 GMT+1
"Please contact us if you are the copyright owner of an archived webpage which you want to have removed."

This introduces the same lack of control that renders reliance on services such as the Internet Archive legally questionable: material can disappear at the copyright owner's whim.
Anonymous, Tuesday, 29 June 2010 at 22:10:00 GMT+1
Try these guys: http://www.webarchive.org.uk/ukwa/
Francis Davey, Tuesday, 29 June 2010 at 23:06:00 GMT+1
An alternative would be to encourage academics from other fields to adopt the highly successful model of arXiv. Quality appears to be very high (peer-reviewed journals are not free from error) and the physics community in particular have been working this way for a long time.

Consensual sharing strikes me as being a more workable solution, given varying copyright laws around the world, than something like this. Of course that doesn't deal with older material, but that is a problem anyway.

I'd be interested to know what US lawyers think of the fair use argument.
Lakanal, Tuesday, 29 June 2010 at 23:25:00 GMT+1
A case can be made that the ambiguity of the US fair use doctrine allows entrepreneurs of all stripes to try out new business models for at least as long as it takes for a copyright owner to work out what the law is and bring an action to judgment. Even from a US perspective, WebCite's justification is magnificent in its preposterousness. No caching is taking place. No "transformative use" is taking place. And Berne's three step test? A joke to those who believe in a society based on law; meat and drink to Google and co.
Javi, Wednesday, 30 June 2010 at 12:16:00 GMT+1
There was a similar case in Barcelona regarding Google's cache system.

http://www.linksandlaw.com/news-update60-megakini-case-spain.htm
Charles Oppenheim, Wednesday, 30 June 2010 at 12:53:00 GMT+1
The only person, I think, who is researching these issues is Adrienne Muir from Loughborough University; she knows all there is to know about legal issues of archiving web pages. Incidentally, it may be acceptable under US copyright law, but it is definitely illegal under UK law, so the WebCite service is not recommended for UK folk!
Maximilian Schubert, Thursday, 1 July 2010 at 14:04:00 GMT+1
Art 43b (1) of the Austrian "Mediengesetz" explicitly allows the Austrian National Library to archive websites under the Austrian top-level domain (.at) or websites which contain content that is related to Austria.
Web@rchive Austria
The data can be accessed only from inside the National Library through a single terminal. You may look at the website; copying is not allowed, but print-outs are possible.

Although the service is seriously limited (one user at a time), I have to say that it works surprisingly well, and I think it's just a matter of time until it is extended to more frequent "harvests" (only two so far) and more depth (only a few MB per site so far). The fact, however, that users cannot copy but only "print" websites ... should serve as a reminder that Austrian copyright obviously & desperately needs an overhaul.
