Wednesday, 6 April 2011

A Treasure-Trove of Travaux

The sun is shining, the birds are singing (the Kat assumes. To be honest, it’s mostly the song of traffic that is reaching his ears at the moment, but he likes to imagine that today is the type of day on which birds would, if the mood took them, sing and that he would therefore hear them if the passage of vehicles on the Euston Road was not drowning them out – he’s a country Kat at heart) and the Kat has been perusing the EPO’s website. He was delighted to stumble upon (although if he had read the press release that was posted on the EPO’s website on Monday then he might have found it by less serendipitous methods) a most significant repository for anyone interested in the history and development of the EPC.

Buried within the archive of the legal texts sub-menu of the law and practice section of the website lies treasure (and treasure it most certainly is): The Travaux Préparatoires of the EPC 1973. All of the historical documentation relating to the 178 Articles of the Convention and 106 Rules of the Implementing Regulations, all lovingly scanned and bundled in .pdf format, can now be accessed at the click of a button. However, English monoglots beware – while most of the material is available in the three official languages of the Convention, there is a stern warning that “The documents produced before 1969 cannot be provided in English as this was not an official language in the period before that date. These documents therefore are provided in French and German”. It is also worth noting that the scanned documents are saved as .pdf images and have not been put through OCR software, so no free-text search is possible, and the bounties of any online translation tool will also be withheld unless you are willing to transcribe the documents first. Caveats aside, if you are interested in the provenance of the categories of excluded subject matter, the voting rights under the Convention, or the Rule on the form and content of the claim, or indeed any other of the remaining 281 Articles and Rules, then you need look no further. A treasure-trove of information awaits.

Happy reading!


AndyJ said...

Just a brief note on the last item (Travaux). I'm sure you know that .pdf documents viewed in Adobe Reader or OpenOffice can be searched for text strings using the 'find' option.

Anonymous said...

.pdf is a flexible file format that allows both graphic and text elements.

The TP docs I have accessed contain only graphic elements, which cannot be searched via the [find] function, nor can the text be cut and pasted as text.

Matt is right, these would need to be passed through an optical character recognition (OCR) package to generate the text elements.

Anonymous said...

@AndyJ - only PDFs with embedded text data can be searched. These do not have embedded text data; they are purely images.

Ron said...

There are at least two different formats of PDF files.

The original PDF encoding saves the file as an image, and all you can do with these is select and save part of the file as an image: searching is not possible.

A later implementation involves character recognition and an encoding technique that produces an image file while retaining the individual character information. This format does allow searching by text string and selection and copying of text in word-processable form, as opposed to an image.

The EPO's more recent published decisions, for example, do allow searching in this way, but all the "Travaux" files that I have dipped into have been in the old PDF format and are not searchable by text string.

I believe that there are applications that allow the text in the original PDF Image format to be recognised for word processing, but this functionality is not provided in the standard Adobe Acrobat reader.

Anonymous said...

Super, this means that I can now let go of about 3' of paper that I obtained from 1973 onwards, courtesy of the Max Planck Institute in Munich, and have kept ever since. Only once did I really need those documents! But having them gave a nice feeling.

George Brock-Nannestad

