[GuestPost] How the European Patent Office uses AI to facilitate patent searches

The AmeriKat has the t-shirt...now what?


In the second of a series on AI and patents from our KatFriends at GJE, Kate Voller reports on a recent CIPA webinar with the EPO on how the EPO is leveraging AI tools in examination, with the key message of "assisting", not "replacing", examiners.

Over to Kate for the report:

"The European Patent Office (EPO) has embraced artificial intelligence (AI) to enhance the efficiency of its patent document searching process. In a recent CIPA webinar, Alexander Klenner-Bajaja of the EPO explained how the EPO leverages AI tools to support examiners, increasing productivity and improving the quality of patent searches.

We learnt that at the core of the EPO’s AI integration are several specialised tools designed to streamline the search process. Unsurprisingly, natural language processing (NLP) and machine translation technologies help translate and interpret the often complex language and claim-specific syntax of patent documents. Computer vision is another key tool, using machine learning and neural networks to interpret and analyse visual content in patent documents, including figures, tables, and other graphical elements. This AI-powered technology automatically extracts information from graphical elements that a purely text-based search would often overlook.

The EPO was keen to emphasize that its AI tools are intended to assist, not replace, human examiners. While AI manages vast amounts of data, the human element remains crucial for final decisions. Examiners retain responsibility for the final review of relevant documents, ensuring their expertise and judgment remain central to the patent search and examination process.

A key advancement in the EPO’s AI efforts came in 2020 with the introduction of the EP-AutoCla model, an AI-powered classification system. The classification scheme itself is complex: it is structured hierarchically into a number of main sections, each divided into classes, groups, and subgroups. EP-AutoCla automatically classifies patent applications, relieving examiners of the time-consuming job of classifying documents manually.
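
To make the hierarchy concrete, here is a minimal Python sketch that splits a classification symbol into its levels, broadest first. It assumes CPC-style symbols such as "H04L 9/32", a layout the report itself does not spell out. The names `Symbol` and `parse_symbol` are illustrative only, and nothing here reflects how EP-AutoCla represents classes internally.

```python
import re
from typing import NamedTuple


class Symbol(NamedTuple):
    section: str     # e.g. "H"
    class_: str      # e.g. "H04"
    subclass: str    # e.g. "H04L"
    main_group: str  # e.g. "H04L 9"
    subgroup: str    # e.g. "H04L 9/32"


# Assumed layout: section letter, two-digit class, subclass letter,
# main group number, then "/" and the subgroup number.
_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_symbol(symbol: str) -> Symbol:
    """Split a CPC-style symbol into its hierarchy levels, broadest first."""
    match = _PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised classification symbol: {symbol!r}")
    section, class_digits, subclass_letter, group, subgroup = match.groups()
    class_ = section + class_digits
    subclass = class_ + subclass_letter
    main_group = f"{subclass} {group}"
    return Symbol(section, class_, subclass, main_group, f"{main_group}/{subgroup}")


if __name__ == "__main__":
    # Print each level of one symbol, from section down to subgroup.
    for level in parse_symbol("H04L 9/32"):
        print(level)
```

Each level is a prefix of the next, which is why a classifier can be right at the section or class level while still being unsure further down the tree.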

It was interesting to learn that training the EP-AutoCla model presented some challenges, particularly due to sparse data in the lower branches of the classification tree. Manual work was required to enhance the training dataset at these lower levels to ensure accuracy of the training process. The model uses supervised machine learning and was trained on a dataset of 8 million manually classified documents, with a further 800,000 documents used for testing. Now fully integrated into the EPO's search engine, EP-AutoCla suggests classifications and provides a confidence score indicating the likelihood of correctness.
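
The idea of suggesting classes together with a confidence score can be illustrated with a softmax over per-class scores. This is only a sketch under assumptions: the report does not say how EP-AutoCla computes its confidence, and the function `suggest_classes`, the raw scores, and the class symbols below are made up for illustration.

```python
import math


def suggest_classes(scores: dict[str, float], top_n: int = 3) -> list[tuple[str, float]]:
    """Turn raw per-class scores into the top_n suggestions with softmax confidences."""
    if not scores:
        return []
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    # Highest-scoring classes first.
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(label, weight / total) for label, weight in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical raw model outputs for one application.
    raw = {"H04L 9/32": 2.1, "G06F 21/60": 1.4, "H04W 12/06": 0.2, "A61K 9/00": -3.0}
    for label, confidence in suggest_classes(raw):
        print(f"{label}: {confidence:.2f}")
```

A softmax forces the confidences to sum to one across classes. Since a patent application can belong to several classes at once, a real system might instead score each class independently; the sketch only shows how a ranked suggestion list with confidences could look.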

The EPO's AI-enhanced search process involves several steps to narrow down relevant documents for examiners. Vector space modelling, which converts documents into vector representations, allows comparisons based on conceptual similarity rather than just keyword matching. This helps narrow down millions of potential documents to a manageable number for examiners to review. A k-nearest neighbours (k-NN) algorithm then generates a shortlist of highly relevant documents, including documents that don’t share identical keywords with the new patent application but are conceptually similar to it. Examiners review these shortlisted documents to finalize the search results. The success of this process is measured by whether at least one highly relevant "X citation" document appears in the pre-search results produced by the k-NN algorithm. In around 60% of cases, the top 80 documents generated include an "X citation", which gives a sense of the system's accuracy. Though computationally intensive, this process saves examiners significant time, allowing them to focus on only the most relevant documents.
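
The pipeline described above can be sketched in a few lines of Python. This is a toy illustration under assumptions, not the EPO's implementation: plain term counts stand in for the vector representation (so, unlike the EPO's model, this version still depends on shared words), cosine similarity is the assumed similarity measure, and the names `vectorise`, `nearest`, and `hit_rate_at_k` are hypothetical. The last function computes the success measure mentioned in the report: the share of searches whose top-k shortlist contains at least one document marked as an X citation.

```python
import math
from collections import Counter


def vectorise(text: str) -> Counter:
    """Toy vector representation: lower-cased term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query: str, corpus: dict[str, str], k: int) -> list[str]:
    """Return the ids of the k corpus documents most similar to the query."""
    q = vectorise(query)
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine(q, vectorise(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def hit_rate_at_k(shortlists: list[list[str]], x_citations: list[set[str]]) -> float:
    """Fraction of searches whose shortlist contains at least one X citation."""
    hits = sum(1 for shortlist, xs in zip(shortlists, x_citations) if xs & set(shortlist))
    return hits / len(shortlists) if shortlists else 0.0


if __name__ == "__main__":
    # Made-up mini corpus; a real search runs over millions of documents.
    corpus = {
        "D1": "battery cell with lithium anode and ceramic separator",
        "D2": "method for brewing coffee using a pressurised filter",
        "D3": "lithium battery pack with cooling plate",
    }
    shortlist = nearest("lithium battery with ceramic separator", corpus, k=2)
    print(shortlist)
    # Suppose an examiner cited D1 as an X document for this search.
    print(hit_rate_at_k([shortlist], [{"D1"}]))
```

Read this way, the reported figure would correspond to `hit_rate_at_k` returning roughly 0.6 with k set to 80.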

The EPO is continuing to develop AI-driven features to further enhance patent searches and legal research. One upcoming feature is a figure content analysis model that identifies reference signs in prior art figures and maps them to corresponding text in patent descriptions, enabling more precise figure analysis not just based on pixel data but on the content represented in the images, regardless of orientation or style. A similar model is being designed for chemical formulae.

In legal research, the EPO is working on an interactive platform, similar to ChatGPT, but specialized for legal documents. This tool will answer questions about case law and legal texts, providing evidence and citations to support its responses and minimize hallucinations. We were also excited to learn that a new version of Espacenet is being developed that will allow users to perform natural language queries, such as "find patents by company X about concept Y."

Data privacy is a common concern when using AI with patent applications, particularly regarding whether uploading documents to an AI model constitutes a public disclosure. The EPO clarified that its classification model should not be used for unpublished documents, as it runs on a third-party cloud platform. However, the internal AI models, which operate on private servers, can be safely used for all document types, including newly filed and unpublished applications.

In conclusion, the EPO’s integration of AI marks a significant evolution in how patent documents are searched, classified, and analysed. While AI automates many aspects of the process, human examiners remain essential. The EPO’s AI tools help examiners manage growing volumes of data, making patent searches more efficient, accurate, and comprehensive. By continuously developing new AI applications, the EPO is setting a new standard for the future of patent examination."

Reviewed by Annsley Merelle Ward on Wednesday, November 13, 2024

2 comments:

  1. I've played around with the AI search tools of a number of companies and not found them to be particularly great. The issues are:

    1) "AI search tools" that appear to just be processing your request to identify keywords and then carrying out a keyword search. You might as well just input keywords into a keyword search.
    2) "Summaries" that appear to be just a randomly-selected paragraph from the text of the patent specification.
    3) "Ask this document a question" is actually bad, since you need to actually look at what is said in context.
    4) A general inability to handle the importance of drawings being looked at alongside the text.
    5) Systems which make requests to e.g., OpenAI, resulting in them being very slow.

    AI search is a developing field, but it just isn't where it needs to be to actually assist a professional searcher.

  2. That the EPO takes new technologies into account is, as such, nothing surprising. That AI can be envisaged as promising is not surprising either.

    As with any machine learning model, the training data are of utmost importance. Otherwise a bias could be introduced without anyone even noticing it. How has it been assured that no bias has been introduced?

    The following statement is questionable: “The success of this process is measured by whether at least one highly relevant "X citation" document appears in the pre-search results produced by the k-NN algorithm.”

    Why is a search delivering at least one highly relevant X citation in the pre-search results the guarantee of a good search? A good search is not necessarily a search revealing a highly relevant X citation. A search not revealing any X document but only Y or A citations can also be a good search.

    An electronic search can be good at finding highly relevant documents for a lack of novelty, but just as important are good documents for assessing inventive step, in other words Y citations, preferably in pairs. This is where electronic searches have their biggest drawback: finding documents which can be combined for assessing inventive step.

    Not all X citations can be combined in such a way as to enable a good argumentation against inventive step. It is easy to draft a novelty objection, and afterwards to overcome it, but way more difficult to find the documents allowing a proper argumentation on inventive step. This is why the Y category has been created. Those documents are the most difficult to find in an electronic search.


