A Big Bang is coming, or at least this is how the World Intellectual Property Organization (WIPO) has billed the entry into force of the new sequence listing standard, ST.26, on 1 July 2022.
There is an international requirement for the DNA, RNA and protein sequences disclosed in a patent application to be provided in the form of a sequence listing. Sequence listings are then used by patent offices to search for the listed sequences. At the moment, the format of sequence listings is governed by ST.25. On 1 July 2022, all patent offices (international and national) will transition to using the new ST.26 format. All patent applications containing sequences filed on or after this "big bang" date will have to comply with ST.26.
[Image: DNA to amino acid code]
How to prepare ST.26 sequence listings
The introduction of ST.26 has been preceded by the release of new purpose-built WIPO software for preparing ST.26 sequence listings: the prosaically named WIPO Sequence. Whilst there was no absolute requirement for ST.25 sequence listings to be prepared using specialist software, using the free software from the USPTO and the EPO (PatentIn and BiSSAP respectively) considerably sped up the process. The more complex XML format of ST.26 renders the use of the new WIPO software for ST.26 sequence listings essential. From this Kat's quick trial of WIPO Sequence, the software is fairly straightforward to use (and thankfully slightly more user-friendly than PatentIn).
WIPO Sequence can be used to convert old ST.25 TXT sequence listings into the new ST.26 XML format. Patent offices will not do this conversion automatically, and so this responsibility will fall to the applicant. The need to convert ST.25 sequence listings will be particularly relevant for divisional applications filed on or after the big bang date, given that ST.26 will apply to these cases (at least for the USPTO and EPO; see the comment on the requirements for the UKIPO below). The WIPO software can also be used to check for errors in an ST.26 sequence listing, and to view ST.26 XML files in a human-readable format within a browser.
Increased ease-of-access to patent application sequences?
The purpose of changing sequence listings to an XML format is to improve access to the global repository of sequences contained in patent applications. However, the change may in fact have a negative impact on how easy it is to view the sequences from a particular published patent application. In particular, the change from a TXT to an XML format has significant implications for the human readability of sequence listings. The new XML sequence listings include all of the XML coding elements, making it almost impossible to decipher the raw XML file itself. The XML file must be loaded into the WIPO Sequence software in order to be viewed in a human-readable format in a browser window, given that simply opening the raw XML file in a browser does not produce a readable listing.
A factor impacting the accessibility of patent application sequences to third parties will be the form in which the new sequence listings are published by individual patent offices. In terms of public access, it would be preferable for the XML files themselves to be published on the relevant patent registers. A PDF copy of the XML file, e.g. included in the PCT publication, will be difficult (if not impossible) for third parties to convert into a human-readable format, even if they have access to WIPO Sequence.
Even if the XML files are provided on the public register, third parties will need to download the XML sequence listings and run them through the new software before a sequence can be extracted. Whilst most biotech patent attorneys can be expected to have the necessary software, the lay scientist may not. As such, there is a risk that the non-human-readable XML format may reduce ease of access to the sequences in patent applications for non-patent experts. Of course, the sequences provided in a sequence listing are also provided throughout the description of a patent application. However, the sequence listing has, up until now, been a quick and easy method of, for example, working out what a claim to "SEQ ID 128" actually refers to.
Whilst ST.26 is a global WIPO initiative, the decision of how accessible to make the new sequence listings on the public register is now in the hands of individual patent offices.
Update 28 Feb 2022: IPKat has been informed by the UKIPO that "Having considered a range of views and issues arising with the filing of divisional patent applications, we can confirm that the ST.26 sequence listing format will be required only for divisional applications arising from parent applications filed on or after 1 July 2022 (i.e. the ‘big bang’ date)."
Apart from the verbosity of the new standard, which when compared to TXT files is pretty significant, there aren't really any obstacles to obtaining the sequence data.
At least one free office suite is capable of loading the XML file into a spreadsheet, and allowing the user to choose which tags to query for the desired content, i.e. the sequences themselves. Obviously, the limit to that would be the number of lines of data that the spreadsheet can support, but this is nonetheless fairly significant.
Alternatively, there are multiple XML readers/editors out there, some of them free, some of them integrated into a database environment (e.g. BaseX, eXist-db, etc.), which allow parsing of the XML using XQuery to display only the desired content, and then export that to another format, e.g. CSV, Excel, etc.
I don't really see this as the problem the Kat makes this out to be, unless I've missed something.
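For readers who want to see what the "query the tags for the sequences" approach described in this comment might look like in practice, here is a minimal Python sketch rather than XQuery. It is illustrative only: the miniature listing is made up, and the element names (`ST26SequenceListing`, `SequenceData`, `INSDSeq`, `INSDSeq_sequence` and so on) are assumed from a reading of the ST.26 standard, so they should be checked against a real file before being relied upon. The helper names `extract_sequences` and `to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up miniature listing. The element names are an assumption based on
# the ST.26 standard; a real file carries many more elements and attributes.
SAMPLE_XML = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>12</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_sequence>atggccaagtaa</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="2">
    <INSDSeq>
      <INSDSeq_length>3</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_sequence>MAK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>
"""


def extract_sequences(xml_text):
    """Return one dict per SequenceData element found in the listing."""
    root = ET.fromstring(xml_text)
    rows = []
    for data in root.iter("SequenceData"):
        seq = data.find("INSDSeq")
        if seq is None:
            # Skip entries without the expected INSDSeq child.
            continue
        rows.append({
            "seq_id": int(data.get("sequenceIDNumber")),
            "moltype": seq.findtext("INSDSeq_moltype", default=""),
            "length": int(seq.findtext("INSDSeq_length", default="0")),
            "sequence": seq.findtext("INSDSeq_sequence", default=""),
        })
    return rows


def to_csv(rows):
    """Serialise the extracted rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["seq_id", "moltype", "length", "sequence"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_sequences(SAMPLE_XML)
print(to_csv(rows))
```

Looking up a single entry, such as the sequence behind a given SEQ ID, is then a matter of filtering `rows` on `seq_id`. Whether this counts as "easy" for a lay scientist is, of course, exactly the point under debate in this thread.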
I hesitate to suggest that you have rather proved than disproved my point, if sequence listings are to be easily accessible to non-"BioInformatics Nerds"? Whilst XML readers exist (including of course WIPO Sequence), interpreting sequence listings will require more processing steps and more patent/bioinformatics expertise. My point was not that the sequences will be completely inaccessible, merely that they will require more time and expertise to understand. Your point also assumes that a text form of the XML file will be published, and not just a PDF, and I don't believe this is yet clear.
I would just like to add here that if the aim of a person downloading a sequence listing is to inspect and interpret the technical content of the sequence, then in all probability they are likely to be a sophisticated enough user to be able to handle the XML wrapping around that content. Let's not forget that sequence listings aren't any more understandable to the lay person just because they have a txt file extension and are laid out in a fixed-width font. There is no getting around the requirement to understand what each sequence means, irrespective of the container in which it is stored. Otherwise, what would the use case be for someone needing to download, retrieve and interpret the data contained in the sequence listing, and how would wrapping that data in an XML wrapper make understanding that data any more arduous?
Just to say that we've got a CIPA webinar scheduled for Monday 21st March all about WIPO ST.26, and including a speaker from the UKIPO. Details here: https://www.cipa.org.uk/events/everything-you-need-to-know-about-sequence-listings-wipo-st-26-and-the-forthcoming-big-bang/