ST.26 sequence listings: A forward or backward step for ease of access to patent sequence data?

A Big Bang is coming, or at least this is how the World Intellectual Property Organization (WIPO) has billed the entry into force of the new sequence listing requirement, ST.26, on 1 July 2022.

There is an international requirement for the DNA, RNA and protein sequences disclosed in a patent application to be provided in the form of a sequence listing. Sequence listings are then used by patent offices to search for the listed sequences. At the moment, the format of sequence listings is governed by ST.25. On 1 July 2022, all patent offices (international and national) will transition to using the new ST.26 format. All patent applications containing sequences filed on or after this "big bang" date will have to comply with ST.26.

DNA to amino acid code

The stated aims of the switch to ST.26 are to harmonise sequence listing practice among different patent offices, and to render the sequence listing format compatible with international sequence databases. The most obvious practical difference between ST.25 and ST.26 sequence listings is the change from TXT to XML format. The use of XML allows for an easier transfer of the sequences to international sequence databases without loss of data. In response to the growing diversity of non-typical sequence types, ST.26 also makes the annotation of D-amino acids, branched sequences and nucleotide analogs mandatory. More minor changes in ST.26 include a switch to the use of "t" instead of "u" to represent uracil in RNA sequences, and use of the one-letter as opposed to the three-letter amino acid code.
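To make the two minor changes concrete, here is a minimal Python sketch of what the conversion amounts to for a single sequence. The function names are hypothetical and this is not how WIPO Sequence itself works; it covers only the 20 common amino acids and assumes that residues in the old listing are separated by spaces, with nucleotides written in lower case.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
# Ambiguity codes and modified residues are deliberately left out of this sketch.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def protein_to_one_letter(three_letter_seq: str) -> str:
    """Convert space-separated three-letter residues into a one-letter string.

    Raises KeyError for any residue outside the 20 listed above.
    """
    return "".join(
        THREE_TO_ONE[residue.capitalize()] for residue in three_letter_seq.split()
    )


def rna_u_to_t(rna_seq: str) -> str:
    """Lower-case the sequence and write uracil as 't' instead of 'u'."""
    return rna_seq.lower().replace("u", "t")


print(protein_to_one_letter("Met Ala Ser"))  # MAS
print(rna_u_to_t("augcuu"))                  # atgctt
```

In practice the conversion of a whole listing also involves the feature annotations, which is where dedicated software earns its keep.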

How to prepare ST.26 sequence listings

The introduction of ST.26 has been preceded by the release of new purpose-built WIPO software for preparing ST.26 sequence listings: the prosaically named WIPO Sequence. Whilst there was no absolute requirement for ST.25 sequence listings to be prepared using specialist software, using the free software from the USPTO and the EPO (PatentIn and BiSSAP respectively) considerably sped up the process. The more complex XML format of ST.26 renders the use of the new WIPO software essential. From this Kat's quick trial of WIPO Sequence, the software is fairly straightforward to use (and thankfully slightly more user-friendly than PatentIn).

WIPO Sequence can be used to convert old ST.25 TXT sequence listings into the new ST.26 XML format. Patent offices will not do this conversion automatically, and so this responsibility will fall to the applicant. The need to convert ST.25 sequence listings will be particularly relevant for divisional applications filed on or after the big bang date, given that ST.26 will apply to these cases (at least for the USPTO and EPO - see comment on the requirements for the UKIPO below). The WIPO software can also be used to check for errors in an ST.26 sequence listing, and to view ST.26 XML files in a human-readable format within a browser.

Increased ease-of-access to patent application sequences?

The purpose of changing sequence listings to an XML format is to improve access to the global repository of sequences contained in patent applications. However, the change may in fact have a negative impact on how easy it is to view the sequences from a particular published patent application. In particular, the change from a TXT to XML format has significant implications for the human readability of sequence listings. The new XML sequence listings include all of the XML coding elements, making it almost impossible to decipher the raw XML file itself. It is necessary to load the XML file into the WIPO Sequence software in order for the file to be viewed in a human-readable format in a browser window (simply opening the raw XML file in a browser does not produce a readable view).

A factor impacting the accessibility of patent application sequences to third parties will be the form in which the new sequence listings are published by individual patent offices. In terms of public access, it would be preferable for the XML files themselves to be published on the relevant patent registers. A PDF copy of the XML file, e.g. included in the PCT publication, will be difficult (if not impossible) for third parties to convert into a human-readable format, even if they had access to WIPO Sequence.

Even if the XML files are provided on the public register, third parties will need to download the XML sequence listings and run them through the new software before the sequences may be extracted. Whilst most biotech patent attorneys can be expected to have the necessary software, the lay scientist may not. As such, there is a risk that the non-human-readable XML format may reduce ease of access to the sequences in patent applications for non-patent experts. Of course, the sequences provided in a sequence listing are also provided throughout the description of a patent application. However, the sequence listing has, up until now, been a quick and easy method of, for example, working out what a claim to "SEQ ID 128" actually refers to.
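For readers comfortable with a little scripting, looking up a given SEQ ID in a published XML file need not depend on WIPO Sequence. The following is a minimal Python sketch using only the standard library. The element and attribute names (`SequenceData`, `sequenceIDNumber`, `INSDSeq_sequence` and so on) reflect this Kat's understanding of the ST.26 schema and should be treated as assumptions to check against a real file; the sample listing and the function name `load_sequences` are invented for illustration, and a real listing carries far more metadata.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed listing; element names are assumed from the ST.26 schema.
SAMPLE_XML = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>12</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_sequence>atgcttgacaag</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="2">
    <INSDSeq>
      <INSDSeq_length>4</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_sequence>MLDK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>
"""


def load_sequences(xml_text: str) -> dict[int, dict[str, str]]:
    """Map each SEQ ID number to its molecule type, length and sequence."""
    root = ET.fromstring(xml_text)
    sequences = {}
    for data in root.iter("SequenceData"):
        seq_id = int(data.get("sequenceIDNumber"))
        sequences[seq_id] = {
            "moltype": data.findtext("INSDSeq/INSDSeq_moltype", default=""),
            "length": data.findtext("INSDSeq/INSDSeq_length", default=""),
            "sequence": data.findtext("INSDSeq/INSDSeq_sequence", default=""),
        }
    return sequences


listing = load_sequences(SAMPLE_XML)
print(listing[2]["moltype"], listing[2]["sequence"])  # AA MLDK
```

This only works, of course, if the patent office publishes the XML file itself rather than a PDF rendering of it.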

Whilst ST.26 is a global WIPO initiative, the decision of how accessible to make the new sequence listings on the public register is now in the hands of individual patent offices.

Update 28 Feb 2022: IPKat has been informed by the UKIPO that "Having considered a range of views and issues arising with the filing of divisional patent applications, we can confirm that the ST.26 sequence listing format will be required only for divisional applications arising from parent applications filed on or after 1 July 2022 (i.e. the ‘big bang’ date)."

4 comments:

BioInformaticsNerd, Thursday, 24 February 2022 at 11:36:00 GMT
Apart from the verbosity of the new standard, which when compared to TXT files is pretty significant, there aren't really any obstacles to obtaining the sequence data.

At least one free office suite is capable of loading the XML file into a spreadsheet, and allowing the user to choose which tags to query for the desired content, i.e. the sequences themselves. Obviously, the limit to that would be the number of lines of data that the spreadsheet can support, but this is nonetheless fairly significant.

Alternatively, there are multiple XML reader/editors out there, some of them free, some of them integrated into a database environment (e.g. BaseX, existDB, etc) which allow parsing of the XML using XQuery to display only the desired content, and then export that to another format, e.g. CSV, Excel, etc.

I don't really see this as the problem the Kat makes this out to be, unless I've missed something.
Dr Rose Hughes, Friday, 25 February 2022 at 16:58:00 GMT
I hesitate to suggest that you have proved rather than disproved my point, if sequence listings are to be easily accessible to non-"BioInformatics Nerds". Whilst XML readers exist (including of course WIPO Sequence), interpreting sequence listings will require more processing steps and more patent/bioinformatics expertise. My point was not that the sequences will be completely inaccessible, merely that they will require more time and expertise to understand. Your point also assumes that a text form of the XML file will be published, and not just a PDF, and I don't believe this is yet clear.
Jim Robertson, Monday, 28 February 2022 at 15:18:00 GMT
Just to say that we've got a CIPA webinar scheduled for Monday 21st March all about WIPO ST.26, and including a speaker from the UKIPO. Details here: https://www.cipa.org.uk/events/everything-you-need-to-know-about-sequence-listings-wipo-st-26-and-the-forthcoming-big-bang/
