AlphaFold, a machine learning model for predicting protein structure, is arguably one of the greatest achievements of AI so far. Whilst large language models such as ChatGPT can write poems and make pretty pictures, AlphaFold has the potential to dramatically impact the life-and-death world of drug discovery. AlphaFold represents truly ground-breaking science for which its inventors were recently awarded the Nobel Prize in Chemistry. Aside from the academic prestige, Google is also now looking to reap some return on the considerable investment ploughed into AlphaFold's development. Whilst the first versions of AlphaFold were open-source and free for use by everyone, the latest version of AlphaFold, AlphaFold3, represents a departure from this approach. In contrast to previous versions, AlphaFold3 is clearly intended as a commercial product, led by the new Google DeepMind spin-out company, Isomorphic Labs. Google DeepMind and Isomorphic Labs must now navigate the competing priorities of maintaining their academic prestige in a highly competitive field and securing valuable IP for attracting pharma industry partners.
A revolution in structural biology: From protein structure to drug-design
AlphaFold is a machine learning model for predicting the 3D structure of proteins. AlphaFold3 is purportedly even more accurate than the first iterations of AlphaFold, which were themselves head and shoulders above previous models for protein structure prediction. AlphaFold3 can also now predict the structures of biological molecules other than proteins, including nucleic acids and lipids, and protein-drug interactions. These abilities mean that AlphaFold3 will potentially be a powerful tool for in silico drug design and screening. With this new functionality, Google believes that AlphaFold3 will place it at the forefront of AI-assisted drug discovery (IPKat).
Protein structure - and Kats!
Trade secrets were never the IP strategy for protecting AlphaFold. Google DeepMind published peer-reviewed articles providing the full details of both the AlphaFold and AlphaFold2 architectures, as well as the underlying code. Even without the code, once the details of the AlphaFold and AlphaFold2 architectures were known, it would likely have been possible for coders in the field to reverse engineer DeepMind's approach. This is a typical problem in the competitive field of AI, especially for landmark developments such as AlphaFold (IPKat). A similar phenomenon has been seen with large language models following the release of OpenAI's ChatGPT (IPKat).
Instead of trade secrets, DeepMind pursued patents. DeepMind has filed many patent applications covering different machine learning approaches to predicting protein structure, including many aspects of the AlphaFold architecture. WO2022112248 A1, for example, relates to the use of multiple sequence alignment (MSA) and pairwise representations (core modules of AlphaFold). WO2021110730 A1 relates broadly to the use of Transformer-like mechanisms for predicting protein structure.
Nobel Prize in the bag - Time to make money?
Despite the IP filings, AlphaFold and AlphaFold2 were provided free for use by academics and industry alike. AlphaFold3, however, marks a shift in approach. Both the information available about AlphaFold3 and its conditions of use are far more restricted. Unlike its predecessors, for which the code was freely available, AlphaFold3 can only be run via Google DeepMind's API, the AlphaFold Server. The terms of use state that the AlphaFold Server is only available for non-commercial use and that users must not use the AlphaFold Server or its outputs for commercial activities or to train machine learning models similar to AlphaFold.
These restrictions signal the clear intention of Google DeepMind to commercialise the model via its spin-out, Isomorphic Labs. Indeed, Isomorphic has already announced drug-discovery collaborations with big pharma companies Eli Lilly and Novartis. Presumably, these collaborations include a commercial licence for the use of AlphaFold3. Eli Lilly and Novartis probably would not have been willing to pay for access to a tool that was, or would soon become, freely available for everyone else to use.
Academic prestige versus industry collaboration
Google DeepMind has come under fire from the academic field for its new commercial approach to AlphaFold. When AlphaFold2 was published, the entire underlying code was provided, allowing researchers to run and test the code for themselves. Aligned with its new commercialisation strategy, however, Google DeepMind has not published the code for AlphaFold3. The Nature paper (Abramson et al. 2024) was instead accompanied by so-called "pseudocode", i.e. a mere description of what the code does as opposed to the code itself.
Failing to provide the code for a new architecture is not the norm in machine learning publications, and the absence of code in the AlphaFold3 paper prompted a letter to the Nature editors signed by over 1,000 scientists, arguing that the absence of the code compromised peer review and made the broad claims of the paper impossible to test or use. The outcry prompted a response on X from Google DeepMind’s vice-president of research, Pushmeet Kohli, that the team were "Happy to also share that we're working on releasing the AF3 model (incl weights) for academic use, which doesn’t depend on our research infra, within 6 months". Nature has indicated that it will publish the code once it is released.
We have now reached the six-month point promised for the release of the code, but it has not yet materialised. Given Google's clear commercial intentions for AlphaFold3, the release of the entire underlying code (if this ever happens) is an IP minefield for the company.
Releasing the code for AlphaFold3 could potentially undermine Google's bargaining position in collaboration discussions with industry partners. Whilst it has been indicated that the code will only be provided for academic use, the question remains how this restriction will be enforced. Once the code has been released, how would Google know who is using the code and for what? Even if Google achieves grant of a patent covering use of the model, they will face a similar difficulty in detecting infringement. The real value of AlphaFold3 will likely be its use to design and screen new pharmaceuticals. However, any new pharmaceutical identified using AlphaFold3 will necessarily be subject to extensive lab testing before it gets anywhere near the clinic. Pharmaceutical companies will therefore not need to disclose to regulatory authorities or in patent filings that AlphaFold3 was used in the initial screen. Even leaving aside these considerations, enforcement of machine learning IP is in its infancy. The field waits to see how the big tech players such as Google will utilize their large IP estates (IPKat).
Final thoughts
Despite the current controversy, Google DeepMind's strategy of switching from an open-source to a commercial model for AlphaFold has nonetheless placed it in a strong position. By freely providing the code for the first iterations of AlphaFold for everyone to use, Google ensured early and widespread adoption of its model. The availability of the code led to many third-party improvements and add-ons to AlphaFold, as well as the development of fast-follower models such as RoseTTAFold. All of this helped to build AlphaFold's reputation and effectively provided Google with a mountain of free evidence for the utility of its product. On the back of the extensive testing of the earlier AlphaFold models within the academic field, the subsequent commercial embodiment is now more likely to be licensed. Importantly, the initial strategy of publishing the code for AlphaFold and AlphaFold2 in full also did not prevent Google from simultaneously filing patent applications for the technology.
Whilst failing to provide the code for AlphaFold3 has been a reputational hit for Google DeepMind, as presumably would be any attempt to enforce its IP, it is unclear how much this matters to the company at this stage. We now wait to see how its leaders will balance the competing pressures of keeping academia onside, whilst securing an attractive IP position for commercial partners in the pharma industry.
Further reading
- DeepMind: First major AI patent filings revealed (2018)
- Full speed ahead for DeepMind's AI patent applications (2019)
- IP strategy for AI-assisted drug discovery (Oct 2024)
They published the inference implementation code on the same day this IPkat article came out.
Available on GitHub: https://github.com/google-deepmind/alphafold3