Journal:Next steps for access to safe, secure DNA synthesis

Full article title	Next steps for access to safe, secure DNA synthesis
Journal	Frontiers in Bioengineering and Biotechnology
Author(s)	Diggans, James; Leproust, Emily
Author affiliation(s)	Twist Bioscience Corporation
Primary contact	Email: jdiggans at twistbioscience dot com
Editors	Morse, Stephen Allen
Year published	2019
Volume and issue	7
Page(s)	86
DOI	10.3389/fbioe.2019.00086
ISSN	2296-4185
Distribution license	Creative Commons Attribution 4.0 International
Website	https://www.frontiersin.org/articles/10.3389/fbioe.2019.00086/full
Download	https://www.frontiersin.org/articles/10.3389/fbioe.2019.00086/pdf (PDF)

Abstract

The DNA synthesis industry has, since the invention of gene-length synthesis, worked proactively to ensure synthesis is carried out securely and safely. Informed by guidance from the U.S. government, several of these companies have collaborated over the last decade to produce a set of best practices for customer and sequence screening prior to manufacture. Taken together, these practices ensure that synthetic DNA is used to advance research that is designed and intended for public benefit. With increasing scale in the industry and expanding capability in the synthetic biology toolset, it is worth revisiting current practices to evaluate additional measures to ensure the continued safety and wide availability of DNA synthesis. Here we encourage specific steps, in part derived from successes in the cybersecurity community, that can ensure synthesis screening systems stay well ahead of emerging challenges, to continue to enable responsible research advances. Gene synthesis companies, science and technology funders, policymakers, and the scientific community as a whole have a shared duty to continue to minimize risk and maximize the safety and security of DNA synthesis to further power world-changing developments in advanced biological manufacturing, agriculture, drug development, healthcare, and energy.

Keywords: biosecurity, synthetic biology, DNA, cyberbiosecurity, policy

Introduction

In 2010, the United States Department of Health and Human Services (HHS) published the Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA.^[1] The Guidance provided a set of recommended practices for companies synthesizing double-stranded DNA, encouraging them to screen both their customers and the sequences requested. Several of the largest DNA synthesis companies came together to form the International Gene Synthesis Consortium (IGSC), an industry trade organization intended to promote the beneficial application of gene synthesis technology while safeguarding biosecurity.

The IGSC published the Harmonized Screening Protocol^[2] to provide additional tactical detail around the implementation of guidance-compliant customer and sequence screening. The IGSC guidance specifies that synthetic gene sequence orders will be screened against the IGSC's Regulated Pathogen Database (RPD), a dataset of sequences and organisms subject to regulatory control or licensing that is assembled and maintained by the IGSC. The guidance further specifies that IGSC companies will only supply genes from regulated pathogens to “bona fide government laboratories, universities, non-profit research institutions, or industrial laboratories demonstrably engaged in legitimate research.” Since its initial publication, the Harmonized Screening Protocol has been updated only once^[3] to (among other minor edits) add language affirming that IGSC member companies agree not to synthesize any sequence with “best match” to Variola, the virus that causes smallpox, as the disease was declared eradicated by the WHO in 1980. The IGSC has also developed an extensive onboarding process for potential new members to assist companies and institutions as they build new screening systems.

In the years since the publication of the guidance, both the DNA synthesis industry and the larger synthetic biology community have rapidly advanced in terms of capability and scale. These advances create new opportunities to revolutionize many industries, from healthcare to industrial chemicals and even digital data storage. With new capabilities come new challenges to the recommendations originally spelled out in the guidance. As the trajectory of technological advancement will inevitably continue to steepen, here we visit potential options for next steps to advance and continue to secure the manufacture of synthetic DNA and prevent the risk of misuse.

Twist Bioscience (a member company and officer of the IGSC) has witnessed first-hand how challenging some of the guidance recommendations can become at increasing scale. Those difficulties must be surmounted while maintaining customer and sequence screening accuracy and still achieving the tight delivery timelines demanded by fierce competition within the global DNA synthesis industry.

As scale drives down cost per base pair, the relatively fixed cost of screening plays a more direct role in overall price. These costs are driven by both customer and sequence screening. Commercially available customer screening solutions still require a great deal of manual review of false positive findings, and these false positives create a floor on the possible reduction in labor cost of new customer onboarding. Current sequence screening algorithms are computationally expensive and, given the high false positive rate, their results can be complicated to interpret. Such systems generally require a PhD in bioinformatics both for implementation and for day-to-day interpretation of hits. This makes scaling interpretation, in the absence of high-quality sequence annotation, an expensive proposition.

Evolving technologies have blurred the lines between the gene- and oligo-length synthesis products originally addressed in the guidance. These include ever-simpler methods for the assembly of pools of oligo-length DNA into gene-length DNA and the use of truly massive oligo pools for data storage. The data storage use case, in particular, will drive a substantial global increase in the number of unique oligo sequences under manufacture, making it ever easier to acquire the oligo-length sequences necessary to assemble genes that would otherwise be subject to regulatory control.

Evolving industry best practices

We believe continued forward-thinking improvements in the biosecurity safety net provided by DNA synthesis order screening will require participation from all interested parties: synthesis companies themselves, policymakers, science and technology funders (both public and private), and the broader synthetic biology community.

Gene-length sequence screening performance

The guidance found in the Harmonized Screening Protocol and the work done by IGSC have together accomplished a great deal in harmonizing the screening practices of the largest synthesis companies. The current IGSC onboarding protocol for new members even includes a set of test sequences to ensure that prospective member institutions have built their custom sequence screening systems with a solid level of accuracy. It is challenging, however, to determine when a custom-built screening system is “good enough,” especially given that the details of each screening implementation remain private to the implementing company. In addition, the recommendations in the guidance do not specify particular performance metrics in terms of overall sensitivity and specificity, or in terms of the degree to which sequence alteration or the source of annotation should impact screening results.
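As an illustration of what such performance metrics would mean in practice, the following minimal sketch computes sensitivity and specificity for a screening function against a labeled test set. Everything here is an assumption made for illustration: the function names, the toy screen, and the arbitrary test strings are hypothetical and do not describe any IGSC member's system or the IGSC onboarding test set.

```python
from typing import Callable, Iterable, Tuple


def screening_metrics(
    screen: Callable[[str], bool],
    labeled: Iterable[Tuple[str, bool]],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `screen` on a labeled test set.

    Each item is (sequence, is_controlled). `screen` returns True when it
    flags a sequence for review.
    """
    tp = fn = tn = fp = 0
    for seq, is_controlled in labeled:
        flagged = screen(seq)
        if is_controlled:
            tp += flagged        # controlled and flagged
            fn += not flagged    # controlled but missed
        else:
            fp += flagged        # benign but flagged
            tn += not flagged    # benign and passed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical stand-in for a real screening system: flags any sequence that
# contains an arbitrary marker string. Purely illustrative.
def toy_screen(seq: str) -> bool:
    return "ACGTACGT" in seq


# Arbitrary toy strings, not real biological sequences.
toy_test_set = [
    ("TTACGTACGTTT", True),   # controlled, flagged: true positive
    ("TTACGAACGTTT", True),   # controlled but altered, missed: false negative
    ("GGGGCCCCGGGG", False),  # benign, passed: true negative
    ("AAACGTACGTAA", False),  # benign, flagged: false positive
]

sens, spec = screening_metrics(toy_screen, toy_test_set)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy set both values come out to 0.50. The altered second entry shows how sequence alteration lowers sensitivity, which is the kind of dependence the guidance leaves unspecified.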

This is no fault of the guidance; it is extremely difficult to express in the abstract a set of performance characteristics for a system intended to screen the universe of all possible sequences. The cybersecurity and defense communities, facing similar challenges of performance estimation for complex systems, have turned to "red teaming"—the practice of looking at a situation from the perspective of disinterested or antagonistic parties—as a way of answering whether a given system is sufficient to accomplish a protective goal.^[4] The best way to estimate whether a skilled adversary can bypass a system is to ask skilled individuals to attempt to do just that. Previous recommendations^[5] have explicitly called for IGSC companies to regularly test procedures or submit to third-party audits; we believe regular red teaming by a sophisticated third party is an effective means to address these concerns. Twist has recently engaged in an extensive red teaming of our sequence screening system (publication in review) and shared the results with other IGSC members to help further improve our respective systems. We strongly recommend that synthesis companies engage in periodic red teaming as a means of assessing evolving risk of vulnerabilities in screening systems.

Red teaming has additional secondary value: sequences shown to bypass a screening system then serve as effective regression tests during follow-on software development once vulnerabilities have been patched. Regression testing is a software testing paradigm^[6] designed to ensure that future changes to software systems do not create new ways for previously discovered vulnerabilities to be exploited. Building and scaling a modern sequence screening system is a complex undertaking and requires using distributed computing and third-party annotation resources, both of which increase the risk of regressions during software development and maintenance. Consistent regression testing, along with a suite of edge-case test sequences, can help manage this risk.
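The idea of turning red-team findings into regression tests can be sketched as follows. This is a hypothetical example, not Twist's implementation: the marker string, the case names, and both screening functions are invented for illustration, and the sequences are arbitrary toy strings.

```python
from typing import Callable, Dict, List

MARKER = "GATTACAGG"  # arbitrary toy "sequence of concern" (assumption)

# Hypothetical regression suite: inputs that bypassed an earlier, naive
# version of the screen during a red-team exercise.
REGRESSION_SEQUENCES: Dict[str, str] = {
    "lowercase-input": "ttgattacaggtt",
    "whitespace-split": "TTGATT ACAGGTT",
    "reverse-strand": "AACCTGTAATCAA",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def naive_screen(seq: str) -> bool:
    """Earlier version: exact, case-sensitive, single-strand match."""
    return MARKER in seq


def patched_screen(seq: str) -> bool:
    """Patched version: normalizes input and checks both strands."""
    normalized = "".join(seq.split()).upper()
    return MARKER in normalized or MARKER in reverse_complement(normalized)


def run_regressions(
    screen_fn: Callable[[str], bool], cases: Dict[str, str]
) -> List[str]:
    """Return the names of regression cases the screen fails to flag."""
    return [name for name, seq in cases.items() if not screen_fn(seq)]


print("naive misses:  ", run_regressions(naive_screen, REGRESSION_SEQUENCES))
print("patched misses:", run_regressions(patched_screen, REGRESSION_SEQUENCES))
```

Running the suite on every change means that a later refactor which, for example, drops the reverse-strand check is caught immediately as a named failing case.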

Screening oligo-length sequences

The 2010 guidance set a lower bound of 200 nucleotides on the length of sequence with “best match” to organisms appearing on any of the various regulatory control lists. This was intended to strike a balance between ensuring safe manufacture of gene-length sequences and avoiding the burden of screening for manufacturers of shorter DNA sequences. In the intervening years, however, capacity for generating enormous, diverse pools of oligo-length sequences has grown^[7], while lower-cost methods for assembling high-quality, gene-length sequences from oligo pools have been developed and matured.^[8] Together, these two factors create a potential vulnerability: a sequence that would be considered controlled for gene-length synthesis under current regulatory and technical systems would be permitted for synthesis as an oligo pool and could be converted into a gene-length sequence by assembly in a modestly equipped molecular biology laboratory.
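One way to picture the gap is to compare a length-gated check applied to each oligo with a check applied to the pool as a whole. The sketch below is an assumption-laden toy: the function names are hypothetical, the reference is a random string rather than a real sequence, and exact substring matching stands in for the alignment-based similarity search a production system would need.

```python
import random
from typing import List

MIN_SCREEN_LENGTH = 200  # lower bound from the 2010 guidance, in nucleotides


def covered_bases(reference: str, oligos: List[str]) -> int:
    """Count reference positions covered by at least one exactly matching oligo."""
    covered = [False] * len(reference)
    for oligo in oligos:
        start = reference.find(oligo)
        while start != -1:
            for i in range(start, start + len(oligo)):
                covered[i] = True
            start = reference.find(oligo, start + 1)
    return sum(covered)


def screen_per_oligo(reference: str, oligos: List[str]) -> bool:
    """Length-gated check: oligos shorter than the threshold are never compared."""
    return any(len(o) >= MIN_SCREEN_LENGTH and o in reference for o in oligos)


def screen_pool(reference: str, oligos: List[str]) -> bool:
    """Pool-level check: flag when the pool jointly covers a threshold-length span."""
    return covered_bases(reference, oligos) >= MIN_SCREEN_LENGTH


# Random 300-nt toy "controlled" reference, tiled into 60-nt oligos with
# 20-nt overlaps. No real sequence data is involved.
rng = random.Random(0)
toy_reference = "".join(rng.choice("ACGT") for _ in range(300))
toy_pool = [toy_reference[i:i + 60] for i in range(0, 241, 40)]

print("per-oligo check flags pool:", screen_per_oligo(toy_reference, toy_pool))
print("pool-level check flags pool:", screen_pool(toy_reference, toy_pool))
```

In this toy case the per-oligo check passes every 60-nt member while the pool-level check flags the order, because the members jointly cover the full 300-nt reference.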

Research funding priorities

Research funding by governments and other institutions can play a powerful role in making customer and sequence screening easier to build or acquire and more efficient (and therefore less costly to operate) while increasing the accuracy of risk estimation.

Predicting risk in context

The guidance and all current sequence screening implementations focus on determining whether a given sequence is a “best match” to an entry on a list of organisms subject to regulatory control. These lists include those maintained under the U.S. Federal Select Agent Program (FSAP) and the Australia Group's control lists for harmonized export control. Such lists of organisms, in the context of sequence screening, are generally proxies for a broader goal: determining whether a given ordered sequence could be used to cause significant harm.

For a regulatory control regime to focus on this much more salient challenge, we must move beyond lists of known pathogens and instead focus on the biological context and known “routes to harm.” These can be as simple as a single protein (e.g., in the case of ricin) or as complex as the potentially hundreds of genes required for a bacterial pathogen (e.g., the genes required by Francisella tularensis to cause tularemia). This annotation requires a committed, ongoing effort to catalog, in detail, the ways in which proteins and genetic networks can be used to cause harm in contexts subject to regulatory control. The knowledge of these mechanisms and the genes they require is highly specialized and diffuse across academic, government, and industrial experts. We recognize the assembly of this knowledge in a single, shared location to be both incredibly important and incredibly challenging.

Sustained funding and commitment will be required to build and maintain a database of risk-associated sequences, their known mechanisms of pathogenicity, and the biological contexts in which these mechanisms can cause harm. To have maximum impact on global DNA synthesis screening, this database (or at a minimum a screening capability making use of this database) must be available to both domestic and international providers. Arguments have previously been made that such a collection would make misuse of biology easier for bad actors. Modern deep learning methods, while powerfully predictive, often require enormous amounts of high-quality, curated training data and specialized statistical expertise to make accurate predictions on complex outcomes. Allowing access only to synthesis companies or others with a “need to know” establishes a threshold for who can work on these challenges and limits the degree of global creativity that can be applied to the challenge of predicting biological outcomes from collections of primary sequence. We believe the value provided by the collection and public dissemination of this information, in terms of empowering machine learning and other risk estimation efforts, far outweighs any increased potential for attempted misuse.

We have two excellent examples of this approach in the cybersecurity community: the Common Vulnerabilities and Exposures (CVE) database^[9] and the National Vulnerability Database (NVD).^[10] Both CVE and NVD publicly catalog known vulnerabilities and code exploiting those vulnerabilities. These data are used to build ever-more-capable intrusion detection systems and to inform software development practices to avoid creation of new vulnerabilities. We believe this same paradigm would work well in a biological context.
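To make the analogy concrete, a CVE-style entry for a biological “route to harm” might be structured along the following lines. This is only a sketch of one possible shape for such a record: the class names, field names, identifier format, and placeholder values are all hypothetical and are not drawn from CVE, NVD, or any existing biosecurity database.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RouteToHarmRecord:
    """Hypothetical catalog entry, loosely modeled on a CVE-style record."""
    record_id: str                    # stable public identifier (invented format)
    summary: str                      # short human-readable description
    mechanism: str                    # known mechanism of pathogenicity
    required_genes: Tuple[str, ...]   # genes the mechanism depends on
    contexts: Tuple[str, ...]         # biological contexts in which harm can occur
    references: Tuple[str, ...] = ()  # supporting literature


class RouteToHarmCatalog:
    """In-memory catalog indexed by record identifier and by gene name."""

    def __init__(self) -> None:
        self._by_id: Dict[str, RouteToHarmRecord] = {}
        self._by_gene: Dict[str, List[RouteToHarmRecord]] = {}

    def add(self, record: RouteToHarmRecord) -> None:
        if record.record_id in self._by_id:
            raise ValueError(f"duplicate record id: {record.record_id}")
        self._by_id[record.record_id] = record
        for gene in record.required_genes:
            self._by_gene.setdefault(gene, []).append(record)

    def records_for_gene(self, gene: str) -> List[RouteToHarmRecord]:
        """Return every record whose mechanism requires the given gene."""
        return list(self._by_gene.get(gene, []))


# Placeholder entry with invented values; it carries no real biological content.
catalog = RouteToHarmCatalog()
catalog.add(
    RouteToHarmRecord(
        record_id="RTH-0001",
        summary="placeholder single-protein route",
        mechanism="placeholder mechanism",
        required_genes=("geneA",),
        contexts=("placeholder context",),
    )
)
print([r.record_id for r in catalog.records_for_gene("geneA")])
```

Indexing by gene lets a screening system map a sequence hit back to the mechanisms and contexts it participates in, rather than only to the organism it best matches.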

As these databases grow, additional investment in statistical methods for risk estimation will result in approaches with increasing accuracy in predicting harm. These systems should move from predicting risk on primary DNA sequences to include predicting possible harmful outcomes from genetic circuit designs or even from engineered microbial communities. The Intelligence Advanced Research Projects Agency (IARPA) is funding early work in this area via its Functional Genomic and Computational Assessment of Threats (FunGCAT) program.^[11] We strongly encourage funding of complementary and follow-on approaches.

The metaphorical similarity to the cybersecurity domain is not, admittedly, perfect. Patching software vulnerabilities is far easier and less expensive than “patching” biological vulnerabilities via vaccines or novel medical countermeasures. This does not mean, however, that simply enumerating the genes required for a particular “route to harm” is sufficient information to enable bad actors; a flat list of genes involved in a pathogenic outcome is not a recipe. Furthermore, there are large-scale efforts underway, including the DARPA Pandemic Prevention Platform (P3) program^[12], to enable just this sort of rapid response to novel pathogens. We maintain that the upside of providing this level of detail—low-cost, uniformly accurate, peer-reviewed sequence screening—more than offsets any potential for additional information hazard.

Acknowledgements

Author contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Funding

This work was funded by Twist Bioscience Corporation.

Conflict of interest statement

JD and EL are employed by Twist Bioscience. Twist Bioscience is a board member of the International Gene Synthesis Consortium (IGSC). The views expressed here are not necessarily those of the IGSC.

References

↑ U.S. Department of Health & Human Services (4 May 2015). "Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA". https://www.phe.gov/Preparedness/legal/guidance/syndna/Pages/default.aspx.
↑ International Gene Synthesis Consortium (2009). "Harmonized Screening Protocol" (PDF). https://portal.sgidna.com/files/IGSC%20Harmonized%20Screening%20Protocol.pdf.
↑ International Gene Synthesis Consortium (19 November 2017). "Harmonized Screening Protocol v2.0" (PDF). https://genesynthesisconsortium.org/wp-content/uploads/IGSCHarmonizedProtocol11-21-17.pdf.
↑ Zhang, L.; Gronvall, G.K. (2018). "Red Teaming the Biological Sciences for Deliberate Threats". Terrorism and Political Violence: 1–20. doi:10.1080/09546553.2018.1457527.
↑ Koblentz, G.D. (2017). "The De Novo Synthesis of Horsepox Virus: Implications for Biosecurity and Recommendations for Preventing the Reemergence of Smallpox". Health Security 15 (6): 620–28. doi:10.1089/hs.2017.0061. PMID 28836863.
↑ Yoo, S.; Harman, M. (2012). "Regression testing minimization, selection and prioritization: A survey". Software Testing, Verification and Reliability 22 (2): 67–120. doi:10.1002/stvr.430.
↑ Organick, L.; Ang, S.D.; Chen, Y.-J. et al. (2017). "Scaling up DNA data storage and random access retrieval". BioRxiv. doi:10.1101/114553.
↑ Plesa, C.; Sidore, A.M.; Lubock, N.B. et al. (2018). "Multiplexed gene synthesis in emulsions for exploring protein functional landscapes". Science 359 (6373): 343–47. doi:10.1126/science.aao5167. PMC 6261299. PMID 29301959. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261299.
↑ The MITRE Corporation (2019). "Common Vulnerabilities and Exposures". https://cve.mitre.org/. Retrieved 07 January 2019.
↑ National Institute of Standards and Technology (2019). "National Vulnerability Database". https://nvd.nist.gov/. Retrieved 07 January 2019.
↑ IARPA (2016). "Functional Genomic and Computational Assessment of Threats (Fun GCAT)". https://www.iarpa.gov/index.php/research-programs/fun-gcat.
↑ Jenkins, A. (2017). "Pandemic Prevention Platform (P3)". Defense Advanced Research Projects Agency. https://www.darpa.mil/program/pandemic-prevention-platform.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. The original article listed references alphabetically; this version, by design, lists them in order of appearance.

