Deep learning to predict the lab-of-origin of engineered DNA
Published in | Nature Communications Vol. 9; no. 1; pp. 3135 - 10 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | London: Nature Publishing Group UK, 07.08.2018; Nature Publishing Group; Nature Portfolio |
Summary: | Genetic engineering projects are rapidly growing in scale and complexity, driven by new tools to design and construct DNA. There is increasing concern that widened access to these technologies could lead to attempts to construct cells for malicious intent, illegal drug production, or to steal intellectual property. Determining the origin of a DNA sequence is difficult and time-consuming. Here deep learning is applied to predict the lab-of-origin of a DNA sequence. A convolutional neural network was trained on the Addgene plasmid dataset, which contained 42,364 engineered DNA sequences from 2,230 labs as of February 2016. The network correctly identifies the source lab 48% of the time, and 70% of the time the source lab appears among the top 10 predicted labs. Often, there is not a single “smoking gun” that affiliates a DNA sequence with a lab. Rather, it is a combination of design choices that are individually common but collectively reveal the designer.
The synthetic biology era has seen a rapidly growing number of engineered DNA sequences. Here, the authors develop a deep learning method to predict the lab-of-origin of a DNA sequence based on hidden design signatures. |
---|---|
ISSN: | 2041-1723 |
DOI: | 10.1038/s41467-018-05378-z |
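
The summary reports two figures: how often the source lab is the top prediction, and how often it appears among the top 10 predicted labs. The following is a minimal sketch of how such top-k accuracy figures are commonly computed. It is not the authors' code. The function name `top_k_accuracy`, the toy scores, and the three-lab setup are all hypothetical and chosen only for illustration.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring labels.

    scores: list of per-sample score lists, one score per candidate lab.
    true_labels: list of integer lab indices, one per sample.
    """
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Rank lab indices by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda i: -row[i])
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy data (hypothetical): 4 sequences scored against 3 candidate labs.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
true_labels = [0, 1, 0, 0]

top1 = top_k_accuracy(scores, true_labels, k=1)
top2 = top_k_accuracy(scores, true_labels, k=2)
print(top1, top2)
```

With k=1 the function gives the plain accuracy. With k=10 and scores over all candidate labs it would give the kind of "top 10" figure quoted in the summary.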