A new method for designing DNA barcodes better suited for large-scale sequencing studies could revolutionize biomedical research.

DNA sequencing technologies have come a long way since the first draft of the human genome was published in 2001. Many research studies no longer focus on sequencing just one genome. Instead, researchers now choose to look at many genomes at once, in a single experiment.

This is where DNA barcodes come in. Just as with barcodes on our grocery items, DNA barcodes allow scientists to track individual DNA samples.

But adding DNA barcodes is not exactly a perfect science. About 10% of the time, generating barcodes with existing methods will introduce errors. This limits the type and scale of experiments that can be done using DNA barcodes.

According to a study published in the June edition of PNAS [1], researchers from the University of Texas have developed a new method for designing DNA barcodes. Their method reduces the error rate of DNA barcoding to just 0.5%.

DNA barcodes

DNA barcodes are short segments of DNA often used in large-scale sequencing experiments. They allow researchers to label each of their samples with a unique identifier.

For example, if you are looking for new drugs or inhibitors, you can set up a screen to look at up to ~10^8 different chemicals using DNA barcodes [2]. Each small molecule is given a unique barcode or a set of DNA barcodes. The best inhibitors are selected through large-scale sequencing of the attached barcodes following the experiment.

Error-prone reading & writing

Our DNA code is made up of four different letters or bases – A, T, G and C. During the synthesis or ‘writing’ of barcodes, these letters are strung together to make a long strand of DNA. You can think of DNA sequencing as a way to ‘read’ these DNA strands.

But neither DNA synthesis nor sequencing is perfect. ‘Writing’ and ‘reading’ errors are made in every assay that uses DNA barcodes. The most common error is a single-base deletion, when one base, or letter, is omitted from the barcode. Substitutions (swapping in the wrong base) and insertions (adding an extra base) are other common errors that happen when large numbers of barcodes are generated together.

Correcting errors

One fast way to decrease the error rate is to choose barcodes that are minimally affected by errors right from the beginning of an experiment. But, on the flip side, this would mean many potential barcodes have to be discarded from the get-go.

This was the conundrum the researchers addressed with their new FREE (filled/truncated right end edit) barcodes, using a method called sphere packing.

Sphere packing looks at all the possible erroneous barcodes you could make when you introduce one or two errors into the original barcode. For example, if your DNA barcode is the word AAA, then AAC is one possible error.
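To make the idea of a "sphere" of erroneous barcodes concrete, here is a minimal Python sketch that lists every sequence one error away from a barcode. It is an illustration only, not the researchers' FREE code: the function name single_error_variants is hypothetical, and the sketch assumes an error is a plain single-base substitution, deletion or insertion.

```python
BASES = "ACGT"

def single_error_variants(barcode):
    """Return every sequence reachable from `barcode` by exactly one
    substitution, deletion, or insertion (hypothetical helper)."""
    variants = set()
    for i in range(len(barcode)):
        # Deletion: drop the base at position i.
        variants.add(barcode[:i] + barcode[i + 1:])
        # Substitution: replace the base at position i with a different one.
        for base in BASES:
            if base != barcode[i]:
                variants.add(barcode[:i] + base + barcode[i + 1:])
    for i in range(len(barcode) + 1):
        # Insertion: add one extra base before position i (or at the end).
        for base in BASES:
            variants.add(barcode[:i] + base + barcode[i:])
    # Guard: the original barcode is never counted as its own error.
    variants.discard(barcode)
    return variants

variants = single_error_variants("AAA")
print("AAC" in variants)   # the substitution example from the text
print(sorted(variants))
```

Even a three-letter barcode has a sizeable neighbourhood of one-error variants, which is why choosing barcodes whose neighbourhoods do not overlap means setting many candidate barcodes aside.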

By using an algorithm to generate all possible erroneous barcodes, they were able to pinpoint the original barcode, before the error took place.
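The decoding step can be sketched in the same spirit. The Python example below is a simplified stand-in, not the published algorithm: it assumes substitution errors only, and the names substitution_variants and build_decoder are made up for illustration. It builds a lookup table from every one-error variant back to its original barcode, and refuses any barcode set in which two barcodes could be confused.

```python
BASES = "ACGT"

def substitution_variants(barcode):
    """All sequences that differ from `barcode` at exactly one position."""
    out = set()
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                out.add(barcode[:i] + base + barcode[i + 1:])
    return out

def build_decoder(barcodes):
    """Map each barcode and each of its one-substitution variants back to
    the barcode. Raise ValueError if two barcodes share a sequence, since
    that sequence could not be decoded unambiguously."""
    decoder = {}
    for barcode in barcodes:
        for seq in {barcode} | substitution_variants(barcode):
            if seq in decoder:
                raise ValueError(
                    f"{seq} is within one error of both "
                    f"{decoder[seq]} and {barcode}"
                )
            decoder[seq] = barcode
    return decoder

# Toy barcode set chosen so that the one-error neighbourhoods cannot overlap.
decoder = build_decoder(["AAAAA", "CCCCC", "GGGGG"])
print(decoder["AAACA"])      # a read with one wrong base -> AAAAA
print(decoder.get("ACGTA"))  # too many errors to decode -> None
```

Because every erroneous sequence is worked out in advance, decoding a read in this sketch is a single table lookup.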

Huge implications for biomedical research

The researchers anticipate that the FREE method will: (i) reduce the amount of discarded data, (ii) help avoid inaccuracies in results, and (iii) increase the resolution of sequencing assays.

Alternative error-correcting methods end up throwing away up to 100 times as many barcodes as the FREE barcode method. Existing data analysis techniques are also much slower (up to 1,000 times) at decoding the data, which made experiments that needed large numbers of barcodes (in the range of millions) nearly impossible before now.

As other researchers test out the improved accuracy and efficiency of FREE DNA barcodes, we will get a better sense of exactly how this method might revolutionize biomedical research.

References

  1. Indel-correcting DNA barcodes for high-throughput sequencing. PNAS (2018).
  2. Encoded Library Synthesis Using Chemical Ligation and the Discovery of sEH Inhibitors from a 334-Million Member Library. Nature (2015).