More Reliable DNA Barcodes

A new method for designing DNA barcodes better suited for large-scale sequencing studies could revolutionize biomedical research.

DNA sequencing technologies have come a long way since the first draft of the human genome was published in 2001. Many research studies no longer focus on sequencing just one genome. Instead, researchers now choose to look at many genomes at once, in a single experiment.

This is where DNA barcodes come in. Just as with barcodes on our grocery items, DNA barcodes allow scientists to track individual DNA samples.

But adding DNA barcodes is not exactly a perfect science. About 10% of the time, generating barcodes with existing methods will introduce errors. This limits the type and scale of experiments that can be done using DNA barcodes.

According to a study published in the June edition of PNAS [1], researchers from the University of Texas have developed a new method for designing DNA barcodes. Their method reduces the error rate of DNA barcoding to just 0.5%.

DNA barcodes

DNA barcodes are short segments of DNA often used in large-scale sequencing experiments. They allow researchers to label each of their samples with a unique identifier.

For example, if you are looking for new drugs or inhibitors, you can set up a screen to look at up to ~10⁸ different chemicals using DNA barcodes [2]. Each small molecule is given a unique barcode or a set of DNA barcodes. The best inhibitors are selected through large-scale sequencing of the attached barcodes, following the experiment.

Error-prone reading & writing

Our DNA code is made up of four different letters or bases – A, T, G and C. During the synthesis or ‘writing’ of barcodes these letters are strung together to make a long strand of DNA. You can think of DNA sequencing as a way to ‘read’ these DNA strands.

But neither DNA synthesis nor sequencing is perfect. ‘Writing’ and ‘reading’ errors are made in every assay that uses DNA barcodes. The most common error is a single-base deletion, when one base or letter is omitted from the barcode. Substitutions (replacing a base with the wrong one) and insertions (adding an extra base) are other common errors that happen when large numbers of barcodes are generated together.

Correcting errors

One fast way to decrease the error rate is to choose barcodes that are minimally affected by errors right from the beginning of an experiment. But, on the flip side, this means many potential barcodes have to be discarded from the get-go.

This was the conundrum the researchers addressed with their new FREE (filled/truncated right end edit) barcodes, using a method called sphere packing.

Sphere packing looks at all the possible erroneous barcodes you could make, when you introduce one or two errors into the original barcode. For example, if your DNA barcode is the word AAA, then AAC is one possible error.
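To make the idea of "all the possible erroneous barcodes" concrete, here is a minimal Python sketch that lists every sequence one error away from a barcode. It is an illustration only, not the researchers' code: the function name `single_error_variants` is made up for this example, and it treats a barcode as a free-standing string, ignoring any extra bases a real sequencing read would contain.

```python
BASES = "ACGT"

def single_error_variants(barcode):
    """Return every distinct sequence exactly one edit away from `barcode`."""
    variants = set()
    for i in range(len(barcode)):
        # Deletion: drop the base at position i.
        variants.add(barcode[:i] + barcode[i + 1:])
        # Substitution: swap the base at position i for a different base.
        for base in BASES:
            if base != barcode[i]:
                variants.add(barcode[:i] + base + barcode[i + 1:])
    # Insertion: add one extra base at any position, including both ends.
    for i in range(len(barcode) + 1):
        for base in BASES:
            variants.add(barcode[:i] + base + barcode[i:])
    return variants

neighbours = single_error_variants("AAA")
print("AAC" in neighbours)  # True: AAC is AAA with one substitution
print(len(neighbours))      # 23 distinct one-error versions of AAA
```

Even a three-letter barcode has 23 one-error versions, which gives a sense of why many candidate barcodes end up too close to each other to be used together.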

By using an algorithm to generate all possible erroneous versions of each barcode, and keeping only barcodes whose erroneous versions never overlap, they were able to trace any erroneous read back to the one original barcode it came from.
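The decoding step can be sketched as a simple lookup table. The Python below is a simplified illustration under stated assumptions, not the published FREE algorithm: it handles only a single error per barcode, picks barcodes greedily in alphabetical order, and does not model the "filled/truncated right end" behaviour the FREE name refers to. The names `error_sphere` and `build_decoder` are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def error_sphere(barcode):
    """The barcode itself plus every sequence one edit away from it."""
    sphere = {barcode}
    for i in range(len(barcode) + 1):
        for base in BASES:
            sphere.add(barcode[:i] + base + barcode[i:])  # insertion
    for i in range(len(barcode)):
        sphere.add(barcode[:i] + barcode[i + 1:])  # deletion
        for base in BASES:
            sphere.add(barcode[:i] + base + barcode[i + 1:])  # substitution
    return sphere

def build_decoder(length):
    """Greedily keep barcodes whose spheres do not overlap any kept sphere.

    Returns a dict mapping every sequence in a kept sphere to its barcode.
    """
    decoder = {}
    for letters in product(BASES, repeat=length):
        candidate = "".join(letters)
        sphere = error_sphere(candidate)
        # Accept the candidate only if none of its versions is already claimed.
        if decoder.keys().isdisjoint(sphere):
            for seq in sphere:
                decoder[seq] = candidate
    return decoder

decoder = build_decoder(5)
print(decoder.get("AAAA"))    # AAAAA (one base was deleted)
print(decoder.get("AAACA"))   # AAAAA (one base was substituted)
print(decoder.get("CCCGG"))   # a kept barcode, or None if the read is unassigned
```

Because no two spheres share a sequence, each read in the table points to exactly one barcode, and decoding is a single dictionary lookup. The cost is visible too: any candidate whose sphere touches an existing one is thrown away.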

Huge implications for biomedical research

Researchers anticipate the FREE methods will: (i) reduce the amount of discarded data, (ii) help avoid inaccuracies in results, and (iii) increase the resolution of sequencing assays.

Alternative error-correcting methods end up throwing away up to 100 times as many barcodes as the FREE barcode method. Existing data analysis techniques are also much slower (up to 1,000 times) at decoding the data, which made experiments that needed large numbers of barcodes (in the range of millions) nearly impossible before now.

As other researchers test out the improved accuracy and efficiency of FREE DNA barcodes, we will get a better sense of exactly how this method might revolutionize biomedical research.

References

  1. Indel-correcting DNA barcodes for high-throughput sequencing. PNAS (2018).
  2. Encoded Library Synthesis Using Chemical Ligation and the Discovery of sEH Inhibitors from a 334-Million Member Library. Nature (2015).
