Question

Pooled sequencing DNA barcode space and deconvolution

4.9 years ago

vinaykusuma ▴ 10

Hello,

I was reading through https://www.biorxiv.org/content/10.1101/2020.04.06.025635v1.full.pdf

It describes a pooled sequencing method that uses barcodes to test about 10,000 Covid-19 samples in one go.

I came across compressed DNA barcoding space and DNA barcode deconvolution for the first time and need some help understanding them.

Although I searched the internet, I couldn't find any information on these topics.

I would highly appreciate it if someone could help me with an explanation.

Thanks.

genome sequence next-gen gene barcode • 1.7k views

updated 4.9 years ago by sysboolean ▴ 90 • written 4.9 years ago by vinaykusuma ▴ 10

score 3 · Accepted Answer · 2020-05-17

The general idea behind pooled sequencing is that we sequence N samples with X barcodes where X << N.

A major cost in NGS is uniquely barcoding each sample. Each unique barcode requires synthesizing and purifying a primer of ~60-90 bases (depending on design), which adds roughly $1-2 per sample. To make full use of the sequencing capacity when running a large number of samples at a time, say 10,000 samples per day as in Covid-19 testing, every sample needs a unique barcode so that it can be identified after sequencing. So now you can see the problem in terms of cost: ordering 10,000 barcoded primers is going to cost several hundred thousand dollars, and managing the workflow is going to be non-trivial.

However, if you add multiple barcodes to each sample, you can uniquely tag each sample with a much smaller set of barcodes. For example, with 10 barcodes and a unique ordered set of 5 barcodes per sample, you can individually barcode ~30,000 samples (use the permutation formula, since barcode order also matters: n! / (n-r)! with n = 10, r = 5 gives 30,240). This drastically reduces the cost of ordering barcode primers. Sure, you are using more of each barcode primer, but ordering a few barcode oligos in bulk is cheaper and makes managing the workflow easier.

Sample 1 gets barcodes B1,B2,B3,B4,B5

Sample 2 gets barcodes B1,B2,B3,B4,B6 and so on.
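
To make the counting above concrete, here is a minimal Python sketch. The names (`assign_barcodes`, the `S1`, `S2` sample labels) are made up for illustration and are not from the paper; it simply assumes, as stated above, that each sample gets an ordered set of 5 out of 10 barcodes.

```python
from itertools import permutations
from math import perm  # available in Python 3.8+

# Hypothetical barcode names B1..B10, matching the example above.
barcodes = [f"B{i}" for i in range(1, 11)]
r = 5  # barcodes added to each sample

# Number of ordered barcode sets: n! / (n - r)!
capacity = perm(len(barcodes), r)

def assign_barcodes(n_samples, barcodes, r):
    """Give each sample a distinct ordered tuple of r barcodes."""
    if n_samples > perm(len(barcodes), r):
        raise ValueError("not enough barcode combinations for this many samples")
    tags = permutations(barcodes, r)  # lazy, lexicographic order
    return {f"S{i}": next(tags) for i in range(1, n_samples + 1)}

assignment = assign_barcodes(3, barcodes, r)

# Reverse lookup: from an observed barcode tuple back to the sample.
lookup = {tag: sample for sample, tag in assignment.items()}

print(capacity)
for sample, tag in assignment.items():
    print(sample, ",".join(tag))
```

The reverse `lookup` dictionary is the trivial form of deconvolution in this scheme: once the barcode set on a read is known, the sample follows directly.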

There are other ways to find the identities of N samples using X barcodes/NGS libraries where X << N. Here, we make pools of samples so that each sample is distributed over a unique set of pools; after sequencing the pools, we solve the sample IDs based on the pools in which each sample occurred. See the following papers for some simple examples.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6134198/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5109470/
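
The pooling idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the design used in the linked papers: each sample goes into a unique combination of pools, each pool gives a plain positive/negative readout, and the function names (`design_pools`, `positive_pools`, `deconvolve`) are hypothetical.

```python
from itertools import combinations

def design_pools(n_samples, n_pools, pools_per_sample):
    """Assign every sample a unique set of pools (illustrative design)."""
    combos = combinations(range(n_pools), pools_per_sample)
    design = {}
    for sample in range(n_samples):
        try:
            design[sample] = frozenset(next(combos))
        except StopIteration:
            raise ValueError("not enough pool combinations") from None
    return design

def positive_pools(design, positive_samples):
    """Simulate sequencing: a pool is positive if it holds a positive sample."""
    pools = set()
    for sample in positive_samples:
        pools |= design[sample]
    return pools

def deconvolve(design, observed_pools):
    """Return samples whose whole pool set was observed as positive."""
    observed = set(observed_pools)
    return sorted(s for s, pools in design.items() if pools <= observed)

# 20 samples spread over only 6 pools, 3 pools per sample (C(6,3) = 20).
design = design_pools(n_samples=20, n_pools=6, pools_per_sample=3)

single = deconvolve(design, positive_pools(design, [7]))
double = deconvolve(design, positive_pools(design, [0, 7]))
print(single)  # exactly one candidate when a single sample is positive
print(double)  # several candidates: two positives overlap and become ambiguous
```

With a single positive sample, the pattern of positive pools points to exactly one sample. With several positives the pool patterns overlap and this naive decoder returns extra candidates, which is why practical designs choose the pool layout more carefully or retest the candidates.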