Question

How to count the number and proportion of different barcode combinations in DNA sequencing results

4 hours ago

tulip • 0

We used a new single-cell sequencing method and have run into the following problem when analyzing the data. We designed three sequencing barcodes to be linked step by step. Specifically, we first linked the first barcode (containing 30 different base sequences), then linked the second barcode (containing 185 different base sequences) on this basis, and then linked the third barcode (containing 56 different base sequences) on this basis. In the ideal case, each read should have this structure:

barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence + barcode1 reverse complementary sequence

That gives 30 × 185 × 56 = 310,800 possible barcode combinations.

But the reads we actually see in the sequencing data look like these:

barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence

barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence

barcode1 sequence + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence + barcode1 reverse complementary sequence

Sequence from unknown source + barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence

barcode1 sequence + DNA fragment + sequence from unknown source

Sequence with one base difference from barcode1 sequence (maybe caused by mutation?) + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence + barcode1 reverse complementary sequence

These are just some examples; there are many more combinations. In addition, every sequence must contain the barcode1 sequence or its reverse complement.

We want to know the number and proportion of the different barcode combinations. This problem has troubled me for a long time. Could you suggest some code ideas for solving it? Thank you very much!
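To make the goal concrete, here is a minimal Python sketch of one way to tally read layouts like the examples above. It is only an illustration, not a tested pipeline: the function names, the 4-base toy barcodes, and the toy reads are all made up, it uses exact matching only (so the one-base-difference case is not handled), and with short barcodes it could report chance matches inside the DNA fragment.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def read_layout(read, whitelists):
    """Return the barcode elements found in a read, in left-to-right order.

    whitelists is a dict such as {"bc1": {...}, "bc2": {...}, "bc3": {...}}.
    Forward hits are labelled "bc1", reverse-complement hits "bc1_rc", etc.
    Exact matches only; palindromic barcodes would be reported twice.
    """
    hits = []
    for name, barcodes in whitelists.items():
        for bc in barcodes:
            for label, query in ((name, bc), (name + "_rc", reverse_complement(bc))):
                pos = read.find(query)
                while pos != -1:
                    hits.append((pos, label))
                    pos = read.find(query, pos + 1)
    return tuple(label for _, label in sorted(hits))

def layout_proportions(reads, whitelists):
    """Count each layout and return {layout: (count, fraction_of_reads)}."""
    counts = Counter(read_layout(read, whitelists) for read in reads)
    total = sum(counts.values())
    return {layout: (n, n / total) for layout, n in counts.items()}

# Hypothetical toy data: one 4-base barcode per round, far shorter than real ones.
toy_whitelists = {"bc1": {"AACC"}, "bc2": {"AGAG"}, "bc3": {"ATTC"}}
toy_reads = [
    "AACCAGAGATTCTTTTTTTTGAAT",  # bc1 + bc2 + bc3 + fragment + bc3 rc
    "AACCTTTTTTTTGAATCTCTGGTT",  # bc1 + fragment + bc3 rc + bc2 rc + bc1 rc
]
for layout, (n, frac) in layout_proportions(toy_reads, toy_whitelists).items():
    print(" + ".join(layout), n, frac)
```

With real data the reads would come from a FASTQ file and the whitelists from the three barcode lists; the layout tuples could then be grouped further by which specific barcodes were found.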

data DNA anlaysis sequencing barcode • 71 views

updated 2 hours ago by dsull ★ 6.9k • written 4 hours ago by tulip • 0

Show us a few examples of reads. Since you said barcode, I assume these are in-line in the sequencing read. What are the lengths of these barcodes and of the total read?

This may not be answerable via a forum like Biostars, since it seems fairly complex and access to the actual data may be needed.

4 hours ago by GenoMax 146k

I have no idea what you’re asking. “185 base sequences”? I have no idea what that means; do you mean there are 185 possible barcodes? When you use phrases like “on this basis”, I have no clue what that is supposed to mean. What is “DNA fragment” supposed to mean?

Honestly, this just looks to me like a form of a SPLiT-seq assay. You extract three sequences from fixed positions (i.e. the positions where the barcodes should be) in your reads and then error-correct each of them to “whitelists”.
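As a rough illustration of that idea, here is a minimal Python sketch: slice each read at fixed positions, correct each slice to its whitelist allowing one mismatch, and count the resulting combinations. Everything specific in it is an assumption for the example (the function names, the 4-base toy barcodes, the slot positions), and it is not how any particular SPLiT-seq tool implements this. Reads whose barcodes are shifted by an unknown leading sequence would need the offset found first.

```python
from collections import Counter

def build_correction_map(whitelist, alphabet="ACGTN"):
    """Map each whitelist barcode, and every sequence one mismatch away, to that barcode.

    Sequences that are one mismatch from two different barcodes are dropped
    as ambiguous. Exact whitelist entries always map to themselves.
    """
    whitelist = set(whitelist)
    corrections = {bc: bc for bc in whitelist}
    ambiguous = set()
    for bc in whitelist:
        for i, base in enumerate(bc):
            for alt in alphabet:
                if alt == base:
                    continue
                variant = bc[:i] + alt + bc[i + 1:]
                if variant in whitelist:
                    continue
                if variant in corrections and corrections[variant] != bc:
                    ambiguous.add(variant)
                else:
                    corrections[variant] = bc
    for variant in ambiguous:
        corrections.pop(variant, None)
    return corrections

def extract_combo(read, slots, maps):
    """Return the corrected barcode tuple for one read, or None if any slot fails.

    slots is a list of (start, length) pairs, one per barcode round.
    """
    combo = []
    for (start, length), cmap in zip(slots, maps):
        corrected = cmap.get(read[start:start + length])
        if corrected is None:
            return None
        combo.append(corrected)
    return tuple(combo)

def count_combos(reads, slots, whitelists):
    """Return {combo: (count, fraction_of_reads)}; unassigned reads use the key None."""
    maps = [build_correction_map(w) for w in whitelists]
    counts = Counter(extract_combo(read, slots, maps) for read in reads)
    total = sum(counts.values())
    return {combo: (n, n / total) for combo, n in counts.items()}

# Hypothetical toy setup: three 4-base barcodes at positions 0, 4 and 8.
toy_whitelists = [{"AAAA", "CCCC"}, {"GGGG", "TTTT"}, {"ACGT", "TGCA"}]
toy_slots = [(0, 4), (4, 4), (8, 4)]
toy_reads = ["AAAAGGGGACGTTTTTTT", "AAATGGGGACGTTTTTTT", "CCCCTTTTTGCAAAAAAA"]
for combo, (n, frac) in count_combos(toy_reads, toy_slots, toy_whitelists).items():
    print(combo, n, frac)
```

The second toy read has one mismatch in its first barcode and is counted with the first read. In practice the slot positions depend on the barcode and linker lengths, which the question has not given.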

2 hours ago by dsull ★ 6.9k