How to count the number and proportion of different barcode combinations in DNA sequencing results
8 weeks ago
tulip • 0

We used a new single-cell sequencing method, and we have run into the following problem when analyzing the data. We designed three barcodes that are attached step by step: first barcode 1 (30 different sequences), then barcode 2 (185 different sequences) on top of barcode 1, and finally barcode 3 (56 different sequences) on top of barcode 2.

Ideally, every read should have this structure:

barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence + barcode1 reverse complementary sequence


That gives 30 × 185 × 56 = 310,800 possible barcode combinations.

But the reads we actually see in the sequencing data look like this:


  • barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence
  • barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence
  • barcode1 sequence + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence + barcode1 reverse complementary sequence
  • Sequence from unknown source + barcode1 sequence + barcode2 sequence + barcode3 sequence + DNA fragment + barcode3 reverse complementary sequence
  • barcode1 sequence + DNA fragment + sequence from unknown source
  • Sequence with one base difference from barcode1 sequence (maybe caused by mutation?) + DNA fragment + barcode3 reverse complementary sequence + barcode2 reverse complementary sequence + barcode1 reverse complementary sequence

These are only a few examples; there are many more combinations in the data. The one thing all reads have in common is that they contain the barcode1 sequence or its reverse complement.

We want to know the number and proportion of each barcode combination. This problem has troubled me for a long time. Could you give me some ideas for code to solve it? Thank you very much!
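One possible starting point, as a rough sketch only: search each read for exact matches to every barcode and its reverse complement, order the hits by position, and count the resulting structure strings. The function names, the `bc1`/`bc1_rc` labels and the toy 6-base barcodes are made up for illustration; the real whitelists and reads would be loaded from your own files. It assumes no barcode equals its own reverse complement or another barcode, and it ignores mismatches and "unknown source" sequence.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_patterns(whitelists):
    """Map every barcode and its reverse complement to a label.

    whitelists: dict such as {"bc1": [...], "bc2": [...], "bc3": [...]}.
    Assumption: no sequence is shared between lists or is palindromic,
    otherwise later entries silently overwrite earlier ones.
    """
    patterns = {}
    for name, seqs in whitelists.items():
        for seq in seqs:
            patterns[seq] = name
            patterns[reverse_complement(seq)] = name + "_rc"
    return patterns


def classify_read(read, patterns):
    """Describe a read as the ordered list of barcode labels found in it."""
    hits = []
    for seq, label in patterns.items():
        start = read.find(seq)
        while start != -1:  # record every exact occurrence
            hits.append((start, label))
            start = read.find(seq, start + 1)
    hits.sort()
    return "+".join(label for _, label in hits) or "no_barcode"


def summarize(reads, patterns):
    """Return {structure: (count, proportion)}, most common first."""
    counts = Counter(classify_read(r, patterns) for r in reads)
    total = sum(counts.values())
    return {s: (n, n / total) for s, n in counts.most_common()}


# Toy data (hypothetical barcodes, one per round).
toy_patterns = build_patterns(
    {"bc1": ["ACGTCA"], "bc2": ["GGATCT"], "bc3": ["CTTAGA"]}
)
toy_reads = [
    "ACGTCAGGATCTCTTAGACCCCCCCCTCTAAG",  # bc1+bc2+bc3+fragment+bc3_rc
    "ACGTCACCCCCCCCTCTAAGAGATCCTGACGT",  # bc1+fragment+bc3_rc+bc2_rc+bc1_rc
    "ACGTCAGGATCTCTTAGACCCCCCCCTCTAAG",
]
for structure, (n, frac) in summarize(toy_reads, toy_patterns).items():
    print(structure, n, round(frac, 3))
```

Exact substring search can also hit a barcode by chance inside the DNA fragment or across the junction of two adjacent barcodes, so with short barcodes the structure strings would need a sanity check against the expected layout.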

DNA-sequencing barcode • 557 views

Show us a few examples of reads. Since you said barcode I assume these are in-line in the sequencing read. What is the length of these barcodes and the length of total read?

This may not be answerable via a forum like biostars since this seems fairly complex and access to actual data may be needed.


I have no idea what you’re asking. “185 base sequences”? — I have no idea what that means; do you mean there are 185 possible barcodes? When you use phrases like “on this basis”, I have no clue what that is supposed to mean. What is “DNA fragment” supposed to mean?

Honestly, this just looks to me like a form of a SPLiT-seq assay. You extract three sequences from fixed positions (i.e. the positions where the barcodes should be) in your reads and then error-correct each of them to “whitelists”.
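A minimal Python sketch of that idea (fixed-position extraction followed by correction to a whitelist) might look like the following. The barcode positions and lengths in `layout`, the toy whitelists and all function names are assumptions for illustration, since the actual read layout has not been posted. Correction here means accepting an observed barcode only if exactly one whitelist entry is within one mismatch; dedicated tools may use different rules.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_to_whitelist(observed, whitelist, max_mismatches=1):
    """Return the whitelist barcode matching `observed`, or None.

    An exact match wins; otherwise the barcode is accepted only if exactly
    one whitelist entry lies within `max_mismatches` substitutions.
    """
    if observed in whitelist:
        return observed
    candidates = [
        w for w in whitelist
        if len(w) == len(observed) and hamming(w, observed) <= max_mismatches
    ]
    return candidates[0] if len(candidates) == 1 else None


def extract_combination(read, layout, whitelists):
    """Extract and correct one barcode per (start, length) slot in `layout`.

    Returns a tuple of corrected barcodes, or None if any slot fails.
    """
    combo = []
    for (start, length), whitelist in zip(layout, whitelists):
        barcode = correct_to_whitelist(read[start:start + length], whitelist)
        if barcode is None:
            return None
        combo.append(barcode)
    return tuple(combo)


def count_combinations(reads, layout, whitelists):
    """Count reads per barcode combination; unassignable reads go under None."""
    return Counter(extract_combination(r, layout, whitelists) for r in reads)


# Toy example: three 6-base barcodes at the start of the read (assumed layout).
layout = [(0, 6), (6, 6), (12, 6)]
whitelists = [{"ACGTCA", "TTTGGG"}, {"GGATCT"}, {"CTTAGA"}]
reads = [
    "ACGTCAGGATCTCTTAGACCCC",  # exact match in all three slots
    "ACGTCTGGATCTCTTAGACCCC",  # one mismatch in barcode 1, corrected
    "NNNNNNGGATCTCTTAGACCCC",  # barcode 1 unrecoverable
]
counts = count_combinations(reads, layout, whitelists)
total = sum(counts.values())
for combo, n in counts.most_common():
    print(combo, n, round(n / total, 3))
```

Dividing each count by the total gives the proportion per combination, and the `None` bucket shows how many reads could not be assigned.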


Cross-posted on bioinfo SE: https://bioinformatics.stackexchange.com/questions/22944/how-to-count-the-number-and-proportion-of-different-barcode-combinations-in-dna

tulip Please keep in mind that posting the same question to multiple sites can be perceived as bad etiquette, because efforts may be made to address a problem that has already been solved elsewhere in the meantime.

The helpful thing to do if you do decide to post on multiple forums is to add a link to the other forum posts on each post so people will look at the other posts before investing their effort.
