Inline barcodes in the reverse reads
8.1 years ago
Picasa ▴ 650

Hi,

I have a sample of paired-end (PE) reads that I want to demultiplex, and I used fastq-multx for this.

For instance, say my barcode is XXXX.

My raw forward read is: XXXXCCTTGGGCATGATGGTGACGCGCTTGGCGTGGATGGCGCACAGGTTGGTGTCCTCGAACAGGCCGACCAGGTAGGCCTCGCTGGCCTCCTGCAG

After fastq-multx, this read has been correctly assigned and trimmed:

CCTTGGGCATGATGGTGACGCGCTTGGCGTGGATGGCGCACAGGTTGGTGTCCTCGAACAGGCCGACCAGGTAGGCCTCGCTGGCCTCCTGCAG

However, the reverse reads vary. I see three cases:

  • No barcode in the reverse read
  • Barcode (reverse complemented) at the 5' end: XXXXATGGCTCGTACCAAGCAGACCGCCCGCAAGT
  • Barcode (reverse complemented) within the reverse read: ATGGCTCGTACCAAGCAGACCXXXXCGGAGGCAAGGCTCCCCGC

I'm not sure what to do with these. Should I keep only the read pairs whose reverse read contains no barcode?

barcodes • 3.3k views
How was the data generated? What is the cause that the barcode can end up everywhere (or not) in the reverse read?

It's amplicon sequencing with custom barcodes.

I don't know how the barcode ends up in the reverse read.

One clarification that I forgot to mention: the barcode in the reverse read is the reverse complement of XXXX.

What is the "normal" situation? Should the barcode only be found in the forward read?

That depends on the library prep. How was the library created? When/how were the barcodes attached? Without proper understanding of the experimental procedure we can't get this right.

I assume this is about the same data as in the earlier thread "Confusion about barcodes and removal".

Yes this is the same dataset.

The protocol is based on:

https://www.ncbi.nlm.nih.gov/pubmed/20516186

In that protocol I found the following (page 4, Figure 1):

Ligation is nondirectional and also produces molecules which have the same adapters attached to both ends (not depicted). Such molecules do not interfere with sequencing and—due to the formation of hairpin structures—amplify very poorly during indexing PCR.

So that explains why you have some fragments with barcodes on both sides. Essentially, you should only have a barcode on one end. The question now is how frequently you see the barcode in the reverse read.

Based on your explanation your barcode is only 4 characters long, which means it can also be present in the read by chance. You therefore need to look for it in its expected context: the Illumina P7 sequence.
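
To put a rough number on the chance-match concern, here is a minimal sketch. It assumes independent, uniformly distributed bases (real amplicons are not random, so this is only an order-of-magnitude estimate) and a hypothetical read length of 100 bp. The function name `expected_chance_hits` is made up for illustration.

```python
def expected_chance_hits(barcode_len: int, read_len: int) -> float:
    """Expected number of positions in a random read matching one fixed k-mer.

    Assumption: every base is independent and uniform over A/C/G/T.
    """
    if barcode_len > read_len:
        return 0.0
    # Number of start positions where a k-mer fits in the read.
    positions = read_len - barcode_len + 1
    # Each position matches a given k-mer with probability 4**-k.
    return positions / 4 ** barcode_len


# Compare a 4 bp barcode with a longer one in a 100 bp read (assumed length).
for k in (4, 7):
    print(k, round(expected_chance_hits(k, 100), 4))
```

Under these assumptions a 4 bp barcode is expected to turn up by chance about 0.38 times per 100 bp read, while a 7 bp barcode drops to about 0.006, which is why the flanking context matters much more for short barcodes.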

The XXXX was just an example to simplify. The actual barcode is 7 bp long.

If I grep for the reverse complement of the barcode in the reverse reads, I find it in 75695/118664 reads, which corresponds to 64%.
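
For anyone wanting to reproduce this kind of count without grep, a minimal Python sketch follows. The barcode `CGTACGT` and the two demo records are hypothetical placeholders, not the real data, and the reader assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def count_containing(seqs: Iterable[str], motif: str) -> Tuple[int, int]:
    """Return (reads containing motif, total reads)."""
    hits = total = 0
    for seq in seqs:
        total += 1
        if motif in seq:
            hits += 1
    return hits, total


# Hypothetical 7 bp barcode and a tiny in-memory FASTQ standing in for R2.
barcode = "CGTACGT"
demo_r2 = io.StringIO(
    "@r1\nAAACGTACGTTT\n+\nIIIIIIIIIIII\n"
    "@r2\nGGGGGGGG\n+\nIIIIIIII\n"
)
hits, total = count_containing(fastq_sequences(demo_r2), reverse_complement(barcode))
print(hits, total)
```

For a real file, the `io.StringIO` object would be replaced by an opened FASTQ file (or `gzip.open(path, "rt")` for compressed input).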

Should I keep the read pairs with:

  • No barcode in the reverse read
  • Barcode (reverse complemented) at the 5' end

and discard those with:

  • Barcode (reverse complemented) within the reverse read?

Are the barcodes at the beginning of the read in your grep (if that is where they are supposed to be)? As @Wouter already said, you should find the barcode only once, but it can be at either end.

So there are 39066/118664 (33%) reverse reads that have the reverse-complemented barcode at their 5' end.

And 36629/118664 (31%) reverse reads have the reverse-complemented barcode elsewhere in the read.

So if I understand correctly, I should discard all read pairs whose reverse read contains the reverse-complemented barcode (at the beginning or in the middle)?
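
The split into "at the 5' end" versus "elsewhere" can be sketched as below (the two counts above add up to the 75695 reads found with grep). This is only an illustration: the category labels and the example sequences are made up, and a read carrying the barcode both at position 0 and further in is counted as `five_prime` here.

```python
from collections import Counter
from typing import Iterable


def classify_reverse_read(seq: str, barcode_rc: str) -> str:
    """Label a reverse read by where the reverse-complemented barcode first occurs."""
    pos = seq.find(barcode_rc)
    if pos == -1:
        return "none"
    if pos == 0:
        return "five_prime"
    return "internal"


def tally(seqs: Iterable[str], barcode_rc: str) -> Counter:
    """Count reads per category."""
    return Counter(classify_reverse_read(s, barcode_rc) for s in seqs)


# Hypothetical reverse-complemented barcode and toy reads.
barcode_rc = "ACGTACG"
reads = ["ACGTACGTTTT", "TTACGTACGTT", "GGGGGGGG", "ACGTACGAAAA"]
counts = tally(reads, barcode_rc)
for label in ("none", "five_prime", "internal"):
    print(label, counts[label])
```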

The adapters are ligated using blunt-end ligation, so it's not impossible that fragments end up with two barcodes. However, if I'm not mistaken, these shouldn't get sequenced, since they carry the same adapter on both sides and therefore won't be amplified by bridge amplification. The barcode should always be at the P7 side of the amplicon, so I would suggest the OP look for that sequence.
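
One way to look for the barcode only in its expected context is to search for the barcode fused to its flanking sequence rather than for the barcode alone. The sketch below is hedged accordingly: the flank `TTGACCA` is a made-up placeholder and NOT the real P7 sequence, the barcode is hypothetical, and which side the flank sits on depends on the actual library design.

```python
def find_barcode_in_context(seq: str, barcode: str, flank: str,
                            flank_first: bool = True) -> int:
    """Return the index of `barcode` where it directly abuts `flank`, else -1.

    flank_first=True looks for flank+barcode, False for barcode+flank.
    """
    junction = flank + barcode if flank_first else barcode + flank
    pos = seq.find(junction)
    if pos == -1:
        return -1
    # Report where the barcode itself starts, not the junction.
    return pos + len(flank) if flank_first else pos


# Placeholder values for illustration only.
flank = "TTGACCA"      # stand-in for the adapter end next to the barcode
barcode = "ACGTACG"    # hypothetical 7 bp barcode

print(find_barcode_in_context("GGTTGACCAACGTACGCC", barcode, flank))
print(find_barcode_in_context("CCCCACGTACGCC", barcode, flank))
```

A barcode-like stretch without the flank next to it (the second call) is then treated as a chance match rather than as a real barcode.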

I just noticed the protocol you shared doesn't use inline barcodes.

It was based on that paper but has been slightly modified.

Then you might want to @#$%'ing consider telling us what you modified instead of having us guess what you have been doing. Really, provide this information upfront, because this is a waste of time. For the past hour this thread has only been about the experimental procedure, and we haven't even started on the barcode processing. You made us look through protocols and now we find out that you modified the protocol, apparently on a vital point. This topic and the previous one are quite a pain in the elbow when it comes to understanding what your question is really about.

If it is using inline barcodes then that is not a light modification.

I think you have enough information already to find the right solution.

For future reference, please do not post links to sites behind a paywall - not everyone has access. It's better to copy/paste the relevant information in your post.

For those who don't have access, here is a dropbox link: https://www.dropbox.com/s/v3xrola70fhzwyk/meyer2010.pdf?dl=0 I, ehm, perfectly, ahum, legally obtained that file, cough, and am sharing it totally anonymously.

And you expect us to click on a dropbox link for a file that is anonymously shared :)

I do not necessarily "expect" that, I just provide the opportunity. It's up to you to gamble on whether it will be safe or not ;) And there is always http://sci-hub.cc/ for those who want to obtain the paper the same way.

Thanks @WouterDeCoster, but I know how to access the reference. I was trying to encourage better behavior by the OP.
