8.1 years ago
Picasa
Hi,
I have a sample of PE reads that I want to demultiplex. For this I used fastq-multx.
So for instance, my barcode is XXXX
And my raw forward read is: XXXXCCTTGGGCATGATGGTGACGCGCTTGGCGTGGATGGCGCACAGGTTGGTGTCCTCGAACAGGCCGACCAGGTAGGCCTCGCTGGCCTCCTGCAG
After fastq-multx, this read has been correctly assigned and trimmed:
CCTTGGGCATGATGGTGACGCGCTTGGCGTGGATGGCGCACAGGTTGGTGTCCTCGAACAGGCCGACCAGGTAGGCCTCGCTGGCCTCCTGCAG
However, my reverse reads vary. I see one of the following:
- No barcode in the reverse read
- Barcode (reverse complemented) in the 5' part:
XXXXATGGCTCGTACCAAGCAGACCGCCCGCAAGT
- Barcode (reverse complemented) within the reverse read:
ATGGCTCGTACCAAGCAGACCXXXXCGGAGGCAAGGCTCCCCGC
I'm not sure what to do with these. Should I keep only the pairs whose reverse read does not contain the barcode?
How was the data generated? What is the cause that the barcode can end up everywhere (or not) in the reverse read?
It's an amplicon sequencing with custom barcodes.
I don't know how the barcode can be found in the Reverse read.
One detail I forgot to mention: the barcode in the reverse read is the reverse complement of XXXX.
What is the "normal" process? Should the barcode be found only in the forward read?
That depends on the library prep. How was the library created? When/how were the barcodes attached? Without proper understanding of the experimental procedure we can't get this right.
I assume this is about the same data as in Confusion about barcodes and removal
Yes this is the same dataset.
The protocol is based on:
https://www.ncbi.nlm.nih.gov/pubmed/20516186
In that protocol I found the following (page4, figure 1):
So that explains why you have some fragments with barcodes on both sides. Essentially you should only have a barcode on one end. The question now is how frequently you see the barcode in the reverse read.
Based on your explanation, your barcode is only 4 characters long, which means it can also be present in the read by chance. You therefore need to look for it in its expected context: the Illumina P7 sequence.
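To put a rough number on the "by chance" point, here is a back-of-the-envelope sketch. It assumes uniformly distributed, independent bases and treats every start position as independent, which is only an approximation. The read length of 100 is an assumption for illustration, not a value from this dataset, and `chance_hit_probability` is a hypothetical helper name.

```python
def chance_hit_probability(barcode_len, read_len):
    """Approximate probability that one specific barcode of length
    barcode_len occurs at least once in a random read of length read_len.

    Assumes uniform, independent bases and independent start positions,
    so this is a rough estimate, not an exact result.
    """
    positions = read_len - barcode_len + 1
    if positions <= 0:
        return 0.0
    p_site = 0.25 ** barcode_len  # chance of a match at one fixed position
    return 1.0 - (1.0 - p_site) ** positions


if __name__ == "__main__":
    READ_LEN = 100  # assumed read length, for illustration only
    for k in (4, 7):
        p = chance_hit_probability(k, READ_LEN)
        print(f"{k}-mer in a {READ_LEN} bp read: {p:.4f}")
```

Under these assumptions a short barcode turns up by chance far more often than a longer one, which is why checking the flanking adapter sequence matters most for short barcodes.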
The XXXX was just an example to simplify. In fact, the barcode is 7 bp long. So if I grep for the reverse complement of the barcode in the reverse reads, I find it in 75695/118664 reads, which corresponds to 64%.
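For anyone who wants to reproduce this kind of count without grep, a minimal Python sketch follows. The barcode `ACGTTGA` and the three example reads are made up for illustration, and the parser assumes plain, unwrapped 4-line FASTQ records.

```python
import io

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def count_reads_with(handle, motif):
    """Return (reads containing motif, total reads)."""
    hits = total = 0
    for seq in fastq_sequences(handle):
        total += 1
        if motif in seq:
            hits += 1
    return hits, total


if __name__ == "__main__":
    barcode = "ACGTTGA"  # hypothetical 7 bp barcode, not the real one
    # Tiny in-memory stand-in for the reverse-read FASTQ file.
    reverse_reads = io.StringIO(
        "@r1\nTCAACGTATGGCTCGTACC\n+\nIIIIIIIIIIIIIIIIIII\n"
        "@r2\nATGGCTCGTACCTCAACGTCGGAGG\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
        "@r3\nATGGCTCGTACCAAGCAGACC\n+\nIIIIIIIIIIIIIIIIIIIII\n"
    )
    hits, total = count_reads_with(reverse_reads, reverse_complement(barcode))
    print(f"{hits}/{total} reverse reads contain the reverse-complemented barcode")
```

With a real file you would pass an opened file handle instead of the `io.StringIO` object.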
Maybe should I keep the PE with
And I discard the :
?
Are the barcodes at the beginning of the read in your grep (if that is where they are supposed to be)? As @Wouter already said you should find the barcode only one time but it can be at either end.
So there are 39066/118664 (33%) reverse reads that have the reverse-complemented barcode at their 5' end.
And 36629/118664 (31%) reverse reads that have the reverse-complemented barcode elsewhere in the read.
So if I understand correctly, I should discard all pairs whose reverse read contains the reverse-complemented barcode (at the beginning or in the middle)?
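The split between "barcode at the 5' end" and "barcode elsewhere" can be tallied in one pass. This is only a sketch: the barcode `ACGTTGA` and the example reads are hypothetical, and a read is labelled by the first occurrence of the motif, so a read with the motif both at the start and further in counts as "start".

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def barcode_position(read, motif):
    """Label a read by where the motif first occurs."""
    idx = read.find(motif)
    if idx == -1:
        return "absent"
    return "start" if idx == 0 else "internal"


def summarize(reads, barcode):
    """Return {label: (count, percent)} for the reverse-complemented barcode."""
    motif = reverse_complement(barcode)
    counts = Counter(barcode_position(r, motif) for r in reads)
    total = sum(counts.values())
    return {
        label: (counts[label], 100.0 * counts[label] / total if total else 0.0)
        for label in ("start", "internal", "absent")
    }


if __name__ == "__main__":
    # Hypothetical barcode and reads, for illustration only.
    reads = ["TCAACGTATGG", "ATGGTCAACGTCC", "ATGGCTCG", "ATGGCCCC"]
    for label, (n, pct) in summarize(reads, "ACGTTGA").items():
        print(f"{label}: {n} ({pct:.0f}%)")
```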
The adapters are ligated using blunt-end ligation, so it's not impossible that fragments end up with two barcodes. However, if I'm not mistaken, these shouldn't get sequenced since they contain the same adapter on both sides and therefore won't get amplified by bridge amplification. The barcode should always be at the P7 side of the amplicon, so I would suggest that the OP look for that sequence.
I just noticed the protocol you shared doesn't use inline barcodes.
It was based on that paper but has been modified lightly.
Then you might want to @#$%'ing consider telling us what you modified instead of having us guess what you have been doing. Really, provide this information upfront because this is a waste of time. For the past hour this thread has only been about the experimental procedure and we haven't even started on the barcode processing. You made us look through protocols and now we find out that you modified the protocol, on a vital point apparently. In this topic and the previous one it has been quite a pain in the elbow to get a good understanding of what your question really is about.
If it is using inline barcodes then that is not a light modification.
I think you have enough information already to find the right solution.
For future reference, please do not post links to sites behind a paywall - not everyone has access. It's better to copy/paste the relevant information in your post.
For those who don't have access here is a dropbox link https://www.dropbox.com/s/v3xrola70fhzwyk/meyer2010.pdf?dl=0 I ehm perfectly ahum legal obtained that file cough and share this totally anonymously.
And you expect us to click on a dropbox link for a file that is anonymously shared :)
I do not necessarily "expect" that, I provide the opportunity. It's up to you to gamble whether it will be safe or not ;) And there is always http://sci-hub.cc/ for those who want to obtain the paper the same way.
Thanks @WouterDeCoster, but I know how to access the reference. I was trying to encourage better behavior by the OP.