8.1 years ago
Picasa
Hello,
I have a set of sequences that I want to demultiplex.
1) Is the barcode always at the beginning (5') of the read?
2) I'm looking for a tool like fastx_barcode_splitter.pl
from the FASTX toolkit, but that one doesn't trim the barcodes.
Do you know of a tool that both splits and trims?
Barcodes are never part of the actual read (for standard Illumina barcodes) unless the barcodes in this experiment are designed to be "in-line".
That said, take a look at the sabre package. It may do what you want.
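To make the "split and trim" idea concrete for in-line barcodes, here is a minimal Python sketch. It is not sabre's implementation: it assumes exact matching at the 5' end of R1 only, and the function name `demux_pair`, the record layout and the example reads are all made up for illustration.

```python
def demux_pair(r1, r2, barcodes):
    """Assign a read pair to a sample using an in-line barcode on R1.

    r1, r2   : (name, sequence, quality) tuples for the two mates
    barcodes : dict mapping barcode sequence -> sample name
    Returns (sample, trimmed_r1, r2); R2 is returned unchanged so the
    pair stays together. Exact matching only (no mismatches allowed).
    """
    name, seq, qual = r1
    for barcode, sample in barcodes.items():
        if seq.startswith(barcode):
            n = len(barcode)
            # Trim the barcode from both the bases and the qualities.
            return sample, (name, seq[n:], qual[n:]), r2
    # No barcode found: leave the pair untouched.
    return "unknown", r1, r2


# Hypothetical example reads (not taken from real data).
barcodes = {"CGCTTGA": "sample_X"}
r1 = ("read1/1", "CGCTTGAACGTACGT", "IIIIIIIIIIIIIII")
r2 = ("read1/2", "TTGCAAGG", "IIIIIIII")
sample, trimmed_r1, mate = demux_pair(r1, r2, barcodes)
print(sample, trimmed_r1[1], mate[1])  # sample_X ACGTACGT TTGCAAGG
```

In practice each sample would get its own pair of output files, with the trimmed R1 and the untouched R2 written in the same order.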
Thanks for your support.
I am not sure if I was clear, but I'm in situation (c) in this figure and I want to get to (d):
http://www.illumina.com/content/dam/illumina-marketing/images/technology/multiplexing-overview-figure.gif
So for each sample I have the barcode information (the sequence and its reverse complement), and I want to keep the reads paired (because I have PE data).
1) So with the sabre package, I used PE mode with the barcode in its forward (F) orientation. Is that right?
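For reference, the forward barcode and its reverse complement are related as in the small Python sketch below. The helper name `reverse_complement` is an assumption for illustration; the barcode CGCTTGA is the one quoted further down in this thread.

```python
# Translation table for DNA bases (upper and lower case, N kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("CGCTTGA"))  # TCAAGCG
```

Which orientation shows up at the start of a read depends on how the amplicon was designed, so it is worth checking a few raw reads before choosing.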
The figure you linked is for standard Illumina barcodes. Even though they are shown "inline" in that illustration, that part is read as independent read(s) on the sequencer. These would generally be handled by Illumina's own bcl2fastq software.
Have you looked at the demultiplexed result from Sabre to check if the reads have been correctly separated?
Yes, it has been correctly separated.
The R1 reads have been trimmed correctly but the R2 reads remain the same, which is why I am not sure if I'm doing it right.
Sorry to harp on this but can you clarify if you are using standard illumina barcodes or custom barcodes that are designed to be inline? Perhaps you can post a couple of example reads to illustrate the before/after scenario.
This is amplicon-seq with custom barcodes. We have genes from different species that we sequenced on the same lane. My goal is to make 2 fastq files (PE) for each species.
For instance, with this pair of reads:
I know that the barcode CGCTTGA (forward orientation) corresponds to sample X. So after sabre:
Barcode is only expected to be on R1, correct (so R2 should be left as is)?
I don't know. Is it the standard protocol?
Since R1/R2 come from the same DNA fragment, you need to label only one end for basic demux. One may do both ends for something more complex (I'm not an experimental person, so I can't think of a scenario where that would be needed).
Thank you for your support anyway.
There is an option -c to trim the R2 reads too (first 7 bp), but I'm not sure if I have to use it.
You would only need to do that if you had a barcode at both ends of the expected fragment. You would need to ask whoever designed this amplicon.
Both ends can be barcoded if sequencing many samples where just barcoding one side wouldn't give enough index possibilities.
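As a rough illustration of why barcoding both ends gives more index possibilities, the Python sketch below builds a sample table keyed by (R1 barcode, R2 barcode) pairs and trims both mates. It is only a sketch under assumptions: the names `build_pair_table` and `demux_dual` are hypothetical, every barcode except CGCTTGA is invented, and the length of 7 is taken from the "first 7 bp" mentioned above.

```python
from itertools import product


def build_pair_table(forward_barcodes, reverse_barcodes):
    """Map every (forward, reverse) barcode pair to a sample label."""
    pairs = product(forward_barcodes, reverse_barcodes)
    return {pair: f"sample_{i + 1}" for i, pair in enumerate(pairs)}


def demux_dual(r1_seq, r2_seq, table, barcode_len=7):
    """Look up the pair of 5' barcodes and trim them from both mates."""
    key = (r1_seq[:barcode_len], r2_seq[:barcode_len])
    sample = table.get(key)
    if sample is None:
        # Unknown combination: return the reads untrimmed.
        return None, r1_seq, r2_seq
    return sample, r1_seq[barcode_len:], r2_seq[barcode_len:]


# Two barcodes per side already distinguish 2 x 2 = 4 samples.
table = build_pair_table(["CGCTTGA", "ATCGTAC"], ["GGATCCA", "TTAGGCA"])
print(len(table))  # 4
print(demux_dual("CGCTTGAACGT", "TTAGGCATTTT", table))
# ('sample_2', 'ACGT', 'TTTT')
```

With n barcodes on one side and m on the other, up to n x m samples can be told apart, compared with n when only R1 is barcoded.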
https://github.com/najoshi/sabre/commit/2a1bedc9a53fd03420d7a3b11f406efa60f90ba1
Check this out, it looks like there is an option:
--h, --both-barcodes, Optional flag that indicates that both fastq files have barcodes.\n\
+-c, --both-barcodes, Optional flag that indicates that both fastq files have barcodes.\n\