Dear All,
I have been using fastx_barcode_splitter to demultiplex my reads. Today I found that some reads did not match any of the barcodes we used in the experiment. On closer inspection, I found that these reads were not sorted because they carry at least one extra base at the beginning of the read, before the barcode.
Example FASTA sequence:
>HWI-ST863:238:C20G3ACXX:4:1204:18858:57161 1:N:0:AAACAAAA
TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAGCTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA
Barcode: ACTTACCTACTT
TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAGCTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA
_ACTTACCTACTT
This is clearly a match once the first base is skipped, but the read is not sorted into the corresponding barcode file.
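A quick way to see why --mismatches 3 cannot rescue this read is to count mismatches between the barcode and the first 12 bases of the read, with and without skipping the extra base. This is a minimal sketch that assumes --bol compares the barcode against the very first bases of the read; the `hamming` helper is hypothetical and written only for this illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


read = ("TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAG"
        "CTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA")
barcode = "ACTTACCTACTT"

# Compare the barcode with the read window starting at base 0 and at base 1.
for offset in (0, 1):
    window = read[offset:offset + len(barcode)]
    print(offset, window, hamming(window, barcode))
```

At offset 0 every barcode base is compared with its neighbour, so 9 of the 12 positions differ, far above a budget of 3; at offset 1 the barcode matches exactly. Raising --mismatches high enough to absorb such a shift would also let reads fall into the wrong barcode bins.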
The command I use is the following:
cat <file_name> | fastx_barcode_splitter.pl --bcfile mybarcodes.txt --bol --mismatches 3 --prefix code_ --suffix "_1" > code_1.stats
I tried the --partial option, but it is extremely slow (I almost had to kill the process) and it did not noticeably improve the splitting.
Can someone help me understand whether there is a better way to handle this? Is there any other splitter that is easy to use and can be integrated into an existing pipeline?
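One possible workaround, sketched here in Python under my own assumptions, is to test each barcode at offset 0 and at offset 1 and keep the hit with the fewest mismatches. This is not how fastx_barcode_splitter works internally; `read_fasta`, `assign_barcode`, `demultiplex` and the `barcodes` dict are hypothetical names, and a read whose best hit is tied between two different samples is left unmatched rather than guessed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hamming(a: str, b: str) -> int:
    """Mismatch count between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_barcode(seq: str, barcodes: Dict[str, str], max_mismatches: int = 1,
                   max_shift: int = 1) -> Optional[Tuple[str, int]]:
    """Return (sample, offset) for the best barcode hit at offsets 0..max_shift.

    Hits are ranked by (mismatches, offset); a tie between two different
    samples is treated as ambiguous and returns None."""
    best = None  # (mismatches, offset, sample) of the best hit so far
    ambiguous = False
    for offset in range(max_shift + 1):
        for sample, bc in barcodes.items():
            window = seq[offset:offset + len(bc)]
            if len(window) < len(bc):
                continue  # read too short to hold this barcode at this offset
            mm = hamming(window, bc)
            if mm > max_mismatches:
                continue
            if best is None or (mm, offset) < best[:2]:
                best, ambiguous = (mm, offset, sample), False
            elif (mm, offset) == best[:2] and sample != best[2]:
                ambiguous = True
    if best is None or ambiguous:
        return None
    return best[2], best[1]


def demultiplex(records: Iterable[Tuple[str, str]], barcodes: Dict[str, str],
                max_mismatches: int = 1,
                max_shift: int = 1) -> Dict[str, List[Tuple[str, str]]]:
    """Group (header, seq) records by sample; reads are kept untrimmed."""
    bins: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for header, seq in records:
        hit = assign_barcode(seq, barcodes, max_mismatches, max_shift)
        bins["unmatched" if hit is None else hit[0]].append((header, seq))
    return bins


# Usage with toy reads; the barcode table is hypothetical
# (in practice it would be built from mybarcodes.txt).
fasta = """>r1
TACTTACCTACTTCCGCTGG
>r2
ACTTACCTACTTCCGCTGGT
>r3
GGGGGGGGGGGGGGGGGGGG
""".splitlines()
barcodes = {"sampleA": "ACTTACCTACTT"}
bins = demultiplex(read_fasta(fasta), barcodes)
for sample, recs in sorted(bins.items()):
    print(sample, [h for h, _ in recs])
```

Allowing a shift makes false assignments more likely, so it is probably wise to keep the mismatch budget well below 3 when doing this; the safe value depends on how far apart your barcodes are from each other, including at a one-base offset.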
Is there any known explanation for that extra nucleotide at the beginning of your reads?
I can't think of any explanation other than barcode contamination during synthesis/purification.
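To check whether the extra nucleotide is systematic, and whether it is always the same base (which might hint at its origin), one could tally how many reads match at offset 0, how many match only after skipping one base, and which base sits in front of the barcode. This is a hypothetical diagnostic sketch, not a FASTX-Toolkit feature; `tally_shifted` and `best_mismatch` are names made up for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Mismatch count between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_mismatch(seq: str, barcodes: List[str], offset: int) -> float:
    """Smallest mismatch count of any barcode at this offset (inf if the read is too short)."""
    scores = [hamming(seq[offset:offset + len(bc)], bc)
              for bc in barcodes if len(seq) >= offset + len(bc)]
    return min(scores, default=float("inf"))


def tally_shifted(seqs: Iterable[str], barcodes: List[str],
                  max_mismatches: int = 1) -> Tuple[Counter, Counter]:
    """Count reads by barcode position and record the base preceding shifted barcodes."""
    positions, leading = Counter(), Counter()
    for seq in seqs:
        if best_mismatch(seq, barcodes, 0) <= max_mismatches:
            positions["offset0"] += 1
        elif best_mismatch(seq, barcodes, 1) <= max_mismatches:
            positions["offset1"] += 1
            leading[seq[0]] += 1  # the extra base in front of the barcode
        else:
            positions["unmatched"] += 1
    return positions, leading


# Toy reads; replace with the sequences from the unmatched output file.
seqs = ["TACTTACCTACTTCCG", "GACTTACCTACTTCCG", "ACTTACCTACTTCCGC", "CCCCCCCCCCCCCCCC"]
positions, leading = tally_shifted(seqs, ["ACTTACCTACTT"])
print(positions)
print(leading)
```

If nearly all shifted reads share the same leading base, that points to something systematic in library preparation rather than random sequencing noise, although it would not by itself identify the cause.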