Hello,
Earlier I had a problem (already solved, thanks to the help of Brian Bushnell and Genomax) in which my index reads were not supplied in a separate file but in the FASTQ headers, as in this example:
@GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGA-ACCAACTG/1;1
NGCATGAACGGCTAAACGAGGGTCCAACTGTCTCTTATCT
+GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGA-ACCAACTG/1;1
B[[aaeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
@GHAY-HISEQ2:5:2308:2551:1934#CCTGGATA-TGCTCGAC/1;1
NAGCTGGAATTACCGCGGCTGCTGGCACCAGACTTGCCCT
+GHAY-HISEQ2:5:2308:2551:1934#CCTGGATA-TGCTCGAC/1;1
B[[[aeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
With the demuxbyname.sh script from BBTools, I was able to demultiplex all reads whose indexes contained no mismatches.
This recovered nearly 90% of the reads, but I am wondering how I could extract, from the remaining 10%, the reads whose indexes contain up to 1 mismatch.
Does anyone know a method for doing this?
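One possible approach, sketched below under a few assumptions, is to parse the dual index out of each header and assign the read to the expected barcode within a Hamming distance of 1, leaving the read undetermined when no barcode is close enough or when two barcodes are equally close. The regex assumes the header layout shown above (`#INDEX1-INDEX2/`), and `extract_index`, `assign_sample` and the sample sheet are hypothetical names for illustration, not part of BBTools. Mismatches are counted over both indexes combined.

```python
import re
from typing import Dict, Optional

# Assumed header layout, modeled on the example above: ...#INDEX1-INDEX2/1;1
INDEX_RE = re.compile(r"#([ACGTN]+)-([ACGTN]+)/")


def extract_index(header: str) -> Optional[str]:
    """Return the dual index as 'INDEX1-INDEX2', or None if the header has none."""
    m = INDEX_RE.search(header)
    if m is None:
        return None
    return f"{m.group(1)}-{m.group(2)}"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed: str, samples: Dict[str, str],
                  max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose expected index is within max_mismatch of observed.

    Returns None when no sample is close enough, or when two or more samples
    are equally close (ambiguous), so such reads stay undetermined.
    """
    best_name: Optional[str] = None
    best_dist = max_mismatch + 1
    tie = False
    for name, expected in samples.items():
        if len(expected) != len(observed):
            continue  # index lengths differ: cannot compare position by position
        d = hamming(observed, expected)
        if d < best_dist:
            best_name, best_dist, tie = name, d, False
        elif d == best_dist:
            tie = True
    if best_name is None or tie:
        return None
    return best_name


if __name__ == "__main__":
    # Hypothetical sample sheet built from the two indexes in the example reads.
    samples = {"sampleA": "TTGCTGGA-ACCAACTG", "sampleB": "CCTGGATA-TGCTCGAC"}
    header = "@GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGT-ACCAACTG/1;1"
    print(assign_sample(extract_index(header), samples))
```

Here the observed index differs from sampleA's at one position, so the read is assigned to sampleA. Correcting one mismatch is only unambiguous if every pair of expected barcodes differs at 3 or more positions; the tie check guards against the cases where that does not hold.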
Thanks for your comment. I was aware that I would not recover all of the 10%, but I was hoping to get back ~3% of it. Indeed, it seems a good idea to look at the number and kinds of tags present in the undetermined file.
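A minimal sketch of that check, assuming the same header layout as above: count each index tag in the undetermined FASTQ and report, for the most frequent tags, the smallest number of mismatches to any expected barcode. Only the first line of every 4-line record is read as a header, because the `+` lines in this data repeat the header and a quality line can itself start with `@`. `count_indexes`, `summarize` and the file name are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumed header layout: ...#INDEX1-INDEX2/1;1
INDEX_RE = re.compile(r"#([ACGTN]+-[ACGTN]+)/")


def count_indexes(lines: Iterable[str]) -> Counter:
    """Count index tags in a FASTQ stream with 4 lines per record."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:  # only the first line of each record is the header
            continue
        m = INDEX_RE.search(line)
        counts[m.group(1) if m else "<no index>"] += 1
    return counts


def min_mismatch(tag: str, expected: List[str]) -> Optional[int]:
    """Smallest Hamming distance from tag to any equal-length expected index."""
    dists = [sum(a != b for a, b in zip(tag, e))
             for e in expected if len(e) == len(tag)]
    return min(dists) if dists else None


def summarize(counts: Counter, expected: List[str],
              top: int = 20) -> List[Tuple[str, int, Optional[int]]]:
    """(tag, read count, min mismatches to an expected index), most frequent first."""
    return [(tag, n, min_mismatch(tag, expected))
            for tag, n in counts.most_common(top)]


if __name__ == "__main__":
    expected = ["TTGCTGGA-ACCAACTG", "CCTGGATA-TGCTCGAC"]  # hypothetical sample sheet
    with open("undetermined.fastq") as fh:  # hypothetical file name
        for tag, n, dist in summarize(count_indexes(fh), expected):
            print(tag, n, dist)
```

If the most frequent undetermined tags sit at distance 1 from an expected index, mismatch-tolerant assignment is worth doing; if they are mostly far from every expected index, the extra reads are unlikely to be recoverable.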
Contrary to what I thought, most of the reads are crap:
So I think you are right. I will stay with the perfect matches. Thank you again!