4.2 years ago
Marcel
Hello,
We recently had a sequencing experiment that produced a lot of "undetermined" reads (i.e. reads that could not be properly demultiplexed). I have now been asked which species these reads map to. Is there an easy way to find out, other than manually aligning the reads to all sorts of different species?
Extract the undetermined reads, convert them to FASTA, and BLAST them against nr.

You could use a program like kaiju (LINK), which is normally used for taxonomic classification, with your fastq data. That said, are you getting unexpected indexes? If so, you should investigate that rather than the reads associated with them.
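For the FASTQ-to-FASTA step suggested above, here is a minimal Python sketch. It assumes plain four-line FASTQ records, and the function name `fastq_to_fasta` and the `max_reads` parameter are my own, not part of any tool. Subsampling a few thousand reads is usually enough for a first BLAST against nr.

```python
from itertools import islice


def fastq_to_fasta(fastq_lines, max_reads=None):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes strict 4-line records (header, sequence, '+', quality).
    max_reads is a hypothetical convenience cap for subsampling.
    """
    lines = iter(fastq_lines)
    n_reads = 0
    while max_reads is None or n_reads < max_reads:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header = record[0].rstrip("\r\n")
        sequence = record[1].rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"Unexpected FASTQ header: {header!r}")
        yield ">" + header[1:]
        yield sequence
        n_reads += 1


# Example usage (file names are placeholders, adjust to your run):
#
# import gzip
# with gzip.open("Undetermined.fastq.gz", "rt") as fq, \
#         open("undetermined.fasta", "w") as fa:
#     for line in fastq_to_fasta(fq, max_reads=1000):
#         fa.write(line + "\n")
```

The resulting FASTA file can then be submitted to BLAST or any classifier that accepts FASTA input.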
Thanks for the suggestion. I was told that these undetermined reads correspond to reads either without a barcode at all or with a barcode with one or more mismatches.
Reads without indexes are likely phiX (which is generally spiked in as a control during sequencing). If your indexes were well designed (or you were using commercial indexes), they should tolerate 1-2 errors. You can thus recover data that would otherwise go to waste (e.g. ATCGCTA and ATCGGTA differ at a single position, so they would be considered equivalent).
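To illustrate the mismatch tolerance, here is a rough Python sketch of assigning an observed index to an expected one by Hamming distance. The names `hamming` and `assign_barcode` and the `max_mismatches` default are assumptions for illustration only; real demultiplexers have their own options for this, so check your tool's documentation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, expected, max_mismatches=1):
    """Return the single expected barcode within max_mismatches of observed.

    Returns None when nothing matches or when the match is ambiguous
    (more than one expected barcode is within the allowed distance).
    """
    hits = [
        barcode
        for barcode in expected
        if len(barcode) == len(observed)
        and hamming(observed, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# The pair from the post differs at one position:
print(hamming("ATCGCTA", "ATCGGTA"))
# The second barcode here is a made-up example, not from the post:
print(assign_barcode("ATCGGTA", ["ATCGCTA", "GGTTAAC"]))
```

Returning None on ambiguous matches is a deliberate choice here: allowing mismatches is only safe when the barcodes in the set are far enough apart that one read cannot match two samples.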