4.0 years ago
robert.murphy
▴
90
Should you blastn illumina reads for contamination before or after you have quality controlled them based on phred score?
What effects would a before and after be predicted to have?
you should not blast(n) any reads whenever ...
No, seriously now: BLAST should not be your go-to tool when dealing with NGS data. There is far better and more efficient software for BLAST-like tasks on NGS data.
Since you are asking about contamination, have a look at tools like Kraken (and/or google "NGS data and contamination" or similar).
Whether you do it before or after quality filtering will not make much difference: if real contamination is present, it will not be removed by Q-filtering.
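To illustrate the last point: a Phred-based filter only looks at the quality string, never at where the sequence came from. Below is a minimal sketch, assuming Phred+33 encoded FASTQ qualities. The function names, the threshold of 20, and the example reads are made up for illustration and are not taken from any particular QC tool.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (assumes Phred+33 encoding)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(quality: str, min_mean_q: float = 20.0) -> bool:
    """Hypothetical filter: keep a read if its mean Phred score is high enough."""
    return mean_phred(quality) >= min_mean_q


# Three made-up reads as (sequence, quality) pairs.
# The filter never inspects the sequence, so a high-quality read from a
# contaminant passes exactly like a high-quality read from the target.
target_read = ("ACGTACGTAC", "IIIIIIIIII")       # 'I' encodes Q40
contaminant_read = ("TTGACCGGTA", "IIIIIIIIII")  # same quality, different origin
low_quality_read = ("ACGTACGTAC", "##########")  # '#' encodes Q2

kept = [
    seq
    for seq, qual in (target_read, contaminant_read, low_quality_read)
    if passes_qc(qual)
]
```

Here `kept` still contains the contaminant read; only the low-quality read is dropped, whatever its origin.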
Thank you for this.
Why should one not blastn reads?
Efficiency/speed, for one. Also, BLAST is less sensitive on such short sequences than real read mappers are. (Perhaps less of an issue here, but BLAST also has no notion of "paired-end", which is an important concept in NGS data.)
So should you merge paired-end reads when using blastn in this way, or only use the forward reads? I will give Kraken a look :)
Remember to use
-task blastn-short
when you run the BLAST searches. BLAST would be sensitive to contamination from adapter sequences, so you should merge and then scan/trim the reads prior to the BLAST searches, if you want to do this. Blastn is very slow; you may use bwa or bowtie2 to map the reads against the genomes of known possible contaminants.
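The two options mentioned above could look roughly like this. This is only a sketch that builds the command lines without running them. The file names, database name, and index prefix are placeholders, and the thread count is arbitrary. Note that blastn expects FASTA input, so FASTQ reads would need converting first.

```python
import shlex
from typing import List


def blastn_short_cmd(query_fasta: str, db: str, out_tsv: str) -> List[str]:
    """Build a blastn call with -task blastn-short and tabular (outfmt 6) output."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-query", query_fasta,   # FASTA, not FASTQ
        "-db", db,               # database built beforehand with makeblastdb
        "-outfmt", "6",
        "-out", out_tsv,
    ]


def bowtie2_screen_cmd(index_prefix: str, r1: str, r2: str,
                       unaligned_prefix: str, sam_out: str,
                       threads: int = 4) -> List[str]:
    """Build a paired-end bowtie2 call against a suspected contaminant genome.

    --un-conc writes the pairs that did NOT align concordantly, i.e. the
    reads that do not look like the contaminant.
    """
    return [
        "bowtie2",
        "-p", str(threads),
        "-x", index_prefix,      # index built beforehand with bowtie2-build
        "-1", r1,
        "-2", r2,
        "--un-conc", unaligned_prefix,
        "-S", sam_out,
    ]


# Placeholder paths, for illustration only.
blast_cmd = blastn_short_cmd("reads.fasta", "contaminants_db", "hits.tsv")
bt2_cmd = bowtie2_screen_cmd("contaminant_idx", "R1.fastq", "R2.fastq",
                             "clean.fastq", "contaminant.sam")

print(shlex.join(blast_cmd))
print(shlex.join(bt2_cmd))
```

The lists can be handed to `subprocess.run(cmd, check=True)` once the tools are installed and the database or index exists.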
Removing contamination AFTER QC is better, because QC is the faster step. Running the slower process on the smaller, already filtered dataset costs less time.
You should have no contamination in well prepared libraries. Do you know for certain there is contamination?
In the short reads I found no contamination, but when blasting the assembled contigs I found 2 that matched the wrong species. The culture was pure, but the DNA was extracted by the sequencing company. Is it likely that the long reads (it is a hybrid assembly) are contaminated but the short reads are not?
That does not seem like strong evidence of contamination. Since BLAST does local alignments, it is possible that you got those alignments by chance. You would want to investigate carefully before drawing a conclusion.
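One way to start investigating is to check how long and how similar the alignments actually are, since a short local match is weak evidence. Here is a rough sketch, assuming the hits were saved in the default tabular format (`-outfmt 6`). The thresholds and the example hit lines are invented for illustration and would need tuning for real data.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity
    length: int      # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse default BLAST tabular output (12 tab-separated columns)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11])))
    return hits


def strong_hits(hits: Iterable[Hit], min_pident: float = 95.0,
                min_length: int = 1000, max_evalue: float = 1e-20) -> List[Hit]:
    """Keep only long, high-identity hits (thresholds are assumptions)."""
    return [h for h in hits
            if h.pident >= min_pident
            and h.length >= min_length
            and h.evalue <= max_evalue]


# Invented example lines: one long near-identical hit, one short weak hit.
example = [
    "contig_7\tspeciesB_chr\t98.5\t4200\t60\t3\t1\t4200\t1000\t5199\t0.0\t7400",
    "contig_9\tspeciesB_chr\t88.0\t75\t9\t0\t300\t374\t50\t124\t1e-15\t90.1",
]
convincing = strong_hits(parse_outfmt6(example))
```

With these made-up numbers only `contig_7` survives; the 75 bp hit on `contig_9` is the kind of short local alignment that should not be taken as contamination on its own.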
So due to how the blast algorithm works it is not the best for contamination detection unless paired with other information?