Hi everyone,
I'm analysing Oxford Nanopore-sequenced DNA from human cells, which I align to the hg19 UCSC reference genome. Most of the time I get a high mapping percentage; however, in a few cases the mapping percentage is below 5%.
I can of course tune the minimap2 parameters a bit, but I never get more than 5% of the reads mapped. The basic mapping is done as follows:
minimap2 -t 8 -ax map-ont --secondary=no hg19.25chr.mmi xx.fastq | samtools sort - > xx.bam
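To put a number on the mapping rate per sample, here is a minimal Python sketch. It assumes SAM text as input (for example, the output of `samtools view xx.bam`). It counts each read once through its primary record, skipping secondary (0x100) and supplementary (0x800) records, and treats a read as unmapped when flag 0x4 is set. The function name `mapping_percentage` is hypothetical; `samtools flagstat xx.bam` reports similar counts directly.

```python
from typing import Iterable

# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_percentage(sam_lines: Iterable[str]) -> float:
    """Percentage of reads with a mapped primary alignment (hypothetical helper)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

For example, `mapping_percentage(open("xx.sam"))` would give the mapped percentage for a SAM file written by the minimap2 command above before sorting.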
For the samples with low mapping percentages, I then extract some of the unmapped reads and BLAST them, which shows that the low mapping percentages are due to viral contamination of my samples.
Does anyone know of a better method or database for assessing which bacteria, viruses or other species the unmapped reads come from, instead of identifying the contaminating species by BLASTing an arbitrary number of reads and then realigning to the respective reference genome?
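Picking "some of the unmapped reads" at random, rather than taking the first few in the file, avoids biasing the BLAST check towards one region of the run. A minimal Python sketch, assuming SAM text input: it keeps primary records with flag 0x4 set and draws k of them with reservoir sampling, so the whole file never has to be held in memory. The names `sample_unmapped` and `to_fasta` are hypothetical; all unmapped reads can also be pulled out with `samtools fasta -f 4 xx.bam`.

```python
import random
from typing import Iterable, List, Optional, Tuple


def sample_unmapped(sam_lines: Iterable[str], k: int,
                    seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Uniformly sample k unmapped primary reads (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[Tuple[str, str]] = []
    seen = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900 or not flag & 0x4:
            continue  # keep only primary (not 0x100/0x800) unmapped (0x4) records
        if fields[9] == "*":
            continue  # no stored sequence, nothing to BLAST
        seen += 1
        record = (fields[0], fields[9])  # QNAME, SEQ
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)  # uniform in [0, seen)
            if j < k:
                reservoir[j] = record  # replace with probability k / seen
    return reservoir


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text, e.g. for BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```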
Thanks for your help.
Since you have long reads, you will probably be limited in which of the currently available contamination-screening tools you can use.
Kraken (https://ccb.jhu.edu/software/kraken2/ ) is commonly used for this, but I am not sure whether it accepts long reads. You could use shred.sh from BBMap to chop your reads into smaller pieces and then run Kraken on those.
Thanks for suggesting cutting the reads into smaller pieces; long reads have been exactly one of my problems.
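The shredding idea can be sketched in Python. This is not how shred.sh is implemented: `shred`, `vote_per_read`, the `_frag<i>` naming scheme and the 1000/200 bp defaults are all hypothetical choices for illustration. Naming each fragment after its read lets the per-fragment calls be collapsed back into one majority call per read. The (fragment ID, taxid) pairs are assumed to come from Kraken2's per-sequence output, where the sequence ID is column 2, the taxid is column 3, and taxid 0 means unclassified.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def shred(name: str, seq: str, length: int = 1000,
          min_length: int = 200) -> Iterator[Tuple[str, str]]:
    """Cut one read into consecutive, non-overlapping fragments.

    A trailing piece shorter than min_length is dropped.
    """
    for i, start in enumerate(range(0, len(seq), length)):
        piece = seq[start:start + length]
        if len(piece) >= min_length:
            yield f"{name}_frag{i}", piece


def vote_per_read(fragment_calls: Iterable[Tuple[str, str]],
                  ignore_unclassified: bool = True) -> Dict[str, str]:
    """Collapse (fragment_id, taxid) calls into one majority taxid per read.

    With ignore_unclassified=True, taxid "0" votes are skipped, so reads
    whose fragments were all unclassified are absent from the result.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for frag_id, taxid in fragment_calls:
        if ignore_unclassified and taxid == "0":
            continue
        read_name = frag_id.rsplit("_frag", 1)[0]  # strip the fragment suffix
        votes[read_name][taxid] += 1
    return {read: counts.most_common(1)[0][0] for read, counts in votes.items()}
```

Non-overlapping fragments keep the sketch simple; overlapping windows would avoid losing k-mers that span a cut, at the cost of more fragments to classify.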
You can try Kraken:
https://ccb.jhu.edu/software/kraken/
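Once Kraken has been run with a report file, the question of which species the unmapped reads belong to comes down to reading that report. A minimal Python sketch, assuming the standard tab-separated Kraken report layout (percentage, clade read count, direct read count, rank code, taxid, indented name). It indexes rank, taxid and name from the end of the row so that reports with extra minimizer columns are also handled. The function name `top_species` is hypothetical.

```python
from typing import Iterable, List, Tuple


def top_species(report_lines: Iterable[str], n: int = 10,
                rank: str = "S") -> List[Tuple[str, int, float]]:
    """Return (name, clade_reads, percent) for the n largest taxa at `rank`."""
    rows = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a report row
        # Rank code, taxid and name are the last three columns.
        rank_code, name = fields[-3].strip(), fields[-1].strip()
        if rank_code == rank:
            rows.append((name, int(fields[1]), float(fields[0])))
    rows.sort(key=lambda row: row[1], reverse=True)  # most reads first
    return rows[:n]
```

Passing `rank="G"` would summarise at genus level instead, which can be more robust when the database contains several closely related viral strains.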
Thank you, I'll give it a look.