5.4 years ago
ciemanek
I want to filter out human 'contaminants' from my metagenomic sample. I mapped my reads to a human genome and I am filtering them with samtools. So far, I only filter out reads that are flagged as mapped, but should I also add a flag to remove supplementary reads while filtering? I can't wrap my head around what supplementary reads are and what they actually mean in terms of filtering.
It is always a good idea to remove reads that may map to host(s), e.g. human, before performing de novo assembly in metagenomic analysis. However, I am not sure what you mean by
can you explain what you mean by that?
If it refers to "What's Supplementary Reads?" then they should be removed.
ciemanek: Have you looked at the bbsplit.sh/removehuman.sh tools from the BBMap suite?
What I mean are reads mapped by bwa mem as supplementary; they are listed by samtools flagstat as below:
What I wonder is what the fact that a read is supplementary tells me about whether it should be removed or not. I understand that these are reads that do not align in full to a single region of the reference sequence; instead, different parts of the read map to different positions of the reference. Can we, in such a case, say that this read is indeed coming from human DNA? And aren't reads flagged as 'supplementary' already a part of 'mapped'? I am not sure which flag I should use to filter out reads mapping to human.
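For anyone reading along, it may help to see that the SAM flag is a bit field, and that the supplementary bit (0x800) is separate from the unmapped bit (0x4). The snippet below is only a minimal sketch in plain Python: the bit values come from the SAM specification, while the helper name `describe_flag` is made up for illustration.

```python
# Bit values as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def describe_flag(flag: int) -> dict:
    """Decode the flag bits relevant to host-read removal (hypothetical helper)."""
    return {
        "mapped": not flag & UNMAPPED,
        "secondary": bool(flag & SECONDARY),
        "supplementary": bool(flag & SUPPLEMENTARY),
    }


# 2048 = 0x800: a supplementary record; the unmapped bit is not set.
print(describe_flag(2048))
# 4 = 0x4: an unmapped record.
print(describe_flag(4))
```

In this sketch a record with flag 2048 decodes as both supplementary and mapped, which suggests that a supplementary record is simply one more alignment record for a read that did hit the reference.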
I have not looked at BBMap, as we wanted to use bwa + samtools since we already have them in our pipeline.
If a read maps to the human genome, then it should be removed.
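One way to read that advice, sketched in plain Python: treat a read as host-derived if any of its records (primary, secondary or supplementary) is mapped, and keep only the rest. This is a rough illustration under assumptions, not a replacement for samtools; the function names and the toy SAM lines are invented for the example.

```python
UNMAPPED = 0x4  # SAM flag bit: the record is unmapped


def host_read_names(sam_lines):
    """Collect names of reads that have at least one mapped record."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & UNMAPPED:
            names.add(qname)
    return names


def non_host_records(sam_lines):
    """Return alignment records whose read name never mapped to the host."""
    lines = list(sam_lines)
    host = host_read_names(lines)
    return [
        line for line in lines
        if not line.startswith("@") and line.split("\t")[0] not in host
    ]


# Toy records (made up): readA has a primary and a supplementary hit,
# readB is unmapped.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "readA\t0\tchr1\t100\t60\t50M50S\t*\t0\t0\t*\t*",
    "readA\t2048\tchr1\t700\t60\t50H50M\t*\t0\t0\t*\t*",
    "readB\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

kept = non_host_records(toy_sam)
print([record.split("\t")[0] for record in kept])
```

Because mates share a read name, this sketch drops a whole pair when either mate maps, which is the conservative choice for decontamination. With samtools itself, selecting on the unmapped bit (`samtools view -f 4`, or `-f 12` to require both mates unmapped) should already leave out supplementary records, since those do not carry the unmapped bit.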
thanks a lot for the answer :)