Filtering human reads in metagenomics: should supplementary reads be removed?
5.4 years ago
ciemanek

I want to filter out human 'contaminants' from my metagenomic sample. I mapped my reads to the human genome and am filtering them with samtools. So far, I only filter out reads flagged as mapped, but should I also use a flag that removes supplementary reads? I can't wrap my head around what supplementary reads are and what they mean in terms of filtering.
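
To make the question concrete: in SAM/BAM, "supplementary" is not a separate kind of read but a FLAG bit (0x800) set on an extra alignment record of a read. A minimal Python sketch of decoding FLAG values follows. The bit values are taken from the SAM specification; the helper names (`decode_flag`, `is_primary`, `is_unmapped_primary`) are hypothetical illustrations, not part of samtools or pysam. Keeping only records for which `is_unmapped_primary` is true roughly corresponds to `samtools view -f 4 -F 0x900`.

```python
from __future__ import annotations

# FLAG bit values as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary(flag: int) -> bool:
    """True unless the record is secondary (0x100) or supplementary (0x800)."""
    return not flag & (0x100 | 0x800)


def is_unmapped_primary(flag: int) -> bool:
    """True for the primary record of a read that did not map (candidate non-human read)."""
    return bool(flag & 0x4) and is_primary(flag)


print(decode_flag(2064))  # → ['reverse', 'supplementary']
print(is_unmapped_primary(4), is_unmapped_primary(2048))  # → True False
```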

dna-seq metagenomics bwa samtools flagstat

It is always a good idea to remove reads that may map to the host(s), e.g. human, before performing de novo assembly in metagenomic analysis. However, I am not sure what you mean by "supplementary reads". Can you explain what you mean by that?


If it refers to the reads described in the thread "What's Supplementary Reads?", then they should be removed.

ciemanek: Have you looked at the bbsplit.sh/removehuman.sh tools from the BBMap suite?


What I mean are reads that bwa mem reports as supplementary; samtools flagstat lists them as follows:

4086053 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
482807 + 0 supplementary
0 + 0 duplicates
964819 + 0 mapped (23.61% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

What I am wondering is what the supplementary flag tells me about whether a read should be removed. As I understand it, supplementary reads are reads that do not align contiguously to a single region of the reference; instead, different parts of the read map to different positions. In that case, can we say that the read really comes from human DNA? And aren't reads flagged as 'supplementary' already counted under 'mapped'? I am not sure which flag I should use to filter out reads mapping to human.
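
On the counting question: samtools flagstat counts alignment records, not reads, so the "in total" and "mapped" lines include the supplementary (and secondary) records. That is why "mapped" is 23.61%: it equals 964819 / 4086053 records. A minimal sketch of recovering read-level numbers from the output above, assuming every secondary and supplementary record is itself a mapped alignment (so each read contributes exactly one primary record); `read_level_counts` is a hypothetical helper, not a samtools feature.

```python
def read_level_counts(total: int, secondary: int, supplementary: int, mapped: int):
    """Convert samtools flagstat record counts into read-level counts.

    Assumes every secondary and supplementary record is a mapped alignment,
    so each read contributes exactly one primary record.
    """
    reads = total - secondary - supplementary            # one primary record per read
    primary_mapped = mapped - secondary - supplementary   # reads whose primary record is mapped
    unmapped = reads - primary_mapped                     # candidate non-human reads
    pct_mapped = round(100 * primary_mapped / reads, 2)
    return reads, primary_mapped, unmapped, pct_mapped


# QC-passed values from the flagstat output above (the QC-failed column is all 0).
print(read_level_counts(total=4086053, secondary=0, supplementary=482807, mapped=964819))
# → (3603246, 482012, 3121234, 13.38)
```

Under that assumption, about 13.4% of the reads have a primary alignment to human, and the 3121234 unmapped records (total minus mapped) are exactly the unmapped reads.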

I have not tried BBMap, as we wanted to use bwa + samtools, which are already part of our pipeline.


If it maps to the human genome, it should be removed.
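
With bwa + samtools, following that advice amounts to keeping only reads whose primary record is unmapped. Supplementary (and secondary) records always describe extra alignments of a read whose primary record is already mapped, so skipping them loses nothing and avoids counting a read twice. A minimal pure-Python sketch over SAM text follows, assuming single-end reads as in the flagstat output above (0 paired in sequencing). `nonhuman_reads` and `to_fastq` are hypothetical names; paired-end data would also need the mate-unmapped bit (0x8), which this sketch ignores. In practice, `samtools view -f 4 -F 0x900` (or `samtools fastq` with the same flags) on the BAM does the same selection.

```python
from __future__ import annotations

from typing import Iterable, Iterator

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800  # == 0x900


def nonhuman_reads(sam_lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (QNAME, SEQ, QUAL) for every single-end read whose primary record is unmapped."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            # Extra alignment of a read whose primary record is mapped: skip it
            # so that each read is judged once, by its primary record.
            continue
        if flag & UNMAPPED:
            yield name, fields[9], fields[10]


def to_fastq(reads: Iterable[tuple[str, str, str]]) -> str:
    """Format (name, SEQ, QUAL) tuples as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in reads)


sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:1000\n",
    "read1\t0\tchr1\t100\t60\t4M4S\t*\t0\t0\tACGTTTGG\tIIIIIIII\n",  # primary, mapped
    "read1\t2048\tchr2\t500\t60\t4H4M\t*\t0\t0\tTTGG\tIIII\n",       # supplementary part of read1
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF\n",                   # unmapped: kept
]
print(to_fastq(nonhuman_reads(sam)), end="")
```

Here read1 is dropped entirely even though one of its records is supplementary, because its primary record is mapped; only read2 is written out.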


Thanks a lot for the answer :)
