Find human contaminants in metagenomics microbiome studies
3
0
Entering edit mode
7.9 years ago
David ▴ 240

Hi, Dealing with samples (2x150bp) from an illumina WGS HiSeq experiment i´m trying to consolidate my pipeline by removing possible human contaminants from my samples (my test sample is a community Mock that i have ordered). Initial amplicons were 400bp long so i don´t expect reads (2x150bp) to overlap.

I was wondering if other have used bwa to remove reads mapped to the human genome.

My idea is to use the following:

bwa mem reads.R1.fq reads.R2.fa > output.sam

Then use samtools with -f4 tag to get unmapped reads and discard human contaminants.

samtools -f4 output.sam > unmapped_reads samtools -F4 output.sam > mapped_reads

I was wondering if this is suitable aproach or if i need to fine tune some parameters ? and how are you dealing with BWA mapped reads, meaning you might have reads partially mapped to the human genome that could be rescued ?? How to control for %identity ?

Thanks for your comments..

bwa microbiome metagenomics wgs • 3.8k views
ADD COMMENT
1
Entering edit mode
7.9 years ago
GenoMax 147k

BBMap suite has a tool to do just this task. See this thread for more information.

ADD COMMENT
0
Entering edit mode
7.9 years ago

Here's a guide from http://www.metagenomics.wiki/:

Remove host sequences

Steps:

a) bowtie2 mapping: write all (mapped/unmapped) reads to a single .bam file
b) Samtools: use flags to extract unmapped reads 
c) bedtools: split paired-end reads into separated files

The main difference from your's:

# SAMtools SAM-flag filter: get unmapped pairs (both ends unmapped)
samtools view -b -f 12 -F 256 SAMPLE_mapped_and_unmapped.bam > SAMPLE_bothEndsUnmapped.bam
-f 12     Extract only (-f) alignments with both reads unmapped: <read unmapped><mate unmapped>
-F 256   Do not(-F) extract alignments which are: <not primary alignment>

For myself, I didn't find the most suitable method. The method above removes too much reads when comparing the microbe composition (by diamond+megan) to that of binning software such as kaiju.

ADD COMMENT
0
Entering edit mode

I feel like I am missing something really obvious, but can someone explain to me why a read with both mates unmapped (SAM flag 12) would have a not primary alignment (SAM flag 256)? I can see myself that few of these reads exist because adding the -F 256 to the comman does make a (very small) difference, but I do not understand why an unmapped read would have a not primary alignment (basically without having a primary one??). Thanks for any clarification

ADD REPLY
0
Entering edit mode

Ok, after reading extensively I understand. The confusion comes from the fact that some aligners and tools (such as Picard MarkDuplicates) assign "not primary alignment" flag (256), normally aimed at flagging alternative locations where a read can map, to "supplementary alignments" (flag 2048), which are chimeric/split reads having different parts of the read mapping to different locations in the genome. The latter do exist and we want to remove them when extracting exogenous reads not belonging to the host. CONCLUSION: Be aware of the behaviour of your aligner to know whether flag 256 or 2048 should be used to remove chimeric reads.

ADD REPLY
0
Entering edit mode
7.9 years ago
David ▴ 240

Thanks for your comments,

With samtools -f 12 and -F 256 you discard singletons where one pair is mapped and the other pair is unmapped which looks astonishing as singletons can be incorporated into the assembly process. Am i correct ?? Also you don´t have control over the percent identity ??

I ´m going to give bbmap a try as follows (will report result):

bbmap.sh -Xmx40g minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=db/hg19/ build=1 qtrim=rl trimq=10 untrim in=Sample.R1.non-matching.fa.gz in2=Sample.R2.non-matching.fa.gz outu=Sample.unmapped.fa.gz outm=Sample.mapped.fa.gz threads=10

It looks to me that the filtering of the samfile is critical here ?

ADD COMMENT
0
Entering edit mode

You misunderstand the -f 12 and -F 256, singletons are remained.

ADD REPLY

Login before adding your answer.

Traffic: 2229 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6