Best SAM flags to evaluate host contamination
5.5 years ago
ctseto ▴ 310

After generating a BAM, I'm looking at which reads map to a known contaminant (human host) and which don't (in theory, gut contents from across the tree of life).

From the BAM, I was thinking of extracting paired-end reads using samtools fastq -1 R1.fastq -2 R2.fastq -f 13 input.bam; the rationale being that -f (only include reads with all of the FLAGs in INT present) with 13 is:

Bit  Value  Meaning
1    1      Read paired
3    4      Read unmapped
4    8      Mate unmapped

Thus, once reads have been mapped against the human host, the reads output with -f 13 (paired, read unmapped and mate unmapped) are likely the ones I want (to the limits of the human reference sequence itself being clean).
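To make the -f semantics concrete, here is a minimal sketch in plain bash arithmetic (no samtools or BAM needed) of the test that -f INT applies: a read is kept only when every bit of INT is also set in its FLAG. The helper name has_all_bits is hypothetical and only for illustration; the example flags are typical values, not taken from real data.

```shell
#!/usr/bin/env bash
# has_all_bits FLAG MASK: mimics the `-f MASK` test (hypothetical helper).
# Succeeds (exit 0) only if every bit in MASK is also set in FLAG.
has_all_bits() {
    local flag=$1 mask=$2
    (( (flag & mask) == mask ))
}

# 13 = 1 (paired) + 4 (read unmapped) + 8 (mate unmapped)
# 77  = 13 + 64  (R1, both mates unmapped)
# 141 = 13 + 128 (R2, both mates unmapped)
# 73  = 1 + 8 + 64 (R1 mapped, mate unmapped)
# 99  = 1 + 2 + 32 + 64 (R1, properly paired, both mapped)
for flag in 77 141 73 99; do
    if has_all_bits "$flag" 13; then
        echo "flag $flag: kept by -f 13"
    else
        echo "flag $flag: dropped by -f 13"
    fi
done
```

Under this test 77 and 141 are kept, while 73 and 99 are dropped, so pairs where only one mate failed to map do not end up in the "non-human" output.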

And to get the FASTQ reads corresponding to human, I was thinking of -F (only include reads with none of the FLAGs in INT present) with 12, which keeps reads where neither the read nor its mate is unmapped. I am still somewhat in the dark about "proper pair" and whether reads whose mates land on different contigs might cause issues. To me, -F 13 would mean keeping only things that lack bit 1, i.e. unpaired reads, which might make things weird? Otherwise, filtering on paired and properly paired would be more straightforward than this not/unpaired business.
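The difference between -F 12 and -F 13 can be checked with the same kind of bit arithmetic. This is a sketch only: has_no_bits is a hypothetical helper that mimics the -F INT test (keep a read only if none of the bits in INT are set), and the flags are illustrative. Flag 65 stands in for a paired read whose mate mapped but without the proper-pair bit, which is what an aligner would typically report for mates on different contigs.

```shell
#!/usr/bin/env bash
# has_no_bits FLAG MASK: mimics the `-F MASK` test (hypothetical helper).
# Succeeds (exit 0) only if no bit in MASK is set in FLAG.
has_no_bits() {
    local flag=$1 mask=$2
    (( (flag & mask) == 0 ))
}

# 99 = paired, proper pair, both mapped (R1)
# 65 = paired, both mapped, NOT proper pair (R1)
# 73 = paired, read mapped, mate unmapped (R1)
# 77 = paired, both unmapped (R1)
# 0  = unpaired, mapped
for flag in 99 65 73 77 0; do
    for mask in 12 13; do
        if has_no_bits "$flag" "$mask"; then verdict=kept; else verdict=dropped; fi
        echo "flag $flag: $verdict by -F $mask"
    done
done
```

With -F 12, both 99 and 65 are kept, so non-proper pairs still count as human as long as both mates mapped. With -F 13, every paired flag is dropped and only the unpaired flag 0 survives, which is why -F 13 is not a useful filter on paired-end data.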

-f 13 would appear to be equivalent to the more common 77/141 (13 plus 64 or 128 to pick R1 or R2); -F 12, being a negative filter, might throw a bunch of unusual confounders my way...
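The arithmetic behind 13, 77 and 141 can be spelled out from the individual SAM flag bits. This is just a worked sum in bash; the variable names are my own labels for the bits, not samtools options.

```shell
#!/usr/bin/env bash
# SAM flag bit values (variable names are illustrative labels)
PAIRED=1
READ_UNMAPPED=4
MATE_UNMAPPED=8
READ1=64
READ2=128

# Paired, read unmapped, mate unmapped
base=$(( PAIRED | READ_UNMAPPED | MATE_UNMAPPED ))
# Same, restricted to first-in-pair or second-in-pair
r1=$(( base | READ1 ))
r2=$(( base | READ2 ))

echo "base=$base r1=$r1 r2=$r2"
```

This prints base=13 r1=77 r2=141, so filtering on -f 13 selects the union of what -f 77 and -f 141 would select separately.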

SAM NGS • 1.2k views

Sounds about right to me; I did it similarly. I wouldn't worry about the paired flag at all (apart from filtering for both reads unmapped): that bit is always set unless you have unpaired input data.

"Proper pairs" essentially means both reads are in the correct orientation and within the expected distance range, see this thread


I would suggest binning the reads against the human genome with bbsplit.sh from the BBMap suite.
