Dear All,
I want to remove the singletons from the aligned bam file.
Downloaded fastq files using
fastq-dump --split-files SRR1517848
Then aligned the paired end fastq files, using BWA by:
bwa mem hg19.fa R_1.fa R_2.fa -o SRR1517848.sam
Then converted it to BAM format and I used samtools to remove the singletons using : according to the paper, article
samtools view -@ 8 -F 0x04 -b SRR1517848.sam > SRR1517848.bam
Then I looked into the stats using samtools flagstat command by;
samtools flagstat SRR1517848.bam
which gave;
Before filtering
4614120 + 0 in total (QC-passed reads + QC-failed reads)
4549753 + 0 mapped (98.61% : N/A)
4609656 + 0 paired in sequencing
2304828 + 0 read1
2304828 + 0 read2
4461356 + 0 properly paired (96.78% : N/A)
4517788 + 0 with itself and mate mapped
27501 + 0 singletons (0.60% : N/A)
39296 + 0 with mate mapped to a different chr
33576 + 0 with mate mapped to a different chr (mapQ>=5)
And after filtering
4549753 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
4464 + 0 supplementary
0 + 0 duplicates
4549753 + 0 mapped (100.00% : N/A)
4545289 + 0 paired in sequencing
2281318 + 0 read1
2263971 + 0 read2
4461356 + 0 properly paired (98.15% : N/A)
4517788 + 0 with itself and mate mapped
27501 + 0 singletons (0.61% : N/A)
39296 + 0 with mate mapped to a different chr
33576 + 0 with mate mapped to a different chr (mapQ>=5)
So my Questions:
1) So the command to remove the singletons by -F 0x04 have only removed unmapped reads ?
2) In both before and after stats the singletons remain, and what are those?
Before: 27501 + 0 singletons (0.60% : N/A)
After : 27501 + 0 singletons (0.61% : N/A)
3) Is there a way to remove singletons in variant detection GATK pipeline ? what benefits does it have?
Thanks
I'd also add the header to the bam file (by adding the
-h
parameter), to avoid problems in downstream analysis.Thanks when I used the above command it gave singletons removed reads:
As for SAM flag guide when I use
0x08
I am removing mate unmapped (0x8) reads, and with a warning:Flag(s) and 0x8 cannot be set when read is not paired
eg: R1 and R2 are mate pairs, where:
R1 = mapped (mate is unmapped)
R2 = unmapped .
So with this I am removing the mate unmapped reads only ?
i.e so I am removing R2 reads and keeping the R1 ?
2) My Second question is How important is to remove the singletons in GATK variant detection pipe line? Is this a step that I need to be doing? or can I avoid this?
Thanks when I used the above command it gave singletons removed reads:
As for SAM flag guide when I use
0x08
I am removing mate unmapped (0x8) reads, and with a warning:Flag(s) and 0x8 cannot be set when read is not paired
eg: R1 and R2 are mate pairs, where:
R1 = mapped (mate is unmapped)
R2 = unmapped .
So with this I am removing the mate unmapped reads only ?
i.e so I am removing R2 reads and keeping the R1 ?
2) My Second question is How important is to remove the singletons in GATK variant detection pipe line? Is this a step that I need to be doing? or can I avoid this?
In your e.g. R1 is the singleton read since its mate is unmapped and thus R1 read will be removed when you filter for the flag
0x08
.Thanks for the clarification. So why is it important to remove the singleton ?
(So my thinking is::: since its mate can not be mapped to the chromosome , the mapped read can not be accurately determined. )