Entering edit mode
10.7 years ago
lanyuhao1991
▴
20
Hi, I use MosaikAligner to map pair-end reads to the genome(know the genome_size). I get the bam format file(a.bam). a.bam have all the mapped results including the unmapped,the mapped (unique-anlignmed and multiply-aligned) reads. I want to get the coverage and coverage rate of a.bam. I use 2 ways:
First way:
because the size of a.bam is too huge, I want to reduce the size. So
I use samtools -F 8 to get mapped reads file.
- samtools view -b -F 8 a.bam > a.map.bam
- samtools sort a.map.bam a.map.sort
- samtools depth a.map.sort.bam | awk '{if ($3 > 0){n++;sum+=$3}} END {print "The coverage rate =", n/genome_size,"\t", "The coverage depth =", sum/genome_size}
The result: The coverage rate =0.58 the coverage depth = 12X
Second way:
I do not reduce the unmapped reads.
- samtools sort a.bam a.sort
- samtools depth a.sort.bam | awk '{if ($3 > 0){n++;sum+=$3}} END {print "The coverage rate =", n/genome_size,"\t", "The coverage depth =", sum/genome_size}
The result: The coverage rate =0.59the coverage depth = 13X
- However, why do the second way that I get bigger coverage rate and coverage depth than the first way ? And why do the unmapped reads can contribute to the increase of the coverage rate and depth ? More importantly, I want to know which way is right.
samtools view -b -F 8 a.bam
will filter reads whose mate is unmapped rather than unmapped reads. What happens if yousamtools view -b -F 4 a.bam
instead?Use -F 8 to filter ummaped mates can have the same result as use -F 4.I do the test, the result is same. But use -F 12 can filter the pair-end reads including that only a mate is unmapped. http://samtools.github.io/hts-specs/SAMv1.pdf
0x4 segment unmapped;0x8 next segment in the template unmapped; I think use -F 4 or -F 8 in pair-end map will get the same result. This is not the key to my question. Thank you for your reply.
Whether filtering by 0x4 or 0x8 produces the same result will depend on the aligner and settings, which is why I asked.