Count The Coverage Rate Of A Bacterial Genome'S Bam File
3
0
Entering edit mode
10.8 years ago
lanyuhao1991 ▴ 20

I have the bam file. I want to count the coverage rate of a bacterial genome. For example, the size of a bacterial genome is 10000 bp. Reads have been mapped to the genome. I use 'samtool depth' and I get 6000 Nucleotide site have been coveraged. So the coverage rate is 60%. But I find some same reads is mapped to different site of the genome, so 60% is high. How can I handle these reads that are mapped to different site of the genome?

coverage bam map genome • 3.6k views
ADD COMMENT
2
Entering edit mode
10.8 years ago
Fidel ★ 2.0k

How did yo map the reads? The method to remove multi-reads is specific for the aligner used. You may want to check this answer: A: Best way to get truly unique reads in bowtie/SAM?

ADD COMMENT
0
Entering edit mode
10.8 years ago
cnluzon ▴ 30

I guess it depends on what you are trying to do with these alignments. If it is only an estimation of the coverage you are getting, the first thing I would do would be counting the number of reads that are actually mapped to multiple places. This can be easily done looking at the reads' IDs.

If the percentage of reads mapped to several places is low, then I guess you should not worry too much. However, if it is high the question that rises is how does that affect your results? What I mean is that if it is very important that these reads are properly mapped (I do not know what you want to do, if it is for instance variants what you are trying to identify, this might be an issue) then maybe you should not count on these "ambiguous" reads and compute the coverage counting only the well mapped reads. This would yield a more reliable coverage value. If you are going to make some experiment decisions based on this and it is possible for you, I would try to reach the needed coverage without counting on the "ambiguous" reads.

Otherwise (if this is not a really big issue) you could try to see if your mapper has an "map only to one place" option or "remove ambiguous reads", which they sometimes have.

Hope this helped.

ADD COMMENT
0
Entering edit mode
10.8 years ago

reads that map to several positions on the genome are usually assigned a mapping quality value of 0, therefore you could try removing those reads while calculating the coverage depth by considering a minimum mapping quality of 1:

samtools depth -Q 1 your_bam_file > final_coverage.txt
ADD COMMENT

Login before adding your answer.

Traffic: 1133 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6