Question

Samtools coverage results


4.1 years ago

smrutimayipanda ▴ 20

Hi, I ran the samtools coverage command to calculate gene coverage. I got these results:

#rname  startpos  endpos     numreads  covbases  coverage  meandepth  meanbaseq  meanmapq
1       1         249250621  5267569   43635507  17.5067   3.25543    36.1       55
2       1         243199373  3735452   38023900  15.6349   2.36581    36.1       56.7
3       1         198022430  2876635   29511402  14.9031   2.24146    36.1       59.1
4       1         191154276  2120864   24624225  12.8819   1.70427    36.1       54.1
5       1         180915260  2339470   25323867  13.9976   1.99639    36.1       57.5
6       1         171115067  2710441   25385971  14.8356   2.44034    36.1       58
7       1         159138663  2524920   25711041  16.1564   2.43605    36.1       54.9
8       1         146364022  1716145   20358945  13.9098   1.80446    36.1       57.1
9       1         141213431  2185054   20803062  14.7316   2.38488    36.1       55
10      1         135534747  2104547   22107195  16.3111   2.39094    36.1       55.4
11      1         135006516  2923941   23462880  17.3791   3.33113    36.1       58.4
12      1         133851895  2562215   23469760  17.5341   2.94676    36.1       59
13      1         115169878  943925    12223004  10.613    1.26315    36.1       58.7
14      1         107349540  1610115   14890978  13.8715   2.31553    36.1       58.4
15      1         102531392  1950910   16489158  16.0821   2.92665    36.1       51.8
16      1         90354753   2338564   17753639  19.6488   3.96056    36.1       51.9
17      1         81195210   2951173   20431476  25.1634   5.59499    36.1       56.6
18      1         78077248   819720    10195226  13.0579   1.61619    36.1       57.2
19      1         59128983   2968003   18049860  30.5262   7.70268    36.1       58.5
20      1         63025520   1165330   11315216  17.9534   2.84952    36.1       59.4
21      1         48129895   561354    5884798   12.2269   1.79324    36.1       58.3
22      1         51304566   1134209   9023051   17.5872   3.40329    36.1       56
X       1         155270560  1305748   14555083  9.37401   1.29082    36.1       55.6
Y       1         59373566   205755    3473216   5.84977   0.516154   36         26.7
MT      1         16569      16217     16569     100       148.003    36         57.8

I want to understand what the coverage column means. Is 17 a percentage (17%) or something else? How should I interpret the data? How do I know how much of each gene is covered?

NGS • 3.2k views


score 0 · Answer 1 · 2021-08-30


4.1 years ago

Hood ▴ 40

According to the samtools coverage documentation:

coverage Proportion of covered bases [0..1]

But yes, in the tab-delimited output it is not a proportion [0..1] but a percentage. So basically, coverage = ((endpos - startpos + 1) / covbases) * 100



You can see that startpos is 1, so with that formula the percentage would be greater than 100, which is not possible. Second, a coverage of 17%, 18% or 30% is very low, so how can I improve it?

ADD REPLY • link 4.1 years ago by smrutimayipanda ▴ 20


The percentage will not be greater than 100, or else we are misunderstanding each other. If every nucleotide is covered, then endpos - startpos + 1 = covbases, so (endpos - startpos + 1) / covbases = 1 and coverage = 100.

I don't know the experimental design, so I can't say why the coverage is so low.

ADD REPLY • link 4.1 years ago by Hood ▴ 40


Please calculate it once with my data using your formula. It's greater than 100.

ADD REPLY • link 4.1 years ago by smrutimayipanda ▴ 20


Oh, yes. My bad. coverage = (covbases / (endpos - startpos + 1)) * 100
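As a quick check of the corrected formula, here is a minimal Python sketch that recomputes the coverage column from startpos, endpos and covbases. The function name `coverage_percent` is made up for illustration; the numbers are copied from the chromosome 1 and MT rows in the question.

```python
def coverage_percent(startpos: int, endpos: int, covbases: int) -> float:
    """Percentage of positions in [startpos, endpos] covered by at least one read."""
    region_length = endpos - startpos + 1
    return covbases / region_length * 100


# Values taken from the table in the question.
chr1 = coverage_percent(1, 249250621, 43635507)
mt = coverage_percent(1, 16569, 16569)

print(f"chr1: {chr1:.4f}")  # compare with the reported 17.5067
print(f"MT:   {mt:.0f}")    # compare with the reported 100
```

Both values match the coverage column, so that column is a percentage of the reference positions covered by at least one read, not a depth.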

ADD REPLY • link 4.1 years ago by Hood ▴ 40


OK, I got it. But my question is that 17%, 20% or 30% is low. What should I do to get it above 90%?

ADD REPLY • link 4.1 years ago by smrutimayipanda ▴ 20


It depends heavily on the experimental design. Maybe the reads are just bad, and in that case there is nothing you can do.

ADD REPLY • link 4.1 years ago by Hood ▴ 40


Can you please tell me how to find the coverage of genes, and the coverage of variants?

ADD REPLY • link 4.1 years ago by smrutimayipanda ▴ 20


For gene coverage you need to know the coordinates of the genes in the format chr:start-end; then you can use the -r flag of samtools coverage, for example:

-r, --region REG Show specified region. Format: chr:start-end.
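A minimal Python sketch of how the -r flag might be used for one gene. The BAM file name and the coordinates are placeholders, not a real gene, and `coverage_command` and `run_coverage` are hypothetical helper names. Region queries normally need a coordinate-sorted, indexed BAM, and the chromosome name must match the one used in the BAM header (here "17", not "chr17", as in the output in the question).

```python
import subprocess
from typing import List


def coverage_command(bam_path: str, chrom: str, start: int, end: int) -> List[str]:
    """Build a samtools coverage command restricted to chrom:start-end."""
    region = f"{chrom}:{start}-{end}"
    return ["samtools", "coverage", "-r", region, bam_path]


def run_coverage(cmd: List[str]) -> str:
    """Run the command; assumes samtools is installed and on PATH."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Placeholder file name and coordinates, for illustration only.
cmd = coverage_command("sample.bam", "17", 1000000, 1020000)
print(" ".join(cmd))
# print(run_coverage(cmd))  # uncomment once sample.bam and its index exist
```

The coverage column of the single output line is then the percentage of that gene's interval covered by at least one read.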

For variant coverage you need to call variants first. You can do that using bcftools. The VCF file you get will contain the depth for each variant.
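A minimal Python sketch of reading the per-variant depth from such a VCF, assuming the depth is stored as DP in the INFO column (the usual place in bcftools output, but check the header of your own file). The record below is made up for illustration, and `variant_depth` is a hypothetical helper name.

```python
from typing import Optional


def variant_depth(vcf_line: str) -> Optional[int]:
    """Return the INFO/DP value of one VCF data line, or None if it is absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th tab-separated column of a VCF record
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP":
            return int(value)
    return None


# Made-up record: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample
example = "1\t12345\t.\tA\tG\t50\t.\tDP=14;MQ=60\tGT:PL\t0/1:80,0,90"
print(variant_depth(example))
```

Header lines starting with # have to be skipped before calling the function on a real file.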