3.3 years ago
smrutimayipanda
Hi, I ran the samtools coverage command to calculate gene coverage and got these results:
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
1 1 249250621 5267569 43635507 17.5067 3.25543 36.1 55
2 1 243199373 3735452 38023900 15.6349 2.36581 36.1 56.7
3 1 198022430 2876635 29511402 14.9031 2.24146 36.1 59.1
4 1 191154276 2120864 24624225 12.8819 1.70427 36.1 54.1
5 1 180915260 2339470 25323867 13.9976 1.99639 36.1 57.5
6 1 171115067 2710441 25385971 14.8356 2.44034 36.1 58
7 1 159138663 2524920 25711041 16.1564 2.43605 36.1 54.9
8 1 146364022 1716145 20358945 13.9098 1.80446 36.1 57.1
9 1 141213431 2185054 20803062 14.7316 2.38488 36.1 55
10 1 135534747 2104547 22107195 16.3111 2.39094 36.1 55.4
11 1 135006516 2923941 23462880 17.3791 3.33113 36.1 58.4
12 1 133851895 2562215 23469760 17.5341 2.94676 36.1 59
13 1 115169878 943925 12223004 10.613 1.26315 36.1 58.7
14 1 107349540 1610115 14890978 13.8715 2.31553 36.1 58.4
15 1 102531392 1950910 16489158 16.0821 2.92665 36.1 51.8
16 1 90354753 2338564 17753639 19.6488 3.96056 36.1 51.9
17 1 81195210 2951173 20431476 25.1634 5.59499 36.1 56.6
18 1 78077248 819720 10195226 13.0579 1.61619 36.1 57.2
19 1 59128983 2968003 18049860 30.5262 7.70268 36.1 58.5
20 1 63025520 1165330 11315216 17.9534 2.84952 36.1 59.4
21 1 48129895 561354 5884798 12.2269 1.79324 36.1 58.3
22 1 51304566 1134209 9023051 17.5872 3.40329 36.1 56
X 1 155270560 1305748 14555083 9.37401 1.29082 36.1 55.6
Y 1 59373566 205755 3473216 5.84977 0.516154 36 26.7
MT 1 16569 16217 16569 100 148.003 36 57.8
I want to understand what the coverage column means. Is 17 a percentage (17%) or something else? How should I interpret the data? How do I know how much of each gene is covered?
You can see that startpos is 1. The percentage would be greater than 100, which is not possible. Second, coverage of 17%, 18% or 30% is very low, so how can I improve it?
The percentage cannot be greater than 100, or else we are misunderstanding each other. If every nucleotide is covered, then
endpos - startpos + 1 = covbases
so (endpos - startpos + 1) / covbases = 1
and we have coverage = 100.
I don't know the experimental design, so I can't say why the coverage is so low.
Please calculate it once with my data using your formula. It's greater than 100.
Oh, yes. My bad.
coverage = (covbases / (endpos - startpos + 1)) * 100
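As a quick check of the corrected formula against the table in the question, here is a minimal Python sketch. The helper name coverage_percent is hypothetical; the numbers are copied from the rows for chromosomes 1, 19 and MT, and the last value in each tuple is the coverage that samtools reported.

```python
def coverage_percent(startpos, endpos, covbases):
    """Percentage of positions in [startpos, endpos] covered by at least one read."""
    region_length = endpos - startpos + 1
    return covbases / region_length * 100


# (startpos, endpos, covbases, coverage reported by samtools), taken from the table
rows = {
    "1": (1, 249250621, 43635507, 17.5067),
    "19": (1, 59128983, 18049860, 30.5262),
    "MT": (1, 16569, 16569, 100.0),
}

for rname, (startpos, endpos, covbases, reported) in rows.items():
    computed = coverage_percent(startpos, endpos, covbases)
    print(rname, round(computed, 4), reported)
```

The computed values should agree with the coverage column, which confirms that the column is a percentage of covered bases and not a read depth (that is the meandepth column).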
OK, I got it. But my question is that 17%, 20% or 30% is low. What should I do to get it above 90%?
It depends heavily on the experimental design. Maybe the reads are just bad, in which case there is nothing you can do.
Can you please tell me how to find the coverage of genes, and the variant coverage?
For gene coverage you need to know the coordinates of the genes in the format chr:start-end; then you can use the -r flag of samtools coverage.
For variant coverage you need to call variants first. You can do that using bcftools. The resulting VCF file will contain the depth for each variant.
Please provide me the command line, if possible.
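The example command from the answer above did not survive in the thread, so the following is only a sketch of what such commands might look like, written as a small Python script that prints them. The file names (sample.bam, reference.fa, calls.vcf.gz) and the region are hypothetical placeholders, not real gene coordinates. Note that the contigs in this BAM are named 1, 2, ... without a "chr" prefix, so the region has to be written the same way.

```python
import shlex


def gene_coverage_cmd(bam, region):
    """samtools coverage restricted to one region, given as contig:start-end."""
    return ["samtools", "coverage", "-r", region, bam]


def variant_calling_cmds(reference, bam, out_vcf):
    """Two bcftools steps, meant to be piped: mpileup | call."""
    mpileup = ["bcftools", "mpileup", "-f", reference, bam]
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return mpileup, call


def variant_depth_cmd(vcf):
    """Print contig, position and INFO/DP (read depth) for each called variant."""
    return ["bcftools", "query", "-f", r"%CHROM\t%POS\t%INFO/DP\n", vcf]


# Hypothetical inputs: replace with your own files and gene coordinates.
region = "17:1000000-1100000"
print(shlex.join(gene_coverage_cmd("sample.bam", region)))

mpileup, call = variant_calling_cmds("reference.fa", "sample.bam", "calls.vcf.gz")
print(shlex.join(mpileup) + " | " + shlex.join(call))

print(shlex.join(variant_depth_cmd("calls.vcf.gz")))
```

The printed lines can be pasted into a shell. The reference FASTA must be the same assembly the reads were aligned to, and the gene coordinates would normally come from an annotation file for that assembly.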