Hi,
I want to determine how well different regions of the genome are sequenced, based on the gnomAD data. I have downloaded the following genome-wide tabular data, with the first rows shown here:
chrom pos mean median over_1 over_5 over_10 over_(and so on...)
1 12141 2.9005e-02 0 2.1939e-02 1.0547e-04 0.0000e+00
1 12142 2.9216e-02 0 2.1622e-02 1.0547e-04 0.0000e+00
1 12143 2.7951e-02 0 2.1200e-02 1.0547e-04 0.0000e+00
1 12144 2.9111e-02 0 2.1728e-02 1.0547e-04 0.0000e+00
1 12145 2.9216e-02 0 2.1833e-02 1.0547e-04 0.0000e+00
1 12146 2.6790e-02 0 2.0251e-02 1.0547e-04 0.0000e+00
1 12147 3.2802e-02 0 2.4048e-02 1.0547e-04 0.0000e+00
1 12148 3.3330e-02 0 2.4470e-02 1.0547e-04 0.0000e+00
1 12149 3.4279e-02 0 2.4786e-02 0.0000e+00 0.0000e+00
But I have not been able to find out:
- What the coverage numbers actually mean. I am aware of this question. There appear to be several possible meanings of coverage beyond just the number of reads overlapping a region. Is there any way to determine which of them applies here?
- What over_1, over_5, over_10 (and so on) mean. Do these refer to the mean or median over n neighboring positions? And if so, is it n positions on both sides, or a centered window of n positions?
Is this a standard TSV-based data format, or is it gnomAD-specific?
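
To make the question about the over_* columns concrete, here is a minimal sketch that parses the sample rows above and runs two sanity checks on them. It assumes the columns are whitespace-separated and uses only the three over_* columns visible in the sample. The helper name `parse_coverage` is hypothetical and not part of any gnomAD tooling. The checks only describe the pasted rows; they do not by themselves establish what the columns mean.

```python
# Sample rows copied from the post (only the columns that are shown).
SAMPLE = """\
chrom pos mean median over_1 over_5 over_10
1 12141 2.9005e-02 0 2.1939e-02 1.0547e-04 0.0000e+00
1 12142 2.9216e-02 0 2.1622e-02 1.0547e-04 0.0000e+00
1 12143 2.7951e-02 0 2.1200e-02 1.0547e-04 0.0000e+00
1 12144 2.9111e-02 0 2.1728e-02 1.0547e-04 0.0000e+00
1 12145 2.9216e-02 0 2.1833e-02 1.0547e-04 0.0000e+00
1 12146 2.6790e-02 0 2.0251e-02 1.0547e-04 0.0000e+00
1 12147 3.2802e-02 0 2.4048e-02 1.0547e-04 0.0000e+00
1 12148 3.3330e-02 0 2.4470e-02 1.0547e-04 0.0000e+00
1 12149 3.4279e-02 0 2.4786e-02 0.0000e+00 0.0000e+00
"""


def parse_coverage(text):
    """Hypothetical helper: split a whitespace-separated table into row dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    rows = []
    for fields in body:
        row = dict(zip(header, fields))
        row["pos"] = int(row["pos"])
        # Everything after chrom and pos is numeric in the sample.
        for key in header[2:]:
            row[key] = float(row[key])
        rows.append(row)
    return header, rows


header, rows = parse_coverage(SAMPLE)
over_cols = [name for name in header if name.startswith("over_")]

# Check 1: do all over_* values lie between 0 and 1?
in_unit_interval = all(0.0 <= row[c] <= 1.0 for row in rows for c in over_cols)

# Check 2: within each row, do the values never increase as the threshold grows?
non_increasing = all(
    row[a] >= row[b] for row in rows for a, b in zip(over_cols, over_cols[1:])
)

print(in_unit_interval, non_increasing)
```

Both checks can be rerun on the full file by reading it from disk instead of using the embedded sample.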