Question

Statistical method of deducing the probability of a ratio being 1:2 for pairs of numbers (read depth) in a set

10.0 years ago

jomaco ▴ 200

The numbers below are the average depth of DNA sequencing read coverage across transcript regions. For each line, I am trying to decide whether V1 and V2 are equal or whether V1 is half of V2. It looks like the first 12 lines potentially have a ratio of 1:2 (V1:V2), and the rest probably have a ratio closer to 1:1 (V1:V2).

Is there a statistical method of deducing the probability of the ratio being 1:2 (V1:V2)?

I am a novice statistician and have only used methods that test whether two sets of data have significantly different means, e.g. Student's t-test. I am unsure how to compare individual numbers, perhaps taking into account the differences in the numbers in the dataset as a whole.

V1              V2
26.4048096192   40.5971943888
42.7286338116   67.3218964442
16.6910523354   28.449634215
25.6696629213   39.7415730337
22.8902439024   35.4198606272
23.4578783151   42.8853354134
30.844304522    47.2364052662
21.3841059603   45.9911699779
26.9329608939   53.3240223464
25.6696629213   39.7415730337
22.8902439024   35.4198606272
23.4578783151   42.8853354134
48.4988523336   42.5401683244
66.9087759815   56.2771362587
30.9506517691   44.8621973929
41.0501567398   36.207523511
14.7519747235   11.4992101106
98.5691964286   86.9235491071
112.394004283   109.324411135
77.6472694718   70.7502238138
16.791973792    15.1638001638
47.6559691913   40.216944801
52.3489795918   48.362244898
83.0362537764   78.9007337074
47.7375565611   42.4809954751
85.6609654124   78.0877993158
85.1274038462   90.8990384615
153.882743363   126.502212389
99.50558109     93.8778726198
66.8800773694   58.5222437137
30.3466026081   29.2868908717
55.5277078086   52.7877833753
81.8555408971   74.9525065963
86.95684803     82.0356472795
85.0009587728   81.0872483221
203.495197439   135.008537887
70.7687908497   74.9027777778
87.9617065557   82.0097814776
84.094017094    83.5337132004
30.3687707641   45.0897009967
93.0515653775   90.2780847145
80.6575875486   79.2879377432
21.380224261    19.6819571865
40.7149700599   40.7023952096
103.873167982   94.3038331454
44.9216677195   44.0650663298
22.7315377932   19.5851433536
31.1595383571   25.9450101833
144.497706422   142.811926606
80.0854621458   70.4226660474
80.4943127962   72.2900473934
18.4689213894   20.7294332724
123.254526092   126.424920128
26.4048096192   40.5971943888
42.7286338116   67.3218964442
89.939923539    86.7706171491
74.0379020287   67.8980702622
84.1919866444   74.8113522538

ratio significance sequencing-depth

updated 3.3 years ago by Ram 45k • written 10.0 years ago by jomaco ▴ 200

Ram · Answer 1 · 2015-11-13

It's important to realize that there is a reference bias when mapping reads, particularly with a mapping program that has a low tolerance for errors. Baited exon-capture data also cause such a bias. As such, when looking at ratios, a 100/140 split might have a near-zero chance of occurring when flipping coins, but it would be a pretty strong indication of heterozygosity in sequencing.
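To put a number on the coin-flip comparison, here is a minimal sketch of an exact two-sided binomial test. It assumes the 100/140 split means 100 reads on one side out of 240 in total, each read behaving like a fair coin flip; that reading, and the helper name `binomial_two_sided_p`, are illustrative assumptions. The test is exactly the kind of unbiased model that reference bias undermines, so the result shows what the idealized model says, not what real data will do.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties with the observed
    # probability are included despite floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Assumed interpretation: 100 reads vs. 140 reads, 240 in total.
p_value = binomial_two_sided_p(100, 240)
print(f"two-sided p-value under a fair coin: {p_value:.4f}")
```

A small p-value here only says that a fair coin is a poor model for the split; with mapping bias in play, it does not by itself rule out a true 1:1 genotype.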

In short, due to bias (which cannot be effectively calculated), theoretical techniques that assume unbiased random distributions will not give you a very useful answer. It's better to determine empirically, for a given type of data, what kind of ratios you see in the mapped reads when something has a 2:1 or 1:1 or 0:1 genotype.
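One way to act on the empirical approach is to assign each pair to whichever calibrated ratio its log2(V1/V2) is closest to. The sketch below is only an illustration: the function name `classify_ratio` and the `centers` dictionary are hypothetical, and the default centers are the theoretical values (log2 of 1/2 and of 1), used as placeholders. In practice they would be replaced with the log ratios actually observed for samples of known 1:2 and 1:1 status in the same type of data.

```python
from math import log2

# Placeholder centers on the log2(V1/V2) scale. These are the
# theoretical values; replace them with empirically observed values
# from samples of known status, since bias shifts the real centers.
DEFAULT_CENTERS = {"1:2": log2(0.5), "1:1": log2(1.0)}


def classify_ratio(v1: float, v2: float, centers=None) -> str:
    """Return the label whose center is nearest to log2(v1 / v2)."""
    if v1 <= 0 or v2 <= 0:
        raise ValueError("depths must be positive to take a log ratio")
    centers = DEFAULT_CENTERS if centers is None else centers
    log_ratio = log2(v1 / v2)
    return min(centers, key=lambda label: abs(log_ratio - centers[label]))


# A few rows taken from the table in the question.
pairs = [
    (26.4048096192, 40.5971943888),
    (21.3841059603, 45.9911699779),
    (48.4988523336, 42.5401683244),
]
for v1, v2 in pairs:
    print(f"{v1:10.4f} {v2:10.4f} log2 ratio {log2(v1 / v2):+.2f} "
          f"-> {classify_ratio(v1, v2)}")
```

Nearest-center assignment gives a label but not a probability. With calibration data in hand, the spread around each center could also be estimated, which would allow flagging pairs that fall between the two groups instead of forcing a call.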