calculating cumulative recombination rate for MOSAIC?
2
1
Entering edit mode
5.4 years ago
rshoobs ▴ 10

I am running MOSAIC and don't understand how their "cumulative recombination rate" is calculated. I have the 1000G genetic map that their data appears to be based off of, but don't know how to convert the combined rate to the cumulative rate. As an example the first few lines of the genetic map look like this

pos        rate(cM/Mb)     Map(cM)
14431347    9.6640974    0.00000000
14432618    9.7078062    0.01228307
14433624    9.7138922    0.02204912
14433659    9.7163435    0.02238911
14433758    9.7078088    0.02335103

Whereas the cumulative recombination rate looks like this

:sites:44801
14431347 14432618 14433624 14433659 14433758
0 0.009850758 0.01767697 0.0179494 0.01872035

where the 3 rows are # of sites, pos (bp) and cumulative recombination rate

I did find this note online for calculating cumulative recombination rate. I've calculated my mean combined rate as ~2.65, however I still seem to be missing something as multiplying this by the distance between snps does not give me the correct answer.

local ancestry mosaic recombination • 2.1k views
ADD COMMENT
0
Entering edit mode

@rshoobs Did you solve this?

ADD REPLY
0
Entering edit mode
4.6 years ago

I've only just seen this post. The rates files in the MOSAIC example files are derived from another source so are different to 1000G (they are HapMap I think). They vary with genome build (MOSAIC examples are build 36) and that's the main consideration; otherwise any differences should be quite small.

If you have pos and rate (columns 1 and 2 in your example) then to calculate row 3 you would have dm (the differenced map) ith value is given by the equation: dm[i] = 1e-6rate[i-1](pos[i+1]-pos[i])

i.e. multiply by 10^-6 (to change to megabases) then calculate the rate over all megabases since the previous position.

In R you could calculate this in two lines avoiding a loop via: dm=c(0,1e-6rate[-N](pos[-1]-pos[-N])) # differenced map m=cumsum(dm) # map where N is the total number of sites / positions you have.

The only caveat is that the rates (second column) are sometimes marked as -1 where there is no recombination so you should set those to zero first by running rate[rate==-1]=0

ADD COMMENT
0
Entering edit mode

Thanks for explaining this. I tested this code out in R using one of the hg19 files provided by impute2 but the values of m do not match (or even really resemble) the COMBINED_rate column. Have I done something wrong or missed a step?

pos=map[,1]
rate=map[,3]
N=length(pos)
dm=c(0,0.0000001* rate[-N] * (pos[-1]-pos[-N]))
m=cumsum(dm)
map$m=m

head(map,66)
   position COMBINED_rate.cM.Mb. Genetic_Map.cM.          m
1     55550            0.0000000       0.0000000 0.0000000000
2    568322            0.0000000       0.0000000 0.0000000000
3    568527            0.0000000       0.0000000 0.0000000000
4    721290            2.6858077       0.4102920 0.0000000000
5    723819            2.8222713       0.4174296 0.0001037629
6    723891            2.9813106       0.4176442 0.0001067683
7    728242            2.9806151       0.4306129 0.0002884853
8    729948            3.0780969       0.4358641 0.0003619479
9    740857            3.0751333       0.4694107 0.0008374321
10   750235            1.7834967       0.4861364 0.0012776454
11   752566            1.1323778       0.4887759 0.0013909638
12   754192            0.5926495       0.4897396 0.0014704388
13   754745            0.3843793       0.4899521 0.0014975214
14   765948            0.4143187       0.4945938 0.0020464148
15   767070            0.4476843       0.4950961 0.0021019082
16   768448            0.4482707       0.4957138 0.0021701324
17   769551            0.4503094       0.4962105 0.0022248097
18   771521            0.4573571       0.4971115 0.0023225631
19   774047            0.4689815       0.4982961 0.0024481335
20   775659            0.4607820       0.4990389 0.0025284588
21   776546            0.4547785       0.4994423 0.0025727236
22   777122            0.4539713       0.4997038 0.0026014915
23   777513            0.4559979       0.4998821 0.0026210299
24   779322            0.4564198       0.5007077 0.0027114585
25   780027            0.4564688       0.5010295 0.0027467584
26   780785            0.4560405       0.5013752 0.0027847365
27   784023            0.4560154       0.5028518 0.0029470818
28   784904            0.4651660       0.5032616 0.0029913830
29   785050            3.1979933       0.5037285 0.0029987306
30   785989            4.1068312       0.5075848 0.0030460307
31   792480            4.4923191       0.5367445 0.0033755040
32   798026            5.7728685       0.5687608 0.0036731825
ADD REPLY
0
Entering edit mode

Hi rdmorin,

rates is the second column rather than the 3rd as you have used.

The other issue is that we need to add on the starting genetic distance value i.e. map[1,3]. Often but not always maps start at 0, but this one does not.

So using the example that comes with impute2:

map=read.table("example.chr22.map",header=T)

pos=map[,1]

rate=map[,2]

N=length(pos)

dm=c(0,1e-6*rate[-N]*(pos[-1]-pos[-N]))

m=cumsum(dm)

m=m+map[1,3]

This should ensure m is identical to the 3rd column now.

ADD REPLY

Login before adding your answer.

Traffic: 1632 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6