I would like to infer shared genomic intervals between different samples.
My input:
sample chr start end
NE001 1 100 200
NE001 2 100 200
NE002 1 50 150
NE002 2 50 150
NE003 2 250 300
My expected output:
chr start end freq
1 100 150 2
2 100 150 2
Where "freq" is the number of samples that contributed to the shared region. In the example above, freq = 2 (NE001 and NE002).
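To make the expected output concrete, here is a minimal sketch that reproduces it for the two-sample case only (NE001 vs. NE002). It assumes closed coordinates and at most one region per sample per chromosome; the names `d`, `a`, `b`, `p` and `shared` are made up for illustration, and it does not address the general many-sample case.

```r
# Input table from the question
d <- data.frame(
  sample = c("NE001", "NE001", "NE002", "NE002", "NE003"),
  chr    = c(1, 2, 1, 2, 2),
  start  = c(100, 100, 50, 50, 250),
  end    = c(200, 200, 150, 150, 300)
)

# Pair up the regions of two samples that lie on the same chromosome
a <- d[d$sample == "NE001", c("chr", "start", "end")]
b <- d[d$sample == "NE002", c("chr", "start", "end")]
p <- merge(a, b, by = "chr", suffixes = c(".a", ".b"))

# Shared part of two intervals: largest start to smallest end
shared <- data.frame(
  chr   = p$chr,
  start = pmax(p$start.a, p$start.b),
  end   = pmin(p$end.a, p$end.b),
  freq  = 2  # two samples contribute to each shared region
)

# Keep only the pairs that really overlap
shared <- shared[shared$start <= shared$end, ]
print(shared)
```

The `freq` value is hard-coded here because exactly two samples are compared; with more samples it would have to be counted per region.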
Cheers!
What have you tried so far?
Actually, I know how to count the positions shared between two regions.
input:
Function:
f <- function(x) length( intersect(seq(x[1],x[2],1), seq(x[3],x[4],1)) )
a <- apply(X,1,f)
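Since the input is not shown above, here is a small runnable sketch of how the function could be called. The matrix `X` and its column names are assumptions made for illustration: each row holds two regions as start1, end1, start2, end2, taken from the example table.

```r
# Count the integer positions shared by two closed intervals
# x = c(start1, end1, start2, end2)
f <- function(x) length(intersect(seq(x[1], x[2], 1), seq(x[3], x[4], 1)))

# Hypothetical input: one pair of regions per row
X <- rbind(
  c(100, 200,  50, 150),  # NE001 vs NE002 on chr 1
  c(100, 200, 250, 300)   # NE001 vs NE003 on chr 2
)
colnames(X) <- c("start1", "end1", "start2", "end2")

a <- apply(X, 1, f)
print(a)  # 51 0
```

The first pair shares positions 100 to 150 inclusive (51 positions); the second pair does not overlap. Note that enumerating every position with `seq` can get slow for long regions.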
I also know how to overlap regions and "cbind" the overlapping ones. But when all regions are in the same data frame, with a column tagging the different animals... honestly, I don't know how to start the logic...
Cool, we try to ensure that posters have given it a go before providing help. Hopefully my answer (below) will work well for you.