Overlap SNPs with regions in R
4.0 years ago
pt.taklifi

Hello everyone, I have a data frame of SNPs and a data frame of regions in R:

p1.snp <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), position = c(2135809L, 11130112L, 11253473L, 11258963L, 
15847782L, 16611163L), REF = c("C", "G", "G", "C", "C", "A"), 
    ALT = c("A", "T", "T", "A", "G", "G"), NA. = c(18L, 46L, 
    21L, 32L, 17L, 9L), NA..1 = c(0L, 0L, 0L, 0L, 0L, 0L), NA..2 = c("0%", 
    "0%", "0%", "0%", "0%", "0%"), NA..3 = c("C", "G", "G", "C", 
    "C", "A"), NA..4 = c(11L, 31L, 8L, 43L, 9L, 0L), NA..5 = c(4L, 
    9L, 4L, 12L, 6L, 14L), NA..6 = c("26.67%", "22.5%", "33.33%", 
    "21.82%", "40%", "100%"), NA..7 = c("M", "K", "K", "M", "S", 
    "G"), NA..8 = c("Somatic", "Somatic", "Somatic", "Somatic", 
    "Somatic", "Somatic"), NA..9 = c(1L, 1L, 1L, 1L, 1L, 1L), 
    NA..10 = c(0.03335777, 0.0005946179, 0.01209677, 0.002473575, 
    0.005523112, 1.223706e-06), NA..11 = c(0L, 6L, 8L, 43L, 9L, 
    0L), NA..12 = c(11L, 25L, 0L, 0L, 0L, 0L), NA..13 = c(0L, 
    0L, 4L, 12L, 6L, 14L), NA..14 = c(4L, 9L, 0L, 0L, 0L, 0L), 
    NA..15 = c(0L, 19L, 21L, 24L, 17L, 0L), NA..16 = c(18L, 27L, 
    0L, 8L, 0L, 9L), NA..17 = c(0L, 0L, 0L, 0L, 0L, 0L), NA..18 = c(0L, 
    0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-6L))

and

regions <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), chromStart = c(975451L, 1014228L, 1290080L, 1291099L, 
1291742L, 1327977L), chromEnd = c(975952L, 1014729L, 1290581L, 
1291600L, 1292243L, 1328478L), name = c("BRCA_39", "BRCA_55", 
"BRCA_123", "BRCA_124", "BRCA_125", "BRCA_143"), NA. = c(1.878426, 
4.074697, 2.443588, 3.180199, 8.26783, 1.082465), NA..1 = c("3'UTR", 
"3'UTR", "3'UTR", "3'UTR", "3'UTR", "3'UTR"), NA..2 = c(0.6187625, 
0.6287425, 0.6786427, 0.7025948, 0.6407186, 0.6766467), NA..3 = c(0.3812375, 
0.3712575, 0.3213573, 0.2974052, 0.3592814, 0.3233533)), row.names = c(NA, 
-6L), class = "data.frame")

I know I could do this in the terminal with a tool like bedtools intersect, but I'm looking for an R function that does the same. I'd like the output as a matrix reporting whether each variant falls within any of the regions. I would appreciate your help and suggestions.
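For reference, the matrix described in the question can be sketched in base R without any Bioconductor packages. This is not from the thread. It assumes that SNP positions and region bounds use the same coordinate system and that both ends of each region are inclusive. The helper name `snp_region_matrix` and the toy data are hypothetical:

```r
# Hypothetical helper (not from the thread): logical SNP x region matrix.
# Assumes SNP positions and region bounds share one coordinate system
# and that both ends of each region are inclusive.
snp_region_matrix <- function(snps, regions) {
  m <- outer(
    seq_len(nrow(snps)), seq_len(nrow(regions)),
    function(i, j) {
      # TRUE when SNP i is on the same chromosome as region j and inside it
      snps$chrom[i] == regions$chrom[j] &
        snps$position[i] >= regions$chromStart[j] &
        snps$position[i] <= regions$chromEnd[j]
    }
  )
  dimnames(m) <- list(paste(snps$chrom, snps$position, sep = ":"),
                      regions$name)
  m
}

# Toy data using the same column names as p1.snp and regions
snps <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                   position = c(150L, 500L, 150L),
                   stringsAsFactors = FALSE)
regs <- data.frame(chrom = c("chr1", "chr1"),
                   chromStart = c(100L, 140L),
                   chromEnd = c(200L, 160L),
                   name = c("A", "B"),
                   stringsAsFactors = FALSE)

m <- snp_region_matrix(snps, regs)
print(m)

# Does each SNP fall in at least one region?
any_hit <- rowSums(m) > 0
print(any_hit)
```

The same call should work directly on the original `p1.snp` and `regions` data frames, before any GRanges conversion. Because `outer()` compares every SNP with every region, memory grows with the product of the two row counts, so for genome-scale inputs an interval-based approach such as GRanges is the better fit.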

4.0 years ago

First, convert both data frames to GRanges objects.

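# plyranges is a Bioconductor package: BiocManager::install("plyranges")
library("plyranges")

# Each SNP becomes a 1-bp range (start = end = position)
p1.snp <- as_granges(p1.snp, seqnames=chrom, start=position, end=position)
regions <- as_granges(regions, seqnames=chrom, start=chromStart, end=chromEnd)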
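One caveat worth checking, which depends on an assumption about where `regions` came from: the column names chromStart and chromEnd follow the UCSC BED convention, where chromStart is 0-based and chromEnd is exclusive. GRanges coordinates, by contrast, are 1-based and inclusive. If the regions were read from a BED file, adding 1 to chromStart before the `as_granges()` call keeps each region's width unchanged. Below is a minimal base-R sketch of that shift. The helper name `bed_to_one_based` is hypothetical:

```r
# Hypothetical helper: shift BED-style intervals (0-based start, exclusive end)
# to 1-based inclusive intervals, which is what GRanges expects.
# Only apply this if the regions really came from a BED file.
bed_to_one_based <- function(df) {
  df$chromStart <- df$chromStart + 1L
  df
}

# First region from the question, treated as a BED record
regs <- data.frame(chrom = "chr1", chromStart = 975451L, chromEnd = 975952L,
                   stringsAsFactors = FALSE)
regs1 <- bed_to_one_based(regs)

# The BED length (end - start) should equal the 1-based inclusive width
bed_len <- regs$chromEnd - regs$chromStart
width1  <- regs1$chromEnd - regs1$chromStart + 1L
print(c(bed_len = bed_len, width1 = width1))
```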

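You can then find the overlaps. For this I will use the handy function join_overlap_left from plyranges. It returns a GRanges object containing every SNP, with the columns of any overlapping region added to that row. SNPs that overlap no region get NA in those columns, and a SNP that overlaps several regions appears once per overlapping region.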
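To make these left-join semantics concrete, here is a base-R imitation. It is not plyranges itself. The function `left_overlap_join` and the toy data are hypothetical, and region bounds are assumed to be inclusive. Every SNP is kept, a SNP inside k regions yields k rows, and a SNP inside no region gets NA:

```r
# Hypothetical base-R imitation of a left overlap join (not plyranges):
# every SNP is kept, a SNP inside k regions yields k rows, and a SNP
# inside no region gets NA in the region column. Bounds assumed inclusive.
left_overlap_join <- function(snps, regions) {
  rows <- lapply(seq_len(nrow(snps)), function(i) {
    hit <- which(regions$chrom == snps$chrom[i] &
                 regions$chromStart <= snps$position[i] &
                 regions$chromEnd >= snps$position[i])
    if (length(hit) == 0) hit <- NA_integer_  # no overlap: keep SNP, NA region
    data.frame(chrom = snps$chrom[i],
               position = snps$position[i],
               region = regions$name[hit],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

snps <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                   position = c(150L, 500L, 150L),
                   stringsAsFactors = FALSE)
regs <- data.frame(chrom = c("chr1", "chr1"),
                   chromStart = c(100L, 140L),
                   chromEnd = c(200L, 160L),
                   name = c("A", "B"),
                   stringsAsFactors = FALSE)

res <- left_overlap_join(snps, regs)
print(res)
```

The real `join_overlap_left()` also carries over the other metadata columns from both inputs, which this sketch leaves out for brevity.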

overlaps <- join_overlap_left(p1.snp, regions)

There are no overlaps in your example data: the last region ends at position 1,328,478, while the first SNP is at 2,135,809.

You can then turn this back into a data.frame if needed.

overlaps <- as.data.frame(overlaps)