Hey, on forums like this, it is usually a good idea to provide code that can produce your input data. Then, people will need to spend less time tackling your problem.
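One common way to share input data as code is base R's dput(), which prints an expression that rebuilds the object. A minimal sketch, assuming a small made-up data frame (the values here are only for illustration):

```r
# Hypothetical example data; replace with your own data frame
SNPs <- data.frame(
  Chr = c(1, 1, 2),
  POS = c(2, 3, 4))

# dput() prints R code that recreates the object;
# paste that printed text into your question
dput(SNPs)
```

Anyone reading the post can then assign the pasted expression to a variable and start from the same data.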
To do what you want, you should use GenomicRanges:
create your data
SNPs <- data.frame(
Chr = c(1,1,2,2,3,3,4),
POS = c(2,3,4,7,5,13,13))
SNPs
Chr POS
1 1 2
2 1 3
3 2 4
4 2 7
5 3 5
6 3 13
7 4 13
genes <- data.frame(
Chr = c(1,2,2,3,3,4),
start = c(2,3,6,4,11,14),
end = c(3,5,10,10,17,17),
id = c('a','b','c','d','e','f'))
genes
Chr start end id
1 1 2 3 a
2 2 3 5 b
3 2 6 10 c
4 3 4 10 d
5 3 11 17 e
6 4 14 17 f
convert the data to GRanges objects
library(GenomicRanges)
grSNPs <- makeGRangesFromDataFrame(
SNPs,
seqnames.field = 'Chr',
start.field = 'POS',
end.field = 'POS',
keep.extra.columns = TRUE)
grgenes <- makeGRangesFromDataFrame(
genes,
seqnames.field = 'Chr',
start.field = 'start',
end.field = 'end',
keep.extra.columns = TRUE)
find overlaps and save the indices that overlapped
hits <- findOverlaps(query = grSNPs, subject = grgenes, type = 'within')
qHits <- queryHits(hits)
subHits <- subjectHits(hits)
use indices to produce the original data for the overlaps
overlaps <- data.frame(SNPs[qHits,], genes[subHits,'id'])
colnames(overlaps) <- c('Chr','POS','id')
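now find those that didn't overlap (a logical mask is used because SNPs[-qHits,] would return no rows at all if nothing overlapped)
missing <- SNPs[!(seq_len(nrow(SNPs)) %in% qHits),]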
nmissing <- nrow(missing)
missing <- data.frame(missing, rep(NA, nmissing))
colnames(missing) <- c('Chr','POS','id')
rbind overlapping regions and non-overlapping
finalresult <- rbind(overlaps, missing)
finalresult
Chr POS id
1 1 2 a
2 1 3 a
3 2 4 b
4 2 7 c
5 3 5 d
6 3 13 e
7 4 13 <NA>
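If you prefer to skip the rbind step, here is a shorter variant of the same idea. It is only a sketch, not a replacement for the steps above: it rebuilds the example data, calls findOverlaps once, and writes the gene ids straight into a copy of SNPs. The names hits and result are mine. It assumes each SNP falls in at most one gene; if a SNP hit several genes, only the last hit would be kept.

```r
library(GenomicRanges)

# Same example data as above
SNPs <- data.frame(
  Chr = c(1, 1, 2, 2, 3, 3, 4),
  POS = c(2, 3, 4, 7, 5, 13, 13))
genes <- data.frame(
  Chr = c(1, 2, 2, 3, 3, 4),
  start = c(2, 3, 6, 4, 11, 14),
  end = c(3, 5, 10, 10, 17, 17),
  id = c('a', 'b', 'c', 'd', 'e', 'f'))

grSNPs <- makeGRangesFromDataFrame(
  SNPs,
  seqnames.field = 'Chr',
  start.field = 'POS',
  end.field = 'POS')
grgenes <- makeGRangesFromDataFrame(
  genes,
  seqnames.field = 'Chr',
  start.field = 'start',
  end.field = 'end',
  keep.extra.columns = TRUE)

# One overlap search; hits pairs SNP rows with gene rows
hits <- findOverlaps(query = grSNPs, subject = grgenes, type = 'within')

# Start with NA for every SNP, then fill in the ids for SNPs that hit a gene
result <- SNPs
result$id <- NA_character_
result$id[queryHits(hits)] <- as.character(genes$id)[subjectHits(hits)]
result
```

This keeps the SNPs in their original row order, and SNPs without an overlapping gene simply keep NA in the id column.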
Thank you very much for your answer!
Please up-vote any answers that have helped.