8.4 years ago
5594487
Hello everyone, I am trying to filter the data set "Data2" based on whether the "start" value in Data2 falls within the range between the "start" and "end" values of a row in Data1. The two rows must also be on the same chromosome (the value of the first column, "chr"). Here's what I've got:
Data 1:
chr start end
chr1 4543784 4543829
chr1 9760745 9760786
chr1 9898702 9898959
chr1 12578847 12578879
chr1 12662062 12662207
chr1 12797766 12798818
..........
chr9 123344149 123345127
chr9 123388337 123389640
chrY 347178 347228
chrY 2876752 2877980
chrY 2886982 2888373
chrY 2890052 2892628
Data 2:
chr start
chr1 3102347
chr1 3111668
chr1 3521852
chr1 3681676
chr1 3801983
chr1 3802020
................
chrY 2891128
chrY 2891544
chrY 2892532
chrY 2892627
chrY 2895794
chrY 2896222
The "chr" value must be the same, so I tried the following:
Filtered<- Data2[Data2$chr == Data1$chr, Data2$start >= Data1$start, Data2$start <= Data1$end]
Filtered<- Data2[Data2$chr == Data1$chr | Data2$start >= Data1$start | Data2$start <= Data1$end]
Filtered<- subset(Data2, Data2$chr == Data1$chr | Data2$start >= Data1$start | Data2$end <= Data1$end)
None of them work. I started using R only last week, so this question may seem silly to many. I have been googling and scratching my head since yesterday. Thank you very much in advance for any advice!
Why not bedtools intersect? Or, at the very least, Bioconductor::GenomicRanges.
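For anyone who wants to see the logic in plain R first: the three attempts in the question compare Data2 and Data1 element by element (row 1 against row 1, row 2 against row 2, with recycling), so they never ask "is this position inside any interval on the same chromosome?". Below is a minimal base-R sketch of that check. It assumes both interval ends are inclusive and that the columns are named as in the question; the helper name in_any_interval and the position 4543800 are made up for illustration, and the toy tables contain only a few of the rows from the post.

```r
# Toy versions of the tables in the question. Most values are copied from the
# post; 4543800 is a hypothetical position added so that chr1 has one hit.
Data1 <- data.frame(
  chr   = c("chr1", "chr1", "chrY", "chrY"),
  start = c(4543784, 9760745, 2886982, 2890052),
  end   = c(4543829, 9760786, 2888373, 2892628),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  chr   = c("chr1", "chr1", "chrY", "chrY", "chrY"),
  start = c(3102347, 4543800, 2891128, 2892627, 2895794),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `pos` lies inside at least one interval of
# `intervals` on chromosome `chr` (both ends treated as inclusive).
in_any_interval <- function(chr, pos, intervals) {
  same_chr <- intervals$chr == chr
  any(same_chr & intervals$start <= pos & pos <= intervals$end)
}

# Apply the check to every row of Data2, one (chr, start) pair at a time.
keep <- mapply(in_any_interval, Data2$chr, Data2$start,
               MoreArgs = list(intervals = Data1), USE.NAMES = FALSE)

# Keep only the Data2 rows that fall inside some Data1 interval.
Filtered <- Data2[keep, ]
print(Filtered)
```

This compares every Data2 row against every Data1 row, so it will be slow on genome-scale tables. That is where the tools suggested above come in: bedtools intersect, or functions such as findOverlaps in GenomicRanges, are built for this kind of interval query.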