Hi everyone,
I have two unequal data sets.
A data set
V1 V2 V3
6 42721754 42721769
6 42721757 42721772
6 42721760 42721775
6 42721763 42721778
6 42721766 42721781
6 42721769 42721784
6 42721772 42721787
6 42721775 42721790
B data set
V2 AF
42721757 0.003067485
42721760 0.006134969
42721763 0.006134969
42721766 0.003067485
42721769 0.006134969
42721772 0.006134969
42721775 0.003067485
42721778 0.006134969
42721781 0.003067485
42721784 0.003067485
42721787 0.009202454
42721790 0.009202454
I want to check if the value of V2
in B data set
falls between the values of V2
and V3
of A data set
. And when its true, I want value of AF
in B data set
to be added as a new column in A dataset
.
There are two important points.
1. B data set
is 80 rows while A data set
is 6000 rows.
2. value of V2
in B
may be repeated, which shouldn't get thrown out in final output.
an urgent help with be appreciated.
Thanks,
If you can convert "B data set" as bed file, you could use bedtools intersect.
Thanks Rashedul, however bedtools might not work because of redundancy.
Thanks Kevin,
That was quick. I tried but it didn't work for me, that's weird. I have exact same data sets and i copied your code. It didn't reproduce the product you have shown. Can you please take another look? Your all assumptions are correct except one thing I would like to clarify that I also want to use 'less than or equal to' and 'more than or equal to' options here. So, will
>=
and<=
work?Thanks a bunch.
Hey, which system are you using? - Mac? Are you sure that your files are tab-delimited
I am using RedHat Enterprise Linux 7 and yes the files are tab -delimited. Will it work on mac? Thanks for the help again. Really appreciate and looking forward to solve this problem.
Hey Kevin, I am trying to do it in R as well but I think there is a common problem which is causing this not to work with real dataset. My apologies. With example data it works on Mac but its not working with real dataset. I realized that the real dataset has many regions in
A.txt
which doesn't overlap with theB.txt
. Please note that in real dataset,A.txt
has about 10,000 rows whileB.txt
has about 80 rows. Can we introduce NA or zero for the entries which are present inA.txt
but absent inB.txt
? Thanks for your help.You just accepted our answers? - is the issue resolved?
I accepted answer for the example data given. However, I am still struggling with the real data.
Feel free to ask a new question with example data that is more representative to your real data, and provide expected output.
This is called "merge on overlap", see this related StackOverflow post: