I have two files, A and B. I want to find the rows that overlap between these two files and write only those rows from file A to a separate file. Could anyone help me with some R code to do this?
I cannot work with my data in Excel because it is too large, and I'm still new to R.
Thanks in advance for your help.
The answer is bedtools: http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html
What do you mean by "overlap"? Could you post a couple of example rows from file A and file B and show how what you want to have happen?
Okay, here's the question again:
File A:
File B:
I would like to write A, C and F (as they occur in both files) to a separate file.
New file:
I hope this makes more sense.
Post this as a comment to your original question, and see the answer by Ido Tamir: A: Code for looking for overlaps.
Now it's clear.
The merge function will do it for you.
Here, colnames is the name of the column on which the tables are to be merged. For more help: https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
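A minimal sketch of the merge approach, assuming both files are tab-delimited with a header row and share a key column. The column name "gene", the file names fileA.txt/fileB.txt and the toy values are hypothetical stand-ins, not the poster's data.

```r
# Hypothetical toy data standing in for file A and file B
fileA <- data.frame(gene = c("A", "B", "C", "D", "E", "F"),
                    value = c(10, 20, 30, 40, 50, 60))
fileB <- data.frame(gene = c("A", "C", "F", "G"))

# With real data you would read the files instead, e.g.
# fileA <- read.delim("fileA.txt")
# fileB <- read.delim("fileB.txt")

# Inner join on the shared key column: keeps only rows of A whose 'gene'
# also appears in B. Passing only B's unique keys stops extra B columns
# from being added and duplicated keys in B from repeating rows of A.
overlap <- merge(fileA, unique(fileB["gene"]), by = "gene")

# Write the overlapping rows of A to a new tab-delimited file
out <- tempfile(fileext = ".txt")
write.table(overlap, out, sep = "\t", quote = FALSE, row.names = FALSE)

print(overlap)
```

By default merge() sorts the result by the key column (sort = TRUE); with sort = FALSE the row order is unspecified, so reorder explicitly if the original order of file A matters.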
Using %in% may be faster than merge. If you're really matching on the letters and they are unique, you could also use the command line:
Just be aware that this will not work in most (more complex) cases. For example, if 'A' is in the file, you will also get genes 'AA', 'AB', 'CAG' and anything else containing A. If you can add more regex information to your 'fileB', it can help. For example:
Should be:
This will return only gene 'A', since it specifies that the letter 'A' must occur right after the beginning of the line (^) and must be immediately followed by a tab (\t).
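The same pitfall, and the %in% alternative, can be sketched in R. This is an R analog of the command-line idea, not the original command; it assumes the key is the first tab-separated field of each line, and the lines and keys below are made up for illustration.

```r
# Hypothetical tab-delimited lines of file A, key in the first field
linesA <- c("A\t1", "AA\t2", "AB\t3", "CAG\t4", "C\t5", "F\t6")
keys <- c("A", "C", "F")  # hypothetical contents of file B

# Plain substring match: "A" also hits AA, AB and CAG
loose <- linesA[grepl("A", linesA, fixed = TRUE)]

# Anchored pattern, like ^A\t: only lines whose first field is exactly A
anchored <- linesA[grepl("^A\t", linesA)]

# The same anchoring for all keys at once: ^(A|C|F)\t
pattern <- paste0("^(", paste(keys, collapse = "|"), ")\t")
anchored_all <- linesA[grepl(pattern, linesA)]

# Exact matching on a parsed column with %in% avoids regexes entirely
fileA <- read.delim(text = linesA, header = FALSE,
                    col.names = c("gene", "value"))
exact <- fileA[fileA$gene %in% keys, ]
print(exact)
```

The combined pattern assumes the keys contain no regex metacharacters (such as . or +); the %in% version has no such restriction, which is one reason to prefer it once the data are loaded into R.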