Hi,
I have two files. I want to match the first column of the first file against the first column of the second file and, where they match, take the corresponding value from the second column of the second file and add it as a new column in the first file. The problem is that when I match the two files I end up with data frames of different lengths, because my first file contains each gene ID multiple times, whereas the second file, which holds the corresponding gene name, of course lists each ID only once. As an example:
File1:
Gene Id   value_sample1   value_sample2
x         0.0001          0.00000034

File2:
Gene Id   Gene Name
x         y
Thanks a lot!!
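For reference, the lookup described in the question can also be done with awk, which is a different tool from the merge and join answers in this thread. This is a minimal sketch that makes several assumptions. The files are whitespace-separated, the gene ID is column 1 of both files, and each file has one header line. The headers are also assumed to be written as Gene_Id/Gene_Name without spaces. File names and values are hypothetical example data. awk first reads file2 into an ID-to-name table. It then appends the matching name to every row of file1, so repeated IDs all get the name and the original row order is kept:

```shell
#!/bin/sh
# Sketch: add the gene name from file2 as a new last column of file1,
# keeping every row of file1 (including repeated gene IDs) in its original order.
# Assumptions (hypothetical): whitespace-separated files, gene ID in column 1
# of both files, one header line per file, headers written without spaces.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up example data: gene x appears twice in file1 but only once in file2.
printf 'Gene_Id value_sample1 value_sample2\nx 0.0001 0.00000034\nx 0.0005 0.0002\nz 0.3 0.4\n' > "$tmp/file1.txt"
printf 'Gene_Id Gene_Name\nx y\nz w\n' > "$tmp/file2.txt"

awk '
  NR == FNR { name[$1] = $2; next }                # first input (file2): store ID -> name
  FNR == 1  { print $0, "Gene_Name"; next }        # header line of file1: add the new column title
  { print $0, (($1 in name) ? name[$1] : "NA") }   # data rows: append the name, or NA if the ID is missing
' "$tmp/file2.txt" "$tmp/file1.txt" > "$tmp/file1_with_names.txt"

cat "$tmp/file1_with_names.txt"
```

Rows whose ID does not occur in file2 get NA in this sketch. That placeholder is an arbitrary choice and can be changed.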
Have a look at the merge function. And what do you want to end up with if a gene is present more than once in your first file? Do you still want all of its rows or, for example, only the first one?
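Yes, I want to keep all the rows for each gene in the first file, just with the gene name added in a new column. When I match the first file against the second, I end up with data frames of different lengths, so I can't cbind them. Any advice?

I guess you are using R? In R you can use merge, as suggested by @russ_hyde, as long as your data is stored in two data.frames. You can use it like this:

merge(data.frame1, data.frame2)

Without a by argument, merge joins on all columns whose names the two data.frames share. If the gene ID is the first column of both data.frames (as in your example), you can also name that column explicitly: merge(data.frame1, data.frame2, by = 1). Each row of data.frame1 is matched on its own, so repeated gene IDs keep all their rows. Rows whose ID is missing from data.frame2 are dropped unless you add all.x = TRUE.

A very quick command with bash would be: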
join <(sort -k1,1 file1) <(sort -k1,1 file2) > fileOut
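Below is a sketch of the join approach on made-up data. The file names and values are hypothetical. join expects both inputs to be sorted on the join field, which is the first column by default. The sketch therefore sorts on that field only and uses the same C locale for sort and join. It also removes the header lines before sorting so they are not treated as data, then writes a new header. Temporary files replace <(...) so the sketch also runs in a plain POSIX shell:

```shell
#!/bin/sh
# Sketch of the join approach on made-up data (hypothetical file names/values).
set -eu
export LC_ALL=C   # same collation for sort and join

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Example data: gene x appears twice in file1, once in file2 (headers without spaces).
printf 'Gene_Id value_sample1 value_sample2\nx 0.0001 0.00000034\nz 0.3 0.4\nx 0.0005 0.0002\n' > "$tmp/file1"
printf 'Gene_Id Gene_Name\nz w\nx y\n' > "$tmp/file2"

# Drop the header line and sort on the join field (column 1) only.
tail -n +2 "$tmp/file1" | sort -k1,1 > "$tmp/file1.sorted"
tail -n +2 "$tmp/file2" | sort -k1,1 > "$tmp/file2.sorted"

{
  echo "Gene_Id value_sample1 value_sample2 Gene_Name"   # rebuilt header
  join "$tmp/file1.sorted" "$tmp/file2.sorted"           # pair rows on column 1
} > "$tmp/fileOut"

cat "$tmp/fileOut"
```

Each row of file1 is paired with its matching row of file2, so gene x keeps both of its rows. The output comes out in sorted order rather than the original order. Rows whose ID is missing from file2 are dropped. Adding join -a 1 also prints those unpaired lines from file1.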