Hi,
I have managed to merge two different files on a common column that holds the same information, using the function below:
Combine_Files <- merge(File1, File2, by="Symbols")
However, I would like to know if it's possible to merge two different *.csv files where the key column does not match exactly. One file (File 1) contains only gene symbols (one gene per row). The other file (File 2) contains gene symbols plus other annotation columns, with several related genes per row separated by commas; one of those genes is also present in File 1. I would like to merge the two files by mapping the gene symbols from File 1 onto File 2, and pull the annotation columns from File 2 into the combined file for further data analysis. How can this be done?
For example: File_1
Gene_Symbols
1. GeneA1
2. GeneA2
3. GeneA3
For example: File_2
Gene_Symbols
1. GeneA1, GeneX1, GeneX2, GeneD
2. GeneA2, GeneL1, GeneP2, GeneNA
3. GeneB3, GeneA3, GeneLP1, GeneNA1
Other columns in File_2
Phenotype, GO Ontology, Pathways
Expected Output
Gene_Symbols, Phenotype, GO Ontology, Pathways
Thank you,
Toufiq
"Other columns": are they part of File2? If so, could you update your example and provide the expected output?
Yes, the other columns are part of File 2. File 1 has one column and File 2 has four columns. The expected output should have 4 columns.
Does this example help?
File_2:
GeneA1,GeneX1,GeneX2,GeneD WO cell processes Pathway1
GeneA2,GeneL1,GeneP2,GeneNA KT inhibition Pathway2
GeneB3,GeneA3,GeneLP1,GeneNA1 WO1 activation Pathway3
Expected output:
GeneA1 WO cell processes Pathway1
GeneA2 KT inhibition Pathway2
GeneA3 WO1 activation Pathway3
You could use R's grep function to search for each gene symbol from File 1 in the gene-symbol column of File 2 and, where grep returns a match, paste the annotation columns from that row onto it. You will probably need a loop (or maybe a clever use of apply).
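Here is a minimal sketch of that idea, in case it helps. It is not tested on your real files: the toy data frames just mirror the example above, and the column names (Gene_Symbols, Phenotype, GO_Ontology, Pathways) are my assumptions. One deliberate change from plain grep: it splits each File 2 entry on commas and looks for an exact match, because a plain grep for "GeneA1" would also hit a symbol such as "GeneA10".

```r
# Toy data mirroring the example in this thread (column names are assumptions).
File1 <- data.frame(Gene_Symbols = c("GeneA1", "GeneA2", "GeneA3"),
                    stringsAsFactors = FALSE)
File2 <- data.frame(
  Gene_Symbols = c("GeneA1,GeneX1,GeneX2,GeneD",
                   "GeneA2,GeneL1,GeneP2,GeneNA",
                   "GeneB3,GeneA3,GeneLP1,GeneNA1"),
  Phenotype    = c("WO", "KT", "WO1"),
  GO_Ontology  = c("cell processes", "inhibition", "activation"),
  Pathways     = c("Pathway1", "Pathway2", "Pathway3"),
  stringsAsFactors = FALSE
)

# Split each File2 entry into its individual symbols, trimming stray spaces.
symbol_lists <- lapply(strsplit(File2$Gene_Symbols, ","), trimws)

# For each File1 symbol, find the first File2 row whose list contains it
# exactly; NA if no row contains it.
row_index <- sapply(File1$Gene_Symbols, function(sym) {
  hits <- which(sapply(symbol_lists, function(syms) sym %in% syms))
  if (length(hits) > 0) hits[1] else NA_integer_
}, USE.NAMES = FALSE)

# Attach the annotation columns of the matched rows to File1.
Combined <- cbind(File1,
                  File2[row_index, c("Phenotype", "GO_Ontology", "Pathways")])
rownames(Combined) <- NULL
print(Combined)
```

With real data you would replace the two data.frame() calls with read.csv() on your files. Symbols from File 1 that are not found in File 2 end up with NA in the annotation columns, and if a symbol appears in several File 2 rows this sketch keeps only the first one, so adjust that if you need all matches.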