I have two files, and something peculiar occurred.
My original file is a list of selected genes, like this:
A1BG
A2M
A2MP1
I also have a second file with gene synonyms, which looks like this:
A1BG A1B;ABG;GAB;HYST2477
A2M A2MD;CPAMD5
If I read the second file in R and call summary(), the output for the synonyms column lists each gene individually, like so:
symbol synonyms
A1BG: 1 A1B: 7
A2M: 1 TRNAL_CAA: 2
This basically means that in the second file, R can tell that the ';' is a separator in the 2nd column.
But when I append the info from the synonyms file to the first file and call summary() on the resulting file, I get this:
symbol synonyms
A1BG: 1 A1B;ABG;GAB;HYST2477 :1
A2M: 1 A2MD;CPAMD5: 1
I read both files like this:
synonyms file:
df <- read.csv('homo_sapiens_synonyms.csv', header=TRUE, sep='\t')
joined file:
df <- read.csv('synonyms.csv', header=TRUE, sep='\t')
Why doesn't R separate the values in the synonyms column of the joined file?
This question seems to be a duplicate of this: Colapse column values to multiple rows for further analysis.
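If the goal is one row per synonym, here is a minimal base-R sketch of collapsing a ';'-separated column into multiple rows. The data frame below is a hypothetical stand-in for the joined file, and the names joined, parts and long are made up for illustration:

```r
# Hypothetical data frame shaped like the joined file in the question.
joined <- data.frame(
  symbol = c("A1BG", "A2M"),
  synonyms = c("A1B;ABG;GAB;HYST2477", "A2MD;CPAMD5"),
  stringsAsFactors = FALSE
)

# Split each synonyms string on ';'.
# as.character() guards against the column having been read as a factor.
parts <- strsplit(as.character(joined$synonyms), ";", fixed = TRUE)

# Repeat each symbol once per synonym and flatten into long format.
long <- data.frame(
  symbol = rep(joined$symbol, lengths(parts)),
  synonym = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

Calling summary() or table() on long$synonym would then count individual synonyms rather than whole ';'-joined strings.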
I think this is because you are overwriting the contents of df with the contents of the second file, not appending to it. To append, you have to store the contents of each file in a separate data frame.
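A minimal sketch of what that could look like. It assumes both files are tab-separated and share a symbol column (the header of the gene list is a guess). The inline strings stand in for the real files so the snippet runs on its own, and the names genes, synonyms and joined are illustrative:

```r
# Hypothetical in-memory stand-ins for the two tab-separated files.
genes_txt <- "symbol\nA1BG\nA2M\nA2MP1"
synonyms_txt <- "symbol\tsynonyms\nA1BG\tA1B;ABG;GAB;HYST2477\nA2M\tA2MD;CPAMD5"

# Read each file into its own data frame so neither overwrites the other.
# With real files, replace text = ... with the file path.
genes <- read.csv(text = genes_txt, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
synonyms <- read.csv(text = synonyms_txt, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)

# Left join on the gene symbol: every gene is kept, and genes without
# an entry in the synonyms file get NA in the synonyms column.
joined <- merge(genes, synonyms, by = "symbol", all.x = TRUE)

print(joined)
```

Note that read.csv() only splits fields on the character given as sep, so a ';' inside the synonyms column stays part of a single string in both data frames.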