Separate column values for summary
8.2 years ago
Jack ▴ 120

I have two files, and something peculiar happens when I summarize them in R.

My original file is a list of selected genes, like this:

A1BG
A2M
A2MP1

I have a second file with gene synonyms, which looks like this:

A1BG   A1B;ABG;GAB;HYST2477
A2M    A2MD;CPAMD5

If I read the second file into R and call summary(), the output for the synonyms column lists individual synonyms, like so:

symbol     synonyms
A1BG: 1    A1B: 7
A2M: 1     TRNAL_CAA: 2

To me this suggests that, for the second file, R recognizes ';' as a separator within the second column.

But when I append the information from the synonyms file to the first file and call summary() on the resulting file, I get this:

symbol     synonyms
A1BG: 1    A1B;ABG;GAB;HYST2477: 1
A2M: 1     A2MD;CPAMD5: 1

I read both files like this:

synonyms file:

df <- read.csv('homo_sapiens_synonyms.csv', header=TRUE, sep='\t')

joined file:

df <- read.csv('synonyms.csv', header=TRUE, sep='\t')

Why doesn't R separate the values in the synonyms column of the joined file?

R summary csv • 1.6k views

This question seems to be a duplicate of: Colapse column values to multiple rows for further analysis.
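
For reference, here is a minimal base-R sketch of splitting the ';'-separated synonyms into one row per synonym. It assumes the column names symbol and synonyms from the summary() output, and uses inline toy data in place of the real file. Note that read.csv() only splits on the sep you pass, so ';' is never treated as a separator. summary() on a factor column counts identical whole strings, so an entry like "A1B: 7" presumably comes from rows whose entire synonyms cell is just "A1B".

```r
# Toy stand-in for the tab-separated synonyms file (assumed layout).
syn <- read.csv(
  text = "symbol\tsynonyms\nA1BG\tA1B;ABG;GAB;HYST2477\nA2M\tA2MD;CPAMD5",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)

# Split each synonyms cell on the literal ';' character.
parts <- strsplit(syn$synonyms, ";", fixed = TRUE)

# Build a long table: one row per (symbol, synonym) pair.
long <- data.frame(
  symbol  = rep(syn$symbol, times = lengths(parts)),
  synonym = unlist(parts),
  stringsAsFactors = FALSE
)

# Count how often each individual synonym occurs.
synonym_counts <- table(long$synonym)
print(synonym_counts)
```

On the long table, table() or summary(factor(long$synonym)) reports counts per individual synonym rather than per combined string.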

I think this is because you are overwriting the contents of df with the contents of the second file rather than appending to it.

To append, you have to store the contents of each file in a separate data frame.
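
A rough sketch of what that could look like, assuming both files have a header row with a shared symbol column (the names genes, synonyms and joined are made up for illustration, and inline text stands in for the real files):

```r
# Gene list, read into its own data frame (assumed to have a 'symbol' header).
genes <- read.csv(
  text = "symbol\nA1BG\nA2M\nA2MP1",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)

# Synonyms table, read into a second, separately named data frame.
synonyms <- read.csv(
  text = "symbol\tsynonyms\nA1BG\tA1B;ABG;GAB;HYST2477\nA2M\tA2MD;CPAMD5",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)

# Left join: keep every gene, attach synonyms where available (NA otherwise).
joined <- merge(genes, synonyms, by = "symbol", all.x = TRUE)
print(joined)
```

merge() leaves the cell contents untouched, so the synonyms column still holds the full ';'-separated string for each gene. Splitting it is a separate step.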
