NGS data analysis
7.6 years ago
CancerGuy • 0

I have three different Excel files from three biological replicates with fold-change values (transcriptome analysis). I want to combine all the data into one file, but the order of the genes differs between the three files (e.g. gene A is in 1st place in the first file but in 157th and 3045th place in files 2 and 3, respectively). This makes simple copy-paste impossible. Is there another way to arrange this data? Thanks in advance.

RNA-Seq gene Excel • 2.0k views

Are those columns sorted by the fold change values? Should it not be simple enough to re-sort the data using the gene name column and then combine the data columns in a new file/workbook?

Some genes are missing from one file or another (due to zero reads, I guess), and I have the whole transcriptome there. Even after sorting by the gene name column, I found the data was not aligned correctly.


Join is good. merge() in R also works quite well for this kind of thing:

 merge(x, y, by.x = 'gene', by.y = 'gene', all = TRUE)
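
Since the question involves three files rather than two, here is a minimal sketch of chaining merge() over all of them in base R. The data frames, column names (`gene`, `fc_rep1`, ...) and values are made up for illustration; in practice each table would first be read in from its Excel file (for example after exporting to CSV).

```r
# Toy stand-ins for the three replicate tables (hypothetical values).
# Note the different gene order and the genes missing from some files.
rep1 <- data.frame(gene = c("A", "B", "C"), fc_rep1 = c(1.5, -0.8, 2.1))
rep2 <- data.frame(gene = c("C", "A", "D"), fc_rep2 = c(2.0, 1.4, 0.3))
rep3 <- data.frame(gene = c("B", "D", "A"), fc_rep3 = c(-0.9, 0.2, 1.6))

# Merge pairwise on the gene column; all = TRUE keeps genes that are
# missing from one or more files and fills the gaps with NA.
combined <- Reduce(
  function(x, y) merge(x, y, by = "gene", all = TRUE),
  list(rep1, rep2, rep3)
)

print(combined)
```

Each fold-change column needs its own name before merging, otherwise merge() will append suffixes such as `.x` and `.y` to tell them apart.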

I'll try your suggestions. Cheers.


'vlookup' in MS Excel might work for you.

7.6 years ago
AB ▴ 360

You can use dplyr and join them.

If you want to retain all the rows in both datasets, use a full join:

full_join(x, y, by = c('gene.x' = 'gene.y'))

You can use inner_join, left_join or right_join depending on which rows you want to keep.
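
To make the difference between the join types concrete, here is a small sketch, assuming the dplyr package is installed and that both tables share a key column simply called `gene` (the tables and values are invented for illustration):

```r
library(dplyr)

# Two toy replicate tables (hypothetical values); gene B is only in x,
# gene D is only in y.
x <- data.frame(gene = c("A", "B", "C"), fc_rep1 = c(1.5, -0.8, 2.1))
y <- data.frame(gene = c("C", "A", "D"), fc_rep2 = c(2.0, 1.4, 0.3))

full  <- full_join(x, y, by = "gene")   # every gene from either table
inner <- inner_join(x, y, by = "gene")  # only genes present in both
left  <- left_join(x, y, by = "gene")   # every gene from x
right <- right_join(x, y, by = "gene")  # every gene from y

print(full)
```

For the whole-transcriptome case described in the question, full_join is probably the safer default, since genes with zero reads in one replicate are kept with NA rather than silently dropped.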


Awesome! That's what I was looking for.
