Question

common data extraction

0

Entering edit mode

9.0 years ago

Azhar ▴ 50

I have excel file with 4 spread sheets and each sheet has one column of gene name and corresponding fold change and expression and other values, each column in row number is different so I want to compare these column and get common gene names in two or three or all four sheets with there respective expression and fold-change values?

e.g.,

A    foldchange  B foldchange  C foldchange D foldchange 
a       5                 a     5                    k       9         d          5  
c          7               g       10                g       10      g       10  
d          5                h         11            h         11      a     5         
                              c          7                                   h         11
d          5        
g          2
h            8

and output I need like

gene name        A    FC          B    FC      C   FC         D FC \
                        a   5             a      5       NA    0       NA 0

RNA-Seq • 2.6k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 9.0 years ago by Azhar ▴ 50

1

Entering edit mode

Hi m.azhar, do you consider exporting the data into text files and use R or unix tools for this task? I wouldn´t expect much help with doing this in Excel from Biostars, because Excel is not considered a bioinformatics tool by most.

ADD REPLY • link 9.0 years ago by Michael 56k

0

Entering edit mode

yes thanks for advice can you sugesst some method tool or script how to do this

ADD REPLY • link 9.0 years ago by Azhar ▴ 50

0

Entering edit mode

Please google on how to save an Excel file in CSV format. If you have data in multiple sheets in the workbook, you may have to export each sheet to a separate file. Once this is done, you can read the CSV using any plain text processing tool.

ADD REPLY • link 9.0 years ago by Ram 45k

1

Entering edit mode

This is not necessarily a good idea, depending on the actual format of the original data. It is very likely that the OP's source data is not actually in "Excel" format, but is already in some flat text format such as TSV, TXT (tab-delimited), CSV (comma-delimited) or XLS (as opposed to XLSX). If this is the case, then opening the files in Excel and then saving them in another format is likely to introduce formatting errors. Excel's built-in auto-correct feature is very prone to mangling gene names and NA values and saving them in formats that make it more difficult to handle later. The OP needs to figure out exactly what format the source data is in by looking at the file extension and attempting to open the files in a raw text viewer such as TextEdit (Mac) or Notepad (Windows). Once this information is obtained, then corresponding data conversion steps can be considered.

ADD REPLY • link 9.0 years ago by steve ★ 3.5k

0

Entering edit mode

True. I was assuming OP had checked for such file association related goof-ups. I think we should, as one of the first steps of setting up a bioinformatics machine, decouple CSV and TSV files from spreadsheet programs and have them open in an advanced text editor, such as TextWrangler or Notepad++.

(I avoid TextEdit and notepad because I cannot trust them to detect EOL conversions, vertical tabs, etc.)

ADD REPLY • link 9.0 years ago by Ram 45k

score 0 · Answer 1 · 2017-01-17

0

Entering edit mode

8.5 years ago

aarti • 0

You can compare four data sheets in Excel also. All you need is minimum one common column across sheets (in your case gene identifier). You can apply VLOOKUP

ADD COMMENT • link 8.5 years ago by aarti • 0