Entering edit mode
8.4 years ago
Azhar
▴
50
I have excel file with 4 spread sheets and each sheet has one column of gene name and corresponding fold change and expression and other values, each column in row number is different so I want to compare these column and get common gene names in two or three or all four sheets with there respective expression and fold-change values?
e.g.,
A foldchange B foldchange C foldchange D foldchange
a 5 a 5 k 9 d 5
c 7 g 10 g 10 g 10
d 5 h 11 h 11 a 5
c 7 h 11
d 5
g 2
h 8
and output I need like
gene name A FC B FC C FC D FC \
a 5 a 5 NA 0 NA 0
Hi m.azhar, do you consider exporting the data into text files and use R or unix tools for this task? I wouldn´t expect much help with doing this in Excel from Biostars, because Excel is not considered a bioinformatics tool by most.
yes thanks for advice can you sugesst some method tool or script how to do this
Please google on how to save an Excel file in CSV format. If you have data in multiple sheets in the workbook, you may have to export each sheet to a separate file. Once this is done, you can read the CSV using any plain text processing tool.
This is not necessarily a good idea, depending on the actual format of the original data. It is very likely that the OP's source data is not actually in "Excel" format, but is already in some flat text format such as TSV, TXT (tab-delimited), CSV (comma-delimited) or XLS (as opposed to XLSX). If this is the case, then opening the files in Excel and then saving them in another format is likely to introduce formatting errors. Excel's built-in auto-correct feature is very prone to mangling gene names and NA values and saving them in formats that make it more difficult to handle later. The OP needs to figure out exactly what format the source data is in by looking at the file extension and attempting to open the files in a raw text viewer such as TextEdit (Mac) or Notepad (Windows). Once this information is obtained, then corresponding data conversion steps can be considered.
True. I was assuming OP had checked for such file association related goof-ups. I think we should, as one of the first steps of setting up a bioinformatics machine, decouple CSV and TSV files from spreadsheet programs and have them open in an advanced text editor, such as TextWrangler or Notepad++.
(I avoid TextEdit and notepad because I cannot trust them to detect EOL conversions, vertical tabs, etc.)