How to define a column containing gene names "text" (instead of "general") when using openxlsx::write.xlsx
1
3
Entering edit mode
5.3 years ago

It is a well-known issue that Excel can inadvertently bungle up gene names unless the format type of the column containing the gene names is set to "text" (see e.g. here and here).

While the solution proposed in the links above works it depends on using Excel's Import Wizard and generally assumes that the files in question are simple tab- or comma-delimited files. However, I would, in fact, like to make use of the capabilities of the openxlsx R package, which allows me to generate fairly complex spreadsheet collections, which is often preferable to dumping an entire folder of individual csv/txt files on a collaborator. Does anyone know how to specify the column type when using the write.xlsx function?

This is the command that I typically use to turn a list of data.frames into a collection of spreadsheets (the names of the data.frames within the list will become the names of the individual sheets in the resulting xslx file.

list_of_data_frames <- list( first_sheet = data.frame(gene = c("a","b","c"), b = 4:6), second = data.frame(X = "a", Y = 5)) 
openxlsx::write.xlsx(list_of_data_frames, file = "filename.xlsx", rowNames = TRUE)
excel genes formatting openxslx R • 3.6k views
ADD COMMENT
0
Entering edit mode

cross-posted on github

ADD REPLY
0
Entering edit mode

I think you should be able to use the writeDataTable function to enforce formatting on the data by setting class(df$Gene) <- "text". See the formatting vignette for an example.

ADD REPLY
0
Entering edit mode

class(df$gene) <- "text" did not change the result. I'd like to stick to write.xlsx, which is basically a wrapper around writeDataTable if I understand correctly.

ADD REPLY
0
Entering edit mode

Have you tried writing paste0("'", gene) instead of just gene? Excel will treat SEPT1 as date but 'SEPT1 as text.

Try:

list_of_data_frames <- list( first_sheet = data.frame(gene = c("a","b","c"), b = 4:6), second = data.frame(X = "a", Y = 5))
list_of_data_frames$first_sheet$gene <- paste0("'", list_of_data_frames$first_sheet$gene)
openxlsx::write.xlsx(list_of_data_frames, file = "filename.xlsx", rowNames = TRUE)
ADD REPLY
0
Entering edit mode

Yes, that's a work-around, I guess, one could add anything to alter those gene names, but it's annoying because I cannot easily read them back in.

That being said, I've noticed that, apparently, the gene names are not changed when I just open the file, so that would be a huge relief and would mean that I don't, in fact, actually have to worry about it. The problem arises if I try to enter another gene name in the same column in Excel, but it won't actually change the values that I dumped in the spreadsheet via write.xlsx.

ADD REPLY
0
Entering edit mode

Excel is not made for bioinformatics, that was clear. R can save your data into excel's "TEXT" (character string), but still excel will overrule it and will change SEPT11 to 11-Sep or 9/11/2019 (or something alike). The only way is still the wizard, so that means only text files. Unfortunately, because it would be nice when your methods would work without this error!

ADD REPLY
0
Entering edit mode

Why does Excel take the Wizard's instructions more seriously than the encoded column type?

ADD REPLY
0
Entering edit mode

I think because microsoft wants people not to think for them selves, but let computers do that for them. When you are someone that has a mind of its own, they will force you to use the wizard...

ADD REPLY
0
Entering edit mode

if it would let me IMPORT xlsx files, that'd be a compromise I could live with, but it won't

ADD REPLY
0
Entering edit mode

Can you tell me for which function you've observed that the gene names were changed in an xlsx file that you generated via R? As you can see in my last comment to Ram's suggestion, I actually realized that Excel seems to honor whatever content I've dumped in the xlsx file via write.xlsx, but it will mercilessly change any new entries that I do via Excel. That's a behavior I can live with since I never plan on actually doing anything in Excel and I can easily accompany each shared Excel file with a note that the data format MUST be manually changed to "text" if a users plans on adding any entries to that column within Excel.

ADD REPLY
0
Entering edit mode

Sorry I didn't want to imply that I was using write.xlsx. I know that excel is doing unasked stuff by default, that's why I commented. I always give .txt files to collaborators, if they lazy load them without the wizard, it is not my fault if you know what I mean ;-) But seems like you have a good solution now?

ADD REPLY
0
Entering edit mode

at least on my machine with my applications it does seem to work as I describe above, yes! (the only caveat being that the user should make sure to manually adapt the column format to ensure that nothing is going to happen upon additional manipulation of the xlsx file)

ADD REPLY
2
Entering edit mode
5.2 years ago
Benn 8.3k

Sorry for the late reaction, but I was using xlsx library, and found it can properly print gene names in a workbook (and I recalled this discussing here).

So if you use xlsx library, this works for me.

library(xlsx)

wb <- createWorkbook()

sheet <- createSheet(wb, "Sheet1")

 rows  <- createRow(sheet, rowIndex=1:2)    

cell.1 <- createCell(rows, colIndex=1)[[1,1]]
cell.2 <- createCell(rows, colIndex=1)[[2,1]]     
setCellValue(cell.1, "Sept11")
setCellValue(cell.2, "March2")

cs1 <- CellStyle(wb, dataFormat=NULL, font=NULL)

setCellStyle(cell.1, cs1)  
setCellStyle(cell.2, cs1)

saveWorkbook(wb, file="test.xlsx")

If I open the file the gene names do not automatically change to dates!

ADD COMMENT
0
Entering edit mode

Thanks for sharing! That behavior is consistent with the openxlsx solution described above, too. Now the question is: what happens when you open your test.xlsx file with Excel and add another entry (e.g. Sept11) in the gene name column?

ADD REPLY
0
Entering edit mode

If you type another notorious symbol, it will be changed to date immediately... But why would your collaborators type another gene symbol?

ADD REPLY
0
Entering edit mode

Because they're not scared of altering XLS files. Anyway, as I detailed above, I'm somewhat ok with that, but, on the other hand, this behavior tells me that the format type of the gene name column is not correctly assigned by either package. Just something to be aware of.

ADD REPLY

Login before adding your answer.

Traffic: 2711 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6