It is a well-known issue that Excel can inadvertently bungle up gene names unless the format type of the column containing the gene names is set to "text" (see e.g. here and here).
While the solution proposed in the links above works it depends on using Excel's Import Wizard and generally assumes that the files in question are simple tab- or comma-delimited files. However, I would, in fact, like to make use of the capabilities of the openxlsx R package, which allows me to generate fairly complex spreadsheet collections, which is often preferable to dumping an entire folder of individual csv/txt files on a collaborator. Does anyone know how to specify the column type when using the write.xlsx
function?
This is the command that I typically use to turn a list of data.frames into a collection of spreadsheets (the names of the data.frames within the list will become the names of the individual sheets in the resulting xslx file.
list_of_data_frames <- list( first_sheet = data.frame(gene = c("a","b","c"), b = 4:6), second = data.frame(X = "a", Y = 5))
openxlsx::write.xlsx(list_of_data_frames, file = "filename.xlsx", rowNames = TRUE)
cross-posted on github
I think you should be able to use the
writeDataTable
function to enforce formatting on the data by settingclass(df$Gene) <- "text"
. See the formatting vignette for an example.class(df$gene) <- "text"
did not change the result. I'd like to stick towrite.xlsx
, which is basically a wrapper aroundwriteDataTable
if I understand correctly.Have you tried writing
paste0("'", gene)
instead of justgene
? Excel will treatSEPT1
as date but'SEPT1
as text.Try:
Yes, that's a work-around, I guess, one could add anything to alter those gene names, but it's annoying because I cannot easily read them back in.
That being said, I've noticed that, apparently, the gene names are not changed when I just open the file, so that would be a huge relief and would mean that I don't, in fact, actually have to worry about it. The problem arises if I try to enter another gene name in the same column in Excel, but it won't actually change the values that I dumped in the spreadsheet via
write.xlsx
.Excel is not made for bioinformatics, that was clear. R can save your data into excel's "TEXT" (character string), but still excel will overrule it and will change SEPT11 to 11-Sep or 9/11/2019 (or something alike). The only way is still the wizard, so that means only text files. Unfortunately, because it would be nice when your methods would work without this error!
Why does Excel take the Wizard's instructions more seriously than the encoded column type?
I think because microsoft wants people not to think for them selves, but let computers do that for them. When you are someone that has a mind of its own, they will force you to use the wizard...
if it would let me IMPORT xlsx files, that'd be a compromise I could live with, but it won't
Can you tell me for which function you've observed that the gene names were changed in an xlsx file that you generated via R? As you can see in my last comment to Ram's suggestion, I actually realized that Excel seems to honor whatever content I've dumped in the xlsx file via
write.xlsx
, but it will mercilessly change any new entries that I do via Excel. That's a behavior I can live with since I never plan on actually doing anything in Excel and I can easily accompany each shared Excel file with a note that the data format MUST be manually changed to "text" if a users plans on adding any entries to that column within Excel.Sorry I didn't want to imply that I was using
write.xlsx
. I know that excel is doing unasked stuff by default, that's why I commented. I always give .txt files to collaborators, if they lazy load them without the wizard, it is not my fault if you know what I mean ;-) But seems like you have a good solution now?at least on my machine with my applications it does seem to work as I describe above, yes! (the only caveat being that the user should make sure to manually adapt the column format to ensure that nothing is going to happen upon additional manipulation of the xlsx file)