How to avoid conversion of gene symbols to date format in Excel
6
7
Entering edit mode
8.2 years ago
Mike ★ 1.9k

Hi All,

Apologies for the technical/irrelevant question.

Some gene names start with APR/MARC/SEPT* etc default converted into date format.

How to avoid this automatics conversion of gene symbols to date in Excel.

Thanks..

gene • 13k views
ADD COMMENT
2
Entering edit mode

Well avoiding excel would work. Look here: "Gene name errors are widespread in the scientific literature"

ADD REPLY
1
Entering edit mode

Hi, This is a useful video to resolve the problem.

ADD REPLY
0
Entering edit mode

I know this is not the answer to the question that you are asking, but I agree with the other guys. Should you work with genes on a regular basis or will with with data in general in the future, consider putting the effort in to learn R/Matlab/Python - at least one of them and at least a bit. It's worth it. (If this is just, let's say, a student project and you know that this is a one time thing, than this might be another story but .... well )

ADD REPLY
7
Entering edit mode

I would argue that for most, Excel is unfortunately unavoidable - yes, you can personally avoid it, but as soon as you send your tab-delimited table to someone else to look at - science is collaborative after all - they will open it in Excel.

The mangling of gene names by Excel is an entirely avoidable, user-mediated problem. Use your advantage (knowing how computers work) to prevent others making this mistake, not to sanctimoniously point out their inevitable error when they make it.

ADD REPLY
1
Entering edit mode

The problem has been known for at least 12 years. And I agree it can be avoided on a case by case basis, but not generally. That means, that whatever precautions are taken, eventually the same errors will be introduced, the past experience shows that this is going to happen over an over again.

The mangling of gene names in Excel is imo an archetype of bad design, where the software tries to be 'smart' in an unexpected way (even called 'Standard') upon import of text data. The easiest way would be to turn of the 'Standard' import type in Excel completely, but this is not possible, and therefore will inevitably lead to these errors.

There are enough alternatives to Excel for dealing with spreadsheets, like LibreOffice, or Google docs, or R-studio and none of them has this behavior.

ADD REPLY
1
Entering edit mode

@ LLTommy , thanks, but this is not one time thing, I am bioinformatician, and I have lots of experience with Unix /R/Perl, so for me this is not problem, But problem is that when I send my results to my PI or someone else/Experimental people, they always open in Excel or want results in Excel.

ADD REPLY
2
Entering edit mode

But problem is that when I send my results to my PI or someone else/Experimental people, they always open in Excel or want results in Excel.

Then include explicit instructions NOT to use Excel, explain that Excel will eff-up their data, and suggest an alternative tool. A lot of headaches can be avoided by anticipation of the errors that "experimental people" are likely to make. And it's our responsibility as bioinformaticists to minimize those errors, instead of compounding them by being intentionally obtuse and/or not sharing our expertise.

Excel is an unavoidable evil only if we allow it. Turn away from the dark side! :-).

ADD REPLY
1
Entering edit mode

I admire your idealism @harold.smith.tarheel, but >10 years of experience has taught me that this will not work. You'll pry Excel from their cold, dead hands.

ADD REPLY
0
Entering edit mode

I'm actually pretty cynical, and my experience is similar to yours. But there's a difference between using Excel out of ignorance vs intentionally disregarding instructions to the contrary. The latter users deserve to have their data mangled.

ADD REPLY
0
Entering edit mode

agree with @Simon Cockell

ADD REPLY
0
Entering edit mode

Fair enough, I just assumed that you are working with Excel yourself. And yes, if others open/send you data in that data format, you can not simply work around as I mentioned by not using Excel. So well, I don't know if there is really a solution to this or if you constantly have to work around it (telling people not to use Excel, double check the data that you receive from people to make sure Excel did not do funny things, and I don't know)

Having sad all that. Excel (or alternatives) are great programs - for what they were made for. The problem is that people just put everything in there, EVERYTHING. And that is not only the case in Biology/Bioinformatics but in many many other fields. Believe me, I know. And it causes a lot of problems there as well. This is kind of an old story, but if you have some minutes and don't know it, it's 'fun' and reminds you that we are not the only ones battling with Excel.

ADD REPLY
0
Entering edit mode

Thanks all for your comments and suggestions.

Final conclusion is that "Excel is an unavoidable Evil"

ADD REPLY
13
Entering edit mode
8.2 years ago

If we assume that Excel is an unavoidable evil (and for most this is unfortunately a credible assumption)...

When importing your delimited data into Excel, be sure to set the column type appropriately. The default for all columns is 'General' - this will lead to the inevitable auto-correct mangling. Gene name columns should be set to 'Text', then the septins etc will be left unmolested.

Once mangling has occurred, correction needs to be approached carefully - the main issue with what Excel does here is that it changes the underlying data, rather than only changing things in the display layer. This means your gene names are turned into floats (which is Excel's underlying representation of dates). Of course, Excel being Excel, the float used is different depending on the platform: https://support.microsoft.com/en-us/kb/214330 (also some versions of Excel think 1900 was a leap year, others know better - this is a very deep rabbit hole...). You could conceivably write some code to handle the conversion of dates back to gene names, but it would have to consider these caveats and edge cases veeeerrrry carefully.

All in all, I agree with Michael (although accept that this is not always possible).

ADD COMMENT
4
Entering edit mode
8.2 years ago
Michael 55k

Avoiding use of Excel is a way (and possibly the only one) to consistently solve this problem.

ADD COMMENT
4
Entering edit mode
8.2 years ago

While pasting, use import from text wizard in excel and in Data Format section choose the gene column as "Text" datatype.

ADD COMMENT
0
Entering edit mode

Thanks, it worked!

ADD REPLY
2
Entering edit mode
8.2 years ago

From what I read, no conversion happens in google spreadsheets.

ADD COMMENT
0
Entering edit mode

Google Spreadsheets is not a safe space for gene lists, says @ryanlayer.

ADD REPLY
1
Entering edit mode

I believe (but am not 100% certain) that Google sheets did not suffer from the auto-conversion issue when this thread was started, but now it does. The "trick", as with Excel, is to define the column variable type of course. Unfortunately, getting people to think about the nature of their data in a prevalent "point and click" culture remains challenging.

ADD REPLY
1
Entering edit mode
8.2 years ago
H.Hasani ▴ 990

Please read this paper before making any efforts and then decide for yourself!

Gene name errors are widespread in the scientific literature

ADD COMMENT
0
Entering edit mode
4.2 years ago
urmi-21 • 0

Pretty late reply to the OP, but hopefully, my answer can help new people looking for an easy way to import gene lists into excel. I created this simple Java app to convert/merge gene lists. https://github.com/urmi-21/csvtoxl

ADD COMMENT
0
Entering edit mode

Great job with the app. Just a heads up - HGNC recently addressed this problem: https://www.nature.com/articles/s41588-020-0669-3

Symbols that affect data handling and retrieval. For example, all symbols that autoconverted to dates in Microsoft Excel have been changed (for example, SEPT1 is now SEPTIN1; MARCH1 is now MARCHF1); tRNA synthetase symbols that were also common words have been changed (for example, WARS is now WARS1; CARS is now CARS1).

ADD REPLY

Login before adding your answer.

Traffic: 1529 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6