I have been working with a gene expression dataset and have noticed that some of the gene names in my gene column, like MAR and SEPT, are getting converted into March and September. Is there a way of getting this right? Please help.
Don't use Excel or related software for scientific purposes. This is already documented by several papers:
- Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics, 2004
- Gene name errors are widespread in the scientific literature. Genome Biology, 2016
Also discussed in a Nature job blog post which references this kludge for those who insist on using Excel:
Escape Excel: A tool for preventing gene symbol and accession conversion errors
and also discussed in this previous Biostars question.
This problem has been well documented for over 15 years now. When will people learn?
As for recovering from this, you could try this web-based tool, but frankly I would never be entirely confident that the correct identifiers have been restored.
EDIT: This also applies to LibreOffice, which is free software that tries to clone Excel, good and bad features included.
I had the data on where each gene is located in the genome, i.e. the coordinates and the chromosome number. From that, I used the UCSC Genome Browser to recover the gene names. I then copied the entire gene column into a WordPad file and ran a replace-all operation on the affected genes, placing a quote before each gene name, like 'MARCH1, 'SEPT1 and so on. Now the problem is sorted.
(untested) Why don't you use something like awk to simply quote the gene names like "SEPT" in the original file to save them from being interpreted as dates?

But I want the entire name without losing the number, because SEPT1 and SEPT2 are two different genes.
Quote the entire gene name prior to reading into Excel/Libre to protect it from manipulation.
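(untested in Excel/LibreOffice itself) A minimal sketch of that quoting step in Python, in case awk is not an option. It assumes a tab-separated file with the gene names in the first column and one header row, and it uses a leading single quote as the protective character, as in the 'MARCH1 / 'SEPT1 workaround described above. The function name `protect_gene_names` and its parameters are made up for illustration. How your spreadsheet displays the leading quote after import is something to check on your own version.

```python
import csv
import io


def protect_gene_names(in_text, gene_column=0, prefix="'", has_header=True):
    """Prefix every value in the gene column so it no longer looks like a date.

    Assumptions (not from the original thread): tab-separated input,
    gene names in column `gene_column`, optional single header row.
    """
    reader = csv.reader(io.StringIO(in_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, row in enumerate(reader):
        is_header = has_header and i == 0
        if row and not is_header:
            # Keep the whole symbol, number included: SEPT1 and SEPT2 stay distinct.
            row[gene_column] = prefix + row[gene_column]
        writer.writerow(row)
    return out.getvalue()


example = "gene\tvalue\nSEPT1\t2.5\nSEPT2\t0.7\nMARCH1\t3.1\n"
protected = protect_gene_names(example)
print(protected)
```

For a real file, read its contents with `open(path, newline="").read()`, pass the text to the function, and write the result to a new file rather than overwriting the original.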
I went to a seminar on time series analysis, and one of the first points made was to never open data in software like Excel/LibreOffice. Do you know R or Python? You could start looking into how to work with tabular data in those languages if this is something you need to do a lot of.
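As a starting point, here is a small sketch of reading an expression table in Python with only the standard library. Gene symbols are read as plain strings, so nothing reinterprets SEPT1 or MARCH1 as a date. The column name `gene`, the sample names, and the function name `read_expression_table` are assumptions for the example, as is the tab-separated layout.

```python
import csv
import io


def read_expression_table(handle, gene_column="gene"):
    """Read a tab-separated expression table into {gene: {sample: value}}.

    Assumes one header row, a column named `gene_column` holding the gene
    symbols, and numeric values in every other column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        gene = row.pop(gene_column)  # stays a string, never parsed as a date
        table[gene] = {sample: float(value) for sample, value in row.items()}
    return table


# In-memory stand-in for a real file; use open("your_file.tsv", newline="") instead.
example = io.StringIO(
    "gene\tsample_a\tsample_b\n"
    "SEPT1\t2.5\t3.0\n"
    "MARCH1\t1.0\t0.5\n"
)
table = read_expression_table(example)
print(table["SEPT1"])
```

If you end up doing this a lot, a dedicated tabular library is usually more convenient, but the point is the same: the identifiers are only ever treated as text.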