Hi all,
I'm getting a very annoying error in R when assigning row names to my data matrix. I have some RNA-seq data that I would like to cluster in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some unannotated genes have been assigned IDs that start with numbers. I don't understand how to deal with this error. Is there a way to work around it? I can't change the gene names.
EDIT:
gene sample1 sample2 sample3
Mar-01 4.19504 3.9006 4.15683
Mar-02 3.0554 3.4261 3.76675
un_A_2 1.1515 1.2455 0.563484
un_A_3 98.2504 120.341 101.753
ENSGALG00000008227 39.6383 12.8651 38.2281
ENSGALG00000008242 5.71557 7.79314 9.40917
ENSGALG00000008277 24.6231 28.3207 24.9288
CNN3 141.708 134.476 144.514
CNNM1 0.840218 0.963683 0.619086
CNNM2 16.0282 12.1301 12.4665
Many thanks.
The gene names "Mar-01" and "Mar-02" look like they went through a copy-paste from Excel, which silently converts some gene symbols to dates. Watch out! See http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/ and http://www.biomedcentral.com/1471-2105/5/80
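To see which names are actually clashing, something along these lines might help. This is only a minimal sketch on a toy table: the data frame `expr` and the repeated "Mar-01" row are made up for illustration (the sample in the question shows no duplicate), and the date pattern is an assumption about what Excel-converted names look like. It uses base R only: `duplicated()`, `grep()` and `make.unique()`.

```r
# Toy version of the table in the question. The second "Mar-01" row is
# an assumption added on purpose so that there is a duplicate to find.
expr <- data.frame(
  gene    = c("Mar-01", "Mar-01", "un_A_2", "CNN3"),
  sample1 = c(4.19504, 3.0554, 1.1515, 141.708),
  sample2 = c(3.9006, 3.4261, 1.2455, 134.476),
  sample3 = c(4.15683, 3.76675, 0.563484, 144.514),
  stringsAsFactors = FALSE
)

# Names that occur more than once (these are what trigger the error)
dup_names <- unique(expr$gene[duplicated(expr$gene)])

# Names that look like Excel-converted dates, e.g. "Mar-01" or "Sep-02"
date_pattern <- "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]+$"
date_like <- grep(date_pattern, expr$gene, value = TRUE)

# Workaround: make.unique() appends .1, .2, ... to repeated names,
# so they can be used as row names of the expression matrix
mat <- as.matrix(expr[, -1])
rownames(mat) <- make.unique(expr$gene)

print(dup_names)
print(date_like)
print(rownames(mat))
```

Note that `make.unique()` only silences the error. If the duplicates come from Excel turning different gene symbols into the same date, the rows still refer to different genes, so it is probably safer to go back to the original, unconverted identifiers if you can.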