How to solve the duplicate row.names error in R
3.2 years ago
jbnrodriguez ▴ 30

I know this has been asked several times, but I've tried many of the previously suggested solutions and none of them work. No matter how I modify the CSV file, I keep getting the error below in R when I run the following:

annotation_file <- "Best3_Abicinctus_FunctionalAnnotation.csv"
annotation_info <- read.csv(annotation_file, row.names=1, header=T)
Error in read.table(file=file,header=header,sep=sep,quote=quote, : duplicate 'row.names' are not allowed

I cannot set 'row.names=NULL' as this will screw up the data order for what I intend to do downstream. I even removed blanks/tabs from the end of every row using sed 's/[[:blank:]]*$//', but the error doesn't go away. I also tried replacing commas and spaces in all of the column entries, and the error still persists. This is what the first few lines of the file look like:

"gene_id","name","product"
"maker-Contig673-pred_gff_AUGUSTUS-gene-1.6","stk10","Serine/threonine-protein kinase 10"
"maker-Contig204-pred_gff_AUGUSTUS-gene-3.1","ccnh","Cyclin-H"
"maker-Contig31958-pred_gff_AUGUSTUS-gene-0.7","fam136a","Protein FAM136A"
"maker-Contig31340-pred_gff_AUGUSTUS-gene-0.8","h2b","Histone H2B"

The file is available here (Dropbox link) in case you would like to take a look. I'm on a deadline and I'm stuck at this step. Any help would be highly appreciated.

R • 4.6k views

gene_id has 216 duplicate values. Get rid of duplicate rows.
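One way to check this from within R, as a minimal sketch: read the file without row.names and count repeated gene_id values with duplicated(). The inline CSV text below is made-up toy data standing in for the real file, and the names n_dup, dup_ids and dup_rows are only illustrative.

```r
# Toy stand-in for the annotation CSV (made-up rows, not the real file)
csv_text <- '"gene_id","name","product"
"gene-A","stk10","Serine/threonine-protein kinase 10"
"gene-B","ccnh","Cyclin-H"
"gene-A","unknown","Hypothetical protein"'

# Read WITHOUT row.names so duplicated IDs do not trigger the error
annotation_info <- read.csv(text = csv_text, header = TRUE,
                            stringsAsFactors = FALSE)

# Number of rows whose gene_id already appeared earlier in the file
n_dup <- sum(duplicated(annotation_info$gene_id))

# All rows that share their gene_id with at least one other row
dup_ids  <- unique(annotation_info$gene_id[duplicated(annotation_info$gene_id)])
dup_rows <- annotation_info[annotation_info$gene_id %in% dup_ids, ]
print(dup_rows)
```

Looking at dup_rows before deleting anything shows whether the repeats are identical rows or different annotations for the same gene.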


I also tested removing all duplicates, but the error doesn't go away. When I run awk 'x[$1]++ ==1 {print $1 " is duplicated"}' rmdup_Best3_Abicinctus_FunctionalAnnotation.csv I don't get any output for this new file, which I believe confirms there are no more duplicates in the gene_id column.


How about sort -k1 test.txt | uniq?


What are you doing downstream that requires you to have rownames?

3.2 years ago
Lisa Ha ▴ 120

The awk code does not detect every row where the gene_id is duplicated. Without -F',' awk splits fields on whitespace, so $1 covers more than the gene_id column and the check effectively only catches (nearly) fully duplicated rows. The file still contains duplicate gene_ids, which is why you still get the error message. If you want to remove all gene_id duplicates, the following code works, but you will lose information when the same gene_id has entries with different names/products (e.g. maker-Contig29174-pred_gff_AUGUSTUS-gene-0.4 has two name entries, unknown and dok2).

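library(tidyverse)
# Read without row.names so duplicated gene_id values do not raise an error
annotation_info <- read.csv(annotation_file, header=TRUE)
# Keep only the first row for each gene_id
uniqAnnoInfo <- annotation_info %>% distinct(gene_id, .keep_all = TRUE)
rownames(uniqAnnoInfo) <- uniqAnnoInfo$gene_id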
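If losing the alternative names/products is a concern, one possible variation (a sketch in base R, not part of the answer above) is to collapse all distinct values per gene_id into a single ";"-separated string instead of keeping only the first row. The toy data frame and the helper collapse_unique are hypothetical and only for illustration.

```r
# Toy annotation table with one gene_id annotated twice (made-up values)
annotation_info <- data.frame(
  gene_id = c("gene-A", "gene-B", "gene-A"),
  name    = c("unknown", "ccnh", "dok2"),
  product = c("Hypothetical protein", "Cyclin-H", "Docking protein 2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct values of a column with ";"
collapse_unique <- function(x) paste(unique(x), collapse = ";")

# One output row per gene_id, with all names/products preserved
collapsed <- aggregate(
  annotation_info[c("name", "product")],
  by  = list(gene_id = annotation_info$gene_id),
  FUN = collapse_unique
)

# gene_id is now unique, so it can safely be used for row names
rownames(collapsed) <- collapsed$gene_id
print(collapsed)
```

The trade-off is that the name and product columns are no longer single values, so anything downstream that expects one name per gene would need to split them again.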

Thank you @Lisa Ha; your insight helped me to correctly identify the issue with my file

3.2 years ago

Use make.unique, which appends .1, .2, etc. to duplicated names.

annotation_info <- read.table('./test.csv',row.names=1,sep=',',header=T)
Error in read.table("./test.csv", row.names = 1, sep = ",") : 
  duplicate 'row.names' are not allowed

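# Read without row.names, then build unique row names from the first column
annotation_info <- read.table('./test.csv',sep=',',header=TRUE)
# as.character() guards against the column being read as a factor (R < 4.0)
row.names(annotation_info) <- make.unique(as.character(annotation_info[,1]))
annotation_info[,1] <- NULL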

Although I have no idea if you need the row.names to be exact matches at some point - so this could break downstream...
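To make that caveat concrete, here is a small sketch of what make.unique does to repeated IDs. The vector ids is made-up toy data.

```r
# Toy IDs with one value repeated three times (made-up)
ids <- c("gene-A", "gene-B", "gene-A", "gene-A")

# The first occurrence is left unchanged; later ones get .1, .2, ...
unique_ids <- make.unique(ids)
print(unique_ids)
# Values: gene-A, gene-B, gene-A.1, gene-A.2

# The suffixed names no longer match the original IDs exactly,
# so a lookup by "gene-A" finds only the first of the three rows
n_exact <- sum(unique_ids == "gene-A")
print(n_exact)
```

So an exact-match lookup still works, but it silently returns only the first annotation for each duplicated gene_id.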


Thank you @benformatics, but this won't work: I intend to do the following downstream, so I need exact matches to the gene_id values in sig_de_results (my list of significantly expressed genes from DESeq2, which contains the gene_id info in the first column).

sig_de_annotations <- annotation_info[rownames(sig_de_results),]
sig_de_results <- cbind(sig_de_annotations, as.data.frame(sig_de_results))
write.csv(sig_de_results, row.names=TRUE, file="DEGlist_Deformed_vs_Healthy.csv")
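For what it's worth, this kind of lookup does not strictly need row names. A minimal sketch, assuming gene_id is kept as an ordinary column and the DESeq2 results carry the gene IDs as row names: match() finds the first annotation row for each ID. Both data frames below are made-up stand-ins, and sig_de_annotated is a hypothetical name.

```r
# Toy annotation table; gene-A is annotated twice (made-up values)
annotation_info <- data.frame(
  gene_id = c("gene-A", "gene-B", "gene-A", "gene-C"),
  name    = c("unknown", "ccnh", "dok2", "stk10"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for DESeq2 results, gene IDs as row names
sig_de_results <- data.frame(
  log2FoldChange = c(2.1, -1.4, 0.9),
  padj           = c(0.001, 0.03, 0.04),
  row.names      = c("gene-C", "gene-A", "gene-Z")
)

# match() gives the FIRST annotation row per ID; unannotated IDs give NA
idx <- match(rownames(sig_de_results), annotation_info$gene_id)

# Bind the annotation columns to the results in the results' own order
sig_de_annotated <- cbind(annotation_info[idx, , drop = FALSE], sig_de_results)
rownames(sig_de_annotated) <- rownames(sig_de_results)
print(sig_de_annotated)
```

Like distinct(), this keeps only the first annotation for a duplicated gene_id, and genes without any annotation show up as NA rather than being dropped.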