I know this has been asked several times, but I've tried many of the solutions suggested before and none of them work. I keep getting the error below in R (no matter how I modify the CSV file) when I run the following:
annotation_file <- "Best3_Abicinctus_FunctionalAnnotation.csv"
annotation_info <- read.csv(annotation_file, row.names=1, header=T)
Error in read.table(file=file,header=header,sep=sep,quote=quote, : duplicate 'row.names' are not allowed
I cannot set 'row.names=NULL' as this will screw up the data order for what I intend to do downstream. I even removed blanks/tabs from the end of every row using sed 's/[[:blank:]]*$//', but the error doesn't go away. I also tried replacing commas and spaces in all of the column entries, and yet the error persists. This is what the first few lines of the file look like:
"gene_id","name","product"
"maker-Contig673-pred_gff_AUGUSTUS-gene-1.6","stk10","Serine/threonine-protein kinase 10"
"maker-Contig204-pred_gff_AUGUSTUS-gene-3.1","ccnh","Cyclin-H"
"maker-Contig31958-pred_gff_AUGUSTUS-gene-0.7","fam136a","Protein FAM136A"
"maker-Contig31340-pred_gff_AUGUSTUS-gene-0.8","h2b","Histone H2B"
The file is available on Dropbox (dropbox link) in case you would like to take a look. I'm on a deadline and I'm stuck at this step. Any help would be highly appreciated.
gene_id has 216 duplicate values. Get rid of duplicate rows.
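For reference, a minimal sketch of how duplicated gene_id values can be counted from the shell. It assumes the file is plain comma-separated with no commas inside the first field; the function name count_dup_ids is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: print "<count> <id>" for every first-column value
# that occurs more than once. -F',' makes awk split on commas rather than
# on whitespace; NR > 1 skips the header line.
count_dup_ids() {
    awk -F',' 'NR > 1 { n[$1]++ }
               END { for (id in n) if (n[id] > 1) print n[id], id }' "$1"
}

# Usage (assumed file name from the question):
# count_dup_ids Best3_Abicinctus_FunctionalAnnotation.csv | wc -l
```

Piping the result through wc -l gives the number of distinct gene_id values that are repeated.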
I also tried removing all duplicates, but the error doesn't go away. When I run
awk 'x[$1]++ ==1 {print $1 " is duplicated"}' rmdup_Best3_Abicinctus_FunctionalAnnotation.csv
I don't get anything for this new file, which I believe confirms there are no more duplicates in the gene_id column.
How about
sort -k1 test.txt| uniq
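Two caveats, as far as I can tell. sort | uniq only drops lines that are identical in every column, so two rows with the same gene_id but a different name or product both survive. And awk without -F',' splits on whitespace, so in the awk check above $1 is everything up to the first space rather than the gene_id column alone, which may be why it reports nothing. A minimal sketch that keeps only the first row per gene_id, assuming no commas inside the first field (keep_first_per_id is a hypothetical name, and whether keeping the first row is appropriate depends on the downstream analysis):

```shell
# Hypothetical helper: keep the first line seen for each first-column value.
# seen[$1]++ is 0 (false) the first time a key appears, so !seen[$1]++ is
# true only for the first occurrence; the header line is kept as well.
keep_first_per_id() {
    awk -F',' '!seen[$1]++' "$1"
}

# Usage (file names are placeholders):
# keep_first_per_id in.csv > dedup.csv
```

After this, read.csv(..., row.names=1) should no longer see repeated keys in the first column, provided the assumption about commas holds.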
What are you doing downstream that requires you to have rownames?
Cross-posted at SO: