Hi all,
I have some RNA-Seq data and I am planning to run a DESeq analysis on it. When I assign the gene names as row names, I get the error "duplicate row names". I don't want to remove any genes. Is there any way to work around it?
Below is an example of my data (countdata):
gene sample1 sample2 sample3
CCDC7 419 326 360
CNNM1 60 48 22
PAK6 208 200 176
RPP14 50 42 91
IDS 8 11 18
PAK6 702 802 612
CFTR 58 48 40
CNN3 1200 1224 1605
CNNM1 906 989 823
I have tried the approach from the post "How To Deal With Duplicate Row Names Error In R": I put only the gene names in a separate data frame and tried to use them as row names for this data frame.
rownames(countdata2) = make.names(countdata, unique = TRUE)
But I am getting an error saying "Invalid row.names length". Can anyone please guide me through this? Thank you very much in advance.
You should probably figure out why there are duplicate gene names first. Can you post the code you used to generate the count table?
This is probably a transcript-level file, which has multiple rows for a gene with multiple transcripts. Keep the IDs from your original file instead of replacing them with gene names.
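To see which symbols are repeated before deciding what to do with them, something like the following might help. This is only a sketch: the data frame is a hand-typed copy of the example table from the question, and the names dup_genes and dup_rows are made up for illustration.

```r
# Toy copy of the example table from the question.
countdata <- data.frame(
  gene    = c("CCDC7", "CNNM1", "PAK6", "RPP14", "IDS",
              "PAK6", "CFTR", "CNN3", "CNNM1"),
  sample1 = c(419, 60, 208, 50, 8, 702, 58, 1200, 906),
  sample2 = c(326, 48, 200, 42, 11, 802, 48, 1224, 989),
  sample3 = c(360, 22, 176, 91, 18, 612, 40, 1605, 823),
  stringsAsFactors = FALSE
)

# Gene symbols that occur on more than one row.
dup_genes <- unique(countdata$gene[duplicated(countdata$gene)])

# All rows belonging to those symbols, sorted so copies sit next to each other.
dup_rows <- countdata[countdata$gene %in% dup_genes, ]
dup_rows <- dup_rows[order(dup_rows$gene), ]
print(dup_rows)
```

If the copies of a gene have very different counts, as PAK6 and CNNM1 do in the example, that would be consistent with them being separate transcripts rather than accidental duplicate rows.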
Go back and do things right with Ensembl IDs. Those are always unique.
row.names(countdata2) <- paste0(countdata$gene, "_", seq_along(countdata$gene))
would give you unique row.names, but those would no longer be the gene names themselves (unless you also set countdata$gene <- paste0(countdata$gene, "_", seq_along(countdata$gene))).
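If you would rather keep the first copy of each name untouched and only suffix the repeats, base R's make.unique() is another option. The sketch below assumes countdata looks like the example table in the question (gene names in a column called gene) and that countdata2 holds only the count columns. Note that the call in the question passes the whole data frame to make.names(), which most likely returns one name per column rather than one per row, and that would explain the row.names length error; passing the gene column should avoid it.

```r
# Toy copy of the example table from the question.
countdata <- data.frame(
  gene    = c("CCDC7", "CNNM1", "PAK6", "RPP14", "IDS",
              "PAK6", "CFTR", "CNN3", "CNNM1"),
  sample1 = c(419, 60, 208, 50, 8, 702, 58, 1200, 906),
  sample2 = c(326, 48, 200, 42, 11, 802, 48, 1224, 989),
  sample3 = c(360, 22, 176, 91, 18, 612, 40, 1605, 823),
  stringsAsFactors = FALSE
)

# Count columns only (assumption: the gene column is the first column).
countdata2 <- countdata[, -1]

# Pass the gene column, not the whole data frame, so there is one name per row.
# make.unique() leaves first occurrences alone and suffixes later copies.
rownames(countdata2) <- make.unique(countdata$gene, sep = "_")

print(rownames(countdata2))
```

The suffixed names (for example PAK6_1) are still not real gene names, so the earlier advice about finding out where the duplicates come from still applies.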