Hi guys,
I am trying to run DESeq on my count matrix file. However, when I try to import the matrix file using this command:
matrix <- read.delim("matrix.txt", header=T, sep="\t", check.names = FALSE, row.names=1)
I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
I have searched this site and Stack Overflow for solutions and have tried them all. I even ran this command, which was suggested on this site:
cut -f1 matrix.txt | sort | uniq -d
My output is:
ENST00000221418.9a
ENST00000237696.10a
ENST00000291560.7a
ENST00000291565.9a
ENST00000307365.4a
ENST00000309641.10a
ENST00000377482.10a
ENST00000405924.1a
ENST00000418557.1a
ENST00000445300.1a
I had duplicates, but I used a script to add letters to the end of repeated gene names to differentiate them, so I am confused about why I am still getting this error. I appreciate any help, and sorry to bother you guys.
Thank you for your answer. They are gene IDs. I am looking at super enhancers and how their counts change across the different stages of breast cancer progression. I converted the peak coordinates of the super enhancers into gene names so I could use DESeq. Some super enhancers share the same closest gene, which is why the same gene ID appears twice. I added letters to the end of those names to differentiate them, but I am getting the same error, even though the names should no longer be duplicated. I appreciate your help.
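One thing worth noting: every ID in the `uniq -d` output above ends in `a`, which suggests the script gave each copy of a repeated ID the same letter, so the copies still collide. As a rough sketch of one alternative (not the script used in the question), the awk command below leaves the first occurrence of an ID unchanged and appends `_2`, `_3`, and so on to later ones. It assumes a tab-separated file with a header row and IDs in the first column; the file names and toy values are made up for illustration.

```shell
# Hypothetical toy matrix (tab-separated); replace with your own file.
printf 'gene\ts1\ts2\n' > demo_matrix.txt
printf 'ENST1\t5\t7\nENST2\t3\t1\nENST1\t2\t9\nENST1\t0\t4\n' >> demo_matrix.txt

# Keep the header line as-is. For every other row, count how often the ID
# has been seen so far and append _2, _3, ... to the second and later copies.
awk 'BEGIN {FS = OFS = "\t"}
     NR == 1 {print; next}
     {k = ++seen[$1]; if (k > 1) $1 = $1 "_" k; print}' \
    demo_matrix.txt > demo_matrix_unique.txt

# List the new IDs, then re-run the duplicate check on them.
ids=$(tail -n +2 demo_matrix_unique.txt | cut -f1)
printf '%s\n' "$ids"
remaining=$(printf '%s\n' "$ids" | sort | uniq -d)   # empty if all IDs are distinct
```

This would still collide if the file already contained an ID such as `ENST1_2`, so it is worth re-running the `uniq -d` check afterwards.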
Check to see which duplicates you have.
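The code that went with this reply is not preserved in the thread. As a sketch of the same kind of check in shell, building on the `cut | sort | uniq` pipeline from the question, `uniq -c` reports how many times each ID occurs. It assumes a tab-separated file with a header row and IDs in the first column; `demo_matrix.txt` and its contents are invented for illustration.

```shell
# Hypothetical toy matrix (tab-separated); replace with your own matrix.txt.
printf 'gene\ts1\ts2\n' > demo_matrix.txt
printf 'ENST1a\t5\t7\nENST2\t3\t1\nENST1a\t2\t9\n' >> demo_matrix.txt

# Skip the header, take the ID column, and count occurrences of each ID.
# Output columns: n, then ID, with the highest counts first.
counts=$(tail -n +2 demo_matrix.txt | cut -f1 | sort | uniq -c \
         | sort -k1,1nr | awk '{print $1 "\t" $2}')
printf '%s\n' "$counts"
```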
This is the output I am getting:
Anything with n > 1 will be a duplicated gene. It looks like you have a bunch of them, so you'll need to figure out where in your workflow they were duplicated.
Sounds good, thank you. Last question: does that output show all of them? That is, are the 10 shown the only duplicated ones, or are there more?
Adding a filter at the end will let you return a data.frame with all of the duplicated genes.
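The filtering code from this reply is also missing from the thread. A shell sketch of the same idea, under the same assumptions (tab-separated file, header row, IDs in the first column, made-up demo data): count every ID, keep only those with n > 1, and report how many distinct IDs are duplicated in total rather than only the first few.

```shell
# Hypothetical toy matrix (tab-separated); replace with your own matrix.txt.
printf 'gene\ts1\ts2\n' > demo_matrix.txt
printf 'ENST1a\t5\t7\nENST2\t3\t1\nENST1a\t2\t9\n' >> demo_matrix.txt
printf 'ENST3a\t1\t1\nENST3a\t4\t4\n' >> demo_matrix.txt

# Count each ID in one awk pass and print only IDs that occur more than once.
# Output columns: ID, then n. awk's array order is unspecified, so sort it.
dups=$(tail -n +2 demo_matrix.txt \
       | awk -F'\t' '{n[$1]++} END {for (id in n) if (n[id] > 1) print id "\t" n[id]}' \
       | sort)
printf '%s\n' "$dups"

# Total number of distinct duplicated IDs.
n_dups=$(printf '%s\n' "$dups" | grep -c .)
printf 'duplicated IDs: %s\n' "$n_dups"
```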
Thank you so much, appreciate all your time and help.
Please don't add blank lines between code-formatted lines - that makes code hard to read.