Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
4.0 years ago
mropri ▴ 160

Hi guys,

I am trying to run DESeq on my count matrix file. However, when I try to import the file with this command:

matrix <- read.delim("matrix.txt", header=T, sep="\t", check.names = FALSE, row.names=1)

I get this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  :    duplicate 'row.names' are not allowed

I have searched this site and Stack Overflow for solutions and have tried them all. I even ran this command, which was suggested on this site:

cut -f1 matrix.txt | sort | uniq -d

My output is:

ENST00000221418.9a  
ENST00000237696.10a
ENST00000291560.7a
ENST00000291565.9a
ENST00000307365.4a
ENST00000309641.10a
ENST00000377482.10a
ENST00000405924.1a
ENST00000418557.1a
ENST00000445300.1a

I had duplicates, but I used a script to add letters to the end of identical gene names to differentiate them, so I am confused about why I am still getting this error. I appreciate any help, and sorry to bother you guys.

RNA-Seq R • 3.2k views
4.0 years ago

First, I don't think that DESeq is designed to work with transcripts. It's designed to work with genes.

Don't you have the answer right there? Those are the transcript IDs that are repeated in your file. Figure out why they are in there twice.


Thank you for your answer. They are gene IDs. I am trying to look at super enhancers and how their counts change across different stages of progression in breast cancer. I converted the peak coordinates of the super enhancers into gene names so I could use DESeq. Some super enhancers share the same closest gene, which is why the same gene ID appears twice. I added letters to the end of those gene names to differentiate them, but I am getting the same error even though the names should no longer be duplicated. I appreciate your help.
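If the goal is only to give repeated IDs distinct names, base R's make.unique() is one option. The sketch below uses hypothetical IDs and toy counts, not the poster's data, and it is not the script the poster actually ran:

```r
# Hypothetical IDs: the first two are identical, like a gene assigned to two super enhancers.
ids <- c("ENST00000221418.9", "ENST00000221418.9", "ENST00000237696.10")

# make.unique() keeps the first occurrence as-is and appends sep plus a counter to repeats.
unique_ids <- make.unique(ids, sep = "_")
print(unique_ids)

# The de-duplicated IDs can then serve as row names of a (toy) count matrix.
counts <- matrix(c(5, 3, 9, 7, 2, 4), nrow = 3,
                 dimnames = list(unique_ids, c("10A", "AT1")))
print(counts)
```

Because every repeat gets its own counter, two copies of the same ID cannot end up with the same suffix.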


Check to see which duplicates you have.

library("tidyverse")
df <- read_tsv("matrix.txt")
df %>%
  rename(gene=1) %>%
  count(gene, sort=TRUE)

This is the output I am getting:

> df <- read_tsv("matrix.txt")
Parsed with column specification:
cols(
  Gene = col_character(),
  `10A` = col_double(),
  AT1 = col_double(),
  DCIS = col_double(),
  CA1 = col_double()
)

> df %>%
+   rename(gene=1) %>%
+   count(gene, sort=TRUE)
# A tibble: 2,286 x 2
   gene                    n
   <chr>               <int>
 1 ENST00000221418.9a      2
 2 ENST00000237696.10a     2
 3 ENST00000291560.7a      2
 4 ENST00000291565.9a      2
 5 ENST00000307365.4a      2
 6 ENST00000309641.10a     2
 7 ENST00000377482.10a     2
 8 ENST00000405924.1a      2
 9 ENST00000418557.1a      2
10 ENST00000445300.1a      2
# ... with 2,276 more rows
Warning message:
`...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?

Anything with n > 1 will be a duplicated gene. It looks like you have a bunch of them, so you'll need to figure out where in your workflow they were duplicated.
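As a rough illustration of why the import fails and how to find the offending rows in base R, here is a minimal sketch. The inline table is a made-up stand-in for matrix.txt, and the IDs and counts in it are hypothetical:

```r
# Toy stand-in for matrix.txt (hypothetical values); two rows share the ID "ENST1.1a".
toy <- "Gene\t10A\tAT1\nENST1.1a\t5\t7\nENST1.1a\t3\t2\nENST2.1\t9\t4\n"

# With row.names = 1 the import fails, because data frame row names must be unique.
res <- tryCatch(
  read.delim(text = toy, check.names = FALSE, row.names = 1),
  error = function(e) conditionMessage(e)
)
print(res)

# Without row.names the import succeeds, so the first column can be inspected.
counts <- read.delim(text = toy, check.names = FALSE, stringsAsFactors = FALSE)
dup_ids <- unique(counts$Gene[duplicated(counts$Gene)])
print(dup_ids)
```

Reading the file without row.names first and setting the row names only after the duplicates are resolved avoids the error at import time.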


Sounds good, thank you. Last question: does that output show all of them? Are the 10 rows shown the only duplicated IDs, or are there more?


Adding a filter at the end will let you return a data.frame with all of the duplicated genes.

df %>%
  rename(gene=1) %>%
  count(gene, sort=TRUE) %>%
  filter(n > 1)
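For anyone without the tidyverse installed, roughly the same count-then-filter step can be sketched in base R with table(). The gene vector here is a hypothetical example, not the poster's data:

```r
# Hypothetical first column of a count matrix.
genes <- c("A", "A", "B", "C", "C", "C")

# table() counts how often each ID occurs.
n <- table(genes)

# Keep only IDs seen more than once, most frequent first.
dups <- sort(n[n > 1], decreasing = TRUE)
print(dups)
```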

Thank you so much, appreciate all your time and help.


Please don't add blank lines between code-formatted lines - that makes code hard to read.
