Question

Removing Duplicate Rows (Gene Names)

11.1 years ago

robjohn7000 ▴ 110

Hi,

How can I remove duplicate rows (repeated gene names) from the output of topTableF()? I have tried the following without success:

  if (unique(fit2$genes)){
  Sigenes.F <- topTableF(fit2, number=100, genelist=fit2$genes, adjust.method="bonferroni",
                   sort.by="F")
  }

Thanks

Tags: microarray, r

Written 11.1 years ago by robjohn7000 ▴ 110; updated 11.1 years ago by Michael 55k

I don't think you want to remove duplicated genes; you probably want to keep the best p-value for each gene. Have a look at the aggregate() function.

Reply by Irsan ★ 7.8k, 11.1 years ago
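
A minimal sketch of the aggregate() approach suggested above, assuming a toy table with a gene-name column called genes (the name used in the accepted answer) and a p-value column called P.Value. Both column names and all values are hypothetical. It keeps the smallest p-value per gene, then uses merge() to recover the other columns of those rows.

```r
# Hypothetical stand-in for the topTableF() output; column names and
# values are assumptions made for illustration only
tab <- data.frame(
  genes   = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_C"),
  F       = c(12.1, 3.4, 8.7, 5.0, 9.9),
  P.Value = c(0.001, 0.080, 0.004, 0.030, 0.002),
  stringsAsFactors = FALSE
)

# Smallest p-value for each gene (one row per gene, sorted by gene name)
best <- aggregate(P.Value ~ genes, data = tab, FUN = min)

# Join back to the original table to recover the remaining columns
best_rows <- merge(best, tab, by = c("genes", "P.Value"))
print(best_rows)
```

If two probes of the same gene share exactly the same minimal p-value, merge() returns both of them, so a tie-breaking step may still be needed.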

Thanks again, Irsan. I was able to get aggregate() to work. Your explanation makes sense: it keeps the best result for each gene instead of just throwing genes away.

Reply by robjohn7000 ▴ 110, 11.1 years ago

You can do as Michael Dondrup says below, but that might remove important information. Duplicated genes often appear in microarray data because there are multiple probes for a single gene. For example, what if one probe is up-regulated and another is down-regulated (due to alternative transcript use) and you discard one of them?

Reply by brentp 24k, 11.1 years ago
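
A minimal sketch of the check described above, assuming a toy table with one row per probe, a genes column and a per-probe log fold change column called logFC (hypothetical names and values). It lists the genes whose probes change in opposite directions, which are exactly the cases where keeping only one probe per gene would hide information.

```r
# Hypothetical per-probe results; names and values are illustrative only
tab <- data.frame(
  genes = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  logFC = c(1.8, -1.2, 0.9, 1.1, -0.5),
  stringsAsFactors = FALSE
)

# TRUE for a gene if at least one probe goes up and at least one goes down
mixed <- tapply(tab$logFC, tab$genes, function(x) any(x > 0) && any(x < 0))

# Genes whose probes disagree in direction
discordant <- names(mixed)[mixed]
print(discordant)
```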

Do you want to remove the duplicates from the fit2 data frame or from the Sigenes.F data frame?

Reply by Joseph Hughes ★ 3.0k, 11.1 years ago

From the Sigenes.F data frame. Thanks.

Reply by robjohn7000 ▴ 110, 11.1 years ago

score 5 · Accepted Answer · 2013-10-10

11.1 years ago

Sean Davis 27k

You'll need to pay attention to what you are doing. If you are new to R, do some reading on R data structures, as each behaves differently with regard to indexing; a data frame, for example, takes separate row and column indices (df[rows, columns]). Just guessing, but you probably want:

# keep the first row for each gene name; the comma means "these rows, all columns"
SigGenes.F[!duplicated(SigGenes.F$genes), ]
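
A self-contained sketch of this answer on a hypothetical stand-in for SigGenes.F (the column names and values are made up). Because the question requests the table with sort.by="F", the rows should arrive ordered by decreasing F statistic, so the row that !duplicated() keeps for each gene is its best-ranked probe. Ordering by P.Value first is an extra safety step, assumed here in case the table was sorted some other way.

```r
# Hypothetical stand-in for SigGenes.F, already sorted by decreasing F
sig <- data.frame(
  genes   = c("GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_B"),
  F       = c(25.0, 18.2, 11.5, 9.3, 4.1),
  P.Value = c(1e-6, 2e-5, 3e-4, 1e-3, 2e-2),
  stringsAsFactors = FALSE
)

# Put the lowest p-value row of each gene first (no change for this toy table)
sig <- sig[order(sig$P.Value), ]

# duplicated() is FALSE for the first occurrence of each gene name and TRUE
# for every later one; the trailing comma keeps all columns
dedup <- sig[!duplicated(sig$genes), ]
print(dedup)
```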

You're absolutely right. This worked for me as well. Many thanks, Sean!

Reply by robjohn7000 ▴ 110, 11.1 years ago

score 3 · Accepted Answer · 2013-10-10

11.1 years ago

Michael 55k

Read the help page, ?duplicated. For example:

> x = c(1:10,1:10)
> x
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10
> x[!duplicated(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
>

I used duplicated() this way:

 SigGenes.F[!duplicated(SigGenes.F)]

Got this output:

  Error in `[.data.frame`(SigGenes.F, !duplicated(SigGenes.F)) : 
  undefined columns selected

Then tried:

    SigGenes.F[!duplicated(SigGenes.F[2])]

Output:

      columns 3:7 have been removed

Thanks

Reply by robjohn7000 ▴ 110, 11.1 years ago
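
Both attempts in this reply index the data frame with a single index, which R treats as a column selection, and the first one also calls duplicated() on the whole data frame, which compares complete rows rather than gene names. A small sketch on a hypothetical three-row table (names and values made up) shows the difference:

```r
# Hypothetical table with one gene measured by two probes
sig <- data.frame(
  genes = c("GENE_A", "GENE_A", "GENE_B"),
  F     = c(25.0, 11.5, 18.2),
  stringsAsFactors = FALSE
)

# Whole-row comparison: the two GENE_A rows differ in F, so neither is flagged
row_dups <- duplicated(sig)

# Comparison on the gene-name column only: the second GENE_A row is flagged
gene_dups <- duplicated(sig$genes)

# sig[i] would select columns; sig[i, ] selects rows and keeps every column
dedup <- sig[!gene_dups, ]
print(dedup)
```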