Removing Duplicate Rows (Gene Names)
11.1 years ago
robjohn7000

Hi,

How can I remove duplicate rows (repeated gene names) from the output of topTableF()? I have tried the following without success:

  if (unique(fit2$genes)){
  Sigenes.F <- topTableF(fit2, number=100, genelist=fit2$genes, adjust.method="bonferroni",
                   sort.by="F")
  }

Thanks
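Why the attempt above fails: unique() returns the distinct values themselves rather than TRUE/FALSE, and if() needs a single logical value, so the condition is an error before topTableF() is ever called. The filtering also has to be applied to the rows of the table that topTableF() returns, not wrapped around the call. A small base-R illustration, with a made-up `genes` vector standing in for `fit2$genes`:

```r
# Hypothetical gene-name vector standing in for fit2$genes
genes <- c("GENE_A", "GENE_B", "GENE_A", "GENE_C")

# unique() gives back the distinct names, not TRUE/FALSE
u <- unique(genes)
print(u)

# if() needs one TRUE/FALSE value, so using u as a condition is an error
attempt <- try(if (u) "never reached", silent = TRUE)
print(inherits(attempt, "try-error"))

# anyDuplicated() returns the index of the first repeat (0 if none),
# which can be turned into a single logical if a check is really needed
has_dups <- anyDuplicated(genes) > 0
print(has_dups)
```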

Tags: microarray, r

I don't think you want to remove duplicated genes; you probably want to keep the best p-value for each gene. Have a look at the aggregate() function.
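A minimal base-R sketch of that idea, using an invented table in place of the real topTableF() output. The column names `Probe`, `Gene` and `P.Value` are assumptions for illustration; the gene-name column in your own table depends on the `genelist` you pass. aggregate() with `FUN = min` gives the smallest p-value per gene, and merging that back onto the table recovers the full row (ties would return more than one row for a gene).

```r
# Invented stand-in for a topTableF() result; column names are assumptions
tt <- data.frame(
  Probe   = c("p1", "p2", "p3", "p4", "p5"),
  Gene    = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_C"),
  P.Value = c(0.040, 0.001, 0.020, 0.300, 0.010),
  stringsAsFactors = FALSE
)

# Smallest p-value for each gene (one row per gene, sorted by Gene)
best_p <- aggregate(P.Value ~ Gene, data = tt, FUN = min)
print(best_p)

# Keep the whole row (probe ID and all statistics) for that best p-value
best_rows <- merge(tt, best_p, by = c("Gene", "P.Value"))
print(best_rows)
```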

Thanks again, Irsan. I was able to get aggregate() to work. It makes sense the way you have explained it, rather than just throwing genes away.

You can do as Michael Dondrup says below, but that might remove important information. Duplicated genes often appear in microarray data because there are multiple probes for a single gene. For example, what if one probe is up-regulated and another is down-regulated (due to alternative transcript use) and you discard one of them?
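One way to check for that situation before collapsing probes is to flag genes whose probes change in opposite directions. A sketch on invented data; `Gene` and `logFC` are hypothetical column names, since the columns in your own table depend on the contrasts you fitted:

```r
# Invented per-probe results; "Gene" and "logFC" are hypothetical names
probes <- data.frame(
  Probe = c("p1", "p2", "p3", "p4", "p5"),
  Gene  = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  logFC = c(1.8, -1.2, 0.9, 1.4, -2.1),
  stringsAsFactors = FALSE
)

# For each gene, TRUE if at least one probe goes up and another goes down
mixed <- sapply(split(probes$logFC, probes$Gene),
                function(fc) any(fc > 0) && any(fc < 0))
print(mixed)

# Genes worth inspecting before reducing the table to one row per gene
conflicting <- names(mixed)[mixed]
print(conflicting)
```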

Is it from the fit2 data frame or the Sigenes.F data frame that you want to remove the duplicates?

From the Sigenes.F data frame. Thanks.

11.1 years ago

You'll need to pay attention to what you are doing. If you are new to R, you will need to do some reading on R data structures, as each behaves differently with regard to indexing. Just guessing, but you probably want:

SigGenes.F[!duplicated(SigGenes.F$genes),]
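To see what that line does, here is a sketch on a made-up table; only the `genes` column name comes from the answer, the values are invented. duplicated() marks every repeat after the first occurrence, so which row survives depends on row order. Sorting by decreasing F-statistic first (what `sort.by="F"` in the question asks topTableF for) means the kept row for each gene is its top-ranked probe, which stays close to the "keep the best p-value per gene" advice, assuming larger F means more significant for your fit.

```r
# Made-up stand-in for SigGenes.F; values are invented
SigGenes.F <- data.frame(
  genes = c("GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_B"),
  F     = c(50.2, 41.7, 30.1, 22.4, 10.3),
  stringsAsFactors = FALSE
)

# Put the highest-F probe of each gene first (already true here)
SigGenes.F <- SigGenes.F[order(SigGenes.F$F, decreasing = TRUE), ]

# duplicated() is TRUE for every repeat after the first occurrence
dup <- duplicated(SigGenes.F$genes)
print(dup)  # TRUE for rows 3 and 5

# [rows, ] with the comma keeps all columns of the retained rows
unique_genes <- SigGenes.F[!dup, ]
print(unique_genes)
```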

You're absolutely right. This worked for me as well. Many thanks, Sean!

11.1 years ago
Michael

Read ?duplicated:

> x = c(1:10,1:10)
> x
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10
> x[!duplicated(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
>

I used duplicated() this way:

  SigGenes.F[!duplicated(SigGenes.F)]

Got this output:

  Error in `[.data.frame`(SigGenes.F, !duplicated(SigGenes.F)) : 
    undefined columns selected

Then tried:

  SigGenes.F[!duplicated(SigGenes.F[2])]

Output:

  columns 3:7 have been removed

Thanks
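Both attempts are missing the comma inside the brackets. With a single index, `[` on a data frame selects columns, so a logical vector with one entry per row is treated as a column selector, and entries past the last column give "undefined columns selected". Separately, duplicated() on a whole data frame compares complete rows, so two probes for the same gene with different statistics would not be flagged anyway. A toy demonstration on invented data:

```r
# Invented 4-row, 2-column table standing in for SigGenes.F
df <- data.frame(
  genes = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  F     = c(12.0, 9.5, 7.1, 3.3),
  stringsAsFactors = FALSE
)

# duplicated() on the whole data frame compares entire rows; rows 1 and 2
# share a gene name but differ in F, so no row is flagged
dup_rows <- duplicated(df)
print(dup_rows)

# No comma: the 4-element logical vector picks COLUMNS of a 2-column
# table, which fails with "undefined columns selected"
res <- try(df[!duplicated(df$genes)], silent = TRUE)
print(inherits(res, "try-error"))

# With the comma the same vector selects ROWS and all columns are kept
fixed <- df[!duplicated(df$genes), ]
print(fixed)
```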
