Remove redundant name in a list
9.4 years ago
yasjas ▴ 70
[[1]]

   rep_name rep_family gene_name Hepatocytes_B1 Hepatocytes_B3 Huh.7_B1 Huh.7_B2
HERVL40-int        LTR    ANKIB1         7.2268         7.3056   7.2132   7.5750
HERVL40-int        LTR    ANKIB1         7.2268         7.3056   7.2132   7.5750
HERVL40-int        LTR    ANKIB1         7.2268         7.3056   7.2132   7.5750
HERVL40-int        LTR    ANKIB1         7.2268         7.3056   7.2132   7.5750
HERVL40-int        LTR   PRKAR2B         6.4382         2.2347   7.6774   6.6859
HERVL40-int        LTR     REV3L         6.6961         6.4858   4.1992   4.7723
HERVL40-int        LTR     POMT2         5.6758         5.7517   5.8600   6.1739

[[2]]

    rep_name rep_family gene_name Hepatocytes_B1 Hepatocytes_B3 Huh.7_B1 Huh.7_B2
HUERS-P3-int        LTR      ODZ1         5.6166         2.8973   1.5077   0.5965
HUERS-P3-int        LTR      ODZ1         5.6166         2.8973   1.5077   0.5965
HUERS-P3-int        LTR      ODZ1         5.6166         2.8973   1.5077   0.5965
HUERS-P3-int        LTR      ODZ1         5.6166         2.8973   1.5077   0.5965
HUERS-P3-int        LTR      ODZ1         5.6166         2.8973   1.5077   0.5965
HUERS-P3-int        LTR      ODZ1         5.6166         2.8973   1.5077   0.5965
HUERS-P3-int        LTR     CNTN1         2.2008         1.1640   1.3469   1.6292
HUERS-P3-int        LTR     CNTN1         2.2008         1.1640   1.3469   1.6292
HUERS-P3-int        LTR     CNTN1         2.2008         1.1640   1.3469   1.6292
HUERS-P3-int        LTR     CNTN1         2.2008         1.1640   1.3469   1.6292
HUERS-P3-int        LTR     CNTN1         2.2008         1.1640   1.3469   1.6292
HUERS-P3-int        LTR     MIPEP         4.2390         5.4311   7.9192   5.7850

Hello guys,

I have these lists and I want to keep only the lines that are not repeated. For example, some gene names appear more than once, and I want to count each of them only once.

Does anyone know how I can keep each gene name only once and remove the duplicates from a list?

Thanks for any suggestions.

R • 1.9k views

Never mind, sorry, that was a stupid question. Ignore it.


Google would have helped if you had spent some time on it. The first hit on Google is http://stackoverflow.com/questions/13967063/remove-duplicate-rows-in-r. You can also find some material here on Biostars: Removing Duplicate Rows(Gene Names)

9.4 years ago
michael.ante ★ 3.9k

Hi,

A quick and dirty approach is to use the unique() function for the gene names and loop over them, checking the number of occurrences:


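# x is one data frame taken from the list
res <- c()
u <- unique(x$gene_name)
for (i in u) {
  hits <- which(x$gene_name == i)
  # keep the row only if this gene name occurs exactly once
  if (length(hits) == 1) {
    res <- rbind(res, x[hits, ])
  }
}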

Avoid loops in R; use "Remove redundant name in a list" instead.
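
A loop-free version could look like the sketch below. The question can be read two ways, so both are shown: keeping the first row for each gene name, or keeping only the genes that occur exactly once (which is what the loop does). The list name `rep_list` and the cut-down columns are made up for this example; only the gene names and values come from the question. Both variants use base R's `duplicated()` and `lapply()`.

```r
# Toy data standing in for the poster's list. `rep_list` is a hypothetical
# name, and only one expression column is kept to stay short.
rep_list <- list(
  data.frame(
    rep_name  = "HERVL40-int",
    gene_name = c("ANKIB1", "ANKIB1", "PRKAR2B", "REV3L"),
    Huh.7_B1  = c(7.2132, 7.2132, 7.6774, 4.1992),
    stringsAsFactors = FALSE
  ),
  data.frame(
    rep_name  = "HUERS-P3-int",
    gene_name = c("ODZ1", "ODZ1", "CNTN1", "CNTN1", "MIPEP"),
    Huh.7_B1  = c(1.5077, 1.5077, 1.3469, 1.3469, 7.9192),
    stringsAsFactors = FALSE
  )
)

# Variant 1: keep the first row seen for each gene name.
keep_first <- lapply(rep_list, function(x) {
  x[!duplicated(x$gene_name), ]
})

# Variant 2: keep only genes that occur exactly once.
# Scanning from both ends flags every copy of a repeated name.
keep_singletons <- lapply(rep_list, function(x) {
  repeated <- duplicated(x$gene_name) |
    duplicated(x$gene_name, fromLast = TRUE)
  x[!repeated, ]
})
```

If whole rows rather than gene names should be compared, `unique(x)` on each data frame is probably the simpler choice, since it drops rows that are identical in every column.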
