Question

Remove redundant name in a list


10.2 years ago

yasjas ▴ 70

[[1]]

rep_name    rep_family gene_name Hepatocytes_B1 Hepatocytes_B3 Huh.7_B1 Huh.7_B2
HERVL40-int LTR        ANKIB1            7.2268         7.3056   7.2132   7.5750
HERVL40-int LTR        ANKIB1            7.2268         7.3056   7.2132   7.5750
HERVL40-int LTR        ANKIB1            7.2268         7.3056   7.2132   7.5750
HERVL40-int LTR        ANKIB1            7.2268         7.3056   7.2132   7.5750
HERVL40-int LTR        PRKAR2B           6.4382         2.2347   7.6774   6.6859
HERVL40-int LTR        REV3L             6.6961         6.4858   4.1992   4.7723
HERVL40-int LTR        POMT2             5.6758         5.7517   5.8600   6.1739

[[2]]

rep_name     rep_family gene_name Hepatocytes_B1 Hepatocytes_B3 Huh.7_B1 Huh.7_B2
HUERS-P3-int LTR        ODZ1              5.6166         2.8973   1.5077   0.5965
HUERS-P3-int LTR        ODZ1              5.6166         2.8973   1.5077   0.5965
HUERS-P3-int LTR        ODZ1              5.6166         2.8973   1.5077   0.5965
HUERS-P3-int LTR        ODZ1              5.6166         2.8973   1.5077   0.5965
HUERS-P3-int LTR        ODZ1              5.6166         2.8973   1.5077   0.5965
HUERS-P3-int LTR        ODZ1              5.6166         2.8973   1.5077   0.5965
HUERS-P3-int LTR        CNTN1             2.2008         1.1640   1.3469   1.6292
HUERS-P3-int LTR        CNTN1             2.2008         1.1640   1.3469   1.6292
HUERS-P3-int LTR        CNTN1             2.2008         1.1640   1.3469   1.6292
HUERS-P3-int LTR        CNTN1             2.2008         1.1640   1.3469   1.6292
HUERS-P3-int LTR        CNTN1             2.2008         1.1640   1.3469   1.6292
HUERS-P3-int LTR        MIPEP             4.2390         5.4311   7.9192   5.7850

Hello guys,

I have these lists and I want to keep only the rows that are not repeated. For example, some gene names appear more than once and I want to count each of them only once.

Does anyone know how I can keep each gene name only once and remove the duplicates from a list?

Thanks for any suggestions.

R • 2.2k views

updated 3.0 years ago by Ram 45k • written 10.2 years ago by yasjas ▴ 70


Never mind, sorry, that was a stupid question; please ignore it.

updated 3.0 years ago by Ram 45k • written 10.2 years ago by yasjas ▴ 70


Google would have helped if you had spent some time on it. The first Google hit is http://stackoverflow.com/questions/13967063/remove-duplicate-rows-in-r. You can also find related threads here on Biostars, e.g. Removing Duplicate Rows (Gene Names).
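The linked thread is about removing duplicate rows from a data frame. A minimal sketch of applying that to a list like the one in the question, assuming each list element is a data frame with a gene_name column; tables, by_gene and by_row are hypothetical names, and the toy data are only a few rows copied from the question:

```r
# Toy list standing in for the question's list of data frames
# (a few rows copied from the question, not the full data).
tables <- list(
  data.frame(gene_name      = c("ANKIB1", "ANKIB1", "PRKAR2B", "REV3L"),
             Hepatocytes_B1 = c(7.2268, 7.2268, 6.4382, 6.6961),
             stringsAsFactors = FALSE),
  data.frame(gene_name      = c("ODZ1", "ODZ1", "CNTN1", "CNTN1", "MIPEP"),
             Hepatocytes_B1 = c(5.6166, 5.6166, 2.2008, 2.2008, 4.2390),
             stringsAsFactors = FALSE)
)

# Keep the first row of each gene name in every data frame of the list.
by_gene <- lapply(tables, function(df) df[!duplicated(df$gene_name), ])

# Because the repeated rows are identical in every column, dropping
# whole-row duplicates with unique() gives the same rows here.
by_row <- lapply(tables, unique)

by_gene[[2]]$gene_name  # "ODZ1" "CNTN1" "MIPEP"
```

duplicated() keeps the first occurrence of each value. If rows sharing a gene name could differ in their expression columns, the two approaches are no longer equivalent, and you would need to decide which row to keep.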

updated 3.0 years ago by Ram 45k • written 10.2 years ago by venu 7.1k

Ram · Answer 1 · 2015-08-11


10.2 years ago

michael.ante ★ 4.0k

Hi,

A quick and dirty approach is to take the unique() gene names and loop over them, counting how often each one occurs. Only rows whose gene name occurs exactly once are kept:


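# x is one data frame from the list, e.g. x <- mylist[[1]]
res <- c()
u <- unique(x$gene_name)  # every gene name, listed once
for (i in u) {
  rows <- which(x$gene_name == i)
  # keep the row only if this gene name occurs exactly once
  if (length(rows) == 1) res <- rbind(res, x[rows, ])
}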

Avoid loops in R; use Remove redundant name in a list instead.
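A loop-free version of the same filter (keep only rows whose gene name occurs exactly once) can be sketched with base R's duplicated(). This assumes x is a data frame with a gene_name column; the toy data are a few rows copied from the question:

```r
# Toy data frame (a few rows copied from the question).
x <- data.frame(
  gene_name      = c("ODZ1", "ODZ1", "CNTN1", "CNTN1", "MIPEP"),
  Hepatocytes_B1 = c(5.6166, 5.6166, 2.2008, 2.2008, 4.2390),
  stringsAsFactors = FALSE
)

# duplicated() flags the 2nd, 3rd, ... copies of a value; with
# fromLast = TRUE it flags every copy except the last one. A row is
# flagged by either call exactly when its gene name occurs more than once.
dup_any <- duplicated(x$gene_name) | duplicated(x$gene_name, fromLast = TRUE)
res <- x[!dup_any, ]

res$gene_name  # "MIPEP"
```

One difference from the loop: when no gene name is unique, the loop leaves res as NULL, while this version returns a data frame with zero rows.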

updated 3.0 years ago by Ram 45k • written 10.2 years ago by michael.ante ★ 4.0k