Convert Many Lines In A Specific Line
11.0 years ago
viniciushs88 ▴ 50

I would like to transform this data:

Sample  Genotype  Region
sample1    A      Region1
sample1    B      Region1
sample1    A      Region1
sample2    A      Region1
sample2    A      Region1
sample3    A      Region1
sample4    B      Region1

In that format:

Sample  Genotype  Region   
sample1    E      Region1
sample2    A      Region1
sample3    A      Region1
sample4    B      Region1

I want samples with more than one genotype (sample1) to be collapsed into a single line tagged as excluded (E) in the "Genotype" column, and samples whose genotype is simply repeated across lines (sample2) to be merged into one line. My list contains many regions (Region1 to Regionx). Is it possible to do this in R? Thanks a lot.

r • 2.7k views
11.0 years ago

Given the above in a data.frame called d:

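d2 <- unique(d) # Collapse exact duplicate rows, e.g., "sample2"
d2$Genotype <- as.character(d2$Genotype) # Use characters so "E" can be assigned whether or not Genotype is a factor
key <- d2[, c("Sample", "Region")] # Assumption: handle each Sample/Region pair separately, since there are many regions
conflict <- duplicated(key) | duplicated(key, fromLast = TRUE) # Pairs that still have more than one genotype
d2$Genotype[conflict] <- "E" # Label those lines "E"
d2 <- unique(d2) # Collapse the "E" lines to one line per pair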
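An alternative sketch of the same idea, using base R's aggregate() to collapse each Sample/Region pair in one step. The extra Region2 row and the helper name collapse_genotype are hypothetical, added only for illustration; the remaining rows are the data from the question.

```r
# Data from the question, plus one hypothetical Region2 row for illustration
d <- data.frame(
  Sample   = c("sample1", "sample1", "sample1", "sample2",
               "sample2", "sample3", "sample4", "sample1"),
  Genotype = c("A", "B", "A", "A", "A", "A", "B", "A"),
  Region   = c(rep("Region1", 7), "Region2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the genotype if all calls agree, otherwise "E"
collapse_genotype <- function(g) {
  g <- unique(as.character(g))
  if (length(g) == 1) g else "E"
}

# One row per Sample/Region pair
result <- aggregate(d["Genotype"],
                    by = d[c("Sample", "Region")],
                    FUN = collapse_genotype)

# Restore the column order used in the question and sort for readability
result <- result[order(result$Region, result$Sample),
                 c("Sample", "Genotype", "Region")]
print(result)
```

Grouping on both Sample and Region is an assumption about the desired behaviour: a sample with conflicting genotypes in one region is only tagged "E" for that region, not for the others.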