7.9 years ago
Ana
I have a question about writing a for loop in R and would be very grateful for your ideas. I'm working with NGS data. I have calculated r2 values to estimate linkage disequilibrium, but I want to calculate LD decay for every single SNP in each contig.
These are the first 3 rows of my data:
scaffold94_798049_802097 999 NA tscaffold94_798049_802097 999 NA 1
tscaffold94_798049_802097 999 NA tscaffold94_798049_802097 1029 NA 1
tscaffold94_798049_50222 2011 NA tscaffold94_798049_802097 1029 NA 1
The first and fourth columns are contig names. How can I write a loop that keeps only the rows where the names in these two columns are identical (meaning that the two SNPs are located on the same contig)?
Actually, someone gave me a solution that works perfectly fine:
data$keep_dontKeep <- "dontKeep"
for (i in seq_len(nrow(data))) {
  # If the contig names in V1 and V4 are equal, mark the row as 'keep'
  if (as.character(data$V1[i]) == as.character(data$V4[i])) {
    data$keep_dontKeep[i] <- "keep"
  }
}
data <- data[data$keep_dontKeep == "keep",]
TriS's R solution is much simpler and more efficient (faster). It is also the recommended way: you do not need a for loop to subset rows in R, because comparisons and indexing are vectorized.
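For illustration, a minimal sketch of what loop-free subsetting can look like. This is not necessarily TriS's exact code. It assumes the columns are named V1 to V7 as in the loop above, and the contig names in the toy data frame are made up for the example.

```r
# Toy data frame with the same 7-column layout as in the question.
# The contig names ("contigA", "contigB") are hypothetical placeholders.
data <- data.frame(
  V1 = c("contigA", "contigA", "contigB"),
  V2 = c(999, 999, 2011),
  V3 = NA,
  V4 = c("contigA", "contigA", "contigA"),
  V5 = c(999, 1029, 1029),
  V6 = NA,
  V7 = 1,
  stringsAsFactors = FALSE
)

# Compare the two contig columns element-wise. as.character() guards
# against factor columns that have different level sets.
same_contig <- as.character(data$V1) == as.character(data$V4)

# which() turns the logical vector into row indices and skips NA
# comparisons, so rows with a missing contig name are dropped.
same_contig_data <- data[which(same_contig), ]
```

The whole column is compared in one step, so no helper column such as keep_dontKeep is needed and the run time stays low even for millions of SNP pairs.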