Lost in an "if else" statement in R
7.7 years ago
Lila M ★ 1.3k

Hi everybody! I'm trying to write an R loop to process one of my data frames, in which I've stored ChIP-seq data. My data frame looks like this:

chr1    700245  714068  -   13824   uc001abo.3
chr1    934342  935552  -   1211    uc001aci.2
chr1    1189292 1203372 -   14081   uc001adm.4
chr1    1189292 1209234 -   19943   uc001ado.3
chr1    1243994 1247057 +   3064    uc001aed.3

I want to modify the 2nd and 3rd columns (adding and subtracting some values). To do that, I've written the following code:

import_file= read.delim("file", sep="\t", header = F)
file =as.data.frame.matrix(import_file)
#length( file$V6)

for (i in length file$V6)){

if (any( file$V4 == '-')) {
file$V2 =  file$V3-150
file$V3 =  file$V3 +50
 } 
else {   
#if file$V4 == '+' do this
file$V2 =  file$V2-50
file$V3 =  file$V2 +150
   } 
}

The first "if" branch is processed fine, but for the "else" case the loop repeats the "if" branch for the other condition. What I want to do is:

 if (any( file$V4 == '-')) {
    file$V2 =  file$V3-150
    file$V3 =  file$V3 +50
#works

and

 if file$V4 == '+' do this
    file$V2 =  file$V2-50
    file$V3 =  file$V2 +150
#doesn't work

Any help?

Thanks a lot!

ChIP-Seq R if else loop • 2.0k views

Hi guys, your approaches perform the second calculation on the already-modified data (after adding or subtracting), not on the original data frame, which is exactly what I want :)


Please use ADD COMMENT/ADD REPLY (or add this to the original question) to keep threads logically organized.

7.7 years ago

You're not subsetting file inside your for loop, so if(any(file$V4 == '-')) will always be true. Get rid of the any() and only change the ith entry.

Having said that:

idx = which(file$V4 == '-')
file$V2[idx] = file$V3[idx] - 150
file$V3[idx] = file$V3[idx] + 50
idx = which(file$V4 != '-')
file$V2[idx] = file$V2[idx] - 50
file$V3[idx] = file$V2[idx] + 150

Alternatively, convert this to a GRanges object and use flank(), which is strand aware.
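As pointed out in the comments, for the '+' rows the snippet above derives V3 from the already-shifted V2. Here is a minimal sketch of one way around that, assuming the goal is for both new coordinates to be computed from the original values. The toy data frame repeats the rows from the question, and the names `start0`, `end0` and `minus` are made up for this illustration.

```r
# Toy copy of the first four columns from the question
# (V1..V4 are the names read.delim assigns when header = FALSE).
file <- data.frame(
  V1 = "chr1",
  V2 = c(700245, 934342, 1189292, 1189292, 1243994),
  V3 = c(714068, 935552, 1203372, 1209234, 1247057),
  V4 = c("-", "-", "-", "-", "+"),
  stringsAsFactors = FALSE
)

# Keep untouched copies so that no update depends on an earlier one.
start0 <- file$V2
end0   <- file$V3
minus  <- file$V4 == "-"

# '-' rows are anchored on the original end, '+' rows on the original start.
file$V2 <- ifelse(minus, end0 - 150, start0 - 50)
file$V3 <- ifelse(minus, end0 + 50,  start0 + 150)

print(file)
```

With this version the '+' row ends at its original start plus 150 (1244144 for the last row), while the '-' rows come out the same as before.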


You beat me by one second :D


I typed less, that's why :)

7.7 years ago

I think you should use [i] in your if/else. Otherwise I don't understand the purpose of that loop.

import_file= read.delim("file", sep="\t", header = F)
file =as.data.frame.matrix(import_file)
#length( file$V6)

for (i in c(1:length(file$V6))){
  if (( file$V4 [i]== '-')) {
    file$V2[i] =  file$V3[i]-150
    file$V3[i] =  file$V3[i] +50
   } 
  else {   
  #if file$V4[i] == '+' do this
    file$V2[i] =  file$V2[i]-50
    file$V3[i] =  file$V2[i] +150
   } 
}

PS: indentation is your friend.

PPS: This code could be much more efficient without a loop:

file$V2[which(file$V4=='-')]=file$V3[which(file$V4=='-')]-150
file$V3[which(file$V4=='-')]=file$V3[which(file$V4=='-')]+50
...

Thank you for the advice! The first piece of code you proposed doesn't work at all. The second one is fine for "-", but when I do the same for "+":

unique_intersect$V2[which(unique_intersect$V4=='+')]=unique_intersect$V2[which(unique_intersect$V4=='+')]-50
unique_intersect$V3[which(unique_intersect$V4=='+')]=unique_intersect$V2[which(unique_intersect$V4=='+')]+150

the result for unique_intersect$V3 is calculated from the already-updated value unique_intersect$V2[which(unique_intersect$V4=='+')]-50, so the statement doesn't do exactly what I want. Anyway, I'll keep trying!


Oh I see, there is a syntax error (that I copy-pasted from your code). for (i in length file$V6)){ should be for (i in c(1:length(file$V6))){. It is now fixed.

Once corrected, the code outputs this. Is it what you want?

    V1      V2      V3 V4    V5         V6
1 chr1  713918  714118  - 13824 uc001abo.3
2 chr1  935402  935602  -  1211 uc001aci.2
3 chr1 1203222 1203422  - 14081 uc001adm.4
4 chr1 1209084 1209284  - 19943 uc001ado.3
5 chr1 1243944 1244094  +  3064 uc001aed.3
7.7 years ago
zjhzwang ▴ 180

I think you can do it another way:

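library(dplyr)
# Read the file into a tibble (columns are named V1..V6).
data <- as_tibble(read.table("file_path", header = FALSE, stringsAsFactors = FALSE))
# "+" strand rows: shift the start, then derive the end from the shifted start.
data_1 <- filter(data, V4 == "+")
data_1$V2 <- data_1$V2 - 50
data_1$V3 <- data_1$V2 + 150
# "-" strand rows: both new coordinates come from the original end.
data_2 <- filter(data, V4 == "-")
data_2$V2 <- data_2$V3 - 150
data_2$V3 <- data_2$V3 + 50
# Recombine; the "+" rows now come before the "-" rows.
result <- rbind(data_1, data_2)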
7.7 years ago
LLTommy ★ 1.2k

I haven't done any R for a while, but are you sure any() is doing what you expect it to do? I just googled it, and the documentation for any and all says: 'Check whether any or all of the elements of a vector are TRUE.' This suggests to me that if a single element of the vector (a column, in your case) is true, the expression returns true. So the behaviour you see would make perfect sense, because there is always a '-' in the column you posted. I suggest you investigate this; I think you have to change your condition a little bit.
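A small sketch of that behaviour, using a made-up strand vector rather than the poster's file (the variable names are only for illustration): any() collapses the whole comparison to a single TRUE/FALSE, whereas indexing with [i] tests one row at a time.

```r
# Hypothetical strand column with the same mix as the rows in the question.
strand <- c("-", "-", "-", "-", "+")

# Element-wise comparison: one logical value per row.
per_row <- strand == "-"

# any() reduces that vector to a single value, TRUE as long as one row is "-".
whole_column <- any(per_row)

# Testing a single row is what an if() inside a loop needs.
fifth_row <- strand[5] == "-"

print(per_row)
print(whole_column)
print(fifth_row)
```

Because whole_column is TRUE no matter which row the loop is on, an if() built on it never reaches the else branch.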


Ok, while I typed this some other people spotted the same thing. Problem solved I'd say.
