How to match location of string
2
1
5.6 years ago

Hi,

I am trying to check which characters occur at specific positions in a list of strings, like this:

library(seqinr)

at <- ("ATATATAT")
s1 <-ifelse(at[8]=="T"||"A" && at[7]=="A"||"T" &&
            at[6]=="T"||"A",5,
            ifelse(at[2]=="T"||"A" && at[4]=="A"||"T" &&
                     at[1]=="T"||"A",'1','0'
            ))
s1

It works for only one sequence. When I tried it in a for loop, I got an error like this:

invalid 'x' type in 'x && y'

Any help is much appreciated. Thanks.

R seqinr • 1.8k views
ADD COMMENT
0

This is a Question, not a Page; please be careful when selecting the post type.

What is s2c()? From which package?

If the code above works but the loop doesn't, you should show the loop as well, and provide an example dataset to replicate the failure.

ADD REPLY
1

This looks like code directly translated from Excel functions. Surely there must be better, more efficient ways to achieve OP's goals.

ADD REPLY
1

I think this is the s2c function OP is using. Also, how is a[8] == "T"||"A" even proper R syntax? The "T" || "A" part will throw an error. Pretty sure OP's code doesn't work as-is at the moment.

ADD REPLY
0

Can you describe what you're trying to achieve and what "a" actually looks like (i.e., the result of s2c(at))?

EDIT: and what the final for-loop is supposed to achieve.

I promise, if you describe your question properly (i.e. what exactly should be the end result?) there's going to be a more robust way of doing that in R.

ADD REPLY
0

That "a" was for the next coding step; it is not part of this analysis.

ADD REPLY
0

Thanks, everyone, for the replies.

Let me correct my question to make it easier to understand.

I have a list of sequences like this in the second column of a CSV file.

        Seq
>1_seq     ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACAG
>2_seq     ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACCC
>3_seql    ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACTT
>4_seql    ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACAG

I want to compare the sequences with each other position by position. For example, if A or T is present at the 11th or 17th position of each sequence, then return 1, else 0.

Thanks in advance

ADD REPLY
1

That seems to be part of a multiple sequence alignment. Either way, you might benefit from creating a 2D matrix with one column per base position and one row per sequence; that would be a lot easier to filter using indexes.
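
A minimal sketch of that matrix idea in base R, assuming all sequences have the same length (as in an alignment). The example sequences, the positions 2 and 4, and the names m, hits and flag are made up for illustration; they are not from the OP's data.

```r
# Hypothetical toy sequences of equal length
x <- c("AAGTA", "ACGCA", "GCGAA")

# One row per sequence, one column per base position
m <- do.call(rbind, strsplit(x, ""))

# Positions of interest (assumed here: 2 and 4)
pos <- c(2, 4)

# Logical matrix: TRUE where the base is A or T
hits <- m[, pos] == "A" | m[, pos] == "T"

# 1 if any of the chosen positions has A or T, else 0
flag <- as.integer(rowSums(hits) > 0)
print(flag)
```

Once the data are in this shape, any column of m can be tabulated or filtered with ordinary matrix indexing, for example table(m[, 2]).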

ADD REPLY
0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

Ideally edit your original question and add this information there.

ADD REPLY
0

For example, if A or T is present in "11th or 17" location of each sequence then return 1 else 0.

That doesn't make sense to me.

ADD REPLY
0

Do you only care about the presence or absence in certain positions? How many positions are you interested in?

ADD REPLY
4
5.6 years ago
zx8754 12k

To simplify your example, take this condition: if the 2nd or the 4th position of every sequence has A or T, then return TRUE.

# example data
x <- c("AAGTA", 
       "AAGTA", 
       "AAGTA", 
       "ACGAA")

# in this example all TRUE
all(substr(x, 2, 2) %in% c("A", "T") | substr(x, 4, 4) %in% c("A", "T"))
# [1] TRUE

If this is not the solution you are looking for, then please provide example input and expected output, clearly.
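
If a 1/0 value per sequence is wanted instead of a single TRUE/FALSE for the whole set, the same substr() test can be kept as a vector. This is only a sketch under that assumption; the helper name has_base_at and the example data are hypothetical.

```r
# Hypothetical helper: 1 if any of the given positions holds one of the
# given bases, else 0, computed separately for each sequence
has_base_at <- function(x, positions, bases = c("A", "T")) {
  per_pos <- lapply(positions, function(p) substr(x, p, p) %in% bases)
  as.integer(Reduce(`|`, per_pos))
}

# Made-up example data
x <- c("AAGTA", "ACGCA", "GCGAA", "CCGGC")

flags <- has_base_at(x, positions = c(2, 4))
print(flags)
```

Wrapping the result in all(flags == 1) gives back the single TRUE/FALSE answer for the whole set.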

ADD COMMENT
0

Excellent.

Thanks a lot, it's working.

ADD REPLY
2
5.6 years ago

To address your error message:

Assuming this is R code, I don't think the command works even as a single instance outside a for-loop:

> "A" == "T"||"A" && "A" == "A"||"T"
Error in "A" && "A" == "A" : invalid 'x' type in 'x && y'

The syntax would have to be:

> "A" %in% c("T","A") && "A" %in% c("T","A")
[1] TRUE
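
To illustrate why the first form fails: in R, == binds tighter than &&, which binds tighter than ||, so a bare string ends up as an operand of &&. The sketch below captures that error and then shows the membership test, both for single values and, with the elementwise & operator, for whole vectors. The variable names and example bases are made up.

```r
# R parses the OP-style condition as
#   ("A" == "T") || ("A" && ("A" == "A")) || "T"
# so the string "A" is passed to && and raises an error.
res <- tryCatch(
  "A" == "T" || "A" && "A" == "A" || "T",
  error = function(e) conditionMessage(e)
)
print(res)

# Scalar check with %in% and &&
base7 <- "A"
base8 <- "T"
ok_scalar <- base8 %in% c("A", "T") && base7 %in% c("A", "T")

# For several sequences at once, use the elementwise & instead of &&
b8 <- c("T", "G", "A")
b7 <- c("A", "A", "C")
ok_vec <- b8 %in% c("A", "T") & b7 %in% c("A", "T")
print(ok_vec)
```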

That being said, as the numerous comments above indicate, there's most definitely a more straightforward way of doing whatever it is you're trying to do.

ADD COMMENT
0

The following regex would test the same things:

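# "^" anchors the match at position 1; inside [...] a "|" is a literal
# character, so the class for "A or T" is written [AT]
ifelse(grepl("^.{5}[AT]{3}", at),          # positions 6-8 are all A/T
       5,
       ifelse(grepl("^[AT]{2}.[AT]", at),  # positions 1, 2 and 4 are all A/T
              1,
              NA))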

Note that you also need to state what should happen if the second ifelse returns FALSE (I've used NA here).

ADD REPLY
0

Thanks for the quick reply. But still, I am getting the same error for big files.
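
Without seeing the file or the loop, the cause of the error on big files can only be guessed. One way to avoid the loop altogether is to run the vectorized test on the whole column. The sketch below assumes a CSV with hypothetical column names id and Seq and uses made-up sequences; a real file would be read with read.csv("yourfile.csv") instead of the text argument.

```r
# Made-up CSV content standing in for a real file
csv_text <- "id,Seq
1_seq,ACGTATTGATGCCACAG
2_seq,ACGTATTGATGCCACAT
3_seq,ACGTATTGATTCCACAG
4_seq,ACGT"

seqs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

bases <- c("A", "T")

# 1 if position 11 or position 17 holds A or T, else 0.
# substr() returns "" past the end of a short sequence, which counts as 0.
seqs$flag <- as.integer(
  substr(seqs$Seq, 11, 11) %in% bases |
    substr(seqs$Seq, 17, 17) %in% bases
)
print(seqs)
```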

ADD REPLY
