Question

If it's possible for R detecting the reverse compliment sequence?

1

Entering edit mode

7.3 years ago

horsedog ▴ 60

Here I got a lot of targeted sequences in one fasta file and I need to align them, but some of them are reverse compliment sequence, is there any way to detect them once and reverse compliment them? Thanks.

R sequencing • 2.0k views

ADD COMMENT • link updated 7.3 years ago by VHahaut ★ 1.2k • written 7.3 years ago by horsedog ▴ 60

0

Entering edit mode

but some of them are reverse compliment sequence

how do you know this ?

ADD REPLY • link 7.3 years ago by Pierre Lindenbaum 164k

1

Entering edit mode

they don't have ATG as the beginning codon but with CAT in the end.

ADD REPLY • link 7.3 years ago by horsedog ▴ 60

1

Entering edit mode

then retrive sequences not starting with ATG, compute the revcom sequences and cat them to those starting with ATG.

a shell solution:

seqkit grep -s -i -p ^ATG  seqs.fa > positive.fa

seqkit grep -s -i -p ^ATG -v seqs.fa | seqkit seq -r -p > processed.fa

cat positive.fa processed.fa > result.fa

ADD REPLY • link 7.3 years ago by shenwei356 8.7k

score 1 · Answer 1 · 2017-09-15

If you want to do it in R, simply use the Biostring package (reverseComplement() ):

https://bioconductor.org/packages/release/bioc/html/Biostrings.html

https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/reverseComplement

this package should contain all the functions you need to compare strings, manipulate them and find a pattern.

score 0 · Answer 2 · 2017-09-14

Well, you could simply write a script yourself that 1) compares all the seqs (n*(n-1)/2 runs for n fasta seqs in the worst case), 2) in any comparison, if one of compared seqs has ATG at the beginning and CAT at the end, runs nucleotidewise check of two seqs further to see if two seqs would be fully reverse, and if it reaches end/beginning of fasta seqs 3) you have two reverse seqs.

score 0 · Answer 3 · 2017-09-14

It's quite simple to implement it. I have implemented it in Python, which can be migrated to R without big effort.

COMP = {"A" : "T", "T" : "A", "C" : "G", "G" : "C", "a" : "t", "t" : "a", "c" : "g", "g" : "c", "N":"N", "\n":"\n"}

def reverse_complement(origin):
    length = len(origin)
    revCompArr = ['' for x in xrange(length)]
    for i in xrange(length):
        orig = origin[length - i -1]
        if orig in COMP:
            revCompArr[i] = COMP[orig]
        else:
            revCompArr[i] = 'N'
    return ''.join(revCompArr)