If it's possible for R detecting the reverse compliment sequence?
3
1
Entering edit mode
7.2 years ago
horsedog ▴ 60

Here I got a lot of targeted sequences in one fasta file and I need to align them, but some of them are reverse compliment sequence, is there any way to detect them once and reverse compliment them? Thanks.

R sequencing • 1.9k views
ADD COMMENT
0
Entering edit mode

but some of them are reverse compliment sequence

how do you know this ?

ADD REPLY
1
Entering edit mode

they don't have ATG as the beginning codon but with CAT in the end.

ADD REPLY
1
Entering edit mode

then retrive sequences not starting with ATG, compute the revcom sequences and cat them to those starting with ATG.

a shell solution:

seqkit grep -s -i -p ^ATG  seqs.fa > positive.fa

seqkit grep -s -i -p ^ATG -v seqs.fa | seqkit seq -r -p > processed.fa

cat positive.fa processed.fa > result.fa
ADD REPLY
1
Entering edit mode
7.2 years ago
VHahaut ★ 1.2k

If you want to do it in R, simply use the Biostring package (reverseComplement() ):

https://bioconductor.org/packages/release/bioc/html/Biostrings.html

https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/reverseComplement

this package should contain all the functions you need to compare strings, manipulate them and find a pattern.

ADD COMMENT
0
Entering edit mode
7.2 years ago

Well, you could simply write a script yourself that 1) compares all the seqs (n*(n-1)/2 runs for n fasta seqs in the worst case), 2) in any comparison, if one of compared seqs has ATG at the beginning and CAT at the end, runs nucleotidewise check of two seqs further to see if two seqs would be fully reverse, and if it reaches end/beginning of fasta seqs 3) you have two reverse seqs.

ADD COMMENT
0
Entering edit mode
7.2 years ago
chen ★ 2.5k

It's quite simple to implement it. I have implemented it in Python, which can be migrated to R without big effort.

COMP = {"A" : "T", "T" : "A", "C" : "G", "G" : "C", "a" : "t", "t" : "a", "c" : "g", "g" : "c", "N":"N", "\n":"\n"}

def reverse_complement(origin):
    length = len(origin)
    revCompArr = ['' for x in xrange(length)]
    for i in xrange(length):
        orig = origin[length - i -1]
        if orig in COMP:
            revCompArr[i] = COMP[orig]
        else:
            revCompArr[i] = 'N'
    return ''.join(revCompArr)
ADD COMMENT

Login before adding your answer.

Traffic: 1632 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6