I have a lot of targeted sequences in one FASTA file and I need to align them, but some of them are reverse-complemented. Is there any way to detect these in one pass and reverse complement them? Thanks.
If you want to do it in R, simply use the Biostrings package (reverseComplement()):
https://bioconductor.org/packages/release/bioc/html/Biostrings.html
https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/reverseComplement
This package should contain all the functions you need to compare strings, manipulate them, and find patterns.
Well, you could simply write a script yourself that 1) compares all pairs of sequences (n*(n-1)/2 comparisons for n FASTA sequences in the worst case); 2) in each comparison, if one sequence starts with ATG and the other ends with CAT, continues with a nucleotide-by-nucleotide check to see whether the two are full reverse complements of each other; and 3) if that check reaches the end of one sequence and the beginning of the other without a mismatch, reports the two as a reverse-complement pair.
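A minimal Python sketch of the pairwise idea described above, in case it helps. It is only an illustration under some assumptions: the sequences are already loaded into a dict, a "pair" means one sequence is the exact, full-length reverse complement of the other, and the names `find_revcomp_pairs` and `example` are made up for this example. Real targeted sequences often differ in length or contain mismatches, so an exact comparison like this would miss them.

```python
from itertools import combinations

# Complement table for upper- and lower-case bases; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_revcomp_pairs(seqs):
    """Return (name_a, name_b) pairs where one sequence is exactly the
    reverse complement of the other. `seqs` maps name -> sequence."""
    pairs = []
    # combinations() visits each unordered pair once: n*(n-1)/2 comparisons.
    for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
        if len(seq_a) != len(seq_b):
            continue  # cheap length filter before the full comparison
        if seq_a.upper() == reverse_complement(seq_b).upper():
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical toy data: seq2 is the reverse complement of seq1.
example = {
    "seq1": "ATGGCCTTA",
    "seq2": "TAAGGCCAT",
    "seq3": "ATGAAACCC",
}
print(find_revcomp_pairs(example))
```

The length check plays the role of the quick pre-filter; the ATG/CAT test from step 2 could be added in the same place if your sequences are known to start with a start codon.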
It's quite simple to implement. I have implemented it in Python, and it can be ported to R without much effort.
# Complement lookup for upper- and lower-case bases; N and newline map to themselves.
COMP = {"A": "T", "T": "A", "C": "G", "G": "C",
        "a": "t", "t": "a", "c": "g", "g": "c",
        "N": "N", "\n": "\n"}

def reverse_complement(origin):
    """Return the reverse complement of origin; unknown characters become 'N'."""
    # Walk the sequence backwards and complement each base (Python 3).
    return ''.join(COMP.get(base, 'N') for base in reversed(origin))
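For what it's worth, here is a shorter variant of the same idea using Python 3's built-in `str.maketrans` and `str.translate`, plus a quick sanity check. This is my own sketch, not the code above: the name `reverse_complement_translate` is hypothetical, and unlike the dictionary version it leaves unrecognised characters unchanged instead of turning them into N.

```python
# Translation table for upper- and lower-case bases; N maps to itself.
COMPLEMENT_TABLE = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_translate(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


seq = "ATGCCGTAN"
rc = reverse_complement_translate(seq)
print(rc)

# Applying the operation twice should give back the input.
print(reverse_complement_translate(rc) == seq)
```

The round-trip check is a cheap way to catch a wrong entry in the complement table.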
How do you know this?
They don't have ATG as the first codon, but they end with CAT.
Then retrieve the sequences that do not start with ATG, compute their reverse complements, and concatenate (cat) them with the sequences that do start with ATG.
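The same ATG/CAT idea can be sketched in Python. This is only a rough illustration and rests on the assumption made in this thread, namely that every correctly oriented sequence starts with ATG, so a sequence that lacks a leading ATG but ends with CAT is taken to be reversed. The helper names (`read_fasta`, `orient`) and the toy records are hypothetical, and the example reads from an in-memory string rather than a real file.

```python
import io

# Complement table for upper- and lower-case bases; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def orient(seq):
    """Flip sequences that look reversed: no leading ATG but a trailing CAT."""
    upper = seq.upper()
    if not upper.startswith("ATG") and upper.endswith("CAT"):
        return reverse_complement(seq)
    return seq


# Hypothetical toy input; in practice pass open("your_file.fasta") instead.
fasta_text = """>fwd
ATGGCC
TTA
>rev
TAAGGCCAT
"""
oriented = [(name, orient(seq)) for name, seq in read_fasta(io.StringIO(fasta_text))]
for name, seq in oriented:
    print(f">{name}\n{seq}")
```

Sequences that neither start with ATG nor end with CAT are passed through untouched here, so it is worth checking how many of those you have before trusting the result.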
a shell solution: