Removing all stop codons from Sequence Record using Biopython
6.8 years ago
ckan91 ▴ 40

Hello Everyone,

I have sequences that occasionally contain an erroneous stop codon. Is there a way to remove all stop codons from a Biopython SeqRecord?

Edit: The sequence is in frame and I would like to remove the whole codon for all sequences in the SeqRec. Apologies for the lack of clarity.

Thank you so much! Chris

biopython • 5.0k views
6.8 years ago

Bastien is right: several points in your question are unclear (is the sequence already in frame? Do you want to remove the whole codon or just one nucleotide?). Assuming that your sequence is already in frame, you can do this:

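from Bio.Seq import Seq
from Bio import SeqIO

# DNA and RNA stop codons; the comparison below ignores case.
codon_stop_array = ["TAG", "TGA", "TAA", "UGA", "UAA", "UAG"]

for record in SeqIO.parse("my_fasta_file.fasta", "fasta"):
    print(record.seq)
    sequence = str(record.seq)
    kept_codons = []
    # Walk the sequence codon by codon, assuming it starts in frame.
    for index in range(0, len(sequence), 3):
        codon = sequence[index:index+3]
        # Collect the codons to keep instead of deleting in place:
        # deleting shifts the later indices, so a second stop would be missed.
        if codon.upper() not in codon_stop_array:
            kept_codons.append(codon)
    record.seq = Seq("".join(kept_codons))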

Note that this also removes the terminal stop codon at the end of each sequence.
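If the terminal stop codon should stay, one option is to skip the check for the final codon. Below is a minimal sketch on plain strings, assuming the sequence is in frame and its length is a multiple of three. The helper name strip_internal_stops is hypothetical and not part of Biopython. For a SeqRecord you would pass str(record.seq) and wrap the result in Seq again.

```python
# Hypothetical helper: removes in-frame stop codons but keeps one that ends the sequence.
STOP_CODONS = frozenset({"TAG", "TGA", "TAA", "UAG", "UGA", "UAA"})


def strip_internal_stops(sequence, stop_codons=STOP_CODONS):
    """Return the sequence without internal in-frame stop codons."""
    # Split into codons; assumes the reading frame starts at position 0.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    kept = []
    for position, codon in enumerate(codons):
        is_last = position == len(codons) - 1
        # Skip a stop codon unless it is the final codon of the sequence.
        if codon.upper() in stop_codons and not is_last:
            continue
        kept.append(codon)
    return "".join(kept)


cleaned = strip_internal_stops("ATGTAAGGGTGA")
print(cleaned)
```

If a sequence has trailing bases after its terminal stop, the stop is no longer the final codon and would be removed as well, so check the sequence lengths first.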

Thank you for your help!

This was helpful. I used a variant of this code to prepare alignments for codeml. If you are using it on an alignment, it is important to maintain sequence length, so instead of deleting the stop codon I replaced it with ambiguous characters ("?").

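from Bio.Seq import Seq

def replace_stop_codons(record, codon_stop_array=("TAG", "TGA", "TAA")):
    # Replace each in-frame stop codon with "???" so the sequence length is unchanged.
    tempRecordSeq = list(record.seq)
    for index in range(0, len(record.seq), 3):
        # Compare as a plain string rather than as a Seq object.
        codon = str(record.seq[index:index+3])
        if codon in codon_stop_array:
            tempRecordSeq[index:index+3] = '?', '?', '?'
    record.seq = Seq("".join(tempRecordSeq))
    return record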
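
To show why masking keeps an alignment intact, here is a small sketch of the same idea on plain strings. The function name mask_stop_codons, the mask parameter, and the toy alignment are assumptions made for illustration; the masking character your downstream tool accepts may differ.

```python
# Hypothetical helper: masks in-frame stop codons without changing sequence length.
def mask_stop_codons(sequence, stop_codons=("TAG", "TGA", "TAA"), mask="?"):
    """Replace each in-frame stop codon with three mask characters."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    masked = [mask * 3 if codon.upper() in stop_codons else codon for codon in codons]
    return "".join(masked)


# Toy codon alignment (made-up data); gap codons ("---") are left untouched.
alignment = {"seq_a": "ATGTAGGGC---", "seq_b": "ATGCCCGGCTAA"}
masked = {name: mask_stop_codons(seq) for name, seq in alignment.items()}

for name, seq in masked.items():
    print(name, seq)
```

Every masked sequence has the same length as its input, so the alignment columns still line up.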
6.8 years ago

More information is needed here, but assuming the stop codons do not have to be in frame, try something like this, which sorts the records into those with and without a stop triplet:

from Bio import SeqIO

codon_stop_array = ["TAG", "TGA", "TAA"]
record_without_stop = []
record_with_stop = []

for record in SeqIO.parse("your_fasta_file.fasta", "fasta"):
    # Substring test: matches a stop triplet at any position, in any frame.
    sequence = str(record.seq).upper()
    if any(codon in sequence for codon in codon_stop_array):
        record_with_stop.append(record)
    else:
        record_without_stop.append(record)
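
The substring test and an in-frame test can give different answers for the same sequence. The sketch below compares the two on plain strings; the function names are hypothetical, and the in-frame version assumes the reading frame starts at the first base.

```python
STOP_CODONS = ("TAG", "TGA", "TAA")


def has_stop_any_frame(sequence):
    """True if a stop triplet occurs anywhere in the sequence, regardless of frame."""
    seq = sequence.upper()
    return any(codon in seq for codon in STOP_CODONS)


def has_stop_in_frame(sequence):
    """True if a stop triplet occurs at a codon boundary (frame starting at base 0)."""
    seq = sequence.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq), 3))


# "ATGATAGCC" reads ATG ATA GCC in frame, but contains "TAG" across a codon boundary.
example = "ATGATAGCC"
print(has_stop_any_frame(example), has_stop_in_frame(example))
```

On longer sequences the any-frame test flags most records, because out-of-frame stop triplets are common, so the in-frame test is usually the one to use for coding sequences.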

Thank you for your help!
