I am very new to programming in Python. I have protein FASTA files for several plant species.
I would like to filter them by the number of amino acids each sequence contains. The criterion is to keep sequences longer than 20 amino acids.
I am able to select the sequences longer than 20 amino acids using the Biopython cookbook. However, when I try to write them to a file, I get an error. I am unable to reproduce it. Moreover, I would also like to have the ID of each sequence in the output file. Please help me!
Code:
    import Bio
    from Bio import SeqIO
    for s_record in SeqIO.parse('arabidopsis_thaliana_proteome.ath.tfa','fasta'):
        name = s_record.id
        seq = s_record.seq
        seqLen = len(s_record)
        if seqLen >20:
            desired_proteins=seq
            output_file=SeqIO.write(desired_proteins, "filtered.fasta","fasta")
            output_file
Thank you in advance :)
You want to do seqLen = len(seq), not s_record. You also want to keep a list of the good ones, as WouterDeCoster's example specifies, and write s_record to the file, not seq. .seq would just give you the sequence string, and you want to maintain headers in the format of a fasta file.
Actually, len(record) is the same as len(record.seq) and there is no need to convert it to a string.
Thank you :) Understandable now!