I used this piece of Biopython code written by Eric Normandeau to parse several sequences from a FASTA file. The code worked nicely. I would like to change my parsed output from this format:
>DR179241 similar to UniRef100_A5HIY3 Cluster: Thaumatin-like protein; n=1; ......
ATAATCTTTAGATCAGTCATCAATCTCAACAGTATCGCTTTCAATTCTCTTTCATATTGC
ATGGAAGTGTGTAAATACAATTAGGGCATTCATTGAGTTGACTTCATTTAAGCGCT......
#!/usr/bin/env python
import sys
from Bio import SeqIO

fasta_file = sys.argv[1]   # Input FASTA file
result_file = sys.argv[2]  # Output FASTA file

def modified(records):
    for record in records:
        # Clear the description so only the ID is written to the header
        record.description = ""
        yield record

# Note: the "fasta" format argument belongs to SeqIO.parse, not to modified()
records = modified(SeqIO.parse(fasta_file, "fasta"))
count = SeqIO.write(records, result_file, "fasta")
print("Converted %i records" % count)
This uses a generator function to modify the records one at a time, which is memory efficient because the whole file is never loaded into memory at once. The key point for your request is setting the description to an empty string, so that only the record ID is written to each header line.
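To see the effect on a small input without installing anything, here is a minimal plain-Python sketch of the same streaming idea. It is not the Biopython code above: `strip_descriptions` and the two sample records are made up for illustration, it assumes the ID is the first whitespace-delimited token after `>`, and it does not validate or reformat sequence lines the way a real parser would.

```python
import io


def strip_descriptions(lines):
    """Yield FASTA lines one at a time, keeping only the ID on header lines."""
    for line in lines:
        if line.startswith(">"):
            # Keep the first whitespace-delimited token (assumed to be the ID).
            fields = line[1:].split(None, 1)
            record_id = fields[0] if fields else ""
            yield ">" + record_id + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical two-record input, invented for this example.
example = io.StringIO(
    ">seq1 some long description\n"
    "ACGTACGT\n"
    ">seq2 another description\n"
    "TTGGCCAA\n"
)

# The generator is consumed lazily; joining here is only for display.
result = "".join(strip_descriptions(example))
print(result)
```

With a real file you would pass the open file handle instead of the `io.StringIO` object and write each yielded line straight to the output file, so only one line is held in memory at a time.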
Oohh, good to know. Thanks Leonor.