Update Multiple Sequences From Fasta File And Write Them As A New Fasta In Biopython
0
1
Entering edit mode
12.1 years ago
Abdullah ▴ 100

Hi,

I have a set of sequences in a fasta file, i read them into biopython and then i want to make some changes to them (adding or replacing or deleting) and then i want to save them to a new fastafile.

what i'm doing in the example is a small shifting of 5 nucleotides :

>     allrecords=list(SeqIO.parse("test.fasta", 'fasta'))
>             fullseq=0
>             tochangeseq =0 
>              oldfullseq=0     
> 
> for y in range(len(allrecords)):
> 
> 
>           oldfullseq=allrecords[y].seq
>             tochange=5        #might be dynamically included by another for loop
>             tochangeseq = allrecords[y].seq[0:tochange]
>             fullseq = oldfullseq[tochange:len(oldfullseq)]
>             fullseq += tochangeseq
>             allrecords[y].seq = fullseq
> print allrecords[0].seq    ### it should print the new updated sequence, but it's not

but the problem is when i print the sequences after the loop, i found that they did not changed, how can i update them and then write the new version to a fasta file.

help would be appreciated

biopython • 4.8k views
ADD COMMENT
2
Entering edit mode

The code for the sequence manipulation is a bit convoluted. You can just do this:

newSeq = allrecords[y].seq[tochange:] + allrecords[y].seq[:tochange]

allrecords[y].seq = newSeq

But I don't see anything wrong with it. It looks like you deleted a lot of the code. Can you post the full code?

ADD REPLY
0
Entering edit mode

Hi, when i try your method, i get Memory error .. besides .. that's the whole code .. i just want to update the sequences and output the new ones to a new fasta file .. but it's not working

ADD REPLY
0
Entering edit mode

Just do this then:

tochange = 5
for record in SeqIO.parse('test.fasta','fasta')):
   seq = str(record.seq)
   print '>' + record.description
   print seq[tochange:] + seq[:tochange]

pipe the results to a new file.

ADD REPLY
0
Entering edit mode

i think this not a good idea to do this .. i want to fill the new sequences inside an array because i might need them for other computational stuff ...the question is why my method is not working, if i print full seq it prints the new sequence but it's not being assigned to the record object ..

ADD REPLY
2
Entering edit mode

Honestly I don't really know why it doesn't work for you. That's why I thought maybe you didn't post the full code. I tried this:

from Bio import SeqIO

allRecords = list(SeqIO.parse('test.fa','fasta'))

tochange = 5
for i in range(len(allRecords)):
    newSeq = allRecords[i].seq[tochange:] + '.' + allRecords[i].seq[:tochange]
    allRecords[i].seq = newSeq

print allRecords[0].seq
print allRecords[1].seq

On this test.fa:

>test1
AGTGCTAGCTAGCTATTATATGCGACTGATCGTACTGGTAGCTGGTCA
>test2
ACGATCGTAGTCTTGATGCTAGTTTATATGGTCGTAGTCAGTCGTACA

And it worked for me. Giving me this:

TAGCTAGCTATTATATGCGACTGATCGTACTGGTAGCTGGTCA.AGTGC
CGTAGTCTTGATGCTAGTTTATATGGTCGTAGTCAGTCGTACA.ACGAT
ADD REPLY
0
Entering edit mode

It's not a good idea to work on large sequences (which I assume you are doing) and store them all in-memory. This is why Damian's previously suggested method resulted in a memory error for you.

On another note, I also tried your posted code with a smaller mock fasta file, it works just fine. It prints the new sequences, and when I inspected the SeqRecord objects, their sequences have changed as well. Have you posted the entire code?

ADD REPLY
1
Entering edit mode

Hi. I don't see the problem here. Your code works for me...

>>> allrecords[0].seq

Seq('GGAACGAGTTATCTGAACCTTCGGGGGACGATAACGGCGTCGAGCGGCGGACGG...AGC', SingleLetterAlphabet())

>>>for y in range(len(allrecords)):
    oldfullseq=allrecords[y].seq
    tochange=5    
    tochangeseq = allrecords[y].seq[0:tochange]
    fullseq = oldfullseq[tochange:len(oldfullseq)]
    fullseq += tochangeseq
    allrecords[y].seq = fullseq

.

>>>allrecords[0].seq

Seq('GAGTTATCTGAACCTTCGGGGGACGATAACGGCGTCGAGCGGCGGACGGGTGAG...AAC',SingleLetterAlphabet())

ADD REPLY
0
Entering edit mode

Hi, thank you all for your help, i found the error .. it's working now .. and your codes also works ..

ADD REPLY
0
Entering edit mode

You forgot to open the 'test.fasta' file:) Try: allrecords = list(SeqIO.parse(open("test.fasta"), "fasta"))

ADD REPLY
0
Entering edit mode

No, in modern Biopython, SeqIO.parse can take a filename as a string or a file handle

ADD REPLY
0
Entering edit mode

good to know. thanks

ADD REPLY

Login before adding your answer.

Traffic: 2699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6