Question

Extracting Fasta Sequence Using Biopython

3

Entering edit mode

13.1 years ago

Ma ▴ 140

Hi: I want to extract one section of a chromosome into a FASTA file, I have two versions, but neither of them work correctly. At the end I want to have a normal FASTA file like this:

>chromosome_1_results
ACGGGAAAAA.......

version 1

from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

s=0
for record in SeqIO.parse(inFile,'fasta'):
        fw.write (str(record.seq)[1:((23522552+23660224)/2)+1])

fw.close()

In this version it generates the file, but when I want to open it using for example a word processor it cannot be read. I have tried with ch1.fasta and opens normally. Also I have problems in how to put a header like in the FASTA files to my results.

version 2 from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

s=0
for record in SeqIO.parse(inFile,'fasta'):
        SeqIO.write (str(record.seq)[1:((23522552+23660224)/2)+1],fw,"fasta")

fw.close()

in the second case I got an error that says "str object has no attribute id"

Thanks

fasta biopython • 16k views

ADD COMMENT • link updated 13.1 years ago by Michael Schubert ★ 7.1k • written 13.1 years ago by Ma ▴ 140

score 4 · Answer 1 · 2011-12-06

4

Entering edit mode

13.1 years ago

Michael Schubert ★ 7.1k

And the answer is: use version 2, but write a record instead of a string.

# read the record first
record.seq = record.seq[1:((23522552+23660224)/2)+1]
SeqIO.write(record,fw,"fasta")

ADD COMMENT • link 13.1 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

thank you very much for your time in answering this question @Michael Schubert, now it works really nice

ADD REPLY • link 13.1 years ago by Ma ▴ 140

score 3 · Answer 2 · 2011-12-06

3

Entering edit mode

13.1 years ago

Damian Kao 16k

You should read up more about python file IO. Here is how to make it output a header. I am assuming ch1.fasta only has one entry in it?

from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

for record in SeqIO.parse(inFile,'fasta'):
        fw.write(">" + record.id + "\n")
        fw.write (str(record.seq)[1:((23522552+23660224)/2)+1] + "\n")

fw.close()

Are you opening it with notepad?

ADD COMMENT • link 13.1 years ago by Damian Kao 16k

1

Entering edit mode

The fasta format is just a header beginning with ">" along with an ID name on one line followed by the sequence on the next line(s). As long as you have those two things, it's considered a fasta file. The code I posted should print out a header.

ADD REPLY • link 13.1 years ago by Damian Kao 16k

0

Entering edit mode

thanks @DK, you always giving a hand in this field, the ch1.fasta has the complete FASTA sequence of chromosome 1, for that reason I wanted the output, of the region that I need, to be saved in FASTA format. I have tried the solution with fw.write, but the problem is that it only saves a very long line; which is not so good, because I need the file generated to be in FASTA format for other purposes

ADD REPLY • link 13.1 years ago by Ma ▴ 140

0

Entering edit mode

Why not use SeqIO for writing as well?: SeqIO.write(record, fw, "fasta")

ADD REPLY • link 13.1 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

Yeah SeqIO.write would work too. But I figured it'll be easier to explain the headers by manually typing it out and seeing what it does.

ADD REPLY • link 13.1 years ago by Damian Kao 16k

0

Entering edit mode

But it doesn't break lines, i.e. the file is not well human readable.

ADD REPLY • link 13.1 years ago by Michael Schubert ★ 7.1k

score 1 · Answer 3 · 2011-12-06

1

Entering edit mode

13.1 years ago

Fabian Bull ★ 1.3k

For iterating over sequence see: BioPython: SeqIO

For working with sequence records see: Biopython: SeqRecord

ADD COMMENT • link 13.1 years ago by Fabian Bull ★ 1.3k

0

Entering edit mode

can you be more specific instead of just pointing to the BioPython tutorial? I cannot find the mistake and I have read that material

ADD REPLY • link 13.1 years ago by Ma ▴ 140

0

Entering edit mode

peri4n: He explains his problem, shows how he tried to solve it, and where he is stuck. Basic but ok question to me.

ADD REPLY • link 13.1 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

I am just tired of all these "How do I parse file XXX"-question of people who obviously have no clue about programming. I just give them ressources so they can learn it. They don't learn anything if we solve their problems everytime.