Can Biopython Properly Import Fasta Headers With Spaces In Them?
12.8 years ago
David M ▴ 580

When I use BioPython to create a sequence iterator, I find that any characters after the first space (" ") in the header are ignored. For instance:

If file.fasta is:

>A Header With Spaces
ATCGATCGATGC

The following code:

from Bio import SeqIO

for sequence in SeqIO.parse("file.fasta", "fasta"):
    print(sequence.id)

Will print:

A

Is there a way to get the full sequence id while still taking advantage of BioPython's utility?

python biopython fasta • 7.9k views

By convention, the ID in a FASTA header is everything before the first whitespace, so this behaviour is exactly what the parser should do.
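To make the convention concrete, here is a minimal pure-Python sketch of how a header line splits into an ID and a full title. The function name split_fasta_header is hypothetical and this is not Biopython's own code, just an illustration of the rule described above.

```python
def split_fasta_header(line):
    """Return (identifier, full_title) for a FASTA header line.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    # Drop the leading ">" marker and surrounding whitespace.
    title = line[1:].strip() if line.startswith(">") else line.strip()
    # The identifier is the first whitespace-delimited token, if any.
    parts = title.split(None, 1)
    identifier = parts[0] if parts else ""
    return identifier, title


identifier, title = split_fasta_header(">A Header With Spaces")
print(identifier)
print(title)
```

The first value corresponds to what the question sees in sequence.id; the second is the whole header text.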

12.8 years ago

sequence.description should give you the entire header line (everything after the ">").
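A short sketch of the difference, assuming Biopython is installed. It parses the example record from the question out of an in-memory string (via io.StringIO, used here only so the snippet does not need file.fasta on disk) and prints both attributes.

```python
from io import StringIO

from Bio import SeqIO

# The example record from the question, held in memory instead of a file.
fasta_text = ">A Header With Spaces\nATCGATCGATGC\n"

records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    # .id holds only the text before the first whitespace.
    print(record.id)
    # .description holds the full header line, without the leading ">".
    print(record.description)
```

With a real file, passing the filename to SeqIO.parse as in the question works the same way.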


A good way of seeing what methods and properties a Python object has without looking up the API is to use the dir() function. Try printing dir(sequence) and it should list all its properties and methods.
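For example, a small sketch of that tip, assuming Biopython is installed. The record is built by hand here so the snippet runs without a FASTA file, and the underscore filter is just one way to hide private names.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A hand-built record standing in for one yielded by SeqIO.parse.
record = SeqRecord(
    Seq("ATCGATCGATGC"),
    id="A",
    description="A Header With Spaces",
)

# dir() lists every attribute name; skip the ones starting with "_".
public_names = [name for name in dir(record) if not name.startswith("_")]
print(public_names)
```

Names such as id and description appear in the printed list, which is how the full-header attribute can be discovered.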


That did it. I can't tell you how long I've been working around this quirk when there's such a simple solution. Thanks!
