Iteratively Concatenate An Arbitrary Number Of Seq Objects Using Biopython
10.8 years ago
mossmatters ▴ 90

What is the best way to join multiple Seq objects together into one sequence?

The Biopython tutorial suggests using "+" to concatenate two Seq objects:

seq3 = seq1+seq2

But what if I had a list (of arbitrary length) of Seq objects? If they were strings I could simply:

"".join(seq_object_list)

But Seq objects have no join method of their own, and "".join() only accepts strings, so it rejects a list of Seq objects.

The following will get the job done:

from Bio.Seq import Seq
seq_list = [Seq("ACTG"), Seq("CCCT"), Seq("ACGG"), Seq("CTGA")]
concatenated_seq = Seq("")
for i in seq_list:
    concatenated_seq += i

But I figured there might be a more "Pythonic" way of doing things.

10.8 years ago

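In Python, the str.join() method is much faster than building a result by repeated concatenation (seq1 + seq2). Strings are immutable and can't be changed in place, so each + has to create a brand-new string and copy both operands into it. In a loop, the total amount of copying therefore grows quadratically with the number of pieces, whereas join() works out the final length once and copies each piece only once.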
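The cost difference can be seen with a quick timing sketch. It uses bytes pieces as an illustrative stand-in for sequence data (an assumption made only for this demo), because CPython has a special-case shortcut for repeated + on str that can hide the effect; bytes, like Seq objects, get no such shortcut. The names pieces, concat_with_plus and concat_with_join are hypothetical, and absolute timings will vary by machine.

```python
import timeit

# Illustrative stand-in data: 5,000 short pieces of sequence as bytes.
pieces = [b"ACGT"] * 5_000


def concat_with_plus(parts):
    """Repeated +: each step allocates a new object and copies everything so far."""
    result = b""
    for part in parts:
        result = result + part
    return result


def concat_with_join(parts):
    """join(): sizes the result once and copies each piece exactly once."""
    return b"".join(parts)


# Both strategies must build the same sequence.
assert concat_with_plus(pieces) == concat_with_join(pieces)

for func in (concat_with_plus, concat_with_join):
    seconds = timeit.timeit(lambda: func(pieces), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Doubling the number of pieces should roughly quadruple the time of the + version, while the join() version only roughly doubles.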

I would write something like this:

from Bio.Seq import Seq
seq_list = [Seq("ACTG"), Seq("CCCT"), Seq("ACGG"), Seq("CTGA")]
conseq = "".join([str(seq_rec) for seq_rec in seq_list])

I used a list comprehension here, because it's faster than creating an empty list and appending to it one by one.


Thanks! I guess I was just wondering if there was an equivalent to .join for Seq objects. If there were, would it be faster than converting from Seq object to str and then back to Seq object, as you've done?


Note that this gives you a string, not a Seq, at the end.
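If a Seq is needed rather than a string, the joined string can simply be wrapped back into a new Seq. A minimal sketch, assuming Biopython is installed. In older Biopython releases that still attached an alphabet to each Seq, the new object would get the default alphabet, so you might need to pass the alphabet explicitly.

```python
from Bio.Seq import Seq

seq_list = [Seq("ACTG"), Seq("CCCT"), Seq("ACGG"), Seq("CTGA")]

# str.join() needs plain strings, so convert each Seq with str(),
# then wrap the joined string in a new Seq object.
concatenated_seq = Seq("".join(str(s) for s in seq_list))

print(concatenated_seq)
```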

10.8 years ago
Peter 6.0k

Given a list of Seq objects, e.g.

from Bio.Seq import Seq
seq_list = [Seq("ACTG"), Seq("CCCT"), Seq("ACGG"), Seq("CTGA")]

You can use + as noted, but this is clumsy:

concatenated_seq = Seq("")
for s in seq_list:
    concatenated_seq += s

You can use sum() though; the trick is that you need to pass a start value (instead of the default of zero):

concatenated_seq = sum(seq_list, Seq(""))
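sum() with a start value applies + pairwise, exactly like the explicit loop, so it is tidier but does the same amount of copying for long lists. A small sketch, assuming Biopython is installed, that checks sum() against the equivalent functools.reduce with operator.add:

```python
import operator
from functools import reduce

from Bio.Seq import Seq

seq_list = [Seq("ACTG"), Seq("CCCT"), Seq("ACGG"), Seq("CTGA")]

# sum() needs an empty Seq as its start value; the default start of 0
# would fail, because adding an int and a Seq is not defined.
via_sum = sum(seq_list, Seq(""))

# The same pairwise additions written out with reduce().
via_reduce = reduce(operator.add, seq_list, Seq(""))

# Compare as plain strings so the check works across Biopython versions.
assert str(via_sum) == str(via_reduce)
print(via_sum)
```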

We should probably add a .join method to the Seq object, using the same alphabet rules as used for addition:

concatenated_seq = Seq("").join(seq_list) # Not currently implemented

Would you find that clearer?


Yes, I think a join method would be a good addition to Biopython!
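For later readers: more recent Biopython releases do provide a Seq.join method that behaves like str.join, using the Seq it is called on as the spacer. The sketch below does not assume a particular version: the hypothetical helper concat_seqs uses Seq.join when the installed Biopython has it, and otherwise falls back to joining plain strings and wrapping the result in a new Seq.

```python
from Bio.Seq import Seq


def concat_seqs(seqs):
    """Concatenate Seq objects into a single Seq (hypothetical helper).

    Uses Seq.join when the installed Biopython provides it; otherwise
    joins the underlying strings and wraps the result in a new Seq.
    """
    # Accept any iterable (e.g. a generator) by turning it into a list first.
    seqs = list(seqs)
    if hasattr(Seq, "join"):
        # Empty spacer: the pieces are placed directly next to each other.
        return Seq("").join(seqs)
    return Seq("".join(str(s) for s in seqs))


seq_list = [Seq("ACTG"), Seq("CCCT"), Seq("ACGG"), Seq("CTGA")]
concatenated_seq = concat_seqs(seq_list)
print(concatenated_seq)
```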
