Concatenating NEXUS Files (for MrBayes, etc.)
12.8 years ago
Andre Elias

Hi, I've been trying to concatenate multiple genes into a single nexus file to be used in MrBayes.

Apparently, Biopython has this script on their website, but it's not working for me (ref: http://biopython.org/wiki/Concatenate_nexus):

from Bio.Nexus import Nexus
# the combine function takes a list [(name, nexus instance)...], if we provide the
# file handles in a list we can use a list comprehension to build such a list easily
handles = [open('btCOI.nex', 'r'), open('btCOII.nex', 'r'), open('btITS.nex', 'r')]
nexi = [(handle.name, Nexus.Nexus(handle)) for handle in file_list]

combined = Nexus.combine(nexi)
combined.write_nexus_data(filename='btCOMBINED.nex')

It keeps giving me this error:

nexi = [(handle.name, Nexus.Nexus(handle)) for handle in file_list]
NameError: name 'file_list' is not defined

If it worked, it would be great, because it would automatically create partitions for each gene, etc., which is exactly what I need. Any help in this direction is welcome. :-)

I think Frédéric's answer is right - I actually wrote the example on the Biopython wiki, so sorry for letting the bug slip in.

The BioPython Wiki has helped me so many times! Thank you for your work.

Yep - the encoding problem was reported (but not on the mailing list or bug tracker) and has been fixed, ready for our next release: https://github.com/biopython/biopython/commit/f152fa2c2a7f71adaaf51b746cfd0a4308ea2edb - sorry about that.

David, could this be simplified a little by giving Nexus filenames instead of handles?

Sounds great, but now I'm hitting this error message:

  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 1275, in write_nexus_data
    fh=open(filename,'w',encoding="utf-8")
TypeError: 'encoding' is an invalid keyword argument for this function

PS: Thanks for all the hard work in the wiki, it's very useful indeed! :-)

As suggested by David W, I changed the last line to combined.write_nexus_data(open("combined.nex", "w")) and it works now. Thank you all for the help! :D
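A slightly tidier form of that workaround opens the output in a `with` block, so the file is flushed and closed even if writing fails. The traceback shows write_nexus_data calling open(filename, 'w', encoding="utf-8") when it is given a filename, and Python 2's built-in open() has no encoding argument; handing it an already-open handle skips that call. `write_combined` is a hypothetical helper name; it only assumes the object has the write_nexus_data method and that it accepts a handle, as the working line above shows.

```python
def write_combined(combined, out_path):
    """Write a combined Nexus instance through a handle we open ourselves.

    Passing a handle instead of a filename avoids write_nexus_data opening
    the file with encoding="utf-8", which Python 2's open() rejects.
    """
    with open(out_path, "w") as out:  # closed automatically, even on error
        combined.write_nexus_data(out)
    return out_path
```

Used as write_combined(combined, "combined.nex") in place of the original last line.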

@Peter - yep, the example was written when SeqIO took handles, so I guess that was just the style. Nexus.Nexus() (whose instances Nexus.combine() works on) takes either handles or filenames.
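Following up on Peter's question, here is a sketch that passes filenames straight to Nexus.Nexus() and uses the bare file stem as each gene's name. As far as I can tell, Nexus.combine() uses those names to label each gene's part of the character partition it builds, so "btCOI" reads more cleanly there than a full path with an extension. `partition_name` and `combine_files` are hypothetical helper names, and Biopython is imported inside the function only so the naming helper can be tried without it.

```python
import os


def partition_name(path):
    """Turn a path such as 'data/btCOI.nex' into a bare label like 'btCOI'."""
    return os.path.splitext(os.path.basename(path))[0]


def combine_files(paths):
    """Parse NEXUS files given by name and concatenate them (requires Biopython)."""
    from Bio.Nexus import Nexus  # lazy import; partition_name works without it
    # Nexus.Nexus() accepts a filename, so no handles need to be opened or closed.
    return Nexus.combine([(partition_name(p), Nexus.Nexus(p)) for p in paths])
```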

@Andre, thanks for alerting us to these bugs. I gather the 'new' error you found came from the way file handles changed between Python 2 and 3. I'll update the wiki with the workaround, but the problem is already fixed in the GitHub version of Biopython, so you will be able to use a filename in the next release.

12.8 years ago

I guess that if you replace file_list with handles, your Python example should work just fine.
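Spelled out, the fix looks roughly like this. It is a sketch: `load_nexi` and its `parse` hook are hypothetical names (the hook exists only so the list-building logic can be exercised without Biopython); by default each handle is parsed with Nexus.Nexus exactly as in the question. It also closes the handles afterwards, on the assumption that Nexus.Nexus reads the whole handle while it is being constructed.

```python
def load_nexi(filenames, parse=None):
    """Build the [(name, Nexus instance), ...] list that Nexus.combine() expects."""
    if parse is None:
        from Bio.Nexus import Nexus  # lazy import; requires Biopython
        parse = Nexus.Nexus
    handles = [open(name, "r") for name in filenames]
    try:
        # The fix: iterate over `handles` (the list that actually exists), and
        # make each (name, instance) pair a tuple.
        return [(handle.name, parse(handle)) for handle in handles]
    finally:
        # Parsing has consumed each handle, so they can be closed here.
        for handle in handles:
            handle.close()
```

With Biopython installed, Nexus.combine(load_nexi(["btCOI.nex", "btCOII.nex", "btITS.nex"])) should give the combined matrix from the question.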

David has fixed the wiki example now.

12.8 years ago

You might find Mesquite useful. It is pretty easy to run, and I believe it does what you want.

Look under:

Fused Matrix Export (NEXUS):

http://mesquiteproject.org/mesquite_folder/docs/mesquite/molecular/molecular.html

It is damn useful for phylogenetics. Happy tree building!

I believe it does that (concatenating, etc.), but it uses a GUI, which makes it impractical for really large data sets (imagine concatenating a hundred alignment files one by one), and it's also not very fast (I'm not sure, but I've heard it requires a lot of memory too). A script (Perl/Python/shell) could do it much faster and could also be integrated into a workflow pipeline. Your suggestion is good for people who intend to concatenate smaller datasets and would like to use a GUI for that. :-)
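For the many-files case, a scripted version might look like the sketch below: it collects every alignment in a directory in sorted order (so the partition order is reproducible between runs), combines them, and writes the result through a handle. The function names and the "*.nex" pattern are assumptions for illustration, not part of Biopython.

```python
import glob
import os


def find_alignments(directory, pattern="*.nex"):
    """Return the matching alignment files, sorted so partition order is stable."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def concatenate_directory(directory, out_path, pattern="*.nex"):
    """Concatenate every matching NEXUS alignment in `directory` into `out_path`.

    Returns the number of files combined. Requires Biopython.
    """
    paths = find_alignments(directory, pattern)
    if not paths:
        raise ValueError("no files matching %r in %s" % (pattern, directory))
    from Bio.Nexus import Nexus  # lazy import; find_alignments works without it
    # One (name, instance) pair per gene; the file stem becomes the partition label.
    nexi = [(os.path.splitext(os.path.basename(p))[0], Nexus.Nexus(p)) for p in paths]
    combined = Nexus.combine(nexi)
    # Write through our own handle rather than passing a filename.
    with open(out_path, "w") as out:
        combined.write_nexus_data(out)
    return len(paths)
```

It is worth writing out_path outside the input directory; otherwise a second run would pick up the previous combined file as one more "gene".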

8.5 years ago

Hello, I have it working fine for the example. When I used it on my own data (~600 .nex files), it did generate the output, but a few taxon names were duplicated with a copy tag. For example, ideally I should have got

X
Y
Z

but I got

X
Y
Z
X.copy ???
X.copy1 ???
Y.copy ??? MA??

I did make sure that none of my files contain duplicate copies of an alignment. Thanks in advance.
Arun
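A possible way to narrow this down, sketched under assumptions: Biopython's NEXUS parser appears to rename a taxon label it meets a second time within one matrix by appending .copy, then .copy1, and so on, and Nexus.combine() pads taxa that are missing from a gene with the missing-data symbol "?". If that reading is right, rows like "X.copy ???" come from particular input files rather than from the concatenation step; a repeated name, or an interleaved matrix read as non-interleaved, are guesses at the cause. The helper names below are hypothetical.

```python
import re

# Assumed renaming scheme for repeated labels: X -> X.copy, X.copy1, X.copy2, ...
COPY_SUFFIX = re.compile(r"\.copy\d*$")


def suspicious_labels(taxlabels):
    """Return the labels that look like parser-made duplicates (X.copy, X.copy1, ...)."""
    return [label for label in taxlabels if COPY_SUFFIX.search(label)]


def report_duplicates(paths):
    """Parse each NEXUS file on its own and list those with .copy-style labels.

    Returns {path: [labels]} for the flagged files. Requires Biopython.
    """
    from Bio.Nexus import Nexus  # lazy import; suspicious_labels works without it
    flagged = {}
    for path in paths:
        labels = suspicious_labels(Nexus.Nexus(path).taxlabels)
        if labels:
            flagged[path] = labels
            print("%s: %s" % (path, ", ".join(labels)))
    return flagged
```

Running report_duplicates on the ~600 inputs should point to the handful of files worth opening by hand. A taxon genuinely named like "Foo.copy2" would show up as a false positive.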
