7.7 years ago
deni.ribicic
Hi guys,
I was wondering what the best approach is for aligning sequences from reconstructed metagenomic bins in order to build a phylogenetic tree. What I have now is a FASTA file for each bin, each containing a lot of contigs. Do I have to find and extract specific genes to perform the alignment? If so, which genes would you suggest?
Thanks for any help, Deni
Thanks for the quick reply. I have already done the prediction. When you say marker genes, are you referring to the single-copy genes that are used to evaluate bin completeness and contamination? Or do you mean genes chosen specifically for phylogenetics? Thanks again. Deni
I would use the single-copy marker genes if your bins are distributed across multiple phyla or even domains. See the paper "A new view of the tree of life".
Thanks, got you. Now, more of a technical question. Say I have a .fa file from a single bin containing the sequences of the single-copy genes, each under a header containing the gene name. Do you remove all the headers in order to merge the sequences, and then name the result with a single header containing the bin name, so that multiple bins can be combined in one file for the alignment? Sorry for bothering you, but I am pretty new to this.
Hi, you can concatenate the single-copy marker genes from each bin, but do the alignment first: align each marker gene individually, optionally trim the alignments with a program like Gblocks, then concatenate them. In case some marker genes are not identified in certain bins, only the universally distributed ones are used. See the paper where 8 of the 16 marker genes were used for phylogenetic tree reconstruction. Alternatively, you can use the method CheckM uses, which places the marker genes into a reference tree using pplacer.
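For the concatenation step, here is a minimal sketch in plain Python of what "align individually, then concatenate, keeping only universal markers" could look like. It assumes each marker gene has already been aligned on its own, and that in every per-marker alignment the FASTA header is the bin name (so the gene names live in the file names, not the headers). The marker names, bin names, and sequences below are made up for illustration, and the helper names `read_fasta` and `concatenate_alignments` are hypothetical, not part of any tool mentioned in this thread.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; name is the first word of the header."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_alignments(alignments):
    """Join per-marker alignments into one sequence per bin.

    alignments maps marker name -> {bin name: aligned sequence}.
    Only markers found in every bin are kept, in sorted order.
    """
    all_bins = sorted({b for aln in alignments.values() for b in aln})
    universal = [
        m for m in sorted(alignments)
        if all(b in alignments[m] for b in all_bins)
    ]
    joined = {
        b: "".join(alignments[m][b] for m in universal) for b in all_bins
    }
    return universal, joined


# Toy per-marker alignments (illustrative data only). rpS3 is missing in bin3.
marker_files = {
    "rpL2": ">bin1\nMK-L\n>bin2\nMKAL\n>bin3\nMRAL\n",
    "rpS3": ">bin1\nGGT\n>bin2\nGAT\n",
    "rpL14": ">bin1\nTT-\n>bin2\nTTA\n>bin3\nT-A\n",
}
alignments = {m: read_fasta(t) for m, t in marker_files.items()}
universal, supermatrix = concatenate_alignments(alignments)

# Write one record per bin, headed by the bin name.
for bin_name, seq in supermatrix.items():
    print(f">{bin_name}\n{seq}")
```

Because every bin contributes the same markers in the same order, the joined sequences all have the same length and can go straight into a tree-building program. Dropping non-universal markers is the simplest option; filling missing markers with gap characters is another common choice, which this sketch does not do.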
Thanks a lot! Actually, I have done that with CheckM, but CheckM places the bins in a reference genome tree which is huge and not really suitable for plotting. This is where my bioinformatics limitations kick in: there is probably a way to prune it that I am not aware of. I have a feeling that I am now spamming the thread :)
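On pruning: one way to think about it is to keep only the leaves you care about (your bins plus a few references), drop every branch that no longer leads to a kept leaf, and merge internal nodes left with a single child by adding up their branch lengths. Below is a rough sketch of that idea in plain Python. It assumes a simple Newick string with no quoted labels or comments, which may not hold for the tree CheckM writes, so in practice a tree library (Biopython's Bio.Phylo, for example) is likely the more robust route. The function names and the toy tree are made up for illustration.

```python
def parse_newick(text):
    """Parse a simple Newick string (no quoted labels, no comments)."""
    text = text.strip()
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def prune(node, keep):
    """Return the subtree restricted to leaves named in keep, or None."""
    if not node["children"]:
        return node if node["name"] in keep else None
    kids = [k for k in (prune(c, keep) for c in node["children"]) if k]
    if not kids:
        return None
    if len(kids) == 1:
        # Collapse a single-child node and sum the two branch lengths.
        child = kids[0]
        if node["length"] is not None or child["length"] is not None:
            child["length"] = (node["length"] or 0.0) + (child["length"] or 0.0)
        return child
    node["children"] = kids
    return node


def to_newick(node):
    """Serialize a node back to Newick text (without the final semicolon)."""
    out = ""
    if node["children"]:
        out = "(" + ",".join(to_newick(c) for c in node["children"]) + ")"
    out += node["name"]
    if node["length"] is not None:
        out += ":" + format(node["length"], "g")
    return out


# Toy tree (illustrative only): two bins among three reference genomes.
tree = parse_newick("((bin1:0.1,ref1:0.2):0.3,(ref2:0.1,bin2:0.4):0.2,ref3:0.5);")
pruned = prune(tree, {"bin1", "bin2"})
pruned_newick = to_newick(pruned) + ";"
print(pruned_newick)
```

Summing the branch lengths when a node is collapsed keeps the distances between the remaining leaves the same as in the full tree, so the pruned tree is still to scale.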