Question

Metagenome bin sequence alignment

8.1 years ago

deni.ribicic • 0

Hi guys,

I was wondering what the best approach is for aligning sequences from reconstructed metagenomic bins in order to build a phylogenetic tree. What I have now is a FASTA file for each bin, each containing a lot of contigs. Do I have to find and extract specific genes to perform the alignment? If so, which genes would you suggest?

Thanks for any help, Deni

alignment tree bin • 2.6k views

ADD COMMENT • link updated 8.1 years ago by yzzhang ▴ 30 • written 8.1 years ago by deni.ribicic • 0

score 1 · Answer 1 · 2017-04-06

8.1 years ago

yzzhang ▴ 30

First, you need to predict genes in each bin and run an HMM search against a marker gene database. Then use the identified marker genes for phylogenetic tree construction.
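As a rough illustration of the marker identification step, here is a minimal Python sketch that picks the best-scoring hit per marker from an HMM search result. It assumes the search was run with HMMER's hmmsearch and saved with `--tblout`, where the first six whitespace-separated columns are target name, target accession, query name, query accession, full-sequence E-value, and score. The function name `best_hits` and the `max_evalue` cutoff of 1e-5 are made up for this example, not something prescribed above.

```python
def best_hits(tblout_text, max_evalue=1e-5):
    """Return {marker: (gene_id, score)}, keeping the top-scoring hit per marker.

    Assumes hmmsearch --tblout layout: the target (predicted gene) is in
    column 1, the query (marker HMM) in column 3, the full-sequence E-value
    in column 5 and the bit score in column 6.
    """
    best = {}
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        gene_id, marker = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # hit too weak under the (assumed) cutoff
        if marker not in best or score > best[marker][1]:
            best[marker] = (gene_id, score)
    return best
```

You would call it per bin, e.g. `best_hits(open("bin1.tblout").read())` (hypothetical file name), and then pull the listed gene IDs out of the predicted protein FASTA.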

ADD COMMENT • link 8.1 years ago by yzzhang ▴ 30

Thanks for the quick reply. I have already done the gene prediction. When you say marker genes, are you referring to the single-copy genes that are used to evaluate bin completeness and contamination? Or are they phylogeny-specific genes? Thanks again. Deni

ADD REPLY • link 8.1 years ago by deni.ribicic • 0

I would use the single-copy marker genes if your bins are distributed across multiple phyla or even domains. See the paper "A new view of the tree of life".

ADD REPLY • link 8.1 years ago by yzzhang ▴ 30

Thanks, got it. Now, more of a technical question. Say I have a .fa file from a single bin containing the sequences of single-copy genes, separated by headers containing the gene names. Do you remove all the headers to merge the sequences and give the result a single header containing the bin name, so that you can concatenate multiple bins into one file for the alignment? Sorry for bothering you, but I am pretty new to this.

ADD REPLY • link 8.1 years ago by deni.ribicic • 0

Hi, align each marker gene individually, then concatenate the alignments for each bin using a program like Gblocks. If some marker genes are not identified in certain bins, use only the universally distributed ones; see the paper, where 8 of the 16 marker genes were used for phylogenetic tree reconstruction. Alternatively, you can use the approach CheckM takes, placing the marker genes into a reference tree using pplacer.
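To make the concatenation step concrete, here is a small Python sketch, not a replacement for a dedicated tool. It assumes each marker gene has already been aligned on its own and loaded as a dictionary of `{bin_name: aligned_sequence}`. The names `concatenate_alignments`, `to_fasta`, and the `require_all` switch are invented for this example. With `require_all=True` only markers found in every bin are kept, as described above; with `require_all=False` a missing marker is padded with gaps instead.

```python
def concatenate_alignments(alignments, require_all=True):
    """Join per-marker alignments into one sequence per bin.

    alignments: {marker_name: {bin_name: aligned_sequence}}
    Returns {bin_name: concatenated_sequence}, markers joined in sorted order.
    """
    bins = sorted({b for aln in alignments.values() for b in aln})
    parts = {b: [] for b in bins}
    for marker in sorted(alignments):
        aln = alignments[marker]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{marker}: sequences are not aligned to equal length")
        if require_all and set(aln) != set(bins):
            continue  # marker is missing from at least one bin, so drop it
        width = lengths.pop()
        for b in bins:
            # Pad with gap characters when this bin lacks the marker.
            parts[b].append(aln.get(b, "-" * width))
    return {b: "".join(chunks) for b, chunks in parts.items()}


def to_fasta(seqs):
    """Format {name: sequence} as FASTA text, one header per bin."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())
```

The output has a single header per bin (the bin name), so the per-gene headers disappear only after the individual alignments have been made, not before.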

ADD REPLY • link 8.1 years ago by yzzhang ▴ 30

Thanks a lot! Actually, I have already done that with CheckM, but CheckM places bins in a reference genome tree, which is huge and not really suitable for plotting. This is where my bioinformatics limitations kick in: there is probably a way to prune it that I am not aware of. I have a feeling that I am now spamming the thread :)

ADD REPLY • link 8.1 years ago by deni.ribicic • 0