How to construct a phylogeny from a multiple sequence alignment of orthologs without assembling the genomes?
8.3 years ago
abbhinay • 0

I have two phylogenies:

1) Species phylogeny (in black): Species B to D have published genomes, and I have assembled a genome for Species A. I constructed the phylogeny from a multiple sequence alignment of protein orthologs across Species A to D (OrthoMCL -> MUSCLE -> trimAl -> MrBayes).

2) Subspecies phylogeny (in red): I also have sequencing data for different subspecies and isolates of Species A. I mapped these reads onto the Species A genome, identified SNPs (using GATK) and built a SNP-based phylogeny.

My question now is: what is the best way to integrate these two phylogenies into one?

I do not want to assemble the genomes of all the subspecies (tedious for 20 isolates), and I do not want to map the Species B-D reads onto Species A (they are very divergent, so I think inferring relationships through an MSA is best).

I can infer the nucleic acid/protein sequences of the subspecies' orthologs from the variant calls and add them to the multiple sequence alignment used for the species phylogeny. But I find the output of tools like vcf2fq and FastaAlternateReferenceMaker complicated (see "New Fasta Sequence From Reference Fasta And Variant Calls File?"). In that case, how should I deal with SNPs in repetitive regions, which we usually exclude from analysis?
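
To make the idea concrete, here is a minimal sketch of substituting SNPs into a reference ortholog sequence. It assumes a haploid genome, substitutions only (indels are skipped), 1-based positions relative to the ortholog sequence, and an optional list of masked repeat intervals. The function name `apply_snps` and the toy data are hypothetical; this is not the output format of vcf2fq or FastaAlternateReferenceMaker.

```python
def apply_snps(reference, snps, masked=()):
    """Return a copy of `reference` with single-base substitutions applied.

    reference: nucleotide string for one ortholog (Species A coordinates)
    snps:      iterable of (pos, ref, alt), pos is 1-based (assumption)
    masked:    iterable of (start, end), 1-based inclusive repeat intervals
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        # Skip indels: only substitutions keep the coordinates unchanged.
        if len(ref) != 1 or len(alt) != 1:
            continue
        # Skip SNPs that fall inside a masked (repetitive) interval.
        if any(start <= pos <= end for start, end in masked):
            continue
        # Guard against coordinate mix-ups between the VCF and the sequence.
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


# Toy example (made-up sequence and variants).
reference = "ATGGCCATTGTAATGGGCCGC"
snps = [(4, "G", "A"), (10, "G", "T"), (15, "G", "C")]
masked = [(14, 16)]  # the SNP at position 15 is ignored
isolate_seq = apply_snps(reference, snps, masked)
```

Because only substitutions are applied, the isolate sequence has the same length as the Species A reference, so it can take over the reference's gap pattern when it is added to an existing nucleotide alignment.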

Is there any other way to achieve this?

example phylogeny

SNP alignment phylogeny • 2.7k views

"assemble the genomes ... tedious for 20 isolates"

"I find the output of tools like vcf2fq and FastaAlternateReferenceMaker complicated"

Which approach is more efficient may depend on genome size and ploidy. For bacteria I would recommend assembling the reads de novo with SPAdes, which is fast and very easy to use. For bacteria, de novo assembly is not at all "tedious".


The genome size is 20 Mb and the organism is haploid, so de novo assembly is tedious (ordering, filling gaps, annotating genes).


"how to deal with SNPs in repetitive regions"

You should not build a phylogeny from repetitive regions. Repeats are formed by recombination, and recombination events distort the phylogenetic signal.


In addition, highly repetitive regions are prone to sequencing errors and therefore to unreliable variant calls.


Thanks @piet @WouterDeCoster. Will keep that in mind! For now, I have discarded all SNPs in DustMasker-predicted regions.
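
A minimal sketch of this kind of filter, for anyone who wants to reproduce it. It assumes the masked regions have already been parsed into per-contig lists of 1-based inclusive (start, end) intervals and that SNPs are given as (contig, position) pairs. The function names and toy data are hypothetical, and this is not necessarily the exact procedure used here.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort and merge overlapping/adjacent intervals; return (starts, ends)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [s for s, _ in merged], [e for _, e in merged]


def in_masked(pos, starts, ends):
    """True if `pos` lies inside one of the merged intervals."""
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def filter_snps(snps, masked_by_contig):
    """Keep only the SNPs that fall outside every masked interval."""
    index = {c: build_index(iv) for c, iv in masked_by_contig.items()}
    kept = []
    for contig, pos in snps:
        starts, ends = index.get(contig, ([], []))
        if not in_masked(pos, starts, ends):
            kept.append((contig, pos))
    return kept


# Toy example (made-up coordinates).
masked_regions = {"contig1": [(100, 200), (150, 250), (400, 450)]}
snp_calls = [("contig1", 99), ("contig1", 100), ("contig1", 225),
             ("contig1", 251), ("contig1", 425), ("contig2", 10)]
kept_snps = filter_snps(snp_calls, masked_regions)
```

Merging the intervals first keeps the binary search correct when masked regions overlap, and the lookup stays fast for a 20 Mb genome with many SNPs.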
