3.4 years ago
mglasena
I will be creating multiple sequence alignments for each single-copy ortholog for a group of closely related species that have had their genomes sequenced and reads aligned to a common reference genome. What is the best way to identify genes/exons from the reference assembly that may not be single-copy orthologs in all species (i.e., paralogs resulting from duplication)? I want to be sure to exclude these from my multiple sequence alignments. Is there a rule of thumb for filtering based on expected coverage depth? Is there any software that automates this process?
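To make the depth-based idea concrete, here is a minimal sketch of the kind of filter I have in mind. It assumes per-gene mean depth has already been computed for each species (however that was done), normalizes each gene by that sample's median gene depth, and flags any gene whose ratio falls outside a window in at least one sample. The function name `flag_depth_outliers` and the 0.5 and 1.5 cutoffs are placeholders of mine for illustration, not an established rule of thumb, and the depth values are made up.

```python
from statistics import median


def flag_depth_outliers(depth, low=0.5, high=1.5):
    """Flag genes with unusual normalized depth in any sample.

    depth: {sample_name: {gene_id: mean_depth}}
    low, high: hypothetical cutoffs on depth / sample median depth.
    Returns the set of gene IDs flagged in at least one sample.
    """
    flagged = set()
    for sample, per_gene in depth.items():
        # Use the sample's median gene depth as its expected single-copy depth.
        baseline = median(per_gene.values())
        if baseline == 0:
            raise ValueError(f"median depth is zero for sample {sample}")
        for gene, gene_depth in per_gene.items():
            ratio = gene_depth / baseline
            # High ratio: reads from extra copies may be collapsing onto one locus.
            # Low ratio: the locus may be deleted or too divergent to map.
            if ratio < low or ratio > high:
                flagged.add(gene)
    return flagged


# Made-up example values, for illustration only.
depth = {
    "species_A": {"gene1": 30.0, "gene2": 31.0, "gene3": 62.0, "gene4": 29.0},
    "species_B": {"gene1": 20.0, "gene2": 19.0, "gene3": 21.0, "gene4": 2.0},
}

print(sorted(flag_depth_outliers(depth)))
```

Here gene3 would be flagged for roughly double depth in species_A and gene4 for very low depth in species_B. What I do not know is whether cutoffs like these are sensible in practice, which is the core of my question.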