Hi,
I wonder whether it's better to remove weakly aligned regions of proteins from an MSA or to keep them when building an HMM. Case: let's say I have a bunch of homologs and I want to build an HMM (hidden Markov model) to detect their homologs in different species. Questions:
- Shall I use all available homologs, or is there some reasonable limit (min: 5 or 15? max: 50, 100, 200)? I keep in mind that the alignment gets worse the more sequences are included, and MSA software has its own limitations as well.
- Which MSA program would you recommend? Personally, I like MUSCLE a lot, but I'm aware that MAFFT or T-Coffee perform better (but are slower).
- Or shall I run several aligners and use a consistency-based alignment (M-Coffee)?
- Shall I trim badly aligned fragments (trimAl or Gblocks)?
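
To make the trimming question concrete, here is a minimal sketch of what I mean by removing weakly aligned columns: drop every column whose gap fraction is above a threshold. This is a simplified stand-in, not what trimAl or Gblocks actually do; the function name `trim_gappy_columns`, the 0.5 default threshold, and the toy sequences are all made up for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` is a list of equal-length aligned sequences (strings),
    with '-' as the gap character. Hypothetical helper, for illustration only.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    # Keep the indices of columns that are not too gappy.
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    # Rebuild each sequence from the retained columns only.
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment: the fourth column is a gap in 3 of 4 sequences.
toy = ["MKV-LA", "MKVQLA", "MK--LA", "M-V-LA"]
trimmed = trim_gappy_columns(toy)
for seq in trimmed:
    print(seq)
```

With the default threshold only the fourth column is removed; a stricter threshold removes more columns, which is exactly the trade-off I'm unsure about for HMM building.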
Cheers,
Hi Jarretinha. Is it possible to obtain a seed or an alignment for a specific subfamily? I'm looking for F1/Fo ATP synthase subunit C (atpE, atpH, atp9, atp5G), and Pfam only has a seed for all ATP synthases (they are non-orthologous, so I won't use them in my phylogenetic analysis).