I ran a core/pan-genome analysis on twelve closely related bacterial strains.
I now have statistics on the sizes of the core, pan, and accessory genomes, and on how many genes fall into unique clusters (found in only one strain). Because the strains are closely related, most gene clusters contain very similar sequences: in a multiple sequence alignment (MSA) there are only a few base substitutions and the occasional indel.
To make the analysis more useful to other researchers, I want to provide the core genome in a form that can be reused in other analyses. So I took one representative sequence from every cluster and wrote them to a protein FASTA (.faa) file.
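To make the representative step concrete, here is a minimal sketch of one way to do it. It is only illustrative: picking the longest member as the representative, the in-memory cluster layout, and all cluster and sequence IDs are assumptions, not a description of my actual pipeline.

```python
from typing import Dict, List, Tuple

# Assumed layout: cluster ID -> list of (sequence ID, protein sequence).
Clusters = Dict[str, List[Tuple[str, str]]]


def pick_representatives(clusters: Clusters) -> Dict[str, Tuple[str, str]]:
    """Choose one member per cluster: the longest sequence,
    with ties broken by the alphabetically smallest sequence ID."""
    reps = {}
    for cluster_id, members in clusters.items():
        reps[cluster_id] = min(members, key=lambda m: (-len(m[1]), m[0]))
    return reps


def to_fasta(reps: Dict[str, Tuple[str, str]], width: int = 60) -> str:
    """Format the representatives as FASTA text, one record per cluster."""
    lines = []
    for cluster_id in sorted(reps):
        seq_id, seq = reps[cluster_id]
        # Keep the source sequence ID in the header so the record is traceable.
        lines.append(f">{cluster_id} representative={seq_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical toy data, just to show the call pattern.
example_clusters: Clusters = {
    "cluster_0001": [("strainA_00012", "MKTAYIAKQR"),
                     ("strainB_00410", "MKTAYIAKQRQ")],
    "cluster_0002": [("strainA_00013", "MSLNE")],
}
example_fasta = to_fasta(pick_representatives(example_clusters))
```

Writing `example_fasta` to a file would give the .faa. Keeping the cluster ID in each header means the same ID can be reused for any per-cluster alignment or model.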
Now I am wondering whether I should also build a profile HMM from the protein alignment of every cluster. Do you think this would be useful to other researchers? If so, what do I have to keep in mind to make it as useful as possible?
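For concreteness, this is roughly what I have in mind, sketched under some assumptions: HMMER 3 would be the tool, each cluster would have its own protein alignment file in a format `hmmbuild` can read (Stockholm here), and the file and directory names are hypothetical. The helper only assembles the command line; it does not run anything.

```python
from pathlib import Path
from typing import List


def hmmbuild_command(alignment: Path, out_dir: Path) -> List[str]:
    """Assemble an hmmbuild call for one cluster alignment.

    The model is named after the alignment file stem (assumed to be
    the cluster ID), so search hits can be traced back to the cluster.
    """
    name = alignment.stem
    return [
        "hmmbuild",
        "--amino",        # state explicitly that the alignment is protein
        "-n", name,       # name stored inside the HMM file
        str(out_dir / f"{name}.hmm"),
        str(alignment),
    ]


# Hypothetical paths, just to show the resulting command.
example_cmd = hmmbuild_command(Path("aln/cluster_0001.sto"), Path("hmm"))
```

Each command could then be run with `subprocess.run(cmd, check=True)`. My understanding is that the resulting per-cluster files can be concatenated into a single file and indexed with `hmmpress` so that others can search it with `hmmscan`, but I have not tried this yet.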