I am interested in comparing a set of genes across 2 or 3 closely related eukaryotic genomes. I think interesting questions to answer would be:
- Which subset of my genes is most conserved across all the genomes.
- Which set of genes shows higher sequence similarity/conservation compared to all the other conserved genes.
- Exon sequence identity vs. identity in related non-coding regions such as introns, the promoter region, the 3' UTR, and +/- 500 bp around the gene. This should give an idea of the strength of negative (purifying) selection.
- Which of these genes lie in syntenic regions, and which of those regions appear to have moved to different locations in the genome.
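Once each gene's exons, introns, promoter, 3' UTR and flanks have been extracted and aligned pairwise between two genomes (with whatever aligner you prefer; that step is assumed here), the first three questions reduce to simple per-feature identity arithmetic. Below is a minimal sketch under those assumptions. The nested-dict input and the feature labels (`exon`, `intron`, `promoter`, `utr3`, `flank`) are hypothetical, the z-score cutoff is arbitrary, and raw identity is only a rough proxy for purifying selection.

```python
from statistics import mean, pstdev

# Hypothetical input layout: {gene: {feature: (aligned_seq_a, aligned_seq_b)}},
# where both strings come from an upstream pairwise alignment and use "-" for gaps.


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def feature_identities(alignments):
    """Map {gene: {feature: (seq_a, seq_b)}} to {gene: {feature: percent identity}}."""
    return {
        gene: {feat: percent_identity(a, b) for feat, (a, b) in feats.items()}
        for gene, feats in alignments.items()
    }


def exon_vs_noncoding(identities, noncoding=("intron", "promoter", "utr3", "flank")):
    """Exon identity minus the mean identity of whichever non-coding features exist.

    A large positive value means the exons are much more conserved than the
    surrounding non-coding sequence. Feature names are hypothetical labels.
    """
    out = {}
    for gene, pids in identities.items():
        nc = [pids[f] for f in noncoding if f in pids]
        if "exon" in pids and nc:
            out[gene] = pids["exon"] - mean(nc)
    return out


def conserved_outliers(identities, z_cutoff=1.5):
    """Genes whose exon identity is at least z_cutoff SDs above the mean of all genes.

    z_cutoff=1.5 is an arbitrary illustrative threshold.
    """
    exon = {g: p["exon"] for g, p in identities.items() if "exon" in p}
    if not exon:
        return []
    mu, sd = mean(exon.values()), pstdev(exon.values())
    if sd == 0:
        return []
    hits = [g for g, v in exon.items() if (v - mu) / sd >= z_cutoff]
    return sorted(hits, key=lambda g: -exon[g])  # most conserved first


if __name__ == "__main__":
    # Toy alignments, made up purely for illustration.
    toy = {
        "geneA": {"exon": ("ATGGCC", "ATGGCA"), "intron": ("GTAAGT", "GTCAGA")},
        "geneB": {"exon": ("ATGAAA", "ATGAAA"), "flank": ("TTTT", "TTAA")},
    }
    ids = feature_identities(toy)
    print(ids)
    print(exon_vs_noncoding(ids))
```

With only 2 or 3 genomes, the same per-feature table can be built for each genome pair and the genes ranked by their minimum or mean exon identity across pairs.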
I am trying to work out which questions are appropriate to ask, and which tools could answer them, rather than manually running hundreds of different custom analyses.
I have found the following tools, but they all seem to perform a specific analysis for a given ortholog pair or region:
- phastCons for producing per-base conservation scores and identifying blocks of conserved regions within genes.
- PIECE (PubMed 21041978) and SVC (PubMed 15991338) for exon-intron comparisons.
- SynMap for whole-genome synteny.
- GEvo to visualize conserved elements across genomic regions.
- MUMmer for whole chromosome alignment.
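For the synteny question, one scriptable alternative to inspecting SynMap plots is to look for runs of orthologs that stay in the same order on both genomes. The sketch below is a greatly simplified, greedy version of the collinearity chaining that dedicated synteny tools perform. It assumes you already have one-to-one ortholog calls and each gene's rank along its chromosome in both genomes; the `Ortholog` record and the `max_gap` and `min_size` values are illustrative choices, not anything taken from the tools listed. Genes that fall outside every block are candidates for having moved. Because the scan is greedy, a single out-of-place ortholog splits a block in two.

```python
from typing import List, NamedTuple


class Ortholog(NamedTuple):
    """Hypothetical record for one one-to-one ortholog pair."""
    gene: str
    chrom_a: str
    rank_a: int  # gene's index along its chromosome in genome A
    chrom_b: str
    rank_b: int  # ortholog's index along its chromosome in genome B


def collinear_blocks(pairs, max_gap=2, min_size=3) -> List[List[Ortholog]]:
    """Walk genome A in order and extend a block while the next ortholog is on
    the same B chromosome, keeps the same direction (forward or inverted), and
    lies within max_gap genes of the previous one on both genomes."""
    ordered = sorted(pairs, key=lambda p: (p.chrom_a, p.rank_a))
    blocks, current = [], []
    direction = 0  # +1 forward, -1 inverted, 0 not yet known
    for p in ordered:
        if current:
            prev = current[-1]
            step_b = p.rank_b - prev.rank_b
            same_dir = direction == 0 or (step_b > 0) == (direction > 0)
            if (p.chrom_a == prev.chrom_a and p.chrom_b == prev.chrom_b
                    and 0 < p.rank_a - prev.rank_a <= max_gap
                    and 0 < abs(step_b) <= max_gap and same_dir):
                if direction == 0:
                    direction = 1 if step_b > 0 else -1
                current.append(p)
                continue
            # Block broken: keep it only if it is long enough.
            if len(current) >= min_size:
                blocks.append(current)
        current, direction = [p], 0
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


def unplaced_genes(pairs, blocks):
    """Genes not in any collinear block (candidates for relocation)."""
    in_block = {p.gene for block in blocks for p in block}
    return sorted(p.gene for p in pairs if p.gene not in in_block)


if __name__ == "__main__":
    # Made-up example: g1-g3 collinear, g4/g5 scattered, g6-g8 an inverted block.
    pairs = [
        Ortholog("g1", "chr1", 1, "chrX", 10),
        Ortholog("g2", "chr1", 2, "chrX", 11),
        Ortholog("g3", "chr1", 3, "chrX", 12),
        Ortholog("g4", "chr1", 4, "chrY", 5),
        Ortholog("g5", "chr1", 5, "chrX", 20),
        Ortholog("g6", "chr2", 1, "chrZ", 30),
        Ortholog("g7", "chr2", 2, "chrZ", 29),
        Ortholog("g8", "chr2", 3, "chrZ", 28),
    ]
    blocks = collinear_blocks(pairs)
    for b in blocks:
        print([p.gene for p in b])
    print(unplaced_genes(pairs, blocks))
```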
As you can see, these tools solve related but distinct problems, and many of them are visual front ends that can't easily be driven from a script to run systematic analyses.
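MUMmer, at least, is a command-line package, so the whole-chromosome alignment step can be batched over every genome pair from a script. A minimal sketch, assuming the MUMmer 3.x `nucmer --prefix` and `show-coords -r -c -l -T` options (check `nucmer -h` and `show-coords -h` on your installed version) and FASTA file names of your choosing; with `dry_run=True` it only prints the commands.

```python
import itertools
import shlex
import subprocess
from pathlib import Path


def mummer_commands(genomes, outdir="mummer_out"):
    """Yield (prefix, nucmer argv, show-coords argv) for every unordered genome pair."""
    for ref, qry in itertools.combinations(sorted(genomes), 2):
        prefix = str(Path(outdir) / f"{Path(ref).stem}_vs_{Path(qry).stem}")
        nucmer = ["nucmer", f"--prefix={prefix}", ref, qry]
        # -r sort by reference, -c coverage, -l lengths, -T tab-delimited
        coords = ["show-coords", "-r", "-c", "-l", "-T", f"{prefix}.delta"]
        yield prefix, nucmer, coords


def run_all(genomes, outdir="mummer_out", dry_run=True):
    """Run (or, with dry_run=True, just print) nucmer + show-coords for all pairs."""
    if not dry_run:
        Path(outdir).mkdir(parents=True, exist_ok=True)
    for prefix, nucmer, coords in mummer_commands(genomes, outdir):
        if dry_run:
            print(shlex.join(nucmer))
            print(shlex.join(coords), ">", f"{prefix}.coords")
            continue
        subprocess.run(nucmer, check=True)
        with open(f"{prefix}.coords", "w") as fh:
            subprocess.run(coords, check=True, stdout=fh)


if __name__ == "__main__":
    # Hypothetical file names for the 2-3 genomes.
    run_all(["speciesA.fa", "speciesB.fa", "speciesC.fa"])
```

The resulting tab-delimited `.coords` files can then be parsed and intersected with gene coordinates in the same script, instead of inspecting each alignment by hand.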
What sort of tools would you recommend and which questions should I be asking?
(cross-posted on http://seqanswers.com/forums/showthread.php?p=168512#post168512)