Hello everyone,
I would like to identify the most conserved (and, at the same time, the most variable) sites in a multiple alignment of genomic DNA sequences, ideally with annotation information.
My final goal is to reconstruct the phylogenetic tree of a plant genus containing around 100 species. I have genomic data (draft genomes) for only 15 of them. I selected about 1000 genes shared by these 15 species, and now I want to identify variable regions flanked by conserved ones in those 1000 genes. I will then assume that those regions also exist in the other species of the genus. I'm interested in variable regions flanked by conserved ones because I will design primers in the conserved parts and then amplify those regions in the species that have no sequenced genome. I expect exons to be more conserved than introns, which is why I need annotation information on the multiple alignments. For instance, the output could be a multiple alignment with an annotation layer linked to a conservation profile.
So my question is: do you know of any software (ideally command-line) that can perform this task?
Any suggestions will be appreciated.
Regards
Kevin
I think this is a multi-step project. You can initially "measure" conservation and variability by %GC in contigs and/or by looking for SNPs (bowtie2, samtools), but you need the reads for that.
I would start with that.
Thanks for your reply. Actually, I have already obtained the sequences of the 1000 genes for all 15 species, so by aligning them gene by gene I would get an idea of where the SNPs/indels are, but this would only be "visual". What I need is a 2-column table for each gene:
column 1: position in the multiple DNA sequence alignment
column 2: an index giving the conservation score for this position
Then I could implement an algorithm that searches the alignment for the best region to amplify (i.e. a variable area flanked by two conserved regions), based on the conservation score table. Having the annotation would be a plus, but that could be done later.
Regards
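To make the 2-column table described above concrete, here is a minimal Python sketch. It is only an illustration, not an established tool: it assumes each gene has already been aligned and saved as an aligned FASTA file, and it uses one possible conservation index, the fraction of sequences that carry the most common character in a column (gaps are counted as a character, so indels lower the score). The function names `read_aligned_fasta` and `conservation_table` and the toy alignment are made up for the example.

```python
from collections import Counter
import io


def read_aligned_fasta(handle):
    """Return a list of (name, sequence) pairs from an aligned FASTA handle."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def conservation_table(sequences):
    """Return [(position, score), ...] with 1-based alignment positions.

    Assumed score: fraction of sequences sharing the most common
    character in the column (gap '-' counts as a character).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same aligned length")
    table = []
    for i in range(length):
        column = [s[i] for s in sequences]
        top_count = Counter(column).most_common(1)[0][1]
        table.append((i + 1, top_count / len(column)))
    return table


# Toy alignment of four hypothetical species, standing in for one gene.
toy = io.StringIO(">sp1\nACGT-A\n>sp2\nACGTTA\n>sp3\nACCTTA\n>sp4\nACATTA\n")
seqs = [seq for _, seq in read_aligned_fasta(toy)]
for pos, score in conservation_table(seqs):
    print(f"{pos}\t{score:.2f}")
```

Other indices (for example a Shannon entropy per column) could be swapped in without changing the table layout.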
You can make that table;
I'm not an expert, and I don't know whether a program exists that does it in one step (I don't think so), but that is how I would do it. Good luck.
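For the search step mentioned earlier in the thread (a variable stretch flanked by two conserved regions), a simple window scan over the per-position conservation scores might look like the sketch below. Everything here is an assumption for illustration: the function name `find_amplicons`, the flank length (roughly a primer site), the allowed length of the variable stretch, and both thresholds would need tuning on real data.

```python
from itertools import accumulate


def find_amplicons(scores, flank=20, min_var=50, max_var=300,
                   conserved_min=0.95, variable_max=0.80):
    """Return (var_start, var_end) pairs, 1-based inclusive.

    A hit is a stretch whose mean conservation is <= variable_max and
    whose left and right flanks (each `flank` columns long) both have
    a mean conservation >= conserved_min. All defaults are guesses.
    """
    # Prefix sums give the mean of any window in constant time.
    prefix = [0.0] + list(accumulate(scores))

    def mean(a, b):
        # Mean of scores[a:b], 0-based half-open interval.
        return (prefix[b] - prefix[a]) / (b - a)

    n = len(scores)
    hits = []
    for var_start in range(flank, n - flank - min_var + 1):
        if mean(var_start - flank, var_start) < conserved_min:
            continue  # left flank is not conserved enough
        for var_len in range(min_var, max_var + 1):
            var_end = var_start + var_len
            if var_end + flank > n:
                break  # right flank would run past the alignment
            if (mean(var_start, var_end) <= variable_max
                    and mean(var_end, var_end + flank) >= conserved_min):
                hits.append((var_start + 1, var_end))
    return hits


# Toy profile: 5 conserved columns, 4 variable ones, 5 conserved ones.
toy_scores = [1.0] * 5 + [0.5] * 4 + [1.0] * 5
print(find_amplicons(toy_scores, flank=5, min_var=4, max_var=4,
                     conserved_min=0.9, variable_max=0.6))
```

On real alignments this will report many overlapping hits, so a ranking step (for instance by how variable the middle stretch is) would probably be needed before choosing primer sites.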
Thank you for your interesting ideas. I didn't know about nucmer, it sounds nice. I'll try it!