11.3 years ago
pld
5.1k
Say I have a number of sequences from isolates (culture, wild, etc.) of the same virus species in which the same or a similar phenotype was seen. Given some consensus, I expect to see at most on the order of tens of SNPs in each genome. The genes in these sequences have not been formally identified or reported, but I expect that they are not drastically different. I would like to extract the sequences of these genes in order to include them in a phylogenetic analysis involving other species of the same genus. What would be the best way to extract these genes?
What sort of tools are you familiar with? Can you handle the command line? Can you script? Do you know the locations of the genes in each of your genomes, or do you need to identify them? Why do you want to build phylogenies on the genes alone? If these are closely related within a genus, you might be able to align the untranslated regions as well. More characters (variable sites) will usually result in a better-supported phylogeny.
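If the gene locations are already known, the extraction step itself can be scripted in a few lines. The following is only a minimal sketch in plain Python, assuming 1-based inclusive coordinates (as in GFF or GenBank annotations) and a strand flag. The toy genome, the gene names, the coordinates, and the helper names `extract_gene` and `reverse_complement` are all hypothetical, made up for illustration.

```python
# Sketch: pull gene sequences out of a genome when coordinates are known.
# All data below is made up; real coordinates would come from an annotation.

# Translation table for complementing unambiguous DNA bases (assumption:
# no IUPAC ambiguity codes in the input).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_gene(genome, start, end, strand="+"):
    """Extract a gene given 1-based inclusive coordinates and a strand."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError(f"coordinates {start}-{end} fall outside the genome")
    subseq = genome[start - 1:end]  # convert to 0-based, half-open slicing
    return reverse_complement(subseq) if strand == "-" else subseq


# Hypothetical toy genome and gene table: name -> (start, end, strand).
genome = "ATGAAACCCGGGTTTTAG"
genes = {
    "geneA": (1, 6, "+"),
    "geneB": (10, 18, "-"),
}

extracted = {name: extract_gene(genome, *coords) for name, coords in genes.items()}

for name, seq in extracted.items():
    print(f">{name}\n{seq}")  # FASTA-style output
```

If the locations are not known, this step would have to be preceded by some form of gene identification, for example by comparison against annotated genes from a related species; that part is not covered by the sketch.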