I have a library of ~2000 nucleotide sequences whose similarity I'd like to visualize. Because some of the sequences are very diverse, it's not feasible to perform a multiple sequence alignment and then build a phylogenetic tree: at least when I do this in MEGA, no common bases can be found.
I'm looking for a way to build a phylogenetic tree based on individual pairwise distances between each pair of sequences (aligned in isolation), and not on the pairwise distances of sequences taken from a multiple alignment. Building a tree this way should overcome the issue of some sequences not being aligned due to their extreme diversity.
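To make the idea concrete, here is a minimal sketch of what "aligned in isolation" could look like: each pair is globally aligned on its own (a plain Needleman-Wunsch) and the distance is 1 minus the fraction of identical alignment columns. The function names and the scoring values (match=1, mismatch=-1, gap=-2) are assumptions for illustration, not anything taken from MEGA. Pure Python like this is far too slow for ~2000 sequences (roughly two million pairs), so a real run would need a dedicated aligner; this only shows the shape of the computation.

```python
from itertools import combinations


def nw_identity_distance(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b, return 1 - (identical columns / alignment length)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back through one optimal alignment, counting columns and identities.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        cols += 1
    return 1.0 - same / cols if cols else 0.0


def distance_matrix(seqs):
    """seqs: dict name -> sequence. Returns a symmetric dict-of-dicts of distances."""
    names = list(seqs)
    dist = {x: {y: 0.0 for y in names} for x in names}
    for x, y in combinations(names, 2):
        dist[x][y] = dist[y][x] = nw_identity_distance(seqs[x], seqs[y])
    return dist
```

The resulting matrix never depends on a shared multiple alignment, so it can be handed to any distance-based tree or clustering method.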
Alternatively, I'd like to create a plot where distances between two points are proportional to the pairwise distances between the sequences they represent. In this way, I could start to visually identify clusters of sequences which might exist.
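A plot of this kind is what multidimensional scaling produces. Below is a rough sketch of classical (Torgerson) MDS in plain Python, assuming the pairwise distances are already available as a square list-of-lists; the function name `classical_mds` and its parameters are made up for illustration. It finds the leading eigenvectors by simple power iteration, which is fine for a toy example but not something to rely on for 2000 points, and distances derived from sequence alignments are generally not Euclidean, so the 2D picture can only approximate them.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the current leading eigenvector of B.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue, including its sign.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n)) / sum(x * x for x in v)
        if lam <= 1e-12:
            break  # no further positive eigenvalue; remaining axes stay at 0
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(lam)
        # Deflate so the next round finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

The returned coordinates can be passed to any scatter-plot tool; points that land close together correspond to sequences with small pairwise distances.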
Thanks!
Something to consider: how meaningful can a phylogenetic tree be when it is based on entities that have almost nothing in common? It is like making a phylogeny of a 'house', an 'airplane', a 'table', and a 'sunflower'. It's not that a distance cannot be defined, but how meaningful could it be?
I would echo Micheal's comment above. While you can do something like UPGMA on E-values from an all-versus-all BLAST run, it sounds like you need to think carefully about the biological implications of your question. What are you trying to ask, and why do you need something like pairwise distances? Clustering will at least tell you which sequences should be grouped together and which should not. Keep in mind that multiple sequence alignment itself contains a pairwise heuristic of some sort as its initial step.
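For reference, UPGMA itself is short enough to sketch. The version below assumes a symmetric dict-of-dicts of dissimilarities and unique labels, and returns a Newick string; the name `upgma` and the input layout are illustrative choices. E-values are not distances as they stand, so how to turn all-versus-all BLAST output into a dissimilarity is left open here.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) tree as a Newick string with branch lengths."""
    size = {n: 1 for n in names}        # cluster label -> number of leaves
    height = {n: 0.0 for n in names}    # cluster label -> node height
    d = {frozenset(p): dist[p[0]][p[1]] for p in combinations(names, 2)}
    while len(size) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        h = d.pop(pair) / 2.0
        new = "(%s:%.3f,%s:%.3f)" % (a, h - height[a], b, h - height[b])
        # Distance to every other cluster is the size-weighted average.
        for c in list(size):
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        height[new] = h
    return next(iter(size)) + ";"
```

The Newick output can be opened in most tree viewers. Note that UPGMA assumes a constant rate along all lineages, so for sequences this divergent the result is better read as a clustering dendrogram than as a phylogeny.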
In this case I'm trying to establish a rough classification of repeat elements. I want to see whether the sequences form relatively distinct groups, from which I can manually classify a few representatives to get an idea of the whole. I'm not actually going to infer a biological relationship from the distantly related sequences; it's more that I don't know which ones are distant and which ones are close.
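As a rough sketch of that workflow: link any two sequences whose distance is below a cutoff, take the connected groups (single linkage), and pick the member with the smallest total distance to the rest of its group as the representative to classify by hand. The function names, the dict-of-dicts input, and the cutoff value are all assumptions for illustration; a sensible cutoff would have to come from looking at the actual distance distribution.

```python
def threshold_groups(names, dist, cutoff):
    """Single-linkage groups: sequences are joined if dist <= cutoff."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[a][b] <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def medoid(group, dist):
    """Member with the smallest summed distance to the other group members."""
    return min(group, key=lambda x: sum(dist[x][y] for y in group if y != x))
```

Sequences that end up alone in a group are the ones with no close relative in the library, which is useful to know in itself.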