Many reads from a recent RNA-seq run of mine are not mapping to the reference sequence they should have come from. I converted some of the reads to FASTA format and BLASTed them against nr without restricting the search, and indeed the few I have looked at appear to be on a completely different part of the phylogenetic tree from the organism I was targeting.
The problem is that, right now, NCBI BLAST only lets me view a tree of results for a single sequence at a time, and I have a list of hundreds that I want to test. Is there a way to view one tree for hundreds of results? What I had in mind was pulling the top 5 hits from each of the 100 results and adding those to a phylogenetic tree, along with counts of how many times each species showed up in the list.
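To make the idea concrete, here is a rough sketch of the counting step in Python. It assumes the BLAST results were saved as tab-separated text with three columns (query ID, subject scientific name, bit score), for example from a local BLAST+ run with `-outfmt "6 qseqid sscinames bitscore"`, which needs the BLAST taxonomy database installed. The function name and the column layout are my own assumptions for illustration, not something BLAST or any tree viewer requires.

```python
from collections import Counter, defaultdict


def count_top_hit_species(blast_lines, top_n=5):
    """Count how often each species appears among the top hits per query.

    Assumes each line is tab-separated as: qseqid, sscinames, bitscore
    (an assumed layout, chosen for this sketch).
    """
    hits_per_query = defaultdict(list)
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, species, bitscore = fields[0], fields[1], float(fields[2])
        hits_per_query[query_id].append((bitscore, species))

    counts = Counter()
    for query_hits in hits_per_query.values():
        # Best hits first; keep only the top_n for each read.
        query_hits.sort(key=lambda hit: hit[0], reverse=True)
        for _, species in query_hits[:top_n]:
            counts[species] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_species.py blast_results.tsv
    with open(sys.argv[1]) as handle:
        for species, n in count_top_hit_species(handle).most_common():
            print(f"{n}\t{species}")
```

The resulting species counts could then be pasted into a taxonomy tree tool to get a tree annotated with how often each species was hit.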
Of course this is just an idea. What I really want to answer is "what is this stuff?". So any ideas you have for visualizing blast results on a subset of my unmapped reads would be much appreciated.
Thanks for your time!
Thanks for pointing out MEGAN! This works pretty well, but it does not seem to show any reads that map to human (where my reads should be coming from). There are some reads marked as "Metazoan"; could those be reads that match human? I kind of wish that I could see the underlying tree rather than just their computationally assigned nodes.
You already removed all the reads that come from human, didn't you? Another problem is that if a read matches multiple species, MEGAN cannot resolve its origin.
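To illustrate why such reads end up on broad nodes like Metazoa: as far as I understand it, MEGAN places each read at the lowest common ancestor (LCA) of the taxa it hits. The sketch below is not MEGAN's code, just a toy version of the idea; the function name is made up and the lineages are simplified by hand for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or None.

    Each lineage is a list of taxon names ordered from root to species.
    """
    if not lineages:
        return None
    lca = None
    # Walk down the ranks in parallel until the lineages disagree.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            lca = taxa_at_rank[0]
        else:
            break
    return lca


# Simplified, hand-written lineages (illustrative only).
human = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates", "Homo sapiens"]
mouse = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Rodentia", "Mus musculus"]
fly = ["Eukaryota", "Metazoa", "Arthropoda", "Insecta", "Diptera", "Drosophila melanogaster"]

print(lowest_common_ancestor([human]))       # a read hitting only human
print(lowest_common_ancestor([human, fly]))  # a read hitting human and fly
```

So a conserved read that hits both human and a distant animal would be reported at Metazoa even though one of its hits is human.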