Hi friends,
I'm annotating a higher plant that sits well within the higher plant lineage; it is a pond species. I ran a local BLASTX against the UniProt Viridiplantae database and I'm getting a lot of top hits to basal lineages such as the unicellular green algae. These results don't make much sense to me, given how far this plant has diverged from the lower plants. If I look further down the hit list for the same transcript, there is always a more appropriate hit among the next-best hits.
As a result, I then ran a BLASTX against just the embryophyte lineage of UniProt and TrEMBL (land plants and aquatic plants), and the results make much more sense; however, the % identity is low for some transcripts. What are people's takes on this? Is it better to use a more specific database in such a case, given that the evolutionary position of my non-model organism is clear to me and it sits on a much more distant branch of the higher plants, far from the early branches of the Viridiplantae? Or do I just use the entire Viridiplantae lineage?
I've additionally run a BLASTN to detect any contamination, with a 95% cut-off and a low e-value. The database for this is made up of the unicellular microbial algae and eukaryotes that appear in the first set of BLASTX hits. There were very few hits, which I removed.
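For anyone wanting to reproduce the contamination filter, here is a rough sketch of the idea rather than the exact commands I ran. It assumes the BLASTN results were saved in the default tabular format (`-outfmt 6`), that the 95% cut-off refers to percent identity, and that "low e-value" means something like 1e-10; the function name `flag_contaminants` and both thresholds are illustrative assumptions.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def flag_contaminants(handle, min_pident=95.0, max_evalue=1e-10):
    """Return the set of query IDs with at least one hit passing both cut-offs.

    The thresholds are assumptions for illustration, not values from the post.
    """
    flagged = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) >= min_pident and float(hit["evalue"]) <= max_evalue:
            flagged.add(hit["qseqid"])
    return flagged

# Made-up example rows; tx2 fails on identity, tx3 fails on e-value.
example = (
    "tx1\talgaA\t98.5\t300\t4\t0\t1\t300\t10\t309\t1e-150\t540\n"
    "tx2\talgaB\t80.0\t200\t40\t0\t1\t200\t5\t204\t1e-40\t150\n"
    "tx3\talgaC\t96.0\t50\t2\t0\t1\t50\t1\t50\t1e-5\t60\n"
)
print(sorted(flag_contaminants(io.StringIO(example))))
```

The flagged IDs can then be dropped from the assembly FASTA. It may also be worth adding an alignment-length filter, since a short 95% match is weaker evidence of contamination than a long one.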
What are people's opinions? Thanks.
Hey Chris, thanks for this. It's a de novo transcriptome assembly. Essentially what files are needed for this? Will a fasta of the assembly suffice? Thanks
Not sure how well this would work with a transcriptome assembly, as the coverage varies so much (coverage is one critical component of a blobplot). But you could probably look at variation in GC content and, overall, what taxonomic groups are found via BLASTX.
EDIT: 'phylogenetic' -> 'taxonomic'
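For the GC-content part, a FASTA of the assembly should be enough. A minimal sketch, assuming a plain nucleotide FASTA and nothing beyond the Python standard library (the helper names `read_fasta` and `gc_fraction` are made up for this example):

```python
import io

def read_fasta(handle):
    """Yield (id, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def gc_fraction(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0

# Made-up two-transcript example; real input would be open("assembly.fasta").
example = ">tx1 len=8\nGGCC\nGGCC\n>tx2\nATATGCNN\n"
for name, seq in read_fasta(io.StringIO(example)):
    print(name, round(gc_fraction(seq), 2))
```

Transcripts whose GC fraction sits far from the bulk of the assembly, and whose best BLASTX hit is to an unexpected taxon, would be the ones to look at more closely.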
An area which I agree needs some improvement ;-). Might stick with the embryophyta and then blastN. Would someone be criticised for this approach?