I have some FASTA files containing thousands of sequences (both cDNA and amino acid) from multiple species of non-model lepidopterans.
I now want to assign each sequence a gene name or some other identifier (e.g. a UniProt ID) that I can later use to find the most common gene ontology (GO) terms.
So far, I have been able to do this at a small scale using tblastn and looking at the names of the hits in model species (e.g. Drosophila) where the genes have been identified. However, this method does not scale, even with command-line BLAST, because I have to look through the hits manually to find those with a usable name rather than "PREDICTED: species uncharacterised mRNA".
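To make the manual step concrete, here is a rough sketch of the filtering I currently do by eye. It assumes the BLAST results were written as tab-separated text with `-outfmt "6 qseqid sseqid pident evalue bitscore stitle"`; the function name, the list of "uninformative" title patterns and the example rows are all made up for illustration, and the pattern list is certainly not exhaustive.

```python
import io
import re

# Assumed column order, matching:
#   -outfmt "6 qseqid sseqid pident evalue bitscore stitle"
COLUMNS = ["qseqid", "sseqid", "pident", "evalue", "bitscore", "stitle"]

# Hit titles treated as unusable (illustrative patterns only).
UNINFORMATIVE = re.compile(
    r"uncharacteri[sz]ed|hypothetical protein|unnamed protein",
    re.IGNORECASE,
)

def best_named_hits(handle):
    """Return {query_id: hit} keeping, per query, the highest-bitscore
    hit whose title does not match an uninformative pattern."""
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        if UNINFORMATIVE.search(hit["stitle"]):
            continue  # skip e.g. "PREDICTED: ... uncharacterised mRNA"
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best

# Made-up rows standing in for a real BLAST output file.
EXAMPLE = (
    "seq1\tXM_001\t80.0\t1e-50\t250\tPREDICTED: Bombyx mori uncharacterized LOC101 mRNA\n"
    "seq1\tNM_002\t75.0\t1e-40\t200\tDrosophila melanogaster heat shock protein 83 (Hsp83), mRNA\n"
    "seq1\tNM_003\t60.0\t1e-10\t90\tDrosophila melanogaster Hsp70Aa mRNA\n"
    "seq2\tXM_004\t70.0\t1e-20\t120\tPREDICTED: Danaus plexippus uncharacterised LOC202 mRNA\n"
)

hits = best_named_hits(io.StringIO(EXAMPLE))
for query, hit in hits.items():
    print(query, hit["sseqid"], hit["stitle"], sep="\t")
```

Queries whose hits are all uninformative (like `seq2` in the made-up data) simply drop out of the result, which is exactly the case I don't know how to handle at scale.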
Does anyone have any suggestions on how to identify my genes? Any help would be very much appreciated.
Thanks for the suggestion, I had not heard of PANNZER. However, when I click on the link given in their paper (http://ekhidna2.biocenter.helsinki.fi/sanspanz/), the page does not load and eventually times out. Is this just a temporary outage, or is there another way to access it?
I've just tried the link and it works.
Yep, it was just down earlier. I managed to get the local installation to work anyway, so thanks for the great suggestion.