Identify gene from sequence at scale in non-model organisms
11 days ago

I have some FASTA files containing thousands of sequences (both cDNA and amino acid) for multiple species of non-model lepidopterans.

I now want to identify the names or some other identifier (e.g. a UniProt ID) for these sequences, which I can later use to find the most common gene ontology (GO) terms.

So far, I have managed this at a small scale using tblastn and looking at the names of the hits in model species (Drosophila) where the genes have been identified. However, this method does not scale, even with command-line BLAST, because I have to look through the hits manually to find those with a usable name rather than "PREDICTED: species uncharacterised mRNA".
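The manual step described here (skipping hits without a usable name) can in principle be scripted. The following is only a rough sketch: it assumes BLAST was run with the tabular output `-outfmt "6 qseqid sseqid evalue bitscore stitle"`, and the function name, the regular expression, and the example rows are all made up for illustration. The list of "uninformative" words would need tuning for real data.

```python
import io
import re

# Assumption: titles containing these words carry no usable gene name.
UNINFORMATIVE = re.compile(
    r"uncharacteri[sz]ed|hypothetical protein|unnamed protein", re.IGNORECASE
)

def best_informative_hits(handle):
    """Map each query ID to its highest-bitscore hit with an informative title.

    Expects tab-separated columns: qseqid, sseqid, evalue, bitscore, stitle.
    """
    best = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        qseqid, sseqid, evalue, bitscore, stitle = fields[:5]
        if UNINFORMATIVE.search(stitle):
            continue  # no usable name, same as skipping it by eye
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][2]:
            best[qseqid] = (sseqid, float(evalue), score, stitle)
    return best

# Made-up rows standing in for real BLAST output.
example = io.StringIO(
    "q1\tsubjA\t1e-80\t300\tPREDICTED: species uncharacterised mRNA\n"
    "q1\tsubjB\t1e-70\t250\tUDP-glucose 6-dehydrogenase mRNA\n"
    "q1\tsubjC\t1e-50\t180\tsome other named gene mRNA\n"
    "q2\tsubjD\t1e-20\t90\thypothetical protein\n"
)
hits = best_informative_hits(example)
for query, (subject, evalue, score, title) in hits.items():
    print(query, subject, title, sep="\t")
```

Queries whose hits are all uninformative (like `q2` in the made-up rows) simply do not appear in the result, so they can be listed separately for follow-up.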

Does anyone have any suggestions on how to identify my genes? Any help would be very much appreciated.

10 days ago
shelkmike ★ 1.5k

I don't understand why you think that you need some gene names to perform GO annotation. Just upload your proteins to PANNZER. It is an online tool that does GO annotation. Also, it will give your proteins descriptions like "UDP-glucose 6-dehydrogenase".
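Once each protein has GO terms assigned, finding the most common terms is a simple tally. The sketch below assumes a hypothetical two-column, tab-separated file (protein ID, GO ID); this is not PANNZER's actual output format, so the real output would need to be reduced to these two columns first. The function name and the example IDs are placeholders.

```python
from collections import Counter
import io

def most_common_go_terms(handle, n=10):
    """Count how many distinct proteins carry each GO term; return the top n."""
    seen = set()
    counts = Counter()
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[1].startswith("GO:"):
            continue  # skip headers and malformed lines
        pair = (parts[0], parts[1])
        if pair in seen:
            continue  # count each protein/term pair only once
        seen.add(pair)
        counts[parts[1]] += 1
    return counts.most_common(n)

# Placeholder annotations, not real results.
example = io.StringIO(
    "prot1\tGO:0000001\n"
    "prot1\tGO:0000002\n"
    "prot2\tGO:0000001\n"
    "prot2\tGO:0000001\n"
    "prot3\tGO:0000001\n"
)
top_terms = most_common_go_terms(example, n=2)
print(top_terms)
```

Counting distinct protein/term pairs rather than raw lines keeps a protein that is listed twice for the same term from inflating that term's count.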


Thanks for the suggestion, I had not heard of PANNZER. However, when I click on the link given in their paper (http://ekhidna2.biocenter.helsinki.fi/sanspanz/), it won't load and the web page times out. Is this just a temporary outage or is there another way to access it?


I've just tried the link and it works.


Yep, it was just down earlier. I managed to get the local installation to work anyway, so thanks for the great suggestion.
