4.3 years ago
Idania
•
0
Hi!
I'm trying to find an enzyme gene in an Aspergillus genome which is not fully annotated. I made a DB with protein sequences of that same enzyme from another Aspergillus species, then ran tblastn and got my results in a .txt file, but I'm lost on the next step. I don't know how to handle the whole-genome data. I've been trying to translate the genome in all the possible reading frames and then search for my sequence with the grep command, but I don't know if there is a more efficient way.
I'm using Linux. I'm being as clear as I can because I'm new to this topic. I hope you can help me.
Thanks!
You could try a de novo annotation with PROKKA and then check the proteins of that annotation against your related protein. You could also find more of those protein sequences, build an HMM from them, and use that to search against the PROKKA proteins (HMMER software). You could also try a "liftover approach" (with RATT). As for your tblastn results, what do they look like? You'd want high-scoring HSPs that cover a good portion of the query sequence. Did you produce tabular output? Maybe show some of the first hits.
edit: Just noting that I made the PROKKA/RATT recommendations thinking we were dealing with a prokaryote. Since Aspergillus is a eukaryote, these bacterial tools obviously don't make sense.
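To make the "high-scoring HSPs that cover a good portion of the query" check concrete, here is a minimal Python sketch. It assumes the tblastn run was done with the default 12-column tabular format (`-outfmt 6`) and that the query protein lengths are known. The function name `best_hsps`, the thresholds, and the example rows (`enzA`, `contig_7`, ...) are made up for illustration and are not from the original post.

```python
import csv
import io

# Column names of the default BLAST+ tabular format (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hsps(handle, query_lengths, min_cov=0.5, max_evalue=1e-5):
    """Return HSPs passing both thresholds, sorted by descending bit score.

    query_lengths maps each query ID to its protein length (hypothetical
    input; you would fill it from your query FASTA file).
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hsp = dict(zip(COLUMNS, row))
        qstart, qend = int(hsp["qstart"]), int(hsp["qend"])
        # Fraction of the query protein covered by this single HSP
        cov = (abs(qend - qstart) + 1) / query_lengths[hsp["qseqid"]]
        if cov >= min_cov and float(hsp["evalue"]) <= max_evalue:
            hsp["qcov"] = round(cov, 3)
            hsp["bitscore"] = float(hsp["bitscore"])
            kept.append(hsp)
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)


# Two invented hits: one long and strong, one short and weak
example = (
    "enzA\tcontig_7\t82.5\t400\t70\t0\t1\t400\t15000\t16199\t1e-150\t650\n"
    "enzA\tcontig_2\t35.0\t60\t39\t0\t10\t69\t500\t679\t0.5\t30.1\n"
)
hits = best_hsps(io.StringIO(example), {"enzA": 420})
for h in hits:
    print(h["sseqid"], h["sstart"], h["send"], h["qcov"], h["bitscore"])
```

One caveat for a fungal genome: a gene with introns will often show up as several shorter HSPs on the same contig (roughly one per exon), so a per-HSP coverage cutoff like the one above may be too strict and is best treated as a first filter only.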
grep is for sure the least efficient way to do this (it's even the wrong way, because grep only finds exact matches, so as soon as there is a single AA difference between the 'genome gene' and your query gene it will fail to find it).
As suggested by cschu181, doing a genome annotation is the better approach, though it is quite labour-intensive. If you are only looking for a single gene (or a few), it might be more efficient to dig in manually. Display the genome with the BLAST hits in a genome browser (IGV, GenomeView, Artemis, Apollo, ...) and annotate the gene of interest.
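A small Python sketch of why exact matching breaks down. The sequences are invented for illustration; `best_ungapped_identity` is a hypothetical helper, not a replacement for BLAST.

```python
# Invented protein fragment standing in for a translated genome region
genome_protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

query_exact = "AKQRQISFVKSHF"    # identical to part of the fragment
query_one_sub = "AKQRQLSFVKSHF"  # same peptide with a single I -> L change

# Exact substring search behaves like a plain grep for a fixed string
found_exact = query_exact in genome_protein
found_variant = query_one_sub in genome_protein


def best_ungapped_identity(query, target):
    """Slide the query along the target and return the best fraction of
    identical positions (no gaps allowed)."""
    best = 0
    for i in range(len(target) - len(query) + 1):
        window = target[i:i + len(query)]
        matches = sum(a == b for a, b in zip(query, window))
        best = max(best, matches)
    return best / len(query)


identity = best_ungapped_identity(query_one_sub, genome_protein)
print(found_exact, found_variant, round(identity, 3))
```

The variant is missed by the exact search even though 12 of its 13 residues line up, which is the kind of hit a similarity search such as tblastn is designed to report.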
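One possible way to get the BLAST hits into a genome browser is to convert them to BED, which IGV and similar tools can load as a track. The sketch below assumes default tabular tblastn output (`-outfmt 6`); the function name `hit_to_bed` and the example line are hypothetical.

```python
def hit_to_bed(line):
    """Convert one line of BLAST tabular output (-outfmt 6) to a BED6 line."""
    fields = line.rstrip("\n").split("\t")
    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    # BLAST coordinates are 1-based and inclusive; BED is 0-based, half-open
    start = min(sstart, send) - 1
    end = max(sstart, send)
    # In tblastn output, send < sstart means the hit is on the minus strand
    strand = "+" if sstart <= send else "-"
    # BED scores must lie in 0..1000, so cap the bit score
    score = min(1000, int(float(fields[11])))
    return "\t".join([sseqid, str(start), str(end), qseqid, str(score), strand])


# Invented minus-strand hit
example = "enzA\tcontig_7\t82.5\t400\t70\t0\t1\t400\t16199\t15000\t1e-150\t650"
bed_line = hit_to_bed(example)
print(bed_line)
```

The contig names in the BED file need to match the sequence names in the genome FASTA loaded into the browser, otherwise the track will appear empty.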
I don't know, running PROKKA on a 30 Mb genome shouldn't be that bad (or do you mean in terms of installation/getting the environment right?). The liftover/reference-based annotation with RATT would require a bit more effort (getting the reference data in the proper format and patching RATT so that it works with newer Perl versions), but even that isn't too bad.
edit: this is all nonsense, as it is based on the assumption of processing a prokaryote genome.
Well, yes, there are the (potential) technical issues, and then there is also getting all your data together (RNA-seq? proteins? parameter tuning, ...).
Running an off-the-shelf tool like PROKKA is feasible (bacterial annotation is less cumbersome than eukaryotic), but still, don't underestimate the whole process of doing genome annotation. I'm talking from experience here. A thorough(!) genome annotation is still a considerable effort.
Argh. It is a fungus. Of course, you're right. For some reason I thought this was a bacterial genome.
Please search the fungal-genome-specific BLAST servers.