2.5 years ago
lukhanyomakhabane
Hey good people
I have genome sequences of Armillaria species (Basidiomycota fungi) hosted in the NCBI GenBank database. The genomes have no annotations; I guess they were submitted to NCBI without them. Now I would like to find some genes of interest in these genomes (e.g., genes encoding DNA methyltransferases). What is the best way to do this? I have the query sequences (gene and protein sequences) that I would like to find in these genomes. Your help would be greatly appreciated!
You could create a local BLAST database from the reference FASTA sequence and BLAST your known genes of interest against it to see whether orthologs might be present, as a simple first exploration. Tutorial here. But without any gene predictions, I doubt the protein sequences you have will be useful with this method.
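To make the local BLAST idea concrete, here is a rough sketch in Python that only builds the BLAST+ command lines and parses the default tabular output (`-outfmt 6`). The file names (`armillaria_genome.fasta`, `dnmt_genes.fasta`, and so on) and the e-value cutoff are placeholders I made up, and it assumes BLAST+ is installed and on your PATH. The `program` argument is there in case you want to try `tblastn`, which compares protein queries against a nucleotide database translated in all six frames.

```python
import shlex

# Column names of BLAST+ tabular output (-outfmt 6) in its default order.
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def makeblastdb_cmd(genome_fasta, db_name):
    """Command that builds a nucleotide BLAST database from a genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def blast_cmd(query_fasta, db_name, out_tsv, program="blastn", evalue=1e-5):
    """Command that searches the queries against the database, tabular output."""
    return [
        program, "-query", query_fasta, "-db", db_name,
        "-evalue", str(evalue), "-outfmt", "6", "-out", out_tsv,
    ]


def parse_outfmt6(text):
    """Turn tabular BLAST output into a list of dicts with numeric fields converted."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            rec[key] = int(rec[key])
        for key in ("pident", "evalue", "bitscore"):
            rec[key] = float(rec[key])
        hits.append(rec)
    return hits


if __name__ == "__main__":
    # Placeholder file names; print the commands so they can be checked first.
    print(shlex.join(makeblastdb_cmd("armillaria_genome.fasta", "armillaria_db")))
    print(shlex.join(blast_cmd("dnmt_genes.fasta", "armillaria_db", "hits.tsv")))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`. The `sseqid`, `sstart` and `send` fields of each hit tell you which contig and region to look at more closely.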
If you're up for it, you could run the reference sequence through a few rounds of gene prediction with a tool like Augustus, but this would be a considerable amount of work.
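For what a single ab initio Augustus pass might look like, here is a minimal sketch, again only building the command and reading the output. The species model `laccaria_bicolor` is my assumption of a reasonable Basidiomycota stand-in; check `augustus --species=help` for what your installation actually ships. The genome file name is a placeholder. Augustus writes its predictions to standard output, so you would redirect that to a file.

```python
def augustus_cmd(genome_fasta, species="laccaria_bicolor", gff3=True):
    """Command for one ab initio Augustus run; the species model is an assumption."""
    cmd = ["augustus", f"--species={species}"]
    if gff3:
        cmd.append("--gff3=on")
    cmd.append(genome_fasta)
    return cmd


def gene_intervals(gff_text):
    """Collect (seqid, start, end, strand) for every 'gene' feature in GFF text."""
    genes = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) >= 9 and cols[2] == "gene":
            genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return genes


if __name__ == "__main__":
    # Placeholder file name; the output would be redirected, e.g. "> predictions.gff3".
    print(" ".join(augustus_cmd("armillaria_genome.fasta")))
```

The predicted gene intervals can then be compared with the coordinates of your BLAST hits to see which gene models overlap them.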
I'd start by simply annotating it yourself. If it's a well-conserved gene, it'll be easy enough to pull out.
If there are good reference annotations you can use, you can usually feed annotators a list of 'reference' proteins.