Question

How can I analyze a genome with no annotation in NCBI GenBank?

2.5 years ago

lukhanyomakhabane ▴ 30

Hey, good people!

I have genome sequences of Armillaria species (Basidiomycota fungi) hosted in the NCBI GenBank database. The genomes have no annotations; I guess they were submitted to NCBI without them. Now I would like to find some genes of interest in these genomes (e.g., DNA methyltransferase-encoding genes). What is the best way to do this? I have the query sequences (gene and protein sequences) that I would like to find in these genomes. Your help would be greatly appreciated!

BLAST • 663 views

updated 2.0 years ago by Ram 44k • written 2.5 years ago by lukhanyomakhabane ▴ 30

You could create a local BLAST database from the reference FASTA sequence and BLAST known genes of interest against it to see whether orthologs might be present, as a simple exploration. Tutorial here. But without any gene predictions, I doubt the protein sequences you have will be useful with this method.
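A minimal sketch of that exploration, assuming NCBI BLAST+ is installed. It only builds the command lines and parses the tabular results; the file names and the `1e-5` e-value cutoff are hypothetical choices. Gene (nucleotide) queries go through `blastn`; for protein queries, BLAST+ also provides `tblastn`, which searches a protein query against the translated nucleotide database.

```python
import shlex
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str    # contig/scaffold ID in the genome assembly
    pident: float
    length: int
    sstart: int     # sstart > send means the hit lies on the minus strand
    send: int
    evalue: float
    bitscore: float


def blast_commands(genome_fasta: str, db_name: str, query_fasta: str,
                   out_tsv: str, program: str = "blastn") -> list[list[str]]:
    """Return the makeblastdb and search command lines (not executed here)."""
    if program not in ("blastn", "tblastn"):
        raise ValueError("use blastn for gene queries, tblastn for protein queries")
    make_db = ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]
    search = [program, "-query", query_fasta, "-db", db_name,
              "-evalue", "1e-5",  # assumed cutoff; tune for your data
              "-outfmt", "6", "-out", out_tsv]
    return [make_db, search]


def parse_outfmt6(text: str) -> list[Hit]:
    """Parse default tabular output: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[8]), int(f[9]), float(f[10]), float(f[11])))
    return hits


def best_hit_per_query(hits: list[Hit]) -> dict[str, Hit]:
    """Keep the highest-bitscore hit for each query sequence."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


if __name__ == "__main__":
    # Hypothetical file names; run each command with subprocess.run(cmd, check=True).
    for cmd in blast_commands("armillaria_genome.fna", "armillaria_db",
                              "dnmt_proteins.faa", "hits.tsv", program="tblastn"):
        print(shlex.join(cmd))
```

A strong best hit only suggests a candidate locus; it does not give you gene structure (exon/intron boundaries), which is what gene prediction adds.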

If you're up for it, you could run the reference sequence through a few rounds of gene prediction with a tool such as AUGUSTUS, but this would be a considerable amount of work.
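A rough sketch of one prediction round, assuming AUGUSTUS is installed. The species name is a placeholder: pick the closest parameter set available in your AUGUSTUS installation (or train one), since the choice strongly affects prediction quality. AUGUSTUS writes predictions to standard output, so redirect them to a file when running the command; the helper then lists the predicted `gene` features from the GFF3 text.

```python
def augustus_command(genome_fasta: str, species: str) -> list[str]:
    """Build an AUGUSTUS command line; --gff3=on switches output to GFF3."""
    return ["augustus", f"--species={species}", "--gff3=on", genome_fasta]


def predicted_genes(gff_text: str) -> list[tuple[str, int, int, str]]:
    """Collect (seqid, start, end, strand) for every 'gene' feature in GFF3 text."""
    genes = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment/directive lines
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return genes
```

Predicted genes that overlap your BLAST hit coordinates are the natural candidates to inspect first.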

2.5 years ago by dthorbur ★ 2.6k

I'd start by simply annotating it yourself. If it's a well-conserved gene, it'll be easy enough to pull out.
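One way to "pull out" a conserved gene by hand is to take the BLAST hit coordinates and extract that stretch of contig plus some flanking sequence, reverse-complementing minus-strand hits, then annotate the region manually. This sketch assumes 1-based BLAST coordinates; the 2 kb default flank is an arbitrary, hypothetical choice meant to capture untranslated ends and nearby exons.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text: str) -> dict[str, str]:
    """Map the first word of each FASTA header to its (joined) sequence."""
    seqs: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_hit_region(genome: dict[str, str], contig: str,
                       sstart: int, send: int, flank: int = 2000) -> str:
    """Return the hit span plus flanks, oriented like the query."""
    seq = genome[contig]
    lo, hi = min(sstart, send), max(sstart, send)
    start = max(1, lo - flank)          # clip flanks at contig ends
    end = min(len(seq), hi + flank)
    region = seq[start - 1:end]         # 1-based inclusive -> Python slice
    if sstart > send:                   # minus-strand hit
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```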

If there are good reference annotations you can use, you can usually give annotation tools a list of "reference" proteins as evidence.
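Here is a small sketch for preparing such a reference protein set, assuming you already have a protein FASTA from related, well-annotated fungi. It keeps only the records whose header contains one of your keywords. The keyword and the example headers are hypothetical, and how the resulting file is passed to an annotator depends on that tool, so check its documentation.

```python
def select_reference_proteins(fasta_text: str, keywords: list[str]) -> str:
    """Keep FASTA records whose header line contains any keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    out, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            keep = any(k in line.lower() for k in kws)  # decide per record
        if keep and line.strip():
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Header keywords are a crude filter; domain-based searches are more reliable for families like DNA methyltransferases, but a keyword pass is a quick way to start.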

2.5 years ago by Joe 21k