Hello all,
Project background: I am studying chemosensory evolution (GRs and ORs) across insect orders. Specifically, I plan to compare GR and OR diversity between herbivorous and non-herbivorous insect orders, and to examine selection and diversification rates along the branches separating the two groups.
I have selected and downloaded about 180 genomes from NCBI Assembly (GenBank), most of which have no annotation. What is the best way to bulk-annotate these genomes so that I get the genes and proteins for each one and can then extract the GRs and ORs from all of them?
I have been looking at BRAKER and eggNOG, but they seem to be designed for annotating novel genomes and might be slow for bulk annotation.
Thank you in advance!
I find it hard to believe that you downloaded many genomes from NCBI that do not have annotations. I think it is more likely that the annotations are there, but maybe you didn't look in the correct place. If you tell us a couple of genomes you downloaded and from where, we may be able to offer advice.
Hi Mensur,
Here are some of the genomes (accession numbers) I downloaded from GenBank:
Out of the 180 genomes I downloaded, only 52 had the associated .gff and protein.faa files.
Mensur Dlakic posted links for the RefSeq versions of the genomes, but the corresponding GenBank versions should be available at similar links. Replace GCF with GCA, e.g.
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/932/325/GCA_012932325.1_TpBJ-2018v1/
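If you want to build these links for many accessions at once, here is a minimal Python sketch. It assumes the directory layout visible in the link above (prefix, then the nine accession digits split into groups of three, then `accession_assemblyname`). The function names are my own, and you still need to supply the assembly name for each accession yourself. Also note that the GenBank and RefSeq version numbers of an assembly are not guaranteed to match, so treat the swapped accession as a guess to be checked.

```python
def genbank_accession(accession: str) -> str:
    """Swap a RefSeq assembly prefix (GCF_) for the GenBank one (GCA_).

    Assumption: the version suffix (.1, .2, ...) is kept unchanged, which
    may not hold for every assembly.
    """
    if accession.startswith("GCF_"):
        return "GCA_" + accession[len("GCF_"):]
    return accession


def assembly_ftp_url(accession: str, assembly_name: str) -> str:
    """Build the genomes/all directory URL for one assembly."""
    prefix, rest = accession.split("_", 1)       # e.g. "GCA", "012932325.1"
    digits = rest.split(".", 1)[0]               # e.g. "012932325"
    # Split the digits into groups of three: 012/932/325
    triplets = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{prefix}/{'/'.join(triplets)}/{accession}_{assembly_name}/"
    )


if __name__ == "__main__":
    acc = genbank_accession("GCF_012932325.1")
    print(assembly_ftp_url(acc, "TpBJ-2018v1"))
```

Whether a given directory actually contains a GFF or protein FASTA file still has to be checked by listing it.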
Some genomes may have GenBank versions but no RefSeq, e.g.
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/926/335/GCA_002926335.1_tcristinae_2.1/
This genome seems to have only a GenBank flat file version available (no GFF).
https://github.com/jorvis/biocode/blob/master/gff/convert_genbank_to_gff3.py purportedly does GBFF to GFF conversions, but you will need to verify that claim.
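Before converting anything, it may be worth checking whether a given flat file contains gene models at all. The sketch below is one rough way to do that with only the Python standard library: it counts CDS feature lines, relying on the GenBank flat file layout in which feature keys start in column 6 and locations in column 22. The function name and the file name in the usage line are hypothetical, and this is only a sanity check, not a parser.

```python
import gzip
from pathlib import Path


def count_cds_features(gbff_path: str) -> int:
    """Count CDS feature lines in a GenBank flat file (plain or .gz)."""
    path = Path(gbff_path)
    opener = gzip.open if path.suffix == ".gz" else open
    count = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # Feature lines: 5 spaces, then the key padded out to column 21.
            if line.startswith("     ") and line[5:21].strip() == "CDS":
                count += 1
    return count


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_cds.py genomic.gbff.gz
    for name in sys.argv[1:]:
        print(name, count_cds_features(name), sep="\t")
```

A count of zero suggests the file holds sequence only, in which case a conversion to GFF would not give you any proteins to work with.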