Sorry for the rookie question, but I don't have a ton of experience with genome annotation and microbial genomics. I want to identify the virulence genes in my microbial species of interest from the GTF/GFF file. How do I do that?
Very likely, you can't.
GTF and GFF3 are annotation file formats that record where features such as genes or transcription start sites (TSS) are located within a genome. Supporting information may be added as attributes in column 9, but if genes are labelled there as "virulence genes" rather than "other genes", that classification was made beforehand by other means; the format itself does not provide it.
What you need is functional annotation, which tells you which genes are involved in, e.g., antibiotic resistance, proliferation, or biofilm formation.
It depends on how the GTF/GFF files were made. If they were produced by Prodigal or another gene-finding program, chances are the files contain nothing beyond gene boundaries. If you used Prokka or a similar genome annotation program, the files will contain useful information such as gene names and product descriptions. Still, I don't think you will get a single, neatly defined virulence category, or even a reliable set of keywords (toxins, invasion factors). Many virulence genes have names that don't make their role in virulence obvious, so most likely you will need to do additional research.
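A quick way to see which of these cases applies to your file is to tally the attribute keys in column 9 and look at the product descriptions. The following is only a minimal sketch for GFF3 (GTF uses a different attribute syntax); the function names and the three example lines are made up for illustration and do not come from any real annotation.

```python
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute string (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters such as commas
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def summarize_gff3(lines):
    """Count attribute keys over all features and product names over CDS features."""
    keys = Counter()
    products = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section may follow the annotation
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        attrs = parse_attributes(cols[8])
        keys.update(attrs.keys())
        if cols[2] == "CDS" and "product" in attrs:
            products[attrs["product"]] += 1
    return keys, products


# Invented example lines: one bare gene call, two with functional attributes.
example = [
    "##gff-version 3",
    "contig_1\tProdigal\tCDS\t100\t900\t.\t+\t0\tID=gene_1",
    "contig_1\tProkka\tCDS\t1200\t2100\t.\t-\t0\tID=gene_2;product=hypothetical protein",
    "contig_1\tProkka\tCDS\t2500\t3400\t.\t+\t0\tID=gene_3;gene=hlyA;product=Hemolysin%2C chromosomal",
]

keys, products = summarize_gff3(example)
print(keys)
print(products.most_common(5))
```

For a real file you would pass an open file handle, e.g. `summarize_gff3(open("annotation.gff3"))` (the file name is a placeholder). If the only key that shows up is `ID`, the file holds gene boundaries and nothing more.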
The following papers might help:
Thank you (as always, you're so helpful)! These papers are great and also provide a great introduction. Funny, I have even skimmed one of them before...and as soon as a commenter mentioned "functional annotation" I realized that's what I need! A colleague said to find these virulence genes from the GTF file so I just took her word for it...she sounded very confident. I felt I was the idiot who couldn't find the info there. Thanks - it's very good to read through these papers before I jump into a tool.
You could try Bakta to annotate your bacterial genome. Internally, it uses AMRFinderPlus and VFDB, and potential hits are marked in GFF3 column 9 as DBXREFs. (Shameless plug: I'm the developer of Bakta.)
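If your annotation carries such cross-references, pulling out the flagged features takes only a few lines. This is a rough sketch, not Bakta's own tooling: it assumes the hits sit in the standard GFF3 `Dbxref` attribute as comma-separated `database:identifier` pairs and that VFDB entries start with `VFDB:`. Both the prefix and the example lines (including the identifier) are assumptions for illustration, so check them against your actual output.

```python
def find_dbxref_hits(lines, prefix="VFDB:"):
    """Return (ID, product, matching cross-references) for features whose
    Dbxref attribute contains an entry starting with `prefix`.

    The default prefix is an assumption; adjust it to what your file uses.
    """
    hits = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section may follow the annotation
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        # Lower-case the keys so Dbxref / DBXREF / dbxref are all matched
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip().lower()] = value.strip()
        xrefs = [x for x in attrs.get("dbxref", "").split(",") if x.startswith(prefix)]
        if xrefs:
            hits.append((attrs.get("id", ""), attrs.get("product", ""), xrefs))
    return hits


# Invented example lines; the VFDB identifier is a placeholder, not a real entry.
example = [
    "##gff-version 3",
    "contig_1\tBakta\tCDS\t100\t900\t.\t+\t0\tID=gene_1;product=hypothetical protein",
    "contig_1\tBakta\tCDS\t1200\t2100\t.\t-\t0\tID=gene_2;product=hemolysin;Dbxref=SO:0001217,VFDB:VFG000000",
]

hits = find_dbxref_hits(example)
for feature_id, product, xrefs in hits:
    print(feature_id, product, ",".join(xrefs))
```

A hit here only says the gene resembles an entry in the database; whether it actually contributes to virulence in your species still needs checking against the literature.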
Thank you! That makes sense and clarifies things. I remember that now...this was a colleague who told me to find the virulence genes in the GTF file and I just sort of assumed she was correct and that they would be there, and I was feeling like I'm the idiot who just can't find them. Now I remember functional annotation...as opposed to gene annotation, I guess. Thanks for this clarification!