Virus core genome
4.1 years ago
Salsabil • 0

Hello everyone.

I'm working on a bunch of small viral genomes, and I need to construct their core genomes for a comparative study. I've tried using prokka and roary to do so, but my sequences are very small (10 kb or less): I keep getting errors and warnings even though I obtain an output, and I keep getting 0 core genes from roary. It turned out that prokka uses a tool called Prodigal, which has a sequence size threshold of 20 kb. So in the end I can't get GFF3 files from prokka to run roary. Now I'm left with one of two solutions:

1/ Look for another source of GFF3 files to run roary and get my core genomes.
2/ Try something else (other than roary) to generate core genomes.

If you happen to know other tools or scripts I can use in my case, please let me know. Many thanks!

sequence assembly genome Virus core-genome • 1.2k views

Separately from my other post, the concept of core genes does not make as much sense in viruses as it does in prokaryotes. Most viruses carry a minimal number of genes to begin with: usually only the genes required to regulate their own or host transcription, and to replicate and rebuild capsids. I can't imagine that there will be much difference between somewhat related viruses with similarly sized genomes.


Exactly, I was expecting a potentially high number of core genes, especially since these viruses are from the same genus (HIV and SIV). As for why I want to work with the core genome rather than the whole genome: these are highly mutable viruses.

4.1 years ago
Mensur Dlakic ★ 28k

I think there could be other options. You can run prodigal with -p meta (or --metagenome if using prokka), in which case it will accept genomes smaller than 20 kb. Alternatively, change the prodigal source code to allow sequences shorter than 20 kb and recompile.

This tool will find all single-copy marker genes shared between multiple genomes, but it also uses prodigal:

https://github.com/yuwwu/ezTree
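The `-p meta` suggestion above can be sketched as a small Python wrapper. This is only an illustration, assuming Prodigal is installed and on the PATH; the file name `genome.fasta` and the helper `prodigal_meta_cmd` are hypothetical. Note also that roary generally expects Prokka-style GFF3 files, so Prodigal's GFF output alone may not be enough as roary input.

```python
import shutil
import subprocess
from pathlib import Path


def prodigal_meta_cmd(fasta, gff_out, proteins_out=None):
    """Build a Prodigal command line that uses metagenomic ("meta") mode.

    Meta mode relies on pre-computed training models, so it does not need
    a long sequence to train on.
    """
    cmd = ["prodigal", "-i", str(fasta), "-p", "meta", "-f", "gff", "-o", str(gff_out)]
    if proteins_out is not None:
        # -a writes the predicted protein translations as FASTA
        cmd += ["-a", str(proteins_out)]
    return cmd


if __name__ == "__main__":
    fasta = Path("genome.fasta")  # hypothetical input file
    cmd = prodigal_meta_cmd(fasta, fasta.with_suffix(".gff"))
    print(" ".join(cmd))
    # Only run the tool if it is installed and the input actually exists.
    if shutil.which("prodigal") and fasta.exists():
        subprocess.run(cmd, check=True)
```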


Thanks for your feedback. Actually, I found out that when genomes are smaller than 100 kb, prokka puts prodigal in "meta" mode automatically. But it then switches the translation table from 1 (the standard table) to 4. As a result, the annotation differs and it doesn't detect all CDS within the genome: for one sequence with 10 CDS in NCBI, I got only 6 with prokka.
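To illustrate why the translation table matters here: in table 1 the codon TGA is a stop, whereas in table 4 it encodes tryptophan, so an ORF that ends at TGA is read through until the next TAA or TAG. The toy sketch below shows the effect on a made-up sequence; the function `first_orf_length` and the example sequence are hypothetical and this is not how prodigal itself works internally.

```python
# Stop codons for the two NCBI translation tables discussed above.
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},  # standard code
    4: {"TAA", "TAG"},         # table 4: TGA encodes Trp instead of stop
}


def first_orf_length(seq, table):
    """Length in codons (stop excluded) of the ORF starting at the first ATG.

    Returns None if there is no ATG or no in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    stops = STOP_CODONS[table]
    # Walk codon by codon in the frame set by the start codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return (i - start) // 3
    return None


toy = "ATGGCTTGAGCTTAA"  # codons: ATG GCT TGA GCT TAA
print(first_orf_length(toy, 1))  # stops at TGA
print(first_orf_length(toy, 4))  # reads through TGA, stops at TAA
```

With a different set of stop codons, predicted gene boundaries shift, which may explain part of the difference between the prokka and NCBI CDS counts.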


Update: I ended up retrieving GFF3 annotation files directly from NCBI, so the annotation problem is solved at this point. But I'm still getting 0 core genes in the roary results, which I don't understand, because these viruses share most of their genes (if not all of them)!
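One possible explanation, offered as an assumption rather than a confirmed diagnosis: roary clusters proteins using a minimum BLASTP identity (the `-i` option, 95% by default according to its help text) and only calls a gene "core" if it is present in nearly all genomes (`-cd`, 99% by default). For highly divergent proteins, a strict identity threshold could split one gene into several clusters, none of which covers every genome. The sketch below shows that effect on invented toy data; the cluster names, genome labels, and the `core_genes` helper are all hypothetical, and the defaults are worth checking with `roary -h`.

```python
def core_genes(presence, genomes, core_fraction=0.99):
    """Return clusters present in at least `core_fraction` of the genomes.

    presence: dict mapping cluster name -> set of genomes containing it.
    genomes:  list of all genome names in the comparison.
    """
    n = len(genomes)
    if n == 0:
        return []
    return sorted(
        cluster
        for cluster, members in presence.items()
        if len(members & set(genomes)) / n >= core_fraction
    )


genomes = ["HIV-1", "HIV-2", "SIVcpz"]  # toy labels, not real results

# Strict identity threshold: each gene is split into separate clusters.
split = {
    "gag_a": {"HIV-1", "HIV-2"},
    "gag_b": {"SIVcpz"},
    "pol_a": {"HIV-1"},
    "pol_b": {"HIV-2", "SIVcpz"},
}

# Relaxed threshold: divergent orthologs fall into one cluster per gene.
merged = {
    "gag": {"HIV-1", "HIV-2", "SIVcpz"},
    "pol": {"HIV-1", "HIV-2", "SIVcpz"},
}

print(core_genes(split, genomes))
print(core_genes(merged, genomes))
```

If this is what is happening, rerunning roary with a lower `-i` value would be one thing to try.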
