Hi!
I have recently binned a large number of contigs from a metagenomic sample. I have merged all my bins into one file, and the fasta-headers (>bin-name) describe which bacterial taxon each contig represents.
The next step was to annotate the genes of my bins. When I annotate the genes (with Prokka/Prodigal), I get a single file with all predicted genes. However, I have no way of knowing which gene belongs to which bacterium, since the bacterial taxonomy headers are not preserved.
Do you guys have any ideas on how to know which genes belong to which bacteria? I would rather not run the bins one by one, as I have thousands of them, and it would generate thousands of files to manage (that's why I merged all the bins).
Thanks for your suggestions guys! Seems like the locus_tag is not an option with as many contigs as I have (a lot). I never studied the .gff file enough. Seems like a simple Python script could help me out there! If that doesn't work out, I'll definitely look into just running the bins separately and then merging.
I'll get back to this thread, if it doesn't work out.
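For anyone landing here later, a minimal sketch of the kind of script meant above. It assumes the annotation GFF keeps the original contig name in column 1 (which may not hold if Prokka is told to rename contigs) and that each CDS line carries a locus_tag or ID attribute. The function name and the contig names in the example are made up for illustration.

```python
import io


def genes_by_contig(gff_handle):
    """Map each CDS identifier in a GFF3 stream to the contig (column 1) it sits on."""
    gene_to_contig = {}
    for line in gff_handle:
        if line.startswith("##FASTA"):
            break  # annotation lines end here; sequences may follow
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene_id = attrs.get("locus_tag") or attrs.get("ID")
        if gene_id:
            gene_to_contig[gene_id] = cols[0]
    return gene_to_contig


# Hypothetical two-contig example; real contig names would carry the bin/taxonomy label.
example_gff = io.StringIO(
    "##gff-version 3\n"
    "binA_contig1\tProdigal\tCDS\t10\t400\t.\t+\t0\tID=FCHNPLMC_00004;locus_tag=FCHNPLMC_00004\n"
    "binB_contig7\tProdigal\tCDS\t5\t250\t.\t-\t0\tID=FCHNPLMC_00005;locus_tag=FCHNPLMC_00005\n"
    "##FASTA\n"
    ">binA_contig1\nACGT\n"
)
mapping = genes_by_contig(example_gff)
for gene, contig in mapping.items():
    print(gene, contig, sep="\t")
```

With a real file you would pass `open("annotation.gff")` instead of the in-memory example, then look up the bin or taxonomy from the contig name.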
Do you have spaces in your fasta headers? Can you replace them with _ so when you run Prokka that information is preserved?
My fasta-headers look like this:
And my Prokka-output looks like this:
So I guess I would have to tell Prokka to preserve the fasta-headers somehow?
I never ran Prokka, but does the "FCHNPLMC_00004" not relate back to any of your contigs? This is a bit bizarre. Anyhow, Prokka uses Prodigal for the prediction of CDSs; you can run it alone, either in metagenome mode or, ideally, on each bin of your assembly individually to allow it to train itself instead of using an approximation of one of the incorporated models. It shouldn't make much difference to have an intermediate step with thousands of files; you can always merge them later.
If the locus_tag option you feed to Prokka is sufficiently unique (i.e. unique to each species/strain), then you can work out which genes come from which genomes just by the locus tag, which will be inserted into the fasta headers.
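As a rough illustration of the locus tag idea: if every bin was annotated with its own prefix, the genes in a merged file can be sorted back into bins by the part of the ID before the last underscore. This is only a sketch; the prefixes, bin names and the function name are hypothetical, and it assumes gene IDs of the form PREFIX_number like the one quoted above.

```python
from collections import defaultdict


def group_by_locus_prefix(gene_ids, prefix_to_bin):
    """Group gene IDs by bin, using the locus tag prefix before the last underscore."""
    groups = defaultdict(list)
    for gene_id in gene_ids:
        prefix = gene_id.rsplit("_", 1)[0]
        # Genes with an unknown prefix are collected separately rather than dropped.
        groups[prefix_to_bin.get(prefix, "unassigned")].append(gene_id)
    return dict(groups)


# Hypothetical prefixes, one per bin, as they would be chosen at annotation time.
prefix_to_bin = {"BINA": "bin_A", "BINB": "bin_B"}
gene_ids = ["BINA_00001", "BINB_00001", "BINA_00002", "XXXX_00001"]
grouped = group_by_locus_prefix(gene_ids, prefix_to_bin)
for bin_name, genes in grouped.items():
    print(bin_name, ",".join(genes), sep="\t")
```

The gene IDs themselves would come from the first word of each header in the merged gene fasta.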