2.6 years ago
schmiggle
I have a ~10 Gb ONT metagenome from citrus psyllid, from which I am trying to extract bacterial contigs for assembly. My current plan for a pipeline is broadly as follows:
- Tentatively identify each contig with Kraken2, then apply QC to keep only assignments made with high confidence (there is no reference genome for citrus psyllid, so most of the assignments are erroneous).
- Compile a list of the high-confidence IDs, and search GenBank for matching reference genomes.
- Map the metagenome to each reference with minimap, and assemble all contigs from each mapping.
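For concreteness, the first step could look something like the sketch below. It is not my actual QC method, just a minimal illustration. It assumes the default Kraken2 per-sequence output (five tab-separated columns: C/U status, sequence ID, taxID, length, and a space-separated list of `taxid:kmer_count` pairs, produced without `--use-names`). The function names and the 0.5 support threshold are hypothetical. Note that it counts only k-mers assigned to exactly the reported taxID, which is stricter than Kraken2's own clade-based confidence score.

```python
from collections import Counter, defaultdict


def kmer_support(taxid, lca_field):
    """Fraction of non-ambiguous k-mers assigned to exactly `taxid`.

    `lca_field` is the fifth Kraken2 column, e.g. "562:60 0:6".
    Simplification: k-mers hitting ancestors or descendants of `taxid`
    are not counted as support, because no taxonomy is loaded here.
    """
    counts = Counter()
    for token in lca_field.split():
        if token == "|:|":  # separator between mates in paired-end output
            continue
        tid, n = token.rsplit(":", 1)
        counts[tid] += int(n)
    # "A" marks k-mers containing ambiguous nucleotides; exclude them.
    total = sum(n for tid, n in counts.items() if tid != "A")
    return counts[taxid] / total if total else 0.0


def high_confidence_taxids(lines, min_support=0.5):
    """Group contig IDs by taxID, keeping only well-supported assignments.

    `min_support` is an arbitrary illustrative cutoff, not a recommendation.
    """
    kept = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        status, seq_id, taxid, _length, lca_field = fields[:5]
        if status != "C":  # "U" means the contig was left unclassified
            continue
        if kmer_support(taxid, lca_field) >= min_support:
            kept[taxid].append(seq_id)
    return dict(kept)
```

Passing an open Kraken2 output file to `high_confidence_taxids` gives a mapping from each retained taxID to its contig IDs, which is the list that goes into the second step.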
So far, the first and third steps work well: I can assemble individual bacterial genomes that look pretty good when I map to a target with minimap, and my QC method seems both reasonable and robust. However, because each Kraken2 taxID matches many GenBank IDs, I don't know of a good way to pull a GenBank ID that represents a full reference genome. Is there a way to reliably get a reference genome from a Kraken2 taxID?