Question

Custom Database in Kraken2: how are plasmids in same fasta file as bacterial genome handled?

3.6 years ago

Raphaela ▴ 10

Hi everyone,

I am creating a custom database to analyse paired-end reads from stool samples. My project only looks at the abundances of bacterial species of the Lactobacillaceae family, so I downloaded all the corresponding .fna files from NCBI.

When inspecting the .fna files, I found that many of them contain several sequences: often one complete genome of the bacterial strain plus one or two plasmids from the same strain. For my analysis I am only interested in the complete genomes. However, if I filter out every file that mentions "plasmid", I lose many .fna files that also contain valuable bacterial genomes.

Does anybody have experience with how Kraken2 handles these plasmid sequences? Are they processed individually, or is the end result one abundance score for the respective bacterial strain?

Thank you so much for your help!

BW, Rapha

Kraken2 plasmid CustomDB • 1.1k views

score 1 · Answer 1 · 2021-04-24

3.6 years ago

Istvan Albert 102k

Kraken2 matches sequence IDs to taxonomic IDs. A mapping table connects the two pieces of information.

A plasmid sequence is treated no differently from any other piece of a genome that is split across multiple chromosomes or segments.

What you get at the end is the number of reads assigned to each taxonomic level. How many chromosomes or plasmids the reference contained does not matter.
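To make the idea concrete, here is a minimal toy sketch in Python. It is not Kraken2's actual algorithm (which classifies reads by k-mers) and does not use any Kraken2 API; the sequence IDs, taxids, and the `count_reads_per_taxon` helper are all made up for illustration. It only shows that when a chromosome and a plasmid share one taxid, their reads end up in a single count.

```python
from collections import Counter

# Hypothetical sequence-ID -> taxid table (IDs and taxids are invented).
# The chromosome and the plasmid of strain A share the same taxid.
SEQID_TO_TAXID = {
    "chr_A": 1001,       # chromosome of strain A
    "plasmid_A1": 1001,  # plasmid of strain A, same taxid
    "chr_B": 1002,       # chromosome of strain B
}


def count_reads_per_taxon(read_hits, seqid_to_taxid):
    """Tally reads per taxid, given the reference sequence each read matched."""
    counts = Counter()
    for _read_id, seq_id in read_hits:
        counts[seqid_to_taxid[seq_id]] += 1
    return counts


# Toy read assignments: (read ID, reference sequence it matched).
READ_HITS = [
    ("r1", "chr_A"),
    ("r2", "plasmid_A1"),
    ("r3", "chr_A"),
    ("r4", "chr_B"),
]

counts = count_reads_per_taxon(READ_HITS, SEQID_TO_TAXID)
print(dict(counts))  # {1001: 3, 1002: 1}
```

The read that hit the plasmid is counted together with the two chromosome reads under taxid 1001, so under this assumption there is no need to drop files just because they contain plasmid records.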