Question

very few sequences classified with Kraken2

2.0 years ago

10mz1 ▴ 10

Hello all,

I am trying to do taxonomic analysis using the 16S region of the bacterial genome from whole-genome shotgun metagenomics samples. I installed Kraken2 and a 16S library called Greengenes. After running Kraken2 with this library on my aligned FASTQ files, provided as a single contigs FASTA file, I get 0.33% of sequences classified, and the rest are unclassified. This represents just 82 sequences out of 24,522 contigs. Does this seem correct? How can I use the resulting table to get species?
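On the last question, a minimal sketch, assuming Kraken2 was run with `--report` so that the standard six-column, tab-separated report exists (percentage, clade reads, direct reads, rank code, taxon ID, indented name). The function name `species_rows` and the toy report rows are made up for illustration, and it assumes species-level rows carry the rank code `S`, which may depend on how the Greengenes taxonomy was built.

```python
def species_rows(report_text):
    """Return (name, clade_reads, percent) for species-level rows of a Kraken2 report.

    Assumes the standard six-column tab-separated report layout and that
    species rows use the rank code "S" (an assumption for a Greengenes build).
    """
    rows = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, clade_reads, _direct, rank, _taxid, name = fields[:6]
        if rank.strip() == "S":
            rows.append((name.strip(), int(clade_reads.strip()), float(percent.strip())))
    # Most abundant species first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy report with invented taxa, only to show the expected layout.
toy_report = "\n".join([
    " 99.67\t24440\t24440\tU\t0\tunclassified",
    "  0.33\t82\t0\tR\t1\troot",
    "  0.20\t50\t10\tG\t10\t    Genus_A",
    "  0.16\t40\t40\tS\t11\t      Species_A1",
    "  0.12\t30\t30\tS\t21\t      Species_B1",
])

for name, reads, percent in species_rows(toy_report):
    print(f"{name}\t{reads}\t{percent}")
```

With only 82 classified sequences, any species-level counts from such a table would be small, so they are probably best treated as a rough indication rather than an abundance estimate.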

metagenomics 16s kraken2 kraken • 1.3k views

updated 2.0 years ago by dthorbur ★ 3.1k • written 2.0 years ago by 10mz1 ▴ 10

That doesn't seem correct, but also doesn't seem like an appropriate use of greengenes. Why are you using a 16S specific database for whole genome samples? And how have you processed the raw data?

EDIT: I misread the post. This is only one genome you've assembled. Most genomes carry only between 1 and 15 copies of the 16S gene, so 0.33% may be correct depending on the size of the genome.

2.0 years ago by dthorbur ★ 3.1k

"bacterial genome from whole genome shotgun metagenomics samples"

That seems to say that this is a metagenomic sample so there could be more than one genome in play. But OP can certainly clarify.

2.0 years ago by GenoMax 154k

Sorry, I should have said shotgun sequencing of DNA from stool samples. That is, in a way, whole-genome bacterial shotgun sequencing, but obviously not from a pure culture, and there could be archaea, yeast, and viruses present. I just don't know why so many of the reads are unmapped.

2.0 years ago by 10mz1 ▴ 10

Have you tried assembling them? I'm confused why you've used the term contig if you haven't.

I can't figure out if this is a metagenomic sampling study, or if I'm confused and this is a genome assembly project.

Either way, unless you've enriched for 16S, a likely reason so few reads map is that you're using a library built from only a minor part of any microbial genome, not a whole-genome library.
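To put a rough number on "a minor part", here is a back-of-the-envelope sketch. The 16S gene length (about 1,500 bp) and the 5 Mb genome size are assumptions chosen for illustration, not values from this dataset; the 1 to 15 copy range and the 82 of 24,522 figure come from the thread. Note that this compares a fraction of bases with a fraction of contigs, so it is only an order-of-magnitude check.

```python
GENE_LENGTH_BP = 1_500       # assumed approximate length of a 16S rRNA gene
GENOME_SIZE_BP = 5_000_000   # assumed genome size, for illustration only


def sixteen_s_fraction(copies, gene_bp=GENE_LENGTH_BP, genome_bp=GENOME_SIZE_BP):
    """Fraction of a genome's bases covered by `copies` copies of the 16S gene."""
    return copies * gene_bp / genome_bp


low = sixteen_s_fraction(1)     # one copy per genome
high = sixteen_s_fraction(15)   # fifteen copies per genome
observed = 82 / 24_522          # classified contigs reported in the question

print(f"expected 16S share of bases: {low:.2%} to {high:.2%}")
print(f"observed classified contigs: {observed:.2%}")
```

Under these assumptions the expected share is well below 1%, which is the same order of magnitude as the 0.33% reported, so a very low classification rate against a 16S-only database would not be surprising.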

A really simple proof of principle would be to BLAST a handful of the unassigned reads and see what comes up. I would expect most would be decent hits to microbial non-16S loci.
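One way to pull out such a handful, as a sketch: it assumes the standard Kraken2 per-sequence output (tab-separated, with `C` or `U` in the first column and the sequence ID in the second) and a FASTA file of the input sequences. The function names and the toy records are hypothetical. Kraken2's `--unclassified-out` option can also write the unclassified sequences directly at classification time.

```python
def unclassified_ids(kraken_lines):
    """Collect IDs of sequences that Kraken2 marked 'U' (unclassified)."""
    ids = []
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            ids.append(fields[1])
    return ids


def parse_fasta(fasta_lines):
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the header text up to the first whitespace.
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def handful_for_blast(fasta_lines, kraken_lines, n=5):
    """Return FASTA text for the first n unclassified sequences."""
    wanted = set(unclassified_ids(kraken_lines)[:n])
    return "".join(
        f">{sid}\n{seq}\n" for sid, seq in parse_fasta(fasta_lines) if sid in wanted
    )


# Toy input with invented IDs and sequences, only to show the shapes involved.
toy_fasta = [">c1 len=8\n", "ACGT\n", "ACGT\n", ">c2\n", "GGCC\n", ">c3\n", "TTAA\n"]
toy_kraken = ["U\tc1\t0\t8\t0:1\n", "C\tc2\t123\t4\t123:1\n", "U\tc3\t0\t4\t0:1\n"]

print(handful_for_blast(toy_fasta, toy_kraken, n=5), end="")
```

For real files, the two line iterables would be open file handles, and the returned text could be pasted into a BLAST web search.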

2.0 years ago by dthorbur ★ 3.1k