Very few sequences classified with Kraken2
Asked by 10mz1, 12 months ago

Hello all,

I am trying to do taxonomic analysis using the 16S rRNA region of bacterial genomes from whole-genome shotgun metagenomic samples. I installed Kraken2 and a 16S library called Greengenes. After running Kraken2 with this library on my aligned FASTQ files, combined into a single FASTA file of contigs, only 0.33% of sequences are classified and the rest are not: just 82 of 24,522 contigs. Does this seem correct? How can I use the resulting table to get species?
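As a rough sketch (not from the original post), both Kraken2 outputs can be checked in Python. The per-sequence output (written with `--output`) gives the classified fraction, and the report (written with `--report`) is the table from which species-level rows can be pulled. This assumes the default formats: per-sequence lines start with `C` or `U`, and the report has six tab-separated columns with the rank code in the fourth, where `S` marks species. The function names are hypothetical, and whether any `S` rows appear at all depends on how deep the database's taxonomy goes.

```python
from typing import Iterable, List, Tuple


def classified_fraction(output_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Count classified sequences in Kraken2 per-sequence output.

    Each non-empty line is tab-separated and starts with 'C' (classified)
    or 'U' (unclassified). Returns (classified, total, percent_classified).
    """
    classified = 0
    total = 0
    for line in output_lines:
        if not line.strip():
            continue  # ignore blank lines
        total += 1
        if line.split("\t", 1)[0] == "C":
            classified += 1
    percent = 100.0 * classified / total if total else 0.0
    return classified, total, percent


def species_rows(report_lines: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (name, clade_count, percent) for species rows of a Kraken2 report.

    Assumes the default six-column report: percent, clade count, direct count,
    rank code, taxonomy ID, indented name. Only rank code 'S' is kept;
    'S1', 'S2', ... (ranks below species) are skipped.
    """
    rows = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # not a default-format report line
        percent, clade_count, _direct, rank, _taxid, name = fields
        if rank.strip() == "S":
            rows.append((name.strip(), int(clade_count), float(percent)))
    return rows
```

Both functions accept any iterable of lines, so an open file object can be passed directly. For the numbers in the question, 82 classified out of 24,522 sequences is about 0.334%, which matches the reported 0.33%.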

Tags: metagenomics, 16s, kraken2, kraken

That doesn't seem correct, but it also doesn't seem like an appropriate use of Greengenes. Why are you using a 16S-specific database for whole-genome samples? And how have you processed the raw data?

EDIT: I misread the post; this is a single genome you've assembled. Most bacterial genomes carry only 1 to 15 copies of the 16S rRNA gene, so 0.33% may be correct depending on the size of the genome.
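To put the copy-number argument into numbers, here is a back-of-the-envelope sketch. It assumes a 16S rRNA gene of roughly 1,500 bp and a hypothetical 5 Mb genome; both are illustrative values, not taken from the post. It also estimates a fraction of bases, whereas Kraken2 reports a fraction of sequences (contigs), so it only indicates the order of magnitude.

```python
def expected_16s_fraction(copies: int, genome_bp: int, gene_bp: int = 1500) -> float:
    """Rough percentage of a genome's bases lying within 16S rRNA genes.

    gene_bp defaults to ~1,500 bp, an approximate 16S length (assumption).
    """
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return 100.0 * copies * gene_bp / genome_bp


# Hypothetical 5 Mb genome with the low, middle and high copy numbers.
for copies in (1, 7, 15):
    pct = expected_16s_fraction(copies, genome_bp=5_000_000)
    print(f"{copies:>2} copies in 5 Mb: {pct:.3f}% of bases")
# prints 0.030%, 0.210% and 0.450% respectively
```

Under these assumptions, a figure of around 0.33% falls inside the expected range for a single genome.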

bacterial genome from whole genome shotgun metagenomics samples

That seems to say this is a metagenomic sample, so more than one genome could be in play. But the OP can certainly clarify.

Sorry, I should have said shotgun sequencing of DNA from stool samples. In a way that is whole-genome bacterial shotgun sequencing, but obviously not from a pure culture, and archaea, yeasts and viruses could also be present. I just don't know why so many of the reads are unmapped.

Have you tried assembling them? If you haven't, I'm confused about why you've used the term "contig".

I can't figure out whether this is a metagenomic sampling study, or whether I'm confused and this is a genome assembly project.

Either way, unless you've enriched for 16S, a likely reason so few reads map is that you're using a library built from only a minor part of any microbial genome rather than a whole-genome library.

A really simple proof of principle would be to BLAST a handful of the unassigned reads and see what comes up. I would expect most of them to be decent hits to microbial non-16S loci.
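One way to pick that handful is sketched below (not from the original thread). It reads the IDs marked `U` from Kraken2's per-sequence output, looks them up in the input FASTA, and writes a small random sample as FASTA text ready to paste into BLAST. It assumes the IDs in column 2 of the Kraken2 output match the first word of each FASTA header; all function names are hypothetical. Kraken2's own `--unclassified-out` option can also write the unclassified sequences to a file directly, in which case only the sampling step is needed.

```python
import random
from typing import Dict, Iterable, List


def unclassified_ids(output_lines: Iterable[str]) -> List[str]:
    """Sequence IDs marked 'U' (unclassified) in Kraken2 per-sequence output."""
    ids = []
    for line in output_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            ids.append(fields[1])
    return ids


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA parser: maps the first word of each header to its sequence."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def sample_for_blast(ids: List[str], sequences: Dict[str, str],
                     n: int = 10, seed: int = 0) -> str:
    """Return FASTA text with up to n randomly chosen unclassified sequences."""
    present = [i for i in ids if i in sequences]  # skip IDs missing from the FASTA
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(present, min(n, len(present)))
    return "".join(f">{i}\n{sequences[i]}\n" for i in chosen)
```

A random sample, rather than the first few IDs, avoids looking only at contigs that happen to sit at the top of the file.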
