12 months ago
10mz1
Hello all,
I am trying to do taxonomic analysis via the 16S region of the bacterial genome from whole-genome shotgun metagenomics samples. I installed Kraken2 and a 16S library called Greengenes. After running Kraken2 with this library and my aligned FASTQ files, provided as a single contigs FASTA file, I get 0.33% of sequences classified, and the rest are unclassified. This represents just 82 of 24,522 contigs. Does this seem correct? How can I use the resulting table to get species?
That doesn't seem correct, but it also doesn't seem like an appropriate use of Greengenes. Why are you using a 16S-specific database for whole-genome samples? And how have you processed the raw data?
EDIT: I misread the post. This is only one genome you've assembled. There are only meant to be between 1 and 15 copies of the 16S gene in most genomes, so 0.33% may be correct depending on the size of the genome.
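To put rough numbers on that, here is a back-of-the-envelope sketch. The 1,500 bp gene length and the 4 Mb genome size are my assumptions for illustration, not values from the post; only the 82 and 24,522 counts come from the OP. Note that the observed figure is a fraction of contigs while the estimate is a fraction of bases, so the two are only loosely comparable.

```python
def expected_16s_fraction(copies, genome_size_bp, gene_len_bp=1500):
    """Fraction of a genome's bases covered by 16S rRNA gene copies.

    gene_len_bp=1500 is an assumed, approximate gene length.
    """
    return copies * gene_len_bp / genome_size_bp


# Counts reported in the original post.
observed = 82 / 24522

# Assumed 4 Mb genome, with the 1 to 15 copy range mentioned above.
low = expected_16s_fraction(1, 4_000_000)
high = expected_16s_fraction(15, 4_000_000)

print(f"observed classified fraction: {observed:.2%}")
print(f"expected 16S share of bases:  {low:.4%} to {high:.4%}")
```

Under those assumptions the observed fraction is at least in a plausible range for a 16S-only database, which is the point of the EDIT above.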
That seems to say that this is a metagenomic sample so there could be more than one genome in play. But OP can certainly clarify.
Sorry, I should have said this is shotgun sequencing of DNA from stool samples. In a way that is whole-genome bacterial shotgun sequencing, but obviously not from a pure culture, and there could be archaea, yeast, and viruses present. I just don't know why so many of the reads are unmapped.
Have you tried assembling them? I'm confused why you've used the term contig if you haven't.
I can't figure out if this is a metagenomic sampling study, or if I'm confused and this is a genome assembly project.
Either way, unless you've enriched for 16S, a likely reason so few reads map is that you're using a library built from only a minor part of any microbial genome, not a whole-genome library.
A really simple proof of principle would be to BLAST a handful of the unassigned reads and see what comes up. I would expect most would be decent hits to microbial non-16S loci.
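If it helps, here is a minimal sketch of pulling a random handful of unclassified sequences to paste into BLAST. It assumes the standard Kraken2 per-sequence output (tab-separated, `C` or `U` in the first column and the sequence ID in the second). The function names and the demo records are made up for illustration; swap in your own files. Kraken2's `--unclassified-out` option is another way to get these sequences directly.

```python
import random


def unclassified_ids(kraken_lines):
    """Return sequence IDs that Kraken2 flagged as unclassified ('U')."""
    ids = []
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            ids.append(fields[1])
    return ids


def pick_fasta_records(fasta_lines, wanted):
    """Return FASTA lines (header plus sequence) for the wanted IDs."""
    wanted = set(wanted)
    keep = False
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            # The ID is the first whitespace-delimited token of the header.
            keep = bool(header) and header[0] in wanted
        if keep:
            out.append(line.rstrip("\n"))
    return out


# Hypothetical demo data standing in for real Kraken2 output and contigs.
demo_kraken = [
    "C\tcontig_1\t1234\t1500\t1234:10",
    "U\tcontig_2\t0\t900\t0:5",
    "U\tcontig_3\t0\t700\t0:3",
]
demo_fasta = [">contig_1 len=1500", "ACGT", ">contig_2", "GGCC", "TTAA", ">contig_3", "AAAA"]

ids = unclassified_ids(demo_kraken)
# Fixed seed so the same handful is picked on every run.
chosen = random.Random(0).sample(ids, k=min(1, len(ids)))
print("\n".join(pick_fasta_records(demo_fasta, chosen)))
```

With real data you would pass open file handles in place of the demo lists and raise `k` to however many sequences you want to check.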