Problem with adapters and taxonomy

3.0 years ago

Giulia.cosenza ▴ 110

Hi, I've performed a taxonomic classification with kraken2 on my FASTQ files and obtained 15% unclassified and 85% classified sequences. But then I noticed that I had forgotten to clean the FASTQ files and remove the adapters.

So I cleaned them with AdapterRemoval and repeated the taxonomic classification, but now I obtain 10% classified and 90% unclassified.

How is this possible?

taxonomy adapters kraken • 1.7k views

Last updated 3.0 years ago by Istvan Albert 102k

Have you tried Centrifuge, ganon or other classifiers?

Reply by shenwei356 8.7k, 3.0 years ago

What are they for?

Reply by Giulia.cosenza ▴ 110, 3.0 years ago

score 1 · Answer 1 · 2022-04-28

3.0 years ago

Istvan Albert 102k

The ways of the Bioinformatics Gods are mysterious. In your case, what most likely happened is that reads that previously could be classified in only one way (from the left side) could, after adapter removal, also be classified from the right, but as a different organism.

It could also be that, after filtering, you end up with short reads that cannot be classified either way.
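
To see why short reads drop out, note that a k-mer classifier can only query windows of k consecutive bases. The sketch below assumes k = 35 (Kraken2's default k-mer length when building a standard database; yours may differ) and simply counts how many k-mers a read of a given length contains. Kraken2 actually looks up minimizers inside each k-mer, so treat this as a simplification.

```python
# Sketch: how many k-mers a read contributes to a k-mer based classifier.
# K = 35 is an assumption (Kraken2's default k-mer length for standard
# databases); check the value your database was built with.
K = 35

def kmer_count(read_length: int, k: int = K) -> int:
    """Number of overlapping k-mers in a read of `read_length` bases."""
    return max(0, read_length - k + 1)

for length in (101, 66, 40, 34, 30):
    print(f"{length} nt -> {kmer_count(length)} k-mers")
```

Under that assumption a 101-nt read offers 67 k-mers, a 40-nt read only 6, and a 30-nt trimmed read none at all, so it cannot be classified whatever organism it came from.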

The tool attempts to reconcile the various classifications into a lowest common ancestor (LCA). Look up some of the read classifications: Kraken2 can generate an output file that lists all hits for each read, so you can see what it shows for the reads that become unclassified after adapter removal.
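
Kraken2's standard per-read output (written with `--output`) has five tab-separated columns. The last one is a space-delimited list of `taxid:count` pairs giving the taxon each run of k-mers mapped to, where `0` means the k-mers were not in the database, `A` means they contain ambiguous bases, and `|:|` separates the two mates in paired mode. A minimal sketch that tallies those hits for one line, assuming that format; the example read and its counts are made up for illustration:

```python
from collections import Counter

def kmer_hits(line: str) -> tuple[str, str, Counter]:
    """Parse one line of Kraken2 standard output into
    (C/U status, read ID, Counter of taxid -> number of k-mers)."""
    status, read_id, _taxid, _length, mapping = line.rstrip("\n").split("\t")
    hits: Counter = Counter()
    for token in mapping.split():
        if token == "|:|":  # boundary between mate 1 and mate 2 (paired mode)
            continue
        taxid, count = token.rsplit(":", 1)
        hits[taxid] += int(count)
    return status, read_id, hits

# Hypothetical read: 67 k-mers, mostly E. coli (562), some Escherichia (561).
example = "C\tread_1\t562\t101\t562:40 561:10 0:17"
status, read_id, hits = kmer_hits(example)
print(status, read_id, hits.most_common())
```

Running this over the reads that switch from `C` to `U` after trimming (matched by read ID) shows whether they lost their k-mer hits or picked up conflicting ones.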

But I have one file with R1 sequences and one with R2 sequences, so the software recognises the correct orientation of the sequences, doesn’t it? If it doesn’t, and the adapters help the software recognise the right orientation, should I consider the kraken2 report obtained from the “dirty” sequences more accurate than the report from the “clean” ones?

What happens if I have short reads that align to different genomes? Does kraken2 put them in the unclassified group?

Why is it that, if all my sequences are 100-102 nt long and the adapters are 34 nt, after removing the adapters I find reads ranging from 30 to 90 nt? Shouldn’t I find only reads of 66-68 nt?

Sorry for all these questions, but I really want to learn and understand!

Thank you!

Reply by Giulia.cosenza ▴ 110, 3.0 years ago

if I have all sequences of 100-102 nt, and adapters of 34 nt, when I remove the adapters I find reads with length in a range of 30-90 nt? Shouldn’t I find only reads of 66-68 nt?

Not every read in Illumina sequencing contains adapter sequence. You will only find adapter sequence, at the 3' end of a read, when the insert is shorter than the sequencing read length, so the amount trimmed varies from read to read with the insert length.
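
A small simulation makes this concrete. It assumes 101-nt reads and uses `AGATCGGAAGAGC`, the 13-nt prefix shared by Illumina TruSeq adapters, as the adapter. Reads from inserts shorter than 101 nt run into the adapter, and trimming leaves exactly the insert. The insert sequence, the `N` padding and the exact-match trimmer are all hypothetical; AdapterRemoval and similar tools use more tolerant alignment.

```python
# Sketch: adapter read-through and 3' trimming with exact matching only.
ADAPTER = "AGATCGGAAGAGC"  # 13-nt prefix shared by Illumina TruSeq adapters
READ_LEN = 101             # assumed read length

def sequence_read(insert: str, read_len: int = READ_LEN) -> str:
    """Simulate a read: once the insert ends, sequencing continues into the
    adapter ('N' stands in for whatever follows the adapter prefix)."""
    return (insert + ADAPTER + "N" * read_len)[:read_len]

def trim_3prime(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Cut at the first full adapter match, else at an adapter prefix of at
    least `min_overlap` bases hanging off the 3' end; otherwise keep the read."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for overlap in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read

insert_source = "ACGT" * 50  # hypothetical genomic sequence
for insert_len in (30, 66, 90, 120):
    read = sequence_read(insert_source[:insert_len])
    print(insert_len, len(read), len(trim_3prime(read)))
```

Every raw read is 101 nt, but the trimmed lengths come out as 30, 66, 90 and 101: the trimmed length follows the insert, and reads from long inserts are left untouched.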

Reply by GenoMax 151k, 3.0 years ago

Run a tool like FastQC and see what it says about adapter content.
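
FastQC's Adapter Content plot is cumulative: once an adapter has been seen in a read, that read counts as containing adapter through to the end. A rough single-adapter approximation, assuming the prefix `AGATCGGAAGAG` and exact matching, run on a tiny made-up FASTQ. It is only meant to show what the plot measures, not to replace FastQC.

```python
import io

ADAPTER = "AGATCGGAAGAG"  # assumed Illumina adapter prefix; exact match only

def fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield seq

def adapter_content(handle, read_len, adapter=ADAPTER):
    """Fraction of reads whose adapter match starts at or before each position."""
    starts = [0] * read_len
    n_reads = 0
    for seq in fastq_sequences(handle):
        n_reads += 1
        pos = seq.find(adapter)
        if 0 <= pos < read_len:
            starts[pos] += 1
    curve, seen = [], 0
    for count in starts:
        seen += count  # once seen, a read keeps counting to the end
        curve.append(seen / n_reads if n_reads else 0.0)
    return curve

# Tiny made-up library: 2 of 4 reads contain the adapter (at positions 5 and 10).
seqs = ["ACGTA" + ADAPTER + "T" * 13, "ACGTACGTAC" + ADAPTER + "T" * 8, "C" * 30, "G" * 30]
fastq = "".join(f"@r{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(seqs))
curve = adapter_content(io.StringIO(fastq), read_len=30)
print(curve[4], curve[5], curve[10], curve[29])
```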

Across the many bacterial species there can be a lot of commonality, so a sequence may match multiple genomes. Kraken2 has an algorithm that attempts to resolve the most likely ancestor. Usually you can still get a classification at a higher level (order, family or even phylum).
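
The Kraken papers describe this resolution step roughly as follows: each taxon hit by the read's k-mers is weighted by its k-mer count, each root-to-taxon path is scored by summing the weights along it, the read goes to the end of the highest-scoring path, and ties are resolved by taking the LCA of the tied taxa. A simplified sketch on a toy taxonomy (IDs follow NCBI, intermediate ranks omitted); real Kraken2 also works on minimizers and can apply a `--confidence` threshold, both of which this ignores.

```python
from collections import Counter

# Toy taxonomy, child -> parent (1 = root). IDs follow NCBI; ranks are skipped.
PARENT = {
    1: 1,       # root
    2: 1,       # Bacteria
    1224: 2,    # Proteobacteria
    543: 1224,  # Enterobacteriaceae
    561: 543,   # Escherichia
    590: 543,   # Salmonella
    562: 561,   # Escherichia coli
}

def lineage(taxid):
    """Taxa from `taxid` up to the root, inclusive."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa."""
    ancestors = set(lineage(a))
    return next(t for t in lineage(b) if t in ancestors)

def classify(hits):
    """Pick the hit taxon whose root-to-taxon path carries the most k-mers;
    fall back to the LCA of all taxa tied for the best score."""
    scores = {t: sum(hits.get(a, 0) for a in lineage(t)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    result = tied[0]
    for t in tied[1:]:
        result = lca(result, t)
    return result

print(classify(Counter({562: 40, 561: 10})))  # E. coli path wins
print(classify(Counter({562: 20, 590: 20})))  # tie -> Enterobacteriaceae
```

Under this scheme, a read whose k-mers split evenly between two genera is pushed up to their common family rather than left unclassified; reads tend to end up unclassified when few or none of their k-mers hit the database.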

Read the publication that describes Kraken2 and/or the manual to understand the process.

Reply by Istvan Albert 102k, 3.0 years ago