Problem with adapters and taxonomy
2.6 years ago

Hi, I've run a taxonomic classification with kraken2 on my FASTQ files and obtained 15% unclassified and 85% classified sequences. Then I noticed that I had forgotten to clean the FASTQ files and remove the adapters.

So I performed the cleaning with Adapter removal and ran the taxonomic classification again, but now I get 10% classified and 90% unclassified.

How is this possible?

taxonomy adapters kraken • 1.4k views

Try centrifuge, ganon or other tools?


What are they for?

2.6 years ago

The ways of the Bioinformatics Gods are mysterious. In your case, what most likely happened is that reads that previously could be classified in only one way (from the left side) could, after adapter removal, be classified from the right as well, but as a different organism.

It could also be that, after filtering, you end up with short reads that cannot be classified either way.

The tool attempts to reconcile the various classifications into a lowest common ancestor. Look up some of the read classifications. With Kraken2, you can generate an output file that lists all hits for each read and see what it shows for reads that become unclassified after adapter removal.
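
A minimal sketch of that comparison, assuming both runs were saved with Kraken2's `--output` option and use the standard tab-separated per-read format (C/U flag, read ID, taxid, length, k-mer hit list). The function names and the example lines are made up for illustration:

```python
def read_status(kraken_output_lines):
    """Map read ID -> (flag, taxid, kmer_hits) from Kraken2 per-read output lines."""
    status = {}
    for line in kraken_output_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip empty or malformed lines
        flag, read_id, taxid, _length, kmer_hits = fields[:5]
        status[read_id] = (flag, taxid, kmer_hits)
    return status


def lost_after_trimming(raw_lines, trimmed_lines):
    """Reads classified (C) before trimming but unclassified (U) or missing after."""
    raw = read_status(raw_lines)
    trimmed = read_status(trimmed_lines)
    lost = []
    for read_id, (flag, taxid, hits_before) in raw.items():
        if flag != "C":
            continue
        after = trimmed.get(read_id)
        if after is None or after[0] == "U":
            hits_after = after[2] if after else None
            lost.append((read_id, taxid, hits_before, hits_after))
    return lost


# Made-up example lines, not real results.
raw_example = [
    "C\tread1\t562\t100|100\t562:30 0:36 |:| 562:20 0:46",
    "C\tread2\t1280\t100|100\t1280:66 |:| 1280:66",
    "U\tread3\t0\t100|100\t0:66 |:| 0:66",
]
trimmed_example = [
    "U\tread1\t0\t40|40\t0:6 |:| 0:6",
    "C\tread2\t1280\t100|100\t1280:66 |:| 1280:66",
    "U\tread3\t0\t100|100\t0:66 |:| 0:66",
]

for row in lost_after_trimming(raw_example, trimmed_example):
    print(row)
```

With real data you would pass the two open output files instead of the example lists, then compare the k-mer hit column before and after trimming for the reads that were lost.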

  • But I have one file with R1 sequences and one with R2 sequences, so the software recognises the correct orientation of the reads, doesn’t it? If that doesn’t matter and the adapters help the software recognise the right orientation of the reads, should I consider the kraken2 report from the “dirty” sequences more accurate than the report from the “clean” ones?
  • What happens if I have short reads that align to different genomes? Does kraken2 put them in the unclassified category?
  • Why is it that, when all my sequences are 100-102 nt and the adapters are 34 nt, I find reads ranging from 30 to 90 nt after removing the adapters? Shouldn’t I find only reads of 66-68 nt?

Sorry for all these questions, but I really want to learn and understand!

Thank you!


if I have all sequences of 100-102 nt, and adapters of 34 nt, when I remove the adapters I find reads with length in a range of 30-90 nt? Shouldn’t I find only reads of 66-68 nt?

Not every read in Illumina sequencing contains (or should contain) adapter sequence. Adapter sequence shows up at the 3' end of a read only when the insert is shorter than the read length.
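
As a rough illustration of the arithmetic: the number of bases removed depends on how far the read runs past the end of the insert, not on the full length of the adapter. The sketch below is idealised and assumes no quality trimming and perfect adapter detection; `trimmed_length` is a hypothetical helper, not part of any trimming tool.

```python
def trimmed_length(read_length, insert_length):
    """Bases left after removing 3' adapter read-through (idealised model)."""
    # If the insert is longer than the read, the read never reaches the adapter.
    return min(read_length, insert_length)


# 100 nt reads with inserts of different sizes
for insert in (30, 66, 90, 100, 250):
    kept = trimmed_length(100, insert)
    print(f"insert {insert:>3} nt -> read kept {kept:>3} nt, removed {100 - kept} nt")
```

So 100 nt reads from inserts of 30 to 90 nt come out at 30 to 90 nt after trimming, while reads from inserts of 100 nt or more are left untouched.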


Run a tool like FastQC and see what it says about adapter content.

Across the many bacterial species there can be a lot of commonalities, so a sequence may match multiple genomes. There is an algorithm in Kraken2 that attempts to resolve the most likely ancestor. Usually you can still get a classification at a higher level (order, family or even phylum).
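
To make the "common ancestor" idea concrete, here is a toy sketch of a plain lowest common ancestor lookup. It is not Kraken2's actual procedure, which also weighs the k-mer hits along the taxonomy tree. The `PARENT` map is a hypothetical, heavily simplified taxonomy with most ranks skipped.

```python
# Hypothetical mini taxonomy (child -> parent), simplified for illustration.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacillaceae",
    "Bacillaceae": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walk up from the first taxon; the first shared node is the deepest one.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    return None


print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"]))
print(lowest_common_ancestor(["Escherichia coli", "Bacillus subtilis"]))
```

A sequence shared by two closely related species is placed at their family here, while one shared by distant species can only be placed at a much higher level.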

Read the publication that describes Kraken2 and/or the manual to understand the process.
