I’d like to remove human reads from human gut metagenomes. Many studies run bowtie2 against the human genome and keep only the unmapped reads. I did the same, but in the following step Kraken2 classified many of the reads that had not mapped to the human genome as human.
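bowtie2 can write unaligned reads directly with `--un-gz` (unpaired) or `--un-conc-gz` (paired). Note that `--un-conc-gz` keeps every pair that did not align concordantly, which can include pairs where one mate did align. If you kept the SAM output instead (and did not run with `--no-unal`), the "keep only unmapped reads" step can be sketched as below. This is a minimal illustration under stated assumptions, not a standard tool: it expects plain-text SAM with the standard flag bits and, unlike `--un-conc-gz`, drops a pair if either mate aligned to human. `is_host_free` and `unmapped_to_fastq` are hypothetical names.

```python
from typing import Iterable, Iterator

PAIRED = 0x1          # read is part of a pair
UNMAPPED = 0x4        # this read did not align
MATE_UNMAPPED = 0x8   # its mate did not align
SECONDARY = 0x100     # secondary alignment record
SUPPLEMENTARY = 0x800 # supplementary alignment record


def is_host_free(flag: int) -> bool:
    """True if the read (and, for pairs, its mate) did not align to the host."""
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False  # only look at primary records so each read is written once
    if not flag & UNMAPPED:
        return False  # the read itself aligned to human
    if flag & PAIRED and not flag & MATE_UNMAPPED:
        return False  # the mate aligned to human: drop the whole pair
    return True


def unmapped_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for reads that bowtie2 left unaligned."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if is_host_free(flag):
            yield f"@{qname}\n{seq}\n+\n{qual}\n"
```

Usage would be something like `sys.stdout.writelines(unmapped_to_fastq(sys.stdin))` in a script fed by `bowtie2 ... | python filter.py` (script name hypothetical). For paired data the two mates come out interleaved, in the order bowtie2 wrote them.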
I then ran both bowtie2 against hg38 and Kraken2 against the standard Kraken2 database on public data. The results are below. Although bowtie2 found only a small number of human reads in these samples, Kraken2 classified far more reads as human.
sample | total_read | bowtie2_hg38 | kraken_human | both | only_bowtie2 | only_kraken |
---|---|---|---|---|---|---|
no1 | 17549939 | 300 | 4034 | 240 | 60 | 3794 |
no2 | 17053678 | 112 | 3067 | 85 | 27 | 2982 |
no3 | 16735960 | 365 | 5121 | 343 | 22 | 4778 |
no4 | 19546779 | 123 | 5109 | 114 | 9 | 4995 |
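The overlap columns can be reproduced from the per-read outputs with plain set operations (in every row above, `both + only_bowtie2 = bowtie2_hg38` and `both + only_kraken = kraken_human`). A minimal sketch, assuming the bowtie2 SAM and the Kraken2 per-read file (`--output`) use identical read IDs and that human reads carry taxid 9606; `bowtie2_hits`, `kraken2_hits` and `compare` are hypothetical helper names. For paired data both mates share one ID, so the counts are per pair.

```python
from typing import Dict, FrozenSet, Iterable, Set

HUMAN_TAXIDS: FrozenSet[str] = frozenset({"9606"})  # assumption: Homo sapiens only


def bowtie2_hits(sam_lines: Iterable[str]) -> Set[str]:
    """Read names with at least one aligned SAM record (flag 0x4 not set)."""
    hits = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header
        qname, flag = line.split("\t", 2)[:2]
        if not int(flag) & 0x4:
            hits.add(qname)
    return hits


def kraken2_hits(kraken_lines: Iterable[str], human_taxids=HUMAN_TAXIDS) -> Set[str]:
    """Read IDs that Kraken2 classified ("C") to a human taxid."""
    hits = set()
    for line in kraken_lines:
        status, read_id, taxon = line.split("\t")[:3]
        if status != "C":
            continue
        # with --use-names the column looks like "Homo sapiens (taxid 9606)"
        if "(taxid " in taxon:
            taxon = taxon.rsplit("(taxid ", 1)[1].rstrip(")")
        if taxon in human_taxids:
            hits.add(read_id)
    return hits


def compare(bt2: Set[str], kr2: Set[str]) -> Dict[str, int]:
    """Counts matching the columns of the table above."""
    return {
        "bowtie2_hg38": len(bt2),
        "kraken_human": len(kr2),
        "both": len(bt2 & kr2),
        "only_bowtie2": len(bt2 - kr2),
        "only_kraken": len(kr2 - bt2),
    }
```

Keeping the `only_kraken` IDs also makes it easy to pull those reads out and inspect them (for example with BLAST) rather than only counting them.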
Do you have any idea why bowtie2 and Kraken2 give such different results? And do you have any other suggestions for removing human sequences from metagenomes?
Thanks.
You can try `removehuman.sh`, which is part of the BBMap suite: https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/37175-introducing-removehuman-human-contaminant-removal
Thank you for the reply. I'll try it.
Welcome to metagenomics :-)
Seriously though, try playing with bowtie2 parameters to make it more permissive.
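One concrete knob is `--score-min`, which sets the minimum alignment score a read needs to count as aligned, as a function of read length. Per the bowtie2 manual, the defaults are `L,-0.6,-0.6` in `--end-to-end` mode and `G,20,8` in `--local` mode, where `L` means a + b·x, `S` means a + b·√x and `G` means a + b·ln(x). The sketch below (hypothetical helper `min_score`; the constant type `C` is deliberately not handled) just evaluates those functions so you can see how much a change loosens the threshold.

```python
import math


def min_score(spec: str, read_len: int) -> float:
    """Evaluate a bowtie2 --score-min spec such as "L,-0.6,-0.6" for one read length.

    Only the L (linear), S (square root) and G (natural log) function types are handled.
    """
    kind, a_str, b_str = spec.split(",")
    a, b, x = float(a_str), float(b_str), float(read_len)
    if kind == "L":
        return a + b * x
    if kind == "S":
        return a + b * math.sqrt(x)
    if kind == "G":
        return a + b * math.log(x)
    raise ValueError(f"function type not handled in this sketch: {kind}")
```

For a 150 bp read the end-to-end default gives -0.6 + (-0.6 × 150) = -90.6, i.e. roughly 15 high-quality mismatches at the default maximum mismatch penalty of 6. A hypothetical looser setting such as `L,-0.6,-1.0` gives -150.6. Switching to `--local` (or `--very-sensitive-local`) instead lets reads that only partly match human be soft-clipped and still count as aligned. These values are illustrations, not recommendations.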
Thank you for your reply :-) I wonder whether the bowtie2 results are too strict or the Kraken2 results are too permissive... If anyone knows of articles comparing host-removal methods, please let me know.
Thanks.
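One way to test the "Kraken2 is too permissive" hypothesis is to check how much evidence each human call rests on. Kraken2's `--confidence` threshold (default 0) is computed as the number of k-mers mapped to the assigned clade divided by the number of k-mers in the read that lack ambiguous bases. The sketch below recomputes a similar fraction for human from the fifth column of the per-read output, so existing results can be re-filtered without re-running Kraken2. Assumptions: human k-mers are labeled with taxid 9606 only (descendant taxa are ignored), and `human_kmer_fraction` is a hypothetical name.

```python
def human_kmer_fraction(kmer_mapping: str, human_taxids: frozenset = frozenset({"9606"})) -> float:
    """Fraction of unambiguous k-mers in one Kraken2 record that map to human.

    kmer_mapping is the 5th column of Kraken2's per-read output, e.g.
    "9606:5 0:20 A:3 |:| 9606:2 0:8" where "0" = not in the database,
    "A" = contains an ambiguous base, and "|:|" separates paired mates.
    """
    human = total = 0
    for token in kmer_mapping.split():
        if token == "|:|":
            continue  # mate separator, not a k-mer count
        taxid, count = token.rsplit(":", 1)
        if taxid == "A":
            continue  # ambiguous k-mers are excluded from the denominator
        n = int(count)
        total += n
        if taxid in human_taxids:
            human += n
    return human / total if total else 0.0
```

Comparing this fraction for the `only_kraken` reads against the `both` reads would show whether the extra human calls rest on only a few k-mers. Where to set a cutoff is a judgment call.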
I moved this to a comment as it is not an answer.
If I recall correctly, bowtie2 performs a true alignment, whereas Kraken2 does not align at all: it classifies reads by exact k-mer matches against its database. This, together with the many options bowtie2 offers, can make a difference.
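A toy example of the difference, using a tiny k for readability. Kraken2 itself uses longer k-mers (35 by default) summarized by minimizers and spaced seeds, so this is a simplification, and `kmers` and `shared_kmer_count` are hypothetical helpers. The point is that a read can share a short run of exact k-mers with the human reference while most of it matches nothing. At Kraken2's default confidence of 0 that can be enough for a human label if no other taxon matches, whereas an end-to-end aligner would reject the read.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(read: str, reference_kmers: Set[str], k: int) -> int:
    """Number of k-mer positions in the read that occur exactly in the reference."""
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in reference_kmers)


# toy "human" reference and a read whose first half matches it and second half does not
reference = "AAAACCCCGGGGTTTT"
read = "AAAACCCCATATATAT"
hits = shared_kmer_count(read, kmers(reference, 5), 5)
positions = len(read) - 5 + 1
```

Here `hits` is 4 out of 12 k-mer positions: only the first few k-mers match, yet a k-mer classifier sees them as human evidence.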