Hello all, does anyone know of a program that can filter out all the non-bacterial genomes from metagenomic data?
Thanks!
I only know how to do this after the assembly, which may not be what you are asking. I am assuming that you mean non-prokaryotic rather than non-bacterial, but the answer is probably the same.
The first step is to bin the contigs by their 4-mer/5-mer (tetra-/pentanucleotide) frequencies. Even related bacterial species can be separated this way, and it is almost guaranteed that any eukaryotic sequence will be well separated from the rest. The same is true for archaeal bins, in case you really meant non-bacterial genomes. Bins can be classified using GTDB-Tk (the GTDB Toolkit), where eukaryotes will usually be classified as the Asgard/Loki group.
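To make the "4n/5n frequencies" idea concrete, here is a minimal Python sketch of the kind of composition profile that binning is based on. It is only an illustration, not what any particular binning tool does: the function name kmer_frequencies is made up for this example, and real binners typically also merge reverse complements and use coverage information.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every A/C/G/T k-mer in one contig.

    Hypothetical helper for illustration; windows containing N or other
    ambiguous bases are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only unambiguous k-mers contribute to the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}
```

Each contig then becomes a vector of 256 values (for k=4) or 1024 values (for k=5), and contigs with similar vectors end up in the same bin.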
Hi, reecemccu, you can perform the separation at the read level using Kraken2. Download a prebuilt Kraken2 and Bracken database (I suggest the standard database, not the mini one) here: https://benlangmead.github.io/aws-indexes/k2 (dec/2020).
Then run Kraken2 with:
kraken2 --db {kraken2_database_path} --unclassified-out {uncseq} --classified-out {cseq} --use-names --threads {threads} --output {output.txt} --report {output.kreport} {input.fq}
Kraken2 will then assign your reads to different taxa; you can select the ones you want later from {cseq} using the per-read assignments in {output.txt}.
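As a rough sketch of that selection step, the Python below reads the per-read Kraken2 output and keeps FASTQ records assigned to a chosen set of taxids. It assumes the standard tab-separated output (C/U flag, read ID, taxon, length, k-mer mapping) and that, with --use-names, the taxon column looks like "Escherichia coli (taxid 562)". The function names are made up for this example. Note that it only matches the exact taxids you pass in; collecting every taxid under Bacteria needs the taxonomy or the report file, which is not handled here. Paired-end read IDs with /1 or /2 suffixes may also need extra care.

```python
import re

# With --use-names the taxon column ends in "(taxid N)"; otherwise it is just N.
_TAXID_RE = re.compile(r"\(taxid (\d+)\)\s*$")


def parse_taxid(field):
    """Extract the numeric taxid from the third column of Kraken2 output."""
    field = field.strip()
    match = _TAXID_RE.search(field)
    return int(match.group(1)) if match else int(field)


def read_ids_for_taxids(kraken_lines, wanted_taxids):
    """Return IDs of classified reads assigned exactly to one of wanted_taxids."""
    ids = set()
    for line in kraken_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3 or cols[0] != "C":
            continue  # skip unclassified or malformed lines
        if parse_taxid(cols[2]) in wanted_taxids:
            ids.add(cols[1])
    return ids


def filter_fastq(fastq_lines, keep_ids):
    """Yield 4-line FASTQ records whose read ID is in keep_ids."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            read_id = record[0][1:].split()[0]  # drop "@" and any description
            if read_id in keep_ids:
                yield tuple(record)
            record = []
```

In practice you would open {output.txt} and {cseq} as files and pass the file handles to these functions instead of lists.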
What about mapping the reads with Kraken2 first and sieving out everything that matches to bacteria?
Generally speaking, I am not in favor of removing reads when the underlying database used by Kraken2 is known not to be current (the only one I can find is about a year old). Even if the database is current, there is always a possibility that a sample contains a truly novel bacterium that is not in the database, and those sequences would be thrown out.
In my experience, there is no problem in assembling a mix of prokaryotic and eukaryotic reads and separating them later, after binning. For that matter, separating archaeal and bacterial bins is usually not a problem either. Just so I am not hand-waving, see if you can spot a group labeled 67 at around 8 o'clock in the image below. That is the only eukaryote in a mix of prokaryotes in this metagenome, and I hope it is obvious how cleanly it is separated from the others. Most archaeal and bacterial bins also separate cleanly from each other.