Question

MetaPhlAn 3.0

3.5 years ago

Ap1438 ▴ 50

When I run the default command from the MetaPhlAn 3 manual, I get a very high unknown estimation, i.e. 80%:
metaphlan SK_1-forward_paired.fq.gz,SK_1-reverse_paired.fq.gz,SK_1-forward_unpaired.fq.gz,SK_1-reverse_unpaired.fq.gz --bowtie2out sample1.bowtie2.bz2 --nproc 5 --bt2_ps very-sensitive-local --add_viruses --unknown_estimation --input_type fastq -o profiled_sample1.txt

Can anyone suggest how I can reduce the unknown estimation? And what is considered a normal unknown estimation for soil samples?

MetaPhlAn • 1.4k views

updated 3.3 years ago by boaty ▴ 220 • written 3.5 years ago by Ap1438 ▴ 50

score 1 · Answer 1 · 2022-03-01

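MetaPhlAn 3 uses the ChocoPhlAn database, which is UniRef-based (~17,000 reference genomes; that is a lot, but not enough). Reads from organisms with no close reference in the database end up in the unknown fraction. I think the database is fine for gut microbiome research, but it is not sufficient for soil samples.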

A better approach is to run de novo assembly (fastq -> contigs -> bins -> MAGs) and then annotate the resulting genomes with the GTDB toolkit (GTDB-Tk), Prokka, or eggNOG.
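
As a rough sketch of the fastq -> contigs -> bins -> MAGs route, something like the script below could work. The tool choices (MEGAHIT, Bowtie2, samtools, MetaBAT2, GTDB-Tk) and all output names are my assumptions, not part of the original suggestion, and I have not run this on your data. By default it only prints the commands (DRY_RUN=1); set DRY_RUN=0 to actually execute them. Recent GTDB-Tk releases may also require an extra option for the ANI screening step, so check the help of your installed version.

```shell
#!/usr/bin/env bash
# Sketch only: tools and output names below are assumptions, adapt as needed.
set -euo pipefail

R1=SK_1-forward_paired.fq.gz
R2=SK_1-reverse_paired.fq.gz
THREADS=5

# Print each command; execute it only when DRY_RUN is set to something other than 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

assembly_plan() {
    # 1. Assemble the paired reads into contigs (writes asm/final.contigs.fa).
    run megahit -1 "$R1" -2 "$R2" -t "$THREADS" -o asm
    # 2. Map the reads back to the contigs to get per-contig coverage.
    run bowtie2-build asm/final.contigs.fa asm/contigs
    run bowtie2 -p "$THREADS" -x asm/contigs -1 "$R1" -2 "$R2" -S map.sam
    run samtools sort -@ "$THREADS" -o map.bam map.sam
    run jgi_summarize_bam_contig_depths --outputDepth depth.txt map.bam
    # 3. Bin the contigs using sequence composition and coverage.
    run metabat2 -i asm/final.contigs.fa -a depth.txt -o bins/bin -t "$THREADS"
    # 4. Assign GTDB taxonomy to the bins.
    run gtdbtk classify_wf --genome_dir bins --extension fa --out_dir gtdbtk_out --cpus "$THREADS"
}

assembly_plan
```

It would be worth checking bin completeness and contamination before calling the bins MAGs; that step is left out here to keep the sketch short.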

There are some Snakemake pipelines, such as Sunbeam, Metagenome-Atlas, and metaGEM, that run all of these steps together. Another option is to run Kraken2 with a much larger reference database.
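
For the Kraken2 option, a minimal sketch could look like the one below. The function names, the output file names, and the database path are placeholders I made up; only the kraken2 flags and the report layout (percentage in column 1, rank code in column 4, with `U` marking the unclassified line) come from Kraken2 itself. The second function pulls the unclassified percentage out of the report so you can compare it with the unknown fraction MetaPhlAn gave you.

```shell
#!/usr/bin/env bash
# Sketch only: function names, file names, and the database path are placeholders.

# Classify one pair of gzipped FASTQ files against a Kraken2 database.
# Usage: classify_reads <db_dir> <R1.fq.gz> <R2.fq.gz> <out_prefix>
classify_reads() {
    kraken2 --db "$1" --threads 5 --paired --gzip-compressed \
        --report "$4.kreport" --output "$4.kraken" "$2" "$3"
}

# Print the percentage of reads Kraken2 left unclassified.
# In the tab-separated report, column 4 is the rank code ("U" = unclassified)
# and column 1 is the percentage of reads, padded with spaces.
unclassified_pct() {
    awk -F'\t' '$4 == "U" { gsub(/ /, "", $1); print $1 }' "$1"
}
```

You would call it along the lines of `classify_reads /path/to/kraken_db SK_1-forward_paired.fq.gz SK_1-reverse_paired.fq.gz sample1` followed by `unclassified_pct sample1.kreport`.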