Microbial PacBio metagenomes and MAG pipeline (assembly/binning) METABAT
l.gallucci, 3 months ago

Hi all!

I'm working with PacBio sequencing data using a script developed in my group. I would like to know whether there are general recommendations for assembly, binning, etc., or gold-standard programs you know of (even a tutorial to follow as a reference).

I'm currently binning with MetaBAT2, but I'm stuck at the step where I run jgi_summarize_bam_contig_depths.

The command I'm running is:

    jgi_summarize_bam_contig_depths --outputDepth ./binning/metabat/depth.txt "${TARGET_MAPPED_DIR}/outsort_run${num}_${let}.bam"

The sorted BAM files were produced by mapping the raw reads to the assembly with minimap2 and then sorting them with samtools.
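The constraint behind this step is that every BAM passed to jgi_summarize_bam_contig_depths must come from mapping against one and the same assembly FASTA. Passing all BAMs in a single call then yields one depth table with one coverage column group per sample. Below is a minimal sketch, assuming a single shared assembly (for example a co-assembly of all samples). The helper `build_commands` and all file names are hypothetical. The flags are standard minimap2 (`-ax map-hifi`, a preset available in recent minimap2 releases, and `-t`), samtools sort (`-@`, `-o`, `-` for stdin), and MetaBAT2 (`--outputDepth`) options. The function only builds the command strings, so you can inspect them before running anything.

```python
import shlex


def build_commands(assembly, reads_by_sample, threads=16, depth_out="depth.txt"):
    """Build shell commands that map EVERY sample to the SAME assembly.

    assembly: path to the one assembly FASTA shared by all samples (hypothetical name)
    reads_by_sample: dict of sample name -> HiFi reads file (hypothetical names)
    """
    commands = []
    bams = []
    for sample, reads in sorted(reads_by_sample.items()):
        bam = f"{sample}.sorted.bam"
        bams.append(bam)
        # Map with the HiFi preset and pipe the SAM stream straight into a coordinate sort.
        # Note: samtools sort -@ sets the number of additional threads.
        commands.append(
            f"minimap2 -ax map-hifi -t {threads} {shlex.quote(assembly)} {shlex.quote(reads)}"
            f" | samtools sort -@ {threads} -o {shlex.quote(bam)} -"
        )
    # A single depth table covering all samples; this works only because every BAM
    # above was produced against the same assembly, so their @SQ headers are identical.
    commands.append(
        "jgi_summarize_bam_contig_depths --outputDepth "
        + shlex.quote(depth_out)
        + " "
        + " ".join(shlex.quote(b) for b in bams)
    )
    return commands


if __name__ == "__main__":
    reads = {"run542_A": "run542_A.hifi.fastq.gz", "run542_C": "run542_C.hifi.fastq.gz"}
    for cmd in build_commands("coassembly.fasta", reads):
        print(cmd)
```

The same assembly path appears in every minimap2 command. That is exactly what the error message asks for.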

The problem is that when I run this, the summary command outputs:

Running with 40 threads to save memory you can reduce the number of threads with the OMP_NUM_THREADS variable
Output matrix to ./binning/metabat/depth_run{num}{let}.txt
Opening all bam files and validating headers
ERROR: validateHeader - Header count mismatch (25640 vs 14251) between bam files outsort_run542_A.bam and outsort_run542_C.bam
ERROR: ./outsort_run542_C.bam has a different reference than ./outsort_run542_A.bam
ERROR: validateHeader - Header count mismatch (25640 vs 17945) between bam files outsort_run542_A.bam and outsort_run542_D.bam
ERROR: ./outsort_run542_D.bam has a different reference than ./outsort_run542_A.bam
ERROR: validateHeader - Header count mismatch (25640 vs 34632) between bam files outsort_run542_A.bam and outsort_run549_A.bam
ERROR: ./outsort_run549_A.bam has a different reference than ./outsort_run542_A.bam
ERROR: validateHeader - Header count mismatch (25640 vs 17505) between bam files outsort_run542_A.bam and outsort_run553_B.bam
ERROR: ./outsort_run553_B.bam has a different reference than ./outsort_run542_A.bam
ERROR: It appears that your bam files contain different references.
    validHeaders (including the first bamfile) == 1 while there were 5 bams to summarize.
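The "Header count mismatch" numbers (25640 vs 14251, etc.) most likely correspond to the number of reference sequences (@SQ lines) in each BAM header, i.e. the contig counts of the assemblies the reads were mapped to. The following minimal sketch checks this before running MetaBAT2 by comparing the ordered contig names and lengths across BAMs. It is a diagnostic written for this answer, not MetaBAT2's own validation code. The header text can be obtained with `samtools view -H file.bam`, and the helper names are hypothetical.

```python
import subprocess


def read_header(bam_path):
    """Return the SAM header text of a BAM file (requires samtools on PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def reference_signature(header_text):
    """Return the ordered (contig name, length) pairs from the @SQ lines of a SAM header."""
    refs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ lines are tab-separated TAG:VALUE fields, e.g. SN:contig_1  LN:5000
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs.append((fields.get("SN"), int(fields.get("LN", 0))))
    return tuple(refs)


def mismatched_bams(headers):
    """headers: dict of BAM name -> header text.

    Returns the BAMs whose reference list differs from the first BAM's.
    """
    names = list(headers)
    first = reference_signature(headers[names[0]])
    return [n for n in names[1:] if reference_signature(headers[n]) != first]


if __name__ == "__main__":
    header_a = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:ctg1\tLN:5000\n@SQ\tSN:ctg2\tLN:3000\n"
    header_c = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:ctg7\tLN:4200\n"
    print(mismatched_bams({"A.bam": header_a, "C.bam": header_c}))  # prints ['C.bam']
```

With real data you would build the dictionary as `{bam: read_header(bam) for bam in bam_list}`. Any BAM this reports was mapped to a different assembly and cannot share a depth table with the first one.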


    Please ensure that all reads are aligned to the exact same assembly

I'm using MetaBAT2 version 2.17. I would greatly appreciate suggestions on this step and, more generally, on how to approach PacBio data.

Tags: binning, mag, pacbio, metabat

dportik, 3 months ago

The current best practices for metagenome assembly and binning for PacBio HiFi data can be found in this pre-print: https://doi.org/10.1101/2024.05.10.593587

There are two metagenome assembly methods designed for HiFi reads:

  • hifiasm-meta
  • metaMDBG

These work substantially better than metaFlye or HiCanu, which are not recommended for PacBio metagenome assembly.

There is a complete, open-source binning workflow for PacBio metagenomics available here (see HiFi-MAG-Pipeline): https://github.com/PacificBiosciences/pb-metagenomics-tools

The HiFi-MAG-Pipeline is described in greater detail in the pre-print above. It is designed for long-read assemblies and outperforms other tools (most of which were designed for short reads). The inputs are the contigs from a metagenome assembly, and the corresponding fasta file of HiFi reads used to generate those contigs. You can include multiple samples in a run, but they will each be treated independently in the pipeline.

Mensur Dlakic, 3 months ago

It is all there in the output: "Please ensure that all reads are aligned to the exact same assembly."

Your BAM files come from mapping to different reference assemblies, which is not how this works. You can have different sets of reads mapping to the SAME reference assembly, but not the same reads mapping to different assemblies, or different reads mapping to different assemblies.
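If each sample keeps its own assembly, one way to respect this rule is to produce one depth table per assembly. Each table would hold only BAMs mapped against that assembly, which can include reads from other samples for differential coverage. The following is a minimal sketch under that assumption. The helper `depth_commands` and all file names are hypothetical, and the function only builds the command strings.

```python
from collections import defaultdict
from pathlib import Path


def depth_commands(bam_to_assembly):
    """Group sorted BAMs by the assembly they were mapped to and build one
    jgi_summarize_bam_contig_depths command per assembly.

    bam_to_assembly: dict of sorted BAM path -> assembly FASTA it was mapped against.
    """
    groups = defaultdict(list)
    for bam, assembly in bam_to_assembly.items():
        groups[assembly].append(bam)
    commands = {}
    for assembly, bams in groups.items():
        # One depth table per assembly; BAMs from different assemblies are never mixed.
        out = f"depth_{Path(assembly).stem}.txt"
        commands[assembly] = (
            f"jgi_summarize_bam_contig_depths --outputDepth {out} " + " ".join(sorted(bams))
        )
    return commands


if __name__ == "__main__":
    mapping = {
        # Reads from samples A and C both mapped to assembly A: allowed (same reference).
        "readsA_vs_asmA.bam": "asm_run542_A.fasta",
        "readsC_vs_asmA.bam": "asm_run542_A.fasta",
        # Reads from sample C mapped to assembly C: summarized separately.
        "readsC_vs_asmC.bam": "asm_run542_C.fasta",
    }
    for asm, cmd in depth_commands(mapping).items():
        print(asm, "->", cmd)
```

Each resulting depth file is then used only when binning the assembly it was built from.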

By the way, I assume that you are stuck, because stacked means something altogether different.

Sorry, typo :)

Yes, that is indeed what I did. I'm definitely missing some basic knowledge, which is also why I was asking for references. I'm currently trying to apply this pipeline (assembly with minimap2 and different binning tools) to 5 PacBio samples. How does this work when I have multiple samples? Sorry for the basic question.

I don't think anyone can help you productively from afar without a major time investment. This is in part because you didn't provide enough details, and also because it may not be a simple solution. It boils down to this: you may have 5 samples as in your case, but all of them have to be mapped to the same assembly. It won't work otherwise. If your pipeline doesn't allow for this scenario, you will likely have to adjust the pipeline or do some steps manually.
