Hi all!
I'm working with PacBio sequencing data using a script developed in my group, but I would like to know whether there are general recommendations for assembly, binning, etc., or gold-standard programs (or even a tutorial to follow as a reference) that you know of.
I'm currently binning with MetaBAT2, but I'm stuck at the step where I run jgi_summarize_bam_contig_depths.
The actual command looks like:
jgi_summarize_bam_contig_depths --outputDepth ./binning/metabat/depth.txt "${TARGET_MAPPED_DIR}/outsort_run${num}_${let}.bam"
where the sorted BAM files were produced by mapping the raw reads to the assembly with minimap2 and sorting with samtools.
The problem is: when I'm doing this, the output of the summary command is:
Running with 40 threads to save memory you can reduce the number of threads with the OMP_NUM_THREADS variable
Output matrix to ./binning/metabat/depth_run{num}{let}.txt
Opening all bam files and validating headers
ERROR: validateHeader - Header count mismatch (25640 vs 14251) between bam files outsort_run542_A.bam and outsort_run542_C.bam
ERROR: ./outsort_run542_C.bam has a different reference than ./outsort_run542_A.bam
ERROR: validateHeader - Header count mismatch (25640 vs 17945) between bam files outsort_run542_A.bam and outsort_run542_D.bam
ERROR: ./outsort_run542_D.bam has a different reference than ./outsort_run542_A.bam
ERROR: validateHeader - Header count mismatch (25640 vs 34632) between bam files outsort_run542_A.bam and outsort_run549_A.bam
ERROR: ./outsort_run549_A.bam has a different reference than ./outsort_run542_A.bam
ERROR: validateHeader - Header count mismatch (25640 vs 17505) between bam files outsort_run542_A.bam and outsort_run553_B.bam
ERROR: ./outsort_run553_B.bam has a different reference than ./outsort_run542_A.bam
ERROR: It appears that your bam files contain different references.
validHeaders (including the first bamfile) == 1 while there were 5 bams to summarize.
Please ensure that all reads are aligned to the exact same assembly
I'm using MetaBAT2 2.17. I would greatly appreciate suggestions on this step and, more generally, on how to approach PacBio data.
Sorry, typo :)
Yes, this is actually what I have done. I'm definitely missing some basic knowledge, which is also why I was asking for a reference. I'm trying to apply this pipeline (assembly with minimap2 and different binning tools) to 5 PacBio samples. How does this work when I have several samples? Sorry for the stupid question.
I don't think anyone can help you productively from afar without a major time investment. This is partly because you didn't provide enough details, and partly because there may not be a simple solution. It boils down to this: you may have 5 samples, as in your case, but all of them have to be mapped to the same assembly. The header counts in your error (25640 vs 14251, 17945, and so on) show that each BAM file lists a different number of reference sequences, which suggests each sample was mapped to a different assembly. It won't work otherwise. If your pipeline doesn't allow for this scenario, you will likely have to adjust the pipeline or do some steps manually.
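To make "mapped to the same assembly" concrete, here is a minimal sketch of what that could look like. It only prints the commands so they can be checked before anything is run. The file layout (`reads/<sample>.fastq.gz`, `mapped/`), the thread counts, the function name `mapping_plan`, and the `map-hifi` preset are assumptions and not something taken from the thread; for CLR reads the minimap2 preset would be `map-pb` instead.

```shell
#!/usr/bin/env bash
# Sketch only: print the commands needed to map every sample to ONE shared
# assembly and then summarize depth across all BAMs. Nothing is executed.
# Assumptions (hypothetical, not from the thread):
#   - reads are in reads/<sample>.fastq.gz
#   - reads are HiFi (swap map-hifi for map-pb if they are CLR)
#   - the shared assembly is a single FASTA file

mapping_plan() {
    local assembly="$1"; shift   # the one reference used for every sample
    local bams=()
    local sample bam

    echo "mkdir -p mapped"
    for sample in "$@"; do
        bam="mapped/${sample}.sorted.bam"
        # Same reference for each sample, so every BAM gets identical @SQ headers
        echo "minimap2 -ax map-hifi -t 8 ${assembly} reads/${sample}.fastq.gz | samtools sort -@ 4 -o ${bam} -"
        bams+=("${bam}")
    done

    # One depth table computed from all BAMs together
    echo "jgi_summarize_bam_contig_depths --outputDepth depth.txt ${bams[*]}"
}
```

For the five samples in the error message this could be called as `mapping_plan coassembly.fasta run542_A run542_C run542_D run549_A run553_B`, where `coassembly.fasta` is a placeholder name for whichever single assembly is chosen. After reviewing the printed commands, they can be piped to `bash`. A quick sanity check afterwards is that `samtools view -H <file>.bam | grep -c '^@SQ'` reports the same number for every BAM.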