Hi,
I am currently trying to use metabat2 to bin shotgun reads to obtain MAGs. I have scaffolds.fasta files from assembly (metaSPAdes), and I made .bam files with bowtie2 and samtools from the same fastq files, as it is my understanding that you need both the sorted bam file and the contig file to run a binning package like metabat2.
However, when I run it on one sample (the scaffolds.fasta file and the sorted bam file) I get this error:
Error: referenceFile: scaffolds.fasta is not the same as in the bam headers! (targets: 10575 from the bam vs 67331 from the ref)
It is clear that they don't match and that the scaffolds.fasta has way more reads than the bam file, right? I tried using the contigs.fasta as well, to no avail.
Are there any hints as to why this could be? Is there a better way to get a contigs fasta file and a sorted bam file that match? Thanks! Here is the original code I ran:
runMetaBat.sh Torgos-tracheliotus_S_S_Temp_D703-AK1682/scaffolds.fasta Torgos-tracheliotus_S_S_Temp_D703-AK1682_sorted.bam
Hi Mensur Dlakic, thank you for the very detailed reply. I tried a test run on the first file and got 7 bins, so it seems to work. Does this mean 7 "MAGs" were binned for that sample, but I won't necessarily know their coverage (i.e. how many hits per sample)?
And regarding coverage, how would you redo the steps from where I started to ensure the sorted bam and scaffolds files have matching headers/number of reads?
You will never get the information about coverage from the binning alone. You only get sequences that belong to a given bin. However, if you get a properly sorted .bam file from the assembly, there is a little utility called jgi_summarize_bam_contig_depths that will calculate the coverage for each contig. It comes with the metabat2 distribution, and its use is described here.
This is in general how the mapping is done, and it must start from the same assembly file (scaffolds.fasta) that will later be used for binning. I am putting arbitrary numbers for total threads (20) and mapping with both paired and single reads, which may not be realistic. You will have to adjust the commands to your setup.
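A minimal sketch of the mapping described above, assuming bowtie2 and samtools are on the PATH. The read file names and the "sample" prefix are placeholders and do not come from the thread. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: map reads back to the SAME assembly that will later be binned.
# All file names below are placeholders; adjust them to your setup.
set -euo pipefail

THREADS=20                  # arbitrary thread count, as in the text above
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

map_to_assembly() {
    local assembly="$1" r1="$2" r2="$3" single="$4" prefix="$5"

    # 1. Build a bowtie2 index from the assembly.
    run bowtie2-build --threads "$THREADS" "$assembly" "${prefix}_index"

    # 2. Map paired (-1/-2) and single (-U) reads against that index.
    run bowtie2 -p "$THREADS" -x "${prefix}_index" \
        -1 "$r1" -2 "$r2" -U "$single" -S "${prefix}.sam"

    # 3. Coordinate-sort the alignments and index the sorted BAM.
    run samtools sort -@ "$THREADS" -o "${prefix}_sorted.bam" "${prefix}.sam"
    run samtools index "${prefix}_sorted.bam"
}

map_to_assembly scaffolds.fasta reads_R1.fastq.gz reads_R2.fastq.gz \
    reads_single.fastq.gz sample
```

Because the index is built from scaffolds.fasta itself, the targets in the BAM header should be the sequences in that file. Drop the -U argument if you have no single reads.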
Then presumably this command should work:
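A sketch of what that step can look like, assuming a sorted BAM made by mapping reads to scaffolds.fasta. The names sample_sorted.bam and depth.txt are placeholders. The target-count check is an added sanity test, not part of metabat2; equal counts are necessary but do not prove the sequence names match.

```shell
#!/usr/bin/env bash
# Sketch: confirm the BAM was made from this assembly, then get depths and bins.
set -euo pipefail

# Count sequences in a FASTA file (one '>' header line per sequence).
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Count reference targets (@SQ header lines) in a BAM file.
count_bam_targets() {
    samtools view -H "$1" | grep -c '^@SQ' || true
}

# Refuse to bin when the BAM and the assembly disagree on the target count.
depth_and_bin() {
    local assembly="$1" bam="$2"
    local n_ref n_bam
    n_ref="$(count_fasta_records "$assembly")"
    n_bam="$(count_bam_targets "$bam")"
    if [ "$n_ref" -ne "$n_bam" ]; then
        echo "Mismatch: $n_bam targets in BAM vs $n_ref in $assembly" >&2
        return 1
    fi
    # Per-contig coverage table; depth.txt is a placeholder name.
    jgi_summarize_bam_contig_depths --outputDepth depth.txt "$bam"
    # Binning with the same pair of files.
    runMetaBat.sh "$assembly" "$bam"
}

# Usage (placeholder file names):
#   depth_and_bin scaffolds.fasta sample_sorted.bam
```

A mismatch such as 10575 targets in the BAM against 67331 in the FASTA would stop here, before metabat2 is started.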
Hi Mensur,
I tried your method and it seems to have worked. Thank you for the help. I got 11 bins.
Sorry @mensur.. one last question: If I run checkM2 on the 11 bin files and get back a "0 bins found" output... does that just mean there aren't enough bins to get any genomes with? I am reading that a lot of people will combine all the bins of all samples first and then start looking for MAGs... but I am unsure if this is an accurate assessment.
I don't have much experience with checkM2, but it sounds like you may be giving a wrong directory location, or specifying a wrong file extension for the bins. Impossible to tell from the information you provided.
No, bins are not meant to be combined. If things have been done properly, bins in most cases are equivalent to MAGs.
For your future inquiries: at a minimum one needs the whole command and the whole error message. Without them, it becomes a guessing game.
Yeah, that's what I was thinking, but I triple-checked and it seems to be right. And yes, of course, here is the output for reference:
running ls to show bins in .fa format in the scaffolds.fasta.metabat-bins-20250120_181114 directory
and then running checkm on that directory
I don't mean to sound harsh, but this is pretty basic stuff that is easily resolved if you read through the CheckM2 manual and follow the direction that is already given to you:
If you run checkm2 predict -h to get help, it will tell you that the default file extension the program looks for is .fna. All your bins end in .fa, so the program "sees" nothing in that directory. If you add -x fa to your existing command, it will look for files that end in .fa.
By the way, if you also add --threads N, where N is the number of threads on your computer, the program will run faster. It helps to read the manual.
Mensur Dlakic, apologies for subjecting you to such a trivial troubleshooting error. Hitting myself on the head because I was thinking fa and fna were the same thing. The option worked, so I very much appreciate the help. Thank you!
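For reference, a sketch of the corrected CheckM2 call with the -x and --threads options discussed above. It uses the bins directory named earlier in the thread; the output directory checkm2_out and the thread count of 8 are arbitrary placeholders. The script prints the command by default; set DRY_RUN=0 to run it.

```shell
#!/usr/bin/env bash
# Sketch: run CheckM2 on bins that end in .fa instead of the default .fna.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"     # 1 = print the command only, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_checkm2() {
    local bins_dir="$1" out_dir="$2" ext="$3" threads="$4"
    # -x sets the bin file extension; --threads sets the thread count.
    run checkm2 predict --input "$bins_dir" --output-directory "$out_dir" \
        -x "$ext" --threads "$threads"
}

# checkm2_out and 8 threads are placeholders.
run_checkm2 scaffolds.fasta.metabat-bins-20250120_181114 checkm2_out fa 8
```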