Hello,
I've been attempting to use:
samtools merge -r -b list_of_bams.txt output_bam.bam
I have two samples, with each sample having Illumina Hiseq paired end .bam and PacBio paired end reads .bam and their respective readgroups @RG. The Illumina reads were aligned with bwa mem and the PB reads with minimap2. My attempts at concatenating these two samples and their respective reads seemed successful -- I got a rather large .bam file but the program is not recognizing it. And when I load it into IGV it appears empty... This leads me to believe that there is something I am doing wrong with the samtools merge or just the idea of merging Illumina and PacBio reads into the same tracked bam.
I am pretty sure the .bam format supports this type of merge, but was having difficulty finding specific documentation online. Does anyone have any thoughts on this matter?
Thank you
Can you simply not load these as two tracks? One for illumina other for PB? If you just want to view them in IGV.
Seconding this. Would be good to know if the files work individually or if there's already a problem without them being merged.
Was the data aligned to the same exact reference genome?
Yes I should have specified. Same reference for all reads
Which program are you referring to? Do you get an error message?
Did you zoom into a specific gene? How big is the combined BAM file? Did you generate an index?
Yes I zoomed into a specific gene but also just browsed in general around different scaffolds that I have. There are no reads to be seen I am not sure where the data is in this bam file...
The file size is some how 40gb. Yes I have an index.
Here is my samtools flagstats output:
I tried to format the samtools flagstat output, but I may have mangled something. Even so, it is clear your file is not empty.
And the output of
samtools view output_bam.bam | head
? (However, it may be too big to paste here if there are PacBio reads among the first mapped reads.)samtools view -H out_bam.bam
would also be interesting to seeI am not sure what information you would gather from the header. I can paste a portion of it after the scaffolds... What would you be looking for in the header? It just says which scaffolds, @SQ , readgroups @RG and @PG. I can't paste the header because it is too large, exceeding over 7000 characters.
Well, the diagnostics are a bit difficult as you can tell by all the comments. I can't tell you what precisely I'd be looking for in the header, but for some clue that may explain the issue.
Have you extracted an individual read from the BAM file and tried to specifically zoom into that area in IGV? Can you take a screenshot of that?
Yes, before I concatenated all the bam files I checked each one to make sure the individual reads within each bam file were viewable in IGV.
I am not sure how to extract an individual read from my concatenated bam file... would it be a samtools command? I Just checked the header again, it doesn't look like anything specific stands out but I will paste a portion.
samtools view BAMFILE.bam | head
will yield some to get you startedI think what @Friederike means is just find reads in your BAM files that are aligned to a chromosome (if there are both illumina and PB reads next to each other great) and then try going to that region by typing the coordinates in IGV search window (e.g.
chr5:2345-5000
).Yes, indeed, that's what I tried to suggest. This problem really would benefit from an interactive session ;)
What
samtools flagstats output_bam.bam
show? Orsamtools view output_bam.bam | head
?