Hi all!
I am analysing CSS Pacbio data and each sample came from different run, in particular I have three files for each sample. I tested both pbmm2 and minimap2 to align my long reads, after getting the consensus sequences.
This is the command I used to run mnimap2:
minimap2 -a -x map-pb -t 8 genome.fa $file1 $file2 $file3 > align.bam
samtools sort -@ 16 -m 3G align.bam > align.sort.bam
The size of align.bam is ~538 G, the size of align.sort.bam is ~100 G. I checked the number of reads in align.bam and align.sort.bam using samtools flagstat, and I got the same output:
29865175 + 0 in total (QC-passed reads + QC-failed reads)
5101289 + 0 secondary
6865323 + 0 supplementary
0 + 0 duplicates
26824957 + 0 mapped (89.82% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The difference between total and secondary+supplementary is equal to the total number of reads:
29865175 - 5101289 - 6865323=17898563
After that, I ran pbmm2 using the following command:
echo $file1 > 'myfile.fofn'
echo $ile2 >> 'myfile.fofn'
echo $file3 >> 'myfile.fofn'
pbmm2 align genome.fa myfile.fofn alignPbmm2.bam --preset CCS -j 16
samtools sort -@ 16 -m 3G alignPbmm2.bam> alignPbmm2.sort.bam
The size of alignPbmm2.bam is ~460 G, the size of alignPbmm2.sort.bam is ~390 G. I checked the number of reads in those two files and I got the following file from samtools flagstat:
42157979 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
27576659 + 0 supplementary
0 + 0 duplicates
42157979 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Why do I get such different results? How can I understand which is the most reliable?
Thank you in advance
minimap
- 29865175 readspbmm2
- 42157979 readsDo you know for sure if
minimap2
can take multiple input files? I wonder if it only did the first file and that is the reason you have significantly less reads in that alignment.it should.. I also tried to align each file individually and after that I merged the file, I got the same results with minimap2. Furthermore, this value 29865175 - 5101289 - 6865323=17898563 corresponds to the number of reads in file1+file2+file3. It seems that pbmm2 gives in output much more 'supplementary' reads compare to minimap2