Hi,
I want to align paired-end FASTQ files (R1 and R2) with BWA MEM. Because the aligner takes a long time on a single huge FASTQ file, I split the R1 and R2 files into multiple smaller FASTQ files (each keeping the reads in the same order as the original) and aligned each small R1/R2 pair separately. I then merged the resulting small SAM files and compared the merged SAM file with the one generated from the original (huge) FASTQ files, using Picard's "CompareSAMs" command. The two SAM files differ by a significant number of reads.
Can anybody please let me know whether I am doing this the right way, or should I stick to the original files only?
If differences are expected, what might be the reason?
Any help on this is really appreciated.
Issues like this get reported from time to time, and typically they're due to the random seeding step, though I think it got fixed at least once (see, for example, the thread "Bwa Mem Have Different Alignment Result When Using Different Threads").
The BWA version is 0.7.10-r789.
All discordant alignments have a mapping quality of zero. I also tried changing the number of threads, but that seems fine: the results do not change.
If the differences are only between alignments with MAPQ of 0 then that's expected. Those alignments are randomly chosen.
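If you want to verify that claim on your own data, something like the rough sketch below might help. It is not part of BWA or Picard; the function names `load_primary` and `discordant_by_mapq` are made up for illustration. It assumes plain-text SAM input, looks only at primary alignments, treats a read as discordant when its reference name or position differs between the two files, and holds both files in memory, so it may need adapting for very large inputs.

```python
def load_primary(sam_lines):
    """Map (read name, mate number) -> (reference, position, MAPQ).

    Only primary alignments are kept; header lines are skipped.
    """
    records = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800)
        # 0x40 marks the first read of a pair, 0x80 the second
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        records[(fields[0], mate)] = (fields[2], int(fields[3]), int(fields[4]))
    return records


def discordant_by_mapq(sam_a, sam_b):
    """Return (zero, nonzero): reads placed differently in the two inputs.

    'zero' holds reads whose MAPQ is 0 in both files; 'nonzero' holds the
    rest, which are the ones that would be worth a closer look.
    """
    a = load_primary(sam_a)
    b = load_primary(sam_b)
    zero, nonzero = [], []
    for key in sorted(a.keys() & b.keys()):
        if a[key][:2] != b[key][:2]:
            if a[key][2] == 0 and b[key][2] == 0:
                zero.append(key)
            else:
                nonzero.append(key)
    return zero, nonzero
```

Both arguments can be any iterable of SAM lines, for example two open file handles for the SAM from the original FASTQ files and the merged SAM from the split run. If the second list comes back empty, every placement difference is confined to MAPQ 0 reads.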
Does this mean that whether I run the original FASTQ file or multiple split FASTQ files (generated from the original), the alignment output will not differ for reads with non-zero mapping quality? If so, can I split the input, run the aligner in parallel (on a distributed network), and later merge the SAM files to get reliable results?