Question

bwa mem skip orientation when aligning a small subset of reads

0

Entering edit mode

23 months ago

Yihang • 0

For pair-end read files, read1.fastq, read2.fastq, I use the following command cat read1.fastq | head -n 4000 > read1_first1000.fastq (and cat read2.fastq | head -n 4000 > read2_first1000.fastq) to extract first 1000 reads, and align with bwa mem hg19.fa read1_first1000.fastq read2_first1000.fastq. I find that all four orientations, FF, FR, RF, RR are skipped because there are not enough pairs, and bwa will map these reads without pair-end. Therefore, the same read pair will be mapped to a different place as when I align with bwa mem hg19.fa read1.fastq read2.fastq.

If I use the command cat read1.fastq | head -n 400000 > read1_first100000.fastq (and cat read2.fastq | head -n 400000 > read2_first100000.fastq) to extract first 100000 reads and align them again, bwa will detect the orientation FR and then the results will be the same as aligning with bwa mem hg19.fa read1.fastq read2.fastq. (for the first 100000 reads).

Therefore, I am curious about the following questions:

How does bwa compute unique pairs for (FF, FR, RF, RR)?
How many candidate unique read pairs would be enough for detecting each orientation? Is this related to the parameter -K (fixed chunk size)?
I find that usually, for a batch of e.g. 100000 read pairs, only around half of them will become candidate unique pairs for the orientations. If e.g., FR has significantly more candidate read pairs than other orientations, will bwa assume all reads (rather than only candidate read pairs of FR) have the orientation FR and map them accordingly?

Thanks!

fastq alignment bwa • 817 views

ADD COMMENT • link updated 23 months ago by Asaf 10k • written 23 months ago by Yihang • 0

score 1 · Answer 1 · 2023-07-09

Yes, it does guess the orientation and the average insert size from the data. It is hard-coded and cannot be set using parameters. The minimal number of pairs of each orientation to be tested is 10 but they can fail other steps of filtering. If you share your log (look for messages such as skip orientation ... or analyzing insert size distribution for orientation ...) we could tell you more.