Hello Biostars,
I'm aligning reads to a reference genome with BWA-mem2. I have raw-read datasets for ten samples stored in ten folders. Each folder holds multiple paired-end (PE) FASTQ files of Illumina short reads (151 bp), sequenced across 4 lanes with two FASTQ files (R1 and R2) per lane, for a total of eight FASTQ files per sample.
My goal is to align each individual (sample) to the reference genome of an avian species and then estimate some population genetics parameters.
I read that it is fine to concatenate paired-end FASTQ files, even from different lanes, into one file.
So I took the FASTQ files of one of my samples and tried two concatenation approaches:
A - I concatenated all the R1 files into one R1 file and all the R2 files into one R2 file, then aligned this single pair with BWA-mem2 (using the precompiled binaries) in PE mode.
B - I concatenated all eight PE FASTQ files into a single FASTQ file and reran BWA-mem2 in single-end (SE) mode. The BWA-mem2 logs of both runs are shown below.
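For approach A, the important detail is that the R1 files and the R2 files are concatenated in the same lane order, so that read i in the merged R1 file is still the mate of read i in the merged R2 file. Below is a minimal Python sketch of that step. It assumes Illumina-style file names such as `bird01_L002_R1_001.fastq.gz`; the regular expression and the helper names `group_by_read` and `concat_files` are hypothetical and not part of BWA-mem2 or any other tool.

```python
import re
import shutil
from pathlib import Path

# ASSUMPTION: Illumina-style names such as "bird01_L002_R1_001.fastq.gz".
# Adjust LANE_READ_RE if your files are named differently.
LANE_READ_RE = re.compile(r"_L(\d{3})_R([12])_")


def group_by_read(fastq_paths):
    """Split paths into (R1 list, R2 list), both ordered by lane number.

    Using the same lane order for R1 and R2 keeps read i of the
    concatenated R1 file paired with read i of the concatenated R2 file.
    """
    by_read = {"1": {}, "2": {}}
    for path in fastq_paths:
        m = LANE_READ_RE.search(Path(path).name)
        if m is None:
            raise ValueError(f"cannot parse lane/read from {path}")
        lane, read = m.groups()
        if lane in by_read[read]:
            raise ValueError(f"two R{read} files for lane {lane}")
        by_read[read][lane] = path
    if by_read["1"].keys() != by_read["2"].keys():
        raise ValueError("some lanes have an R1 file but no R2 file (or vice versa)")
    lanes = sorted(by_read["1"])  # zero-padded lane numbers sort correctly as text
    return [by_read["1"][l] for l in lanes], [by_read["2"][l] for l in lanes]


def concat_files(inputs, output):
    """Byte-wise concatenation, like `cat a b > out`.

    Works for plain FASTQ and for .gz files (concatenated gzip files
    form a valid multi-member gzip stream).
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For one sample folder (the folder name here is hypothetical), usage would look like `r1, r2 = group_by_read(glob.glob("bird01/*.fastq.gz"))`, followed by `concat_files(r1, "bird01_R1.fastq.gz")` and `concat_files(r2, "bird01_R2.fastq.gz")`. The checks fail loudly rather than silently producing R1/R2 files whose reads are out of step.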
A - BWA-mem2 output
Time taken for main_mem function: 26817.94 Sec
IO times (sec) :
Reading IO time (reads) avg: 373.65, (373.65, 373.65)
Writing IO time (SAM) avg: 1362.58, (1362.58, 1362.58)
Reading IO time (Reference Genome) avg: 0.96, (0.96, 0.96)
Index read time avg: 6.61, (6.61, 6.61)
Overall time (sec) (Excluding Index reading time):
PROCESS() (Total compute time + (read + SAM) IO time) : 26809.99
MEM_PROCESS_SEQ() (Total compute time (Kernel + SAM)), avg: 26809.28, (26809.28, 26809.28)
SAM Processing time (sec):
--WORKER_SAM avg: 6663.39, (6663.39, 6663.39)
Kernels' compute time (sec):
Total kernel (smem+sal+bsw) time avg: 20103.30, (20103.30, 20103.30)
SMEM compute avg: 13512.54, (13512.54, 13512.54)
SAL compute avg: 1294.74, (1294.74, 1294.74)
BSW time, avg: 4891.20, (4891.20, 4891.20)
Total allocs: 219754346 = 219754346 out total requests: 893438606, Rate: 0.25
B - BWA-mem2 output
Time taken for main_mem function: 42463.53 Sec
IO times (sec) :
Reading IO time (reads) avg: 716.52, (716.52, 716.52)
Writing IO time (SAM) avg: 2242.50, (2242.50, 2242.50)
Reading IO time (Reference Genome) avg: 0.89, (0.89, 0.89)
Index read time avg: 6.42, (6.42, 6.42)
Overall time (sec) (Excluding Index reading time):
PROCESS() (Total compute time + (read + SAM) IO time) : 42455.98
MEM_PROCESS_SEQ() (Total compute time (Kernel + SAM)), avg: 42454.83, (42454.83, 42454.83)
SAM Processing time (sec):
--WORKER_SAM avg: 1675.04, (1675.04, 1675.04)
Kernels' compute time (sec):
Total kernel (smem+sal+bsw) time avg: 40779.49, (40779.49, 40779.49)
SMEM compute avg: 27730.66, (27730.66, 27730.66)
SAL compute avg: 2570.63, (2570.63, 2570.63)
BSW time, avg: 9678.29, (9678.29, 9678.29)
Total allocs: 439503376 = 439503376 out total requests: 1786877212, Rate: 0.25
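To compare the two runs, it can help to put the logs side by side as B/A ratios. The sketch below parses the `label: value` lines of the timing summaries above (the excerpts are copied verbatim from the two logs) and divides each B value by the matching A value. `parse_bwa_mem2_log` and `compare_runs` are hypothetical helpers, and the parsing rule is an assumption based only on the log lines shown here, not on any documented BWA-mem2 log format.

```python
import re

# ASSUMPTION: summary lines look like "label: value" or "label avg: value, (...)",
# as in the logs above; lines with no number after the colon are skipped.
LINE_RE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)")


def parse_bwa_mem2_log(text):
    """Return {label: first number after the colon} for each summary line."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        # Normalise labels: drop a trailing "avg" / ", avg" and leading dashes.
        label = re.sub(r",?\s*avg$", "", m.group("label")).lstrip("-").strip()
        values[label] = float(m.group("value"))
    return values


def compare_runs(log_a, log_b):
    """Return {label: value_B / value_A} for labels present in both logs."""
    a, b = parse_bwa_mem2_log(log_a), parse_bwa_mem2_log(log_b)
    return {k: b[k] / a[k] for k in a if k in b and a[k] != 0}


# Excerpts copied from the two runs above (A = PE mode, B = SE mode).
LOG_A = """\
Time taken for main_mem function: 26817.94 Sec
--WORKER_SAM avg: 6663.39, (6663.39, 6663.39)
Total kernel (smem+sal+bsw) time avg: 20103.30, (20103.30, 20103.30)
SMEM compute avg: 13512.54, (13512.54, 13512.54)
SAL compute avg: 1294.74, (1294.74, 1294.74)
BSW time, avg: 4891.20, (4891.20, 4891.20)
Total allocs: 219754346 = 219754346 out total requests: 893438606, Rate: 0.25
"""

LOG_B = """\
Time taken for main_mem function: 42463.53 Sec
--WORKER_SAM avg: 1675.04, (1675.04, 1675.04)
Total kernel (smem+sal+bsw) time avg: 40779.49, (40779.49, 40779.49)
SMEM compute avg: 27730.66, (27730.66, 27730.66)
SAL compute avg: 2570.63, (2570.63, 2570.63)
BSW time, avg: 9678.29, (9678.29, 9678.29)
Total allocs: 439503376 = 439503376 out total requests: 1786877212, Rate: 0.25
"""

for label, ratio in compare_runs(LOG_A, LOG_B).items():
    print(f"{label}: B/A = {ratio:.2f}")
```

With complete logs saved to disk, the two strings could instead be read with something like `Path("runA.log").read_text()` (hypothetical file names).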
I have some questions:
- For the alignment and downstream analysis, which approach is better for me: concatenating the paired-end reads or interleaving them? And how would interleaving work when there are multiple paired-end files?
- Looking at the BWA-mem2 output, is there anything suspicious I should be aware of?
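On the interleaving question: an interleaved FASTQ stores each pair as two consecutive records (R1, then its R2 mate). BWA's `mem` command reads such a file with its `-p` ("smart pairing") option, and BWA-mem2 is intended as a drop-in replacement for `bwa mem`, but it is worth confirming that option in the `bwa-mem2 mem` usage text before relying on it. The minimal sketch below (hypothetical helper names, assuming unwrapped four-line FASTQ records) builds an interleaved file from one R1/R2 pair and stops if the mates get out of sync. With several lanes, one option is to interleave each lane's pair and concatenate the results, or to interleave the already concatenated R1/R2 files from approach A.

```python
import gzip
from itertools import islice, zip_longest


def open_text(path, mode="rt"):
    """Open a FASTQ file as text; gzip is chosen from a .gz suffix."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def read_records(handle):
    """Yield 4-line FASTQ records (assumes unwrapped, one-line sequences)."""
    while True:
        rec = list(islice(handle, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # Make sure every line ends with a newline, even at end of file.
        yield [line if line.endswith("\n") else line + "\n" for line in rec]


def read_id(header):
    """Read name without '@', the trailing comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(r1_path, r2_path, out_path):
    """Write mates alternately (R1, R2, R1, R2, ...); return the number of pairs."""
    pairs = 0
    with open_text(r1_path) as f1, open_text(r2_path) as f2, \
            open_text(out_path, "wt") as out:
        for rec1, rec2 in zip_longest(read_records(f1), read_records(f2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            if read_id(rec1[0]) != read_id(rec2[0]):
                raise ValueError(f"mates out of sync: {rec1[0].strip()} vs {rec2[0].strip()}")
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
    return pairs
```

The name check compares only the first whitespace-separated token of each header, which is where Illumina puts the shared read name; the text after the space (e.g. `1:N:0:...` vs `2:N:0:...`) legitimately differs between mates.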
I'm new to this and would appreciate any suggestions or recommendations regarding this case. Thank you in advance.
A - That's exactly what I did. Thank you so much for the advice, GenoMax.