10 days ago
Aki
Hi guys. I am trying to use HiC-Pro (https://github.com/nservant/HiC-Pro) to analyze a published data set. Both with the published data set and with their test data, I get the error "Forward and reverse reads not paired".
I checked the details of my processed test data and noticed that the number of records in the bwt2merged.bam files differs between R1 and R2:
samtools view SRR400264_01_R1_GRCh38.primary_assembly.genome_1_22_XYM.bwt2merged.bam| cut -f1 | wc -l
249997
samtools view SRR400264_01_R2_GRCh38.primary_assembly.genome_1_22_XYM.bwt2merged.bam| cut -f1 | wc -l
249999
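To see which read names account for the difference, rather than only the totals, the two name lists can be compared directly. Below is a minimal Python sketch; `compare_read_names` is a hypothetical helper (not part of HiC-Pro or samtools), and it assumes the names have already been extracted, for example with the `samtools view ... | cut -f1` commands above.

```python
from collections import Counter


def compare_read_names(r1_names, r2_names):
    """Report names found in only one mate file and names repeated within a file.

    Hypothetical helper for illustration; inputs are iterables of read names.
    """
    r1_counts = Counter(r1_names)
    r2_counts = Counter(r2_names)
    return {
        # Names with no partner in the other file
        "only_in_r1": sorted(set(r1_counts) - set(r2_counts)),
        "only_in_r2": sorted(set(r2_counts) - set(r1_counts)),
        # Names that occur more than once in the same file
        "repeated_in_r1": sorted(n for n, k in r1_counts.items() if k > 1),
        "repeated_in_r2": sorted(n for n, k in r2_counts.items() if k > 1),
    }


# Toy input for illustration only (not the real BAM contents)
report = compare_read_names(["a", "b", "b", "c"], ["a", "c", "d"])
for key, names in report.items():
    print(key, names)
```

Checking for repeated names as well as missing ones matters here because a total that differs by two could come from either cause.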
Does anybody know how to avoid this error?
Based on the line counts, the original dataset has an equal number of records in both files, so the input looks correct.
Hi GenoMax. Thanks for the comment. Yes, the input looks correct.
I verified that three types of BAM files are generated in the pipeline:
Global alignment: bwt2_global
Local alignment: bwt2_local (from unmapped global reads)
Merged: bwt2merged (global + local)
Here is how I checked the issue:
Output:
Then I compared the read names at the global and local alignment levels:
Comparison results:
Global alignment:
Local alignment:
This suggests that reads like SRR400264.11, .29, .30, etc. appear for R2 only in the global BAM and for R1 only in the local BAM. After merging (samtools merge -n), these reads exist in both the R1 and R2 BAMs.
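The comparison described above can be sketched in Python as follows. This is only an illustration under assumptions: `stage_of` and `mates_split_across_stages` are hypothetical helper names, the inputs are plain lists of read names taken from the global and local BAMs of each mate, and the toy data is made up apart from the read name SRR400264.11.

```python
def stage_of(name, global_names, local_names):
    """Say in which alignment stage(s) a read name was found."""
    in_global = name in global_names
    in_local = name in local_names
    if in_global and in_local:
        return "both"
    if in_global:
        return "global"
    if in_local:
        return "local"
    return "absent"


def mates_split_across_stages(r1_global, r1_local, r2_global, r2_local):
    """Return {read name: (R1 stage, R2 stage)} for reads whose mates differ."""
    r1_global, r1_local = set(r1_global), set(r1_local)
    r2_global, r2_local = set(r2_global), set(r2_local)
    split = {}
    for name in sorted(r1_global | r1_local | r2_global | r2_local):
        s1 = stage_of(name, r1_global, r1_local)
        s2 = stage_of(name, r2_global, r2_local)
        if s1 != s2:
            split[name] = (s1, s2)
    return split


# Toy data: SRR400264.11 is local-only for R1 and global-only for R2
print(mates_split_across_stages(
    r1_global=["SRR400264.1"],
    r1_local=["SRR400264.11"],
    r2_global=["SRR400264.1", "SRR400264.11"],
    r2_local=[],
))
```

A read reported as "absent" for one mate would be a candidate for the count mismatch, whereas a "local"/"global" split alone should still give one record per mate after merging.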
The merged BAM files were sorted by read name using samtools sort -n, but mergeSAM.py still fails to pair the reads. Do you have any idea why this might be happening?
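One way to narrow this down is to find the first position at which the two name-sorted files stop agreeing. The sketch below is not the logic of mergeSAM.py; it assumes only that a pairing step reads R1 and R2 in lockstep and expects the same read name at each position. `first_unpaired` is a hypothetical helper that takes the two name lists in file order.

```python
from itertools import zip_longest


def first_unpaired(r1_names, r2_names):
    """Return (index, r1_name, r2_name) at the first disagreement, else None.

    A name of None means that file ended before the other one did.
    """
    for index, (n1, n2) in enumerate(zip_longest(r1_names, r2_names)):
        if n1 != n2:
            return index, n1, n2
    return None


# Toy input: R2 is missing "b", so the files diverge at position 1
print(first_unpaired(["a", "b", "c"], ["a", "c"]))
```

With different record counts in the two files (249997 vs 249999), a lockstep comparison like this has to diverge somewhere, and the names at that position show whether a read is missing or duplicated.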
Thanks in advance.