HiC-Pro: "Forward and reverse reads not paired" error even with their test data
10 days ago
Aki ▴ 20

Hi guys. I am trying to use HiC-Pro (https://github.com/nservant/HiC-Pro) to analyze a published data set. With both the published data set and their test data, I get the error "Forward and reverse reads not paired".

I looked more closely at the processed test data and noticed that the number of records in the bwt2merged.bam files differs between R1 and R2.

samtools view SRR400264_01_R1_GRCh38.primary_assembly.genome_1_22_XYM.bwt2merged.bam| cut -f1 | wc -l     

249997

samtools view SRR400264_01_R2_GRCh38.primary_assembly.genome_1_22_XYM.bwt2merged.bam| cut -f1 | wc -l     

249999

Does anybody know how to avoid this error?

HIC-pro • 424 views

Based on the line counts, the original dataset has an equal number of records in both files, so the input looks correct.

$ wc -l SRR400264_*
  230710716 SRR400264_1.fastq
  230710716 SRR400264_2.fastq
  461421432 total
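
As a quick sanity check on those numbers, a FASTQ record normally spans four lines, so the line count converts to a read count as sketched below. This small Python example assumes standard four-line records with no wrapped sequence lines; the helper name `reads_from_lines` is made up for illustration.

```python
# Line counts reported by `wc -l` above.
lines_r1 = 230710716
lines_r2 = 230710716

# Assumption: each FASTQ record is exactly 4 lines
# (header, sequence, '+', qualities).
LINES_PER_RECORD = 4

def reads_from_lines(n_lines):
    """Convert a FASTQ line count to a read count, rejecting truncated files."""
    if n_lines % LINES_PER_RECORD != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // LINES_PER_RECORD

reads_r1 = reads_from_lines(lines_r1)
reads_r2 = reads_from_lines(lines_r2)
print(reads_r1, reads_r2, reads_r1 == reads_r2)
```

Under that assumption both files hold 57677679 reads, which agrees with the conclusion that the input is balanced.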

Hi GenoMax. Thanks for the comment. Yes, the input looks correct.

I verified that three types of BAM files are generated in the pipeline:

Global alignment: bwt2_global
Local alignment: bwt2_local (from unmapped global reads)
Merged: bwt2merged (global + local)

Here is how I checked the issue:

samtools view bowtie_results/bwt2/dixon_2M/SRR400264_00_R1_*.bwt2merged.bam | cut -f1 > R1.txt
samtools view bowtie_results/bwt2/dixon_2M/SRR400264_00_R2_*.bwt2merged.bam | cut -f1 > R2.txt
diff R1.txt R2.txt | head

Output:

50066d50065
< SRR400264.50066
64386d64384
< SRR400264.64386
92478a92477
> SRR400264.92479
184109a184109
> SRR400264.184111
192700a192701
> SRR400264.192703
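
The same check can be done without reading diff line offsets. The Python sketch below reports read names found in only one of two lists. The function name `unpaired_names` is made up for illustration, and the input is a toy list: SRR400264.50066 and SRR400264.92479 come from the diff output above, while the other names are placeholders. In practice the lists would be read from R1.txt and R2.txt.

```python
def unpaired_names(r1_names, r2_names):
    """Return (only_in_r1, only_in_r2) as sorted lists of read names."""
    r1_set, r2_set = set(r1_names), set(r2_names)
    return sorted(r1_set - r2_set), sorted(r2_set - r1_set)

# Toy input: two names are shared, one is R1-only, one is R2-only.
r1 = ["SRR400264.50065", "SRR400264.50066", "SRR400264.50067"]
r2 = ["SRR400264.50065", "SRR400264.50067", "SRR400264.92479"]

only_r1, only_r2 = unpaired_names(r1, r2)
print("only in R1:", only_r1)
print("only in R2:", only_r2)
```

Using sets ignores record order, so this only answers which names are unpaired, not where they sit in the sorted file.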

Then I compared the read names at the global and local alignment levels:

samtools view bowtie_results/bwt2_global/dixon_2M/SRR400264_00_R1_*.bam | cut -f1 > global_R1.txt
samtools view bowtie_results/bwt2_global/dixon_2M/SRR400264_00_R2_*.bam | cut -f1 > global_R2.txt
samtools view bowtie_results/bwt2_local/dixon_2M/SRR400264_00_R1_*.bam | cut -f1 > local_R1.txt
samtools view bowtie_results/bwt2_local/dixon_2M/SRR400264_00_R2_*.bam | cut -f1 > local_R2.txt

Comparison results:

Global alignment:

0a1
> SRR400264.11
2a4,6
> SRR400264.29
> SRR400264.30
> SRR400264.31
3a8
> SRR400264.34
4a10,11
> SRR400264.46

Local alignment:

11d10
< SRR400264.11
27,29d25
< SRR400264.29
< SRR400264.30
< SRR400264.31
31d26
< SRR400264.34
42,43d36
< SRR400264.46

This suggests that for reads such as SRR400264.11, .29, .30, and so on, the R2 mate is in the global BAM while the R1 mate is only in the local BAM. After merging (samtools merge -n), these reads exist in both the R1 and R2 BAMs.
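
That reasoning can be written out as a small Python sketch. It uses the read names from the diff output above and assumes that the merged set for each mate is simply the union of its global and local names. It models the bookkeeping only and is not HiC-Pro's code.

```python
# Read names taken from the global/local diff output above.
names = {
    "SRR400264.11", "SRR400264.29", "SRR400264.30",
    "SRR400264.31", "SRR400264.34", "SRR400264.46",
}

# For these reads: R2 aligned globally, R1 aligned only in the local step.
global_r1, global_r2 = set(), set(names)
local_r1, local_r2 = set(names), set()

# Assumption: merging is a plain union of the global and local records.
merged_r1 = global_r1 | local_r1
merged_r2 = global_r2 | local_r2

print(merged_r1 == merged_r2)
```

Under this assumption the six reads are paired again after merging, which matches the observation that the mismatches in the merged files involve different read names (for example SRR400264.50066).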

The merged BAM files were sorted by read name using samtools sort -n, but mergeSAM.py still fails to pair the reads. Do you have any idea why this might be happening?
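
To illustrate why a single missing record matters, here is a Python sketch of a pairing loop that walks two name-sorted lists in lockstep and stops at the first name mismatch. This is an assumption about how a pairing step of this kind typically behaves, not a copy of mergeSAM.py. The function `first_mismatch` and the toy read names are made up for illustration.

```python
from itertools import zip_longest

def first_mismatch(r1_names, r2_names):
    """Return (index, r1_name, r2_name) at the first differing position, or None."""
    for i, (a, b) in enumerate(zip_longest(r1_names, r2_names)):
        if a != b:
            return i, a, b
    return None

# Toy input: read.2 is missing from R2, so every later record is shifted by one.
r1 = ["read.1", "read.2", "read.3", "read.4"]
r2 = ["read.1", "read.3", "read.4"]

print(first_mismatch(r1, r2))
```

In this model, a lockstep comparison fails at the first missing read even though most names are present in both lists, so the two inputs would need identical name sets as well as identical ordering.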

Thanks in advance.
