Question

HiC-Pro: "Forward and reverse reads not paired" error even with the test data

10 days ago

Aki ▴ 20

Hi guys. I am trying to use HiC-Pro (https://github.com/nservant/HiC-Pro) to analyze a published data set. With both the published data set and the HiC-Pro test data, I get the error "Forward and reverse reads not paired".

I checked the intermediate files from the test-data run and noticed that the number of records in the bwt2merged.bam files differs between R1 and R2:

samtools view SRR400264_01_R1_GRCh38.primary_assembly.genome_1_22_XYM.bwt2merged.bam| cut -f1 | wc -l

249997

samtools view SRR400264_01_R2_GRCh38.primary_assembly.genome_1_22_XYM.bwt2merged.bam| cut -f1 | wc -l

249999

Does anybody know how to avoid this error?

ADD COMMENT • link 9 days ago by Aki ▴ 20

Based on the line counts, the original dataset has an equal number of records in both files, so the input looks correct.

$ wc -l SRR400264_*
  230710716 SRR400264_1.fastq
  230710716 SRR400264_2.fastq
  461421432 total

ADD REPLY • link 10 days ago by GenoMax 150k

Hi GenoMax. Thanks for the comment. Yes, the input looks correct.

I verified that three types of BAM files are generated in the pipeline:

Global alignment: bwt2_global
Local alignment: bwt2_local (from unmapped global reads)
Merged: bwt2merged (global + local)

Here is how I checked the issue:

samtools view bowtie_results/bwt2/dixon_2M/SRR400264_00_R1_*.bwt2merged.bam | cut -f1 > R1.txt
samtools view bowtie_results/bwt2/dixon_2M/SRR400264_00_R2_*.bwt2merged.bam | cut -f1 > R2.txt
diff R1.txt R2.txt | head

Output:

50066d50065
< SRR400264.50066
64386d64384
< SRR400264.64386
92478a92477
> SRR400264.92479
184109a184109
> SRR400264.184111
192700a192701
> SRR400264.192703
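Because diff compares the lists position by position, a single missing read shifts everything after it. As a minimal sketch, the same comparison can be done on the sets of names instead, so that only reads truly absent from one file are reported. The function name names_only_in_first is hypothetical, and the inputs are assumed to be plain-text lists like R1.txt and R2.txt above, with one read name per line.

```shell
# Print read names that occur in the first list but not in the second.
# $1, $2: text files with one read name per line (e.g. R1.txt, R2.txt).
names_only_in_first() {
    nof_a=$(mktemp)
    nof_b=$(mktemp)
    # comm requires sorted input; LC_ALL=C keeps sort and comm consistent.
    LC_ALL=C sort -u "$1" > "$nof_a"
    LC_ALL=C sort -u "$2" > "$nof_b"
    # -23 suppresses lines unique to the second file and lines in both.
    LC_ALL=C comm -23 "$nof_a" "$nof_b"
    rm -f "$nof_a" "$nof_b"
}
```

Calling names_only_in_first R1.txt R2.txt and then names_only_in_first R2.txt R1.txt should list the reads missing from each side, independent of their order in the BAM files.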

Then I compared the read names at the global and local alignment levels:

samtools view bowtie_results/bwt2_global/dixon_2M/SRR400264_00_R1_*.bam | cut -f1 > global_R1.txt
samtools view bowtie_results/bwt2_global/dixon_2M/SRR400264_00_R2_*.bam | cut -f1 > global_R2.txt
samtools view bowtie_results/bwt2_local/dixon_2M/SRR400264_00_R1_*.bam | cut -f1 > local_R1.txt
samtools view bowtie_results/bwt2_local/dixon_2M/SRR400264_00_R2_*.bam | cut -f1 > local_R2.txt

Comparison results:

Global alignment:

0a1
> SRR400264.11
2a4,6
> SRR400264.29
> SRR400264.30
> SRR400264.31
3a8
> SRR400264.34
4a10,11
> SRR400264.46

Local alignment:

11d10
< SRR400264.11
27,29d25
< SRR400264.29
< SRR400264.30
< SRR400264.31
31d26
< SRR400264.34
42,43d36
< SRR400264.46

This suggests that reads such as SRR400264.11, .29, .30, etc. are present in the global BAM only for R2 and in the local BAM only for R1. After merging (samtools merge -n), these reads exist in both the R1 and R2 BAMs.
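The claim that every read ends up in both mates after merging can be checked directly on the name lists. The sketch below combines the global and local lists per mate and compares the two resulting sets. The function names union_names and same_name_set are hypothetical, and the inputs are assumed to be the global_R1.txt, local_R1.txt, global_R2.txt and local_R2.txt files produced above.

```shell
# Print the sorted, de-duplicated union of two read-name lists.
union_names() {
    LC_ALL=C sort -u "$1" "$2"
}

# Exit 0 only if R1 and R2 cover the same set of read names once
# their global and local lists are combined.
# $1, $2: global and local lists for R1; $3, $4: the same for R2.
same_name_set() {
    sns_u1=$(mktemp)
    sns_u2=$(mktemp)
    union_names "$1" "$2" > "$sns_u1"
    union_names "$3" "$4" > "$sns_u2"
    # cmp -s is silent and only reports equality via its exit status.
    cmp -s "$sns_u1" "$sns_u2"
    sns_status=$?
    rm -f "$sns_u1" "$sns_u2"
    return "$sns_status"
}
```

If same_name_set global_R1.txt local_R1.txt global_R2.txt local_R2.txt fails, some read is missing from one mate at both alignment levels, which would be consistent with the differing record counts in the merged files.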

The merged BAM files were sorted by read name using samtools sort -n, but mergeSAM.py still fails to pair the reads. Do you have any idea why this might be happening?
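To narrow this down, it may help to find the first record at which the two merged files diverge. The sketch below assumes, without having confirmed it in the source, that mergeSAM.py expects the same read name at the same position in both files. The function name first_order_mismatch is hypothetical, and the inputs are name lists such as R1.txt and R2.txt.

```shell
# Report the first line number at which two read-name lists differ.
# $1, $2: text files with one read name per line, in BAM order.
# Prints nothing if the lists are identical.
first_order_mismatch() {
    # paste joins the two lists line by line with a tab; if one list is
    # shorter, its column is empty, which also counts as a mismatch.
    paste "$1" "$2" |
        awk -F'\t' '$1 != $2 { print NR ": " $1 " vs " $2; exit }'
}
```

Running first_order_mismatch R1.txt R2.txt gives the position and the two names involved, which shows whether the first divergence comes from a missing read or from a difference in sort order.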

Thanks in advance.

ADD REPLY • link 9 days ago by Aki ▴ 20