Question

Why Do Each Read's Alignment Statistics Differ Between Single-End and Paired-End Alignment of Paired-End Data?

0

11.6 years ago

litiancheng.gansu ▴ 10

I have data for one sample, sequenced with Illumina paired-end technology.

RE19E2T40PA_L1_I040.pairPrimer_1.fastq   (Read1)
RE19E2T40PA_L1_I040.pairPrimer_2.fastq   (Read2)

I aligned both Read1 and Read2 to hg19 using BWA and generated three BAM files.

RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam   (Read1's BAM file, generated with 'bwa samse')
RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam   (Read2's BAM file, generated with 'bwa samse')
RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam         (paired-end BAM file, generated with 'bwa sampe')

Then I used Picard to generate alignment statistics.

java -jar /usr/app/picard/picard-tools-1.77/CollectAlignmentSummaryMetrics.jar R=/home/share/hg19/ucsc.hg19.fasta I=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam O=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam.stat

java -jar /usr/app/picard/picard-tools-1.77/CollectAlignmentSummaryMetrics.jar R=/home/share/hg19/ucsc.hg19.fasta I=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam O=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam.stat

java -jar /usr/app/picard/picard-tools-1.77/CollectAlignmentSummaryMetrics.jar R=/home/share/hg19/ucsc.hg19.fasta I=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam O=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam.stat

Look below! The results are contradictory:

Read1's alignment statistic ('RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam.stat')

CATEGORY    TOTAL_READS    PF_READS    PCT_PF_READS    PF_NOISE_READS    PF_READS_ALIGNED    PCT_PF_READS_ALIGNED
UNPAIRED    225020    225020    1    0    127242    56.55%

Read2's alignment statistic ('RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam.stat')

CATEGORY    TOTAL_READS    PF_READS    PCT_PF_READS    PF_NOISE_READS    PF_READS_ALIGNED    PCT_PF_READS_ALIGNED
UNPAIRED    225020    225020    1    0    44101    19.60%

Paired-end alignment statistic ('RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam.stat')

CATEGORY    TOTAL_READS    PF_READS    PCT_PF_READS    PF_NOISE_READS    PF_READS_ALIGNED    PCT_PF_READS_ALIGNED
FIRST_OF_PAIR    225020    225020    1    0    129286    57.46%
SECOND_OF_PAIR    225020    225020    1    0    87054    38.69%
PAIR    450040    450040    1    0    216340    48.07%

Read1 has 127242 (56.55%) reads aligned on its own (samse), versus 129286 (57.46%) reads aligned as part of a pair (sampe).

Read2 has 44101 (19.60%) reads aligned on its own (samse), versus 87054 (38.69%) reads aligned as part of a pair (sampe) <--- This is so different, why?
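
The size of the gap can be made explicit with a few lines of arithmetic. The sketch below only re-uses the PF_READS_ALIGNED counts quoted in the tables above; the variable and function names are made up for illustration and are not part of Picard or BWA.

```python
# Counts copied from the PF_READS_ALIGNED column of the tables in the post.
TOTAL_PER_READ = 225020                          # TOTAL_READS for each mate

single_end = {"read1": 127242, "read2": 44101}   # from the 'bwa samse' BAMs
paired_end = {"read1": 129286, "read2": 87054}   # from the 'bwa sampe' BAM


def pct(aligned, total):
    """Percentage of aligned reads, rounded to two decimals."""
    return round(100.0 * aligned / total, 2)


# Extra alignments each mate gains when it is aligned as part of a pair.
gain = {name: paired_end[name] - single_end[name] for name in single_end}

# Combined counts over both mates (the PAIR row covers 2 * TOTAL_PER_READ reads).
pair_total = sum(paired_end.values())
single_total = sum(single_end.values())

for name in ("read1", "read2"):
    print(name, single_end[name], "->", paired_end[name], "gain:", gain[name])
print("paired-end total:", pair_total, pct(pair_total, 2 * TOTAL_PER_READ))
print("single-end total:", single_total, pct(single_total, 2 * TOTAL_PER_READ))
```

FIRST_OF_PAIR plus SECOND_OF_PAIR reproduces the PAIR row (216340), so the paired-end table is internally consistent. The discrepancy is entirely between the two alignment modes: Read1 gains 2044 alignments and Read2 gains 42953.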

bwa picard paired-end • 3.1k views

updated 11.6 years ago by swbarnes2 14k • written 11.6 years ago by litiancheng.gansu ▴ 10

score 2 · Answer 1 · 2013-05-08

2

11.6 years ago

swbarnes2 14k

When read 1 maps, the sampe software knows to look more carefully in the approximate region of that alignment for a suitable mapping site for read 2, if it couldn't find one before.

When you treat the reads as single end, you can't get the benefit of that other read's mapping position.
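
One way to check this explanation on real data would be to list the Read2 names that are unmapped in the single-end BAM but mapped in the paired-end BAM. The following is only a sketch under some assumptions: it works on SAM text lines (for example the output of `samtools view`), it assumes read names are identical in both files, and the function names and toy records are hypothetical. The FLAG bits are the ones defined in the SAM specification.

```python
# SAM FLAG bits (from the SAM specification).
FLAG_UNMAPPED = 0x4   # this read is unmapped
FLAG_SECOND = 0x80    # this read is the second of a pair


def mapped_names(sam_lines, require_flag=0):
    """Return names of mapped reads, optionally requiring a FLAG bit."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue                      # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & FLAG_UNMAPPED:
            continue
        if require_flag and not (flag & require_flag):
            continue
        names.add(fields[0])
    return names


def rescued_read2(single_end_read2_sam, paired_sam):
    """Read2 names mapped in the paired-end SAM but not in the single-end SAM."""
    alone = mapped_names(single_end_read2_sam)
    in_pair = mapped_names(paired_sam, require_flag=FLAG_SECOND)
    return in_pair - alone


# Toy records (hypothetical, not taken from the poster's data).
single = [
    "r1\t16\tchr1\t300\t37\t50M\t*\t0\t0\t*\t*",      # mapped alone
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",               # unmapped alone
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",               # unmapped alone
]
paired = [
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t700\t250\t*\t*",
    "r2\t147\tchr1\t700\t20\t50M\t=\t500\t-250\t*\t*",  # mapped only as a pair
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

print(sorted(rescued_read2(single, paired)))
```

On the poster's files, a large set returned by `rescued_read2` would be consistent with the pairing step placing many Read2 reads that could not be mapped alone. This is a plain name comparison and does not reproduce how Picard counts reads.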

0

Yes, I think you are right. I rechecked another sample, where the difference between the single-end alignment statistics and the per-read statistics from the paired-end alignment is small; they are nearly the same, as in Read1's case above. In the example above, Read2's quality is so poor that sampe re-aligns many of the Read2 reads that were left unmapped. With good-quality data only a small fraction of reads are re-aligned, so the two sets of statistics differ very little.

Reply • 11.6 years ago by litiancheng.gansu