Why Do Each Read's Alignment Statistics Differ Between Single-End And Paired-End Alignment Of The Same Paired-End Data?
11.6 years ago

I have data from one sample, sequenced with Illumina's paired-end technology.

RE19E2T40PA_L1_I040.pairPrimer_1.fastq   (Read1)
RE19E2T40PA_L1_I040.pairPrimer_2.fastq   (Read2)

I aligned both Read1 and Read2 to hg19 using BWA and generated three BAM files.

RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam   (Read1's BAM file, generated with 'bwa samse')
RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam   (Read2's BAM file, generated with 'bwa samse')
RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam         (paired-end BAM file, generated with 'bwa sampe')

Then I used Picard to generate alignment statistics.

java -jar /usr/app/picard/picard-tools-1.77/CollectAlignmentSummaryMetrics.jar R=/home/share/hg19/ucsc.hg19.fasta I=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam O=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam.stat

java -jar /usr/app/picard/picard-tools-1.77/CollectAlignmentSummaryMetrics.jar R=/home/share/hg19/ucsc.hg19.fasta I=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam O=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam.stat

java -jar /usr/app/picard/picard-tools-1.77/CollectAlignmentSummaryMetrics.jar R=/home/share/hg19/ucsc.hg19.fasta I=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam O=RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam.stat

The results below look contradictory:

Read1's alignment statistics ('RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read1.sorted.bam.stat')

CATEGORY    TOTAL_READS    PF_READS    PCT_PF_READS    PF_NOISE_READS    PF_READS_ALIGNED    PCT_PF_READS_ALIGNED
UNPAIRED    225020    225020    1    0    127242    56.55%

Read2's alignment statistics ('RE19E2T40PA_L1_I040.pairPrimer_1.fastq.read2.sorted.bam.stat')

CATEGORY    TOTAL_READS    PF_READS    PCT_PF_READS    PF_NOISE_READS    PF_READS_ALIGNED    PCT_PF_READS_ALIGNED
UNPAIRED    225020    225020    1    0    44101    19.60%

Paired-end alignment statistics ('RE19E2T40PA_L1_I040.pairPrimer_1.fastq.sorted.bam.stat')

CATEGORY    TOTAL_READS    PF_READS    PCT_PF_READS    PF_NOISE_READS    PF_READS_ALIGNED    PCT_PF_READS_ALIGNED
FIRST_OF_PAIR    225020    225020    1    0    129286    57.46%
SECOND_OF_PAIR    225020    225020    1    0    87054    38.69%
PAIR    450040    450040    1    0    216340    48.07%

Read1 has 127242 (56.55%) reads aligned when run on its own, versus 129286 (57.46%) when aligned as part of the pair.

Read2 has 44101 (19.60%) reads aligned when run on its own, versus 87054 (38.69%) when aligned as part of the pair. <--- Why is this so different?

bwa picard paired-end • 3.1k views
11.6 years ago

When read 1 maps, bwa sampe knows to look more carefully in the approximate region of that alignment for a suitable mapping site for read 2, if it couldn't find one before.

When you treat the reads as single end, you can't get the benefit of that other read's mapping position.
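
To put numbers on this, here is a small sketch that uses only the aligned-read counts from the Picard tables in the question. The variable and function names are my own, and the "gain" is a net difference: it assumes the reads that aligned as single-end also aligned in paired mode, which the summary metrics alone cannot confirm.

```python
# Aligned-read counts copied from the Picard tables in the question.
TOTAL_READS = 225020  # reads per mate (TOTAL_READS column)

single_end = {"read1": 127242, "read2": 44101}  # from the 'bwa samse' BAMs
paired_end = {"read1": 129286, "read2": 87054}  # from the 'bwa sampe' BAM


def gain(mate):
    """Return (extra aligned reads, extra as a percentage of all reads of that mate)."""
    extra = paired_end[mate] - single_end[mate]
    return extra, 100.0 * extra / TOTAL_READS


for mate in ("read1", "read2"):
    extra, pct = gain(mate)
    print(f"{mate}: {extra} more aligned reads in paired mode ({pct:.2f}% of {TOTAL_READS})")
```

The net gain is small for Read1 and large for Read2, which is what you would expect if most of the extra Read2 alignments were found by searching near an already mapped Read1.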


Yes, I think you are right. I rechecked another sample, and there the single-end and paired-end alignment statistics are nearly the same for each read (like Read1 in the example above). In the example above, Read2's quality is so poor that sampe can do a lot of realignment work on unmapped Read2 reads. With good-quality data, only a small fraction of reads are realigned, so the difference between the single-end and paired-end statistics is small.
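
If you want to see which reads were affected rather than just the totals, one option is to compare read names between the two alignments. The sketch below is only an illustration: it works on SAM text lines (for example, what `samtools view -h` prints), relies on the standard SAM flag bits 0x4 (unmapped) and 0x80 (second in pair), and assumes the read names are identical in both files. The function names and the toy records are hypothetical, not taken from the data above.

```python
UNMAPPED = 0x4         # SAM flag bit: this read is unmapped
SECOND_IN_PAIR = 0x80  # SAM flag bit: this read is the second mate


def mapped_names(sam_lines, require_flag=0):
    """Collect names of mapped reads whose flag has all bits in require_flag set."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            continue
        if (flag & require_flag) != require_flag:
            continue
        names.add(name)
    return names


def rescued_read2(single_end_sam, paired_sam):
    """Read2 names mapped in the paired alignment but not in the single-end one."""
    alone = mapped_names(single_end_sam)
    paired = mapped_names(paired_sam, require_flag=SECOND_IN_PAIR)
    return paired - alone


# Toy records (hypothetical): r1's mate maps in both runs, r2's mate only when paired.
single_end_read2 = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t300\t37\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
paired = [
    "r1\t99\tchr1\t100\t37\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t37\t50M\t=\t100\t-250\t*\t*",
    "r2\t99\tchr1\t900\t37\t50M\t=\t1100\t250\t*\t*",
    "r2\t147\tchr1\t1100\t37\t50M\t=\t900\t-250\t*\t*",
]
print(sorted(rescued_read2(single_end_read2, paired)))
```

On real data you would feed it the lines of the Read2 single-end file and the paired-end file; a large result set for a poor-quality Read2 and a small one for good-quality data would match the pattern described above.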
