CollectAlignmentSummaryMetrics PE FIRST_OF_PAIR alignment discrepancy
6.8 years ago
natsterbug ▴ 10

I have four whole-genome sequencing samples from potato leaves that were run on a single lane of an Illumina HiSeq 4000 flow cell to generate 150 bp paired-end reads. I trimmed the reads with Trimmomatic and then aligned them to the potato reference with bwa mem using the command below:

bwa mem -t 18 -k 16 -M -R"@RG\tID:Lane3_R\tSM:Resistant\tPL:Illumina\tLB:Resistant" potato_dm_v404_all_pm_un.fasta Resistant_Filtered_2P.fastq Resistant_Filtered_1P.fastq | samtools view -Sub - | samtools sort -O BAM -o Resistant.sorted.bam

I then ran CollectAlignmentSummaryMetrics:

java -Xmx20g -jar /opt/software/picardTools/1.113/CollectAlignmentSummaryMetrics.jar R=potato_dm_v404_all_pm_un.fasta INPUT=Resistant.sorted.bam OUTPUT=Resistant_algn_summary.txt

Although 98.4% of PF reads aligned to the reference, I noticed that for one of my samples FIRST_OF_PAIR has substantially more PF_HQ_ALIGNED_READS than TOTAL_READS, along with other peculiar values that may indicate misaligned paired-end reads. The results are below:

FIRST_OF_PAIR   TOTAL_READS                 70524781
FIRST_OF_PAIR   PF_READS                    70524781
FIRST_OF_PAIR   PCT_PF_READS                1
FIRST_OF_PAIR   PF_NOISE_READS              0
FIRST_OF_PAIR   PF_READS_ALIGNED            69453601
FIRST_OF_PAIR   PCT_PF_READS_ALIGNED        0.984811
FIRST_OF_PAIR   PF_ALIGNED_BASES            9620894381
FIRST_OF_PAIR   PF_HQ_ALIGNED_READS         50804548
FIRST_OF_PAIR   PF_HQ_ALIGNED_BASES         7174609113
FIRST_OF_PAIR   PF_HQ_ALIGNED_Q20_BASES     7105678130
FIRST_OF_PAIR   PF_HQ_MEDIAN_MISMATCHES     2
FIRST_OF_PAIR   PF_MISMATCH_RATE            0.035645
FIRST_OF_PAIR   PF_HQ_ERROR_RATE            0.03284
FIRST_OF_PAIR   PF_INDEL_RATE               0.002595
FIRST_OF_PAIR   MEAN_READ_LENGTH            147.023646
FIRST_OF_PAIR   READS_ALIGNED_IN_PAIRS      69198407
FIRST_OF_PAIR   PCT_READS_ALIGNED_IN_PAIRS  0.996326
FIRST_OF_PAIR   BAD_CYCLES                  0
FIRST_OF_PAIR   STRAND_BALANCE              0.500124
FIRST_OF_PAIR   PCT_CHIMERAS                0.182996
FIRST_OF_PAIR   PCT_ADAPTER                 0.000006
SECOND_OF_PAIR  TOTAL_READS                 70524781
SECOND_OF_PAIR  PF_READS                    70524781
SECOND_OF_PAIR  PCT_PF_READS                1
SECOND_OF_PAIR  PF_NOISE_READS              0
SECOND_OF_PAIR  PF_READS_ALIGNED            69469731
SECOND_OF_PAIR  PCT_PF_READS_ALIGNED        0.98504
SECOND_OF_PAIR  PF_ALIGNED_BASES            9752646754
SECOND_OF_PAIR  PF_HQ_ALIGNED_READS         50858464
SECOND_OF_PAIR  PF_HQ_ALIGNED_BASES         7273428558
SECOND_OF_PAIR  PF_HQ_ALIGNED_Q20_BASES     7233606433
SECOND_OF_PAIR  PF_HQ_MEDIAN_MISMATCHES     2
SECOND_OF_PAIR  PF_MISMATCH_RATE            0.035133
SECOND_OF_PAIR  PF_HQ_ERROR_RATE            0.032285
SECOND_OF_PAIR  PF_INDEL_RATE               0.00263
SECOND_OF_PAIR  MEAN_READ_LENGTH            148.983855
SECOND_OF_PAIR  READS_ALIGNED_IN_PAIRS      69198407
SECOND_OF_PAIR  PCT_READS_ALIGNED_IN_PAIRS  0.996094
SECOND_OF_PAIR  BAD_CYCLES                  0
SECOND_OF_PAIR  STRAND_BALANCE              0.500204
SECOND_OF_PAIR  PCT_CHIMERAS                0.182996
SECOND_OF_PAIR  PCT_ADAPTER                 0.000001
PAIR            TOTAL_READS                 141049562
PAIR            PF_READS                    141049562
PAIR            PCT_PF_READS                1
PAIR            PF_NOISE_READS              0
PAIR            PF_READS_ALIGNED            138923332
PAIR            PCT_PF_READS_ALIGNED        0.984926
PAIR            PF_ALIGNED_BASES            19373541135
PAIR            PF_HQ_ALIGNED_READS         101663012
PAIR            PF_HQ_ALIGNED_BASES         14448037671
PAIR            PF_HQ_ALIGNED_Q20_BASES     14339284563
PAIR            PF_HQ_MEDIAN_MISMATCHES     2
PAIR            PF_MISMATCH_RATE            0.035387
PAIR            PF_HQ_ERROR_RATE            0.03256
PAIR            PF_INDEL_RATE               0.002613
PAIR            MEAN_READ_LENGTH            148.00375
PAIR            READS_ALIGNED_IN_PAIRS      138396814
PAIR            PCT_READS_ALIGNED_IN_PAIRS  0.99621
PAIR            BAD_CYCLES                  0
PAIR            STRAND_BALANCE              0.500164
PAIR            PCT_CHIMERAS                0.182996
PAIR            PCT_ADAPTER                 0.000003
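
A quick way to compare PF_HQ_ALIGNED_READS against TOTAL_READS for each category is sketched below. It assumes the metrics have been saved in the three-column layout pasted above (category, metric, value); Picard's own output file is a wide table with a header row, so it would need transposing first. The function name `hq_fraction` and the file name `metrics_long.txt` are placeholders of mine, not part of Picard.

```shell
#!/usr/bin/env bash
# Sketch: per category, print PF_HQ_ALIGNED_READS / TOTAL_READS and flag
# any category where the high-quality count exceeds the total.
# Assumes whitespace-separated rows of the form: CATEGORY METRIC VALUE
hq_fraction() {
    awk '
        $2 == "TOTAL_READS" {
            total[$1] = $3
            # remember the order in which categories first appear
            if (!($1 in seen)) { order[++n] = $1; seen[$1] = 1 }
        }
        $2 == "PF_HQ_ALIGNED_READS" { hq[$1] = $3 }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                if (total[c] + 0 > 0) {
                    flag = (hq[c] + 0 > total[c] + 0) ? "EXCEEDS_TOTAL" : "ok"
                    printf "%s\t%.4f\t%s\n", c, hq[c] / total[c], flag
                }
            }
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   hq_fraction metrics_long.txt
```

A fraction above 1 would confirm that the high-quality count really exceeds the total for that category.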

When I rerun the alignment summary metrics on the BAM with duplicates marked, these questionable values seem to be rectified. The following are the results:

FIRST_OF_PAIR   TOTAL_READS                 70524781
FIRST_OF_PAIR   PF_READS                    70524781
FIRST_OF_PAIR   PCT_PF_READS                1
FIRST_OF_PAIR   PF_NOISE_READS              0
FIRST_OF_PAIR   PF_READS_ALIGNED            69453601
FIRST_OF_PAIR   PCT_PF_READS_ALIGNED        0.984811
FIRST_OF_PAIR   PF_ALIGNED_BASES            9620894381
FIRST_OF_PAIR   PF_HQ_ALIGNED_READS         50804548
FIRST_OF_PAIR   PF_HQ_ALIGNED_BASES         7174609113
FIRST_OF_PAIR   PF_HQ_ALIGNED_Q20_BASES     7105678130
FIRST_OF_PAIR   PF_HQ_MEDIAN_MISMATCHES     2
FIRST_OF_PAIR   PF_MISMATCH_RATE            0.035645
FIRST_OF_PAIR   PF_HQ_ERROR_RATE            0.03284
FIRST_OF_PAIR   PF_INDEL_RATE               0.002595
FIRST_OF_PAIR   MEAN_READ_LENGTH            147.023646
FIRST_OF_PAIR   READS_ALIGNED_IN_PAIRS      69198407
FIRST_OF_PAIR   PCT_READS_ALIGNED_IN_PAIRS  0.996326
FIRST_OF_PAIR   BAD_CYCLES                  0
FIRST_OF_PAIR   STRAND_BALANCE              0.500124
FIRST_OF_PAIR   PCT_CHIMERAS                0.182996
FIRST_OF_PAIR   PCT_ADAPTER                 0.000006
SECOND_OF_PAIR  TOTAL_READS                 70524781
SECOND_OF_PAIR  PF_READS                    70524781
SECOND_OF_PAIR  PCT_PF_READS                1
SECOND_OF_PAIR  PF_NOISE_READS              0
SECOND_OF_PAIR  PF_READS_ALIGNED            69469731
SECOND_OF_PAIR  PCT_PF_READS_ALIGNED        0.98504
SECOND_OF_PAIR  PF_ALIGNED_BASES            9752646754
SECOND_OF_PAIR  PF_HQ_ALIGNED_READS         50858464
SECOND_OF_PAIR  PF_HQ_ALIGNED_BASES         7273428558
SECOND_OF_PAIR  PF_HQ_ALIGNED_Q20_BASES     7233606433
SECOND_OF_PAIR  PF_HQ_MEDIAN_MISMATCHES     2
SECOND_OF_PAIR  PF_MISMATCH_RATE            0.035133
SECOND_OF_PAIR  PF_HQ_ERROR_RATE            0.032285
SECOND_OF_PAIR  PF_INDEL_RATE               0.00263
SECOND_OF_PAIR  MEAN_READ_LENGTH            148.983855
SECOND_OF_PAIR  READS_ALIGNED_IN_PAIRS      69198407
SECOND_OF_PAIR  PCT_READS_ALIGNED_IN_PAIRS  0.996094
SECOND_OF_PAIR  BAD_CYCLES                  0
SECOND_OF_PAIR  STRAND_BALANCE              0.500204
SECOND_OF_PAIR  PCT_CHIMERAS                0.182996
SECOND_OF_PAIR  PCT_ADAPTER                 0.000001
PAIR            TOTAL_READS                 141049562
PAIR            PF_READS                    141049562
PAIR            PCT_PF_READS                1
PAIR            PF_NOISE_READS              0
PAIR            PF_READS_ALIGNED            138923332
PAIR            PCT_PF_READS_ALIGNED        0.984926
PAIR            PF_ALIGNED_BASES            19373541135
PAIR            PF_HQ_ALIGNED_READS         101663012
PAIR            PF_HQ_ALIGNED_BASES         14448037671
PAIR            PF_HQ_ALIGNED_Q20_BASES     14339284563
PAIR            PF_HQ_MEDIAN_MISMATCHES     2
PAIR            PF_MISMATCH_RATE            0.035387
PAIR            PF_HQ_ERROR_RATE            0.03256
PAIR            PF_INDEL_RATE               0.002613
PAIR            MEAN_READ_LENGTH            148.00375
PAIR            READS_ALIGNED_IN_PAIRS      138396814
PAIR            PCT_READS_ALIGNED_IN_PAIRS  0.99621
PAIR            BAD_CYCLES                  0
PAIR            STRAND_BALANCE              0.500164
PAIR            PCT_CHIMERAS                0.182996
PAIR            PCT_ADAPTER                 0.000003

Any suggestions on what might be amiss would be greatly appreciated.

bwamem picardtools alignment dna • 1.6k views

If you trimmed the paired-end read files independently, it is possible that the reads went out of sync between the two files. You can use repair.sh from the BBMap suite to fix that, or redo the trimming with both files in the same Trimmomatic run.
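
If you want to test for this before re-trimming, a rough check is to walk both FASTQ files record by record and compare read names. The sketch below assumes uncompressed FASTQ with exactly four lines per record, and treats two names as matching when they agree after dropping everything from the first whitespace onward and any trailing /1 or /2. `check_pair_sync` is a name I made up for illustration; it is not part of BBMap or Trimmomatic.

```shell
#!/usr/bin/env bash
# Sketch: report the first record at which two FASTQ files disagree on
# read name. Exit status is 0 when the files are in sync, 1 otherwise.
check_pair_sync() {
    awk -v mates="$2" '
        function name(header,    parts) {
            split(header, parts, /[ \t]/)   # keep text before first whitespace
            sub(/\/[12]$/, "", parts[1])    # drop a trailing /1 or /2
            return parts[1]
        }
        NR % 4 == 1 {                       # header line of each record in file 1
            n++
            if ((getline mate < mates) <= 0) {
                print "out of sync at record " n ": second file ended early"
                bad = 1; exit
            }
            if (name($0) != name(mate)) {
                print "out of sync at record " n ": " $1 " vs " mate
                bad = 1; exit
            }
            # skip sequence, "+" and quality lines of the mate record
            for (i = 0; i < 3; i++) getline skipped < mates
        }
        END {
            if (!bad && (getline mate < mates) > 0) {
                print "out of sync: second file has extra records"
                bad = 1
            }
            if (!bad) print "in sync: " (n + 0) " records"
            exit bad + 0
        }
    ' "$1"
}

# Usage, with the file names from the question:
#   check_pair_sync Resistant_Filtered_1P.fastq Resistant_Filtered_2P.fastq
```

This only checks names and record counts, so it will not catch problems inside the sequences themselves.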


I trimmed both reads together using:

java -jar $TRIM/trimmomatic PE -phred33 -threads 16 -basein Resistant_R1 -baseout Resistant_Filtered.fastq ILLUMINACLIP:/mnt/research/common-data/Bio/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100