Hello,
I tested two analysis pipelines for targeted paired-end DNA sequencing:
- mapping/alignment with bwa mem, followed by Picard's MarkDuplicates
- merging overlapping paired reads with bbmerge, mapping/alignment with bwa mem, followed by Picard's MarkDuplicates
I then ran CollectHsMetrics. This is the result:
BAIT_SET                    A           B
GENOME_SIZE                 3095693983  3095693983
BAIT_TERRITORY              6850692     6850692
TARGET_TERRITORY            6850692     6850692
BAIT_DESIGN_EFFICIENCY      1           1
TOTAL_READS                 21580058    12559951
PF_READS                    21580058    12559951
PF_UNIQUE_READS             18330522    6959842
PCT_PF_READS                1           1
PCT_PF_UQ_READS             0,849419    0,55413
PF_UQ_READS_ALIGNED         18306903    6933545
PCT_PF_UQ_READS_ALIGNED     0,998711    0,996222
PF_BASES_ALIGNED            3075391546  2140104653
PF_UQ_BASES_ALIGNED         2599640506  1150765565
ON_BAIT_BASES               1447696719  960680816
NEAR_BAIT_BASES             709339799   542955905
OFF_BAIT_BASES              918355028   636467932
ON_TARGET_BASES             889936855   369181943
PCT_SELECTED_BASES          0,701386    0,7026
PCT_OFF_BAIT                0,298614    0,2974
ON_BAIT_VS_SELECTED         0,671151    0,638905
MEAN_BAIT_COVERAGE          211,321239  140,231208
MEAN_TARGET_COVERAGE        129,904666  53,88973
MEDIAN_TARGET_COVERAGE      123         53
PCT_USABLE_BASES_ON_BAIT    0,46798     0,445238
PCT_USABLE_BASES_ON_TARGET  0,287679    0,171102
FOLD_ENRICHMENT             212,716292  202,846579
ZERO_CVG_TARGETS_PCT        0,010396    0,01021
PCT_EXC_DUPE                0,154696    0,462285
PCT_EXC_MAPQ                0,029064    0,020848
PCT_EXC_BASEQ               0,005036    0,003892
PCT_EXC_OVERLAP             0,25095     0,011729
PCT_EXC_OFF_TARGET          0,656279    0,506522
FOLD_80_BASE_PENALTY        1,732062    1,456479
PCT_TARGET_BASES_1X         0,990236    0,990324
PCT_TARGET_BASES_2X         0,989601    0,989492
PCT_TARGET_BASES_10X        0,98554     0,982661
PCT_TARGET_BASES_20X        0,977191    0,958973
PCT_TARGET_BASES_30X        0,963095    0,89456
PCT_TARGET_BASES_40X        0,941365    0,764513
PCT_TARGET_BASES_50X        0,911599    0,573995
PCT_TARGET_BASES_100X       0,652761    0,020449
HS_LIBRARY_SIZE             18811463    7065860
HS_PENALTY_10X              5,176015    4,791253
HS_PENALTY_20X              5,299345    5,088152
HS_PENALTY_30X              5,426413    5,438156
HS_PENALTY_40X              5,557217    5,862988
HS_PENALTY_50X              5,707456    6,395959
HS_PENALTY_100X             6,649249    -1
AT_DROPOUT                  3,923906    2,330678
GC_DROPOUT                  2,925639    2,474334
HET_SNP_SENSITIVITY         0,98916     0,988827
HET_SNP_Q                   20          20
SAMPLE                      (empty)     (empty)
LIBRARY                     (empty)     (empty)
READ_GROUP                  (empty)     (empty)
So far I cannot explain where the large differences in the coverage metrics come from. I thought that CollectHsMetrics counts the overlapping part of a read pair only once?
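To put numbers on the differences, here is a minimal Python sketch that takes a handful of the values above, converts the decimal commas, and prints the A/B ratio for each metric. The selection of metrics and the helper names (`parse_decimal_comma`, `compare`) are my own choices for illustration; nothing here is part of Picard.

```python
# Values copied from the CollectHsMetrics output above (decimal commas kept as text).
# The choice of metrics is an assumption about which ones are most relevant.
metrics = {
    "TOTAL_READS": ("21580058", "12559951"),
    "PCT_EXC_DUPE": ("0,154696", "0,462285"),
    "PCT_EXC_OVERLAP": ("0,25095", "0,011729"),
    "ON_TARGET_BASES": ("889936855", "369181943"),
    "MEAN_TARGET_COVERAGE": ("129,904666", "53,88973"),
}


def parse_decimal_comma(text: str) -> float:
    """Convert a number written with a decimal comma (e.g. '0,25095') to float."""
    return float(text.replace(",", "."))


def compare(table: dict) -> dict:
    """Return {metric: (value_A, value_B, A / B)} for every metric in the table."""
    result = {}
    for name, (a_text, b_text) in table.items():
        a = parse_decimal_comma(a_text)
        b = parse_decimal_comma(b_text)
        result[name] = (a, b, a / b)
    return result


comparison = compare(metrics)
for name, (a, b, ratio) in comparison.items():
    print(f"{name:<22} A={a:<18.6f} B={b:<18.6f} A/B={ratio:.2f}")
```

Looking at PCT_EXC_DUPE and PCT_EXC_OVERLAP side by side shows that the two runs differ most in how many bases are excluded as duplicates versus as overlap.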
Any ideas?
fin swimmer
What did you do with the bbmerge output? It can write both the merged and the unmerged reads. Did you map both the merged and the unmerged reads (and then combine the BAM files), or did you map only the merged reads with bwa mem?
Sorry, I forgot to mention it. I mapped both the merged and the unmerged reads and combined the BAM files afterwards.
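The order of steps can be sketched as below. This is a small Python helper that only builds and prints the command lines and does not run anything. All file names, the `sample` prefix, the use of `samtools sort`/`samtools merge` for sorting and combining, and the `picard` wrapper command are assumptions for illustration, not necessarily the exact commands that were run.

```python
def pipeline_b_commands(ref: str, r1: str, r2: str, prefix: str) -> list:
    """Build (not run) the commands for: bbmerge -> bwa mem -> merge BAMs -> MarkDuplicates.

    All paths are hypothetical placeholders derived from `prefix`.
    """
    merged = f"{prefix}.merged.fq.gz"        # reads that bbmerge could merge
    un1 = f"{prefix}.unmerged_1.fq.gz"       # remaining pairs, read 1
    un2 = f"{prefix}.unmerged_2.fq.gz"       # remaining pairs, read 2
    return [
        # 1. merge overlapping pairs, keep the pairs that could not be merged
        f"bbmerge.sh in1={r1} in2={r2} out={merged} outu1={un1} outu2={un2}",
        # 2. map merged reads as single-end reads
        f"bwa mem {ref} {merged} | samtools sort -o {prefix}.merged.bam -",
        # 3. map unmerged reads as paired-end reads
        f"bwa mem {ref} {un1} {un2} | samtools sort -o {prefix}.unmerged.bam -",
        # 4. combine both alignments into one BAM
        f"samtools merge {prefix}.combined.bam {prefix}.merged.bam {prefix}.unmerged.bam",
        # 5. mark duplicates on the combined BAM
        f"picard MarkDuplicates I={prefix}.combined.bam O={prefix}.dedup.bam "
        f"M={prefix}.dup_metrics.txt",
    ]


commands = pipeline_b_commands("ref.fa", "R1.fq.gz", "R2.fq.gz", "sample")
for command in commands:
    print(command)
```

Note that after step 4 the combined BAM contains a mix of single-end (merged) and paired-end (unmerged) alignments, which is what MarkDuplicates and CollectHsMetrics then see.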
fin swimmer