How to find the On-Target and Off-Target percentage of reads?
7.8 years ago

Hi friends,

Recently, we performed exome sequencing on 3 samples using a Nextera sequencing machine. We used a new kit for exome sequencing, so I am interested in finding the on-target and off-target percentages of reads from my exome sequencing run.

My understanding is this. On-target: reads that are aligned to the targeted regions (exome regions as per the manifest file). Off-target: reads that are aligned to regions that are not targeted.

What percentage of reads should cover the on-target region? How can we calculate the on-target reads from a BAM file? Which tool is useful for getting the percentage of reads covering on-target and off-target regions?

DNAseq Exome targeted sequencing bam • 15k views

Thanks, genomax2. I am currently working on the output files from Picard.


Hi Genomax2,

This is the output I got from Picard HsMetrics. I am getting the on-target bases and on-bait bases (baits being the probes used for exon capture). I used two kinds of BAM file: a) just sorted BAM and b) sorted, deduplicated, recalibrated BAM. I believe that I should stick with the clean.dedup.recal.bam statistics.

1) What is vendor's filter?

2) I could see off_bait_bases, but not off-target_bases.

3) Will I be able to get the reads percentage for on-target and off-target?

The reason why I want reads on-target is,

A base within a read is considered on-target if it aligns to a targeted region. A read is considered on-target if at least one base within the read aligns to a targeted region. Measuring reads on target might represent the target fragments more accurately.
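To make the difference between the two definitions concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: reads and targets are given as half-open (start, end) coordinates on a single chromosome, the coordinates are made up, the read list is non-empty, and the function names (merge_intervals, overlap_bases, on_target_summary) are hypothetical helpers, not part of any tool mentioned in this thread.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort and merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_bases(read, targets, starts):
    """Count the bases of one read that fall inside the merged targets."""
    r_start, r_end = read
    # Begin at the last target that starts at or before the read start.
    i = max(bisect_right(starts, r_start) - 1, 0)
    total = 0
    while i < len(targets) and targets[i][0] < r_end:
        t_start, t_end = targets[i]
        total += max(0, min(r_end, t_end) - max(r_start, t_start))
        i += 1
    return total


def on_target_summary(reads, targets):
    """Return read-level and base-level on-target percentages."""
    targets = merge_intervals(targets)
    starts = [s for s, _ in targets]
    per_read = [overlap_bases(r, targets, starts) for r in reads]
    total_bases = sum(end - start for start, end in reads)
    return {
        # A read counts as on-target if at least one base overlaps a target.
        "pct_reads_on_target": 100.0 * sum(1 for b in per_read if b > 0) / len(reads),
        # Base-level value counts only the overlapping bases.
        "pct_bases_on_target": 100.0 * sum(per_read) / total_bases,
    }


# Toy data (assumption: invented coordinates on one chromosome).
targets = [(100, 200), (150, 250), (400, 500)]
reads = [(90, 140), (240, 290), (300, 350), (450, 500)]
summary = on_target_summary(reads, targets)
print(summary)
```

In this toy case three of the four reads touch a target, but only half of the sequenced bases lie inside one, so the read-level percentage comes out higher than the base-level one.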


Dear friend, I have a sample whose read count is 51,578,482. After duplicate removal, I have about 46,393,168 reads. The mapped read count is 46,676,207 (99.36%) and the on-target read count is 38,221,722 (81.36%). Do you think this is a good output? What should the ideal on-target percentage be? I used Agilent V6+UTR, 150 PE.
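An on-target percentage depends on which read count is used as the denominator, and the post does not say which one was used. The short Python sketch below just recomputes the percentage from the counts quoted above against each candidate denominator; the variable names and the helper pct are illustrative only.

```python
def pct(numerator, denominator):
    """Percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


# Counts as quoted in the post above.
total_reads = 51_578_482
deduplicated_reads = 46_393_168
mapped_reads = 46_676_207
on_target_reads = 38_221_722

# Assumption: these three denominators are the plausible candidates.
on_target_pct = {
    "of mapped reads": pct(on_target_reads, mapped_reads),
    "of deduplicated reads": pct(on_target_reads, deduplicated_reads),
    "of all sequenced reads": pct(on_target_reads, total_reads),
}

for label, value in on_target_pct.items():
    print(f"on-target {label}: {value}%")
```

When comparing runs or kits, it helps to report the denominator alongside the percentage.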

Please give your valuable output!!

7.8 years ago
aham ▴ 40

You can determine coverage using GATK's DepthOfCoverage walker. For 'on target' coverage, you can specify an interval list (bed file), for which GATK will calculate coverage.

java -Xms12g -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R hg19.fa -o out_depth_file -I input_duplicate_removed.bam -pt readgroup -ct 4 -ct 6 -ct 10 -L exome_capture_kit.bed
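As a rough illustration of what the -ct thresholds summarise, here is a small Python sketch that takes per-base depths over the target intervals and reports the fraction of bases covered at or above each threshold. The depth values are invented, and fraction_at_or_above is a hypothetical helper, not a GATK function; it assumes a non-empty list of depths.

```python
def fraction_at_or_above(depths, thresholds=(4, 6, 10)):
    """Fraction of positions whose depth is >= each threshold."""
    n = len(depths)
    return {t: sum(1 for d in depths if d >= t) / n for t in thresholds}


# Toy per-base depths across a target region (assumption: made-up values).
depths = [0, 3, 5, 8, 12, 20, 4, 9, 10, 1]

coverage = fraction_at_or_above(depths)
mean_depth = sum(depths) / len(depths)

for threshold, fraction in coverage.items():
    print(f">= {threshold}x: {fraction:.0%} of target bases")
print(f"mean depth: {mean_depth}")
```

Note that this is a depth summary over target bases, which is a different quantity from the percentage of reads that land on target.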

For detailed information on the parameters: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_coverage_DepthOfCoverage.php
For the percentage of reads covering the on-target region, see the article "Exome sequencing generates high quality data in non-target regions".


Thanks, Mshakeel. I will read the links you provided.

7.8 years ago
dyollluap ▴ 310

Picard tools can give those alignment stats as a standalone tool; you just need the BED file for the exome capture kit specific to your protocol. It will generate a few output text files; you want the HsMetrics one, and within it you will find the targeted coverage percentage.
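For pulling numbers out of the HsMetrics text file, a minimal Python sketch follows. It assumes the usual Picard metrics layout (a "## METRICS CLASS" line, then a tab-separated header row, then a value row) and uses a tiny invented example instead of a real file. The helper read_picard_metrics is hypothetical, and using ON_BAIT_BASES + NEAR_BAIT_BASES + OFF_BAIT_BASES as the denominator is an assumption for illustration; check the Picard documentation for the exact definitions in your version. Also note that Picard expects the bait and target regions in its interval_list format, so a BED file typically has to be converted first.

```python
import io


def read_picard_metrics(handle):
    """Return the first metrics row of a Picard metrics file as a dict."""
    lines = iter(handle)
    for line in lines:
        if line.startswith("## METRICS CLASS"):
            header = next(lines).rstrip("\n").split("\t")
            values = next(lines).rstrip("\n").split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")


# Invented miniature metrics file (assumption: real files have many more columns).
example = io.StringIO(
    "## htsjdk.samtools.metrics.StringHeader\n"
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "BAIT_SET\tON_BAIT_BASES\tNEAR_BAIT_BASES\tOFF_BAIT_BASES\tON_TARGET_BASES\n"
    "toy_kit\t700\t200\t100\t600\n"
)

row = read_picard_metrics(example)
on_bait = int(row["ON_BAIT_BASES"])
near_bait = int(row["NEAR_BAIT_BASES"])
off_bait = int(row["OFF_BAIT_BASES"])

# Assumption: on + near + off bait bases is used as the aligned-base total.
aligned = on_bait + near_bait + off_bait
pct_off_bait = 100.0 * off_bait / aligned
pct_on_target = 100.0 * int(row["ON_TARGET_BASES"]) / aligned

print(f"off-bait bases: {pct_off_bait:.1f}%")
print(f"on-target bases: {pct_on_target:.1f}%")
```

For a real run, replace the io.StringIO object with open("your_hs_metrics.txt"), where the file name is whatever you passed to Picard as the output. These are base-level values, not read-level ones.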


Thanks, Dyolluap. I used Picard tools HsMetrics.


We used HsMetrics too, but some large WXS files have extreme memory requirements. Sometimes we have to double the memory allocation from 16G to 32G, or even 64G, to get this done. We don't see similar problems with WgsMetrics no matter how big the file is. Does anyone know of alternatives to HsMetrics?
