Understanding the Output of the DiagnoseTargets and DepthOfCoverage Walkers in GATK
11.1 years ago
ivivek_ngs ★ 5.2k

Hello All,

I have run the DepthOfCoverage and DiagnoseTargets walkers on my tumor samples and their IPS lines. I am finding it difficult to interpret the DiagnoseTargets output, but the DepthOfCoverage outputs are quite clear. I have a few questions I would like clarified. I was looking at the _cumulative_coverage_counts file, which can be used to draw a histogram of the cumulative frequency of all bases that have been mapped on the exome. There I see two columns at the beginning:

            gte_0     gte_1
NSamples_1  80050421  64522700

Can you tell me what gte stands for? If I want to know the highest number of reads that mapped to the exonic regions in my samples, it should be the 64 million count in the second column, right? That number varies across samples, whereas gte_0 shows the same 80 million in every sample. So if I want the reads that ultimately mapped to the exome, it should be the second column, right?

I am trying to find out how many reads in my aligned BAM file ultimately mapped to the exonic regions, for QC purposes. For the reference genome I had a mapping rate of around 98%, so I want to know how many reads mapped to the exome. The FastQC report showed well over 50% duplicates, so I want to check whether the reads that mapped to the exome are close to that value or much lower. It would be nice if someone could provide some statistics on read alignment to the exome. I read on the wiki page on exome analysis that 60-70% of reads map to the exome once duplicates and errors are removed, but I am a bit perplexed by my stats, so I need some advice from experts who have already done this kind of QC analysis. Please advise.

Regards,

Vivek

exome-sequencing gatk qc • 4.7k views

"gte" = "greater than or equal to"
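
As far as I understand the DepthOfCoverage output, these columns count target bases (loci), not reads: gte_0 would be the total number of bases in the intervals, which would explain why it is identical across samples, and gte_1 the number of bases covered by at least one read. A minimal sketch under that assumption, using the two values from the question; the helper names are made up for illustration and are not part of GATK:

```python
def parse_cumulative_counts(header_line, row_line):
    """Map each 'gte_N' column name to its count for one sample row.

    Assumes whitespace- or tab-separated columns, with the sample name
    as the first field of the data row (hypothetical helper, not GATK).
    """
    names = header_line.split()
    fields = row_line.split()
    sample = fields[0]
    counts = [int(value) for value in fields[1:]]
    return sample, dict(zip(names, counts))


def fraction_at_least(counts, depth):
    """Fraction of target bases with depth >= `depth`, relative to gte_0."""
    return counts[f"gte_{depth}"] / counts["gte_0"]


# Values copied from the question above.
header = "gte_0 gte_1"
row = "NSamples_1 80050421 64522700"

sample, counts = parse_cumulative_counts(header, row)
print(sample, f"{fraction_at_least(counts, 1):.1%} of target bases have depth >= 1")
```

Read this way, the ratio gte_1 / gte_0 is a breadth-of-coverage figure for the target, not the fraction of reads that landed on the exome.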


Thank you for the reply, Pierre, but I would like to know which column shows the reads that mapped to the exonic regions. Is it gte_0 or gte_1? The first value is over 80 million and is the same in all samples, while the second column shows a different cumulative count for each sample. So it should be the second value, right?


Can anyone give me suggestions here?

10.9 years ago
JuJo ▴ 10

Hi,

"As for the reference genome I had a mapping of around 98% so I want to know how much reads got mapped on the exome"

You can calculate this yourself (in terms of bases, not reads) from the .DepthofCoverage file.

The header of the file should be:

Locus    Total_Depth    Average_Depth_sample    Depth_for_yoursample

Just grep all the bases where the depth is zero and count them. Dividing that count by the number of lines in the .DepthofCoverage file (minus 1 for the header) gives the fraction of bases that are not covered; multiply by 100 for the percentage.
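
The same count can be sketched in Python instead of grep. This assumes the file is tab-delimited, has exactly one header line, and carries the depth in the second column (Total_Depth) as in the header above. The function name and the example rows are invented for illustration.

```python
import io


def uncovered_fraction(handle, depth_column=1):
    """Return the fraction of loci whose depth is zero.

    `handle` is any iterable of lines (an open file or io.StringIO).
    Assumes a tab-delimited table with a single header line and an
    integer depth in `depth_column` (0-based; 1 = Total_Depth).
    """
    next(handle, None)  # skip the header line, if there is one
    total = 0
    uncovered = 0
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= depth_column:
            continue  # ignore blank or truncated lines
        total += 1
        if int(fields[depth_column]) == 0:
            uncovered += 1
    return uncovered / total if total else 0.0


# Invented example rows, only to show the expected layout.
example = io.StringIO(
    "Locus\tTotal_Depth\tAverage_Depth_sample\tDepth_for_yoursample\n"
    "chr1:100\t0\t0.00\t0\n"
    "chr1:101\t12\t12.00\t12\n"
    "chr1:102\t7\t7.00\t7\n"
    "chr1:103\t0\t0.00\t0\n"
)
print(f"{uncovered_fraction(example):.1%} of bases are not covered")
```

For a real run, pass an open file handle instead of the in-memory example, e.g. `with open(path) as fh: uncovered_fraction(fh)`, where `path` points at your own per-locus output.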

Regards,

JuJo
