Hello All,
I have run the DepthOfCoverage and DiagnoseTargets walkers on my tumor samples and their iPS lines. I am finding it difficult to make sense of the DiagnoseTargets output, but the DepthOfCoverage outputs are quite clear. I have a few questions I would like clarified. I was looking at the _cumulative_coverage_counts file, which can be used to draw a cumulative-frequency histogram of all bases that have been mapped on the exome. At the beginning of the file I see two columns:
            gte_0     gte_1
NSamples_1  80050421  64522700
Can you tell me what gte stands for? And if I want to know the highest number of reads that mapped to the exonic regions in my samples, it should be the 64 million count in the second column, right? That number varies across samples, whereas gte_0 is the same 80 million in every sample. So if I am thinking of the reads that ultimately mapped to the exome, it should be the second column, right?
I am trying to understand how many reads in my aligned BAM file ultimately mapped to the exonic regions. I am doing this for QC purposes. Against the reference genome I had a mapping rate of around 98%, so I want to know how many of those reads mapped to the exome. The FastQC report showed well over 50% duplicates, so I want to check whether the fraction of reads mapped to the exome is close to that value or much lower.
It would be nice if someone could share some statistics on read alignment to the exome. I read on the exome analysis wiki page that 60-70% of reads map to the exome, provided duplicates and errors are removed, but I am a bit perplexed by my stats and need advice from experts who have already done this kind of QC analysis. Any suggestions would be appreciated.
Regards,
Vivek
"gte" = "greater or equal than"
Thank you for the reply, Pierre, but I would like to know which column shows the reads that mapped to the exonic regions. Is it gte_0 or gte_1? The first value is just over 80 million and is the same in all samples, whereas the second column shows a different cumulative count for each sample. So it should be the second value, right?
Can anyone give me suggestions here?