Picard outputs a quality metric named PCT_USABLE_BASES_ON_TARGET, which is the fraction of the available PF bases that are aligned, de-duped, and on target.
For a successful exome sequencing experiment, what minimum percentages are acceptable here?
It's not so much about percentages as about having sufficient coverage of your target.
In general, the companies that make exon capture kits promise about 60% on target. So if you are in that ballpark, the library is okay.
Yes, but what if, of the 60% of on-target reads, 50% are duplicates that you would throw away by default? Would you still consider this library to be ok?
That's why I like the PCT_USABLE_BASES_ON_TARGET metric, because it takes duplicate reads into account ("de-duped"). What I am still wondering is what typical values other users get for this metric in their exome sequencing experiments.
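To put rough numbers on the 60% on-target / 50% duplicates scenario, here is a minimal Python sketch. It assumes that all PF bases align and that duplicates are spread evenly between on-target and off-target reads, which real libraries will not satisfy exactly. The function names, the 6 Gb of PF bases and the 50 Mb target size are made up for illustration and are not Picard's own calculation.

```python
def usable_fraction(on_target_fraction, duplicate_fraction):
    """Fraction of PF bases that are on target and survive de-duplication.

    Assumes duplicates are equally likely on and off target (a simplification).
    """
    return on_target_fraction * (1.0 - duplicate_fraction)


def mean_target_coverage(pf_bases, usable, target_size_bp):
    """Approximate mean depth over the target from the usable on-target bases."""
    return pf_bases * usable / target_size_bp


# Scenario from the thread: 60% on target, half of those duplicates.
usable = usable_fraction(0.60, 0.50)

# Hypothetical run: 6 Gb of PF bases against a 50 Mb exome target.
coverage = mean_target_coverage(6e9, usable, 50e6)

print(f"usable on-target fraction: {usable:.2f}")
print(f"approximate mean target coverage: {coverage:.0f}x")
```

Under these assumptions only about 30% of the sequenced bases end up contributing to target coverage, so whether the library is "ok" comes down to whether the resulting depth is enough for the analysis you have in mind.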
There is not really a strict threshold for this metric as far as I can see, as it is obviously related to the number of reads you have to begin with and then tells you something about your coverage depth. If your coverage depth is high enough I wouldn't worry - although obviously that depends on what you are trying to do.
I would see this metric more as a diagnostic test to run if I have unusually low or high coverage depth of my target, or the coverage uniformity is biased toward one end of the target. This metric can answer the question of 'why do I have such good/bad coverage depth' (is it because of not enough runs, or because we aren't hitting the target, or maybe because we filtered on quality too aggressively?), but I can't see what else it could tell you.
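For this kind of diagnosis it can help to pull the related columns out of the metrics file and look at them side by side. Below is a small Python sketch that reads the first metrics row from Picard-style output (comment lines, a "## METRICS CLASS" line, then a tab-separated header and value row). The column names are the ones I recall from CollectHsMetrics output and should be checked against your own file; the sample values are invented purely to make the example runnable.

```python
# Invented sample text in the general layout of a Picard metrics file.
SAMPLE = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectHsMetrics (command line elided)\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "PCT_SELECTED_BASES\tPCT_EXC_DUPE\tPCT_EXC_BASEQ\t"
    "PCT_USABLE_BASES_ON_TARGET\tMEAN_TARGET_COVERAGE\n"
    "0.62\t0.48\t0.02\t0.30\t36.1\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
    "coverage\tcount\n"
    "0\t1000\n"
)


def read_first_metrics_row(text):
    """Return the first metrics row as a dict of column name -> string value."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            if i + 2 >= len(lines):
                raise ValueError("metrics section is truncated")
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def pick_numeric(row, fields):
    """Convert the requested columns to float, skipping any that are absent."""
    return {name: float(row[name]) for name in fields if name in row}


row = read_first_metrics_row(SAMPLE)
report = pick_numeric(row, [
    "PCT_SELECTED_BASES",          # are we hitting the capture region?
    "PCT_EXC_DUPE",                # how much is lost to duplicates?
    "PCT_EXC_BASEQ",               # how much is lost to base-quality filtering?
    "PCT_USABLE_BASES_ON_TARGET",  # what is left over for the target
    "MEAN_TARGET_COVERAGE",
])
for name, value in report.items():
    print(f"{name}\t{value}")
```

To use it on a real file, read the file contents into a string and pass that to read_first_metrics_row in place of SAMPLE. Columns missing from your Picard version are simply skipped.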
I'm willing to be corrected here - I am going on what I have read in the Picard manual.
Found this statement on genohub:
But AFAIK and already discussed below, this metric does not take duplicate reads into account.