Someone sent me some BAMs and that is all they sent, so I don't know whether they are analysis-ready.
After some googling I ran Qualimap on all the BAMs. I am wondering what the major QC metrics to look out for are.
Mapping rate seemed the most obvious to me; it looks north of 99% for the most part.
The other main thing I keep reading about is duplication rate. It sounds like a high duplication rate can be produced by PCR artifacts. Qualimap reports "Duplicated reads (flagged)" at around 13% for many samples.
There are quite a few other metrics, but I am trying to figure out which ones would raise big red flags when looking at BAMs? Is it mostly mapping and duplication rate?
Which quality metrics to look at, besides general run-specific metrics from the reads (e.g., the distribution of base qualities), depends strongly on what's actually in the BAMs: WGS, WES, capture, amplicons, etc. For example, a high duplication rate in WES may suggest the libraries were overamplified and should be deduped, whereas a high duplication rate is entirely expected for a panel of short amplicons (and you may not be able to dedup those).
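To make the two headline numbers concrete, here is a minimal Python sketch of how a mapping rate and a flagged-duplicate rate can be derived from the FLAG column of SAM records. The flag bit values come from the SAM specification; the function names (`flag_summary`, `flags_from_sam`) are made up for illustration, and the choice to count only primary alignments is an assumption, so the result may not match Qualimap's denominator exactly.

```python
# SAM FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def flags_from_sam(lines):
    """Yield the integer FLAG field from SAM text lines, skipping headers."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            yield int(fields[1])


def flag_summary(flags):
    """Summarise mapping and duplication rates over primary alignments.

    Assumption: secondary and supplementary records are ignored so that
    each read is counted once.
    """
    total = mapped = duplicates = 0
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
        if flag & DUPLICATE:
            duplicates += 1
    return {
        "primary_reads": total,
        "mapping_rate": mapped / total if total else 0.0,
        "duplication_rate": duplicates / total if total else 0.0,
    }
```

One way to feed it real data would be to pipe the text output of `samtools view` into a small script that passes `sys.stdin` to `flags_from_sam`. Note that the duplicate bit is only set if a duplicate-marking tool was run on the BAM, so a rate of zero can simply mean duplicates were never marked.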
The way I read the Qualimap report, 13% of my reads are duplicated. Is this high for WES (which is what these are from)?
No, that's quite good as far as I've seen. Removing them may help reduce false positives from your variant calling, and you won't lose much information by doing so.
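As a back-of-the-envelope check on "you won't lose much", the sketch below estimates the mean depth left after dropping duplicates. It assumes duplicates are spread evenly across the targets, which is a simplification, and the 100x starting depth is a hypothetical figure, not something taken from these BAMs.

```python
def depth_after_dedup(mean_depth, duplication_rate):
    """Rough mean depth remaining once flagged duplicates are removed.

    Assumption: duplicates are distributed uniformly over the target
    regions, so depth scales with the fraction of reads kept.
    """
    if not 0.0 <= duplication_rate <= 1.0:
        raise ValueError("duplication_rate must be between 0 and 1")
    return mean_depth * (1.0 - duplication_rate)


# Hypothetical example: 100x mean depth with 13% duplicates
# leaves roughly 87x after deduplication.
remaining = depth_after_dedup(100.0, 0.13)
```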
Thank you very much!