Hi, I might as well ask this community as well just to be certain
Background: I mapped my reads to reference genome, then used SAM tool flagstat and SAM tool stats. Felt like using both but at the same time felt like it didn't help. Now I have my mapping statistics that I can use to calculate the sequence depth. The read depth for whole exome sequencing = (Total number of reads X average read length)/total length of all exons. Sounds easy but I have a couple of problems.
Which statistic am I supposed to use exactly for my calculation? Is it the total reads mapped? The reads that have been mapped AND paired? Am I supposed to subtract the unmapped reads from the total mapped reads in my SAMtool stat output? Or do I just use the mapped reads from SAMtool flagstat? It is confusing!
Where am I supposed to obtain the length of all exons? Note: I'm not using R or any coding tool, I am only using databases (UCSC, NCBI, Esembl etc) and Galaxy.
Thanks for your time.
You would do this over the boundaries of the regions that were captured in your data. You can get the BED files for these regions from manufacturers of kit that you used for your experiment.
Here is one way to do this: Compute mean depth coverage for exome data with paired end, overlapping, features
mosdepth
(LINK) is an excellent option. Again used with your BAM and BED files.second mosdepth for this type of task