Hi,
I ran DepthOfCoverage using a RefSeq interval list to use with the -genelist
argument.
I obtained a gene summary table that includes, among other columns, the 'total coverage' and 'mean coverage' values (see here for an example).
After annotating the file, I got values like these:
Gene   total coverage  mean coverage
AGL    54765           5.75
AGMAT  1089            0.33
Q1: Is the 'total coverage' value the sum of all the bases or of all the reads (across all my samples) mapped in each RefSeq interval provided?
Q2: How is the 'mean coverage' calculated, and how does it relate to the 'total coverage'? Does it take the read length or gene length (or something else) into account? Is it the value usually reported in papers, for example as 100X coverage?
What format is the RefSeq file you pass to -genelist? I tried BED, GTF, and the "all fields from UCSC" output, and none of them worked for me. Could you attach your list?
Hello,
Did you sort the file according to your reference genome?
I downloaded the RefSeq file by following the GATK RefSeq file download instructions. Only the "all fields from UCSC" option is recommended by GATK, and it worked for me.
I use GATK 4.1.7, which has a bug: the gene list file name must end with .refSeq rather than .txt, otherwise the tool does not recognize it.
The following script was suggested in some posts for sorting the RefSeq file: sortByRef.pl (suggested script link).
That script did not work for me, so I wrote an ad hoc Python script to create the sorted file, and then the command ran smoothly.
The header of the file:
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
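In case it helps, here is a minimal sketch of that kind of sort. It is not the script I actually used, and it rests on a few assumptions: the RefSeq file is tab-separated with the columns in the header above (chrom in the third column, txStart in the fifth), the contig order is read from the first column of the reference's .fai index, and rows on contigs missing from the reference are simply dropped. The function names read_contig_order and sort_refseq are made up for this example, and the demo rows are shortened to six columns.

```python
def read_contig_order(fai_lines):
    """Map contig name -> rank, using the first column of a .fai index."""
    order = {}
    for line in fai_lines:
        if line.strip():
            order[line.split("\t")[0]] = len(order)
    return order


def sort_refseq(lines, contig_order, chrom_col=2, start_col=4):
    """Sort RefSeq rows by reference contig order, then by txStart.

    Header lines (starting with '#') are kept at the top.
    Assumption: rows on contigs absent from the reference are dropped.
    """
    header, rows = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        if fields[chrom_col] in contig_order:
            rows.append(fields)
    rows.sort(key=lambda f: (contig_order[f[chrom_col]], int(f[start_col])))
    return header + ["\t".join(f) for f in rows]


# Tiny made-up demo: a two-contig reference and four unsorted rows.
demo_fai = ["chr1\t248956422\t6\t60\t61", "chr2\t242193529\t7\t60\t61"]
demo_refseq = [
    "#bin\tname\tchrom\tstrand\ttxStart\ttxEnd",
    "1\tNM_B\tchr2\t+\t500\t900",
    "2\tNM_C\tchr1\t-\t7000\t8000",
    "3\tNM_A\tchr1\t+\t100\t600",
    "4\tNM_X\tchrUn\t+\t1\t2",  # contig not in the reference, dropped
]
sorted_lines = sort_refseq(demo_refseq, read_contig_order(demo_fai))
for row in sorted_lines:
    print(row)
```

For a real file you would pass the lines of your .fai and of the downloaded table instead of the demo lists, and write the result to a file whose name ends with .refSeq.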
Hello,
total coverage = Total reads in the region specified (for all the samples)
mean coverage = Average reads per base in the region specified
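As a quick sanity check of these two definitions against the numbers from the first post: if the mean is just the total divided by the number of positions counted, then total / mean gives back that number of positions. This is only arithmetic under that assumption, not a statement of how the tool computes the values internally; in particular, I am not sure whether the denominator also includes the number of samples. The name implied_positions is made up for this example.

```python
def implied_positions(total_coverage, mean_coverage):
    """Number of positions implied by mean = total / positions (an assumption)."""
    return total_coverage / mean_coverage


# Values copied from the gene summary table in the first post.
genes = {"AGL": (54765, 5.75), "AGMAT": (1089, 0.33)}

implied = {
    gene: round(implied_positions(total, mean))
    for gene, (total, mean) in genes.items()
}
for gene, n in implied.items():
    print(gene, n)  # AGL 9524, AGMAT 3300
```

If those numbers are close to the summed length of the RefSeq intervals for each gene (possibly multiplied by the number of samples), the definitions above fit your output.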