Hi,
I have human WGS sequences in FASTQ format from a HiSeq 2000.
In raw-sequence QC with FastQC, I found that Total Sequences (in fastqc_data.txt) is about 436Mb.
After alignment with BWA-MEM, Picard's CollectWgsMetrics also reports a Genome territory of about 2.86Gb and a Mean coverage of about 31X.
Genome territory is the number of non-N bases in the genome reference over which coverage is evaluated, and Mean coverage is the mean coverage in bases of the genome territory, after all filters are applied.
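To make those two definitions concrete, here is a toy sketch of how a territory and its mean depth relate, assuming a tiny made-up reference string and a hypothetical per-position depth list. It deliberately ignores the mapping-quality, base-quality, duplicate and other filters Picard applies, so it is an illustration of the definitions, not of Picard's actual implementation.

```python
def wgs_mean_coverage(reference: str, depth: list[int]) -> tuple[int, float]:
    """Return (genome territory, mean coverage) for a toy reference.

    Territory = number of non-N reference bases.
    Mean coverage = summed depth over those positions / territory size.
    No read filters are applied here (unlike Picard).
    """
    if len(reference) != len(depth):
        raise ValueError("need one depth value per reference position")
    # Keep only the depths at positions whose reference base is not N.
    covered = [d for base, d in zip(reference, depth) if base.upper() != "N"]
    territory = len(covered)
    mean = sum(covered) / territory if territory else 0.0
    return territory, mean


# Hypothetical 10 bp reference with two N positions excluded from the territory.
print(wgs_mean_coverage("ACGTNNACGT", [30, 31, 32, 30, 0, 0, 29, 31, 33, 32]))
# (8, 31.0)
```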
I can't understand the relation between Total Sequences (FastQC) and Mean coverage (Picard). How can 436Mb of sequence give 31X coverage of a 2.86Gb genome? Please explain how these relate.
Also, is a throughput of hundreds of megabases normal for WGS?
Any comments are welcome.
Thank you.
Eric
Those numbers don't add up. A fastq file that produces 31x coverage of a human-sized genome will be many gigabytes. One of those numbers that you're seeing must be wrong (or there are multiple fastq files).
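The back-of-the-envelope arithmetic behind that: mean depth is roughly sequenced bases divided by genome territory, and a FASTQ record stores one quality character for every base, so the uncompressed file holds at least two bytes per base. This sketch uses only the numbers from the question; because Picard's mean is computed after filters (duplicates, low-quality and unmapped reads are dropped), the raw bases actually needed for 31X would be somewhat higher than shown.

```python
GENOME_TERRITORY = 2.86e9  # non-N bases, as reported by CollectWgsMetrics in the question


def coverage_from_bases(total_bases: float, territory: float = GENOME_TERRITORY) -> float:
    """Rough mean depth if every sequenced base landed on the territory."""
    return total_bases / territory


def bases_for_coverage(coverage: float, territory: float = GENOME_TERRITORY) -> float:
    """Rough number of sequenced bases needed for a given mean depth."""
    return coverage * territory


def min_fastq_bytes(total_bases: float) -> float:
    """Lower bound on uncompressed FASTQ size: one sequence char plus one
    quality char per base (header lines, '+' lines and newlines come on top)."""
    return 2 * total_bases


needed = bases_for_coverage(31)
print(f"436 Mb of sequence -> {coverage_from_bases(436e6):.2f}X")
print(f"31X needs ~{needed / 1e9:.1f} Gb of sequence")
print(f"-> at least {min_fastq_bytes(needed) / 1e9:.0f} GB of uncompressed FASTQ")
# 436 Mb of sequence -> 0.15X
# 31X needs ~88.7 Gb of sequence
# -> at least 177 GB of uncompressed FASTQ
```

Compression shrinks the files on disk, but a 31X human genome is still many gigabytes of FASTQ, far more than 436 Mb of sequence.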
How did you come up with the 436Mb? Did you multiply the "Total Sequences" value from the fastqc_data.txt file by the "Sequence length" value?
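One way to check is to pull both values out of the Basic Statistics module of each fastqc_data.txt and multiply them, then add up every FASTQ file that went into the alignment. This is a minimal sketch that assumes the usual tab-separated "key<TAB>value" layout of that module; the sample text below is hypothetical. It treats 436 million as a read count, assumes 100 bp reads, and assumes a paired-end run with an R2 file of the same size, none of which is confirmed by the question.

```python
def basic_stats(text: str) -> dict:
    """Collect key/value pairs from the '>>Basic Statistics' module of fastqc_data.txt."""
    stats = {}
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            if in_module:
                break  # stop at the end of the Basic Statistics module
        elif in_module and not line.startswith("#"):
            key, _, value = line.partition("\t")
            stats[key] = value
    return stats


def estimated_bases(stats: dict) -> int:
    """Total Sequences x Sequence length; for a range such as '35-100' use the upper end."""
    n_reads = int(stats["Total Sequences"])
    max_len = int(stats["Sequence length"].split("-")[-1])
    return n_reads * max_len


# Hypothetical fastqc_data.txt excerpt for one read file (values are assumptions).
sample = """##FastQC\t0.11.9
>>Basic Statistics\tpass
#Measure\tValue
Filename\tsample_R1.fastq.gz
File type\tConventional base calls
Total Sequences\t436000000
Sequences flagged as poor quality\t0
Sequence length\t100
%GC\t41
>>END_MODULE
"""

per_file = estimated_bases(basic_stats(sample))  # bases in this one file
total = 2 * per_file  # assumes a matching R2 file of the same size
print(f"{total / 1e9:.1f} Gb -> {total / 2.86e9:.1f}X over 2.86 Gb")
# 87.2 Gb -> 30.5X over 2.86 Gb
```

Under those assumptions the estimate lands close to the reported 31X, which is consistent with 436 million being a read count rather than megabases; with real data, run it on every fastqc_data.txt and sum the results.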