Whole Genome Sequence (WGS) throughput (FastQC, Picard)
9.5 years ago
toshnam ▴ 650

Hi,

I have human WGS data in FASTQ format from a HiSeq 2000.

From raw-sequence QC with FastQC, I found that Total Sequences (in fastqc_data.txt) is about 436Mb.

After alignment with BWA MEM, Picard's CollectWgsMetrics reported a Genome territory of about 2.86Gb and a Mean coverage of about 31X.

Genome territory is the number of non-N bases in the reference genome over which coverage is evaluated, and Mean coverage is the mean coverage in bases of the genome territory, after all filters are applied.

I can't understand the relation between Total Sequences (FastQC) and Mean coverage (Picard). How can 436Mb of sequence cover a 2.86Gb genome at 31X? Please explain how these numbers relate.

Also, is a throughput of hundreds of megabases normal for WGS?

Any comments are welcome.

Thank you.

Eric

WGS FastQC Picard • 3.4k views

Those numbers don't add up. A fastq file that produces 31x coverage of a human-sized genome will be many gigabytes. One of those numbers that you're seeing must be wrong (or there are multiple fastq files).


How did you come up with the 436Mb? Did you multiply the "Total Sequences" line from the fastqc_data.txt file by the "Sequence length" line?
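
For reference, that multiplication could be sketched roughly as below. This assumes the usual tab-separated "Measure / Value" rows in the Basic Statistics module of fastqc_data.txt; the function names and the example text are made up for illustration, and the values in it are placeholders rather than the poster's real file.

```python
def basic_stats(fastqc_text):
    """Collect 'Measure<TAB>Value' rows from the Basic Statistics module."""
    stats = {}
    in_module = False
    for line in fastqc_text.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            if in_module:
                break
        elif in_module and not line.startswith("#") and "\t" in line:
            key, value = line.split("\t", 1)
            stats[key] = value
    return stats


def total_bases(stats):
    """Approximate base count: number of reads times read length."""
    n_reads = int(stats["Total Sequences"])
    # "Sequence length" can be a range such as "35-101" after trimming;
    # taking the upper bound makes this an upper estimate.
    max_len = int(stats["Sequence length"].split("-")[-1])
    return n_reads * max_len


# Hypothetical file content for one FASTQ file (placeholder values).
example = "\n".join([
    ">>Basic Statistics\tpass",
    "#Measure\tValue",
    "Total Sequences\t436000000",
    "Sequence length\t101",
    ">>END_MODULE",
])

bases = total_bases(basic_stats(example))
print(f"{bases / 1e9:.1f} Gb in this file")
```

Note that "Total Sequences" here is a count of reads, so the result is in bases only after multiplying by the read length.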

9.5 years ago
Zaag ▴ 870

(436 M reads * 2 (PE) * 101 bp (read length)) / 2.86 Gb (genome size) ≈ 31

So I guess it is millions of reads, not Mb.
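
The same arithmetic as a small sketch, in case it helps to plug in other numbers. The function name `mean_coverage` is just illustrative, and it assumes 436 M is the read count of one FASTQ file of a paired-end run, so the mate file doubles it.

```python
def mean_coverage(n_reads, read_length, genome_size, paired=True):
    """Expected mean coverage = total sequenced bases / genome size."""
    mates = 2 if paired else 1  # paired-end: each fragment yields two reads
    return n_reads * mates * read_length / genome_size


# Numbers from this thread: 436 M reads, 101 bp, 2.86 Gb genome territory.
cov = mean_coverage(436_000_000, 101, 2_860_000_000)
print(f"{cov:.1f}X")
```

This is a raw estimate from read counts alone. Picard reports mean coverage after its filters are applied, so the two figures do not have to match exactly.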


I think that makes sense. I'll ask the FastQC developer about the definition of "Total Sequences".

Thank you for your reply.
