Question

Whole Genome Sequencing (WGS) throughput (FastQC, Picard)


9.9 years ago

toshnam ▴ 650

Hi,

I have human WGS data in FASTQ format from a HiSeq 2000.

In raw-sequence QC with FastQC, I found that Total Sequences (in fastqc_data.txt) is about 436Mb.

After aligning with BWA-MEM, I also found from Picard's CollectWgsMetrics that Genome territory is about 2.86Gb and Mean coverage is about 31X.

Genome territory is the number of non-N bases in the reference genome over which coverage is evaluated, and Mean coverage is the mean coverage, in bases, of the genome territory after all filters are applied.
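To make those two definitions concrete, here is a minimal Python sketch of "mean depth over non-N reference positions". It assumes you already have one filtered depth value per reference position; Picard itself also applies mapping-quality, base-quality and duplicate filters that this toy version does not model, so treat it as an illustration of the definition, not Picard's implementation. The function names are hypothetical.

```python
# Toy illustration of GENOME_TERRITORY and MEAN_COVERAGE as defined above.
# Assumption: `depths` holds one already-filtered depth value per reference base.


def genome_territory(reference: str) -> int:
    """Count the non-N bases in the reference sequence."""
    return sum(1 for base in reference.upper() if base != "N")


def mean_coverage(reference: str, depths: list[int]) -> float:
    """Mean depth over non-N positions (uncovered positions count as 0)."""
    if len(reference) != len(depths):
        raise ValueError("reference and depths must have the same length")
    territory = genome_territory(reference)
    if territory == 0:
        return 0.0
    summed_depth = sum(d for base, d in zip(reference.upper(), depths) if base != "N")
    return summed_depth / territory


if __name__ == "__main__":
    ref = "ACGTNNACGT"
    depths = [30, 31, 32, 30, 0, 0, 29, 33, 31, 32]
    print(genome_territory(ref))       # 8
    print(mean_coverage(ref, depths))  # 31.0
```

The N positions are excluded from both the numerator and the denominator, which is why the territory (about 2.86Gb) is smaller than the full length of the human reference.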

I don't understand the relationship between Total Sequences (FastQC) and Mean coverage (Picard). How can 436Mb of sequence give 31X coverage of a 2.86Gb genome? Could someone explain how these numbers relate?

Also, is a throughput of a few hundred megabases normal for WGS?

Any comments are welcome.

Thank you.

Eric

Tags: WGS, FastQC, Picard

Written 9.9 years ago by toshnam ▴ 650 • last updated 2.3 years ago by Ram 45k


Those numbers don't add up. A fastq file that produces 31x coverage of a human-sized genome will be many gigabytes. One of those numbers that you're seeing must be wrong (or there are multiple fastq files).
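A quick back-of-the-envelope check of that point in Python. It assumes an uncompressed four-line FASTQ, where every base needs one sequence character and one quality character, so 2 bytes per base is a lower bound (read names, '+' lines and newlines are ignored here). gzip-compressed files will be smaller than this bound. The function names are illustrative only.

```python
# How much sequence does ~31X of a ~2.86 Gb territory need, and how big
# must the uncompressed FASTQ be at minimum?
# Assumption: >= 2 bytes per base (sequence char + quality char); headers,
# '+' lines and newlines are ignored, so real files are larger.


def bases_needed(coverage: float, genome_size: float) -> float:
    """Total sequenced bases required for a target mean coverage."""
    return coverage * genome_size


def min_fastq_bytes(total_bases: float) -> float:
    """Lower bound on uncompressed FASTQ size for that many bases."""
    return 2 * total_bases


if __name__ == "__main__":
    bases = bases_needed(31, 2.86e9)
    print(f"{bases / 1e9:.1f} Gb of sequence")                      # 88.7 Gb of sequence
    print(f">= {min_fastq_bytes(bases) / 1e9:.0f} GB uncompressed FASTQ")  # >= 177 GB uncompressed FASTQ
```

Roughly 89 Gb of sequence is two orders of magnitude more than 436Mb, which is why one of the two numbers has to be misread.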

Written 9.9 years ago by Devon Ryan 105k • last updated 2.3 years ago by Ram 45k


How did you come up with 436Mb? Did you multiply the "Total Sequences" line in fastqc_data.txt by the "Sequence length" line?
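For reference, a small Python sketch of that multiplication, reading the Basic Statistics block of fastqc_data.txt. It assumes the block starts with a ">>Basic Statistics" line, ends at ">>END_MODULE", and contains tab-separated "measure, value" lines; a length range such as "35-101" is handled by taking the upper value, which gives an upper bound. The 436000000 value in the example is a hypothetical round number, not the poster's actual file.

```python
# Multiply FastQC's "Total Sequences" by "Sequence length" for one file.
# Assumption: Basic Statistics lines look like "Total Sequences\t<value>".


def basic_stats(lines):
    """Return {measure: value} from the Basic Statistics module."""
    stats = {}
    in_basic = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">>Basic Statistics"):
            in_basic = True
            continue
        if in_basic and line.startswith(">>END_MODULE"):
            break
        if in_basic and not line.startswith("#") and "\t" in line:
            key, value = line.split("\t", 1)
            stats[key] = value
    return stats


def total_bases(stats):
    """Reads * read length; for a range like '35-101' use the maximum."""
    reads = int(stats["Total Sequences"])
    max_len = int(stats["Sequence length"].split("-")[-1])
    return reads * max_len


if __name__ == "__main__":
    example = [
        ">>Basic Statistics\tpass",
        "#Measure\tValue",
        "Total Sequences\t436000000",
        "Sequence length\t101",
        ">>END_MODULE",
    ]
    print(total_bases(basic_stats(example)))  # 44036000000
```

With a real report you could pass an open file handle instead, e.g. `with open("fastqc_data.txt") as fh: stats = basic_stats(fh)`. If "Total Sequences" is a read count, the product for a single file is tens of gigabases, not 436Mb.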

Written 9.9 years ago by 5heikki 11k • last updated 2.3 years ago by Ram 45k

Ram · Accepted Answer · 2015-06-01


9.9 years ago

Zaag ▴ 870

(436 M reads * 2 (PE) * 101 bp (read length)) / 2.86 Gb (genome size) ≈ 31

So I guess it is millions of reads, not Mb.
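The same estimate as a small Python function. It assumes FastQC was run on one file of the pair (so "Total Sequences" equals the number of read pairs), that both mates are 101 bp, and that no reads are lost to duplicate marking, unmapped reads or Picard's quality filters, so it only approximates Picard's filtered mean coverage. The function name is illustrative.

```python
# Estimated mean coverage = (read pairs * mates per pair * read length) / genome size.
# Assumptions: Total Sequences = read pairs (one FASTQ of the pair), fixed
# read length, and no filtering losses (so this is an upper-end estimate).


def estimated_coverage(read_pairs, read_length, genome_size, mates_per_pair=2):
    total_bases = read_pairs * mates_per_pair * read_length
    return total_bases / genome_size


if __name__ == "__main__":
    cov = estimated_coverage(436e6, 101, 2.86e9)
    print(f"{cov:.1f}X")  # 30.8X
```

436 M pairs give about 88 Gb of sequence, and dividing by the 2.86 Gb territory lands at roughly 31X, matching CollectWgsMetrics.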

Written 9.9 years ago by Zaag ▴ 870 • last updated 2.3 years ago by Ram 45k


That makes sense. I'll ask the FastQC developers about the definition of "Total Sequences".

Thank you for your reply.

Written 9.9 years ago by toshnam ▴ 650 • last updated 2.3 years ago by Ram 45k