Question

Quality of whole genome sequencing and whole exome sequencing

0

10.1 years ago

devinliao0918 ▴ 40

I have some whole-genome sequencing (WGS) data with ~6x coverage and some whole-exome sequencing (WES) data with ~60x coverage. Both data sets were generated on the same sequencing platform. However, I found that the WGS data contain more low Phred scores than the WES data. Could anyone tell me whether this is caused by the difference in coverage? That is, does deeper sequencing lead to more high-quality reads?

next-gen • 3.0k views

updated 2.8 years ago by Ram 44k • written 10.1 years ago by devinliao0918 ▴ 40

Ram · Answer 1 · 2014-10-01

1

10.1 years ago

Devon Ryan 104k

Are you really talking about base-call phred scores or do you mean scores associated with variant calls? The former are unaffected by depth, the latter highly affected by it.
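To make the base-call side of that distinction concrete: a base-call Phred score encodes the estimated probability that one individual base was called incorrectly, Q = -10 * log10(P). The sketch below only illustrates that standard definition; the function names are made up for this example and do not come from any particular tool.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality score Q = -10*log10(P)."""
    return -10 * math.log10(p)


# Print the error probability implied by a few common scores.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

Nothing in this conversion depends on how many reads cover a position, which is consistent with base-call scores being a per-base property rather than a per-site one.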

0

I am talking about the base-call phred scores. All the scores are extracted from pileup files generated by "samtools pileup". I didn't call variants.

Also, the base-quality scores appear to be capped at 41, because there is no Phred score greater than 41 in the WGS data. The wiki page for the FASTQ format says the following:

For raw reads, the range of scores will depend on the technology and the base caller used, but will typically be up to 41 for recent Illumina chemistry. Since the maximum observed quality score was previously only 40, various scripts and tools break when they encounter data with quality values larger than 40. For processed reads, scores may be even higher. For example, quality values of 45 are observed in reads from Illumina's Long Read Sequencing Service (previously Moleculo).
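For anyone who wants to compare the two data sets the same way, here is a minimal sketch of tallying base qualities from pileup output. It rests on two assumptions that should be checked against the actual files: a tab-separated six-column layout (chromosome, position, reference base, depth, read bases, base qualities) and Phred+33 encoding. The function names and the example lines are invented for illustration.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def tally_base_qualities(pileup_lines):
    """Count how often each Phred score occurs in the quality column."""
    counts = Counter()
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a six-column pileup record
        if int(fields[3]) == 0:
            continue  # zero-depth sites carry a placeholder, not real scores
        for ch in fields[5]:
            counts[ord(ch) - PHRED_OFFSET] += 1
    return counts


def fraction_below(counts, threshold):
    """Fraction of tallied bases whose Phred score is below `threshold`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    low = sum(n for q, n in counts.items() if q < threshold)
    return low / total


# Made-up example lines, not real data.
example = [
    "chr1\t100\tA\t3\t..,\tIJ5\n",
    "chr1\t101\tC\t0\t*\t*\n",
    "chr1\t102\tG\t2\t.T\t#J\n",
]
counts = tally_base_qualities(example)
print(sorted(counts.items()))
print(max(counts), fraction_below(counts, 20))
```

Running the same tally on the WGS and WES pileups and comparing `fraction_below` at a fixed threshold gives a single number per data set, and `max(counts)` shows where the scores are capped.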

updated 2.8 years ago by Ram 44k • written 10.1 years ago by devinliao0918 ▴ 40

1

Phred scores are unlikely to be affected by WGS vs. exome seq.

10.1 years ago by Devon Ryan 104k

0

I agree with you. I guess the difference may be caused by differences in the technology used, though I know both the WES and the WGS were done on an Illumina HiSeq 2000.

updated 2.8 years ago by Ram 44k • written 10.1 years ago by devinliao0918 ▴ 40

0

I agree, this is more likely due to the library prep or to differences in the source DNA.

updated 2.8 years ago by Ram 44k • written 10.1 years ago by User 59 13k