Question

GC Bias in Sequence Capture - Analysis & Quantification

0

Entering edit mode

6.0 years ago

jnowacki ▴ 110

QUESTION
I have a bed file & alignment file (bam). I'd like to know if the reads contained inside the bam are biased towards GC rich regions.

RESEARCH

Picard CollectGcBiasMetrics does a great job with whole genome sequencing but doesn't take bed files as input.

Picard HsMetrics can tell me GC % & coverage per region but not all regions are the same size.

I thought about taking the original bedfile and creating a new bed file of sliding 100 base pair windows with bedtools makewindows and then analyzing with Picard HsMetrics. The problem with that is if I use a window of 100 bp and step of 10 bp then large capture regions will have more windows than small capture regions.

For example: ORIGINAL BED FILE ----> NEW BED FILE
100 bp region ----> 1 window of 100 base pairs, 1x coverage in new bed file
1000 bp region ----> 90 windows of 100 base pairs. 10x the size but 90x as many windows.

If I pad things out then there will be windows that have 0 depth of coverage by the bam file and these will be clustered around areas with few base pairs (small capture regions).

Gcbias sequencing sequence capture bed picard • 1.8k views

ADD COMMENT • link updated 5.9 years ago by Biostar 20 • written 6.0 years ago by jnowacki ▴ 110

0

Entering edit mode

Hello jnowacki ,

could you please explain what you hope to find out and why the output of HsMetrics isn't suitable for your needs?

Thanks.

fin swimmer

ADD REPLY • link 6.0 years ago by finswimmer 16k

0

Entering edit mode

I'm trying to detect GC bias. The problem with the normal HsMetrics output is not all of the bed regions are the same size so any binning/graphing will heavily weight the results towards whatever GC content is contained in average of the smaller regions.

ADD REPLY • link 6.0 years ago by jnowacki ▴ 110