Question

Comparing WGS, WXS, and SNP Array data

0

8.6 years ago

novice ★ 1.1k

Hi

I have samples that were processed in three ways: whole genome sequencing (WGS), whole exome sequencing (WXS), and Infinium SNP array. I'm looking for suggestions on how to compare these data to see how much variance arises simply from using different technologies. Specifically, I'm interested in copy number analysis. My initial thought is to obtain the log ratios for each platform and then look at the correlation in log ratio between methods. I can get the log ratio for the SNP array data, but I don't know how to do it for WGS or WXS. Has anyone done something similar? I also can't seem to find any recent work that has done this, so I would appreciate any pointers.

SNP wgs • 6.9k views

ADD COMMENT • link updated 8.6 years ago by charco ▴ 50 • written 8.6 years ago by novice ★ 1.1k

1

To look for differences, I'd compare SNPs, indels, etc. for base differences, position differences, and even call quality. But for CNVs, I'm not sure the SNP array will cooperate, unless your SNP array results are different from what I have seen. In general, don't you usually get a genotype call per locus for each sample with a SNP array? That said, I have seen people run PCR/qPCR with fluorescence-labeled SNP tags to get an idea of copy number. Maybe you have this kind of data.

ADD REPLY • link 8.6 years ago by berge2015 ▴ 110

1

8.6 years ago

rkostadi ▴ 60

The key is to get the break points right. See whether the three segmented profiles (WGS, WXS, array) give the same or different break points for CN events. Segmentation is an art. Also, all three platforms will give you allelic imbalance information; use it. Evaluate the number of events called by 1, 2, or 3 platforms, the concordance in break point positions, etc. Segmentation methods like to smooth profiles, whereas the genome is not smooth at break points: it is a discrete "cut". Signal intensity (log R ratio) and read depth will vary: WXS will be wild due to GC bias and capture, WGS will have low read depth, and the array will probably not have a good dynamic range.
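The break point comparison could be sketched roughly as follows. This is only an illustration: the function names, the tolerance, and the example positions are made up, and it assumes the break points for one chromosome have already been extracted from each platform's segmentation.

```python
from bisect import bisect_left


def matched_breakpoints(reference, query, tolerance):
    """Count break points in `reference` that have a `query` break point
    within `tolerance` base pairs (single chromosome assumed)."""
    query_sorted = sorted(query)
    matched = 0
    for pos in reference:
        i = bisect_left(query_sorted, pos)
        # Only the nearest query break point on each side can be a match.
        neighbours = query_sorted[max(i - 1, 0):i + 1]
        if any(abs(pos - q) <= tolerance for q in neighbours):
            matched += 1
    return matched


def platform_support(calls, tolerance):
    """For each (platform, break point), count how many platforms
    (including its own) have a break point within `tolerance`."""
    support = {}
    for name, positions in calls.items():
        for pos in positions:
            support[(name, pos)] = sum(
                matched_breakpoints([pos], other, tolerance) > 0
                for other in calls.values()
            )
    return support


# Hypothetical break points (bp) on one chromosome.
calls = {
    "wgs": [1_000_000, 5_200_000, 9_000_000],
    "wxs": [1_004_000, 5_190_000],
    "array": [1_050_000, 9_010_000],
}
support = platform_support(calls, tolerance=20_000)
```

The tolerance matters a lot here: WXS and array break points can only fall where there are targets or probes, so a tolerance tuned for WGS will undercount matches on the other two platforms.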

Good luck.

ADD COMMENT • link 8.6 years ago by rkostadi ▴ 60

score 1 · Accepted Answer · 2016-10-03

1

8.6 years ago

charco ▴ 50

The resolution of SNP arrays, WGS, and WXS is quite different. Generally, WGS and WXS will be able to call more focal copy number changes. It is important to take this into account in your comparison.
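One way to take the resolution difference into account is to average every platform's log ratios into the same fixed-width bins before comparing them. A minimal sketch, assuming each platform's data can be reduced to (chromosome, position, log ratio) tuples; the function name, the bin size, and the example values are hypothetical:

```python
from collections import defaultdict


def bin_log_ratios(probes, bin_size):
    """Average per-probe (or per-window) log ratios into fixed-width bins.

    probes: iterable of (chromosome, position, log_ratio) tuples.
    Returns {(chromosome, bin_index): mean log ratio}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos, log_ratio in probes:
        key = (chrom, pos // bin_size)
        sums[key] += log_ratio
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up array probes: two fall in the first 1 kb bin, one in the second.
array_probes = [("chr1", 100, 0.2), ("chr1", 900, 0.4), ("chr1", 1500, -1.0)]
array_bins = bin_log_ratios(array_probes, bin_size=1000)
```

Bins that contain no probes or targets on a platform simply do not appear in its result, so the later comparison can be restricted to bins shared by all three. A median instead of a mean would be more robust to outlier probes.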

There are various software packages for calling copy numbers from sequencing, far too many to list here. I provide some examples of packages I have used.

These work on tumour samples: https://sites.google.com/site/oncosnp/ https://sites.google.com/site/oncosnpseq/

For WXS and WGS, log ratios could be obtained using CopywriteR: https://www.bioconductor.org/packages/devel/bioc/html/CopywriteR.html

Integer copy numbers could come from facets: https://github.com/mskcc/facets
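For intuition, a depth-based log ratio and the cross-platform correlation from the question could be sketched like this. It is not how CopywriteR or facets compute their values (they apply their own normalisation and corrections); it assumes per-bin read counts for a tumour and a matched normal are already available, and all names are hypothetical.

```python
import math


def depth_log_ratios(tumour_counts, normal_counts, pseudocount=0.5):
    """log2(tumour / normal) per bin, after scaling each sample by its
    total read count. Inputs map bin keys to read counts."""
    tumour_total = sum(tumour_counts.values())
    normal_total = sum(normal_counts.values())
    ratios = {}
    for key in tumour_counts.keys() & normal_counts.keys():
        # The pseudocount avoids log(0) and division by zero in empty bins.
        t = (tumour_counts[key] + pseudocount) / tumour_total
        n = (normal_counts[key] + pseudocount) / normal_total
        ratios[key] = math.log2(t / n)
    return ratios


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def correlate_shared_bins(profile_a, profile_b):
    """Correlate two {bin: log ratio} profiles over the bins they share."""
    shared = sorted(profile_a.keys() & profile_b.keys())
    return pearson([profile_a[k] for k in shared],
                   [profile_b[k] for k in shared])
```

A raw depth ratio like this still carries GC and capture bias, which is why a dedicated tool is the better source of log ratios; the correlation step applies either way once the profiles share the same bins.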

ADD COMMENT • link 8.6 years ago by charco ▴ 50

0

I've been working with CopywriteR and it does exactly what I was looking for; thanks. However, it is extremely slow on WGS data. Do you know of a more efficient (probably by being more parallelizable) tool?

ADD REPLY • link 8.6 years ago by novice ★ 1.1k

0

I couldn't quite tell from your comment: are you using the parallel functionality of CopywriteR?

ADD REPLY • link 8.5 years ago by charco ▴ 50

0

Yes. The problem is that CopywriteR is only parallel in the sense that it can work with multiple samples at the same time.

ADD REPLY • link 8.5 years ago by novice ★ 1.1k