Comparing WGS, WXS, and SNP Array data
8.2 years ago
novice ★ 1.1k

Hi

I have samples that were processed in three ways: whole genome sequencing, whole exome sequencing, and Infinium SNP array. I'm looking for suggestions on how to compare these data to see how much variance exists simply due to the different technologies. Specifically, I'm interested in copy number analysis. My initial thought is to obtain the log ratios for each and then look at the correlation in log ratio between methods. I can get the log ratio for the SNP array data, but I don't know how to do it for WGS or WES. Has anyone done something similar? I can't seem to find any recent work on this kind of comparison, so I would appreciate any pointers.

SNP wgs • 6.0k views

To look for differences, I'd compare SNPs, indels, etc. for base differences, position differences, and even call quality. For CNVs, though, I am not sure the SNP array will cooperate, unless your SNP array results are different from what I have seen. In general, don't you usually get a genotype call per locus for each sample with a SNP array? That said, I have seen people run PCR/qPCR with fluorescence-labeled SNP tags to get an idea of copy number. Maybe you have this kind of data.

8.2 years ago
charco ▴ 50

The resolution of SNP arrays, WGS, and WXS is quite different. Generally, WGS and WXS will be able to call more focal copy number changes. It is important to take this into account in your comparison.

There are various software packages for calling copy numbers from sequencing, far too many to list here. I provide some examples of packages I have used.

These work on tumour samples: OncoSNP (https://sites.google.com/site/oncosnp/) and OncoSNP-SEQ (https://sites.google.com/site/oncosnpseq/).

For WXS and WGS, log ratios could be obtained using CopywriteR: https://www.bioconductor.org/packages/devel/bioc/html/CopywriteR.html Integer copy numbers could come from facets: https://github.com/mskcc/facets
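Once you have log ratios from each platform, one rough way to deal with the resolution difference is to average them into common fixed-width bins and correlate only the bins that both platforms cover. The following is a minimal sketch under assumptions of my own, not part of any of the packages above: the function names, the `(chrom, position, log_ratio)` input layout, and the 1 Mb default bin size are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def bin_log_ratios(probes, bin_size=1_000_000):
    """Average log ratios into fixed-width genomic bins.

    probes: iterable of (chrom, position, log_ratio) tuples (assumed layout).
    Returns a dict mapping (chrom, bin_index) to the mean log ratio.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos, log_ratio in probes:
        key = (chrom, pos // bin_size)
        sums[key] += log_ratio
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def pearson(x, y):
    """Plain Pearson correlation; returns NaN if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def compare_platforms(probes_a, probes_b, bin_size=1_000_000):
    """Correlate two platforms over the bins they share.

    Returns (correlation, number_of_shared_bins).
    """
    binned_a = bin_log_ratios(probes_a, bin_size)
    binned_b = bin_log_ratios(probes_b, bin_size)
    shared = sorted(set(binned_a) & set(binned_b))
    if len(shared) < 2:
        return float("nan"), len(shared)
    r = pearson([binned_a[k] for k in shared], [binned_b[k] for k in shared])
    return r, len(shared)
```

Large bins hide focal events that only the sequencing data can resolve, so it is probably worth repeating the comparison at several bin sizes and reporting how many bins were shared each time.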


I've been working with CopywriteR and it does exactly what I was looking for; thanks. However, it is extremely slow on WGS data. Do you know of a more efficient (probably more parallelizable) tool?


I couldn't quite tell from your comment: are you using the parallel functionality of CopywriteR?


Yes. The problem is that CopywriteR is only parallel in the sense that it can work with multiple samples at the same time.

8.2 years ago
rkostadi ▴ 60

The key is to get the breakpoints right. Check whether the three segmented profiles (WGS, WXS, array) give the same or different breakpoints for copy number events. Segmentation is an art. All three platforms will also give you allelic imbalance information, so use it. Evaluate the number of events called by one, two, or all three platforms, the concordance in breakpoint positions, etc. Segmentation methods like to smooth profiles, whereas the genome is not smooth at breakpoints: each one is a discrete "cut". Signal intensity (log R ratio) and read depth will vary: WXS will be wild due to GC bias and capture, WGS will have low read depth, and the array will probably not have a good dynamic range.
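As an illustration of the breakpoint concordance check, here is a minimal sketch that pairs up breakpoints from two platforms when they fall on the same chromosome within a tolerance window. Everything in it is an assumption for the example: the function names, the `(chrom, position)` input layout, the greedy one-to-one matching, and the 50 kb default tolerance, which you would want to tune to the resolution of the coarsest platform.

```python
from collections import defaultdict


def group_by_chrom(breakpoints):
    """Group (chrom, position) breakpoints into sorted position lists per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)
    return {chrom: sorted(positions) for chrom, positions in by_chrom.items()}


def count_matches(breaks_a, breaks_b, tolerance=50_000):
    """Count one-to-one breakpoint pairs lying within `tolerance` bp of each other.

    Uses a greedy two-pointer walk over the sorted positions of each chromosome.
    """
    grouped_a = group_by_chrom(breaks_a)
    grouped_b = group_by_chrom(breaks_b)
    matched = 0
    for chrom in grouped_a.keys() & grouped_b.keys():
        pos_a, pos_b = grouped_a[chrom], grouped_b[chrom]
        i = j = 0
        while i < len(pos_a) and j < len(pos_b):
            if abs(pos_a[i] - pos_b[j]) <= tolerance:
                matched += 1
                i += 1
                j += 1
            elif pos_a[i] < pos_b[j]:
                i += 1
            else:
                j += 1
    return matched


def breakpoint_concordance(breaks_a, breaks_b, tolerance=50_000):
    """Matched pairs divided by the number of distinct breakpoints across both platforms."""
    matched = count_matches(breaks_a, breaks_b, tolerance)
    union = len(breaks_a) + len(breaks_b) - matched
    return matched / union if union else 1.0
```

Running this for each pair of platforms (WGS vs WXS, WGS vs array, WXS vs array) gives a simple table of how many breakpoints are shared, which can then be compared against the number of events each platform calls on its own.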

Good luck.
