From Imputed SNPs to CNVs
12.1 years ago
romsen ▴ 70

Hello,

I'll try to describe it in more detail. :)

I have genome-wide data for 300,000 SNPs (Illumina HumanHap). And of course I have the intensity values and B allele frequencies: in principle, every kind of value one can export from BeadStudio.

For CNV detection I used the Log R ratio and B allele frequency with the software PennCNV. This was successful. But because of the low SNP density in some regions, which you already mentioned, breakpoints are over- or underestimated, or a CNV call is missing completely. For some specific regions I have positive-control CNVs that were genotyped via TaqMan/PCR, but I can't call them with PennCNV because there are no SNPs on the array there.
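To make the role of the two signals concrete, here is a minimal sketch of the idealized values one would expect for each copy number state. It is not PennCNV's model (PennCNV fits a hidden Markov model to the real, noisy signals), and the function names are made up for illustration. Real arrays show compressed LRR values, so treat the numbers as a textbook approximation only.

```python
import math

def expected_baf_clusters(copy_number):
    """Idealized BAF cluster positions for a given total copy number.

    With n copies, a SNP can carry 0..n copies of the B allele,
    so the clusters sit at 0/n, 1/n, ..., n/n.
    """
    if copy_number == 0:
        return []  # no DNA: BAF is undefined (pure noise on a real array)
    return [b / copy_number for b in range(copy_number + 1)]

def expected_lrr(copy_number, normal_copies=2):
    """Idealized log2 intensity ratio relative to the normal diploid state."""
    if copy_number == 0:
        return float("-inf")
    return math.log2(copy_number / normal_copies)

for cn in range(0, 5):
    clusters = [round(b, 2) for b in expected_baf_clusters(cn)]
    print(f"CN={cn}  LRR={expected_lrr(cn):.2f}  BAF clusters={clusters}")
```

Note that a one-copy deletion only shows BAF clusters at 0 and 1, which a genotype caller reports as AA or BB, exactly like an ordinary homozygous diploid SNP.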

In a side project I imputed the 300k SNP data with Impute2. For that I used the genotype calls (AA, AB and BB).

Now I am looking for a method that uses imputed SNP data (genotype calls AA, BB) to call CNVs, via linkage disequilibrium, tag SNP information or whatever :) I found this, but I'm not sure if it works with my data.


I think your question is not totally clear (and that might be the reason why you are getting little response). So you have data on 300,000 SNPs (GW = genome wide?). What kind of data is this? Genotype calls (AA, AB, BB), intensity or relative abundance such as probe intensities or read counts, B allele frequencies? Plink is used for genome-wide association studies (GWAS), which is totally different from CNV analysis. Furthermore, when you have data on 300,000 SNPs distributed across the whole genome, you are fine doing CNV analysis with PennCNV, QuantiSNP or any other CNV tool. With fewer probes your resolution just gets coarser: the shortest CNV segment you can detect gets longer. However, people have done CNV analyses with far fewer probes/SNPs than 300,000.
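As a rough back-of-the-envelope illustration of that resolution point, here is a small sketch. The genome length of about 3.1 Gb and the threshold of 10 consecutive probes per call are assumptions chosen for illustration, not values taken from PennCNV or from this thread, and real arrays are not evenly spaced (which is exactly why some regions have no coverage at all).

```python
GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome length
N_PROBES = 300_000              # array size mentioned in the question
MIN_PROBES_PER_CALL = 10        # hypothetical minimum number of probes per CNV call

def mean_spacing(genome_bp, n_probes):
    """Average distance between neighbouring probes, assuming even spacing."""
    return genome_bp / n_probes

def min_call_span(genome_bp, n_probes, min_probes):
    """Shortest region spanned by `min_probes` evenly spaced consecutive probes."""
    return (min_probes - 1) * mean_spacing(genome_bp, n_probes)

spacing = mean_spacing(GENOME_SIZE_BP, N_PROBES)
span = min_call_span(GENOME_SIZE_BP, N_PROBES, MIN_PROBES_PER_CALL)
print(f"mean probe spacing: {spacing:,.0f} bp")
print(f"shortest callable segment: about {span:,.0f} bp")
```

Under these assumptions the average spacing is roughly 10 kb, so anything much shorter than about 90 kb would be hard to call, and sparse regions do worse than the average.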


CNV analysis requires intensities. 300k SNPs is fine for CNV detection.


And how come you don't have the intensity data? Are you sure it's not in the public domain?


I edited and changed some things!


Did you manage to find a way to call CNVs from imputed SNP data?

12.1 years ago
Irsan ★ 7.8k

As far as I know, you cannot estimate copy number states from genotype calls. You can estimate loss of heterozygosity (LOH) from the B allele frequencies (the numbers that are used to make genotype calls).
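One simple way to look for candidate LOH regions in BAF data is to flag long runs of consecutive SNPs without any heterozygous signal. The sketch below is a naive illustration under stated assumptions, not an established LOH caller: the function name, the 0.2 to 0.8 heterozygous band and the minimum run length of 5 SNPs are all hypothetical choices that would need tuning on real data.

```python
def homozygous_runs(positions, bafs, het_low=0.2, het_high=0.8, min_snps=5):
    """Return (start_pos, end_pos, n_snps) for runs of SNPs with no heterozygous BAF.

    A SNP counts as heterozygous when het_low <= BAF <= het_high.
    positions and bafs must be sorted by position and have equal length.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, baf in enumerate(bafs):
        is_het = het_low <= baf <= het_high
        if not is_het and start is None:
            start = i
        elif is_het and start is not None:
            if i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    # close a run that extends to the end of the chromosome
    if start is not None and len(bafs) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(bafs) - start))
    return runs

# Toy data: one heterozygous SNP, six homozygous SNPs, then mixed signal
positions = list(range(100, 1300, 100))
bafs = [0.5, 0.02, 0.98, 0.01, 0.99, 0.0, 1.0, 0.48, 0.03, 0.52, 0.97, 0.01]
print(homozygous_runs(positions, bafs))
```

A run like this is only a candidate: a hemizygous deletion produces the same BAF pattern, so the LRR in the same region has to be checked as well (close to zero for copy-neutral LOH, clearly negative for a deletion).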

Since you have successfully done CNV analysis with PennCNV, I assume you have the Log R Ratios (LRR) and B Allele Frequencies (BAF) somewhere. For copy number variation analysis and loss of heterozygosity (I think detecting LOH is what you are looking for) I would do the following using R/Bioconductor:

  1. Put the LRR of all samples in one GRanges object where each column in the elementMetadata holds the LRR values of one sample. Make sure the rows in the GRanges object are sorted in natural chromosome order (chr1, chr2, chr3), not string-sorted (chr1, chr10, chr11).
  2. Segment the GRanges object with fastseg (a fast alternative to CBS; CBS itself is implemented in DNAcopy).
  3. Make a dot plot of the LRR for each chromosome and each sample, along with the segmented regions, using ggplot2 (a popular visualization package; the ggbio extension supports common Bioconductor data formats like GRanges). Check in the LRR dot plot whether the segmentation describes your data well. Segmentation algorithms such as CBS are known for producing too many breakpoints. When you see that happening, you might want to change the segmentation parameters until you feel the segmentation is optimized for your data.
  4. Also, for each chromosome and sample, visualize the B Allele Frequencies (for example with ggplot2) and look for regions where the B Allele Frequency profile changes. I don't know if/how you can do segmentation on LOH...
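To illustrate steps 1 to 3 without the Bioconductor stack, here is a small self-contained Python sketch of natural chromosome ordering and of a naive recursive mean-shift segmentation of LRR values. It is neither CBS nor the fastseg algorithm, just a toy binary segmentation to show what "segments" and "segmentation parameters" mean. All names and the defaults (`min_size=3`, `min_gain=0.2`) are made up for this example.

```python
import re
from statistics import mean

def chrom_key(name):
    """Sort key: numeric chromosomes in numeric order, then the rest alphabetically."""
    label = re.sub(r"^chr", "", name)
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)

def best_split(values, min_size):
    """Return (index, gain) of the split that most reduces the squared error."""
    n = len(values)
    total = sum(values)
    best_idx, best_gain = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        if i < min_size or n - i < min_size:
            continue  # both sides must contain at least min_size probes
        right_sum = total - left_sum
        # reduction in sum of squared errors when splitting at i
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain

def segment(values, min_size=3, min_gain=0.2, offset=0):
    """Recursively split values; return (start, end, mean) with end exclusive."""
    idx, gain = best_split(values, min_size)
    if idx is None or gain < min_gain:
        return [(offset, offset + len(values), mean(values))]
    return (segment(values[:idx], min_size, min_gain, offset)
            + segment(values[idx:], min_size, min_gain, offset + idx))

print(sorted(["chr10", "chr2", "chrX", "chr1"], key=chrom_key))

# Toy LRR profile with a four-probe deletion in the middle
lrr = [0.02, -0.05, 0.03, 0.0, -0.6, -0.55, -0.62, -0.58, 0.01, 0.04, -0.02, 0.03]
for start, end, seg_mean in segment(lrr):
    print(start, end, round(seg_mean, 3))
```

Lowering `min_gain` produces more breakpoints and raising it produces fewer, which is the same kind of trade-off you tune when a real segmentation tool over-segments your data.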
