Hi All,
I am analyzing CNV data from PennCNV and I want to know whether it is important to filter out CNVs that overlap segmental duplication regions.
I would appreciate your comments,
Are you using Affymetrix or Illumina array data? Since so many CNVs are ostensibly caused by segmental duplication, your question is valid. In practice, we have found that the most important steps in filtering your output are: 1) merging long CNV calls interrupted by a minority of markers, 2) dropping telomeric regions, 3) dropping centromeric regions, 4) filtering out samples with excessively high LRR standard deviations, 5) filtering out samples with an excessive number of CNV calls, and 6) removing CNVs shorter than some threshold number of probes or length (e.g. 10 probes or 10 kb, depending on your chip). Segmental duplications have not proved much of an obstacle, but if you think they could be affecting your data, take your BED file of called CNVs and see how much it overlaps the segmental duplications in the UCSC Genome Browser. My coordinates for the hg18 centromeric and telomeric regions are below. Sorry for the length.
Centromeres:
chr1:121100001-128000000
chr2:91000001-95700000
chr3:89400001-93200000
chr4:48700001-52400000
chr5:45800001-50500000
chr6:58400001-63400000
chr7:57400001-61100000
chr8:43200001-48100000
chr9:46700001-60300000
chr10:38800001-42100000
chr11:51400001-56400000
chr12:33200001-36500000
chr13:13500001-18400000
chr14:13600001-19100000
chr15:14100001-18400000
chr16:34400001-40700000
chr17:22100001-23200000
chr18:15400001-17300000
chr19:26700001-30200000
chr20:25700001-28400000
chr21:10000001-13200000
chr22:9600001-16300000
chrX:56600001-65000000
chrY:11200001-12500000
Telomeres:
chr1:1-500000
chr1:246749719-247249719
chr2:1-500000
chr2:242451149-242951149
chr3:1-500000
chr3:199001827-199501827
chr4:1-500000
chr4:190773063-191273063
chr5:1-500000
chr5:180357866-180857866
chr6:1-500000
chr6:170399992-170899992
chr7:1-500000
chr7:158321424-158821424
chr8:145774826-146274826
chr8:1-500000
chr9:139773252-140273252
chr9:1-500000
chr10:134874737-135374737
chr10:1-500000
chr11:133952384-134452384
chr11:1-500000
chr12:131849534-132349534
chr12:1-500000
chr13:113642980-114142980
chr13:1-500000
chr14:105868585-106368585
chr14:1-500000
chr15:1-500000
chr15:99838915-100338915
chr16:1-500000
chr16:88327254-88827254
chr17:1-500000
chr17:78274742-78774742
chr18:1-500000
chr18:75617153-76117153
chr19:1-500000
chr19:63311651-63811651
chr20:1-500000
chr20:61935964-62435964
chr21:1-500000
chr21:46444323-46944323
chr22:1-500000
chr22:49191432-49691432
chrX:1-500000
chrX:154413754-154913754
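For anyone who wants to apply region lists like the ones above, here is a minimal Python sketch of the overlap check. It is only an illustration, not PennCNV functionality: the function names, the tuple layout of a call, and the 0.5 cutoff are all assumptions made for the example, and coordinates are treated as 1-based inclusive.

```python
import re

def parse_region(text):
    """Parse a string like 'chr1:121100001-128000000' into (chrom, start, end)."""
    m = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", text.strip())
    if m is None:
        raise ValueError(f"not a region string: {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))

def overlap_fraction(call, regions):
    """Fraction of a CNV call (chrom, start, end) covered by the regions.

    Assumes 1-based inclusive coordinates and that regions on the same
    chromosome do not overlap each other.
    """
    chrom, start, end = call
    covered = 0
    for r_chrom, r_start, r_end in regions:
        if r_chrom != chrom:
            continue
        lo, hi = max(start, r_start), min(end, r_end)
        if lo <= hi:
            covered += hi - lo + 1
    return covered / (end - start + 1)

def filter_calls(calls, regions, max_fraction=0.5):
    """Keep calls whose covered fraction is at most max_fraction.

    The 0.5 default is an arbitrary illustrative cutoff, not a recommendation.
    """
    return [c for c in calls if overlap_fraction(c, regions) <= max_fraction]

# Two of the hg18 regions listed above, and two made-up calls.
regions = [parse_region("chr1:121100001-128000000"), parse_region("chr1:1-500000")]
calls = [("chr1", 121000001, 122000000), ("chr1", 5000000, 5100000)]
kept = filter_calls(calls, regions)
```

Using a fraction rather than any-overlap means a large call that merely spans a centromere is kept, while a call that sits mostly inside one is dropped.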
Just grab the coordinates yourself from the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables?command=start (under Mapping and Sequencing Tracks :: Gap).
What are you asking here? Why some of your CNV calls overlap a centromere? That could be bad or okay, depending on how the rest of the call looks. (If it's a whole-chromosome or otherwise large amplification, then it's probably fine. If the call is mostly in the centromere, it's probably crap.) In any case, please start a new question if you need follow-up help.
Hi Chris,
Thanks for the helpful answer. My question was about the UCSC table :: Gap. I grabbed the centromere coordinates from the UCSC table, but I don't know why it gives me only half of each centromeric region (as an example, please try this coordinate in UCSC: chr1:121535434-124535434). Is there an option I missed on the UCSC table page that would give me the whole centromeric regions? Thank you.
Hi Ryan, I would like to know how you merge CNVs. I have only gotten errors from PennCNV...
Thanks
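The merging step mentioned in the answer (joining long calls that are interrupted by a minority of markers) could be sketched roughly as below. This is a hedged illustration rather than PennCNV's own procedure: `merge_calls`, the tuple layout, and the 0.2 gap fraction are assumptions for the example, and the gap is measured in base pairs instead of marker counts because no probe positions are available here.

```python
def merge_calls(calls, max_gap_fraction=0.2):
    """Merge neighbouring calls from one sample.

    calls: list of (chrom, start, end, copy_number), 1-based inclusive.
    Two neighbouring calls are joined when they are on the same chromosome,
    have the same copy number, and the gap between them is at most
    max_gap_fraction of the combined span. Only calls that are adjacent
    after sorting are compared.
    """
    merged = []
    for chrom, start, end, cn in sorted(calls):
        if merged:
            p_chrom, p_start, p_end, p_cn = merged[-1]
            new_end = max(p_end, end)
            gap = max(0, start - p_end - 1)   # bases between the two calls
            span = new_end - p_start + 1      # length if the calls were joined
            if chrom == p_chrom and cn == p_cn and gap / span <= max_gap_fraction:
                merged[-1] = (p_chrom, p_start, new_end, p_cn)
                continue
        merged.append((chrom, start, end, cn))
    return merged

# Made-up calls: the first two are separated by a short gap and get joined.
example = [
    ("chr1", 1000, 2000, 1),
    ("chr1", 2101, 3000, 1),
    ("chr1", 9000, 9500, 1),
    ("chr2", 100, 200, 3),
]
result = merge_calls(example)
```

With real data you would run this per sample and, ideally, replace the base-pair gap with a count of markers falling in the gap.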
The answer very much depends on what you're trying to do with the data. Could you give us some more details on what you're looking for? Do you have an individual and you're looking for rare, disease-causing CNVs, or are you looking at a population and trying to identify shared and common CNVs? If it's a tumor, do you have a matched normal sample? Etc.
I agree with Chris here: the question rests on what is "important" for you. If you're trying to find rare CNVs, you'll probably want to filter. If you're trying to get a general census, you probably want to keep them...