Regions to exclude in CNV analysis
9.1 years ago
wanabi ▴ 60

Hello,

I have called CNVs from my WGS data and want to do some QC. For this, I want to exclude segments that overlap the regions listed below over more than 50% of their length, but I have some doubts. Can you please help me?

. Telomeres/centromeres
. Immunoglobulin regions
. Extreme GC content (>90%, <10%): Do these thresholds make any sense? My CNVs were called with the GC option in Control-FREEC, so I don't know if this is really necessary. Any thoughts?
. Mappability: Should I use the uniqueness or the alignability definition for this? If I choose uniqueness, should I just filter out all regions with uniqueness < 1? If I use alignability, which threshold would you use?
. RepeatMasker
. Common CNVs: I am currently using the DGV. Is this advisable?

. Any other list you would recommend I use to clean my data?


Thanks a lot

CNV filter exclude • 2.9k views

Can you please explain a little bit about Immunoglobulin regions? Why would that be a bias?

9.1 years ago
ebrown1955 ▴ 320

The first step is to get a list of the regions that you want to eliminate (in BED format) and use bedtools (for example, bedtools intersect) to compare your CNV list against those lists. You can then filter out the overlapping calls in Excel or R (using which, etc.).

There may be a more elegant way to do this, but this is what I do.
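
For anyone who prefers to script the filtering step, here is a minimal sketch of the 50% rule from the question in plain Python. It assumes half-open, 0-based BED-style coordinates; the function names (merge_intervals, covered_fraction, filter_cnvs) and the example coordinates are hypothetical, not taken from any tool. Exclusion regions are merged first so that overlapping entries from different lists are not counted twice.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(cnv, excluded_by_chrom):
    """Fraction of the CNV length covered by merged exclusion intervals."""
    chrom, start, end = cnv
    covered = 0
    for ex_start, ex_end in excluded_by_chrom.get(chrom, []):
        covered += max(0, min(end, ex_end) - max(start, ex_start))
    return covered / (end - start)


def filter_cnvs(cnvs, excluded, max_fraction=0.5):
    """Keep CNVs whose covered fraction does not exceed max_fraction."""
    by_chrom = defaultdict(list)
    for chrom, start, end in excluded:
        by_chrom[chrom].append((start, end))
    merged = {chrom: merge_intervals(iv) for chrom, iv in by_chrom.items()}
    return [cnv for cnv in cnvs if covered_fraction(cnv, merged) <= max_fraction]


# Made-up example coordinates, for illustration only.
cnvs = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 100)]
excluded = [("chr1", 90, 140), ("chr1", 130, 160), ("chr1", 390, 500)]
kept = filter_cnvs(cnvs, excluded)
print(kept)
```

In this toy input the first CNV is dropped because 60 of its 100 bases fall in the merged exclusion region, while the other two are kept.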

Good luck!

9.1 years ago
wanabi ▴ 60

Thanks for the info.

Yes, I am planning to exclude those regions with either PLINK or bedtools.

My question is more about the lists of regions to exclude and, especially, about the thresholds people use for regions with low mappability.

Thanks


The UCSC Genome Bioinformatics Site has two BED files (designed by other teams) that are useful for excluding low-mappability regions in the human genome:

ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/

9.1 years ago
wanabi ▴ 60

Thanks a lot for the info. Do you know where I can get a similar file for extreme GC content (>90%, <10%)? UCSC just provides 5 bp tracks, which are not very useful for filtering CNV data.

Thanks!
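
If no ready-made file turns up, one option is to compute GC content per window from the reference sequence yourself. Below is a minimal sketch, assuming the chromosome sequence is already loaded as a string; the 1 kb window size, the function names and the toy sequence are assumptions for illustration, and the 10%/90% cut-offs are simply the ones from the question. N bases are ignored when computing the fraction.

```python
def gc_fraction(seq):
    """GC fraction over unambiguous bases; None if the window is all N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def extreme_gc_windows(chrom, seq, window=1000, low=0.10, high=0.90):
    """Return BED-like (chrom, start, end, gc) tuples for extreme-GC windows."""
    hits = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc is not None and (gc < low or gc > high):
            hits.append((chrom, start, start + len(chunk), gc))
    return hits


# Toy sequence with four 10 bp windows: all G, mixed, all A/T, all N.
toy = "GGGGGGGGGG" + "ACGTACGTAC" + "ATATATATAT" + "NNNNNNNNNN"
print(extreme_gc_windows("chr1", toy, window=10))
```

The resulting tuples can be written out as a BED file and used like any other exclusion list. The window size matters: it should be chosen relative to the resolution of the CNV calls rather than taken from this sketch.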

9.1 years ago
ebrown1955 ▴ 320

As for DGV, these regions are okay; however, it really depends on what you are trying to do! There are some control populations available from dbGaP that you can download and use to determine whether a CNV is rare or not, as DGV can and does include regions that are not considered rare. It is also important to note that rare != pathogenic.
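
To make the "rare or not" check concrete, here is a rough sketch of estimating how often a CNV is seen in a control set. Everything specific in it is an assumption: the 50% reciprocal-overlap criterion for calling two CNVs "the same", the function names, and the made-up control calls. It also ignores CNV type (gain vs. loss), which a real comparison would probably want to match.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between regions a and b."""
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def carrier_frequency(cnv, control_cnvs, n_controls, min_reciprocal=0.5):
    """Fraction of control samples carrying a CNV that matches `cnv`."""
    carriers = {
        sample
        for sample, region in control_cnvs
        if reciprocal_overlap(cnv, region) >= min_reciprocal
    }
    return len(carriers) / n_controls


# Made-up control calls as (sample_id, (chrom, start, end)).
controls = [
    ("s1", ("chr1", 100, 200)),
    ("s1", ("chr1", 110, 210)),
    ("s2", ("chr1", 150, 400)),
    ("s3", ("chr2", 100, 200)),
]
freq = carrier_frequency(("chr1", 100, 200), controls, n_controls=4)
print(freq)
```

A CNV would then be treated as rare if this frequency falls below whatever cut-off suits the study; the cut-off is a design choice and is not something this sketch decides.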
