7.0 years ago
nkausthu
Hi,
I have an XHMM-generated VCF file containing the CNVs called from the exomes of 200 individuals. It would be great if someone could suggest how to annotate this VCF. (I have annotated it using ANNOVAR, but I don't know whether this is the best way to annotate CNVs.) Is there any method for incorporating information from DGV, i.e. whether a called CNV is already known or not?
Hi, which kind of annotation are you looking for?
Basically, I would like to know which gene or genes are involved in the deletions or duplications. Along with that, I need to know whether each CNV is rare or common.
To check whether a CNV is common, at our lab we usually use http://dgv.tcag.ca/dgv/app/downloads?ref=GRCh37/hg19, but it's manual checking. For the annotation, ANNOVAR is not bad; maybe you could also look at OMIM genes to classify the CNVs by associated pathology.
Right now I am also checking manually, after downloading the complete set of CNVs from DGV, but I wanted to figure out whether this could be automated. Looking at OMIM genes is one thing I thought of doing, to identify CNVs in known disease-causing genes. A CNV in a new gene that is not yet associated with a particular condition is still difficult, though. For that purpose I want to use information from control populations as well as our in-house frequency, but so far I have not found any way to do this.
I understand your bioinformatics problem. You can have a look at bedtools (in particular http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html), which is a good tool for comparing genomic coordinates. To use it you will need to convert your data into BED files. If you don't know a programming language you could use Excel (even if, in my view, it's not recommended in bioinformatics) to create your BED files and compare DGV with your XHMM results :)
Hope this helps
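To make the coordinate comparison suggested above a bit more concrete, here is a minimal pure-Python sketch of the kind of check that `bedtools intersect` with its `-f` and `-r` options (minimum and reciprocal overlap fraction) performs. The function names, the 50% threshold, the tuple layout and the toy coordinates are all assumptions for illustration, not something taken from XHMM or DGV; real data would be read from your BED files.

```python
from collections import defaultdict


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def annotate_known(calls, known, min_frac=0.5):
    """Label each call 'known' or 'novel' against a reference CNV set.

    calls, known: lists of (chrom, start, end, cnv_type) tuples.
    min_frac: assumed reciprocal-overlap threshold (hypothetical default).
    """
    # Index reference CNVs by chromosome and type so that deletions are
    # only compared with deletions, duplications with duplications.
    by_key = defaultdict(list)
    for chrom, start, end, cnv_type in known:
        by_key[(chrom, cnv_type)].append((start, end))

    labelled = []
    for chrom, start, end, cnv_type in calls:
        hit = any(
            reciprocal_overlap(start, end, s, e) >= min_frac
            for s, e in by_key[(chrom, cnv_type)]
        )
        labelled.append((chrom, start, end, cnv_type, "known" if hit else "novel"))
    return labelled


# Toy data only; coordinates are made up for the example.
calls = [("1", 1000, 2000, "DEL"), ("1", 5000, 6000, "DUP"), ("2", 100, 400, "DEL")]
known = [("1", 900, 2100, "DEL"), ("1", 5900, 9000, "DUP")]
labels = annotate_known(calls, known)
for row in labels:
    print(*row, sep="\t")
```

This linear scan is fine for a few thousand calls; for the full DGV set, bedtools itself (or a sorted/interval-tree approach) will be much faster.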
Great!! Thank you so much for reminding me about bedtools. I think this will work; let me try. One more thing I want to ask about is the allele count in the XHMM VCF file. Sometimes the VCF shows AC=3 for a particular CNV, but when I go back and check the .xcnv file, only one individual has this CNV. Do you know what this means? I thought XHMM gives an allele count equal to the number of individuals.
Hi, that's strange, I have never seen that. Could it be that XHMM found the CNV duplicated twice in your sample, meaning that region has three times more coverage than the rest of your sample?
OK. If that is the case, I can't filter CNVs based on allele count. I generally keep AC=1,0 or 0,1, assuming that only one patient has the corresponding CNV. Otherwise it's very difficult to analyse the calls and narrow them down to a pathogenic CNV.
I don't know, this is only a hypothesis. Did you check the XHMM documentation for the meaning of AC?
From the XHMM Google group, my understanding is that AC corresponds to the number of individuals.
Can you link me to the information, please?
XHMM
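Regarding the allele count question above, one way to cross-check AC is to count the distinct carriers per CNV directly from the .xcnv file and derive an in-house frequency from that. The sketch below assumes a tab-delimited .xcnv with a header containing `SAMPLE`, `CNV` and `INTERVAL` columns, and it groups calls by exact interval, which is a simplification (breakpoints often differ slightly between samples). The function name and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def carrier_counts(xcnv_handle):
    """Count distinct samples carrying each (interval, CNV type) pair.

    Assumes a tab-delimited file with SAMPLE, CNV and INTERVAL columns.
    """
    carriers = defaultdict(set)
    for row in csv.DictReader(xcnv_handle, delimiter="\t"):
        carriers[(row["INTERVAL"], row["CNV"])].add(row["SAMPLE"])
    return {key: len(samples) for key, samples in carriers.items()}


# Toy input standing in for a real .xcnv file (made-up values).
toy = (
    "SAMPLE\tCNV\tINTERVAL\n"
    "S1\tDEL\t1:1000-2000\n"
    "S2\tDEL\t1:1000-2000\n"
    "S1\tDUP\t2:500-900\n"
)
counts = carrier_counts(io.StringIO(toy))

n_samples = 200  # cohort size mentioned in the question
freqs = {key: n / n_samples for key, n in counts.items()}
for (interval, cnv_type), n in counts.items():
    print(interval, cnv_type, n, freqs[(interval, cnv_type)], sep="\t")
```

With a real file you would pass `open("your_calls.xcnv")` instead of the `io.StringIO` toy; CNVs carried by a single sample can then be selected from `counts` without relying on the AC field.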
Yes, CNV annotation could be automated (OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also your own in-house information)!
You can look at this post describing the AnnotSV tool: Annotation for SV and CNV