Question

How to annotate CNV events with gene information?

8.0 years ago

bioinforesearchquestions

Hello friends,

I have CNV calls from four different CNV callers, and I would like to annotate each CNV call with gene information.

What tools are commonly used to annotate CNVs?

If I use bedtools intersect to annotate the CNV calls, how much overlap between CNV calls and gene coordinates should I require?

I have 300 samples, so I am looking for command-line options.

SNP RNA DNA-seq CNV events annovar

updated 8.0 years ago by Amitm • written 8.0 years ago by bioinforesearchquestions

CNV annotation can easily be automated, using OMIM, DGV, 1000 Genomes, haploinsufficiency, TAD, ... and also your own in-house data.

See this post describing the AnnotSV tool: Annotation for SV and CNV

6.4 years ago by LGMgeo

Answer 1 · 2016-11-17

8.0 years ago

Amitm

hi,

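In intersectBed, a value of 0.5 for -f seems reasonable. -f is the minimum overlap as a fraction of each -a feature, which here is each CNV. Once you are happy with the threshold, put the command in a shell script like this: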

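#!/bin/sh
# CNV_anno.sh: annotate one CNV BED file ($1) with the genes it overlaps.
# -f 0.5: at least 50% of each CNV (the -a feature) must overlap a gene.
set -eu
bedtools2-2.20.1/bin/intersectBed \
    -a "$1" \
    -b Homo_sapiens.GRCh37_BED_SORTD.txt \
    -wao \
    -f 0.5 \
    > "$1"_ANNO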

Save it in a file, say CNV_anno.sh, and then run something like this in the shell:

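for myCNVresults in cnv_result_*; do
    [ -e "$myCNVresults" ] || continue                 # no matching files
    case "$myCNVresults" in *_ANNO) continue ;; esac   # skip earlier outputs
    sh CNV_anno.sh "$myCNVresults"
done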

This assumes your result files start with the prefix cnv_result_; adjust the pattern to match your actual filenames and directory.

Each output file gets the suffix _ANNO, which you can also change.
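With 300 samples, the files can also be processed in parallel instead of one after another. A minimal sketch, assuming GNU find and xargs (for -maxdepth, -print0, -0, -r and -P), the CNV_anno.sh script above in the current directory, and the cnv_result_* naming; annotate_all is a hypothetical helper name and the default of 4 jobs is arbitrary:

```shell
#!/bin/sh
# Sketch: run CNV_anno.sh once per cnv_result_* file, several jobs at a time.
# Assumes GNU find/xargs; -print0/-0 keep filenames with spaces intact.
annotate_all() {
    njobs="${1:-4}"   # number of parallel jobs (arbitrary default)
    # Skip *_ANNO outputs left over from a previous run.
    find . -maxdepth 1 -type f -name 'cnv_result_*' ! -name '*_ANNO' -print0 \
        | xargs -0 -r -n 1 -P "$njobs" sh CNV_anno.sh
}

# Usage, from the directory holding the CNV files and CNV_anno.sh:
#   annotate_all 8
```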

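The -wao option makes intersectBed write each original A entry (CNV) and B entry (gene) together with the number of overlapping base pairs. CNVs without a qualifying overlap are still reported, with a null B feature and an overlap of 0.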
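If you want one line per CNV rather than one line per CNV-gene pair, the -wao output can be collapsed afterwards. A minimal sketch, assuming a 4-column CNV BED (chrom, start, end, status) passed to -a and a 4-column gene BED (chrom, start, end, gene name) passed to -b, so every -wao line has 9 tab-separated fields with the overlap last. collapse_genes is a hypothetical helper name, and a gene BED with several rows per gene (e.g. exons) would list that gene more than once:

```shell
#!/bin/sh
# Sketch: collapse intersectBed -wao output to one line per CNV, followed
# by a comma-separated list of overlapping genes ("." if none).
# Assumes 9 fields: CNV chrom/start/end/status, gene chrom/start/end/name,
# overlap in bp.
collapse_genes() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 OFS $2 OFS $3 OFS $4          # the CNV itself
        if (!(key in genes)) { order[++n] = key; genes[key] = "" }
        if ($NF > 0) {                          # overlap 0 means no gene hit
            genes[key] = genes[key] (genes[key] == "" ? "" : ",") $8
        }
    }
    END {
        for (i = 1; i <= n; i++) {              # keep input order of CNVs
            k = order[i]
            print k, (genes[k] == "" ? "." : genes[k])
        }
    }' "$1"
}

# Usage: collapse_genes cnv_result_1.bed_ANNO > cnv_result_1.genes.tsv
```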

8.0 years ago by Amitm

Thanks, Amit. I was away at a conference.

bedtools intersect -wa -wb -a Homo_sapiens.GRCh37_BED_SORTD.bed -b Sample1_cnv_file.bed -f 0.5 -r > GRCh37_Sample1_overlap.txt

I am now also annotating my CNV events with the DGV database using ANNOVAR.

First I tried this command:

annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype dgvMerged example/ex1.avinput humandb/

All 500 of my CNV events got annotated.

Do I need to increase the minimum overlap fraction?

Does this mean that all my CNV events are common in the population?

How do I check whether my CNVs are pathogenic?
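On the overlap-fraction question: with -f 0.5 -r, bedtools intersect requires the overlap to cover at least half of the gene and at least half of the CNV. A CNV much longer than the gene therefore fails. For example, a 10 kb gene lying entirely inside a 1 Mb deletion is 100% covered, but the overlap is only 1% of the CNV. One way to choose a threshold is to run bedtools intersect once without -f, using -wo so the overlap length is appended, and then compute both fractions. A minimal sketch: overlap_fractions is a hypothetical helper, the file names in the usage comment are placeholders, and it assumes BED intervals with end > start:

```shell
#!/bin/sh
# Sketch: append two columns to "bedtools intersect -wo" output:
# the fraction of the A feature covered and the fraction of the B
# feature covered by the overlap (last field, in bp).
# $1 = number of columns in the -a file, $2 = the -wo output file.
overlap_fractions() {
    awk -v na="$1" 'BEGIN { FS = OFS = "\t" }
    {
        ov = $NF
        fa = ov / ($3 - $2)                     # fraction of A covered
        fb = ov / ($(na + 3) - $(na + 2))       # fraction of B covered
        print $0, sprintf("%.3f", fa), sprintf("%.3f", fb)
    }' "$2"
}

# Usage (placeholder names), keeping pairs where at least half of the
# gene (-a) is covered, whatever the CNV size:
#   bedtools intersect -wo -a genes.bed -b sample.bed > sample.wo
#   overlap_fractions 4 sample.wo | awk -F'\t' '$(NF-1) >= 0.5'
```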

8.0 years ago by bioinforesearchquestions

I was able to follow this approach and get annotations for the CNVs. Essentially, I collected the genes overlapping each CNV and then assigned each gene a status (Amp/Del/Neutral) according to the CNV's status. However, this is a simple overlap approach. What is your opinion on using this Amp/Del status directly in visualization tools such as maftools? I know tools like GISTIC exist, but our data are non-human, so GISTIC and many other standard tools may not work.

6.3 years ago by sutturka
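On the per-gene Amp/Del status question: maftools' read.maf() documents a cnTable argument for custom copy-number data when GISTIC results are not available. According to the maftools documentation, it takes three columns (gene name, sample name, and copy-number status as Amp or Del), but check the version you have installed. Below is a minimal sketch that builds such a table from one sample's bedtools -wao output. It assumes 9 tab-separated fields per line (CNV chrom, start, end, status; gene chrom, start, end, name; overlap in bp). gene_status_table is a hypothetical helper, and the tie-break rule (among Amp/Del CNVs hitting a gene, keep the one with the largest overlap) is an assumption, not a standard:

```shell
#!/bin/sh
# Sketch: gene / sample / status table (e.g. for maftools cnTable).
# Assumes 9-field -wao lines: CNV chrom, start, end, status (Amp/Del/
# Neutral), gene chrom, start, end, gene name, overlap in bp.
# Hypothetical rule: among Amp/Del CNVs overlapping a gene, keep the
# status of the one with the largest overlap; Neutral and no-hit lines
# are dropped.
gene_status_table() {
    sample="$1"; shift
    awk -v sample="$sample" 'BEGIN { FS = OFS = "\t" }
    $NF > 0 && ($4 == "Amp" || $4 == "Del") {
        if (!($8 in best) || $NF + 0 > best[$8]) {
            best[$8] = $NF + 0                  # largest overlap so far
            status[$8] = $4
        }
    }
    END { for (g in status) print g, sample, status[g] }' "$@" \
        | LC_ALL=C sort
}

# Usage: run per sample and concatenate, e.g.
#   gene_status_table Sample1 Sample1.wao >> cn_table.tsv
```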