Calculating variant frequencies for copy number events in a cohort
8 months ago
barslmn ★ 2.3k

Hi,

I have CNV calls of a cohort of patients in both individual VCF and GFF files.

I want to get to the rare ones, so I want to calculate frequencies for the variants within the cohort.

I found this plink approach, but I am wondering whether there is any other method or tool that can calculate frequencies directly from the VCF or GFF files.

GFF example

##gff-version 3
chr1    DRAGEN  CNV     925732  945653  3       .       .       Alt=DUP;LinearCopyRatio=1.23097;CopyNumber=3;Genotype=./1;Qual=3;Filter=cnvQual;Start=925731;Stop=945653;Length=19922;BinCount=30;ImproperPairsCount=8,7;color=#DDDDDD;
chr1    DRAGEN  CNV     1468885 1478297 11      .       .       Alt=DEL;LinearCopyRatio=0.677751;CopyNumber=1;Genotype=0/1;Qual=11;Filter=cnvQual;Start=1468884;Stop=1478297;Length=9413;BinCount=4;ImproperPairsCount=1,10;color=#DDDDDD;
chr1    DRAGEN  CNV     2073665 2073797 52      .       .       Alt=DEL;LinearCopyRatio=8.51417e-10;CopyNumber=0;Genotype=1/1;Qual=52;Filter=PASS;Start=2073664;Stop=2073797;Length=133;BinCount=1;ImproperPairsCount=0,0;color=#0000FF;
chr1    DRAGEN  CNV     2321254 2336850 51      .       .       Alt=DUP;LinearCopyRatio=1.43867;CopyNumber=3;Genotype=./1;Qual=51;Filter=PASS;Start=2321253;Stop=2336850;Length=15597;BinCount=8;ImproperPairsCount=4,4;color=#FF0000;
chr1    DRAGEN  CNV     5952701 5986327 4       .       .       Alt=DUP;LinearCopyRatio=1.23552;CopyNumber=3;Genotype=./1;Qual=4;Filter=cnvQual;Start=5952700;Stop=5986327;Length=33627;BinCount=12;ImproperPairsCount=32,8;color=#DDDDDD;

VCF example

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ID_001
chr1    925731  DRAGEN:GAIN:chr1:925732-945653  N       <DUP>   3       cnvQual SVLEN=19922;SVTYPE=CNV;END=945653;REFLEN=19922  GT:SM:CN:BC:PE  ./1:1.23097:3:30:8,7
chr1    1468884 DRAGEN:LOSS:chr1:1468885-1478297        N       <DEL>   11      cnvQual SVLEN=-9413;SVTYPE=CNV;END=1478297;REFLEN=9413  GT:SM:CN:BC:PE  0/1:0.677751:1:4:1,10
chr1    2073664 DRAGEN:LOSS:chr1:2073665-2073797        N       <DEL>   52      PASS    SVLEN=-133;SVTYPE=CNV;END=2073797;REFLEN=133    GT:SM:CN:BC:PE  1/1:8.51417e-10:0:1:0,0
chr1    2321253 DRAGEN:GAIN:chr1:2321254-2336850        N       <DUP>   51      PASS    SVLEN=15597;SVTYPE=CNV;END=2336850;REFLEN=15597 GT:SM:CN:BC:PE  ./1:1.43867:3:8:4,4
cnv vcf variant frequency • 510 views
8 months ago
LChart 4.6k

If you use the multisample CNV caller from DRAGEN, you'll be able to calculate AC/AF/AN directly in the VCF, as copy numbers will be jointly segmented across samples. As it is, you should find that only a small fraction of the variants match exactly, as the breakpoints will be ragged. As such, you're probably best off with gff -> bed -> bedtools multiinter (or other interval arithmetic you choose) to establish "consensus" copy number events and a list of their carriers. Also, dup/del need to be processed separately, with deletions retaining their genotypes (since one can be missing 1 or 2 copies).
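To make the interval-arithmetic route concrete, here is a minimal Python sketch. It is not bedtools multiinter: it swaps in a simple merge of overlapping calls (per chromosome and per event type) to build "consensus" events, then counts carriers. The function names, the `calls_by_sample` layout, and the choice to treat any overlap as the same event are all assumptions for illustration; the attribute keys (`Alt`, `Genotype`) are taken from the GFF example in the question.

```python
from collections import defaultdict


def parse_gff_line(line):
    """Return (chrom, start, end, alt, genotype) from one GFF3 CNV line.

    Assumes the attribute column carries Alt= and Genotype= keys,
    as in the GFF example in the question.
    """
    fields = line.split()
    chrom, start, end = fields[0], int(fields[3]), int(fields[4])
    attrs = dict(kv.split("=", 1) for kv in fields[8].strip(";").split(";"))
    return chrom, start, end, attrs["Alt"], attrs["Genotype"]


def consensus_events(calls_by_sample):
    """Merge overlapping calls of the same type into consensus events.

    calls_by_sample: {sample_id: [(chrom, start, end, alt, genotype), ...]}
    DUP and DEL are grouped separately, so they never merge with each other.
    Assumption: any overlap at all means "same event".
    """
    grouped = defaultdict(list)
    for sample, calls in calls_by_sample.items():
        for chrom, start, end, alt, gt in calls:
            grouped[(chrom, alt)].append((start, end, sample, gt))

    events = []
    for (chrom, alt), calls in sorted(grouped.items()):
        current = None
        for start, end, sample, gt in sorted(calls):
            # Start a new event when this call does not overlap the current one.
            if current is None or start > current["end"]:
                current = {"chrom": chrom, "alt": alt,
                           "start": start, "end": end, "carriers": {}}
                events.append(current)
            current["end"] = max(current["end"], end)
            # If a sample has several calls in one event, the last genotype wins.
            current["carriers"][sample] = gt
    return events


def carrier_frequency(event, n_samples):
    """Fraction of the cohort carrying the event (use for DUP and DEL)."""
    return len(event["carriers"]) / n_samples


def deletion_allele_frequency(event, n_samples):
    """Allele frequency for a deletion, counting 0/1 as one and 1/1 as two
    deleted copies. Assumes a diploid region."""
    alt_alleles = sum(gt.count("1") for gt in event["carriers"].values())
    return alt_alleles / (2 * n_samples)
```

A call would look like `consensus_events({"ID_001": [parse_gff_line(l) for l in lines], ...})`, with `n_samples` set to the full cohort size, including samples that have no calls. Merging on any overlap can chain neighbouring calls into one large region, so a reciprocal-overlap threshold may be a better fit for ragged breakpoints.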

Good luck!


Thanks for your directions. One thing I noticed when I looked back at the documentation is that the joint CNV calling examples are given for trios. That's why I didn't try joint calling. I will give it a shot and see if it works without the pedigree file.

