how to find ratio of cases vs control per variant from a vcf file
1
1
Entering edit mode
7.5 years ago

how to find ratio of cases vs control per variant from a vcf file

vcf file looks like -

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Co1 Co2 Ca1 Ca2
1   15455   .   A   C   2599    PASS    AC=1;AF=6.227e-04;AN=1606;BaseQRankSum=-6.731e+00;CCC=1606;ClippingRankSum=1.27;DP=31258;FS=1.355;GQ_MEAN=99.94;GQ_STDDEV=84.82;HWP=1.0000;InbreedingCoeff=-0.0006;MLEAC=1;MLEAF=6.227e-04;MQ=60.00;MQ0=0;MQRankSum=1.79;NCC=0;QD=15.47;ReadPosRankSum=-1.258e+00;VQSLOD=0.451;culprit=MQ   GT:AD:DP:GQ:PL  0/0:26,0:26:78:0,78,987 0/0:28,0:28:62:0,62,1170    0/0:49,0:49:99:0,120,1800   0/0:26,0:26:78:0,78,1007
1   15456   .   C   T   2276.10 PASS    AC=753;AF=0.469;AN=1606;BaseQRankSum=-5.629e+00;CCC=1606;ClippingRankSum=-4.790e-01;DB;DP=145744;FS=1.372;GQ_MEAN=1472.49;GQ_STDDEV=1218.73;HWP=0.7663;InbreedingCoeff=0.0124;MLEAC=753;MLEAF=0.469;MQ=59.72;MQ0=0;MQRankSum=0.072;NCC=0;POSITIVE_TRAIN_SITE;QD=19.85;ReadPosRankSum=0.753;VQSLOD=5.00;culprit=QD   GT:AD:DP:GQ:PL  0/1:109,66:175:99:1782,0,3496   0/1:116,93:209:99:2547,0,3609   0/1:133,115:248:99:3342,0,4025  0/1:106,84:190:99:2343,0,3387
1   154558  .   G   A   3620    PASS    AC=1;AF=6.227e-04;AN=1606;BaseQRankSum=-8.649e+00;CCC=1606;ClippingRankSum=-9.750e-01;DP=115668;FS=1.869;GQ_MEAN=123.41;GQ_STDDEV=97.83;HWP=1.0000;InbreedingCoeff=-0.0006;MLEAC=1;MLEAF=6.227e-04;MQ=59.74;MQ0=0;MQRankSum=-6.000e-03;NCC=0;QD=17.00;ReadPosRankSum=0.820;VQSLOD=2.75;culprit=InbreedingCoeff  GT:AD:DP:GQ:PL  0/0:117,0:117:99:0,120,1800 0/0:116,0:116:99:0,120,1800 0/0:201,0:201:99:0,120,1800 0/0:121,0:121:99:0,120,1800
vcf case vs control • 2.2k views
ADD COMMENT
2
Entering edit mode

Hi abhishek.abhishekkumar,

People will be more eager to help if you show what you tried and ensure us that you also have put effort in this question. We would be happy to point out your mistakes or put you back on track.

In addition, I have converted your post from a 'forum' to a 'question', because that's what it is.

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

Finally: Interesting guidelines for posting can be found in the following posts:

Cheers,
Wouter

ADD REPLY
0
Entering edit mode

Thanks for the multiple answers - Pierre, as many of your ideas, worked for me in past, so special thanks.

ADD REPLY
0
Entering edit mode

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your post but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLY
3
Entering edit mode
7.5 years ago

using bioalcidae: http://lindenb.github.io/jvarkit/BioAlcidae.html

script:

var sample2status={
    "M10475":0,
    "M10478":1,
    "M10500":1
    };
out.println("contig pos ref aff unaff");
while(iter.hasNext())
    {
    var vc = iter.next();
    var affected=0;
    var unaffected=0;
    var total=0.0;
    for(var sample in sample2status)
        {
        var genotype= vc.getGenotype(sample);
        if( genotype == null || !genotype.isCalled()) continue;
        var alleles = genotype.getAlleles();
        for(var i=0;i< alleles.size();++i)
            {
            var allele = alleles.get(i);
            if(allele.isNoCall()) continue;
            total++;
            if(allele.isReference()) continue;
            if(sample2status[sample]==1)
                {
                affected++;
                }
            else
                {
                unaffected++;
                }
            }
        }
    if(total==0) continue;
    out.println(vc.getContig()+" "+vc.getStart()+" "+vc.getReference().getDisplayString()+" "+(affected/total)+" "+(unaffected/total));
    }

invoke:

$ curl -s "https://raw.githubusercontent.com/arq5x/gemini/master/test/test.region.vep.vcf"  | java -jar dist/bioalcidae.jar -f script.js -F VCF | column -t 

contig  pos       ref  aff                  unaff
chr1    10001     T    0.3333333333333333   0.16666666666666666
chr1    10056     A    0.16666666666666666  0
chr1    10121     A    0                    0
chr1    10177     A    0.16666666666666666  0.16666666666666666
chr1    10180     T    0.3333333333333333   0
chr1    10234     C    0.16666666666666666  0.16666666666666666
chr16   72057282  A    0.3333333333333333   0.3333333333333333
chr16   72057435  C    0                    0.16666666666666666
chr16   72059269  T    1                    0

shameless self promotion, see also:

     Alleles
     +-----+-----+-----+-------+--------+----+----+-----+-------------+---------------+---------+-----------+
     | Idx | REF | Sym | Bases | Length | AC | AN | AF  | AC_affected | AC_unaffected | AC_male | AC_female |
     +-----+-----+-----+-------+--------+----+----+-----+-------------+---------------+---------+-----------+
     | 0   | *   |     | T     | 1      | 4  | 8  | 0.5 | 2           | 1             | 1       | 2         |
     | 1   |     |     | TC    | 2      | 4  | 8  | 0.5 | 2           | 1             | 1       | 2         |
     +-----+-----+-----+-------+--------+----+----+-----+-------------+---------------+---------+-----------+

enter image description here

ADD COMMENT

Login before adding your answer.

Traffic: 2739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6