How do I do downstream analysis after GATK variant calling?
5.6 years ago
jiangpeng59 ▴ 20

Hi friends, I have finished calling variants following the GATK Best Practices (joint calling) on 20 WES samples. My goal is to find the genes that may cause the disease in these 20 samples. I have tried some tools and methods, but I don't know whether they are right. Is there good guidance documentation, like the GATK Best Practices, for reaching this goal?

I have tried an annotation tool named ANNOVAR.

First, I converted the VCF into the ANNOVAR input format; more precisely, I generated one ANNOVAR input file per sample (20 in total).

convert2annovar.pl -format vcf4 relapse.filtered.snps.indels.vcf -allsample -filter PASS -out out/relapse

For each sample, I did the following:

1. Filtered out irrelevant variants using the 1000 Genomes Project dataset with MAF = 0.01.

2. Annotated each variant with gene information. Because the data are WES, I got the "exonic variant function" output:

intronic        KCNMA1  chr10   77008214        77008214        T       C       het     127.30  51
exonic  CDHR4   chr3    49795287        49795287        C       T       het     8460.17 274
exonic  CLCN5   chrX    50081733        50081733        A       G       hom     69769.27        255
intronic        TUFT1   chr1    151566031       151566031       -       C       hom     7453.64 18
intronic        PDE4D   chr5    60147674        60147674        T       G       het     1524.08 112
intronic        USF1    chr1    161041947       161041948       GA      -       het     2536.97 91
intronic        RPS6KB2 chr11   67432552        67432552        A       G       het     2772.98 156
intergenic      LINC01296(dist=73295),DUXAP10(dist=113998)      chr14   19180792        19180792        C       A       het     2378.33 172

3. Combined these 20 output files into MAF format to generate a waterfall plot. However, the percentage of mutated samples is almost 100%, which I don't think is a correct result.

[Figure: waterfall plot]
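
One way to sanity-check a near-100% mutation rate is to recompute, per gene, the fraction of samples carrying at least one variant after restricting to coding regions. The following is a minimal sketch, not the poster's pipeline: the `(sample, gene, region)` record layout and the function name `mutated_fraction` are assumptions for illustration, and the sample-to-gene pairings are toy data (only the gene and sample names are borrowed from the thread).

```python
from collections import defaultdict


def mutated_fraction(records, n_samples, keep_regions=("exonic", "splicing")):
    """Fraction of samples with at least one variant per gene.

    records: iterable of (sample, gene, region) tuples (hypothetical layout).
    Variants outside keep_regions (e.g. intronic, intergenic) are ignored.
    """
    samples_by_gene = defaultdict(set)
    for sample, gene, region in records:
        if region in keep_regions:
            samples_by_gene[gene].add(sample)
    return {gene: len(s) / n_samples for gene, s in samples_by_gene.items()}


# Toy data: the pairings below are invented for illustration only.
records = [
    ("P_3", "CDHR4", "exonic"),
    ("P_4", "CDHR4", "exonic"),
    ("P_3", "KCNMA1", "intronic"),
    ("P_4", "CLCN5", "exonic"),
]
fractions = mutated_fraction(records, n_samples=4)
print(fractions)
```

If most genes still come out close to 1.0 after this kind of restriction, the input probably still contains common germline or synonymous variants that need stricter filtering before plotting.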


You should consider changing your question to the actual problem you are having, which is finding the genes that may cause the disease. GATK helps you call variants, but it will not help you interpret them.


Thanks for your reply. I have added a description of the problem.


You mean GATK, right?


Yes, I got a VCF with gatk_v4.1.0.0; here is part of the output.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  GXZ-T1  P_11    P_12    P_13    P_16    P_17    P_18    P_3     P_4     P_6     P_7     S1448662        S1450168        S1534039        SLJ-T1  TZ-T1   ZAL-T1
chr1    13418   .       G       A       247.50  .       AC=1;AF=0.100;AN=10;BaseQRankSum=0.397;DP=645;ExcessHet=3.4242;FS=90.299;MLEAC=3;MLEAF=0.300;MQ=33.87;MQRankSum=-4.350e+00;QD=2.00;ReadPosRankSum=-2.355e+00;SOR=6.271  GT:AD:DP:GQ:PL  0/1:102,22:124:99:250,0,2794    ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       0/0:137,0:137:99:0,120,1800     0/0:193,0:193:99:0,120,1800     0/0:142,0:142:0:0,0,3722        0/0:49,0:49:99:0,120,1800       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0

But I have no idea what I should do to find these disease genes. Is there a common practice?
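
As a first look at a joint-called record like the one above, it can help to list which samples actually carry the ALT allele. The sketch below assumes tab-separated VCF lines with the standard nine fixed columns followed by one column per sample; the helper name `carriers` is hypothetical, and the example record is abridged from the output above (three samples, shortened INFO field).

```python
def carriers(header_line, record_line):
    """Return the sample names whose genotype contains a non-reference allele."""
    header = header_line.lstrip("#").split("\t")
    fields = record_line.split("\t")
    samples = header[9:]                      # sample columns start after FORMAT
    gt_index = fields[8].split(":").index("GT")
    result = []
    for name, call in zip(samples, fields[9:]):
        gt = call.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")
        # "0" is the reference allele and "." is a missing call
        if any(a not in ("0", ".") for a in alleles):
            result.append(name)
    return result


header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "GXZ-T1", "P_11", "S1448662"]
)
record = "\t".join(
    ["chr1", "13418", ".", "G", "A", "247.50", ".", "AC=1;AF=0.100;AN=10",
     "GT:AD:DP:GQ:PL",
     "0/1:102,22:124:99:250,0,2794", "./.:0,0:0:.:0,0,0",
     "0/0:137,0:137:99:0,120,1800"]
)
print(carriers(header, record))
```

For real files a dedicated VCF parser is usually a safer choice than splitting lines by hand; this only illustrates how the genotype columns are laid out.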


What have you tried?


Thanks for your reply. I have added a description of the problem, and I tried an annotation tool named ANNOVAR.

5.6 years ago
Garan ▴ 690

I take it this is a Mendelian rare disease for which you are looking for a causal variant? Are the 20 samples trios (parents and proband) or singletons (unrelated cases with similar symptoms)? This will change the filtering strategy. Do you have family history, population details, or phenotype details?

If they are trios, you can probably filter out autosomal variants that are present in the parents (who do not exhibit the disease; be careful here about reduced penetrance). If they are singletons, you would probably start with a MAF filter specific to that population (after annotation with gnomAD; 1% is still very high for a rare disease), then look at variants in the order nonsense > splicing > missense > rare synonymous variants present in a high percentage of the samples (less likely and much harder to prove). You might also want to run a CNV caller to pick up larger structural variants. You are probably looking for a homozygous (HMZ) variant that is heterozygous (HTZ) in both parents, or possibly a de novo heterozygous variant (which will not be present in the parents), depending on the severity of the phenotype and symptoms. This is a very brief overview, since there is a lot more you would do depending on the samples, family history, and phenotype, including HPO / OMIM lookup of phenotype terms to provide possible candidate genes.
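
The singleton strategy described above (population frequency filter first, then ranking by consequence) could be sketched roughly as follows. The dictionary keys, the `prioritize` helper, the 0.001 frequency cutoff, and the `GENE_A` to `GENE_D` entries are all assumptions for illustration; only the ordering nonsense > splicing > missense > synonymous comes from the answer.

```python
# Lower rank means higher priority, following the order given in the answer.
CONSEQUENCE_RANK = {"nonsense": 0, "splicing": 1, "missense": 2, "synonymous": 3}


def prioritize(variants, max_af=0.001):
    """Keep rare variants and sort them by consequence, then by recurrence.

    variants: list of dicts with hypothetical keys "gene", "consequence",
    "pop_af" (population allele frequency, None if absent from the database)
    and "n_samples" (number of cases carrying the variant).
    max_af: assumed cutoff; choose it to suit the disease and population.
    """
    rare = [v for v in variants if v["pop_af"] is None or v["pop_af"] <= max_af]
    return sorted(
        rare,
        key=lambda v: (
            CONSEQUENCE_RANK.get(v["consequence"], len(CONSEQUENCE_RANK)),
            -v["n_samples"],
        ),
    )


# Toy input with placeholder gene names.
variants = [
    {"gene": "GENE_A", "consequence": "missense", "pop_af": 0.0002, "n_samples": 5},
    {"gene": "GENE_B", "consequence": "nonsense", "pop_af": None, "n_samples": 2},
    {"gene": "GENE_C", "consequence": "synonymous", "pop_af": 0.05, "n_samples": 18},
    {"gene": "GENE_D", "consequence": "splicing", "pop_af": 0.0005, "n_samples": 3},
]
ranked = prioritize(variants)
print([v["gene"] for v in ranked])
```

The common synonymous variant is dropped by the frequency filter even though it is present in most samples, which is the behaviour the answer argues for when 1% is too permissive.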


Thanks for your reply, Garan. I think the samples are singletons; more precisely, they are tumor (glioma) samples. They come from 20 individuals, and there are no normal samples.


If you're looking for somatic mutations rather than germline variants, that complicates things, since you have no non-tumor samples to compare against. You'd also want to take into account the source and age profile of the gnomAD / ExAC variant information: I believe some of the datasets were recruited from patients with various cancers, and the age profiles (visible in the age histograms) are sometimes skewed towards older participants, who are more likely to carry somatic variants. Annotation with COSMIC is probably high on the agenda (https://cancer.sanger.ac.uk/cosmic)?
