18 months ago
Payal
Hello,
I am new to exome sequencing. I ran GATK4 and have a g.VCF file. I used VEP to annotate the gVCF file. I have a couple of questions.
- I noticed that not all rows (variants) have the CSQ field. Is that normal, or did I miss something? [I used --coding_only in VEP along with other options]
- What are the main things to look for in a VCF file to get meaningful information out of it, and what tools should I use for that? [I don't have a specific list of genes to look for right now]
Thanks,
Payal
A .vcf file or a gvcf file?
It's a gvcf file.
A gvcf file has blocks for both variant and non-variant loci, so you won't see a CSQ for every record, only for variant loci where the variation has an actual consequence. Since you used the --coding_only flag, the number of sites with annotations will be even fewer.
This makes sense.
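To see how many records actually carry an annotation, something like the following rough sketch could help. It assumes an uncompressed, tab-delimited VCF with the INFO column in the eighth position; the function name count_csq and the example records are made up for illustration.

```python
def count_csq(vcf_lines):
    """Return (annotated, unannotated) counts of VCF data records."""
    annotated = unannotated = 0
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if "CSQ" in keys:
            annotated += 1
        else:
            unannotated += 1
    return annotated, unannotated


# Made-up records, only to show the input shape.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=10;CSQ=G|missense_variant\n",
    "1\t200\t.\tC\tT\t50\tPASS\tDP=12\n",
]
print(count_csq(example))
```

For a real file, passing an open file handle (for example `open("annotated.vcf")`, a hypothetical path) in place of the list should work the same way.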
If you're talking about the gvcf produced by HaplotypeCaller with ERC=GVCF, you don't annotate g.vcf files; you must first merge the g.vcf files with CombineGVCFs and then call GenotypeGVCFs.
True, but it's not technically impossible to annotate GVCFs, which is why I did not address that point.
I am sorry. I should have been clear. I actually did run CombineGVCFs and then call GenotypeGVCFs and this is my final vcf file.
Ignore my earlier point on gVCFs. If you ran GenotypeGVCFs, you should already have regular VCF output. However, given that you used --coding_only, only coding variants will be annotated.
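For the records that do have a CSQ entry, the pipe-separated values can be matched to field names taken from the CSQ header line, which lists them after "Format:". Below is a minimal sketch of that idea. The header is shortened to four fields for illustration (a real VEP header lists more), and GENE1 and the helper names csq_fields and parse_csq are hypothetical.

```python
import re


def csq_fields(header_line):
    """Extract the field names listed after 'Format:' in the CSQ header."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in CSQ header")
    return match.group(1).strip().split("|")


def parse_csq(info, fields):
    """Return one dict per comma-separated CSQ block, or [] if no CSQ."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            blocks = entry[len("CSQ="):].split(",")
            return [dict(zip(fields, block.split("|"))) for block in blocks]
    return []


# Shortened, illustrative header and INFO string (not real data).
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
fields = csq_fields(header)
records = parse_csq(
    "DP=10;CSQ=G|missense_variant|MODERATE|GENE1,G|synonymous_variant|LOW|GENE1",
    fields,
)
for record in records:
    print(record["SYMBOL"], record["Consequence"], record["IMPACT"])
```

A variant can have several CSQ blocks (one per transcript or feature), which is why the sketch returns a list rather than a single dict.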