Hi,
I ran tumor-only somatic structural variant (SV) calling with Manta, which produces several VCFs of candidate variants. However, some of these lack a GT entry in the FORMAT column (GitHub issue). In tumor-only mode, Manta does not produce somaticSV.vcf.gz as its final output, so I need to work with tumorSV.vcf.gz instead. This file is "a subset of the candidateSV.vcf.gz file after removing redundant candidates and small indels less than the minimum scored variant size (50 by default)", and looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 156390
chr1 50937 MantaDEL:33:0:0:1:0:0 TATGTCACTCTTAAATGTACTTCTAATTTTTCACTTTACATCACATAATGAATGGATCCAAATATGTTATGGATAGATATCTTCAAACTTTCTACTTACAAGTAGTGATAATAACAG T . MaxMQ0Frac END=51053;SVTYPE=DEL;SVLEN=-116;CIGAR=1M116D;CIPOS=0,4;HOMLEN=4;HOMSEQ=ATGT PR:SR 31,0:729,53
chr1 66098 MantaBND:0:18689:19690:0:0:0:0 T ]chr19:108084]T . MaxDepth SVTYPE=BND;MATEID=MantaBND:0:18689:19690:0:0:0:1;IMPRECISE;CIPOS=-336,336;BND_DEPTH=401;MATE_BND_DEPTH=864 PR 324,25
I understand that GT information may not be meaningful for tumor-only samples, but my downstream annotation pipeline with ANNOVAR needs a placeholder GT field.
tl;dr: I am looking for a script that prepends GT to the FORMAT column of every record in a VCF file and adds a matching placeholder value to the sample column. The placeholder can be arbitrary, e.g. 0/1.
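To make the request concrete, here is a rough Python sketch of the transformation I have in mind. It is not tested on real Manta output and rests on a few assumptions of mine: records are tab-delimited, the whole file fits in memory, and the function name add_placeholder_gt, the script name add_gt.py and the wording of the added ##FORMAT header line are my own choices. GT goes first in FORMAT because the VCF specification requires it to be the first key when present.

```python
import gzip
import sys

# Assumed header definition for the new key; adjust the description as needed.
GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'


def add_placeholder_gt(lines, placeholder="0/1"):
    """Yield VCF lines with GT prepended to FORMAT and a placeholder
    genotype prepended to every sample column."""
    lines = list(lines)  # small SV VCFs assumed, so reading everything is fine
    has_gt_header = any(l.startswith("##FORMAT=<ID=GT,") for l in lines)
    for line in lines:
        # Declare GT in the header, just before the #CHROM line, if missing.
        if line.startswith("#CHROM") and not has_gt_header:
            yield GT_HEADER
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Leave records without sample columns, or already carrying GT, alone.
        if len(fields) < 10 or "GT" in fields[8].split(":"):
            yield line
            continue
        fields[8] = "GT:" + fields[8]
        fields[9:] = [placeholder + ":" + sample for sample in fields[9:]]
        yield "\t".join(fields) + "\n"


def open_text(path, mode):
    """Open plain or gzip-compressed text, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open_text(sys.argv[1], "r") as src, open_text(sys.argv[2], "w") as dst:
        dst.writelines(add_placeholder_gt(src))
```

I would call it as python add_gt.py tumorSV.vcf.gz tumorSV.gt.vcf, which should turn PR 324,25 into GT:PR 0/1:324,25. Note that a .gz output written this way is plain gzip, not bgzip, so it would still need bgzip and tabix before indexing.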
Thanks,
Christian