FORMAT column with GT for somatic SV VCFs
18 months ago
Christian ▴ 40

Hi,

I ran tumor-only somatic structural variant (SV) calling with Manta, which produces several VCFs with candidate variants. However, some of these do not contain GT in the FORMAT column (GitHub issue). In tumor-only mode I do not get somaticSV.vcf.gz as the final output; instead, I need to work with tumorSV.vcf.gz. This file is described as "a subset of the candidateSV.vcf.gz file after removing redundant candidates and small indels less than the minimum scored variant size (50 by default)", and looks like this:

#CHROM    POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  156390
chr1  50937   MantaDEL:33:0:0:1:0:0   TATGTCACTCTTAAATGTACTTCTAATTTTTCACTTTACATCACATAATGAATGGATCCAAATATGTTATGGATAGATATCTTCAAACTTTCTACTTACAAGTAGTGATAATAACAG   T   .   MaxMQ0Frac  END=51053;SVTYPE=DEL;SVLEN=-116;CIGAR=1M116D;CIPOS=0,4;HOMLEN=4;HOMSEQ=ATGT PR:SR   31,0:729,53
chr1  66098   MantaBND:0:18689:19690:0:0:0:0  T   ]chr19:108084]T .   MaxDepth    SVTYPE=BND;MATEID=MantaBND:0:18689:19690:0:0:0:1;IMPRECISE;CIPOS=-336,336;BND_DEPTH=401;MATE_BND_DEPTH=864  PR  324,25

I understand that GT information for tumor-only samples may not be meaningful, but I need a placeholder GT field for my downstream annotation pipeline with ANNOVAR.

tl;dr: I am looking for a script that prepends GT to the FORMAT column of every record in a VCF file and adds a matching placeholder value (for example 0/1) to the sample column.
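To make the request concrete, here is a minimal sketch of the kind of script I have in mind, written in Python with only the standard library. It assumes the real file is tab-delimited (the excerpt above lost its tabs when pasted), that every record has a FORMAT column and at least one sample column, and that a constant 0/1 is acceptable as a placeholder. The function names `add_placeholder_gt` and `rewrite_vcf` and the wording of the header description are made up for illustration. GT is prepended rather than appended because the VCF specification requires GT to be the first FORMAT key when it is present. I have not checked whether ANNOVAR accepts the result.

```python
import gzip

# Header line declaring the placeholder GT field (description text is an assumption).
GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype (placeholder, not called)">'


def add_placeholder_gt(lines, placeholder="0/1"):
    """Yield VCF lines with GT prepended to FORMAT and to every sample column."""
    has_gt_header = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Remember whether the file already declares GT.
            if line.startswith("##FORMAT=<ID=GT,"):
                has_gt_header = True
            yield line
        elif line.startswith("#CHROM"):
            # Declare GT just before the column header line if it is missing.
            if not has_gt_header:
                yield GT_HEADER
            yield line
        elif line:
            fields = line.split("\t")
            # Columns 0-7 are fixed, 8 is FORMAT, 9 onwards are samples.
            if len(fields) >= 10 and "GT" not in fields[8].split(":"):
                fields[8] = "GT:" + fields[8]
                fields[9:] = [placeholder + ":" + sample for sample in fields[9:]]
            yield "\t".join(fields)


def rewrite_vcf(in_path, out_path, placeholder="0/1"):
    """Read a plain or gzipped VCF and write an uncompressed copy with GT added."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for out_line in add_placeholder_gt(src, placeholder):
            dst.write(out_line + "\n")
```

Calling `rewrite_vcf("tumorSV.vcf.gz", "tumorSV.gt.vcf")` would write the modified copy. Records that already carry GT are left untouched, so running it twice should not add a second GT.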

Thanks,

Christian

vcf sv manta • 403 views