21 months ago
minoo
I have a VCF file like the one below:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT treatmentSample
chr1 857100 . C T 1756.06 PASS AC=2;AF=1;AN=2;DP=60;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.27;SOR=1.812;CSQ=chr1:857100|T|SNV|ENSG00000228794|ENST00000445118|LINC01128||1|MODIFIER|non_coding_transcript_exon_variant||||5/5|||||||||||||||||| GT:AD:DP:GQ:PL 1/1:0,60:60:99:1770,180,0
Does anyone know how to separate the INFO column into different columns, and how to split the treatmentSample column following the FORMAT order? I tried bcftools +split-vep and awk, but neither worked.
bcftools +split-vep treatmentSample.vcf.gz -f '\t%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%FORMAT\t%treatmentSample\t%INFO\t% AC\t%AF\t%AN\t% MLEAC\t% MLEAF\t% MQ\t% QD\t%SOR\t%CSQ[\t%GT][\t%GQ][\t%DP][\t%MIN_DP][\t%AD][\t%VAF][\t%PL][\t%MED_DP]\n' -d -A tab > output.vcf
The output was as follows:
Warning: duplicate INFO/CSQ key "CADD_phred"
Note: ambiguous key %AC; using the AC subfield of CSQ, not the INFO/AC tag
Note: ambiguous key %AN; using the AN subfield of CSQ, not the INFO/AN tag
Note: ambiguous key %Diag_Germline_Gene; using the Diag_Germline_Gene subfield of CSQ, not the INFO/Diag_Germline_Gene tag
Could not parse format string: \t%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%FORMAT\t%treatmentSample\t%INFO\t% AC\t%AF\t%AN\t% MLEAC\t% MLEAF\t% MQ\t% QD\t%SOR\t%Location %Allele %VARIANT_CLASS %Gene %Feature %SYMBOL %CCDS %STRAND %IMPACT %Consequence %SIFT %PolyPhen %CADD_phred %EXON %DISTANCE %COSMIC_ID %COSMIC_CNT %AC %AN %CADD_phred %gnomAD_exomes_AF %gnomAD_exomes_NFE_AF %ExAC_nonTCGA_AF %ExAC_nonTCGA_NFE_AF %gnomAD_genomes_AF %gnomAD_genomes_NFE_AF %phyloP100way_vertebrate %clinvar_rs %clinvar_clnsig %clinvar_trait %clinvar_golden_stars %Diag_Germline_Gene[\t%GT][\t%GQ][\t%DP][\t%MIN_DP][\t%AD][\t%VAF][\t%PL][\t%MED_DP]\n
I need the output to be something like the table below:
#CHROM POS ID REF ALT QUAL FILTER AC AF AN DP ExcessHet FS MLEAC MLEAF MQ QD SOR Location Allele VARIANT_CLASS Gene Feature SYMBOL CCDS STRAND IMPACT Consequence SIFT PolyPhen CADD_phred EXON FORMAT treatmentSample
chr1 857100 . C T 1756.06 PASS 2 1 2 60 30.103 0 2 1 60 29.27 1.812 chr1:857100 T SNV ENSG00000228794 ENST00000445118 LINC01128 1 MODIFIER non_coding_transcript_exon_variant 45051 GT:AD:DP:GQ:PL 0/1:39,11:50:99:172,0,1122
read https://meta.stackexchange.com/questions/147616/
Right, sorry. I have added the error.
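If the bcftools format string keeps failing, one fallback is to flatten the records yourself. Below is a minimal Python sketch of that idea, not a tested pipeline: it splits INFO on `;`, splits the first CSQ entry on `|`, and pairs the FORMAT keys with the sample values. The function names, the `CSQ_` prefix used when a CSQ subfield collides with an INFO key (as AC and AN do in your file), and the `treatmentSample_` prefix on the genotype columns are my own choices for illustration. I am also assuming the CSQ subfield names are the ones listed in your error message, in that order; in a real file they should be read from the `Format:` part of the `##INFO=<ID=CSQ...>` header line.

```python
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]

# Assumed order, copied from the first 14 names in the error message above.
CSQ_FIELDS = [
    "Location", "Allele", "VARIANT_CLASS", "Gene", "Feature", "SYMBOL",
    "CCDS", "STRAND", "IMPACT", "Consequence", "SIFT", "PolyPhen",
    "CADD_phred", "EXON",
]


def split_info(info):
    """Turn 'AC=2;AF=1;SOMEFLAG' into {'AC': '2', 'AF': '1', 'SOMEFLAG': 'true'}."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else "true"  # flags have no '=value' part
    return out


def flatten_record(line, csq_fields, sample_name):
    """Flatten one tab-separated, single-sample VCF data line into a dict."""
    cols = line.rstrip("\n").split("\t")
    row = dict(zip(FIXED, cols[:7]))

    info = split_info(cols[7])
    csq_raw = info.pop("CSQ", "")
    row.update(info)

    # Only the first transcript is kept; multiple CSQ entries are comma-separated.
    first_csq = csq_raw.split(",")[0].split("|") if csq_raw else []
    for name, value in zip(csq_fields, first_csq):
        key = name if name not in row else "CSQ_" + name  # avoid clobbering INFO keys
        row[key] = value

    # Pair FORMAT keys (GT:AD:DP:...) with the sample's values, in order.
    for name, value in zip(cols[8].split(":"), cols[9].split(":")):
        row[sample_name + "_" + name] = value
    return row


# The record from the question, with the trailing empty CSQ subfields trimmed.
EXAMPLE = "\t".join([
    "chr1", "857100", ".", "C", "T", "1756.06", "PASS",
    "AC=2;AF=1;AN=2;DP=60;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.27;SOR=1.812;"
    "CSQ=chr1:857100|T|SNV|ENSG00000228794|ENST00000445118|LINC01128||1|MODIFIER|"
    "non_coding_transcript_exon_variant||||5/5",
    "GT:AD:DP:GQ:PL", "1/1:0,60:60:99:1770,180,0",
])

row = flatten_record(EXAMPLE, CSQ_FIELDS, "treatmentSample")
print("\t".join(row))           # header line
print("\t".join(row.values()))  # data line
```

To run it over a whole file, you would skip the lines starting with `#` and call `flatten_record` on each remaining line. Keeping the genotype fields under a sample prefix means INFO DP and FORMAT DP stay in separate columns.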