2.4 years ago
Alewa
I have a badly formatted VCF header. For some reason, the Description values in the FORMAT lines (and most of the other header lines shown below) are missing their quotes.
According to the VCF documentation, a FORMAT line should look like this:
##FORMAT=<ID=ID,Number=number,Type=type,Description="description">
How can I put quotes around my descriptions?
Thanks
$ bcftools view -h $main_vcf | less
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description=Represents any possible alternative allele at this location>
##FILTER=<ID=FS60,Description=FS>
##FILTER=<ID=LowQual,Description=Low quality>
##FILTER=<ID=MQ40,Description=MQ < 40.0>>
##FILTER=<ID=QD2,Description=QD < 2.0>>
##FILTER=<ID=QUAL30,Description=QUAL < 30.0>>
##FILTER=<ID=SOR3,Description=SOR>
##FORMAT=<ID=AD,Number=R,Type=Integer,Description=Allelic depths for the ref and alt alleles in the order listed>
##FORMAT=<ID=DP,Number=1,Type=Integer,Description=Approximate read depth (reads with MQ=255 or with bad mates are filtered)>
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description=Genotype Quality>
##FORMAT=<ID=GT,Number=1,Type=String,Description=Genotype>
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description=Minimum DP observed within the GVCF block>
##FORMAT=<ID=PS,Number=1,Type=Integer,Description=Phasing set (typically the position of the first variant in the set)>
##FORMAT=<ID=SB,Number=4,Type=Integer,Description=Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.>
##GATKCommandLine=<ID=VariantFiltration,CommandLine=VariantFiltration --output Moldova_SNPs_filtered.vcf --filter-expression QD < 2.0 --filter-expression QUAL < 30.0 --filter-expression SOR > 3.0 --filter-expression FS > 60.0 --filter-expression MQ < 40.0 --filter-name QD2 --filter-name QUAL30 --filter-name SOR3 --filter-name FS60 --filter-name MQ40 --variant Moldova_SNPs.vcf --cluster-size 3 --cluster-window-size 0 --mask-extension 0 --mask-name Mask --filter-not-in-mask false --missing-values-evaluate-as-failing false --invalidate-previous-filters false --invert-filter-expression false --invert-genotype-filter-expression false --set-filtered-genotype-to-no-call false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false,Version=4.1.2.0,Date=January 20, 2021 4:12:58 PM PST>>
##INFO=<ID=AN,Number=1,Type=Integer,Description=Total number of alleles in called genotypes>
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description=Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities>
##INFO=<ID=DP,Number=1,Type=Integer,Description=Approximate read depth; some reads may have been filtered>
##INFO=<ID=DS,Number=0,Type=Flag,Description=Were any of the samples downsampled?>
##INFO=<ID=END,Number=1,Type=Integer,Description=Stop position of the interval>
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description=Phred-scaled p-value for exact test of excess heterozygosity>
##INFO=<ID=FS,Number=1,Type=Float,Description=Phred-scaled p-value using Fisher's exact test to detect strand bias>
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description=Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation>
##INFO=<ID=MQ,Number=1,Type=Float,Description=RMS Mapping Quality>
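One way the quoting could be restored is with a small script run over the dumped header text. This is only a sketch, not a tested fix for this particular file: the function names `quote_description` and `fix_header` are made up for illustration, and it assumes that Description is the last key on each INFO/FORMAT/FILTER/ALT line and that the line ends with a single closing `>`.

```python
import re

# Matches INFO/FORMAT/FILTER/ALT header lines and splits them into
# (everything up to and including "Description=", the description text).
# Assumption: Description is the last key and the line ends with one ">".
_DESC = re.compile(r'^(##(?:INFO|FORMAT|FILTER|ALT)=<.*?Description=)(.*)>$')


def quote_description(line: str) -> str:
    """Wrap an unquoted Description value in double quotes.

    Lines that do not match, or whose Description is already quoted,
    are returned unchanged.
    """
    m = _DESC.match(line)
    if m is None or m.group(2).startswith('"'):
        return line
    return f'{m.group(1)}"{m.group(2)}">'


def fix_header(text: str) -> str:
    """Apply quote_description to every line of a header dump."""
    return "".join(quote_description(l) + "\n" for l in text.splitlines())
```

Lines such as `##FILTER=<ID=MQ40,Description=MQ < 40.0>>` end in a doubled `>>`, so the extra `>` would end up inside the quotes with this sketch; those lines, and the GATKCommandLine line (which is not touched at all), probably need checking by hand. The corrected header text could then be saved to a file and applied with `bcftools reheader -h`.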
thanks, that helps!