Entering edit mode
2.8 years ago
pixie@bioinfo
★
1.5k
Hello, I am using VariantAnnotation_1.40.0 to read in a VCF file using
vcf <- readVcf("220106_MN01111_0064_v2_A000H3MT5V.tomato.filtered.txt")
However, I get this error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'seqinfo': no 'header' line
"#CHROM POS ID..."?
I need some help to process this file. Thanks
The file format is :
##fileformat=VCFv4.2
##bcftoolsVersion=1.9+htslib-1.9
##bcftoolsCommand=mpileup -O v -f /4_illumina/Genomes/Solanum_lycopersicum_3.00/S_lycopersicum_chromosomes.3.00.fa -R /4_illumina/gt_seq/gt_seq_proccessor/220106_MN01111_0064_v2_A000H3MT5V/tomato/tomato.tsv -a DP,AD,ADF,ADR,SP -d 1000000 -L 1000000 --threads 12 /4_illumina/gt_seq/gt_seq_proccessor/220106_MN01111_0064_v2_A000H3MT5V/tomato/220106_MN01111_0064_v2_A000H3MT5V.purged.tomato.bam
##reference=file:///4_illumina/Genomes/Solanum_lycopersicum_3.00/S_lycopersicum_chromosomes.3.00.fa
##bcftools_callVersion=1.9+htslib-1.9
##bcftools_callCommand=call -c -A --threads 12 -O z -o /4_illumina/gt_seq/gt_seq_proccessor/220106_MN01111_0064_v2_A000H3MT5V/tomato/220106_MN01111_0064_v2_A000H3MT5V.tomato.raw.vcf.gz; Date=Wed Jan 12 13:56:41 2022
##filter_vcfVersion=PyVCF==0.6.8
##Command=/home/airflow/anaconda3/envs/gt_seq_airflow/bin/python /5_workspace/repos/gt-seq/scripts/variant_filter.py -v /4_illumina/gt_seq/gt_seq_proccessor/220106_MN01111_0064_v2_A000H3MT5V/tomato/220106_MN01111_0064_v2_A000H3MT5V.tomato.raw.vcf.gz -g /4_illumina/gt-seq_primers/Solanum/20211214/GBS_primer_set_198.gff3 -o /4_illumina/gt_seq/gt_seq_proccessor/220106_MN01111_0064_v2_A000H3MT5V/tomato/220106_MN01111_0064_v2_A000H3MT5V.tomato.filtered.vcf -f 0 1
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AF2,Number=1,Type=Float,Description="Max-likelihood estimate of the first and second group ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">
##FORMAT=<ID=ADF,Number=R,Type=Integer,Description="Allelic depths on the forward strand">
##FORMAT=<ID=ADR,Number=R,Type=Integer,Description="Allelic depths on the reverse strand">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=gff3_notarget,Description="SNP is not targeted as described in the GFF3 input file">
##FILTER=<ID=Sample_DP,Description="Sample DP field value lower than 5 (filtering does not appear in FILTER column however is applied)">
##FILTER=<ID=Sample_Het,Description="The ratio of heterozygoes calles must be between the set fraction 0.0 - 1.0. (filtering does not appear in FILTER column however is applied)">
##FILTER=<ID=QS,Description="QUAL field value lower than 30">
##FILTER=<ID=DP,Description="FORMAT DP field value lower than 50 (not enough reads)">
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##contig=<ID=SL3.0ch00,length=20852292>
##contig=<ID=SL3.0ch01,length=98455869>
##contig=<ID=SL3.0ch02,length=55977580>
##contig=<ID=SL3.0ch03,length=72290146>
##contig=<ID=SL3.0ch04,length=66557038>
##contig=<ID=SL3.0ch05,length=66723567>
##contig=<ID=SL3.0ch06,length=49794276>
##contig=<ID=SL3.0ch07,length=68175699>
##contig=<ID=SL3.0ch08,length=65987440>
##contig=<ID=SL3.0ch09,length=72906345>
##contig=<ID=SL3.0ch10,length=65633393>
##contig=<ID=SL3.0ch11,length=56597135>
##contig=<ID=SL3.0ch12,length=68126176>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BXXXX_0 BXXXX_220 DXXXX_0 MXXXX_0
SL3.0ch01 83929063 0 A G 999 PASS DP=1667;VDB=0.0;RPB=0.999232;MQB=1.0;BQB=0.969803;SGB=293.924;MQ0F=0.0;AF1=0.517147;AC1=94.0;MQ=60;FQ=999.0;PV4=1.0,0.128142,1.0,1.0;G3=0.340041,0.286409,0.373549;HWE=3.7029e-05;DP4=758,0,909,0 GT:PL:DP:SP:ADF:ADR:AD 1/1:216,66,0:22:0:0,22:0,0:0,22 1/1:214,57,0:19:0:0,19:0,0:0,19 1/1:214,57,0:19:0:0,19:0,0:0,19 1/1:217,78,0:26:0:0,26:0,0:0,26
SL3.0ch01 97631153 0 A T 999 PASS DP=9192;VDB=0.0;RPB=0.996356;MQB=1.0;BQB=0.997647;SGB=960.891;MQ0F=0.0;AF1=0.477998;AC1=87.0;MQ=60;FQ=999.0;PV4=1.0,1.0,1.0,1.0;G3=0.362637,0.318703,0.318659;HWE=0.000491945;DP4=4817,0,4374,0 GT:PL:DP:SP:ADF:ADR:AD 0/1:198,0,193:122:0:59,63:0,0:59,63 1/1:255,255,0:115:0:0,115:0,0:0,115 0/0:0,255,255:151:0:150,0:0,0:150,0 0/0:0,255,255:149:0:149,0:0,0:149,0
SL3.0ch01 83952645 0 T C 999 PASS DP=7200;VDB=0.0;RPB=0.99006;MQB=0.999827;BQB=0.562521;SGB=1279.27;MQ0F=0.0;AF1=0.516484;AC1=94.0;MQ=60;FQ=999.0;PV4=1.0,1.61487e-06,0.180924,1.0;G3=0.340659,0.285715,0.373626;HWE=3.34968e-05;DP4=3269,0,3931,0 GT:PL:DP:SP:ADF:ADR:AD 1/1:195,175,0:104:0:4,100:0,0:4,100 1/1:234,216,0:85:0:1,84:0,0:1,84 1/1:255,255,0:123:0:1,122:0,0:1,122 1/1:255,255,0:133:0:0,133:0,0:0,133
SL3.0ch01 83936249 0 C T 999 PASS DP=14208;VDB=0.0;RPB=0.963987;MQB=0.998652;BQB=0.87956;SGB=676.78;MQ0F=0.0;AF1=0.516484;AC1=94.0;MQ=60;FQ=999.0;PV4=1.0,0.0640157,0.286743,1.0;G3=0.340659,0.285714,0.373626;HWE=3.34953e-05;DP4=6722,0,7486,0 GT:PL:DP:SP:ADF:ADR:AD 1/1:255,255,0:144:0:0,144:0,0:0,144 1/1:255,255,0:190:0:0,190:0,0:0,190 1/1:255,255,0:208:0:0,208:0,0:0,208 1/1:255,255,0:203:0:0,203:0,0:0,203
I can confirm the same error. You have a malformed vcf file. How was it generated?
Something with the header line (#CHROM POS ID REF etc) is causing the issue. Can you post to say pastebin the plantext vcf file? What you posted has spaces.