I am attempting to intersect a VCF file with coordinates from a BED file.
e.g.
bedtools intersect -a /path/to/vcf -b /path/to/.bed
The command results in the following error:
Error: Invalid record in file /path/to/vcf. Record is
And prints out the offending line.
The records it doesn't like are the ones where the ALT field is something other than a SNV or INDEL, for example a copy number allele <CN0> <CN1> <CN2>etc. or an inversion <INV>. The "<>" are part of the record (see example below). When the less-than greater-than signs are removed from around the entry (e.g. CN0, INV), the error goes away. In the example below, the first record does not cause an error, but the rest do.
Why can't bedtools handle these records, and is there any way for them to just be ignored so the command can run on the rest of the records?
Example VCF:
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=2022.10.17
##contig=<ID=1>
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">
##ALT=<ID=CN0,Description="Copy number allele: 0 copies">
##ALT=<ID=CN1,Description="Copy number allele: 1 copy">
##ALT=<ID=CN2,Description="Copy number allele: 2 copies">
##ALT=<ID=INV,Description="Inversion">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=HDS,Number=2,Type=Float,Description="Estimated Haploid Alternate Allele Dosage">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1
1 668630 1:668630:G:<CN2> G CN2 . PASS AF=0.00423;MAF=0.00423;R2=0.01830;IMPUTED GT:DS:HDS:GP 0|0:0.002:0.000,0.001:0.998,0.002,0.000
1 668630 1:668630:G:<CN2> G <CN2> . PASS AF=0.00423;MAF=0.00423;R2=0.01830;IMPUTED GT:DS:HDS:GP 0|0:0.002:0.000,0.001:0.998,0.002,0.000
1 21530509 1:21530509:G:<INV> G <INV> . PASS AF=0.03698;MAF=0.03698;R2=0.98592;IMPUTED GT:DS:HDS:GP 0|0:0:0,0:1,0,0
1 713044 1:713044:C:<CN0> C <CN0> . PASS AF=0.00070;MAF=0.00070;R2=0.00653;IMPUTED GT:DS:HDS:GP 0|0:0:0,0:1,0,0
I don't know but may be you just want:
Do you have an empty line somewhere?