bedtools intersect VCF with BED Invalid record error.
0
0
Entering edit mode
2.1 years ago

I am attempting to intersect a VCF file with coordinates from a BED file.

e.g.

bedtools intersect -a /path/to/vcf -b /path/to/.bed

The command results in the following error:

Error: Invalid record in file /path/to/vcf. Record is 

And prints out the offending line.

The records it doesn't like are the ones where the ALT field is something other than a SNV or INDEL, for example a copy number allele <CN0> <CN1> <CN2>etc. or an inversion <INV>. The "<>" are part of the record (see example below). When the less-than greater-than signs are removed from around the entry (e.g. CN0, INV), the error goes away. In the example below, the first record does not cause an error, but the rest do.

Why can't bedtools handle these records, and is there any way for them to just be ignored so the command can run on the rest of the records?

Example VCF:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=2022.10.17
##contig=<ID=1>
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">
##ALT=<ID=CN0,Description="Copy number allele: 0 copies">
##ALT=<ID=CN1,Description="Copy number allele: 1 copy">
##ALT=<ID=CN2,Description="Copy number allele: 2 copies">
##ALT=<ID=INV,Description="Inversion">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=HDS,Number=2,Type=Float,Description="Estimated Haploid Alternate Allele Dosage">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE1
1   668630  1:668630:G:<CN2>    G   CN2 .   PASS    AF=0.00423;MAF=0.00423;R2=0.01830;IMPUTED   GT:DS:HDS:GP    0|0:0.002:0.000,0.001:0.998,0.002,0.000
1   668630  1:668630:G:<CN2>    G   <CN2>   .   PASS    AF=0.00423;MAF=0.00423;R2=0.01830;IMPUTED   GT:DS:HDS:GP    0|0:0.002:0.000,0.001:0.998,0.002,0.000
1   21530509    1:21530509:G:<INV>  G   <INV>   .   PASS    AF=0.03698;MAF=0.03698;R2=0.98592;IMPUTED   GT:DS:HDS:GP    0|0:0:0,0:1,0,0 
1   713044  1:713044:C:<CN0>    C   <CN0>   .   PASS    AF=0.00070;MAF=0.00070;R2=0.00653;IMPUTED   GT:DS:HDS:GP    0|0:0:0,0:1,0,0
vcf bedtools • 1.2k views
ADD COMMENT
0
Entering edit mode

Why can't bedtools handle these records,

I don't know but may be you just want:

bcftools view --regions-file in.bed indexed.vcf.gz
ADD REPLY
0
Entering edit mode

Do you have an empty line somewhere?

ADD REPLY

Login before adding your answer.

Traffic: 1251 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6