[ERROR] Malformed VCF: empty alleles are not permitted in VCF records
8.8 years ago
umn_bist ▴ 390

I am running BaseRecalibrator for my RNA-seq:

java -jar -Xmx120g ${GATK} -T BaseRecalibrator \
                           -R "${reference}" \
                           -I "${file4}" \
                           -knownSites "${gerVar}" \
                           -knownSites "${somVar}" \
                           -o "${file4%_tstaids.bam}_tstaidsr.table1"
java -jar -Xmx120g ${GATK} -T BaseRecalibrator \
                           -R "${reference}" \
                           -I "${file4}" \
                           -knownSites "${gerVar}" \
                           -knownSites "${somVar}" \
                           -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -o "${file4%_tstaids.bam}_tstaidsr.table2"
java -jar -Xmx120g ${GATK} -T AnalyzeCovariates \
                           -R "${reference}" \
                           -before "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -after "${file4%_tstaids.bam}_tstaidsr.table2" \
                           -plots "${file1%_tsta.bam}_BQSR.pdf"
java -jar -Xmx120g ${GATK} -T PrintReads \
                           -R "${reference}" \
                           -I "${file4}" \
                           -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -o "${file7}"

Note that I got two variant VCF files from Ensembl (germline and somatic). My reference is Ensembl GRCh38.p5. I ran the command below to prepend the 'chr' prefix to the chromosome names and to change chrMT to chrM:

sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/'

I received this error:

##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 18354680: empty alleles are not permitted in VCF records

I used the command below to inspect my VCF file (it is ${gerVar} that is malformed):

sed -n '18354680p'

which returned:

chr11    5249456    HbVar.633    G        .    .    PhenCode_20140430;TSA=sequence_alteration;AA=A
VCF GATK RNA-Seq • 3.8k views

You have found the origin of your problem. So what is the question?


Yes. The question is: is there a better way of fixing this error without redownloading the original VCF file to cross-check what should replace the empty allele? Could this malformation be due to the sed command, which may have introduced other empty alleles elsewhere in the file?


I don't think it is due to sed. It is quite possible that the file contains other records with empty alleles. Check how many of them there are; if there are only a few, remove them from the file.
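
Both points can be checked on a small example. The sketch below is only an illustration: toy.vcf and its second record are made up, and it assumes that an "empty allele" means a completely empty REF or ALT column in a tab-delimited file. It applies the sed command from the question and counts empty-allele records before and after, so equal counts suggest that sed did not introduce them.

```shell
# Hypothetical toy input written with Ensembl-style chromosome names (no "chr").
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> toy.vcf
printf '11\t5249456\tHbVar.633\tG\t\t.\t.\tTSA=sequence_alteration\n' >> toy.vcf
printf 'MT\t100\t.\tA\tG\t.\t.\tTSA=SNV\n' >> toy.vcf

# The renaming step from the question: it only edits the start of each data line.
sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/' toy.vcf > toy_chr.vcf

# Count data records whose REF (column 4) or ALT (column 5) is an empty string.
count_empty() {
    awk -F '\t' '!/^#/ && ($4 == "" || $5 == "") { n++ } END { print n + 0 }' "$1"
}

before=$(count_empty toy.vcf)
after=$(count_empty toy_chr.vcf)
mt_name=$(awk -F '\t' '!/^#/ && $2 == 100 { print $1 }' toy_chr.vcf)

echo "empty-allele records before sed: ${before}, after sed: ${after}"
echo "MT contig is now named: ${mt_name}"
```

On a real file, only the count_empty function is needed; point it at the VCF passed to -knownSites.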

8.8 years ago

Clean up your VCF:

awk -F '\t' '($0 ~ /^#/ || $5!=".")' in.vcf > out.vcf

This solution does not work.
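
One possible reason: the awk filter above drops records whose ALT column is ".", whereas in the record printed in the question the ALT column appears to be an empty string. A minimal sketch that tests for empty alleles instead is shown below. The file names and the last two toy records are hypothetical, and whether GATK accepts the filtered file has not been verified here.

```shell
# Hypothetical toy input: one record with an empty ALT, one valid record,
# and one record whose ALT list ends in a comma (an empty second allele).
printf '##fileformat=VCFv4.2\n' > demo_in.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo_in.vcf
printf 'chr11\t5249456\tHbVar.633\tG\t\t.\t.\tTSA=sequence_alteration\n' >> demo_in.vcf
printf 'chr11\t5249500\ttoy1\tA\tT\t.\t.\tTSA=SNV\n' >> demo_in.vcf
printf 'chr11\t5249600\ttoy2\tG\tC,\t.\t.\tTSA=SNV\n' >> demo_in.vcf

# Keep header lines, plus data records where REF and ALT are non-empty and
# the comma-separated ALT list contains no empty entry.
awk -F '\t' '/^#/ || ($4 != "" && $5 != "" && $5 !~ /^,|,,|,$/)' \
    demo_in.vcf > demo_out.vcf

kept=$(grep -vc '^#' demo_out.vcf)
echo "data records kept: ${kept}"
```

Dropping records is a blunt fix: the affected known sites are simply no longer masked during recalibration, so it is worth counting them first.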


How about checking for indels, which may have an empty alternative allele column?
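
One way to look into this is to tally the TSA annotation of every record with an empty ALT column. This is a rough sketch: it assumes the INFO column (column 8) carries a TSA= key as in the record shown in the question, and toy_tsa.vcf and its last two records are invented for illustration.

```shell
# Hypothetical toy input: two records with an empty ALT and one ordinary SNV.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > toy_tsa.vcf
printf 'chr11\t5249456\tHbVar.633\tG\t\t.\t.\tPhenCode_20140430;TSA=sequence_alteration;AA=A\n' >> toy_tsa.vcf
printf 'chr11\t5250000\ttoy1\tAT\t\t.\t.\tTSA=deletion\n' >> toy_tsa.vcf
printf 'chr11\t5260000\ttoy2\tC\tT\t.\t.\tTSA=SNV\n' >> toy_tsa.vcf

# For records with an empty ALT, split INFO on ";" and count each TSA value.
tally=$(awk -F '\t' '!/^#/ && $5 == "" {
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++)
        if (kv[i] ~ /^TSA=/) count[substr(kv[i], 5)]++
} END {
    for (t in count) print count[t] "\t" t
}' toy_tsa.vcf | sort)

printf '%s\n' "$tally"
```

If most of the affected records turn out to be insertions or deletions, that would support the indel explanation; if they are mostly sequence_alteration entries like the one above, the cause is more likely in how those records were encoded.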
