8.8 years ago
umn_bist
I am running BaseRecalibrator on my RNA-seq data:
java -jar -Xmx120g ${GATK} -T BaseRecalibrator \
-R "${reference}" \
-I "${file4}" \
-knownSites "${gerVar}" \
-knownSites "${somVar}" \
-o "${file4%_tstaids.bam}_tstaidsr.table1"
java -jar -Xmx120g ${GATK} -T BaseRecalibrator \
-R "${reference}" \
-I "${file4}" \
-knownSites "${gerVar}" \
-knownSites "${somVar}" \
-BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
-o "${file4%_tstaids.bam}_tstaidsr.table2"
java -jar -Xmx120g ${GATK} -T AnalyzeCovariates \
-R "${reference}" \
-before "${file4%_tstaids.bam}_tstaidsr.table1" \
-after "${file4%_tstaids.bam}_tstaidsr.table2" \
-plots "${file1%_tsta.bam}_BQSR.pdf"
java -jar -Xmx120g ${GATK} -T PrintReads \
-R "${reference}" \
-I "${file4}" \
-BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
-o "${file7}"
Note that I got two variant VCF files from Ensembl (germline and somatic). My reference is Ensembl GRCh38.p5. I ran the command below to prepend 'chr' to the chromosome names and change chrMT to chrM:
sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/'
I received this error:
##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 18354680: empty alleles are not permitted in VCF records
I used the command below to inspect my VCF file (it is ${gerVar} that is malformed):
sed -n '18354680p'
which returned:
chr11 5249456 HbVar.633 G . . PhenCode_20140430;TSA=sequence_alteration;AA=A
You have found the origin of your problem. So what is the question?
Yes, the question is: is there a better way to fix this error without redownloading the original VCF file to cross-check what should replace the empty allele? Could this malformation be due to the sed command, which might have introduced other empty alleles elsewhere in the file?
I don't think this is caused by sed. It is quite possible that the file simply contains empty alleles. Check how many there are; if there are only a few, remove them from the file.
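To make that suggestion concrete, here is a minimal sketch of counting and then dropping such records with awk. It assumes the VCF is uncompressed and tab-delimited, and that "empty allele" means an empty REF (column 4) or ALT (column 5) field; the function names count_empty_alleles and drop_empty_alleles and the output file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; assumes an uncompressed, tab-delimited VCF.

# Print the number of data records whose REF (col 4) or ALT (col 5) is empty.
count_empty_alleles() {
    awk -F'\t' '!/^#/ && ($4 == "" || $5 == "") { n++ } END { print n + 0 }' "$1"
}

# Print the VCF without those records; header lines (starting with #) pass through.
drop_empty_alleles() {
    awk -F'\t' '/^#/ || ($4 != "" && $5 != "")' "$1"
}

# Example usage (file names are placeholders):
#   count_empty_alleles "${gerVar}"
#   drop_empty_alleles "${gerVar}" > "${gerVar%.vcf}.noempty.vcf"
```

Running count_empty_alleles on both the original Ensembl file and the sed-modified copy should give the same number if sed is not to blame, since the sed command shown above only edits the start of each line and never touches the allele columns.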