4.7 years ago
helen
Hi,
I used RNA-Seq data (a pair of control and treatment samples) as input to GATK4 for variant calling. In the HaplotypeCaller step, the engine shut down after a few minutes and returned the following error:
A USER ERROR has occurred: Read A00355:100:HJCKMDRXX:1:1154:5367:30765 chr1:43621182-43621257 is malformed: read ends with deletion. Cigar: 58H52M2D2M1D3M1I5M4I2M5I3M2D3M2I1D. Although the SAM spec technically permits such reads, this is often indicative of malformed files.
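The CIGAR string in this message can be checked by hand. Splitting it into operations shows that the last one is 1D, a deletion, which is exactly what the error complains about. Summing the reference-consuming operations (M, D, N, = and X in the SAM specification) also gives the length of the reported interval. A minimal shell sketch, assuming a grep that supports -o (GNU and BSD grep do); the function name cigar_summary is hypothetical:

```shell
#!/bin/sh
# cigar_summary (hypothetical helper): split a CIGAR string into its
# operations, then print the final operation and the number of reference
# bases consumed (M, D, N, = and X consume the reference per the SAM spec).
cigar_summary() {
  printf '%s\n' "$1" \
    | grep -oE '[0-9]+[MIDNSHP=X]' \
    | awk '
        {
          len = substr($0, 1, length($0) - 1) + 0   # numeric length
          op  = substr($0, length($0), 1)           # operation letter
          if (op ~ /[MDN=X]/) ref += len            # reference-consuming ops
          last = op
        }
        END { printf "last_op=%s ref_span=%d\n", last, ref }'
}

# The CIGAR from the error message above.
cigar_summary '58H52M2D2M1D3M1I5M4I2M5I3M2D3M2I1D'
# prints: last_op=D ref_span=76
```

The 76 reference bases match the inclusive interval chr1:43621182-43621257, so the reported interval appears to end on the deleted base rather than on an aligned one.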
And here is my code:
gatk --java-options "-Xmx20G -Djava.io.tmpdir=./" HaplotypeCaller -ERC GVCF -R hg38.fa -I Control_recal.bam --dbsnp dbsnp_146.hg38.vcf.gz -O Control_g.vcf
I used the same command for the treatment sample, changing only the file prefix.
Does anyone know how to fix this problem? Thanks.
Hello helen,
The problem is caused by a malformed or unusual BAM file. How was it created?
fin swimmer
Hi fin,
I used STAR in 2-pass mode for alignment; this step generates the SAM files.
Then I used Picard to add read groups, convert SAM to BAM and sort, and then mark duplicates.
Then I used GATK to split 'N' reads (SplitNCigarReads), run base quality score recalibration, apply BQSR, and call variants.
I am not sure at which step the malformed BAM file was created, though.
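To narrow down which step first produced such reads, one option is to count, in each intermediate file, the alignments whose CIGAR (column 6 of a SAM record) ends with D. A minimal sketch, assuming samtools is on the PATH; the function name and the VIEW_CMD override are hypothetical conveniences, not part of any tool:

```shell
#!/bin/sh
# count_trailing_deletions (hypothetical helper): for each SAM/BAM file given,
# print the file name and how many alignments have a CIGAR ending in "D".
# VIEW_CMD defaults to "samtools view", which omits header lines by default;
# set VIEW_CMD=cat to scan plain-text SAM without samtools.
count_trailing_deletions() {
  for f in "$@"; do
    ${VIEW_CMD:-samtools view} "$f" \
      | awk -F '\t' -v name="$f" '
          !/^@/ && $6 ~ /D$/ { n++ }   # column 6 is the CIGAR
          END { printf "%s\t%d\n", name, n }'
  done
}
```

For example: count_trailing_deletions sample_aligned.sam sample_dedup.bam sample_split.bam Control_recal.bam, where every name except Control_recal.bam is a placeholder for the files the pipeline actually wrote. The first file with a non-zero count would point to the step that introduced these reads.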
Please post on the GATK forum.