I am calling SNPs and indels using the pipeline described in this post: http://www.biostars.org/post/show/1268/what-is-the-best-pipeline-for-human-whole-exome-sequencing/ In the realignment step, my command looks like this:
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R GRCh37.c.fasta -I ../s_4.sort.rg.bam -o ../s_4.intervals
It usually fails to generate an intervals file (although it sometimes succeeds, and I do not know why). The error message is:
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: SAM/BAM file ../s_4/s_4.sort.rg.bam is malformed: Premature EOF; BinaryCodec in readmode; streamed file (filename not available)
I produced s_4.sort.rg.bam by sorting with samtools sort and then running Picard's AddOrReplaceReadGroups.jar.
Furthermore, if I view s_4.sort.rg.bam with samtools, I get this warning:
samtools view s_4.sort.rg.bam | less
[bam_header_read] EOF marker is absent. The input is probably truncated.
So how should I deal with this problem?
But my raw BAM files are also truncated. How can I deal with that? By the way, why can I run -T UnifiedGenotyper but not -T RealignerTargetCreator?
The answer is the same - remake the BAM (perhaps from the raw reads). 9 times out of 10, a truncated file means something went wrong upstream, and you shouldn't continue your analysis with the defective file.
I don't know why GATK is pickier on one command than another.
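To find out at which step the truncation first appears, one option is to check each intermediate BAM for the 28-byte BGZF end-of-file block that the samtools warning refers to. The sketch below is only an illustration: has_bam_eof is a hypothetical helper name (not part of samtools, Picard or GATK), and the marker bytes are taken from the SAM/BAM specification. A file that passes can still be damaged elsewhere, so treat this as a quick first check rather than a full validation.

```shell
#!/bin/sh
# Hex form of the 28-byte empty BGZF block that ends every complete BAM file
# (bytes as given in the SAM/BAM specification; spaces removed for comparison).
BGZF_EOF=$(printf '%s' "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000" | tr -d ' ')

# has_bam_eof is a hypothetical helper: it succeeds (exit 0) only if the
# last 28 bytes of the given file match the BGZF EOF marker.
has_bam_eof() {
    # Hex-dump the final 28 bytes and strip spaces/newlines into one string.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Report on every file passed on the command line.
for bam in "$@"; do
    if has_bam_eof "$bam"; then
        echo "$bam: EOF marker present"
    else
        echo "$bam: EOF marker missing, file is probably truncated"
    fi
done
```

Saved under an assumed name such as check_bam_eof.sh, it could be run as `sh check_bam_eof.sh s_4.bam s_4.sort.bam s_4.sort.rg.bam` (the first two file names are placeholders for whatever the intermediate files are called). The first file reported as missing the marker points at the step to redo. Newer samtools releases also provide `samtools quickcheck`, which performs a similar test.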