Dear all, I am running picard MarkDuplicates but getting the error "Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file:...", and it does not produce any output files. I tried picard's SortSam with coordinate sorting, which did not help. Then I added an EOF marker to the .bam file with a home-made script (it worked before on other data), but MarkDuplicates still fails with the same error message. I believe my .bam files are OK, because I can run my downstream analysis with no problems and the results seem reasonable. There is just one step I cannot get past: this duplicate marking... Is there any way to make picard ignore the fact that the .bam files are truncated? :)
(The command I ran: picard MarkDuplicates I=in.bam O=out.bam M=metrics.txt ASSUME_SORTED=true REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT. I tried VALIDATION_STRINGENCY=SILENT, too.)
I would be very glad of any help! :)
Your BAM is really incomplete/corrupted. You will lose data even if you manage to bypass picard's sanity check. You need to re-download or re-process it.
Could it be that it was produced by an older software version, and that is why it has no EOF? Or is EOF not the problem after all?
The EOF marker may or may not be present, but either way the end of the file was reached where more data was expected (meaning the BAM file ends prematurely, owing to some corruption).
Thank you! Is there a way to find out what is wrong, then? All the files are of the size I would expect, and when I calculate read depth from them it looks good and normal. And I have 150 of these files... so I really, really don't want to trash them and start everything from the beginning :)
Run
samtools view -H your.bam > /dev/null
on all your BAMs. If you see "EOF marker is absent" for all of them, they must have been produced by ancient tools, which would really surprise me nowadays, since the EOF marker was added over 5 years ago. If some of them yield the warning but others do not, those are corrupted files that you should fix.
Thank you for your advice! I downloaded the original .bam files from a cancer database; they are fairly old and could in theory have been produced by an old software version. I ran 'bedtools intersect' on these files to produce new "intersected" .bam files. Does that mean my new .bam files will have the EOF marker anyway, because I am using a new software version? Or might it depend on the original .bam files?
There is a difference between the EOF marker not being present and a premature EOF. In the latter case the marker was probably present, but the end of the file came where it was not expected, which means the BAM may be corrupted.
You could try converting BAM to SAM and then SAM back to BAM, to see whether that resolves the problem.
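To tell the two cases above apart across many files without reading samtools warnings one by one, a minimal Python sketch could compare the last 28 bytes of each file with the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The function name `has_bgzf_eof` is made up for this illustration; it is not part of samtools or picard.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification
# defines as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker.

    Hypothetical helper for illustration only.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Jump to 28 bytes before the end and read the tail.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF
```

A file that passes this check was at least closed properly by the tool that wrote it, but it can still be damaged somewhere in the middle, so this only narrows things down. A file that already has the marker and still triggers "Premature end of file" is more likely corrupted than merely old.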
Well, I have now tried converting my BAM to SAM and then back to BAM, and that way MarkDuplicates worked. I guess I have to repeat this elaborate procedure for each of the files... sad :)
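With 150 files, the round trip described above could be scripted rather than repeated by hand. The following is only a sketch, assuming samtools is on the PATH; the function names, the output directory and the `.rewritten.bam` suffix are invented for this example. It uses `samtools view -h` to write SAM with the header and `samtools view -b` to write BAM again.

```python
import subprocess
from pathlib import Path


def roundtrip_commands(bam, out_dir):
    """Build the two samtools commands for a BAM -> SAM -> BAM round trip.

    Hypothetical helper; the output file names are assumptions.
    """
    bam = Path(bam)
    out_dir = Path(out_dir)
    sam = out_dir / (bam.stem + ".sam")
    rewritten = out_dir / (bam.stem + ".rewritten.bam")
    return [
        # -h keeps the header in the SAM output.
        ["samtools", "view", "-h", "-o", str(sam), str(bam)],
        # -b writes BAM, which is closed with a fresh EOF marker.
        ["samtools", "view", "-b", "-o", str(rewritten), str(sam)],
    ]


def roundtrip(bam, out_dir):
    """Run both steps, stopping if samtools reports a failure."""
    for cmd in roundtrip_commands(bam, out_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


# Example use (not executed here):
#     for bam in sorted(Path("bams").glob("*.bam")):
#         roundtrip(bam, "rewritten")
```

Because of `check=True`, a file on which samtools exits with an error is reported instead of being silently rewritten. Keep in mind the earlier warning in this thread: if a file really is truncated, the rewritten BAM can only contain the reads that were still readable.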