Hi, I am working on whole exome sequencing data. The pipeline standardized in our lab is:
Align with bwa-mem --> SortSam based on coordinates with Picard --> MarkDuplicates --> Remove sequencing duplicates --> Variant calling --> BQSR --> Variant calling --> Annotate with annovar
When I started working on this (on our institutional server), I had not created a java temp directory and it was working fine, but over time I started receiving
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.io.IOException: No space left on device
Caused by: java.io.IOException: No space left on device
errors, so I created a Java temp directory. Since doing so, I have been receiving the following error:
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 1: null:A01223:29:HY2N7DMXX:2:2306:22562:23375
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:528)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
I tried to fix it by adding read groups, fixing mate-pair information, and removing duplicates with tools other than Picard, but nothing seems to help. I also tried removing the Java temp directory, but then I received
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.io.IOException: No space left on device
Caused by: java.io.IOException: No space left on device
yet again. I would appreciate a solution as soon as possible.
Thanks in advance :)
-Vinay
See also: Markduplicates: Value Was Put Into Pairinfomap More Than Once ; Picard: Value was put into PairInfoMap more than once. Even bwa enabled -M ; Problem using PicardTools MarkDuplicates: Value Was Put Into Pairinfomap More Than Once ; etc.
Pierre, I also tried the solutions in these, and none of them worked for me; the errors persisted. Having tried multiple solutions without success, I began to suspect that the Java temp directory was interfering with my analysis pipeline.
Pierre, there was also a comment here that suggested that I grep
A01223:29:HY2N7DMXX:2:2306:22562:23375
from my fastq file, and when I tried it, I found it twice in each of my two fastq files. The commenter also linked me to two other posts: Out Of Disk Space With Picard Tools ? and Picard: Value was put into PairInfoMap more than once. Even bwa enabled -M. In the second post, the observation was similar, and you had suggested removing the duplicates. Could you please advise me on how to do that?
Thank you
Edit: I could use uniq to remove duplicate entries from a regular fastq file, but my files are gzipped (.gz).
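One way to drop the repeated records without decompressing the files by hand is to stream the gzipped fastq through a short script. The following is only a minimal sketch in Python; the function name dedup_fastq_gz and the file names are made up for illustration and do not come from any tool mentioned in this thread. It assumes well-formed 4-line fastq records and that the read name is the header text up to the first whitespace, and it keeps the first record seen for each name. It holds every read name in memory, which may be heavy for a large exome, and for paired data it only keeps the two files in sync if the same records are duplicated in both.

```python
import gzip


def dedup_fastq_gz(in_path, out_path):
    """Copy a gzipped FASTQ, keeping only the first record per read name.

    Assumes 4-line records (header, sequence, '+', qualities).
    Returns the number of records that were dropped.
    """
    seen = set()   # read names already written (all kept in memory)
    dropped = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            header = fin.readline()
            if not header:           # end of file
                break
            if not header.strip():   # tolerate stray blank lines
                continue
            if not header.startswith("@"):
                raise ValueError("Unexpected FASTQ header: %r" % header)
            seq = fin.readline()
            plus = fin.readline()
            qual = fin.readline()
            # Read name: header without '@', up to the first whitespace
            name = header[1:].split()[0]
            if name in seen:
                dropped += 1
                continue
            seen.add(name)
            fout.write(header + seq + plus + qual)
    return dropped
```

It could be called once per file, for example dedup_fastq_gz("sample_R1.fastq.gz", "sample_R1.dedup.fastq.gz") and the same for R2 (hypothetical file names). Afterwards it is worth checking that both outputs contain the same number of records before re-running bwa-mem.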
What is the exact Picard command line?
Hi Pierre,
Thank you for your response. The commands used to run Picard are:
java -jar /path/to/picard/picard.jar MarkDuplicates INPUT=/path/to/bam/input.bam OUTPUT=/path/to/bam/input_marked.bam METRICS_FILE=/path/to/bam/input_metrics.txt
java -jar /path/to/picard/picard.jar MarkDuplicates REMOVE_SEQUENCING_DUPLICATES=true INPUT=/path/to/bam/input_marked.bam OUTPUT=/path/to/bam/input_deduplicated.bam METRICS_FILE=/path/to/bam/input_metrics.txt
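Regarding the "No space left on device" error: neither command above says where temporary files should go. As a rough sketch only, the Python helper below builds the first command with an explicit temp location, using the JVM property -Djava.io.tmpdir and Picard's TMP_DIR argument, and reports the free space there. The helper names (markduplicates_cmd, free_gib) and the /scratch/tmp path are hypothetical, and whether this resolves the error on a given server is an assumption that needs checking.

```python
import shlex
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 2**30


def markduplicates_cmd(picard_jar, in_bam, out_bam, metrics, tmp_dir):
    """Build a MarkDuplicates command that sends temp files to `tmp_dir`."""
    return [
        "java",
        "-Djava.io.tmpdir=" + tmp_dir,   # JVM-level temp directory
        "-jar", picard_jar,
        "MarkDuplicates",
        "INPUT=" + in_bam,
        "OUTPUT=" + out_bam,
        "METRICS_FILE=" + metrics,
        "TMP_DIR=" + tmp_dir,            # Picard-level temp directory
    ]


# Example with placeholder paths; only prints the command, does not run it.
cmd = markduplicates_cmd(
    "/path/to/picard/picard.jar",
    "/path/to/bam/input.bam",
    "/path/to/bam/input_marked.bam",
    "/path/to/bam/input_metrics.txt",
    "/scratch/tmp",  # hypothetical directory with enough free space
)
print(shlex.join(cmd))
```

Calling free_gib on the chosen temp directory before launching the job gives a quick idea of whether it has room for the intermediate files. The list returned by markduplicates_cmd can be passed to subprocess.run once the paths are real.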