Entering edit mode
5.9 years ago
Peter Chung
▴
210
I am new in WGS analysis. First, I combine all my bam files into one and it's 157GB and then add read group on it to 159GB. Then I do the picard markduplicate step by using the following code:
java -Xmx8g -Djava.io.tmpdir=${TMPFILE} -jar $PICARD MarkDuplicates \
INPUT=${FILE}.addRG.bam \
OUTPUT=${FILE}.addRG.mkdup.bam \
ASSUME_SORTED=true \
TMP_DIR=${TMPFILE} \
REMOVE_DUPLICATES=true \
READ_NAME_REGEX=null \
METRICS_FILE=${FILE}_Metrics.txt
It returns no error but the output file is 18Gb and there is not metrics file generated. I don't know what happened, any advice? Thanks.
The result outputs from picard MarkDuplicates, but there is no error inside.
[Fri Jan 18 08:53:43 UTC 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 111.00 minutes.
Runtime.totalMemory()=7318011904
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Alignments added out of order in
SAMFileWriterImpl.addAlignment for file:///data/data/Samples/CHS
/SRS006915/SRS006915.addRG.mkdup.bam. Sort order is coordinate. Offending records are at [*:0]
and [chrM:1]
at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:213)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:200)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:406)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
Hello Peter Chung ,
the message you are showing is an error. You can see it by the word
Exception
. A quick web search suggest, that the sorting order given in the header is different to the alignment. People who uses Picard's ReorderSam seem to have this problem.So the questions are:
bam
files?fin swimmer
oh thanks. First I used bwa to align them and then use samtools sort to sort each bam files. Afterwards, I combined all the bam files into one bam file by samtools merge. After that, I used samtools addreplacerg to add readgroup.
bwa and samtools sort
samtools merge
samtools addreplacerg
any advice? thanks.
Hmm, I cannot see any crucial thing. What version of samtools and picard are you using? Also maybe we can see something in the header of the input file for MarkDuplicate (
samtools view -H input.bam
).BTW: You can define the ReadGroup already with
bwa
. Then no extra step withsamtools addreplacerg
is neccessary.Hi, have you tried increasing the heap size? also, check the
TMP_DIR
location has more than 159 GB free space.