Hi All- Picard/markDuplicates sometimes produces the histogram of sequencing vs gain and sometimes doesn't. Does anybody know how/why markDuplicates decides this?
(I haven't checked this carefully and I haven't looked at the code... apologies if I missed it in the docs.)
Edit: Java command:
java -jar -Xmx2g ~/bin/MarkDuplicates.jar I=input.bam O=/dev/null M=md.txt AS=true VALIDATION_STRINGENCY=SILENT java -jar ~/bin/MarkDuplicates.jar --version 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010)
Edit2: Examples of a metrics file with histogram (I stripped some rows) and one without
cat grm030_pb_BS.mm9_pb11.md.txt ## htsjdk.samtools.metrics.StringHeader # picard.sam.MarkDuplicates INPUT=[/lustre/sblab/berald01/repository/bwameth/grm030_pb_BS.mm9_pb11.bam] OUTPUT=/dev/null METRICS_FILE=grm030_pb_BS.mm9_pb11.md.txt ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false ## htsjdk.samtools.metrics.StringHeader # Started on: Fri Aug 01 11:23:16 BST 2014 ## METRICS CLASS picard.sam.DuplicationMetrics LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE 252563 3095840 1110601 38000 182731 156031 0.062608 160862710 ## HISTOGRAM java.lang.Double BIN VALUE 1.0 1.052566 2.0 2.085069 3.0 3.097891 4.0 4.091408 ... 98.0 46.844894 99.0 47.00454 100.0 47.161142
This file has no histogram:
cat grm035_REBUiLT_AD06.clean.pb11.md.txt ## htsjdk.samtools.metrics.StringHeader # picard.sam.MarkDuplicates INPUT=[/lustre/sblab/berald01/repository/bwameth/grm035_REBUiLT_AD06.clean.pb11.bam] OUTPUT=/dev/null METRICS_FILE=grm035_REBUiLT_AD06.clean.pb11.md.txt ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false ## htsjdk.samtools.metrics.StringHeader # Started on: Fri Aug 01 11:23:15 BST 2014 ## METRICS CLASS picard.sam.DuplicationMetrics LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE 46913 7889832 0 18667 74972 49840 0.010654 1220238098 Unknown Library 135291 21433156 0 96225 323809 143791 0.017298 1251759451
Thanks!
what's your java command line please ?
Thanks for replying Pierre. Java command included, I thought there is nothing special there...
and md.txt is empty or not produced ?
I put a couple of example files, they probably look messy. I kind of suspect that if there are multiple read groups in the input bam file the is histogram is not produced, could it be the case?