Estimate Library Complexity Output Histogram - what do duplicate sets mean exactly?

5.8 years ago

a.rex ▴ 350

I ran this command on my BAM file (it has already been through MARKDUP):

java -jar picard.jar EstimateLibraryComplexity INPUT=lib.bam OUTPUT=sample_libcomp.txt

I get this output:

## METRICS CLASS        picard.sam.DuplicationMetrics
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED     SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS  UNPAIRED_READ_DUPLICATES        READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION     ESTIMATED_LIBRARY_SIZE
Unknown 0       15424936        0       0       0       13638   2       0.000884        8719138504

## HISTOGRAM    java.lang.Integer
duplication_group_count Unknown
1       15398401
2       12287
3       514
4       71
5       17
6       6
7       2
8       1
9       1

Does this mean that I have 15398401 reads with no duplicates (i.e. 15398401 that appear exactly once)? From what I have read on other blogs, the first column of the histogram represents duplicate sets. Is that right?
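One way to sanity-check that reading is to redo the arithmetic from the histogram. This is a minimal sketch, assuming the first column is the size of a duplicate set and the second column is the number of sets of that size, with sets counted in read pairs (the metrics line reports READ_PAIRS_EXAMINED, not reads). The `summarize` helper and its `min_group_count` parameter are hypothetical names for this sketch, not Picard's API. On these numbers, the metrics line is only reproduced when bins containing fewer than 2 sets (sizes 8 and 9 here) are left out, which looks like a minimum-group-count cutoff in the tool; treat that as an assumption to confirm against the Picard documentation.

```python
# Histogram rows copied from the output above:
# duplicate-set size -> number of sets of that size (assumed meaning).
histogram = {1: 15398401, 2: 12287, 3: 514, 4: 71, 5: 17, 6: 6, 7: 2, 8: 1, 9: 1}


def summarize(hist, min_group_count=1):
    """Return (pairs_examined, pair_duplicates) implied by the histogram.

    min_group_count is a hypothetical knob for this sketch: bins with
    fewer sets than this are ignored.
    """
    kept = {size: n for size, n in hist.items() if n >= min_group_count}
    # Every set of size k contributes k pairs in total ...
    pairs = sum(size * n for size, n in kept.items())
    # ... of which k - 1 are duplicates of the one "original" pair.
    duplicates = sum((size - 1) * n for size, n in kept.items())
    return pairs, duplicates


all_pairs, all_dups = summarize(histogram)
kept_pairs, kept_dups = summarize(histogram, min_group_count=2)

print(all_pairs, all_dups)               # 15424953 13653 (all bins)
print(kept_pairs, kept_dups)             # 15424936 13638 (matches the metrics line)
print(round(kept_dups / kept_pairs, 6))  # 0.000884 (matches PERCENT_DUPLICATION)
```

If this reading holds, the first histogram row would mean 15398401 read pairs whose sequence signature was seen exactly once, rather than 15398401 individual reads.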

picard alignment • 1.2k views
