Estimate Library Complexity Output Histogram - what do duplicate sets mean exactly?
5.8 years ago
a.rex ▴ 350

I ran this command on my BAM file (which has already been through MARKDUP):

java -jar picard.jar EstimateLibraryComplexity INPUT=lib.bam OUTPUT=sample_libcomp.txt

I get this output:

## METRICS CLASS        picard.sam.DuplicationMetrics
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED     SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS  UNPAIRED_READ_DUPLICATES        READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION     ESTIMATED_LIBRARY_SIZE
Unknown 0       15424936        0       0       0       13638   2       0.000884        8719138504

## HISTOGRAM    java.lang.Integer
duplication_group_count Unknown
1       15398401
2       12287
3       514
4       71
5       17
6       6
7       2
8       1
9       1

Does this mean that 15398401 reads have no duplicates (i.e. 15398401 reads appear exactly once)? From what I have read on other blogs, the first column of the histogram represents duplicate sets. Is that correct, and what exactly is a duplicate set?
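One way to check this reading against the numbers above is a small sketch. It assumes the first histogram column is the size of a duplicate set (how many read pairs share the same sequence) and the second column is how many sets of that size were seen. It also assumes that bins with fewer sets than Picard's `MIN_GROUP_COUNT` option (taken here to be 2, its documented default as far as I know) are left out of the summary metrics. The function name `summarize` is made up for illustration.

```python
# Histogram rows copied from the output above: {set size: number of sets}.
histogram = {1: 15398401, 2: 12287, 3: 514, 4: 71, 5: 17, 6: 6, 7: 2, 8: 1, 9: 1}

# Assumption: bins with fewer sets than this are ignored in the metrics line.
MIN_GROUP_COUNT = 2


def summarize(hist, min_group_count):
    """Return (pairs examined, duplicate pairs, duplicate fraction)."""
    # Keep only the bins that contain enough sets to be counted.
    kept = {size: n for size, n in hist.items() if n >= min_group_count}
    # Every set of a given size contributes `size` read pairs in total...
    pairs_examined = sum(size * n for size, n in kept.items())
    # ...of which all but one are counted as duplicates.
    pair_duplicates = sum((size - 1) * n for size, n in kept.items())
    return pairs_examined, pair_duplicates, pair_duplicates / pairs_examined


pairs_examined, pair_duplicates, fraction = summarize(histogram, MIN_GROUP_COUNT)
print(pairs_examined, pair_duplicates, round(fraction, 6))
# prints: 15424936 13638 0.000884
```

Under those assumptions the totals match READ_PAIRS_EXAMINED, READ_PAIR_DUPLICATES and PERCENT_DUPLICATION in the metrics line, which would be consistent with 15398401 read pairs (not single reads) being seen exactly once. With `min_group_count=1` the size-8 and size-9 sets are included and the totals no longer match.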

picard alignment • 1.2k views