I am trying to understand the metrics file generated by MarkDuplicates, and in particular the histogram it contains.
The histogram from my run looks like this:
The only description of it I have found is this:
“MarkDuplicates estimates the return on investment for sequencing a library to higher coverage than the observed coverage. The first column is the coverage multiple, and the second column is the multiple of additional actual coverage for the given coverage multiple. The first row (1x, i.e. the actual amount of sequencing done) should have ROI of approximately 1. The next row estimates the ROI for twice as much sequencing of the library. As one increases the amount of sequencing for a library, the ROI for additional sequencing diminishes because more and more of the reads are duplicates.”
My first question is: from what data does the program generate this histogram? Second, in my data the first row is indeed 1, but the ROI for additional sequencing does not diminish. Third, what exactly does "return on investment" mean here? The accuracy of the contig?
Finally, is what this histogram really tells me the coverage multiple (in my case 20-something) beyond which further sequencing no longer improves my ROI (whatever that is)?
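To make the question concrete, here is a minimal sketch of how a curve like the one described in the quote could be computed. It assumes a Lander-Waterman style model in which the library holds a fixed number of distinct molecules sampled at random, so the only inputs are the number of read pairs examined and the number of those that are unique. The function names and the example counts are hypothetical, and this is not confirmed to be what MarkDuplicates actually does.

```python
import math


def estimate_library_size(read_pairs, unique_read_pairs):
    """Estimate the number of distinct molecules L in the library.

    Assumption (not taken from the tool's documentation quoted above):
    sampling is uniform, so unique = L * (1 - exp(-read_pairs / L)).
    The equation is solved for L by bisection.
    """
    n, c = read_pairs, unique_read_pairs
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs")

    def f(x):
        # Positive while x is below the true library size, negative above it.
        return c / x - 1 + math.exp(-n / x)

    lo, hi = float(c), 100.0 * c  # the library cannot be smaller than c
    while f(hi) > 0:              # widen the bracket until it contains the root
        hi *= 10
    for _ in range(100):          # plain bisection
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def estimate_roi(library_size, multiple, read_pairs, unique_read_pairs):
    """Unique pairs expected at `multiple` times the sequencing done,
    expressed relative to the unique pairs observed now."""
    expected_unique = library_size * (
        1 - math.exp(-multiple * read_pairs / library_size)
    )
    return expected_unique / unique_read_pairs


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    pairs, unique = 1_000_000, 800_000
    size = estimate_library_size(pairs, unique)
    for x in range(1, 11):
        print(x, round(estimate_roi(size, x, pairs, unique), 3))
```

Under this model the second column keeps rising with the coverage multiple but by a smaller amount at each step, and it levels off near the ratio of library size to unique pairs observed. If that is the right reading, "diminishing" would refer to the gain per additional multiple rather than to the value in the column itself.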