Empty output from Picard's EstimateLibraryComplexity
8.3 years ago
rubic ▴ 270

Hi,

I'm running Picard's EstimateLibraryComplexity on 12 BAM files that are pretty shallow (~400,000 reads per file), with no arguments other than I and O, and I'm getting no output other than the standard output messages.

Note that I do find duplicates in these data. For example, this is the standard output of Picard's MarkDuplicates for one sample:

INFO    2016-09-26 18:20:00 MarkDuplicates  Start of doWork freeMemory: 2046635632; totalMemory: 2058354688; maxMemory: 28478275584

INFO    2016-09-26 18:20:00 MarkDuplicates  Reading input file and constructing read end information.

INFO    2016-09-26 18:20:00 MarkDuplicates  Will retain up to 113009030 data points before spilling to disk.

WARNING 2016-09-26 18:20:02 AbstractDuplicateFindingAlgorithm   Default READ_NAME_REGEX '[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*' did not match read name '534371-1'.  You may need to specify a READ_NAME_REGEX in order to correctly identify optical duplicates.  Note that this message will not be emitted again even if other read names do not match the regex.

INFO    2016-09-26 18:20:16 MarkDuplicates  Read 129293 records. 0 pairs never matched.

INFO    2016-09-26 18:20:19 MarkDuplicates  After buildSortedReadEndLists freeMemory: 1967321816; totalMemory: 2884632576; maxMemory: 28478275584

INFO    2016-09-26 18:20:19 MarkDuplicates  Will retain up to 889946112 duplicate indices before spilling to disk.

INFO    2016-09-26 18:23:23 MarkDuplicates  Traversing read pair information and detecting duplicates.

INFO    2016-09-26 18:23:23 MarkDuplicates  Traversing fragment information and detecting duplicates.

INFO    2016-09-26 18:23:23 MarkDuplicates  Sorting list of duplicate records.

INFO    2016-09-26 18:23:26 MarkDuplicates  After generateDuplicateIndexes freeMemory: 3237064784; totalMemory: 10367795200; maxMemory: 28478275584

INFO    2016-09-26 18:23:26 MarkDuplicates  Marking 91622 records as duplicates.

INFO    2016-09-26 18:23:26 MarkDuplicates  Found 0 optical duplicate clusters.

INFO    2016-09-26 18:23:36 MarkDuplicates  Before output close freeMemory: 10352402680; totalMemory: 10367795200; maxMemory: 28478275584

INFO    2016-09-26 18:23:37 MarkDuplicates  After output close freeMemory: 10352475912; totalMemory: 10367795200; maxMemory: 28478275584

But then this is the standard output of EstimateLibraryComplexity for the same sample:

INFO    2016-09-26 18:23:38 EstimateLibraryComplexity   Will store 46230966 read pairs in memory before sorting.

INFO    2016-09-26 18:23:46 EstimateLibraryComplexity   Finished reading - moving on to scanning for duplicates.

[Mon Sep 26 18:23:46 EDT 2016] picard.sam.EstimateLibraryComplexity done. Elapsed time: 0.12 minutes.

Has anyone ever experienced this?

RNA-Seq picard EstimateLibraryComplexity • 3.6k views
8.3 years ago

Although unlikely, I suppose it's a formal possibility at low coverage that none of the reads are duplicated. If so, then library complexity cannot be estimated (it's based on the degree of duplication). You can check by running MarkDuplicates to see if any are present.
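To illustrate why the estimate depends on duplication: Picard's documentation describes its library-size estimate in terms of the Lander-Waterman equation, C/X = 1 - exp(-N/X), where N is the number of read pairs examined, C is the number of distinct pairs observed, and X is the unknown number of distinct molecules in the library. The sketch below is not Picard's code; the function name and the bisection solver are my own choices for illustration. It shows that when C equals N (no duplicates) there is no finite solution for X.

```python
import math


def estimate_library_size(read_pairs, unique_pairs):
    """Solve C/X = 1 - exp(-N/X) for X by bisection (illustrative sketch).

    Returns None when no finite estimate exists, for example when
    no duplicates were observed (unique_pairs == read_pairs).
    """
    n, c = read_pairs, unique_pairs
    if c <= 0 or c >= n:
        return None

    def g(x):
        # Expected number of distinct pairs drawn from a library of
        # size x, minus the number actually observed.
        return x * -math.expm1(-n / x) - c

    lo = float(c)      # g(lo) is always negative
    hi = 2.0 * c
    while g(hi) < 0:   # expand until the root is bracketed
        hi *= 2.0

    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With hypothetical counts, `estimate_library_size(1000, 800)` returns a finite value larger than 800, while `estimate_library_size(1000, 1000)` returns `None`.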


Could be a PCR-free library prep?


It's a selection for short RNAs (miRs), but MarkDuplicates reports 91622 records as duplicates.


Have the data been trimmed to a length typical for miRs (~30bp)? EstimateLibraryComplexity matches the first 50bp to identify duplicates. It may not work if the read lengths are shorter (although I don't know for sure).
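One way to check the read-length question is to tabulate sequence lengths from the alignments. The sketch below assumes SAM-format text lines as input (for instance the output of `samtools view`); the function names are hypothetical, and the 50bp threshold is simply the figure mentioned above, not something this code verifies about Picard's behaviour.

```python
from collections import Counter


def read_length_counts(sam_lines):
    """Count reads by sequence length from SAM-format text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9]  # SEQ is the 10th mandatory SAM column
        if seq == "*":
            continue  # sequence not stored for this record
        counts[len(seq)] += 1
    return counts


def fraction_shorter_than(counts, min_length=50):
    """Fraction of counted reads whose length is below min_length."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length < min_length)
    return short / total
```

If nearly all reads fall below the threshold, that would be consistent with the trimming explanation, though it would not prove it.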

But it's unclear why you need this metric, since MarkDuplicates indicates that you're near saturation: roughly 71% (91622/129293) of the records are duplicates.
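The duplication fraction can be recomputed directly from the two counts in the MarkDuplicates log ("Read 129293 records" and "Marking 91622 records as duplicates"). A minimal sketch, with a hypothetical helper name:

```python
def duplicate_fraction(duplicate_records, total_records):
    """Fraction of records that were marked as duplicates."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return duplicate_records / total_records


# Counts taken from the MarkDuplicates log quoted in the question.
fraction = duplicate_fraction(91622, 129293)
print(f"{fraction:.1%} of records marked as duplicates")
```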
