Our recent attempt to obtain Illumina HiSeq mate-pair reads was plagued by high read duplication (8 kb insert, >95% duplicates). The service providers told us that this is common for 8 kb mate-pair libraries, and that: 1) mate-pair libraries with smaller insert sizes (e.g. 1 kb, 2 kb, 3 kb, as opposed to 8 kb, 10 kb, 20 kb) produce fewer duplicates; 2) using Illumina's Nextera kit would reduce the read duplication level.
So I was excited when, two days ago, we received new lanes of mate-pair reads from 3 kb, 5 kb and 8 kb insert libraries prepared with the Nextera kit. Ridiculously, all of them have a read duplication level above 90%! Needless to say, that leaves them of very little use for de novo genome assembly, scaffolding, etc.
Can anyone please point me to published documents or statistics, or share lab experience, on mate-pair libraries and duplication levels? Also, is there any consensus on what an acceptable level of duplication is?