Question

What is the stance on optical duplicates in RNASeq?

4 months ago

Davor • 0

Hello, I ran a MarkDuplicates analysis on my STAR output and there are some optical duplicates, which got me reading, and, while there's a lot of discussion, it's difficult to sift out info on RNASeq and optical dups, specifically.

In RNASeq, duplicates are part of the signal we're interested in, whereas optical ones are a technical artifact. Even so, I got the feeling that deduplication in any form usually isn't done at all for RNASeq. I suppose the main deciding factor is how accurately we can say, with the tools available to us, whether a given duplicate is a technical artifact rather than a real duplication.

Are my views and impressions correct? Is there a current consensus/best practice opinion on this?

duplicates rnaseq optical-duplicates • 531 views

ADD COMMENT • link 4 months ago by Davor • 0

but optical ones are a technical artifact

Generally these are relevant only if your data was run on patterned flowcells (which may be the norm of late). As long as the loading was properly optimized, the occurrence of optical dups should be minimal: https://knowledge.illumina.com/instrumentation/novaseq-x-x-plus/instrumentation-novaseq-x-x-plus-reference_material-list/000008911

clumpify.sh (from the BBTools suite) will allow you to identify optical duplicates, and to remove them; see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files".
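To make concrete what "optical" means here, below is a minimal Python sketch of the idea, not the actual implementation of clumpify.sh or Picard: two reads that are already known to be duplicates are treated as optical if they come from the same lane and tile and their cluster coordinates are close together. It assumes Illumina-style read names (instrument:run:flowcell:lane:tile:x:y); the names `ClusterPos`, `parse_read_name` and `is_optical_pair` are hypothetical, the read names are made up, and comparing x and y separately is a simplification. The 2500-pixel default follows Picard's documented suggestion for patterned flowcells (its own default for OPTICAL_DUPLICATE_PIXEL_DISTANCE is 100).

```python
from typing import NamedTuple


class ClusterPos(NamedTuple):
    lane: int
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> ClusterPos:
    """Extract lane, tile, x and y from an Illumina-style read name."""
    # Drop a leading '@' and the comment field after the first space.
    token = name.lstrip("@").partition(" ")[0]
    fields = token.split(":")
    if len(fields) < 7:
        raise ValueError(f"not an Illumina-style read name: {name!r}")
    lane, tile, x, y = (int(v) for v in fields[3:7])
    return ClusterPos(lane, tile, x, y)


def is_optical_pair(name_a: str, name_b: str, pixel_distance: int = 2500) -> bool:
    """True if two duplicate reads sit close together on the same tile.

    Assumption: the caller has already established that the two reads are
    duplicates (same sequence or same alignment); this only checks position.
    """
    a, b = parse_read_name(name_a), parse_read_name(name_b)
    return (
        a.lane == b.lane
        and a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )


# Made-up read names for illustration only.
r1 = "@A00123:45:HXXXXDSXX:1:1101:1000:2000 1:N:0:ACGT"
r2 = "@A00123:45:HXXXXDSXX:1:1101:1500:2100 1:N:0:ACGT"
r3 = "@A00123:45:HXXXXDSXX:1:2205:1500:2100 1:N:0:ACGT"

print(is_optical_pair(r1, r2))                      # same tile, 500 px apart in x
print(is_optical_pair(r1, r3))                      # different tile
print(is_optical_pair(r1, r2, pixel_distance=100))  # stricter threshold
```

The threshold matters: with a small distance, the same pair of reads is no longer counted as optical, which is why the setting should match the flowcell type.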

ADD REPLY • link 4 months ago by GenoMax 152k

Thanks, I attempted a Picard MarkDuplicates analysis so far and it did identify some - I posted the report in this post. The machine used did indeed use patterned flowcells. I'll try Clumpify too. Would it make sense to remove them in this specific case?
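One way to put the optical count in context is to compare it with the total number of duplicates in the MarkDuplicates metrics file. The following is a rough Python sketch under a few assumptions: the report has the usual layout (a "## METRICS CLASS" line followed by a tab-separated table), and it contains the READ_PAIR_DUPLICATES and READ_PAIR_OPTICAL_DUPLICATES columns from Picard's DuplicationMetrics. The helper names are hypothetical, and the example report is trimmed and uses made-up numbers, not the ones from my run.

```python
import csv
import io


def read_duplication_metrics(text: str) -> list[dict[str, str]]:
    """Return one dict per library from a MarkDuplicates metrics report."""
    lines = text.splitlines()
    marker = next(
        (i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS")),
        None,
    )
    if marker is None:
        raise ValueError("no '## METRICS CLASS' line found")
    # The table (header + rows) runs from the line after the marker
    # up to the next blank line.
    table = []
    for line in lines[marker + 1:]:
        if not line.strip():
            break
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def optical_fraction(row: dict[str, str]) -> float:
    """Fraction of duplicate read pairs that were flagged as optical."""
    dup_pairs = int(row["READ_PAIR_DUPLICATES"])
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    return optical / dup_pairs if dup_pairs else 0.0


# Trimmed, made-up report for illustration only.
example_report = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "\t".join(["LIBRARY", "READ_PAIRS_EXAMINED",
               "READ_PAIR_DUPLICATES", "READ_PAIR_OPTICAL_DUPLICATES"]),
    "\t".join(["lib1", "1000000", "400000", "8000"]),
    "",
    "## HISTOGRAM\tjava.lang.Double",
])

rows = read_duplication_metrics(example_report)
for row in rows:
    print(row["LIBRARY"], f"{optical_fraction(row):.1%} of duplicate pairs are optical")
```

If the optical share of duplicates is small, removing them would change the counts very little either way.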

ADD REPLY • link 4 months ago by Davor • 0