Hello, I ran MarkDuplicates on my STAR output and it reported some optical duplicates. That got me reading, and while there's a lot of discussion, it's difficult to sift out info on RNASeq and optical dups specifically.
In RNASeq we generally want to keep duplicates, whereas optical ones are a technical artifact. Still, I got the feeling that deduplication of any kind usually isn't done at all for RNASeq. I suppose the main deciding factor is how accurately we can tell, with the tools at our disposal, whether a given duplicate is a technical artifact rather than a real duplication.
Are my views and impressions correct? Is there a current consensus/best practice opinion on this?
Generally these are relevant only if your data was run on patterned flowcells (which may be the norm of late). As long as the loading was properly optimized, the occurrence of optical dups should be minimal: https://knowledge.illumina.com/instrumentation/novaseq-x-x-plus/instrumentation-novaseq-x-x-plus-reference_material-list/000008911
clumpify.sh will allow you to identify optical duplicates and remove them --> Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files.
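To make concrete what "optical" means here, below is a rough Python sketch of the underlying idea: two reads that are already considered duplicates are flagged as optical if they come from the same flowcell, lane and tile and sit within some pixel distance of each other. It assumes the standard Illumina read-name layout (instrument:run:flowcell:lane:tile:x:y). The function names and the default max_dist value are purely illustrative, and this is not how any particular tool implements the check, so tune the distance to your instrument.

```python
from typing import NamedTuple


class ReadPos(NamedTuple):
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> ReadPos:
    """Extract flowcell position from an Illumina-style read name.

    Assumption: the name follows instrument:run:flowcell:lane:tile:x:y,
    optionally followed by a space-separated comment or a /1, /2 suffix.
    """
    core = name.lstrip("@").split()[0]
    fields = core.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name layout: {name!r}")
    y = fields[6].split("/")[0]  # drop a trailing /1 or /2 if present
    return ReadPos(fields[2], int(fields[3]), int(fields[4]), int(fields[5]), int(y))


def looks_optical(name_a: str, name_b: str, max_dist: int = 100) -> bool:
    """Return True if two duplicate reads are close enough to be optical.

    max_dist is a hypothetical threshold in flowcell coordinate units;
    an appropriate value depends on the sequencer.
    """
    a, b = parse_read_name(name_a), parse_read_name(name_b)
    # Reads on different flowcells, lanes or tiles are not compared here.
    if (a.flowcell, a.lane, a.tile) != (b.flowcell, b.lane, b.tile):
        return False
    # Simple per-axis distance check.
    return abs(a.x - b.x) <= max_dist and abs(a.y - b.y) <= max_dist
```

This only makes sense for read pairs that a duplicate-marking step has already grouped together (same sequence or same alignment position); proximity on the flowcell alone says nothing about duplication.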