Question

MarkDuplicates RNASeq: A few samples look weird. What could be the cause?


Davor, 2 days ago

Hello, I ran Picard MarkDuplicates on the STAR alignments of my rat brain-region samples, and a few samples look pretty weird. I could simply exclude them, but I'd like to know what could have gone wrong at any step of the process from a technical side, so that I understand the process itself better. Here's the output:

[Figures: Picard MarkDuplicates - Counts; Picard MarkDuplicates - Percentages]

Specifically, the last sample and the one above HPC_C1. For each of these two samples I have uploaded the Falco FastQC HTML reports for both reads, plus the MultiQC STAR and MultiQC idxstats HTML reports, as well as the full MultiQC STAR report for all my samples, to our University Computing Centre's NextCloud, available here.
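To look at the two odd samples more closely than the MultiQC plots allow, one option is to read the metrics file MarkDuplicates writes (the file given with `METRICS_FILE`/`M=`) and separate optical from other duplicates. Below is a minimal Python sketch, assuming the standard tab-separated `picard.sam.DuplicationMetrics` layout; the example file contents are made up for illustration. The duplicate fraction uses the same definition as Picard's `PERCENT_DUPLICATION` column (which, despite the name, is a fraction), and the optical count is only meaningful if the read names still carry flow-cell coordinates.

```python
# Minimal sketch: summarise a Picard MarkDuplicates metrics file.
# Assumes the standard tab-separated picard.sam.DuplicationMetrics layout.

# Made-up example contents (numbers are illustrative, not from real data)
EXAMPLE_METRICS = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# MarkDuplicates INPUT=[sample.bam] OUTPUT=marked.bam METRICS_FILE=sample.metrics.txt\n"
    "\n"
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics\n"
    "LIBRARY\tUNPAIRED_READS_EXAMINED\tREAD_PAIRS_EXAMINED\tSECONDARY_OR_SUPPLEMENTARY_RDS\t"
    "UNMAPPED_READS\tUNPAIRED_READ_DUPLICATES\tREAD_PAIR_DUPLICATES\t"
    "READ_PAIR_OPTICAL_DUPLICATES\tPERCENT_DUPLICATION\tESTIMATED_LIBRARY_SIZE\n"
    "lib1\t1000\t10000\t0\t500\t200\t6000\t300\t0.580952\t\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Double\n"
)


def parse_duplication_metrics(text):
    """Return the METRICS table rows as dicts mapping column name -> string value."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS"))
    header = lines[start + 1].split("\t")
    rows = []
    for line in lines[start + 2:]:
        if not line.strip():  # a blank line ends the metrics table
            break
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def summarize(row):
    """Return (duplicate fraction, share of duplicate pairs flagged as optical)."""
    unpaired = int(row["UNPAIRED_READS_EXAMINED"])
    pairs = int(row["READ_PAIRS_EXAMINED"])
    unpaired_dups = int(row["UNPAIRED_READ_DUPLICATES"])
    pair_dups = int(row["READ_PAIR_DUPLICATES"])  # includes optical duplicates
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    # Same definition as Picard's PERCENT_DUPLICATION (a fraction, not a percentage)
    dup_fraction = (unpaired_dups + 2 * pair_dups) / (unpaired + 2 * pairs)
    optical_share = optical / pair_dups if pair_dups else 0.0
    return dup_fraction, optical_share


if __name__ == "__main__":
    for row in parse_duplication_metrics(EXAMPLE_METRICS):
        dup, optical = summarize(row)
        print(f"{row['LIBRARY']}: duplicates {dup:.3f}, optical share {optical:.3f}")
```

On the made-up example this prints `lib1: duplicates 0.581, optical share 0.050`. Roughly speaking, a large optical share would point towards the flow cell or loading, while high duplication with few optical duplicates fits duplication that already existed in the library (e.g. from PCR).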

Incidentally, both of these were run across multiple lanes. For such samples I concatenated all their FASTQ files (separately for read 1 and read 2) before doing anything else, and the other samples handled this way seem to be okay (the ones whose short name ends with _LA, like the first one). I don't currently have much more information on the sequencing setup, apart from it being Illumina SBS. I can ask the facility, but I'm not even sure what information I need, so any requests for extra information are welcome.
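When concatenating multi-lane paired-end data, the key requirement is that read 1 and read 2 are joined in the same lane order so mates stay paired, and that no lane file is included twice (which would make every read from that file look like a duplicate). A Python sketch, assuming the common Illumina naming `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz` (adjust the hypothetical `FASTQ_NAME` regex to your facility's file names); it relies on concatenated gzip files still forming a valid gzip stream, which is also why plain `cat` works.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme (common bcl2fastq / BCL Convert output); adjust if yours differs.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_lanes(paths):
    """Map (sample, read) -> [(lane, path), ...] sorted by lane number."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        match = FASTQ_NAME.match(path.name)
        if match is None:
            raise ValueError(f"unexpected file name: {path.name}")
        groups[(match["sample"], match["read"])].append((int(match["lane"]), path))
    for key, items in groups.items():
        lanes = [lane for lane, _ in items]
        # A lane listed twice would make every read from it look like a duplicate
        if len(set(lanes)) != len(lanes):
            raise ValueError(f"{key}: lane listed more than once")
        items.sort()
    return dict(groups)


def concat_lanes(paths, out_dir):
    """Write <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, joining lanes in the same order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    groups = group_lanes(paths)
    outputs = {}
    for (sample, read), items in sorted(groups.items()):
        mate = groups.get((sample, "2" if read == "1" else "1"), [])
        # R1 and R2 must cover the same lanes, otherwise mates fall out of sync
        if [lane for lane, _ in items] != [lane for lane, _ in mate]:
            raise ValueError(f"{sample}: R1 and R2 cover different lanes")
        out_path = out_dir / f"{sample}_R{read}.fastq.gz"
        with open(out_path, "wb") as dst:
            for _, src in items:
                with open(src, "rb") as fh:
                    # Concatenated gzip members still form a valid gzip file
                    shutil.copyfileobj(fh, dst)
        outputs[(sample, read)] = out_path
    return outputs
```

For example, `concat_lanes(Path("raw").glob("*.fastq.gz"), "merged")` would write one R1 and one R2 file per sample into `merged/` (both directory names are hypothetical).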

Is this different data from what we were discussing in your last two questions? A simple explanation may be that the quality of the starting RNA was poor for those samples, so extra amplification may have been needed to get enough sequenceable material, which led to a large number of sequence duplicates.

Reply by GenoMax, 2 days ago
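The over-amplification explanation can be made concrete with the Lander-Waterman model which, as far as I know, is also what Picard uses for `ESTIMATED_LIBRARY_SIZE`: if a library holds X distinct molecules and N read pairs are sampled from it, about X(1 - e^(-N/X)) of them are distinct. The Python sketch below is an illustrative re-implementation, not Picard's code, and the depth and library sizes in it are hypothetical. Keep in mind that for RNA-seq the uniform-sampling assumption does not hold, because highly expressed genes produce many duplicates naturally.

```python
import math


def expected_unique(n_pairs, library_size):
    """Lander-Waterman: distinct molecules expected after sampling n_pairs reads
    uniformly at random from a library of library_size distinct molecules."""
    return library_size * (1.0 - math.exp(-n_pairs / library_size))


def estimate_library_size(n_pairs, unique_pairs):
    """Invert expected_unique for the library size X by bisection."""
    if not 0 < unique_pairs < n_pairs:
        raise ValueError("need 0 < unique_pairs < n_pairs")

    def f(x):
        # > 0 while x is below the solution, < 0 above it
        return unique_pairs / x - 1.0 + math.exp(-n_pairs / x)

    lo, hi = float(unique_pairs), 100.0 * unique_pairs
    while f(hi) >= 0:  # widen the bracket until the sign flips
        hi *= 10.0
        if math.isinf(hi):
            raise ValueError("too few duplicates to estimate a library size")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    depth = 20_000_000  # read pairs sequenced (hypothetical)
    for library in (5_000_000, 50_000_000):  # distinct molecules (hypothetical)
        dup = 1.0 - expected_unique(depth, library) / depth
        print(f"library {library:,}: expected duplicate fraction {dup:.2f}")
```

Under this model, 20 million read pairs from a library of 5 million distinct molecules would be about 75% duplicates, versus about 18% for a library of 50 million, which is the kind of gap low input followed by heavy amplification can produce.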

score 0 · Answer 1 · 2025-03-22

What do you want to do? Differential analysis?

If so, ignore all these low-level metrics and see how the samples compare in a PCA and how the downstream analysis goes. You can always go back and start removing samples if things look odd, but I would never, ever exclude a sample because FastQC complains a bit or because some duplicate metrics look strange.
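In R, that PCA would typically be DESeq2's `vst()` followed by `plotPCA()`. For a quick look outside R, here is a NumPy sketch under simplifying assumptions: log2 counts-per-million with a pseudocount of 1 as a crude stand-in for variance stabilisation, and only the `top_n` most variable genes (500, matching what I understand to be `plotPCA`'s default). The function name and parameters are illustrative, not from any package.

```python
import numpy as np


def pca_on_counts(counts, n_components=2, top_n=500):
    """PCA of samples from a genes x samples matrix of raw read counts.

    Returns (sample coordinates, fraction of variance explained per component).
    """
    counts = np.asarray(counts, dtype=float)
    # log2 counts-per-million with pseudocount 1: a crude stand-in for vst/rlog
    logcpm = np.log2(counts / counts.sum(axis=0) * 1e6 + 1.0)
    # keep only the most variable genes
    if top_n < logcpm.shape[0]:
        keep = np.argsort(logcpm.var(axis=1))[::-1][:top_n]
        logcpm = logcpm[keep]
    # centre each gene across samples, then put samples in rows
    x = (logcpm - logcpm.mean(axis=1, keepdims=True)).T
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return coords, explained[:n_components]
```

If the two samples with odd duplication metrics sit with the rest of their biological group on PC1/PC2, the metrics are probably not worth acting on; if they stand apart, that is the point to go back to them.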