Hello, I ran a MarkDuplicates analysis on my STAR output of rat brain region samples and a few samples look pretty weird. I could just exclude them, but I'm interested in what could have gone wrong technically, at any step of the process, so I can better understand the process itself. Here's the output:
Specifically, the last one and the one above HPC_C1. I have uploaded the Falco FastQC HTML reports for each of these two samples (both reads), the MultiQC STAR and MultiQC idxstats HTML reports for these two samples, and the full MultiQC STAR report for all my samples to our University Computing Centre's NextCloud, available here.
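To compare the flagged samples with the rest in numbers rather than only in the MultiQC plot, something like the following could pull the key values out of each MarkDuplicates metrics file. This is a minimal sketch that assumes the usual Picard metrics layout (a `## METRICS CLASS` line, then a tab-separated header and one row per library); `read_duplication_metrics` and `summarize` are hypothetical helper names, so check the parser against one of your own files first.

```python
from pathlib import Path


def read_duplication_metrics(path):
    """Parse the METRICS CLASS table of a Picard MarkDuplicates metrics file.

    Returns one dict per library, mapping column name -> raw string value.
    Assumes the usual layout: a '## METRICS CLASS' line, then a tab-separated
    header row, then one row per library, ended by a blank line.
    """
    rows = []
    header = None
    in_metrics = False
    for line in Path(path).read_text().splitlines():
        if line.startswith("## METRICS CLASS"):
            in_metrics = True
            continue
        if not in_metrics:
            continue  # skip the command-line/header comments at the top
        if not line.strip() or line.startswith("#"):
            if header is not None:
                break  # blank line or next section ends the metrics table
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


def summarize(row):
    """Pull out the overall duplicate fraction and the optical share of pair duplicates."""
    pair_dups = int(row["READ_PAIR_DUPLICATES"])
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    return {
        "library": row["LIBRARY"],
        # PERCENT_DUPLICATION is stored as a fraction (e.g. 0.42), not a percentage
        "percent_duplication": float(row["PERCENT_DUPLICATION"]),
        "optical_share": optical / pair_dups if pair_dups else 0.0,
    }


if __name__ == "__main__":
    import sys

    # Usage: python dup_metrics.py sample1.metrics.txt sample2.metrics.txt ...
    for metrics_file in sys.argv[1:]:
        for row in read_duplication_metrics(metrics_file):
            print(metrics_file, summarize(row))
```

As a rough heuristic only: if optical duplicates make up a large share of the pair duplicates, that may point more toward the sequencing run itself; a small optical share fits better with a library-side cause such as heavy PCR amplification.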
Incidentally, both of these were run across multiple lanes. For samples like these, I concatenated all their FASTQ files (separately for reads 1 and 2) before doing anything else, and other samples handled this way seem to be okay (the ones whose short name ends with _LA, like the first one). I don't currently have much more information on the sequencing setup, aside from it being Illumina SBS. I can ask the facility, but I'm not sure what information I need, so any requests for extra info are welcome.
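One technical thing worth ruling out with the lane concatenation is that the R1 and R2 files were joined in the same lane order; if they weren't, mates fall out of sync. A minimal sketch of a safe merge, assuming bcl2fastq-style file names such as `HPC_C1_S3_L002_R1_001.fastq.gz` (the naming pattern, `merge_lanes` and `count_reads` are hypothetical, so adapt them to your facility's names):

```python
import gzip
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical naming pattern (bcl2fastq style); adjust to your facility's convention.
LANE_READ = re.compile(r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$")


def merge_lanes(fastq_dir, out_dir):
    """Concatenate per-lane FASTQs into one R1 and one R2 file per sample.

    Lanes are sorted identically for R1 and R2 so mates stay in sync.
    Gzip files can be concatenated byte-for-byte; the result is a valid
    multi-member gzip file.
    """
    groups = defaultdict(dict)  # (sample, read) -> {lane: path}
    for path in Path(fastq_dir).iterdir():
        m = LANE_READ.match(path.name)
        if m:
            groups[(m["sample"], m["read"])][m["lane"]] = path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for sample in sorted({sample for sample, _ in groups}):
        lanes_r1 = sorted(groups.get((sample, "1"), {}))
        lanes_r2 = sorted(groups.get((sample, "2"), {}))
        if lanes_r1 != lanes_r2:
            # Missing or extra lanes on one side would break pairing.
            raise ValueError(f"{sample}: R1 lanes {lanes_r1} != R2 lanes {lanes_r2}")
        for read in ("1", "2"):
            out_path = out_dir / f"{sample}_R{read}.fastq.gz"
            with open(out_path, "wb") as out:
                for lane in lanes_r1:  # same lane order for both reads
                    with open(groups[(sample, read)][lane], "rb") as src:
                        shutil.copyfileobj(src, out)
            outputs[(sample, read)] = out_path
    return outputs


def count_reads(path):
    """Number of FASTQ records (4 lines each) in a gzipped file."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for _ in fh) // 4
```

After merging, `count_reads` on the R1 and R2 outputs of a sample should give the same number; a mismatch would indicate a problem with the merge rather than with the library. The sketch assumes paired-end data and raises an error if a sample's lanes differ between reads.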
Is this different data from what we were discussing in the last two questions? A simple explanation may be that the initial RNA quality was poor for those samples, which may have led to over-amplification to get enough sequenceable material, and therefore a large number of sequence duplicates.
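The over-amplification explanation can be illustrated with a toy calculation: at a fixed sequencing depth, the fewer distinct molecules the library contains, the more often the same molecule gets read more than once. This is a hedged sketch, not Picard's algorithm, and it assumes every molecule is equally abundant, which real RNA-seq libraries are not (highly expressed genes produce duplicates even in good libraries). `expected_duplicate_fraction` and the depths and library sizes below are hypothetical.

```python
import math


def expected_duplicate_fraction(unique_molecules, reads):
    """Expected fraction of reads that are duplicates under a toy model.

    Toy assumptions: the library holds `unique_molecules` distinct fragments,
    each equally likely to be sequenced, and `reads` are drawn uniformly with
    replacement. The expected number of distinct fragments observed is then
    U * (1 - exp(-R / U)); every read beyond those is a duplicate.
    """
    if unique_molecules <= 0 or reads <= 0:
        raise ValueError("unique_molecules and reads must be positive")
    expected_distinct = unique_molecules * (1.0 - math.exp(-reads / unique_molecules))
    return 1.0 - expected_distinct / reads


if __name__ == "__main__":
    depth = 30_000_000  # same sequencing depth for every hypothetical library
    for library_size in (200_000_000, 50_000_000, 5_000_000):
        frac = expected_duplicate_fraction(library_size, depth)
        print(f"{library_size:>12,} unique molecules -> {frac:.1%} duplicates")
```

With 30M reads, the model gives roughly 7% duplicates for a 200M-molecule library versus roughly 83% for a 5M-molecule one. Amplification only copies molecules that already exist, so it can make a small, degraded input sequenceable without adding diversity.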