Hi there,
I have been trying to troubleshoot my RNA-seq data from mice in two different conditions. It's a simple experiment: one cohort was ovariectomised (ovaries removed) and the other received a sham operation, and for both I took out the bone marrow and sequenced it. I also measured bone volume and found a statistically significant difference between the two groups (estrogen deprivation causes bone loss), so I'm confident that gene expression and/or cell composition has changed. I extracted the RNA using a Qiagen RNeasy kit with on-column gDNA digestion using their DNase I enzyme.
I outsourced the sequencing. Because the RNA had RIN scores between 5 and 6.8, the provider used an rRNA depletion protocol (NEBNext rRNA Depletion Kit), and all samples were sequenced on an Illumina HiSeq Rapid run with 100 bp single-end reads at 37.5 million reads per sample (money was tight).
I aligned the raw FASTQ files with STAR to GRCm39, using the GENCODE comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes), found here (the one with ALL regions). I then obtained adjusted counts using DegNorm, which corrects read counts for RNA degradation.
Following this, I loaded the adjusted counts into R and performed clustering and PCA, which showed little separation between the two groups, even after keeping only transcripts with counts > 5 in at least 3 of the 8 samples (~18k transcripts). DE analysis with DESeq2 found no genes with padj < 0.1. Raising the filter to counts > 15 in at least 3 samples gave 126 genes with padj < 0.1, but none with |log2FC| > 1, suggesting that whatever changes are present are very small.
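The filtering rule and the significance cutoffs described above can be written out explicitly. Below is a minimal sketch in plain Python (the actual analysis was done in R), assuming the count matrix is held as a dict of per-sample counts and the DE results as (log2FC, padj) pairs; all gene names and numbers are hypothetical toy values, not data from this experiment.

```python
def filter_low_counts(counts, min_count=5, min_samples=3):
    """Keep features whose count exceeds min_count in at least min_samples samples.

    counts: dict mapping feature ID -> list of per-sample counts.
    """
    kept = {}
    for feature, values in counts.items():
        n_above = sum(1 for v in values if v > min_count)
        if n_above >= min_samples:
            kept[feature] = values
    return kept


def call_de(results, padj_cutoff=0.1, lfc_cutoff=1.0):
    """Return sorted IDs with padj < padj_cutoff and |log2FC| > lfc_cutoff.

    results: dict mapping feature ID -> (log2_fold_change, padj). padj may be
    None, mirroring the NA values DESeq2 reports for some genes.
    """
    return sorted(
        feature
        for feature, (lfc, padj) in results.items()
        if padj is not None and padj < padj_cutoff and abs(lfc) > lfc_cutoff
    )


# Hypothetical toy data for 8 samples.
toy_counts = {
    "geneA": [10, 12, 0, 7, 3, 9, 1, 2],         # 4 samples > 5
    "geneB": [6, 0, 0, 0, 8, 0, 0, 1],           # only 2 samples > 5
    "geneC": [20, 18, 25, 16, 30, 22, 19, 17],   # all 8 samples > 15
    "geneD": [8, 9, 10, 4, 2, 1, 0, 0],          # 3 samples > 5, none > 15
}
print(sorted(filter_low_counts(toy_counts, 5, 3)))   # ['geneA', 'geneC', 'geneD']
print(sorted(filter_low_counts(toy_counts, 15, 3)))  # ['geneC']

toy_results = {
    "geneC": (0.6, 0.04),   # significant, but |log2FC| < 1
    "geneE": (-1.4, 0.02),  # passes both cutoffs
    "geneF": (2.1, None),   # padj missing, so excluded
}
print(call_de(toy_results))                  # ['geneE']
print(call_de(toy_results, lfc_cutoff=0.0))  # ['geneC', 'geneE']
```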
This is not what I was expecting. When I adjust the threshold values, a number of expected genes appear (collagen formation/breakdown), but I'm not sure whether this is just p-hacking. To be fair, it was a relatively short time period (2 weeks), chosen to capture early events in bone degradation, but it should be sufficient to elicit changes at the gene level.
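One mechanism that can make the number of genes passing padj < 0.1 depend on the count filter is the multiple-testing adjustment: DESeq2 uses Benjamini-Hochberg (BH) by default, and removing many low-count genes with uninformative p-values reduces the number of tests. The sketch below is a plain-Python implementation of the BH procedure applied to hypothetical p-values; it illustrates the arithmetic and is not DESeq2's code. Note that DESeq2's results() also applies its own independent filtering on the mean of normalized counts by default.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values: 4 genes with real signal, 36 low-count genes without.
informative = [0.001, 0.004, 0.012, 0.025]
noisy = [0.8] * 36

few_tests = benjamini_hochberg(informative)
many_tests = benjamini_hochberg(informative + noisy)

print(sum(p < 0.1 for p in few_tests))   # 4
print(sum(p < 0.1 for p in many_tests))  # 2
```

The four informative p-values are identical in both calls; only the number of accompanying tests changes.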
One thing I considered is the high level of read duplication, which you can see in the FastQC/MultiQC reports I uploaded to this GitHub repo: a MultiQC report summarising all samples, plus individual FastQC reports from 3 mice showing their duplication levels (after processing with Trim Galore).
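A rough way to quantify duplication outside FastQC is to count reads whose exact sequence occurs more than once. This is a minimal standard-library sketch; it is simplified and not FastQC's own algorithm, and with single-end reads an exact-sequence count cannot distinguish PCR duplicates from independent reads of highly expressed transcripts. The toy FASTQ records are hypothetical.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield sequence


def duplication_summary(sequences, top=3):
    """Return (duplicate fraction, most common sequences).

    Duplicate fraction = 1 - unique sequences / total reads, i.e. the share of
    reads whose exact sequence was already seen earlier.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    fraction = 1 - len(counts) / total if total else 0.0
    return fraction, counts.most_common(top)


toy_fastq = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nTTGCAAGC\n+\nIIIIIIII\n"
    "@r4\nACGTACGT\n+\nIIIIIIII\n"
)
fraction, top_sequences = duplication_summary(read_fastq_sequences(toy_fastq))
print(f"{fraction:.2f}")  # 0.50
print(top_sequences)      # [('ACGTACGT', 3), ('TTGCAAGC', 1)]
```

For a real gzipped file, the same functions could be applied to `gzip.open(path, "rt")`, where `path` is a placeholder for one of the trimmed FASTQ files.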
Does anyone have an idea what could be causing these results?
Thanks for your help.
Hello, I don't have experience with mouse data, but I checked your QC files. The "Per base sequence content" and "Per sequence GC content" modules seem to show problems: "Per base sequence content" should be four roughly horizontal lines, except at the first few positions. I suggest checking your data for rRNA contamination, and also checking other QC metrics of your library, such as the fragment size distribution.
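The two FastQC modules mentioned above can be approximated on a subset of reads to see which positions or reads drive the pattern. A minimal standard-library sketch, assuming reads are already available as plain sequence strings (for example from a FASTQ parser); it is a simplified illustration, not FastQC's implementation, and the toy reads are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def per_position_composition(sequences):
    """Fraction of A/C/G/T at each read position (like FastQC's per-base content)."""
    position_counts = []
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(position_counts):
                position_counts.append(Counter())
            position_counts[i][base] += 1
    composition = []
    for counts in position_counts:
        total = sum(counts[b] for b in BASES)  # ignore N and other symbols
        composition.append({b: counts[b] / total if total else 0.0 for b in BASES})
    return composition


def gc_fraction(seq):
    """GC content of a single read, counting only A/C/G/T characters."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_histogram(sequences, bins=10):
    """Number of reads falling into equal-width GC-content bins between 0 and 1."""
    hist = [0] * bins
    for seq in sequences:
        index = min(int(gc_fraction(seq) * bins), bins - 1)
        hist[index] += 1
    return hist


toy_reads = ["ACGT", "ATAT", "GGGC"]
print(per_position_composition(toy_reads)[0])
# {'A': 0.6666666666666666, 'C': 0.0, 'G': 0.3333333333333333, 'T': 0.0}
print(gc_histogram(toy_reads))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```

In a well-behaved library the per-position fractions stay roughly flat after the first few bases, and the GC histogram is a single smooth peak; extra peaks or shoulders suggest a subpopulation of reads (for example rRNA) worth inspecting.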