Question

No differentially expressed genes after multiple testing correction in mice

1

Entering edit mode

18 months ago

Sara ▴ 30

Hi all,

I am working with the RNA-seq data on mice (group A N=3 vs group B N=3). Mice are littermates, of which group A overexpresses a human transgene which I verified.

I have had .cram files from mouse data, and I converted them to .fastq files. Then I aligned them by hisat2 and used HTSeq count to generate the count files.

Here is an example of the commands:

#converting .cram files to fastq files
ls *.cram | parallel -j 8 --bar  'samtools view -h -T genome_mm10.fa {} |samtools collate -u -O  - | samtools fastq -1 {.}_paired1.fq.gz -2 {.}_paired2.fq.gz -0 /dev/null -s /dev/null -'

#trimming fastq files
ls *_paired1.fq.gz | parallel --bar -j12 'R2=$(echo {} | sed s/_paired1/_paired2/) && out=$(echo {} | sed s/_paired1.fq.gz/.fq.gz/) &&  trimmomatic PE {} $_paired2 -baseout $out LEADING:3 TRAILING:3 MINLEN:30'

#aligning and sorting sam files and converting them to bam files
ls *_1P.fastq.gz | parallel --bar -j8 'R2=$(echo {} | sed s/_1P.fastq.gz/_2P.fastq.gz/) && out=$(echo {} | sed s/_1P.fastq.gz/.bam/) && rg_sm=$(echo {} | cut -d. -f1) && hisat2 -1 {} -2 $R2 --dta -x /ref/mm10_indexed -p 4 --rg SM:$rg_sm --rg ID:**  --rg PL:RNAseq --rg LB:Illumina 2> ${out}.log | samtools sort -T /tmp/ -o $out 2>> ${out}.log'

#making the index file for bam files
ls *.bam| parallel --bar -j8 'samtools index {}'

#HTSeq count
ls *.bam| parallel --bar -j8 'out=$(echo {} | sed s/.bam/.count/) && htseq-count -f bam -r pos -q {} mm10.gtf.gz > $out'

Then I used DESeq2 to find differentially expressed genes.

dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
                                  directory=folder,
                                  design=~condition)

But nothing was significant after multiple testing corrections.

While the other analysis by Limma and edgeR showed a few significant genes. The output results of Limma and edgeR have some genes in common, while there is nothing in common between the results of DESeq2 compared to the limma and edgeR. Is it normal?

Another question is that when the padj value is not significant can we report the differentially expressed genes based on the p-value?

edgeR DESeq2 Limma mouse RNA-seq • 1.3k views

ADD COMMENT • link updated 18 months ago by seidel 11k • written 18 months ago by Sara ▴ 30

0

Entering edit mode

Thanks for your comment. I am attaching the MA, PCA, and Volcano plots.

PlotMA-dds

PCA plot

Volcano plot

ADD REPLY • link 18 months ago by Sara ▴ 30

2

Entering edit mode

The PCA indicates that spread is large and with n=3 at this spread the study is underpowered. It's expected to get few/no DEGs. Was this done in some sort of batches, or any confounders you can think of? The MA-plot suggests that there is not much going on in terms of fold changes that cold qualify as DE in my eyes.

ADD REPLY • link 18 months ago by ATpoint 85k

0

Entering edit mode

Have you tried controlling for replicates? design = ~ replicates + condition ?

ADD REPLY • link 18 months ago by Barry Digby ★ 1.3k

0

Entering edit mode

No, I don't have replicates. besides all of them have the same age and same gender

ADD REPLY • link 18 months ago by Sara ▴ 30

score 3 · Accepted Answer · 2023-05-10

While the other analysis by Limma and edgeR showed a few significant genes. The output results of Limma and edgeR have some genes in common, while there is nothing in common between the results of DESeq2 compared to the limma and edgeR. Is it normal?

That's not unheard of. Generally you'll see a small shift in p-values and fold-changes, so marginally significant findings from one tool might fall on the other side of significance in another. Given the apparently miniscule effect-size seen here what you've observed isn't then unsurprising.

In the comments above 2 main points were raised that are likely relevant. The first is statistical power. A 3 vs 3 comparison can be sufficient, but in cases where the biological change is very slight you may simply need more samples.

The second point raised was of batch effects. The PCA in particular hints that perhaps there are some odd batch effects, given the odd sample clustering (this could also simply be due to the small effect size though).

Another question is that when the padj value is not significant can we report the differentially expressed genes based on the p-value?

No, this would be scientific misconduct.