No differentially expressed genes after multiple testing correction in mice
1
1
Entering edit mode
18 months ago
Sara ▴ 30

Hi all,

I am working with the RNA-seq data on mice (group A N=3 vs group B N=3). Mice are littermates, of which group A overexpresses a human transgene which I verified.

I have had .cram files from mouse data, and I converted them to .fastq files. Then I aligned them by hisat2 and used HTSeq count to generate the count files.

Here is an example of the commands:

#converting .cram files to fastq files
ls *.cram | parallel -j 8 --bar  'samtools view -h -T genome_mm10.fa {} |samtools collate -u -O  - | samtools fastq -1 {.}_paired1.fq.gz -2 {.}_paired2.fq.gz -0 /dev/null -s /dev/null -'

#trimming fastq files
ls *_paired1.fq.gz | parallel --bar -j12 'R2=$(echo {} | sed s/_paired1/_paired2/) && out=$(echo {} | sed s/_paired1.fq.gz/.fq.gz/) &&  trimmomatic PE {} $_paired2 -baseout $out LEADING:3 TRAILING:3 MINLEN:30'

#aligning and sorting sam files and converting them to bam files
ls *_1P.fastq.gz | parallel --bar -j8 'R2=$(echo {} | sed s/_1P.fastq.gz/_2P.fastq.gz/) && out=$(echo {} | sed s/_1P.fastq.gz/.bam/) && rg_sm=$(echo {} | cut -d. -f1) && hisat2 -1 {} -2 $R2 --dta -x /ref/mm10_indexed -p 4 --rg SM:$rg_sm --rg ID:**  --rg PL:RNAseq --rg LB:Illumina 2> ${out}.log | samtools sort -T /tmp/ -o $out 2>> ${out}.log'

#making the index file for bam files
ls *.bam| parallel --bar -j8 'samtools index {}'

#HTSeq count
ls *.bam| parallel --bar -j8 'out=$(echo {} | sed s/.bam/.count/) && htseq-count -f bam -r pos -q {} mm10.gtf.gz > $out'

Then I used DESeq2 to find differentially expressed genes.

dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
                                  directory=folder,
                                  design=~condition)

But nothing was significant after multiple testing corrections.

While the other analysis by Limma and edgeR showed a few significant genes. The output results of Limma and edgeR have some genes in common, while there is nothing in common between the results of DESeq2 compared to the limma and edgeR. Is it normal?

Another question is that when the padj value is not significant can we report the differentially expressed genes based on the p-value?

edgeR DESeq2 Limma mouse RNA-seq • 1.3k views
ADD COMMENT
0
Entering edit mode

Thanks for your comment. I am attaching the MA, PCA, and Volcano plots.

PlotMA-dds

PCA plot

Volcano plot

ADD REPLY
2
Entering edit mode

The PCA indicates that spread is large and with n=3 at this spread the study is underpowered. It's expected to get few/no DEGs. Was this done in some sort of batches, or any confounders you can think of? The MA-plot suggests that there is not much going on in terms of fold changes that cold qualify as DE in my eyes.

ADD REPLY
0
Entering edit mode

Have you tried controlling for replicates? design = ~ replicates + condition ?

ADD REPLY
0
Entering edit mode

No, I don't have replicates. besides all of them have the same age and same gender

ADD REPLY
3
Entering edit mode
18 months ago

While the other analysis by Limma and edgeR showed a few significant genes. The output results of Limma and edgeR have some genes in common, while there is nothing in common between the results of DESeq2 compared to the limma and edgeR. Is it normal?

That's not unheard of. Generally you'll see a small shift in p-values and fold-changes, so marginally significant findings from one tool might fall on the other side of significance in another. Given the apparently miniscule effect-size seen here what you've observed isn't then unsurprising.

In the comments above 2 main points were raised that are likely relevant. The first is statistical power. A 3 vs 3 comparison can be sufficient, but in cases where the biological change is very slight you may simply need more samples.

The second point raised was of batch effects. The PCA in particular hints that perhaps there are some odd batch effects, given the odd sample clustering (this could also simply be due to the small effect size though).

Another question is that when the padj value is not significant can we report the differentially expressed genes based on the p-value?

No, this would be scientific misconduct.

ADD COMMENT
1
Entering edit mode

...when the padj value is not significant can we report the differentially expressed genes based on the p-value?

Devon Ryan is correct about not reporting the genes that do not survive a false discovery correction. However that does not mean you can't use them for the same purpose (i.e. to get clues). If they have given you insight into some process, or helped you design the next experiment, you can refer to that that indirectly, as they simply become part of the story. And if they really bear out, and are reproducible they will have served a purpose, regardless of not being directly reportable. Not all results are publishable, but it's also essential to distinguish what is useful vs. a false discovery.

ADD REPLY

Login before adding your answer.

Traffic: 1953 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6