Hi all,
I am working with the RNA-seq data on mice (group A N=3 vs group B N=3). Mice are littermates, of which group A overexpresses a human transgene which I verified.
I have had .cram files from mouse data, and I converted them to .fastq files. Then I aligned them by hisat2 and used HTSeq count to generate the count files.
Here is an example of the commands:
#converting .cram files to fastq files
ls *.cram | parallel -j 8 --bar 'samtools view -h -T genome_mm10.fa {} |samtools collate -u -O - | samtools fastq -1 {.}_paired1.fq.gz -2 {.}_paired2.fq.gz -0 /dev/null -s /dev/null -'
#trimming fastq files
ls *_paired1.fq.gz | parallel --bar -j12 'R2=$(echo {} | sed s/_paired1/_paired2/) && out=$(echo {} | sed s/_paired1.fq.gz/.fq.gz/) && trimmomatic PE {} $_paired2 -baseout $out LEADING:3 TRAILING:3 MINLEN:30'
#aligning and sorting sam files and converting them to bam files
ls *_1P.fastq.gz | parallel --bar -j8 'R2=$(echo {} | sed s/_1P.fastq.gz/_2P.fastq.gz/) && out=$(echo {} | sed s/_1P.fastq.gz/.bam/) && rg_sm=$(echo {} | cut -d. -f1) && hisat2 -1 {} -2 $R2 --dta -x /ref/mm10_indexed -p 4 --rg SM:$rg_sm --rg ID:** --rg PL:RNAseq --rg LB:Illumina 2> ${out}.log | samtools sort -T /tmp/ -o $out 2>> ${out}.log'
#making the index file for bam files
ls *.bam| parallel --bar -j8 'samtools index {}'
#HTSeq count
ls *.bam| parallel --bar -j8 'out=$(echo {} | sed s/.bam/.count/) && htseq-count -f bam -r pos -q {} mm10.gtf.gz > $out'
Then I used DESeq2 to find differentially expressed genes.
dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
directory=folder,
design=~condition)
But nothing was significant after multiple testing corrections.
While the other analysis by Limma and edgeR showed a few significant genes. The output results of Limma and edgeR have some genes in common, while there is nothing in common between the results of DESeq2 compared to the limma and edgeR. Is it normal?
Another question is that when the padj value is not significant can we report the differentially expressed genes based on the p-value?
Thanks for your comment. I am attaching the MA, PCA, and Volcano plots.
The PCA indicates that spread is large and with n=3 at this spread the study is underpowered. It's expected to get few/no DEGs. Was this done in some sort of batches, or any confounders you can think of? The MA-plot suggests that there is not much going on in terms of fold changes that cold qualify as DE in my eyes.
Have you tried controlling for replicates?
design = ~ replicates + condition
?No, I don't have replicates. besides all of them have the same age and same gender