Hello everyone,
I have 4 sets of RNA-seq data (RNA derived from eyes) from two different strains of mice.
1st strain: left eye, right eye
2nd strain: left eye, right eye
My aim is to find genes that are differentially expressed in the eye between these two strains of mice at a p-value threshold of 0.05.
I have already aligned the reads and calculated RPKM at several levels (genes, transcripts, exons). Now I need a statistical test that gives me a p-value for differential expression from the RPKM values.
My questions:
1) Can I use the left-eye and right-eye data from the same strain as biological replicates? I calculated the correlation of gene-level RPKM between the left and right eye within each strain: it was very high (0.969) for the first strain but fairly low (0.83) for the second. Also, the high correlation could be a result of the large number of genes with very small RPKM values; only 8,000 out of 24,000 genes have RPKM > 5.
2) Which tool would be best in my case for calculating p-values for differential expression of a gene between two samples?
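To make the worry in question 1 concrete, here is a rough sketch of how the left/right correlation could be re-checked after dropping the low-RPKM genes. It is only an illustration: the function name `replicate_correlation`, the log2 transform, and the pseudocount of 1 are my own assumptions, and the cutoff of 5 is just the RPKM > 5 figure mentioned above.

```python
import numpy as np

def replicate_correlation(left, right, min_rpkm=5.0, pseudocount=1.0):
    """Compare two RPKM vectors (one value per gene, same gene order).

    Returns (raw_r, filtered_r, n_kept):
      raw_r      - Pearson correlation on all genes, untransformed RPKM
      filtered_r - Pearson correlation on log2(RPKM + pseudocount),
                   restricted to genes above min_rpkm in BOTH samples
      n_kept     - number of genes that passed the filter
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Correlation over every gene, including the many near-zero ones.
    raw_r = np.corrcoef(left, right)[0, 1]

    # Keep only genes that are clearly expressed in both eyes.
    keep = (left > min_rpkm) & (right > min_rpkm)

    # Log transform so a handful of very highly expressed genes
    # do not dominate the coefficient.
    log_left = np.log2(left[keep] + pseudocount)
    log_right = np.log2(right[keep] + pseudocount)
    filtered_r = np.corrcoef(log_left, log_right)[0, 1]

    return raw_r, filtered_r, int(keep.sum())


# Toy values only, not real data.
left_eye = [0, 0, 0, 10, 100, 1000]
right_eye = [0, 0, 0, 20, 80, 1200]
raw_r, filtered_r, n_kept = replicate_correlation(left_eye, right_eye)
print(raw_r, filtered_r, n_kept)
```

If the filtered, log-scale correlation is much lower than the raw one, that would suggest the raw figure was being flattered by genes that carry little information.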
Thanks in advance.
I'm sorry if I got it wrong, but does this mean that you use the same RPKM for downstream analysis (DGE here) in DESeq or edgeR?
No, I use raw counts per exon for DEGseq and edgeR.
The edgeR tutorials suggest removing genes that do not have at least 1 read per million in at least n samples (n = size of the smallest group of samples). But the DESeq tutorials don't include this step. Should we remove those genes or keep them in the analysis pipeline?
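For reference, the filtering rule described above can be sketched roughly like this. This is plain NumPy rather than edgeR itself; the function name `cpm_filter` and the genes-by-samples layout of the count matrix are assumptions made for the illustration, and I've read "at least 1 read per million" as CPM >= 1.

```python
import numpy as np

def cpm_filter(counts, group_sizes, min_cpm=1.0):
    """Return a boolean mask of genes to keep.

    counts      - raw read counts, shape (genes, samples)
    group_sizes - number of samples in each experimental group
    min_cpm     - counts-per-million threshold (assumed >= comparison)
    """
    counts = np.asarray(counts, dtype=float)

    # Library size = total reads per sample (column sums).
    lib_sizes = counts.sum(axis=0)

    # Counts per million: scale each column by its library size.
    cpm = counts / lib_sizes * 1e6

    # n = size of the smallest group of samples.
    n = min(group_sizes)

    # Keep a gene if it reaches the threshold in at least n samples.
    return (cpm >= min_cpm).sum(axis=1) >= n


# Toy matrix: 3 genes x 4 samples, two groups of 2 samples each.
toy_counts = [
    [999990, 999990, 999990, 999990],
    [10, 10, 0, 0],
    [0, 0, 0, 1],
]
keep = cpm_filter(toy_counts, group_sizes=[2, 2])
print(keep)
```

Using the smallest group size for n means a gene expressed in only one condition can still pass, which is why the rule is phrased that way rather than requiring expression in all samples.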