Hi,
I am doing RNA-seq analysis of cells treated with a drug (condition) and without it (control). I have two technical replicates. I am running salmon, then tximport and limma. Comparing condition versus control, I see a strong effect (~4000 genes are DE: padj < 0.05 & |logFC| > 1). However, when I look at individual genes, many of the strongly upregulated or downregulated ones are in fact pseudogenes or uncharacterized proteins.
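A minimal sketch of how one might quantify this observation: apply the DE thresholds from the post and tally the hits by gene biotype. It assumes the limma results (`topTable` reports `logFC` and `adj.P.Val`) have already been joined to a biotype annotation such as the GENCODE `gene_type` attribute; the gene names, biotypes and numbers below are hypothetical.

```python
from collections import Counter

# Hypothetical DE results: (gene_id, biotype, logFC, padj).
# In practice logFC/padj would come from limma's topTable (logFC, adj.P.Val)
# and biotype from the annotation; these rows are made up for illustration.
results = [
    ("geneA", "protein_coding", 2.3, 0.001),
    ("geneB", "processed_pseudogene", -3.1, 0.004),
    ("geneC", "protein_coding", 0.4, 0.020),
    ("geneD", "unprocessed_pseudogene", 1.8, 0.030),
    ("geneE", "lncRNA", 2.0, 0.200),
]


def is_de(logfc, padj, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Thresholds from the post: padj < 0.05 and |logFC| > 1."""
    return padj < padj_cutoff and abs(logfc) > lfc_cutoff


de_hits = [r for r in results if is_de(r[2], r[3])]
biotype_counts = Counter(biotype for _, biotype, _, _ in de_hits)

# Fraction of DE genes whose biotype label contains "pseudogene".
n_pseudo = sum(n for b, n in biotype_counts.items() if "pseudogene" in b)
pseudo_frac = n_pseudo / len(de_hits) if de_hits else 0.0

print(biotype_counts)
print(f"pseudogene fraction among DE genes: {pseudo_frac:.2f}")  # 0.67 here
```

Comparing this fraction with the pseudogene fraction among all expressed genes would indicate whether pseudogenes are genuinely over-represented among the hits or simply common in the annotation.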
I ran GSEA with the Hallmark gene sets from MSigDB and found strong enrichment for the Protein secretion, Mitotic spindle and G2M checkpoint pathways, but I am not sure whether these results are trustworthy. I searched the literature but did not find much.
So what does this overexpression of pseudogenes mean? Is it a technical artifact, or could it be biologically relevant? Any help or advice is highly appreciated!
Best regards,
Gherman
You might also want to filter on absolute expression: fold-change is not well defined when one sample or the other has nearly zero expression, because the ratio explodes as the denominator approaches zero. I've seen this effect, and while it could be real, we chalked it up to a technical artifact: sample processing in the wet lab caused one sample type to lose long noncoding RNAs and the other to retain them. Since we wanted to see only 'real' genes, we decided to ignore these results. An easy approach is to filter out any gene whose expression is below the 5th percentile in all samples, so you're only looking at 'seriously expressed' genes. Again, what I'm telling you to filter out could be a real biological effect. With this many hits, it's your call what to do with them!
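The percentile filter described above could be sketched as follows. This is one interpretation, not necessarily the poster's exact rule: each sample gets its own cutoff (the 5th percentile of that sample's values across genes), and a gene is dropped only if it is below the cutoff in every sample. The expression values are hypothetical.

```python
import statistics

# Hypothetical expression values (e.g. TPM): one list per gene,
# one value per sample. Numbers are made up for illustration.
expr = {
    "gene1": [50.0, 48.0, 52.0, 47.0],
    "gene2": [0.0, 0.1, 0.0, 0.2],    # barely detected anywhere
    "gene3": [5.0, 6.0, 4.0, 5.0],
    "gene4": [0.5, 30.0, 0.4, 28.0],  # near zero in some samples, high in others
    "gene5": [12.0, 11.0, 13.0, 12.0],
}


def sample_thresholds(expr, pct=5):
    """Per-sample cutoff: the pct-th percentile (1-99) of that sample's values."""
    n_samples = len(next(iter(expr.values())))
    cuts = []
    for s in range(n_samples):
        column = [vals[s] for vals in expr.values()]
        # quantiles(n=100) returns the 1st..99th percentiles; index pct-1 is the pct-th.
        cuts.append(statistics.quantiles(column, n=100, method="inclusive")[pct - 1])
    return cuts


def filter_low(expr, pct=5):
    """Drop genes that fall below the per-sample cutoff in *all* samples."""
    cuts = sample_thresholds(expr, pct)
    return {
        gene: vals
        for gene, vals in expr.items()
        if not all(v < c for v, c in zip(vals, cuts))
    }


kept = filter_low(expr)
print(sorted(kept))  # ['gene1', 'gene3', 'gene4', 'gene5']
```

Note that gene4, near zero in two samples but high in the other two, survives the "all samples" rule; such genes are exactly the ones with unstable fold-changes, so a stricter variant (e.g. requiring expression above the cutoff in at least N samples) may be worth trying.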
Have you looked at the genomic location of the affected pseudogenes? I suspect many of them are located in pericentromeric/subtelomeric regions, which are subject to heterochromatin silencing. If the drug you used affects chromatin structure, your results could make biological sense.
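One rough way to check this is to label each affected pseudogene by its distance to the chromosome ends and the centromere. This is a minimal sketch under stated assumptions: the chromosome lengths, centromere intervals, gene coordinates and the 5 Mb window are all hypothetical placeholders, to be replaced with values from the assembly and annotation actually used.

```python
from collections import Counter

# Hypothetical placeholders, NOT real coordinates for any genome.
chrom_len = {"chrA": 100_000_000, "chrB": 60_000_000}
centromere = {"chrA": (40_000_000, 43_000_000), "chrB": (25_000_000, 27_000_000)}

# Hypothetical affected pseudogenes: (name, chrom, start, end).
genes = [
    ("pg1", "chrA", 1_200_000, 1_201_500),
    ("pg2", "chrA", 44_500_000, 44_502_000),
    ("pg3", "chrB", 12_000_000, 12_003_000),
    ("pg4", "chrB", 58_900_000, 58_901_000),
]

WINDOW = 5_000_000  # assumed flank size (5 Mb); an arbitrary choice to tune


def classify(chrom, start, end, window=WINDOW):
    """Label a gene by proximity to chromosome ends or the centromere."""
    if start < window or end > chrom_len[chrom] - window:
        return "subtelomeric"
    c_start, c_end = centromere[chrom]
    # Overlaps the centromere interval extended by `window` on each side.
    if end > c_start - window and start < c_end + window:
        return "pericentromeric"
    return "other"


labels = {name: classify(c, s, e) for name, c, s, e in genes}
print(labels)
print(Counter(labels.values()))
```

To judge whether the DE pseudogenes are enriched in these regions, the same labels could be computed for all annotated (or all expressed) pseudogenes and the proportions compared, for instance with a Fisher's exact test.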