I need a list of heat shock genes to exclude from my analysis. I'm working with the hg38 genome. I'm not sure of the best approach to find all heat shock genes, or whether I can download such a list from somewhere.
Thanks.
EDIT -- Not just heat shock, perhaps more generally I need "stress response genes".
Stress is such a general term that the question cannot be answered without more details. Stress can be the presence of heat, the presence of a chemotherapeutic agent, or just the wrong pH in the incubator. I suggest you define what stress means in your context and then search NCBI for datasets where cells were exposed to this stress and RNA-seq was performed with a proper experimental design (e.g. three biological replicates per stress and control condition). Run the data through a standard RNA-seq pipeline and extract the genes that come out differentially expressed.
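The last step, extracting the differentially expressed genes, could look roughly like the sketch below. It assumes the pipeline wrote a CSV with the DESeq2-style columns log2FoldChange and padj plus a hypothetical gene column; the cutoffs and the toy table are made up for illustration and are not a recommendation.

```python
import csv
import io

def significant_genes(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes passing both cutoffs (assumed thresholds, adjust as needed)."""
    hits = []
    for row in rows:
        try:
            padj = float(row["padj"])
            lfc = float(row["log2FoldChange"])
        except ValueError:
            continue  # skip rows with non-numeric values such as "NA"
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            hits.append(row["gene"])
    return hits

# Toy results table; in practice this would be read from the pipeline's output file.
toy_csv = """gene,log2FoldChange,padj
GENE_A,3.2,0.0001
GENE_B,0.1,0.8
GENE_C,2.5,NA
GENE_D,-2.0,0.001
"""

stress_genes = significant_genes(csv.DictReader(io.StringIO(toy_csv)))
print(stress_genes)
```

The resulting list is only as good as the dataset it came from, so it is worth checking that the stress condition there matches what you suspect happened in your own experiment.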
Thanks for the idea. I do agree it is general. However I really do not have more details. I was simply asked to "remove stress response genes" because the biology expert thinks one of our replicates was subjected to some kind of stress during the experiment.
Let's assume I just want to focus on heat shock genes. Is there really no existing source of annotated heat shock genes?
Don't let the wet-lab guys fool you. If they want things removed, ask them to specify exactly what they want. If you end up removing the wrong genes, you'll get 100% of the blame, so ask them to be specific ;-)
If one sample is suspected to have been subjected to uncontrolled factors, remove the whole sample, or remove nothing.
You can also do some diagnostics like PCA on the transformed counts (rlog or vst in DESeq2) and see if on the global level you see evidence for a stress exposure (that would be the respective samples clustering away from the unaffected ones of the same treatment group).
I don't have counts. This is ChIP-seq data of Pol II. For the most part the replicates agree with each other, but they think the heat shock genes are activated in one replicate, which affects our average gene metaplot, for example. I was just hoping there was some source of heat shock genes existing... Doesn't need to be a perfect list of genes or anything.
If one replicate is not right, replace it with a good one. But do not try to solve it with bioinformatics; ask your wet-lab colleagues to perform good experiments. Don't try to fix their errors and mistakes.
Overall they correlate very well. It's not a huge concern for us if this is just limited to a small subset of genes. We aren't doing differential expression. We're looking at the general profile of Pol II transcription with and without a treatment. If I remove these genes, and still see the differences, then we'll throw out the replicate.
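A minimal sketch of that check, assuming the Pol II signal has already been binned per gene into equal-length profiles. The gene names, values, and exclusion list are toy placeholders, not a real heat shock gene list.

```python
from statistics import mean

def average_profile(profiles, exclude=frozenset()):
    """Average the per-bin signal over all genes that are not in `exclude`.

    profiles: dict mapping gene symbol -> list of per-bin signal values,
    with every list the same length. Returns one averaged list.
    """
    kept = [bins for gene, bins in profiles.items() if gene not in exclude]
    if not kept:
        raise ValueError("no genes left after exclusion")
    return [mean(column) for column in zip(*kept)]

# Toy data, made up for illustration only.
profiles = {
    "GENE_A": [1.0, 4.0, 2.0],
    "GENE_B": [1.0, 6.0, 2.0],
    "HSPA1A": [4.0, 20.0, 8.0],
}
stress_genes = {"HSPA1A"}  # placeholder exclusion list

with_all = average_profile(profiles)
without_stress = average_profile(profiles, exclude=stress_genes)

# Largest per-bin change caused by dropping the excluded genes.
max_shift = max(abs(a - b) for a, b in zip(with_all, without_stress))
print(with_all, without_stress, max_shift)
```

Running this once per replicate would show whether dropping the suspected genes brings the odd replicate's metaplot back in line with the others.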