Hi all,
I am working on rare diseases, so I have a very limited number of samples. The study I am working on now has 2 samples. Sample 1 comes from a child with the disease, and Sample 2 is from the child's father, who is healthy. The setup is explained in detail below:
Dataset: RNA-seq data from 2 samples
- Sample 1: child, diseased, RNA-seq data is from peripheral blood lymphocytes
- Sample 2: father, healthy, RNA-seq data is from peripheral blood endothelial cells
I have completed the differential gene expression analysis and obtained a list of differentially expressed genes (DEGs). Because of the lack of samples, I have to use Sample 2 as the control for Sample 1. I am trying to reduce the variance as much as possible. Therefore, I would like to find endothelial cell-specific genes and lymphocyte-specific genes and remove them from the DEGs.
Is there a database that provides a list of cell-type-specific genes? Or could you suggest an alternative solution?
Thanks!
Your experiment has two problems: 1) there are no replicates (at least from what I understand; please elaborate), and 2) there are two major confounders, namely that the cell types are completely different and the ages of the donors are completely different. This is a tough situation. I doubt that there is a reliable in silico way of coming up with good candidates, given how much confounding there is. If the budget allows, you could produce well-replicated RNA-seq from healthy lymphocytes of a child and healthy endothelial cells from an adult to get a list of DEGs that sufficiently explains the age and cell-type differences. All of the wet-lab work should be identical to that of the samples you already produced: same isolation techniques, same kits, same everything. Then see what is left after removing those DEGs from your DE results for Sample 1 vs. Sample 2. A kind of subtractive analysis, if you will.
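To make the subtractive idea concrete, here is a minimal sketch in Python. It assumes you already have two plain lists of gene symbols: the DEGs from Sample 1 vs. Sample 2, and the DEGs from the hypothetical healthy child-lymphocyte vs. adult-endothelial comparison. The function name and the gene lists are made up for illustration (`GENE_A` and `GENE_B` are placeholders, not real results), and the sketch does nothing about the missing replicates.

```python
def subtract_confounder_degs(disease_degs, confounder_degs):
    """Return the DEGs that are not also DEGs in the age/cell-type comparison.

    Gene symbols are compared case-insensitively; the order of
    `disease_degs` is preserved.
    """
    confounders = {gene.upper() for gene in confounder_degs}
    return [gene for gene in disease_degs if gene.upper() not in confounders]


# Hypothetical inputs, for illustration only.
sample_degs = ["VWF", "CD3E", "GENE_A", "PECAM1", "GENE_B"]
control_degs = ["VWF", "PECAM1", "CD3E", "CD19"]

remaining = subtract_confounder_degs(sample_degs, control_degs)
print(remaining)
```

Whatever survives the subtraction is only a candidate list: a gene can be affected by both the disease and the cell type, in which case it would be removed here even though it is relevant.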
Thank you for your response. Unfortunately, we do not have replicates either, and we do not have the possibility of producing RNA-seq data from healthy subjects. I considered getting such data from a database, but in that case there will be problems caused by differences in experimental conditions. Since the situation is already problematic, would you still suggest this approach (as better than nothing)? Thanks a lot.
You would add additional uncertainty, because any DEG could be due to the batch effect, and you do not even know whether the DEGs from your samples are reliable, given the lack of replicates. Personally (and I say this because we once had a project with suboptimal data that never got published), I would seriously think about, and discuss with your PI and colleagues, whether it is even worth investing time and effort here, given the very suboptimal data you have available. Of course, I say this carefully, since I have no insight into the project or into what data you have beyond this RNA-seq.
Thank you for your response, I will talk to my PI about the situation.
Curious as to why you chose to do RNA-seq rather than DNA-seq. While the lack of replicates is understandable (rare human samples), why did you choose to use two different cell types?
Endothelial cell database: https://vibcancer.be/software-tools/endodb
Immune cell expression database: https://dice-database.org/
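If you pull marker lists from resources like these, one cautious option is to flag the DEGs instead of deleting them, so you can see how much of the list is plausibly a cell-type effect. The sketch below assumes you have already exported the marker genes into plain Python lists yourself; it does not use any API of either database, and the function name and the example genes are for illustration only (`GENE_A` is a placeholder).

```python
def annotate_degs(degs, endothelial_markers, lymphocyte_markers):
    """Label each DEG as 'endothelial', 'lymphocyte', 'both', or 'unassigned'."""
    endo = {gene.upper() for gene in endothelial_markers}
    lymph = {gene.upper() for gene in lymphocyte_markers}
    labels = {}
    for gene in degs:
        key = gene.upper()
        if key in endo and key in lymph:
            labels[gene] = "both"
        elif key in endo:
            labels[gene] = "endothelial"
        elif key in lymph:
            labels[gene] = "lymphocyte"
        else:
            labels[gene] = "unassigned"
    return labels


# Hypothetical inputs, for illustration only.
degs = ["VWF", "CD3E", "GENE_A"]
endothelial_markers = ["VWF", "PECAM1", "CDH5"]
lymphocyte_markers = ["CD3E", "CD19"]

labels = annotate_degs(degs, endothelial_markers, lymphocyte_markers)
for gene, label in labels.items():
    print(gene, label)
```

The "unassigned" genes are not thereby disease-specific; they are simply not in the marker lists you supplied.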
Thank you for your response. Unfortunately, the study had been completed before I started working on it, so I could not have much influence on the study design.