Dear friends, the post title may not describe my problem accurately, but I do not know how else to put it. I want to download some data from the SRA database at NCBI to use as a control group for comparison with my treatment group, so that I can do downstream analysis such as differential gene expression analysis. But I do not know how to choose suitable raw data. My first question is whether I can compare single-end data with paired-end data. If the raw data sets differ in size, can I compare them directly? My other question is whether, after getting the gene expression matrix, I need to use the TMM method to remove the batch effect.
You can only remove a batch effect between different experiments if at least one group is present in both experiments. If I read your design correctly, you want to download controls to compare with your treatment group. It sounds like you don't have any group that is present in both experiments. In that case the batch difference and the biological difference are completely confounded, and no method can separate them.
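To make the overlap condition concrete, here is a minimal Python sketch that lists which groups are present in every batch. The function name `groups_shared_across_batches`, the batch labels, and the sample counts are all made up for illustration. They are assumptions, not your actual sample sheet.

```python
from collections import defaultdict


def groups_shared_across_batches(samples):
    """Return the set of groups that occur in every batch.

    `samples` is an iterable of (batch, group) pairs, one per sample.
    An empty result means no group links the batches together, so the
    batch effect cannot be told apart from the group difference.
    """
    groups_by_batch = defaultdict(set)
    for batch, group in samples:
        groups_by_batch[batch].add(group)
    if not groups_by_batch:
        return set()
    return set.intersection(*groups_by_batch.values())


# Hypothetical design: own treatment samples plus downloaded controls.
confounded_design = [
    ("own_lab", "treatment"), ("own_lab", "treatment"), ("own_lab", "treatment"),
    ("sra_project", "control"), ("sra_project", "control"),
]

# Hypothetical alternative: the own experiment also includes a control sample.
linked_design = confounded_design + [("own_lab", "control")]

print(groups_shared_across_batches(confounded_design))
print(groups_shared_across_batches(linked_design))
```

With the first layout the result is empty, which is the situation I described above. With the second layout the shared control group gives a batch-correction method something to anchor on. This only checks the simple case where one group must appear in every batch. It is not a full test of whether a design matrix is estimable.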
Maybe I need to make this clearer. I have three amniotic epithelial cell (AEC) samples, and I want to find the differentially expressed genes between AECs and hESCs. However, I do not have hESC data, so I have to download some from the SRA database. I did find some hESC samples from different projects, but the PCA result is not good: hESC samples from different projects do not cluster together. I would like to know how to deal with this issue. Thanks a lot!