22 months ago
star
I have downloaded several samples from 5 studies (5 batches).
Example of my count table:
S_rep1_batch1 S_rep2_batch1 S_rep1_batch2 S_rep2_batch2 S_rep3_batch2 . . .
Gene1 34 54 65 76 67
Gene2 87 77 90 35 19
Gene3 47 67 70 85 99
.
.
I would like to run a differential expression analysis (DEA) between samples and also compare them by gene expression profile (heatmap and clustering on a subset of genes). To remove batch effects, I used RUVr with k=13 from the RUVSeq R package.
RUVr <- RUVSeq::RUVr(df, genes, k=13, res)
For the DEA between samples, I used counts(RUVr) to build the DGEList. For the gene expression profile and the TPM values, I used normCounts(RUVr).
Questions:
- Is normCounts(RUVr) the correct input for calculating TPM?
- If I want the average gene expression (average TPM for each gene), can I average the same sample across different batches (e.g. S_batch1.batch2_averageTPM)? Or is it better to calculate it for each batch separately (e.g. S_batch1_averageTPM; S_batch2_averageTPM)?
- Or is it better to take the raw count table, calculate TPM from it, and then average the TPM for each gene within each batch separately?
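For reference, here is a minimal base-R sketch of the TPM and per-batch averaging steps the questions refer to. It uses the toy counts from the table above; the gene lengths, the helper name counts_to_tpm, and the assumption that the batch can be parsed from the end of each column name are illustrative and not part of my actual data. It does not settle whether raw or RUV-normalized counts are the better input.

```r
# Toy counts from the example table (genes in rows, samples in columns)
counts <- matrix(
  c(34, 54, 65, 76, 67,
    87, 77, 90, 35, 19,
    47, 67, 70, 85, 99),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Gene1", "Gene2", "Gene3"),
    c("S_rep1_batch1", "S_rep2_batch1",
      "S_rep1_batch2", "S_rep2_batch2", "S_rep3_batch2")
  )
)

# ASSUMPTION: gene lengths in bp, made up for illustration
gene_length <- c(Gene1 = 1000, Gene2 = 2000, Gene3 = 1500)

# Hypothetical helper: counts -> TPM
counts_to_tpm <- function(counts, gene_length) {
  # Reads per kilobase: each row is divided by its own gene length
  rate <- counts / (gene_length / 1000)
  # Scale every sample (column) so that it sums to one million
  t(t(rate) / colSums(rate)) * 1e6
}

tpm <- counts_to_tpm(counts, gene_length)

# ASSUMPTION: the batch label is the suffix of the column name
batch <- sub(".*_(batch[0-9]+)$", "\\1", colnames(tpm))

# Average TPM per gene within each batch separately
mean_tpm_by_batch <- sapply(
  split(seq_len(ncol(tpm)), batch),
  function(idx) rowMeans(tpm[, idx, drop = FALSE])
)

# Average TPM per gene across all batches pooled
mean_tpm_pooled <- rowMeans(tpm)
```

Because TPM is rescaled within each sample, every column of tpm sums to one million whichever count matrix goes in. The per-batch and pooled averages only differ in which columns are grouped.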
Does this setup even allow removal of any batch effect? You cannot just collect arbitrary samples from GEO and expect them to behave as if you had produced them under the same conditions. You would need replicates of every group you are testing in each of those batches. Is that the case?
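One quick way to check this is to cross-tabulate group against batch. The sketch below is only an illustration: it assumes sample names follow the group_rep_batch pattern from the example table, and the "T" samples are hypothetical, added to show what a confounded design looks like.

```r
# Sample names: the "S" names follow the example table,
# the "T" names are HYPOTHETICAL and only illustrate a second group
samples <- c("S_rep1_batch1", "S_rep2_batch1",
             "S_rep1_batch2", "S_rep2_batch2", "S_rep3_batch2",
             "T_rep1_batch2", "T_rep2_batch2")

# ASSUMPTION: names are structured as <group>_<replicate>_<batch>
group <- sub("_rep.*$", "", samples)
batch <- sub("^.*_", "", samples)

# Number of replicates per group and batch
design_counts <- unclass(table(group, batch))
print(design_counts)

# Groups that are absent from at least one batch
incomplete_groups <- rownames(design_counts)[
  apply(design_counts == 0, 1, any)
]

# Batches containing fewer than two groups: here batch and group
# cannot be separated for any comparison involving that batch
single_group_batches <- colnames(design_counts)[
  colSums(design_counts > 0) < 2
]
```

Any zero cell in design_counts means that group has no sample in that batch, so the group difference and the batch difference are at least partly confounded there.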