Mean_conditions <- rowMeans(count_data_normalized2[,c('Condition1', 'Condition2','Condition3')], na.rm=TRUE)
Mean_Treatment <- rowMeans(count_data_normalized2[,c('Treatment1', 'Treatment2','Treatment3')], na.rm=TRUE)
df2 <- as.data.frame(Mean_conditions)
df1 <- as.data.frame(Mean_Treatment)
t.test(df1, df2, var.equal = TRUE)
Welch Two Sample t-test
data: Mean_Treatment and Mean_conditions
t = 0.35161, df = 24478, p-value = 0.7251
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -162.0191  232.8535
sample estimates:
mean of x mean of y
 1244.836  1209.419
Is the t-test implemented correctly? Does this result mean that there are no significant differences between the normal and cancer samples?
I have >20,000 genes (rows) and 6 columns (3 cancer replicates and 3 normal replicates). Is it a good idea to perform a t-test to find out whether there is a significant difference between the normal and cancer samples?
No. It's not a good idea to use a t-test for RNA-seq. Use DESeq2 or edgeR.
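To make that suggestion concrete, here is a minimal DESeq2 sketch. It runs on simulated data, and the object names (`counts`, `coldata`, `condition`) are placeholders I made up, not anything from the question. Note that DESeq2 expects raw integer counts and does its own normalization, so you would feed it the un-normalized count matrix rather than `count_data_normalized2`.

```r
# Minimal DESeq2 sketch on SIMULATED data (all object names are illustrative).
# Requires the Bioconductor package DESeq2.
library(DESeq2)

set.seed(1)
n_genes <- 1000

# Simulated RAW counts: 3 normal and 3 cancer replicates.
# Each gene gets its own baseline mean; the matrix is filled column by column,
# so repeating the vector of means 6 times gives every sample the same gene means.
base_mean <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(rnbinom(n_genes * 6, mu = rep(base_mean, 6), size = 5),
                 ncol = 6)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- c("Normal1", "Normal2", "Normal3",
                      "Cancer1", "Cancer2", "Cancer3")

# Sample table: one row per column of the count matrix, in the same order.
coldata <- data.frame(
  condition = factor(rep(c("normal", "cancer"), each = 3),
                     levels = c("normal", "cancer")),
  row.names = colnames(counts)
)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)

# Per-gene log2 fold changes and adjusted p-values, cancer vs. normal.
res <- results(dds, contrast = c("condition", "cancer", "normal"))
summary(res)
head(res[order(res$padj), ])
```

The point of the sketch is that you get one test per gene, with multiple-testing correction in the `padj` column, instead of a single p-value for the whole dataset.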
And how do I know whether the samples are significantly different? With which test or plot?
There is no bullet-proof metric to say with certainty that something is "different". You can combine several diagnostics to build up your narrative though. Typically you would start with some sort of dimensionality reduction such as PCA to see whether your samples separate in reduced dimensional space or whether there are any confounders that need to be adjusted for. Confounders can be batch effects, or anything associated with the individuals such as dietary status, age, disease prevalence, environmental exposure, drug consumption and combinations of those, provided that you have additional metadata for your cohort.
You can then test for differential expression (adjusting for unwanted variation if necessary, see also SVAseq and RUVSeq) and check how many DEGs you get. Do heatmaps and clustering to infer patterns between DEGs and sample groups. Then see whether the DEGs or the signatures you get from clustering are enriched for terms (REACTOME, KEGG, KO...). Many other sorts of analysis are possible, but start with these basics. You build your narrative on this.
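For the PCA step, a rough sketch in base R could look like the following. The data are simulated and every name (`norm_counts`, `group`, the choice of the 500 most variable genes) is an assumption for illustration; with real data you would start from your own normalized count matrix.

```r
# PCA sketch with base R on SIMULATED data (all names are illustrative).
set.seed(1)
n_genes <- 2000

# Stand-in for a normalized count matrix: genes in rows, 6 samples in columns.
base_mean <- 2^runif(n_genes, min = 4, max = 10)
norm_counts <- matrix(rnbinom(n_genes * 6, mu = rep(base_mean, 6), size = 5),
                      ncol = 6)
colnames(norm_counts) <- c("Normal1", "Normal2", "Normal3",
                           "Cancer1", "Cancer2", "Cancer3")
group <- factor(rep(c("normal", "cancer"), each = 3),
                levels = c("normal", "cancer"))

# Log-transform so that a few highly expressed genes do not dominate;
# the pseudocount of 1 avoids log2(0).
log_counts <- log2(norm_counts + 1)

# Keep the 500 most variable genes (an arbitrary but common choice).
gene_var <- apply(log_counts, 1, var)
top <- order(gene_var, decreasing = TRUE)[1:500]

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(log_counts[top, ]), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Plot the first two components, coloured by group (blue = normal, red = cancer).
plot(pca$x[, 1], pca$x[, 2],
     col = c("blue", "red")[as.integer(group)], pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

If the normal and cancer replicates form separate clusters along the first components, that suggests a group effect; if samples cluster by something else (for example processing date), that points to a confounder worth adjusting for.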
Put that all together and interpret it biologically. See what the literature already says about your disease entity, then integrate that with what you learned from the analysis above.
There is not going to be one single test that, if p<0.05, says "yes, it's different, here is my Nature paper". That's just not how biology and science work.
For starters on the technical parts: Basic normalization, batch correction and visualization of RNA-seq data, and many other tutorials here and online. Please go through them; those are the basics of analysis with high-throughput data. Don't skip the literature research: it is essential for separating what is already known from what is novel and what is likely just noise in your results.
Thanks a lot... appreciated