I am new to bioinformatics and still learning. I downloaded TCGA RNA-Seq v2 data from Firehose:
COAD.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes__data.data.txt
and extracted the gene count data. I ran DESeq2 for DEG analysis on tumor vs normal samples and got more than 1,900 genes with logFC >= 2 and FDR < 0.01. Now I want to check some of the DE genes for pairwise correlation.
Here are my questions:
- Is the data I used appropriate for DEG analysis?
- When using cor or cor.test for the correlation test, which data should I use? For example, m is a matrix with 2 genes like:
         geneA  geneB
sample1     12    101
sample2     13    140
...
samplen     10    200
I am using cor(m[,1], m[,2]) to get the correlation coefficient R, but I don't think raw count data is a good choice: counts vary a lot between genes, and the samples end up far away from each other in a dot plot. I am thinking about using the FPKM data from the TCGA portal for the correlation test and the dot plot instead. Am I wrong?
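To make the question concrete, here is a minimal sketch in R of the comparison I have in mind. The toy values and the log2(count + 1) transform are my own assumptions for illustration: the numbers are not from the TCGA file, and the pseudocount of 1 is arbitrary. Only base R's cor and cor.test are used.

```r
# Toy counts (hypothetical values, not taken from the TCGA file)
m <- cbind(
  geneA = c(12, 13, 10, 25, 40),
  geneB = c(101, 140, 200, 310, 520)
)
rownames(m) <- paste0("sample", 1:5)

# Pearson correlation on the raw counts, as in my current call
r_raw <- cor(m[, 1], m[, 2])

# Pearson correlation after log2(count + 1); the transform is an assumption,
# chosen here only because it compresses the large values
log_m <- log2(m + 1)
r_log <- cor(log_m[, 1], log_m[, 2])

# Spearman correlation is rank-based, so a monotone transform leaves it unchanged
rho_raw <- cor(m[, 1], m[, 2], method = "spearman")
rho_log <- cor(log_m[, 1], log_m[, 2], method = "spearman")

# cor.test additionally reports a p-value for the chosen method
ct <- cor.test(log_m[, 1], log_m[, 2], method = "pearson")

print(c(pearson_raw = r_raw, pearson_log2 = r_log, spearman = rho_raw))
print(ct$p.value)
```

Because the Spearman coefficient depends only on ranks, it is the same for the raw and the log-transformed values, whereas the Pearson coefficient can differ between the two. That is part of why I am unsure which input and which method are appropriate here.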
Any advice would be appreciated.