Hello everyone,
This is my first analysis of RNA-seq data. I am using the TCGAbiolinks package, starting with the "TCGA-BRCA" project and samples of healthy tissue and primary tumors.
I am downloading the data as HTSeq-FPKM-UQ and storing it in the variable "my_data". After downloading, I assign each sample to its group: the TP vector (dataSmTP in the code below) stores the IDs of the primary tumor samples, and the NT vector (dataSmNT) stores the IDs of the normal samples.
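The grouping step can be sketched in base R. This is a minimal sketch, assuming the sample IDs are full TCGA barcodes, in which characters 14-15 hold the sample-type code ("01" for primary tumor, "11" for solid tissue normal). The barcodes below are made up for illustration only.

```r
# Hypothetical barcodes, for illustration only (not real samples).
barcodes <- c("TCGA-AA-0001-01A-11R-0001-07",
              "TCGA-AA-0001-11A-11R-0001-07",
              "TCGA-AA-0002-01A-11R-0001-07",
              "TCGA-AA-0003-11B-11R-0001-07")

# Characters 14-15 of a TCGA barcode encode the sample type.
sample_type <- substr(barcodes, 14, 15)

dataSmTP <- barcodes[sample_type == "01"]  # primary tumor
dataSmNT <- barcodes[sample_type == "11"]  # solid tissue normal
```

TCGAbiolinks also provides TCGAquery_SampleTypes(barcode, typesample) for the same selection, e.g. with typesample = "TP" or "NT".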
My question is whether the following steps are adequate:
dataPrep <- TCGAanalyze_Preprocessing(object = my_data, cor.cut = 0.6)
dataFilt <- TCGAanalyze_Filtering(tabDF = dataPrep,
                                  method = "quantile",
                                  qnt.cut = 0.25)
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, dataSmNT],
                            mat2 = dataFilt[, dataSmTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")
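As far as I understand it, the quantile filter with qnt.cut = 0.25 drops the genes whose mean expression across samples is not above the 25% quantile of all gene means. A rough base-R sketch of that idea on a toy matrix follows; it reflects my assumption about the behavior, not the package's actual code, and the matrix values are made up.

```r
# Toy expression matrix: 8 genes (rows) x 4 samples (columns), made-up values.
toy <- matrix(1:32, nrow = 8,
              dimnames = list(paste0("gene", 1:8), paste0("s", 1:4)))

qnt_cut <- 0.25

# Mean expression of each gene across all samples.
gene_means <- rowMeans(toy)

# Threshold: the chosen quantile of the gene means.
threshold <- as.numeric(quantile(gene_means, qnt_cut))

# Keep only the genes whose mean is above the threshold.
toy_filt <- toy[gene_means > threshold, , drop = FALSE]
```

On this toy matrix the two genes with the lowest means are removed and the other six are kept.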
After these commands, I get an output containing the logFC, p-value, FDR, and other values. I am asking because I am not performing any data normalization myself: I am using the "HTSeq-FPKM-UQ" table, and I read that:
Fragments Per Kilobase of transcript per Million mapped reads upper quartile (FPKM-UQ) is an RNA-Seq-based expression normalization method. The FPKM-UQ is based on a modified version of the FPKM normalization method.
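To make the definition concrete, here is a minimal sketch of the calculation as I understand it from the GDC description: FPKM-UQ = RC_g * 10^9 / (RC_g75 * L), where RC_g is the read count of the gene, RC_g75 is the 75th percentile read count in the sample, and L is the gene length in base pairs. The counts and lengths below are made up, and the exact set of genes used for the percentile in the real pipeline is something I have not verified.

```r
# Made-up read counts for five genes in one sample.
counts <- c(geneA = 100, geneB = 200, geneC = 300, geneD = 400, geneE = 500)

# Made-up gene lengths in base pairs, in the same order as counts.
lengths_bp <- c(1000, 2000, 1000, 4000, 5000)

# 75th percentile read count in this sample (the upper quartile).
rc_g75 <- as.numeric(quantile(counts, 0.75))

# FPKM-UQ: scale each count by the upper quartile and the gene length.
fpkm_uq <- counts * 1e9 / (rc_g75 * lengths_bp)
```

The values are already scaled per sample and per gene length, which is why I am unsure whether a further normalization step is needed or even appropriate.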
In addition, I would like to confirm that, with this approach, upregulated transcripts (FC greater than 1) are the ones increased in the control (Normal) group. Is that right?
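One way to check the direction empirically would be to compare the group means of a single gene with the sign of its reported logFC. A minimal sketch on toy numbers follows; the values are made up, and the Tumor-over-Normal ratio is just one possible convention, not necessarily the one TCGAanalyze_DEA uses.

```r
# Made-up expression values for one gene in each group.
normal <- c(10, 12, 11, 9)
tumor  <- c(40, 44, 38, 46)

# log2 ratio of group means, computed as Tumor over Normal.
log2fc_tumor_vs_normal <- log2(mean(tumor) / mean(normal))

# Swapping the groups only flips the sign.
log2fc_normal_vs_tumor <- log2(mean(normal) / mean(tumor))
```

If a gene that is clearly higher in the tumor samples shows a positive logFC in dataDEGs, the contrast is Tumor versus Normal; if it shows a negative logFC, the contrast is Normal versus Tumor.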
Thanks in advance!