4.1 years ago
Rob
Hello friends, I want to download HTSeq count data from TCGA with TCGAbiolinks. How can I download only the coding genes? What should I add to my code? This is the code I am using:
# Install TCGAbiolinks from GitHub (only needed once), then load the packages
BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")
library(TCGAbiolinks)
library(SummarizedExperiment)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  sample.type = c("Primary Tumor"),
                  workflow.type = "HTSeq - Counts")

# Download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query, save = TRUE, save.filename = "exp.rda")

# Expression matrix: genes in rows, samples in columns
rna <- as.data.frame(SummarizedExperiment::assay(data))
write.csv(rna, "rna.csv")
Dear rhasanvandj, as far as I know, this cannot be done at the download step. You need to download the data and run your analysis, then select the genes you are interested in (here, the coding genes).
Given a list of genes, you can retrieve their biotype (protein coding, non-coding, and so on) from Ensembl with the biomaRt package.
Thank you so much, dear Hamid.
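For later readers, the filtering step suggested in the answer could look roughly like the sketch below. It is only an illustration, not code from the thread: the helper names `filter_protein_coding` and `fetch_biotypes` are hypothetical, the toy IDs and biotypes are made up for the demonstration, and it assumes the row names of the count matrix are Ensembl gene IDs (optionally with a version suffix). The biomaRt query needs network access, so it is wrapped in a function that the toy example does not call.

```r
# Hypothetical helper: keep only the rows of `counts` whose Ensembl gene ID
# is annotated as "protein_coding" in `annot`.
# `annot` is assumed to have columns ensembl_gene_id and gene_biotype.
filter_protein_coding <- function(counts, annot) {
  # Strip a version suffix such as ".13" if present (assumption about row names)
  ids <- sub("\\..*$", "", rownames(counts))
  coding_ids <- annot$ensembl_gene_id[annot$gene_biotype == "protein_coding"]
  counts[ids %in% coding_ids, , drop = FALSE]
}

# Hypothetical helper: look up biotypes in Ensembl via biomaRt.
# Requires the biomaRt package and an internet connection.
fetch_biotypes <- function(gene_ids) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "gene_biotype"),
                 filters = "ensembl_gene_id",
                 values = gene_ids,
                 mart = mart)
}

# Toy data standing in for `rna` and for the biomaRt result (illustrative values)
toy_counts <- matrix(c(10, 0, 5, 3, 7, 1), nrow = 3,
                     dimnames = list(c("ENSG00000000003.13",
                                       "ENSG00000223972.5",
                                       "ENSG00000141510.15"),
                                     c("s1", "s2")))
toy_annot <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000223972", "ENSG00000141510"),
  gene_biotype = c("protein_coding", "pseudogene", "protein_coding"),
  stringsAsFactors = FALSE
)

toy_coding <- filter_protein_coding(toy_counts, toy_annot)
print(toy_coding)
```

With the real data from the question, the same idea would be `annot <- fetch_biotypes(sub("\\..*$", "", rownames(rna)))` followed by `rna_coding <- filter_protein_coding(rna, annot)`. Filtering after download keeps the full matrix available in case the non-coding genes are needed later.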