Hello everyone, I am going to do a differential gene expression (DEG) analysis on bulk RNA-seq data. The samples are NAFLD samples downloaded from the NCBI Gene Expression Omnibus (GEO) (link to the dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135251). When I tried to download the data, I noticed that many count matrices are provided (see the attached photo). I have several questions about this:
1) May I ask if it is normal to have so many count matrices there?
2) If yes, which count matrix should I use for the downstream DEG analysis with DESeq2? Or should I use all of the count matrices for the analysis?
Each file contains one column with the counts for one sample. You can load them all into R and combine them into a single matrix of raw counts. To do this, download the files via Select All, which gives you a tarball (.tar), and unpack it with `tar xf that.tar`. Then use this snippet in R:
```r
# list all files from the tarball (unpack it first in bash with: tar xf tarball.tar)
# adjust the directory to wherever you unpacked the files
listed <- list.files("/Users/atpoint/Downloads/data/", pattern = "^GSM", full.names = TRUE)
listed <- grep("\\.txt\\.gz$", listed, value = TRUE)

# load every single file: column 1 (gene ID) becomes the row names, column 2 is the count
raw.counts <- lapply(listed, function(x) {
  r <- read.delim(x, header = FALSE, row.names = 1)
  # name the column after this file (not listed[1]), e.g. "GSM3998167_017-Ann-Daly_S1"
  colnames(r) <- gsub("\\.counts.*", "", basename(x))
  r
})

# combine into one genes x samples table (cbind matches rows by position)
raw.counts <- do.call(cbind, raw.counts)

# first three genes of the first sample
raw.counts[1:3, 1, drop = FALSE]
#>                 GSM3998167_017-Ann-Daly_S1
#> ENSG00000000003                       2565
#> ENSG00000000005                          0
#> ENSG00000000419                        605
```
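`cbind` pairs rows purely by position, so the snippet above silently assumes that every file lists the same genes in the same order. Below is a minimal base-R sketch of a safer combine step, assuming each list element is a one-column data.frame with gene IDs as row names (which is what `read.delim(..., row.names = 1)` produces). It also drops rows whose ID starts with `__`, which is where htseq-count writes its summary counters; I have not checked whether these particular files contain such rows, and the filter does nothing if they don't. `combine_counts` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper (not from any package): combine a list of one-column count
# tables into a genes x samples integer matrix, stopping if the gene IDs disagree.
combine_counts <- function(tables) {
  ids <- rownames(tables[[1]])
  for (i in seq_along(tables)) {
    if (!identical(rownames(tables[[i]]), ids)) {
      stop("gene IDs of table ", i, " differ from table 1 (content or order)")
    }
  }
  m <- as.matrix(do.call(cbind, tables))
  storage.mode(m) <- "integer"  # DESeq2 expects whole-number raw counts
  # drop htseq-count style summary rows such as "__no_feature", if present
  m[!startsWith(rownames(m), "__"), , drop = FALSE]
}

# toy tables standing in for the list returned by lapply() above
genes <- c("geneA", "geneB", "__no_feature")
toy <- list(
  data.frame(sample1 = c(5, 0, 9), row.names = genes),
  data.frame(sample2 = c(7, 2, 4), row.names = genes)
)
counts <- combine_counts(toy)
```

On the real data, call `combine_counts()` on the list returned by `lapply()` in place of the `do.call(cbind, ...)` line; a mismatch then fails loudly instead of producing a shuffled matrix.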
You can then use this for DE analysis with DESeq2, edgeR, limma, etc.
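As a sketch of the DESeq2 step, assuming DESeq2 (Bioconductor) is installed: `DESeqDataSetFromMatrix()` takes the count matrix plus a sample table whose rows correspond, in order, to the matrix columns. The counts, the GSM-style column names, and the `condition` labels below are all hypothetical placeholders; for GSE135251 the real grouping has to come from the GEO sample annotations. The GSM accession at the start of each column name is the key for matching columns to that annotation.

```r
library(DESeq2)

# toy stand-in for the combined matrix (genes x samples, integer counts);
# the column names are hypothetical and only mimic the "GSM..._sample" pattern
toy_counts <- matrix(
  c(10L, 0L, 33L, 12L, 1L, 40L, 50L, 2L, 8L, 61L, 0L, 11L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("GSM0000001_ctrl1", "GSM0000002_ctrl2",
                    "GSM0000003_nafld1", "GSM0000004_nafld2"))
)

# sample table: one row per column of the count matrix, in the same order
coldata <- data.frame(
  gsm = sub("_.*", "", colnames(toy_counts)),  # GSM accession, to join with GEO metadata
  condition = factor(c("control", "control", "NAFLD", "NAFLD"),
                     levels = c("control", "NAFLD")),  # placeholder labels; first level = reference
  row.names = colnames(toy_counts),
  stringsAsFactors = FALSE
)
stopifnot(identical(rownames(coldata), colnames(toy_counts)))

dds <- DESeqDataSetFromMatrix(countData = toy_counts, colData = coldata,
                              design = ~ condition)
# on the real data, continue with: dds <- DESeq(dds); res <- results(dds)
```

Setting the factor levels explicitly makes the reference group independent of the locale's sort order.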
I guess that each counts.txt.gz file is just one sample, so there are 216 counts.txt.gz files in total, and each one can be used as one column of the count matrix for the downstream DE analysis.
Am I right?
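One way to check this on the unpacked files is to count them and confirm that each has exactly two tab-separated columns (gene ID and count). This is a minimal sketch: `check_count_files` is a hypothetical helper, the file-name pattern assumes the `GSM*.txt.gz` naming used in the snippet above, and the toy directory only exists so the example runs on its own.

```r
# Hypothetical helper: count the GSM*.txt.gz files in `dir` and check that each
# one holds a single sample (gene ID + one count column).
check_count_files <- function(dir, expected_n) {
  files <- list.files(dir, pattern = "^GSM.*\\.txt\\.gz$", full.names = TRUE)
  # read only the first few lines of each file; gzip is decompressed transparently
  ncols <- vapply(files, function(f) ncol(read.delim(f, header = FALSE, nrows = 5)),
                  integer(1), USE.NAMES = FALSE)
  list(n_files = length(files),
       n_files_ok = length(files) == expected_n,
       one_sample_per_file = all(ncols == 2L))  # column 1 = gene ID, column 2 = count
}

# toy directory with two gzipped two-column files, standing in for the real download
tmp <- file.path(tempdir(), "toy_counts")
dir.create(tmp, showWarnings = FALSE)
for (s in c("GSM0000001_s1", "GSM0000002_s2")) {
  con <- gzfile(file.path(tmp, paste0(s, ".counts.txt.gz")), "w")
  writeLines(c("geneA\t5", "geneB\t0"), con)
  close(con)
}
res <- check_count_files(tmp, expected_n = 2)
```

On the real download this would be `check_count_files("/Users/atpoint/Downloads/data/", expected_n = 216)`, with the path adjusted to wherever the tarball was unpacked.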
This is a great way of extracting a count matrix for RNA-seq.
Thanks!
It's specific to this dataset and cannot be applied in general, since GEO is not uniform in what is supplied in the supplementary files.