Hi,
I am running Ubuntu 20.04 LTS, currently on a slow MacBook Air. To work faster, I recently ordered this server: HP ProLiant DL360p G8 (8 x 2.5" bays), 2x Intel Xeon E5-2680 2.7 GHz 8-core, 16 GB DDR3 registered memory, HP P420i 512 MB RAID controller, 2.4 TB (4x 600 GB 10K SAS SED new HDDs), 2x 750 W PSU (renewed).
I'm just starting out with Monocle 3. I eventually want to be able to use all the available tools efficiently, but I am starting with Monocle 3 because it offers pseudotime trajectory analysis.
I want to reproduce a finding from the original Monocle 3 paper, "The single-cell transcriptional landscape of mammalian organogenesis", specifically the "Resolving cellular trajectories in myogenesis" figure.
I was going to begin with the FASTQ files, which, after some real struggle, I figured out how to download in bulk using an 'awk' command.
However, I was told by a mentor that working with expression matrix files would make my life easier.
So my first question is: on the NCBI GEO accession page, is the "series_matrix.txt.gz" file the same thing as the expression matrix file?
My second question is about the 'Load the data' step in "Getting started" on the Monocle 3 page:
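One way I thought of checking this myself is to look at what the file contains once the header is skipped. This is only a sketch: as far as I understand, GEO series matrix files prefix their metadata lines with `!`, and `show_table_rows` is a helper name I made up for illustration.

```shell
# show_table_rows FILE.gz [N]
# Print the first N lines (default 5) of a gzipped file that are neither
# "!"-prefixed metadata lines nor blank lines. The .gz file is left untouched.
show_table_rows() {
  gzip -dc "$1" | grep -v -e '^!' -e '^$' | head -n "${2:-5}"
}
```

For example, `show_table_rows GSE119945_series_matrix.txt.gz 10` would show whether any gene-by-cell rows are in there at all.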
# Load the data
expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))
Would I first download this series matrix file to use as expression_matrix:
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE119nnn/GSE119945/matrix/GSE119945_series_matrix.txt.gz
Then would the cell_annotate file be cell_metadata?
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE119nnn/GSE119945/suppl/GSE119945%5Fcell%5Fannotate%2Ecsv%2Egz
And lastly (this one is more obvious, I think), gene_annotation would be the gene_annotate file:
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE119nnn/GSE119945/suppl/GSE119945%5Fgene%5Fannotate%2Ecsv%2Egz
So I would download these files with wget, then extract them with
gzip -d filename
and then pass their file paths into these calls?
expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))
After that, I would continue with the remaining "Getting started" steps on the Monocle 3 page.
Could someone please share what you do when you start analyzing data in Monocle 3 that is not 10x Genomics data?
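To make the download and extraction part concrete, this is roughly what I have in mind. It is only a sketch: the URLs are the three from above, but the local file names and the helper names (`fetch`, `unpack`, `download_all`) are my own choices, not something from the Monocle 3 or GEO documentation.

```shell
# Base URL of the GEO series used in this post.
GEO="https://ftp.ncbi.nlm.nih.gov/geo/series/GSE119nnn/GSE119945"

# fetch URL OUTFILE: download one file under an explicit local name (-O),
# so the percent-encoded characters in the URL do not end up in the file name.
fetch() {
  wget -q -O "$2" "$1"
}

# unpack FILE.gz: decompress in place; gzip -d replaces FILE.gz with FILE.
unpack() {
  gzip -d "$1"
}

# download_all: fetch the three files and decompress each one.
download_all() {
  fetch "$GEO/matrix/GSE119945_series_matrix.txt.gz" GSE119945_series_matrix.txt.gz
  fetch "$GEO/suppl/GSE119945%5Fcell%5Fannotate%2Ecsv%2Egz" GSE119945_cell_annotate.csv.gz
  fetch "$GEO/suppl/GSE119945%5Fgene%5Fannotate%2Ecsv%2Egz" GSE119945_gene_annotate.csv.gz
  for f in GSE119945_series_matrix.txt.gz \
           GSE119945_cell_annotate.csv.gz \
           GSE119945_gene_annotate.csv.gz; do
    unpack "$f"
  done
}
```

Running `download_all` in an empty directory should leave one .txt file and two .csv files there. Since these are plain text and CSV files rather than .rds files, I am not sure the readRDS(url(...)) calls are the right way to load them.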
Very Respectfully, Pratik