Question

How should cellranger count be run when there are multiple runs for each sample?

3.2 years ago

WhatAmIDoing • 0

I am trying to analyze single-cell RNA-seq data from publicly available, published datasets using cellranger. I am getting the SRA files from NCBI GEO; the dataset currently under consideration is GSE168388. It consists of 5 samples: Healthy donor 1, Healthy donor 2, Covid 1, Covid 2 and Covid 3. Each sample has 4 SRA files, one per run, so there are 20 SRA files in total (5 samples x 4 runs each).

I understand from the cellranger website and tutorials that cellranger count needs to be run separately for every sample, and that the cellranger count outputs can then be merged using cellranger aggr. My problem is this: for a single sample, for example Healthy donor 1, there are 4 SRA files (one per run): SRR13870501, SRR13870500, SRR13870499 and SRR13870498. I therefore have 4 sets of fastq files for this sample. When I want to run cellranger count for the Healthy donor 1 sample, do I have to run it 4 times, once per run? Or can I run it once, specifying all 4 sets of fastq files in the --sample argument of the cellranger count command (as shown below)?

So basically, can my cellranger count command look something like this or not?

    cellranger count --id=healthy_donor1 \
        --fastqs=/path/to/fastqs \
        --sample=SRR13870498,SRR13870499,SRR13870500,SRR13870501 \
        --transcriptome=path/to/refdata-gex-GRCh38-2020-A

I would really appreciate any and all help on this matter, cheers :)

count cellranger RNA-seq single-cell • 1.1k views

updated 3.2 years ago by swbarnes2 14k • written 3.2 years ago by WhatAmIDoing • 0

score 0 · Answer 1 · 2021-09-27

If you did one library prep and sequenced that one library multiple times, it is probably easiest to combine the fastqs (make sure to give the combined files names that exactly match the expected Illumina naming scheme) and give those to cellranger. I am not at all sure that cellranger will willingly combine four different "sample names" into one. But it hardly hurts to try!
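The "combine the fastqs" step could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested recipe for this dataset: the helper names `illumina_name` and `combine_runs` are made up for the example, and it assumes each run was dumped as `<run>_1.fastq.gz` (barcode/UMI read) and `<run>_2.fastq.gz` (cDNA read), which you should verify from the read lengths before relying on it. The output names follow the `<sample>_S1_L001_R1_001.fastq.gz` pattern that cellranger looks for.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def illumina_name(sample: str, read: str, lane: int = 1) -> str:
    """Build a file name in the Illumina-style pattern cellranger looks for."""
    return f"{sample}_S1_L{lane:03d}_{read}_001.fastq.gz"


def combine_runs(run_dir: Path, out_dir: Path, sample: str, runs: list[str]) -> list[Path]:
    """Concatenate per-run FASTQs into one R1 and one R2 file for `sample`.

    ASSUMPTION: each run exists as <run>_1.fastq.gz and <run>_2.fastq.gz in
    run_dir, with _1 being the barcode/UMI read and _2 the cDNA read.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for suffix, read in (("_1", "R1"), ("_2", "R2")):
        target = out_dir / illumina_name(sample, read)
        with open(target, "wb") as out:
            # Use the same run order for R1 and R2 so read pairs stay in sync.
            # Gzip files can be concatenated byte-for-byte and remain valid gzip.
            for run in runs:
                with open(run_dir / f"{run}{suffix}.fastq.gz", "rb") as src:
                    shutil.copyfileobj(src, out)
        written.append(target)
    return written
```

With the combined files written to some output directory, the command from the question would then take a single sample name, e.g. `--fastqs=<that directory>` and `--sample=healthy_donor1`, instead of the four SRR accessions.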