Question

scRNA - Illumina naming convention unclear

1 day ago

kristina • 0

Dear all, maybe someone can help me on this matter:

I downloaded scRNA data from here: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA984257&o=experiment_s%3Aa%253Bacc_s%253Bacc_s%253Bacc_s%3Bacc_s%3Aa in bulk with the SRA toolkit, using the following command: cat SRR_Acc_List.txt | xargs -I{} bin/fastq-dump -I --split-files {}
As one can see, the scRNA data have 3 runs per sample.
Each run has four reads per spot, so downloading the data with the SRA command gives me four files per run; so far, all good.

Now I would like to start the nf-core scRNA pipeline (https://nf-co.re/scrnaseq/3.0.0/docs/usage/), and for this I need to write the sample sheet following a specific naming convention. The BioProject lists the library as "paired reads", and this is where the confusion starts. How do I write the sample sheet following the usual Illumina naming convention? My assumption is the following:

S = always the same for a given sample
L001 for run 1, L002 for run 2 and L003 for run 3 --> so the last number changes per spot read?
My main confusion is with R1 and R2, since I have 3 runs per experiment. I would really appreciate some help with this. Many thanks

updated 22 hours ago by GenoMax • written 1 day ago by kristina

score 0 · Answer 1 · 2024-12-21

S refers to the location of a sample in a particular row in the samplesheet used for demultiplexing. It does not have any other significance.

L001 does not refer to a run. It refers to the lane the sample file came from. Samples are generally run as a pool across multiple lanes, so the data for a sample come from the same library even if you have lane-specific files. cellranger (and other software) should understand what this means.

For single cell (10x) R1 contains UMI and cell barcodes. R2 is the RNA read.
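
To make the pieces of the name concrete, here is a minimal Python sketch of the usual Illumina/bcl2fastq pattern, SampleName_S<n>_L<lane>_<read>_001.fastq.gz. The helper names (parse_illumina_name, build_illumina_name) and the sample name "sampleA" are hypothetical and only for illustration; they are not part of cellranger or nf-core.

```python
import re
from typing import NamedTuple

# Usual bcl2fastq-style layout: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
# <read> is R1/R2 for the sequence reads and I1/I2 for index reads.
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$"
)


class FastqName(NamedTuple):
    sample: str        # sample name as given in the demultiplexing samplesheet
    sample_index: int  # the S number: row position in that samplesheet
    lane: int          # the L number: flow cell lane, not the run
    read: str          # R1, R2, I1 or I2


def parse_illumina_name(filename: str) -> FastqName:
    """Split an Illumina-style FASTQ file name into its fields."""
    match = ILLUMINA_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an Illumina-style FASTQ name: {filename}")
    return FastqName(
        match["sample"],
        int(match["sample_index"]),
        int(match["lane"]),
        match["read"],
    )


def build_illumina_name(sample: str, sample_index: int, lane: int, read: str) -> str:
    """Assemble a file name from its fields (hypothetical helper)."""
    return f"{sample}_S{sample_index}_L{lane:03d}_{read}_001.fastq.gz"


if __name__ == "__main__":
    # "sampleA" is a placeholder, not a sample from the BioProject.
    for lane in (1, 2):
        for read in ("R1", "R2"):
            print(build_illumina_name("sampleA", 1, lane, read))
    print(parse_illumina_name("sampleA_S1_L002_R2_001.fastq.gz"))
```

For a 10x library, the R1 file of each lane would hold the cell barcode and UMI and the matching R2 file the RNA read. Which of the four fastq-dump output files corresponds to R1 and which to R2 is not stated in this thread and would need to be checked before renaming anything.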