Hi All,
I am a little confused about how to process 10x single-cell Multiome sequencing files from a single donor. For example, the ENCODE experiment ENCSR889JIE contains two complete sets of sequencing files from one donor. Taking the snRNA-seq files (see the "File details" tab) as an example, there are two sets, S31 and S7:
S31:
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S31_L001_R1_001.fastq.gz
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S31_L001_R2_001.fastq.gz
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S31_L002_R1_001.fastq.gz
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S31_L002_R2_001.fastq.gz
S7:
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S7_L001_R1_001.fastq.gz
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S7_L001_R2_001.fastq.gz
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S7_L002_R1_001.fastq.gz
linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S7_L002_R2_001.fastq.gz
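The eight names above follow the Illumina `<sample>_S<n>_L<lane>_<read>_001.fastq.gz` convention, so they can be tabulated mechanically. Below is a minimal Python sketch (3.9+ for `str.removesuffix`). The helper name `group_by_set` is made up for illustration. It peels the fixed fields off the right of each name and groups lanes and reads under each S index. This makes it easy to confirm that S7 and S31 are two complete sets with the same layout.

```python
from collections import defaultdict

def group_by_set(names):
    """Group Illumina-style FASTQ names by S index (hypothetical helper).

    Assumes names of the form <sample>_S<n>_L<lane>_<read>_001.fastq.gz,
    where the sample prefix may itself contain underscores.
    """
    sets = defaultdict(list)
    for name in names:
        # Drop the extension, then take the four fixed fields from the right.
        tokens = name.removesuffix(".fastq.gz").split("_")
        s_index, lane, read, _chunk = tokens[-4:]
        sets[s_index].append((lane, read))
    return {s: sorted(pairs) for s, pairs in sets.items()}

prefix = "linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG"
names = [
    f"{prefix}_{s}_{lane}_{read}_001.fastq.gz"
    for s in ("S31", "S7")
    for lane in ("L001", "L002")
    for read in ("R1", "R2")
]
for s_index, pairs in group_by_set(names).items():
    print(s_index, pairs)
```

Both S31 and S7 come out with the same four (lane, read) pairs, so each S index is a full set on its own rather than half of one.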
The question is whether I should:
- Process them separately, generating two outputs (e.g. run Cell Ranger ARC twice, once on the S7 FASTQs and once on the S31 FASTQs), or
- Combine ("merge") them into a single run so that I end up with one set of outputs for this donor.
I know how to set up libraries.csv when there's only a single set of FASTQs (e.g., ENCSR000ULP, with separate folders for RNA and ATAC). But in this case, if I want to merge S7 and S31, how should I structure my folders and/or modify libraries.csv so that Cell Ranger ARC knows it's all one sample, just split across multiple lanes?
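For the "combine" route, one way to lay out the folders is to stage both sets side by side in one directory per modality. Here is a minimal sketch. It assumes the downloaded FASTQs sit in one flat directory, and it uses symlinks instead of copies to save disk space. The helper `stage_fastqs` and any folder names passed to it are hypothetical.

```python
import os
from pathlib import Path

def stage_fastqs(src_dir, dest_dir, sample_prefix):
    """Symlink every FASTQ starting with sample_prefix into dest_dir (hypothetical helper).

    Files keep their original names; since S7 and S31 differ in the S field,
    both sets can live in the same folder without name clashes.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    linked = []
    pattern = f"{sample_prefix}_S*_L*_*_001.fastq.gz"
    for fq in sorted(Path(src_dir).glob(pattern)):
        link = dest / fq.name
        # lexists also catches dangling links, so re-running is safe.
        if not os.path.lexists(link):
            os.symlink(fq.resolve(), link)
        linked.append(link.name)
    return linked
```

The same call would be made once for the RNA prefix into an RNA folder and once for the ATAC prefix into an ATAC folder.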
Below is my usual command for a single set:
cellranger-arc count --id=${project_name} \
--reference=${reference_dir}/refdata-cellranger-arc-GRCh38-2024-A \
--libraries=${work_dir}/libraries.csv \
--localcores=24 \
--localmem=180
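For the "process separately" option, the same command has to be issued twice. Each run needs its own `--id` and its own libraries.csv pointing only at that set's folders. Below is a minimal sketch that builds the argument list using only the flags in the command above. The `/refs` and `/work/...` paths and the `ENCSR889JIE_S7`-style IDs are hypothetical.

```python
import shlex

def build_arc_count_cmd(project_name, reference_dir, work_dir,
                        cores=24, mem_gb=180):
    """Return the cellranger-arc count command above as an argv list."""
    return [
        "cellranger-arc", "count",
        f"--id={project_name}",
        f"--reference={reference_dir}/refdata-cellranger-arc-GRCh38-2024-A",
        f"--libraries={work_dir}/libraries.csv",
        f"--localcores={cores}",
        f"--localmem={mem_gb}",
    ]

# "Process separately": one run per S set, each with a distinct --id and
# a work dir whose libraries.csv references only that set's FASTQ folders.
for s_set in ("S7", "S31"):
    cmd = build_arc_count_cmd(f"ENCSR889JIE_{s_set}", "/refs", f"/work/{s_set}")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would actually launch the run.
```

The "combine" option is then a single call with one `--id` and one libraries.csv.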
And a typical libraries.csv:
fastqs,sample,library_type
/ENCSR000ULP/RNA,linlab2_041122_snRNA-CGCGCACTTA-AGAATACAGG,Gene Expression
/ENCSR000ULP/ATAC,linlab2_041122_scATAC-AATCACTA-CCGAGAAC-GTAGTGCG-TGCTCTGT,Chromatin Accessibility
The sample column is the FASTQ filename prefix, i.e. the string before the _S index.
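This rule can be checked directly. Below is a minimal sketch, assuming the Illumina naming convention and the prefix rule described in the line above. The function name `sample_column` is hypothetical. It shows that the S7 and S31 files yield exactly the same `sample` value. So, assuming prefix matching as described, the `sample` column on its own cannot tell the two sets apart. Only the folder a row points to can.

```python
import re

# Illumina naming: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_[RI]\d_001\.fastq\.gz$")

def sample_column(fastq_name):
    """Return the libraries.csv `sample` value for a FASTQ name (hypothetical helper)."""
    m = FASTQ_NAME.match(fastq_name)
    if m is None:
        raise ValueError(f"not an Illumina-style FASTQ name: {fastq_name}")
    return m.group("sample")

s7 = "linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S7_L001_R1_001.fastq.gz"
s31 = "linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG_S31_L002_R2_001.fastq.gz"
print(sample_column(s7) == sample_column(s31))  # True
```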
Now, back to ENCSR889JIE, which contains two sets of sequencing files (two S indices). If I want to "merge" the two sets and process them together, what should I change?
- Do I just put both the S7 and S31 FASTQs into the same RNA (or ATAC) directory, so that libraries.csv has two rows, one pointing to RNA and one pointing to ATAC?
- Or do I need two rows (S7 and S31) for RNA and another two rows (S7 and S31) for ATAC in libraries.csv?
- Or is there a recommended best practice for multiple S indices from the same donor?
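The first two options above differ only in the rows written to libraries.csv. Below is a minimal sketch that renders both with Python's `csv` module. The folder names (`/ENCSR889JIE/RNA`, `/ENCSR889JIE/RNA_S7`, ...) are hypothetical. The ATAC prefix for ENCSR889JIE is not listed in this post, so it is left as a placeholder. Whether Cell Ranger ARC accepts several rows with the same `library_type` should be confirmed against its libraries CSV documentation. The sketch only shows what each layout would look like.

```python
import csv
import io

HEADER = ["fastqs", "sample", "library_type"]

def libraries_csv(rows):
    """Render (fastqs, sample, library_type) rows as libraries.csv text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()

RNA = "linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG"
ATAC = "<ATAC sample prefix>"  # placeholder: not given in this post

# Option 1: S7 and S31 share one folder per modality, one row per library type.
shared = [
    ("/ENCSR889JIE/RNA", RNA, "Gene Expression"),
    ("/ENCSR889JIE/ATAC", ATAC, "Chromatin Accessibility"),
]

# Option 2: one (hypothetical) folder per S set, one row per set per library type.
per_set = [
    (f"/ENCSR889JIE/{mod}_{s}", prefix, lib)
    for mod, prefix, lib in [("RNA", RNA, "Gene Expression"),
                             ("ATAC", ATAC, "Chromatin Accessibility")]
    for s in ("S7", "S31")
]

print(libraries_csv(shared))
print(libraries_csv(per_set))
```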
Thank you very much!
It is a little confusing, since the same sample can't be included in two rows when demultiplexing the data with bcl-convert (the S* number is assigned based on the sample's row position in the sample sheet). So I'm not sure why there are two S* numbers for one index pair. The only explanation would be that it is a technical sequencing replicate, where the sample was run on two flowcells.
Hi GenoMax, thank you very much! So, if these are just technical replicates, does that mean I can reasonably "merge" S7 and S31 and then process them together? Specifically, I would put the S7 and S31 RNA sequencing files in the same RNA folder, do the same for the ATAC sequencing files, and build a libraries.csv that ignores the S indices (one row for the RNA folder and one for the ATAC folder).
Would Cell Ranger ARC then produce only one output?
Or do you mean that this kind of technical replicate suggests something went wrong, and I should consult the experimentalists and drop one replicate?
Thank you very much!
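GenoMax's reasoning above can be made concrete. Assume, as described, that the S number reflects the order in which samples appear in one run's sample sheet, starting at S1. Then a single sample can get only one S number per demultiplexing run. In the minimal sketch below, the two sample sheets, their lengths, and the `other_*` sample names are invented for illustration. It shows how the same library could come out as S7 in one run and S31 in another.

```python
def assign_s_numbers(sample_sheet_rows):
    """Map each Sample_ID to an S number for one demultiplexing run.

    Assumes numbering follows the order in which samples first appear
    in the sample sheet, starting at S1 (the convention described above).
    """
    numbers = {}
    for sample_id in sample_sheet_rows:
        if sample_id not in numbers:
            numbers[sample_id] = f"S{len(numbers) + 1}"
    return numbers

target = "linlab2_041122_snRNA-CGCGGTAGGT-CAACATCCTG"
# Hypothetical sheets: 6 other libraries precede the target in run A, 30 in run B.
run_a = [f"other_{i}" for i in range(6)] + [target]
run_b = [f"other_{i}" for i in range(30)] + [target]
print(assign_s_numbers(run_a)[target], assign_s_numbers(run_b)[target])  # S7 S31
```

Within one sheet, a repeated Sample_ID keeps its first number. So two different S numbers for one index pair point to two separate demultiplexing runs.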