Hi guys,
This is the first time I have had to deal with a paired-end single-cell RNA-Seq experiment.
After demultiplexing I have the following files (shown for one sample only, for simplicity):
My question is: since I have to cat the files, should I cat them by lane (e.g. cat SampleA_L001_R1_001.fastq.gz SampleA_L002_R1_001.fastq.gz > ....) or by read (e.g. cat SampleA_L001_R1_001.fastq.gz SampleA_L001_R2_001.fastq.gz > ....)? I think it is irrelevant. I have to run cellranger count for 10x v3. I know that there are other tools, but in my lab people prefer Cellranger.
R1 is the technical read (cell barcode and UMI) and R2 is the cDNA, so you have to cat the R1s and the R2s separately, not like cat R1 R2 > catted. CellRanger is smart though and can take the per-lane files directly afaik, so you only have to cat if you use software other than CellRanger.
If you do need to cat them for Cellranger, you should end up with one file for R1 and one file for R2, so run two cat commands, one for each read.
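To make the "two cat commands" concrete, here is a minimal sketch. It builds toy files in a scratch directory so it can be run anywhere; the file contents and the SampleA_merged_* output names are made up for illustration, and those output names do not follow the Illumina naming pattern, so they are only meant for tools other than Cellranger. Gzipped files can be joined with plain cat, no need to decompress first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with toy data so the sketch is runnable as-is.
workdir=$(mktemp -d)
cd "$workdir"

# Toy one-record FASTQ files standing in for the real per-lane files
# (hypothetical content; real files come from demultiplexing).
for lane in L001 L002; do
  for read in R1 R2; do
    printf '@read_%s_%s\nACGT\n+\nIIII\n' "$lane" "$read" \
      | gzip > "SampleA_${lane}_${read}_001.fastq.gz"
  done
done

# One cat per read: join the lanes, never R1 with R2.
# Keep the lane order identical for R1 and R2 so read pairs stay in sync.
for read in R1 R2; do
  cat "SampleA_L001_${read}_001.fastq.gz" "SampleA_L002_${read}_001.fastq.gz" \
    > "SampleA_merged_${read}.fastq.gz"   # hypothetical output name
done

# Sanity check: both merged files should have the same number of lines.
r1_lines=$(gzip -dc SampleA_merged_R1.fastq.gz | wc -l | tr -d ' ')
r2_lines=$(gzip -dc SampleA_merged_R2.fastq.gz | wc -l | tr -d ' ')
echo "R1 lines: ${r1_lines}, R2 lines: ${r2_lines}"
```

On real data the same line-count comparison is a cheap way to confirm that R1 and R2 still contain the same number of reads after merging.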
Cellranger is very picky about the file naming format. You do not have to cat the files, and if you do and the resulting name no longer follows exactly the same format, Cellranger will not work.
Catting R1 and R2 together is almost never a good idea, and this case is no exception: do not cat them together.
So basically the cat is done across lanes, independently for R1 and R2. Thank you very much!