I'm trying to run STAR alignment on 10x data (I tried cellranger, but I need a more customizable tool), and I'm a bit confused about the different fastq files and which ones to merge together. All my samples consist of .gz archives that contain multiple files, which come in triplets such as *_S1_L001_I1_001.fastq, *_S1_L001_R1_001.fastq and *_S1_L001_R2_001.fastq. I understand that R1 and R2 probably refer to the Illumina paired-end reads, but what is I1?
More concretely, which files should be given as arguments to --readFilesIn, and in which order? From the manual and some examples, I gathered that R1 and R2 have to be supplied together, e.g. --readFilesIn *_R1.fastq *_R2.fastq. If I want to align all the reads, do I loop over this command, taking all the R1 and R2 files but ignoring the I1 files?
Thanks, this is a much better explanation than the one on the 10x website! It's tricky that R1 is apparently the barcode read and R2 the cDNA read, so one needs to provide R2 before R1. Also, my read lengths differ from the ones quoted by Dave Tang: the cDNA reads seem to be 92 nt and the barcode reads 29 nt. I just assumed the longer one (from R2) is the cDNA.
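For anyone with the same file layout, here is a minimal sketch of how the file lists could be assembled, assuming the standard Illumina naming shown in the question (sample, S number, lane, read, 001). The function name star_read_files_in and the example file names are made up for illustration. It drops the I1 files (I1 is the sample index read, which is not needed for alignment), pairs R1 and R2 per lane, and puts the cDNA read (R2) first and the barcode read (R1) second, with lanes joined by commas.

```python
import os
import re
from collections import defaultdict

# Assumed naming: <sample>_S<n>_L<lane>_<read>_001.fastq[.gz]
FASTQ_RE = re.compile(
    r"^(?P<sample>.+_S\d+)_L(?P<lane>\d{3})_(?P<read>R1|R2|I1)_001\.fastq(?:\.gz)?$"
)

def star_read_files_in(paths):
    """Map each sample to (cDNA_list, barcode_list) for --readFilesIn.

    Hypothetical helper: R2 (cDNA) goes first, R1 (barcode/UMI) second,
    and I1 (sample index) files are skipped.
    """
    lanes = defaultdict(dict)
    for path in paths:
        match = FASTQ_RE.match(os.path.basename(path))
        if match is None or match["read"] == "I1":
            continue  # not a recognised FASTQ name, or an index read
        lanes[(match["sample"], match["lane"])][match["read"]] = path

    cdna = defaultdict(list)
    barcode = defaultdict(list)
    for (sample, lane) in sorted(lanes):
        pair = lanes[(sample, lane)]
        if "R1" not in pair or "R2" not in pair:
            raise ValueError(f"lane {lane} of {sample} is missing R1 or R2")
        # Same lane order in both lists so the reads stay in sync.
        cdna[sample].append(pair["R2"])
        barcode[sample].append(pair["R1"])

    return {s: (",".join(cdna[s]), ",".join(barcode[s])) for s in cdna}

if __name__ == "__main__":
    example = [
        "pbmc_S1_L001_I1_001.fastq.gz",
        "pbmc_S1_L001_R1_001.fastq.gz",
        "pbmc_S1_L001_R2_001.fastq.gz",
        "pbmc_S1_L002_R1_001.fastq.gz",
        "pbmc_S1_L002_R2_001.fastq.gz",
    ]
    for sample, (r2_list, r1_list) in star_read_files_in(example).items():
        print(sample, "--readFilesIn", r2_list, r1_list)
```

With this, there is one STAR run per sample rather than a loop over every lane, since --readFilesIn accepts comma-separated lists for each mate. If the files are still gzipped, --readFilesCommand zcat lets STAR read them directly.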