I often get FASTQ files at the beginning of my workflow, but I'm wondering why R1 only has a length of 26 bp instead of matching the 98 or 150 bp of R2.
I get that cell barcode + UMI = 26/28 bp comes from the 10x library prep. However, in a paired-end setting, should the length of R1 be the same as R2?
I guess R1 is trimmed early in the pipeline, but why? I read that paired-end reads have a higher chance of mapping to the reference genome, so doesn't the trimming cause some loss of power?
Could someone provide some insight into this?
Also, would someone really set the number of sequencing cycles to 98 instead of 100? Am I missing something here?
Thanks
I will address the points that were not directly answered in the answer below.
No. There is no requirement that the lengths of the main reads (or, for that matter, the index reads) be identical. You just need to make sure that the total number of cycles (sequence + index) does not exceed the rated capacity of the sequencing kit being used.
No, not if the run was set up with a 26 bp read 1. In that case R1 was sequenced at that length, not trimmed afterwards.
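As a rough illustration of that cycle budget, here is a minimal Python sketch. The function names and the kit capacities passed in are hypothetical and chosen only for the example; check the actual limit for your kit with the vendor.

```python
def total_cycles(read1, read2, index1=0, index2=0):
    """Sum the cycles used by both main reads and both index reads."""
    return read1 + read2 + index1 + index2


def fits_kit(kit_capacity, read1, read2, index1=0, index2=0):
    """Return True if the run layout fits within the kit's cycle capacity.

    kit_capacity is an assumed value supplied by the caller, not a
    number taken from any vendor documentation.
    """
    return total_cycles(read1, read2, index1, index2) <= kit_capacity


# Asymmetric layout like the one in the question: 26 bp R1, 98 bp R2,
# plus an 8 bp index read (the index length is an illustrative assumption).
used = total_cycles(read1=26, read2=98, index1=8)
```

Nothing in this check requires `read1` and `read2` to be equal; only the sum matters.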
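One way to check this on your own files is to tabulate read lengths. The sketch below is only a heuristic and assumes standard 4-line FASTQ records; the function name is made up for this example. If every R1 read is exactly 26 bp, that is consistent with a 26-cycle read 1, whereas adapter or quality trimming usually leaves a spread of lengths (a fixed-length hard trim would also give uniform lengths, so this is not proof).

```python
from collections import Counter
from itertools import islice


def read_length_counts(fastq_lines, max_reads=None):
    """Count sequence lengths in an iterable of FASTQ lines.

    Assumes 4 lines per record with the sequence on the 2nd line.
    max_reads optionally limits how many records are inspected.
    """
    # Take every 4th line starting at the 2nd one: the sequence lines.
    seq_lines = islice(fastq_lines, 1, None, 4)
    if max_reads is not None:
        seq_lines = islice(seq_lines, max_reads)
    counts = Counter()
    for line in seq_lines:
        counts[len(line.rstrip("\r\n"))] += 1
    return counts


# Tiny in-memory example with two records of different lengths.
example = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
           "@r2\n", "ACGTAC\n", "+\n", "IIIIII\n"]
lengths = read_length_counts(example)
```

For a real file you could pass an open handle instead of a list, for example `gzip.open(path, "rt")` for a gzipped FASTQ, and set `max_reads` so that only the first chunk of the file is read.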
Not necessarily. Paired-end reads give us an anchor on the genome so the insert size can be discerned.
Yes, one can. You do not have to use all of the sequencing cycles provided by a sequencing kit.
Wow, these comments are very insightful and helpful.
Oh, I see. Do you know if this is done in practice by chance?
I agree the insert size is definitely a concern. Here is the article with the figure I was referring to.
This is done deliberately for 10x samples since 10x recommends that the samples be sequenced like this.
Since you are capturing a relatively small fragment in single-cell technologies like this, there is little reason to do full paired-end sequencing. So the insert size does not come into play the way it does in bulk RNAseq.
As a general aside, there are many benefits to paired-end sequencing aside from insert size (which isn't really used in many RNAseq mapping tools).
Paired-end sequencing captures more information, which can help with alignment and with transcript isoform resolution. It's better than making a single-end read longer because you get more coverage over the length of a gene. We see these benefits in the paired-end design of certain single-cell UMI protocols like Smart-seq3.
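To make the 26/28 bp R1 layout from the question concrete, here is a minimal Python sketch of how such a read breaks down. It assumes a 16 bp cell barcode followed directly by the UMI (10 bp for a 26 bp read, 12 bp for a 28 bp read); the function name is hypothetical, and the layout should be confirmed against the chemistry version actually used.

```python
def split_read1(seq, barcode_len=16, umi_len=10):
    """Split an R1 sequence into (cell_barcode, umi).

    The default lengths are assumptions matching a 26 bp read 1;
    pass umi_len=12 for a 28 bp read 1.
    """
    needed = barcode_len + umi_len
    if len(seq) < needed:
        raise ValueError(f"read is {len(seq)} bp, expected at least {needed} bp")
    barcode = seq[:barcode_len]
    umi = seq[barcode_len:needed]
    return barcode, umi


# Synthetic 26 bp read: 16 bp of A (barcode) then 10 bp of C (UMI).
barcode, umi = split_read1("A" * 16 + "C" * 10)
```

Under this layout R1 carries no transcript sequence at all, so the cDNA is read only from R2, which is why R1 can be so much shorter.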