Why does the R1 read only have a length of 26 with 10X library preparation?
16 months ago
yuw926 ▴ 10

I often get FASTQ files at the beginning of my workflow, and I'm wondering why R1 only has a length of 26 rather than the 98 or 150 of R2.

I understand that cell barcode + UMI = 26/28 bp with 10X library prep. However, in a paired-end setting, should the length of R1 be the same as R2?

I guess R1 is trimmed early in the pipeline, but why? I read that paired-end reads have a higher chance of mapping to the reference genome, so doesn't the trimming cause some loss of power?

Could someone provide some insight into this?

Also, would someone really set the number of sequencing cycles to 98 instead of 100? Am I missing something here?

Thanks

Illumina 10X • 1.6k views

I will address the points that were not directly answered in the answer below.

However, in a paired-end setting, should the length of R1 be the same as R2?

No. There is no requirement that the lengths of the main reads (or, for that matter, the index reads) be identical. You just need to make sure that the total number of cycles (sequence + index) does not exceed the rated capacity of the sequencing kit being used.
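The cycle budget described above comes down to simple arithmetic. Here is a minimal sketch; the function names, the 8 bp index read, and the kit sizes are made up for illustration and are not taken from any particular kit or sample sheet.

```python
def total_cycles(read1, read2, index1=0, index2=0):
    """Sum the cycles used by both main reads and both index reads."""
    return read1 + read2 + index1 + index2


def fits_kit(kit_cycles, read1, read2, index1=0, index2=0):
    """Return True if the requested run layout fits the kit's cycle budget."""
    return total_cycles(read1, read2, index1, index2) <= kit_cycles


# Hypothetical layout: 26 bp read 1, 98 bp read 2, one 8 bp index read.
layout = dict(read1=26, read2=98, index1=8)
print(total_cycles(**layout))   # 132
print(fits_kit(150, **layout))  # True  (hypothetical 150-cycle kit)
print(fits_kit(100, **layout))  # False (hypothetical 100-cycle kit)
```

Nothing in the check requires `read1` and `read2` to be equal; only the sum matters.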

I guess R1 is trimmed early in the pipeline,

No, not if the run was set up with a 26 bp read 1.
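If you want a quick sanity check on your own files, you can tally the read lengths in R1. This is only a sketch: it assumes standard four-line FASTQ records, and the function names are my own. If every read is exactly 26 (or 28) bp, that is consistent with a short read 1 having been set at run setup, although a fixed-length hard trim would look the same, so it is not proof either way.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(lines, max_reads=100_000):
    """Count sequence lengths from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    # The sequence is the 2nd line of every 4-line record.
    for seq in islice(lines, 1, 4 * max_reads, 4):
        counts[len(seq.strip())] += 1
    return counts


def fastq_length_counts(path, max_reads=100_000):
    """Open a plain or gzipped FASTQ file and tally read lengths."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle, max_reads)
```

For example, `fastq_length_counts("sample_R1.fastq.gz")` (a placeholder file name) returns a `Counter` mapping read length to the number of reads with that length.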

I read that paired-end reads have a higher chance of mapping to the reference genome, so doesn't the trimming cause some loss of power?

Not necessarily. Paired-end reads give us an anchor on the genome so the insert size can be discerned.

Would someone really set the number of sequencing cycles to 98 instead of 100?

Yes, one can. You do not have to use all the sequencing cycles provided by a sequencing kit.


Wow, these comments are very insightful and helpful.

No, not if the run was set up with a 26 bp read 1.

Oh, I see. Do you know if this is done in practice by chance?

Not necessarily. Paired-end reads give us an anchor on the genome so the insert size can be discerned.

I agree the insert size is definitely a concern. Here is the figure from the article I was referring to.


Do you know if this is done in practice by chance?

This is done deliberately for 10x samples, since 10x recommends that the samples be sequenced this way.

Since you are capturing a relatively small fragment in single-cell technologies, there is no real reason to do paired-end sequencing. So, unlike in bulk RNAseq, the insert size does not come into play.


As a general aside, paired-end sequencing has many benefits beyond insert size (which isn't really used by many RNAseq mapping tools).

Paired-end sequencing captures more information, which helps with alignment and with transcript isoform resolution. It's better than making a single-end read longer because you get more coverage over the length of a gene. We see these benefits in the paired-end design of certain single-cell UMI protocols such as Smart-seq3.

16 months ago

R1 is only the cell barcode + UMI, as you state. There is no genomic information in it; that is present in R2. So you can only map single-end reads with this library design.
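To make the R1 layout concrete, here is a minimal sketch that splits an R1 sequence into its two parts. It assumes a 16 bp cell barcode followed by a 10 bp UMI (26 bp in total) or a 12 bp UMI (28 bp in total), which matches the 26/28 in the question; check the documentation for your chemistry before relying on these numbers. The function name and the example sequence are made up.

```python
def split_r1(seq, barcode_len=16, umi_len=10):
    """Split an R1 sequence into (cell_barcode, umi).

    The default lengths are an assumption (16 bp barcode + 10 bp UMI);
    pass umi_len=12 for a 28 bp read 1.
    """
    if len(seq) < barcode_len + umi_len:
        raise ValueError("R1 is shorter than barcode + UMI")
    barcode = seq[:barcode_len]
    umi = seq[barcode_len:barcode_len + umi_len]
    return barcode, umi


# Made-up 26 bp read 1: 16 bp barcode followed by 10 bp UMI.
barcode, umi = split_r1("AAACCTGAGAAACCATTTTGGGCCCA")
print(barcode)  # AAACCTGAGAAACCAT
print(umi)      # TTTGGGCCCA
```

Neither part is aligned to the genome; they only tag the cell and the molecule that the matching R2 read came from.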
