Understanding STARsolo --soloStrand values and 10X scRNA-seq library structure
20 days ago
mk ▴ 310

Background:

These GitHub docs for STARsolo specify --soloStrand=Forward for all 3' 10X scRNA-seq libraries and --soloStrand=Reverse for all 5' 10X scRNA-seq libraries.

However, Alex Dobin posted here that for paired-end mapping (where the adaptor read, R1, is over-sequenced so that it runs past the barcode and UMI into the cDNA), 5' 10X scRNA-seq libraries need --soloStrand=Forward.

Intuitively, since all mRNA is anti-sense, all cDNA is sense, therefore the first read of any library should be anti-sense, aka --soloStrand=Reverse.

Question:

So I'm wondering if anyone has a simple explanation for why --soloStrand=Forward would be used for 3' SE and 5' PE, and --soloStrand=Reverse would be used for 5' SE?

Initial research:

I pulled the following figures from the 10X protocols, but embarrassingly, I'm still not sure how to answer the question above:

[Two figures from the 10X protocols; images not preserved]

20 days ago
dsull ★ 7.0k

R1 contains the cell barcodes and UMIs (and, if it is sequenced long enough, some cDNA). R2 contains only cDNA.

In PE mode (10x 5'), you need to set Forward strandedness because the sequence after BC+UMI+TSO on R1 is in the same 5’->3’ orientation as the original RNA, and you supply the files in the order R1 R2. When using STARsolo in PE mode, you are effectively telling it: “the first input file I give you contains BC+UMI plus the stuff after it that I want to align, and the second input file contains the mate that I also want to try to align”.

However, in SE mode you need to supply the files in the order R2 R1 because of how STARsolo is structured for SE mode (the first file supplied must contain the reads you are aligning to the genome index, and the second file supplied must contain the technical sequences). As before, R1 contains the barcodes and UMIs, but here you want to ignore whatever follows BC+UMI+TSO on R1 and align only R2. In the 10x 5’ chemistry, R2 is on the opposite strand from the RNA, so you have to set the Reverse option (in the 10x 3’ chemistry, R2 is on the same strand as the RNA, hence Forward).

The tl;dr is that STARsolo is structured differently for PE and SE mode, with the major difference being that in SE mode, the first input file supplied MUST be the cDNA you map to the index and the second file supplied MUST be the barcode+UMI file.
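The rule of thumb above can be written down as a small lookup. This is a minimal sketch covering only the three cases discussed in this thread; the file names `R1.fastq.gz`/`R2.fastq.gz` and the helper names are hypothetical, and every other option a real STARsolo run needs (genome directory, `--soloType`, whitelist, barcode geometry, and the barcode-mate settings for PE mode) is deliberately left out.

```python
from typing import NamedTuple, Tuple


class SoloSettings(NamedTuple):
    read_files_in: Tuple[str, ...]  # file order for --readFilesIn
    solo_strand: str                # value for --soloStrand


def solo_settings(chemistry: str, mode: str,
                  r1: str = "R1.fastq.gz", r2: str = "R2.fastq.gz") -> SoloSettings:
    """Hypothetical helper: file order and --soloStrand for the cases in this thread."""
    if mode == "SE":
        # SE mode: the read to align (R2) goes first, the barcode+UMI read (R1) last.
        # R2 is sense in 10x 3' and antisense in 10x 5'.
        strand = {"3'": "Forward", "5'": "Reverse"}[chemistry]
        return SoloSettings((r2, r1), strand)
    if mode == "PE" and chemistry == "5'":
        # PE mode (10x 5'): mates in natural order; the cDNA after BC+UMI+TSO on R1 is sense.
        return SoloSettings((r1, r2), "Forward")
    raise ValueError(f"case not covered in this sketch: {chemistry} {mode}")


def star_args(s: SoloSettings) -> list:
    """Only the two options discussed here; a real command needs many more."""
    return ["--readFilesIn", *s.read_files_in, "--soloStrand", s.solo_strand]


for chem, mode in [("3'", "SE"), ("5'", "SE"), ("5'", "PE")]:
    print(chem, mode, " ".join(star_args(solo_settings(chem, mode))))
```

The point the sketch makes explicit is that file order and strand go together: in SE mode the aligned read is always R2, so only the chemistry changes the strand.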

More details about 10x 5’ vs. 10x 3’:

In the 10x 5’ chemistry, R2 is on the reverse strand, so you need to align the reverse complement of that read. As a toy example, let 5'-CCGT-3' be the barcode, 5'-GAGA-3' the UMI, 5'-CC-3' the TSO, and 5'-TGGAC-3' the cDNA (the sequence we want to align). Your R1 read looks like:

5’-CCGT-GAGA-CC-TGGAC-3’ ^

whereas your R2 read (if we hypothetically sequence through the insert) looks like:

^ 5’-GTCCA-GG-TCTC-ACGG-3’
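The toy 5’ example can be checked by building R1 from its parts and reverse-complementing it to get a read-through R2. A minimal sketch using only the made-up sequences above (they are not real 10x barcode or TSO sequences); `revcomp` is a hypothetical helper name.

```python
# Toy sequences from the example above (not real 10x barcodes/TSO).
BARCODE = "CCGT"
UMI = "GAGA"
TSO = "CC"
CDNA = "TGGAC"  # sense (mRNA-orientation) sequence we want to align

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# 10x 5': R1 reads BC + UMI + TSO and then continues into the cDNA (sense).
r1 = BARCODE + UMI + TSO + CDNA
# If R2 sequenced through the whole insert, it would be the reverse complement of R1.
r2 = revcomp(r1)

print(r1)  # CCGTGAGACCTGGAC
print(r2)  # GTCCAGGTCTCACGG
# The cDNA part of R1 is sense; R2 starts with its reverse complement (antisense).
print(r1.endswith(CDNA), r2.startswith(revcomp(CDNA)))  # True True
```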

For 10x 3’ chemistry, the R1 read looks like:

5’-CCGT-GAGA-TTTTTTTT-GTCCA-3’

The R2 read looks like:

5’-TGGAC-AAAAAAAA-TCTC-ACGG-3’

This is simply because, if you were to draw out the sequences above, the RT primer (the string of T’s) sits on the “left” side of the cDNA in the 10x 3’ chemistry, whereas in the 10x 5’ chemistry it sits on the “right” side. You can think of it this way: the primer works such that the 5'-cDNA-3' (i.e. 5'-TGGAC-3') MUST be followed by a polyA tail (AAAAAAAA).

In the 3' chemistry it is easy to see where the polyA is, because the RT "string of T's" primer is part of the sequenced fragment; for the 5' chemistry I put ^ above to mark where the priming would occur. It's a bit unintuitive because of how RT followed by template switching works, but I'm happy to answer any questions.
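The 3’ layout can be checked the same way, and each read's cDNA segment can then be classified against the mRNA using the strand definitions behind `--soloStrand` (Forward: read on the same strand as the original RNA; Reverse: read on the opposite strand). A minimal sketch with the toy sequences from this answer; the transcript (`TGGAC` plus an 8-nt polyA tail) and the helper names are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


BARCODE, UMI, CDNA = "CCGT", "GAGA", "TGGAC"
POLY_T = "T" * 8
MRNA = CDNA + "A" * 8  # toy transcript: sense cDNA followed by its polyA tail

# 10x 3': R1 = BC + UMI + oligo(dT), then the cDNA on the antisense strand.
r1_3p = BARCODE + UMI + POLY_T + revcomp(CDNA)
r2_3p = revcomp(r1_3p)  # read-through R2: sense cDNA, then polyA, UMI, BC
print(r1_3p)  # CCGTGAGATTTTTTTTGTCCA
print(r2_3p)  # TGGACAAAAAAAATCTCACGG


def strand_of(cdna_part: str, mrna: str) -> str:
    """'Forward' if the segment matches the mRNA, 'Reverse' if it matches its reverse complement."""
    if cdna_part in mrna:
        return "Forward"
    if cdna_part in revcomp(mrna):
        return "Reverse"
    raise ValueError("segment matches neither strand")


# cDNA segments only (technical bases stripped) for both chemistries.
segments = {
    ("5'", "R1"): "TGGAC",  # after BC+UMI+TSO in the 5' toy R1
    ("5'", "R2"): "GTCCA",
    ("3'", "R1"): r1_3p[-len(CDNA):],
    ("3'", "R2"): r2_3p[:len(CDNA)],
}
for (chem, mate), seg in segments.items():
    print(chem, mate, strand_of(seg, MRNA))
```

Reading off the result for the read that actually gets aligned: 5’ PE aligns the cDNA on R1 (Forward), 5’ SE aligns R2 (Reverse), and 3’ SE aligns R2 (Forward).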

I just edited my post above to make it more intuitive+coherent (it was 3AM in Los Angeles last night when I wrote my initial post).

Thanks for the thoughtful answer, dsull. This question is going to sound really dumb, but just to confirm: regardless of what is entered for --soloStrand or any other parameter, will the STARsolo output BAM always report each aligned sequence in the orientation of the plus reference strand (5'->3')?

That is correct.
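This follows from the SAM/BAM format rather than from STARsolo itself: when an alignment is on the reverse strand, FLAG bit 0x10 is set and SEQ is stored reverse-complemented, so SEQ always reads 5'->3' along the plus reference strand. A minimal sketch of that convention using the toy 5’ R2 cDNA segment from above, assuming a gene on the plus strand; the helper names are hypothetical and this does not read real BAM files.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
REVERSE_FLAG = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stored_seq(read: str, on_minus_strand: bool) -> tuple:
    """Return (FLAG strand bit, SEQ) as a SAM record would store the read."""
    if on_minus_strand:
        return REVERSE_FLAG, revcomp(read)
    return 0, read


def read_as_sequenced(flag: int, seq: str) -> str:
    """Undo the reverse complement to recover the read as it came off the sequencer."""
    return revcomp(seq) if flag & REVERSE_FLAG else seq


# Toy 5' R2 cDNA segment: antisense to a plus-strand gene, so it aligns to the minus strand.
r2 = "GTCCA"
flag, seq = stored_seq(r2, on_minus_strand=True)
print(flag, seq)                     # 16 TGGAC
print(read_as_sequenced(flag, seq))  # GTCCA
```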
