10X #3' RNA seq - How is strandedness determined?
4 months ago
Ashley • 0

Hello! Long time reader, first time poster.

I have used 10X Genomics 3' single-cell RNA sequencing in a project where our gene of interest also has an antisense transcript. This means we can't confidently assign a read to our sense gene of interest or to its antisense transcript. 10X claims their gene expression libraries are strand-specific; however, they don't explain how they preserve strandedness. Since there are multiple cycles of cDNA PCR amplification, I would expect the original strand information to be lost. Yet the BAM files will call each read as + or - (here and here).

scRNA-seq • 838 views

Please do not delete posts that have received feedback. If the feedback helped solve the problem, vote/respond accordingly. If you solved your problem by yourself, add an answer outlining your steps so others in your position benefit from your effort.

4 months ago
rfran010 ★ 1.3k

I'm not super familiar with 10x library prep or the differences between versions, but based on the protocol here: v3.1 chemistry dual index

[Image: diagram from the linked 10x v3.1 dual index protocol]

You can see that the READ1 adapter is added during the reverse transcription step. This means that it is added specifically to the 3' end of the transcript. This specificity allows retention of strandedness. So during sequencing, the READ1 primer will anneal to the READ1 adapter, therefore all READ1 reads will sequence transcripts from the 3' end.

For example, a library that loses strandedness might be reverse transcribed with random primers, and so lacks the ability to add an adapter sequence to a defined end. In these libraries, you would then ligate adapters to the double-stranded cDNA. Here, if the READ1 adapter is designed to ligate to the 5' end of the cDNA, it may ligate to the 5' end of the sense OR the antisense cDNA strand, so READ1 would sequence either strand and would not retain strandedness. However, there are adaptations that retain the strand information even when ligating to cDNA, for example destroying one of the strands via dU labeling and enzyme digestion.
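The contrast can be written out as a toy model. This is only a sketch under simplifying assumptions: the adapter and transcript sequences are made up, barcodes and UMIs are left out, and the function names are hypothetical. It lists which insert orientations can end up next to the adapter in each design.

```python
# Toy model: which insert orientation sits next to the adapter?
# All sequences are hypothetical and chosen only for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


ADAPTER = "GATTACA"        # stand-in for the READ1 adapter
RNA = "ATGGCCTTCAAAAAA"    # transcript, 5'->3', in DNA letters, short poly(A) tail


def tagged_at_rt(rna: str) -> set:
    """Adapter is part of the RT primer, so it can only be attached to
    first-strand cDNA, which is the reverse complement of the RNA."""
    return {ADAPTER + revcomp(rna)}


def ligated_after_second_strand(rna: str) -> set:
    """Adapter is ligated to the 5' end of double-stranded cDNA, so it can
    land on the sense strand or on the antisense strand."""
    return {ADAPTER + rna, ADAPTER + revcomp(rna)}


def insert_orientations(molecules: set, rna: str) -> set:
    """Orientation of the insert that follows the adapter on each molecule."""
    orientations = set()
    for molecule in molecules:
        insert = molecule[len(ADAPTER):]
        orientations.add("sense" if insert == rna else "antisense")
    return orientations


print(insert_orientations(tagged_at_rt(RNA), RNA))
print(insert_orientations(ligated_after_second_strand(RNA), RNA))
```

With the adapter on the RT primer there is exactly one possible orientation, so the read direction tells you the originating strand. With ligation after second-strand synthesis both orientations occur and the information is gone.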


Thank you for the reply! I agree that the Read1 adapter could be used to make a stranded library prep, but when I look closely at the full library prep protocol, I still don't see how they preserve the strand information. There are no clear adaptations to the cDNA synthesis or TruSeq steps that allow for directionality preservation.

Further, according to 10X, in nuclei preps half of all intronic reads are "antisense" (31.6% sense exonic, 32.2% sense intronic, 4.6% antisense exonic, 31.4% antisense intronic). I don't understand how this would happen if it were truly a stranded library prep. Of all intronic reads, half are sense and half are antisense? Yet there is a huge difference between the fractions of intronic and exonic antisense reads. About 15% of exonic reads are antisense, compared to 50% of intronic reads. Why?
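For reference, the shares can be recomputed directly from the four percentages quoted above. This is just the arithmetic; the dictionary layout and function name are my own. The exonic share comes out nearer 13% than 15%, and the intronic share is just under 50%.

```python
# Percent of total reads per category, as quoted from 10X above.
READ_PERCENT = {
    ("sense", "exonic"): 31.6,
    ("sense", "intronic"): 32.2,
    ("antisense", "exonic"): 4.6,
    ("antisense", "intronic"): 31.4,
}


def antisense_share(region: str) -> float:
    """Antisense reads as a percentage of all reads in the given region."""
    antisense = READ_PERCENT[("antisense", region)]
    sense = READ_PERCENT[("sense", region)]
    return 100 * antisense / (antisense + sense)


total_antisense = sum(v for (strand, _), v in READ_PERCENT.items()
                      if strand == "antisense")

print(f"exonic antisense share:   {antisense_share('exonic'):.1f}%")    # 12.7%
print(f"intronic antisense share: {antisense_share('intronic'):.1f}%")  # 49.4%
print(f"antisense, all reads:     {total_antisense:.1f}%")              # 36.0%
```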

To me, this would indicate that it is not a stranded library prep. However, they chose to exclude antisense reads from gene counts, which tells me that they believe (1) it really is a stranded library prep and (2) these antisense reads do not originate from real RNAs (then where are these 36% of total reads coming from?).

In response to the question of whether libraries are strand-specific, they also state, "[Yes, however] there is a possibility of legitimate antisense transcripts but they are unlikely to follow the same exon structure as the sense transcripts." I am really struggling to understand how exon structure is relevant to a stranded library prep. They also say here that antisense expression is about "1% of mapped reads." (Perhaps specifically in total cells and excluding intronic reads?)

I'm really trying to figure out what I'm missing here.


> I agree that the Read1 adapter could be used to make a stranded library prep, but when I look closely at the full library prep protocol, I still don't see how they preserve the strand information. There are no clear adaptations to the cDNA synthesis or TruSeq steps that allow for directionality preservation

So they use a different method to add the adapter? What method do they use? To be clear, in the example above, adding the adapter as part of the polyA primer for RT is the only step necessary to preserve strand information. Strand information will be retained in all subsequent steps (as long as the adapter used for the other side is different, which is the case in this example: 2 different adapters).

It can definitely be confusing. The rest of your post mostly comes down to real biology. They mention in one of your linked posts that nuclei have increased levels of intronic and antisense reads. This makes a lot of sense because nuclei are enriched for nascent (freshly transcribed) transcripts. When a gene's transcript is nascent, it resembles pre-mRNA and still contains intronic regions. The introns are then spliced out, and the "stable" mRNA has only exons and is usually exported to the cytoplasm. Since the mature mRNA is generally more stable, it makes up the majority of coding transcripts when considering the whole cell. However, considering just nuclei, you expect the nascent transcripts to be relatively more abundant, so you will get more intronic reads.

Antisense transcripts, however, refer to actual RNA Pol II activity on the opposite strand. This can be an antisense gene, which will have different exons and in effect is sense transcription for that other gene. Another possibility is true antisense RNA Pol II activity, where RNA Pol II has the opportunity to bind and transcribe the opposite strand; this could be regulatory or relatively stochastic.

> I don't understand how this would happen if it was truly a stranded library prep.

So, a stranded library prep doesn't mean that it only retains and sequences sense transcripts, but that each read can be traced back to its originating strand. Due to biology, real transcripts are present that are antisense to your reference annotation.
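To make that concrete: the + or - in the BAM is only the alignment strand (SAM FLAG bit 0x10). "Sense" or "antisense" appears only once you compare that with the strand of the annotated gene. A minimal sketch, assuming that for 3' libraries the aligned cDNA read has the same orientation as the transcript (that is my understanding of how these libraries are counted, so check it against the 10x documentation); the function names are hypothetical.

```python
REVERSE_STRAND_FLAG = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def alignment_strand(flag: int) -> str:
    """Strand the read aligned to, taken from the SAM FLAG field."""
    return "-" if flag & REVERSE_STRAND_FLAG else "+"


def classify_read(flag: int, gene_strand: str) -> str:
    """Label a read relative to an annotated gene.

    Assumption (not taken from the thread): in a 3' library the aligned
    cDNA read points the same way as the transcript, so a matching
    strand means 'sense'.
    """
    return "sense" if alignment_strand(flag) == gene_strand else "antisense"


# A forward-aligned read over a '+' gene is sense; the same read over a
# '-' gene (for example an overlapping antisense gene) is antisense to it.
print(classify_read(0, "+"))   # sense
print(classify_read(0, "-"))   # antisense
print(classify_read(16, "-"))  # sense
```

So a read that is "antisense" to your gene of interest may simply be a sense read of the overlapping antisense gene.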

> Of all intronic reads, half are sense and half are antisense? Yet, there is a huge difference with the fraction of intronic versus exonic antisense reads. About 15% of exonic reads are antisense, compared to 50% of intronic reads --Why?

The specifics are probably biology dependent, but intronic reads more likely reflect nascent transcripts, so they will be enriched alongside other nascent transcripts such as antisense RNAs. Exonic reads, in contrast, are dominated by stable, mature transcripts, so you expect nascent transcripts to be a smaller fraction of exonic reads. To reiterate: by definition, intronic reads derive from nascent transcripts, which are a small proportion of captured RNA, so you can expect enrichment of other nascent RNAs among them. Exonic reads can derive from nascent or mature RNA and are more likely to derive from mature RNA, especially given the polyA capture technique.

It's not clear to me what they mean in the Q&A by "legitimate antisense transcripts," but I think they may mean the presence of an antisense gene. I also don't know why they mention exon structure, since you could ignore exon structure, just count reads from the same strand, and exclude antisense reads altogether.

> They also say here that antisense expression is about "1% of mapped reads." (Perhaps specifically in total cells and excluding intronic reads?)

The exact percentage will depend on biology, e.g. cell type. But probably not excluding intronic reads. If it is whole cell, then introns will also be a very small percentage, which will vary also depending on biology/cell type.

I didn't read the links in too much detail so sorry if anything is redundant.

Did you have a chance to read the "Proposed mechanisms for presence of intronic and antisense reads in gene expression data" link? It might be more helpful than me, but I'm happy to keep trying if you think I can help.


The point is that the “Read1” black bar (in the diagram above) is there before the second strand synthesis. So after PCR, it will still be in the correct orientation. For example, say its sequence is ATG: you’d have 5’-ATG...-3’ and 3’-TAC...-5’ products, so there’s still strand info there. A 5’-ATG-3’ primer can only prime from the latter strand, so we can do strand-specific priming. You can see how strand-specific priming is used here: https://cdn.10xgenomics.com/image/upload/v1660261286/support-documents/CG000108_AssayConfiguration_SC3v2.pdf

In unstranded protocols, oftentimes, the adapter is ligated after the second strand synthesis (at which point, strand-specificity is already lost: you have two cDNA strands and don’t know what’s forward and what’s reverse).

In any case, if this doesn’t make sense, follow the diagram in the link above to write out some sequences, and you’ll understand why it’s strand-specific (if you draw out the sequences and are not convinced, post the sequences here and I’ll tell you where you’re off).
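If it helps, the same exercise can be done in a few lines of code. This is a simplified sketch, not the actual 10x construct: the adapter, template-switch and transcript sequences are made up, barcode and UMI are omitted, and "PCR" is modeled as nothing more than copying every strand into its reverse complement.

```python
# Toy model: does the adapter orientation survive PCR?
# All sequences are hypothetical and chosen only for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


READ1 = "CTACACGA"        # stand-in for the Read1 adapter on the RT primer
TSO = "AAGCAGTG"          # stand-in for the sequence added at the RNA 5' end
RNA = "ATGGCCTTCAAAAAA"   # transcript, 5'->3', in DNA letters, short poly(A) tail

# First-strand cDNA: adapter, then the copied transcript, then the copied TSO.
first_strand = READ1 + revcomp(RNA) + revcomp(TSO)


def pcr(pool: set, cycles: int) -> set:
    """Each cycle adds the reverse complement of every strand in the pool."""
    for _ in range(cycles):
        pool = pool | {revcomp(strand) for strand in pool}
    return pool


def reads_from_read1_primer(pool: set, length: int) -> set:
    """A primer identical to READ1 can only anneal where a strand carries
    revcomp(READ1) at its 3' end; the read is what gets synthesized
    downstream of the primer."""
    reads = set()
    for strand in pool:
        if strand.endswith(revcomp(READ1)):
            product = revcomp(strand)
            reads.add(product[len(READ1):len(READ1) + length])
    return reads


amplified = pcr({first_strand}, cycles=5)
print(len(amplified))
print(reads_from_read1_primer(amplified, length=10))
```

However many cycles you run, the pool only ever holds the two complementary strands, and only one of them can be primed by the Read1 primer, so every read has the same orientation relative to the original transcript.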
