Question

Stranded cDNA libraries and genome mapping

4 months ago

BioStar55555 ▴ 10

I am trying to wrap my head around these concepts, but I can't seem to understand them; there is probably a mistake in my reasoning.

We are interested in identifying separate sense and antisense transcripts. Let's assume we are looking at a specific gene with a mono-exonic transcript (no introns), and that this transcript doesn't get polyadenylated. Let's also assume we have a stranded cDNA library, so we know the direction of the original cDNA (the one synthesized from the mRNA).

When performing mapping, the cDNA will align to the template strand, not to the coding strand. However, the gene is annotated on the coding strand, so the reads will appear to be antisense with respect to the gene. But this is wrong, because the cDNA comes from that gene.

A practical example:

DNA

5' - ACTGTGGATTC - 3' CODING STRAND for the gene
3' - TGACACCTAAG - 5' TEMPLATE STRAND for the gene

Transcript

5' - ACUGUGGAUUC - 3'

cDNA + PCR primers (double stranded, the second strand is the one marked with dU to prevent PCR from amplifying it)

5' - 5'PR - GAATCCACAGT - 3'PR - 3'
3' - 3'PR - CUUAGGUGUCA - 5'PR - 5'

So, after PCR, the library will look like this

5' - 5'PR - GAATCCACAGT - 3'PR - 3'
3' - c5'PR - CTTAGGTGTCA - c3'PR - 5'

The 5'PR makes it possible to identify the directionality of the original cDNA. So, in the end, after primer trimming, the reads (5'->3') will all look like this

GAATCCACAGT
or
CTTAGGTGTCA

When mapping, they will clearly be aligned to the gene's template strand, despite coming from the gene's mRNA that's annotated on the other strand.

What am I missing here? At what point is my reasoning wrong?

cDNA sequence strand • 371 views

updated 4 months ago by rfran010 ★ 1.3k • written 4 months ago by BioStar55555 ▴ 10

score 0 · Answer 1 · 2024-07-05

I will admit I'm a little lost in your example, but it looks like a "reversely stranded" library prep.

One way to generate these libraries is to destroy the "sense" or "coding" strand with dU marking. This leaves only the antisense (template) cDNA. In sequencing, this means Read1 gives the antisense sequence only (since the 5' primer is only ligated to the 5' end of these fragments) and, if paired, Read2 gives the sense sequence only.

This is in comparison to non-stranded library where Read1 can be on the sense or antisense fragment.
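To make the Read1/Read2 orientation concrete, here is a minimal Python sketch using the sequence from the question. It assumes a dUTP-style, reverse-stranded prep and a fragment short enough that each read covers it completely; the variable names are only illustrative.

```python
# Sequence taken from the example in the question (coding strand, 5'->3').
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

coding_strand = "ACTGTGGATTC"  # same as the transcript, with T instead of U

# First-strand cDNA is synthesized from the mRNA, so it is its reverse complement.
first_strand_cdna = reverse_complement(coding_strand)

# Assumption: the dU-marked second strand is not amplified, so Read1 always
# starts from the first-strand cDNA and Read2 is read from the opposite end.
read1 = first_strand_cdna
read2 = reverse_complement(read1)

print("Read1:", read1)  # antisense to the transcript
print("Read2:", read2)  # same sense as the transcript
```

Read1 comes out as the reverse complement of the coding strand, and Read2 is identical to it, which is the "Read1 antisense, Read2 sense" pattern described above.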

So, since you know that in your stranded library all Read1s should be antisense, you can make the proper adjustments to your data. For example, during quantification: aligners mark each alignment as matching the forward or reverse reference strand. So if you had a gene with its coding sequence on the forward strand and wanted to know how many reads map to that gene, you would count the number of Read1s that map to the reverse strand in that region and the number of Read2s that map to the forward strand in that region.
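As a rough sketch of that counting rule, the snippet below infers the transcript strand from the SAM FLAG field of each alignment. The bit values (0x10 reverse strand, 0x40 first in pair, 0x80 second in pair) come from the SAM format; the function names and the example flags are hypothetical, and the genomic-region filter is left out for brevity.

```python
FLAG_REVERSE = 0x10  # read aligned to the reverse reference strand
FLAG_READ2 = 0x80    # second read in the pair

def transcript_strand(flag: int, reverse_stranded: bool = True) -> str:
    """Infer the strand of the originating transcript from a SAM flag.

    Reads without the Read2 bit (including single-end reads) are
    treated as Read1.
    """
    aligned_reverse = bool(flag & FLAG_REVERSE)
    is_read2 = bool(flag & FLAG_READ2)
    # Reverse-stranded library: Read1 is antisense, Read2 is sense.
    read_is_sense = is_read2 if reverse_stranded else not is_read2
    # A sense read lies on the transcript's strand; an antisense read
    # lies on the opposite one.
    on_forward = (not aligned_reverse) if read_is_sense else aligned_reverse
    return "+" if on_forward else "-"

def count_for_gene(flags, gene_strand: str) -> int:
    """Count alignments whose inferred transcript strand matches the gene."""
    return sum(1 for f in flags if transcript_strand(f) == gene_strand)

# Hypothetical flags for alignments overlapping a forward-strand gene:
# 83 = Read1 on reverse, 163 = Read2 on forward,
# 99 = Read1 on forward, 147 = Read2 on reverse.
flags = [83, 163, 99, 147, 83]
print(count_for_gene(flags, "+"))
```

In practice you would not write this by hand, since counting tools expose the same choice as an option (for example `--stranded=reverse` in htseq-count or `-s 2` in featureCounts).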

So, I don't think your reasoning is wrong, but you may be misinterpreting the downstream steps.