I am trying to wrap my head around these concepts, but I can't seem to understand them; there is probably a mistake in my reasoning.
We are interested in identifying separate sense and antisense transcripts. Let's assume we are looking at a specific gene with a mono-exonic transcript (no introns), and that this transcript doesn't get polyadenylated. Let's also assume we have a stranded cDNA library, so we know the direction of the original cDNA (the one synthesized starting from mRNA).
When mapping, the cDNA will be aligned to the template strand, not to the coding strand. However, the gene is annotated on the coding strand, so the reads will appear to be antisense with respect to the gene. But this is wrong, because the cDNA comes from that gene.
A practical example:
DNA
5' - ACTGTGGATTC - 3' CODING STRAND for the gene
3' - TGACACCTAAG - 5' TEMPLATE STRAND for the gene
Transcript
5' - ACUGUGGAUUC - 3'
cDNA + PCR primers (double stranded, the second strand is the one marked with dU to prevent PCR from amplifying it)
5' - 5'PR - GAATCCACAGT - 3'PR - 3'
3' - 3'PR - CUUAGGUGUCA - 5'PR - 5'
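To double-check the strands written so far, here is a minimal Python sketch. The helper name `reverse_complement` and the variable names are my own, and it only covers this toy sequence, not a real library prep.

```python
# Toy check of the strand relationships in the example above.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

coding = "ACTGTGGATTC"                      # coding strand, 5'->3'
template_5to3 = reverse_complement(coding)  # template strand, read 5'->3'
transcript = coding.replace("T", "U")       # mRNA has the coding sequence, with U for T

# First-strand cDNA is the reverse complement of the transcript (as DNA).
first_strand_cdna = reverse_complement(transcript.replace("U", "T"))

print(template_5to3)
print(transcript)
print(first_strand_cdna)
print(first_strand_cdna == template_5to3)
```

In this sketch the first-strand cDNA has the same sequence as the template strand read 5'->3', which is where my confusion starts.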
So, after PCR, the library will look like this:
5' - 5'PR - GAATCCACAGT - 3'PR - 3'
3' - c5'PR - CTTAGGTGTCA - c3'PR - 5'
The 5'PR makes it possible to identify the directionality of the original cDNA. So, in the end, after primer trimming, the reads (5'->3') will all look like this:
GAATCCACAGT
or
CTTAGGTGTCA
When mapping, they will clearly be aligned to the gene's template strand, despite coming from the gene's mRNA that's annotated on the other strand.
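To make that claim concrete for the first-strand read, here is a rough sketch of what I mean. It assumes the coding strand is the reference's + strand and uses exact substring matching instead of a real aligner; `matching_strand` is a made-up helper, not a function from any mapping tool.

```python
# Toy strand check: does a read match the reference as given (+)
# or only after reverse-complementing (-)?
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def matching_strand(read: str, reference_plus: str):
    """Return '+', '-', or None depending on how the read matches the reference."""
    if read in reference_plus:
        return "+"
    if reverse_complement(read) in reference_plus:
        return "-"
    return None

reference_plus = "ACTGTGGATTC"  # assumption: the coding strand is the + strand
read = "GAATCCACAGT"            # first-strand cDNA read from the example

strand = matching_strand(read, reference_plus)
print(strand)
```

Under these assumptions the read is reported on the strand opposite to the gene annotation, which is exactly the situation I am asking about.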
What am I missing here? At what point does my reasoning go wrong?