Can somebody explain to me why StringTie uses the name fr-firststrand for a library prepared with the dUTP protocol? The dUTP protocol keeps the reverse strand of the amplified cDNA, right? So if it is the reverse strand, wouldn't it make more sense to call it secondstrand?
Maybe the answer is "because that's the way they wanted to name it", but maybe I am missing something, and if so, I would like to know.
I am a computer scientist and I am trying to understand the biology behind this. Sorry in advance if this is basic wet-lab terminology I'm not aware of.
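It's just a matter of reference frame: whether you take the mRNA strand or the first synthesised cDNA strand as the reference. In the dUTP protocol the second cDNA strand is marked with dUTP and degraded before amplification, so the reads come from the first-strand cDNA, which is the reverse complement of the mRNA. Hence "firststrand".
To see what is called the first strand, have a look at the figure here, or here for an up-to-date version of it.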
StringTie uses this nomenclature for historical reasons: it inherited the convention from TopHat. TopHat's reasoning is explained in its manual:
fr-unstranded (examples: Standard Illumina): Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststrand (examples: dUTP, NSR, NNSR): Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrand (examples: Ligation, Standard SOLiD): Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
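Since the asker comes from computer science, the rules quoted above may be easier to read as a small function. This is only a sketch of my reading of the TopHat definitions, not code from TopHat or StringTie; the function name `transcript_strand` and its parameters are made up for illustration. It takes the strand a read aligned to and returns the strand the transcript is assumed to be on.

```python
# Hypothetical helper, not part of TopHat or StringTie.
FLIP = {"+": "-", "-": "+"}

def transcript_strand(library_type, read_strand, mate=1):
    """Infer the transcript strand from one aligned read.

    library_type: "fr-unstranded", "fr-firststrand" or "fr-secondstrand"
    read_strand:  "+" or "-", the genomic strand the read aligned to
    mate:         1 for read 1 (or a single-end read), 2 for read 2
    Returns "+", "-", or "." when the library carries no strand information.
    """
    if read_strand not in FLIP:
        raise ValueError("read_strand must be '+' or '-'")
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    if library_type == "fr-unstranded":
        return "."
    if library_type not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError("unknown library type: " + library_type)

    # fr-secondstrand: read 1 has the same orientation as the transcript.
    # fr-firststrand (dUTP): read 1 is the reverse complement of the
    # transcript, so read 2 is the one that matches it.
    same_as_transcript = (library_type == "fr-secondstrand") == (mate == 1)
    return read_strand if same_as_transcript else FLIP[read_strand]


# A dUTP library: read 1 aligned to the minus strand, so the
# transcript is taken to be on the plus strand.
print(transcript_strand("fr-firststrand", "-", mate=1))
```

In StringTie itself you do not write any of this: as far as I know you pass `--rf` for an fr-firststrand library and `--fr` for an fr-secondstrand one.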
P.S.: As both tools come from the same group and share a number of authors, one could argue that the reasoning behind this nomenclature is the same for both.
Your figures are very helpful, thank you.