When I specify --library-type to TopHat (i.e., fr-firststrand, fr-secondstrand, or fr-unstranded), TopHat appends a value of + or - to the XS:A tag, which is useful for subsequent analyses, such as annotation.
However, does this information actually influence the "mappability" of reads, or is this unaffected?
My thinking is that the information would be considered when mapping reads to the transcripts in the GTF file, if one is supplied with -G.
In that case, read pairs should be concordant with the transcript strand. That is, if --library-type fr-firststrand was indicated, and transcript A is at coordinates X to Y on the + strand, mate 1 of a pair should map to the reverse complement of the 3' end of transcript A, and mate 2 of the pair should map to the 5' end, on the same strand as the transcript sequence.
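To make the expectation above concrete, here is a minimal sketch of the orientation convention as I understand it from the manual's library-type descriptions. The function name expected_mate_strands is my own (hypothetical), and this is not TopHat's code, only the rule I am assuming it applies.

```python
# Illustrative only: encodes the paired-end orientation convention described
# for TopHat's library types. This is NOT TopHat's own implementation.

def expected_mate_strands(library_type, transcript_strand):
    """Return (mate1_strand, mate2_strand) as genome strands.

    A value of None means the library type places no constraint on that mate.
    """
    if transcript_strand not in ("+", "-"):
        raise ValueError("transcript_strand must be '+' or '-'")
    opposite = "-" if transcript_strand == "+" else "+"

    if library_type == "fr-firststrand":
        # Mate 1 is antisense to the transcript, mate 2 is sense.
        return opposite, transcript_strand
    if library_type == "fr-secondstrand":
        # Mate 1 is sense to the transcript, mate 2 is antisense.
        return transcript_strand, opposite
    if library_type == "fr-unstranded":
        # Read strand carries no information about the transcript strand.
        return None, None
    raise ValueError("unknown library type: %r" % library_type)
```

For a + strand transcript under fr-firststrand, this gives mate 1 on the - strand and mate 2 on the + strand, which matches the example above.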
However, if no GTF is supplied with -G, or in the subsequent stage where reads that didn't map to the transcriptome are mapped to the whole genome, TopHat should make no use of the library-type information, right?
--library-type
TopHat will treat the reads as strand specific. Every read alignment will have an XS attribute tag. Consider supplying library type options below to select the correct RNA-seq protocol.
From TH Manual:
Since the splice junction finding algorithm of TopHat makes use of library-type information (if provided), one of the two TopHat runs would result in many more splice junctions than the other one. You can then use the library type that gives more junctions. If this is not the case TopHat might not work well with your sequencing protocol. Please let us know more details about your protocol so we can add support for new library types.
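The comparison the manual suggests could be scripted roughly as follows. This is a sketch under assumptions: each run is assumed to have written a junctions.bed file to its own output directory, the directory names in the usage comment are hypothetical, and count_junctions and pick_library_type are helper names I made up.

```python
# Illustrative sketch: compare the number of junctions reported by two
# TopHat runs that differ only in --library-type.

def count_junctions(bed_path):
    """Count junction records in a BED file, skipping track, comment, and blank lines."""
    count = 0
    with open(bed_path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith(("track", "#")):
                continue
            count += 1
    return count


def pick_library_type(junction_files):
    """Return (library type with the most junctions, per-type counts).

    junction_files maps a library-type name to the path of its junctions.bed.
    """
    counts = {name: count_junctions(path) for name, path in junction_files.items()}
    best = max(counts, key=counts.get)
    return best, counts


# Example usage (hypothetical output directories):
# best, counts = pick_library_type({
#     "fr-firststrand": "run_firststrand/junctions.bed",
#     "fr-secondstrand": "run_secondstrand/junctions.bed",
# })
```

If the two counts come out close to each other, that would be the "this is not the case" situation the manual mentions, and the script alone cannot say which setting is right.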
So this indicates that the strandedness argument does influence the mapping algorithm. But how does TopHat use library-type information in its splice junction finding algorithm, if it has to be unbiased about which strand the actual transcripts are on?
Thanks, ashutosmits! Yes, that is what I meant. However, I can't see how this would influence the ability to align reads. Does it have to do with the GTF file supplied when selecting -G? Otherwise, I don't see how TopHat would make assumptions about what should map to the + or - strand...