I have a number of libraries prepared with the Illumina TruSeq strand-specific library prep kit. I want to detect sense and antisense transcription of each gene.
I mapped the reads using hisat2 with the --rna-strandness RF option.
The recommendations I can find for separating the bam files by strand, so that I can essentially have a sense .bam and an antisense .bam, are to use samtools view with -F 0x10 and -f 0x10, or similar filters using flags 16 and 32.
Whichever method I use, I get the same general output, but I can't shake the nagging feeling that I am not actually separating the reads by sense and antisense transcription, but rather by some aspect of the library that is sense + something and antisense + something.
I know that down the line I will need to check into the directionality of each gene, and that just separating by sense and antisense will include genes in the opposite orientation in each file.
But to begin with, how would one go about examining sense and antisense transcription by separating the .bam files?
I know this is not the first post on this, but many posts do not seem to elaborate whether they are interested only in separating the files by strand or sense and antisense transcription, and I don't think they are entirely the same thing.
Hi,
According to the HISAT2 manual, that option just adds the XS tag to each alignment ('+' or '-', giving the genomic strand of the transcript the read is assigned to). To check the sense/antisense mappings, you might want to look at RSeQC's infer_experiment.py, which analyses read orientation relative to the orientation of the annotated transcripts.
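On the worry about "sense + something": in a paired-end RF (dUTP-style) library the two mates of a fragment align to opposite genomic strands, so filtering on 0x10 alone splits by alignment orientation, not by transcript strand. Here is a minimal sketch of the usual flag logic, assuming paired-end data where read 1 is antisense to the transcript and read 2 is sense. The function name transcript_strand_rf is made up for illustration and is not part of samtools or HISAT2.

```python
# SAM flag bits, as defined in the SAM specification
READ_UNMAPPED = 0x4
READ_REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def transcript_strand_rf(flag):
    """Genomic strand ('+' or '-') of the transcript a read came from.

    Assumes a paired-end RF library (read 1 antisense, read 2 sense).
    Returns None for unmapped reads or reads not marked as exactly one
    of read 1 / read 2. Hypothetical helper, for illustration only.
    """
    if flag & READ_UNMAPPED:
        return None
    first = bool(flag & FIRST_IN_PAIR)
    second = bool(flag & SECOND_IN_PAIR)
    if first == second:
        return None
    reverse = bool(flag & READ_REVERSE)
    if first:
        # Read 1 is the reverse complement of the transcript
        return "+" if reverse else "-"
    # Read 2 has the same orientation as the transcript
    return "-" if reverse else "+"


# Flags 83 and 147 both have 0x10 set, yet they belong to opposite strands
for flag in (83, 163, 99, 147):
    print(flag, transcript_strand_rf(flag))
```

With samtools view the same split should correspond to -f 80 plus -f 128 -F 16 for '+' strand transcripts, and -f 144 plus -f 64 -F 16 for '-' strand transcripts, merging the two outputs for each strand. This still only gives the genomic strand of the transcript; calling a read sense or antisense for a particular gene needs the gene's orientation from the annotation.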
I agree that there is probably a simpler way to do this rather than splitting the files, but it was reassuring to make sure at each step that what should be happening was happening, by checking in IGV.
I will say that running stringtie separately on the negative and positive reads increases the number of predicted features by A LOT and I don't have any sense of the 'rightness' of this.