I am using publicly available data and trying to infer the strand orientation (strandedness) of a paired-end library. I aligned the filtered FASTQ files against the C. elegans reference genome using HISAT2 and converted the SAM output to a sorted BAM using samtools. I then tried two approaches. In the first, I ran infer_experiment.py from the RSeQC package and got this output:
This is PairEnd Data
Fraction of reads failed to determine: 0.0659
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0055
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9285
This suggests that the RNA-Seq data is strand-specific: the strand of read 1 is opposite to that of the gene model, while the strand of read 2 is consistent with the strand of the reference gene model.
I used these resources to interpret the output:
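To make that reading of the numbers explicit, here is a minimal Python sketch that parses the three "Fraction of reads" lines and labels the library. The function names, the label strings and the 0.8 cutoff are my own assumptions for illustration; they are not part of RSeQC.

```python
import re

# Keys as they appear in the infer_experiment.py output for paired-end data.
SAME_KEY = 'explained by "1++,1--,2+-,2-+"'      # read 1 on the gene's strand
OPPOSITE_KEY = 'explained by "1+-,1-+,2++,2--"'  # read 1 opposite to the gene


def parse_fractions(text):
    """Collect every 'Fraction of reads <label>: <value>' line into a dict."""
    fractions = {}
    for line in text.splitlines():
        match = re.search(r"Fraction of reads (.+?):\s*([0-9.]+)\s*$", line)
        if match:
            fractions[match.group(1)] = float(match.group(2))
    return fractions


def classify(fractions, threshold=0.8):
    """Label the library; the 0.8 threshold is an arbitrary assumption."""
    same = fractions.get(SAME_KEY, 0.0)
    opposite = fractions.get(OPPOSITE_KEY, 0.0)
    if opposite >= threshold:
        return "stranded, read 1 opposite to gene"
    if same >= threshold:
        return "stranded, read 1 same as gene"
    return "unstranded or ambiguous"


report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0659
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0055
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9285'''

fractions = parse_fractions(report)
print(classify(fractions))
```

With my numbers, the second configuration dominates, which is why I read the library as stranded with read 1 opposite to the gene.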
http://rseqc.sourceforge.net/#infer-experiment-py
RSeQC Output from infer_experiment.py - what does it mean?
Secondly, I indexed the BAM file using samtools and loaded it into IGV. I used the "color alignments by first-of-pair strand" option to infer strandedness and got this output:
I see both red and blue reads, and this link (http://software.broadinstitute.org/software/igv/book/export/html/6) says: "For a given transcript, non-directional libraries will show a mix of red and blue reads aligning to the locus."
The results of infer_experiment.py and IGV seem contradictory to me: infer_experiment.py indicates that the library is reverse stranded, while IGV appears to show that it is non-directional.
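For reference, this is how I understand the "first-of-pair strand" of an alignment can be derived from its SAM flag. It is only a sketch: the flag bits come from the SAM specification, but the function name is hypothetical, and whether IGV computes the color exactly this way is my assumption.

```python
# SAM flag bits (from the SAM specification).
READ_REVERSE = 0x10    # this read is aligned to the reverse strand
MATE_REVERSE = 0x20    # the mate is aligned to the reverse strand
FIRST_IN_PAIR = 0x40   # this read is read 1
SECOND_IN_PAIR = 0x80  # this read is read 2


def first_of_pair_strand(flag):
    """Return '+' or '-' for the strand of read 1 of the pair.

    For read 1 this is its own strand; for read 2 it is the mate's strand.
    Returns None if the read is not flagged as either mate.
    """
    if flag & FIRST_IN_PAIR:
        return "-" if flag & READ_REVERSE else "+"
    if flag & SECOND_IN_PAIR:
        return "-" if flag & MATE_REVERSE else "+"
    return None


# Common properly paired flag values.
for flag in (99, 147, 83, 163):
    print(flag, first_of_pair_strand(flag))
```

Under this definition both mates of a pair get the same label, so flags 99 and 147 form a "+" pair and flags 83 and 163 form a "-" pair.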
Can someone please clarify these things in more detail?
Many thanks in advance.
See How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here)