I'm trying to infer the experiment type (library strandedness) of my paired-end RNA-seq data. I used STAR to index the genome and map the paired reads (as unstranded), then ran infer_experiment.py on the alignments. Depending on the BED file I use to infer the experiment, I get different results. The genome I'm using is the GENCODE primary assembly:
GRCm39.primary_assembly.genome.fa
Can anyone guess why these differences occur?
Method 1 (BED converted from the GENCODE GTF)
Convert GTF to BED
bedparse gtf2bed gencode.vM27.annotation.gtf > gencode.vM27.annotation.gtf.bed
Code
infer_experiment.py -r /local/ref/gencode.vM27.annotation.gtf.bed -i /local/out/star/ADU2124Aligned.sortedByCoord.out.bam
Result:
This is PairEnd Data
Fraction of reads failed to determine: 0.1017
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0080
Fraction of reads explained by "1+-,1-+,2++,2--": 0.8903
Method 2 (BED file provided by RSeQC)
Code
infer_experiment.py -r /local/ref/mm10_Gencode_VM18.bed -i /local/out/star/ADU2124Aligned.sortedByCoord.out.bam
Result:
This is PairEnd Data
Fraction of reads failed to determine: 0.0577
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3863
Fraction of reads explained by "1+-,1-+,2++,2--": 0.5560
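To compare the two outputs side by side, here is a minimal Python sketch that parses the text printed by infer_experiment.py and labels each run. The function names and the 0.8 cutoff are my own assumptions for illustration, not part of RSeQC; "forward" means read 1 maps to the same strand as the gene (1++,1--,2+-,2-+) and "reverse" means the opposite (1+-,1-+,2++,2--).

```python
import re

# Outputs copied from the two runs above.
METHOD1 = '''This is PairEnd Data
Fraction of reads failed to determine: 0.1017
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0080
Fraction of reads explained by "1+-,1-+,2++,2--": 0.8903'''

METHOD2 = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0577
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3863
Fraction of reads explained by "1+-,1-+,2++,2--": 0.5560'''


def parse_infer_experiment(text):
    """Return {description: fraction} for each 'Fraction of reads ...' line."""
    fractions = {}
    for line in text.splitlines():
        m = re.match(r'\s*Fraction of reads (.+?):\s*([0-9.]+)\s*$', line)
        if m:
            fractions[m.group(1)] = float(m.group(2))
    return fractions


def classify(fractions, threshold=0.8):
    """Label a run; the 0.8 threshold is an arbitrary assumption."""
    fwd = fractions.get('explained by "1++,1--,2+-,2-+"', 0.0)
    rev = fractions.get('explained by "1+-,1-+,2++,2--"', 0.0)
    if fwd >= threshold:
        return "forward"
    if rev >= threshold:
        return "reverse"
    return "ambiguous"


for name, text in (("Method 1", METHOD1), ("Method 2", METHOD2)):
    print(name, classify(parse_infer_experiment(text)))
```

With this cutoff, Method 1 comes out as "reverse" and Method 2 as "ambiguous", which is exactly the discrepancy I'm asking about.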
Thanks!
Don't worry. We have all been there at one time or other :-)
I assume your problem has been solved? If so I can move your comment to an answer which you can accept to provide closure to this thread.