For strand-specific RNA-seq (by UNG treatment), is read 1 the reverse complement of the mRNA, and does read 2 have the same sequence as the mRNA?
For strand-specific RNA-seq data, what is the best way to separate reads based on which strand the mRNA came from? In other words, how do I group all reads that map to genes on the plus strand?
I have made a GTF file containing only genes on the plus strand (genes_on_plus_chr.gtf) and used "bedtools intersect -a Aligned.sortedByCoord.out.bam -b genes_on_plus_chr.gtf" to pull out all reads overlapping those genes. But I found this is not good enough, because I still find some reads with flags 147 and 99 (ideally the output would contain only 163 and 83). Since read 1 should be the reverse complement of the mRNA while read 2 has the same sequence as the mRNA for stranded RNA-seq, can I pull out all reads with flag 163 or 83, and would these correspond to mRNA from genes on the plus strand?
- How can I do the separation on unstranded RNA-seq data, since I don't know the relationship between read 1/read 2 and the mRNA?
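To make the flag values in the question concrete, here is a minimal sketch (not from the original posts) that decodes a SAM flag and infers the strand of the original transcript. It assumes a library where read 1 is antisense to the mRNA, as described above for UNG-treated libraries; the function name `transcript_strand` and the `read1_is_antisense` parameter are hypothetical and only for illustration.

```python
# SAM flag bits, as defined in the SAM specification
PAIRED = 0x1
READ_REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def transcript_strand(flag, read1_is_antisense=True):
    """Infer the genomic strand ('+' or '-') of the original RNA fragment.

    Returns None if the flag does not describe exactly one mate of a pair.
    read1_is_antisense=True is an assumption matching the protocol
    described in the question (read 1 is the reverse complement of the mRNA).
    """
    if not flag & PAIRED:
        return None
    is_read1 = bool(flag & FIRST_IN_PAIR)
    is_read2 = bool(flag & SECOND_IN_PAIR)
    if is_read1 == is_read2:
        return None  # neither or both mate bits set

    read_on_reverse = bool(flag & READ_REVERSE)
    # The "sense" mate is the one oriented like the transcript:
    # read 2 when read 1 is antisense, otherwise read 1.
    is_sense_mate = is_read2 if read1_is_antisense else is_read1
    if is_sense_mate:
        return "-" if read_on_reverse else "+"
    return "+" if read_on_reverse else "-"


for flag in (83, 163, 99, 147):
    print(flag, transcript_strand(flag))
```

Under that assumption, flags 83 and 163 come out as plus-strand transcripts and 99 and 147 as minus-strand transcripts. Testing individual bits rather than matching exact flag values also keeps reads that carry additional bits, such as the secondary-alignment or duplicate bit.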
Right, so what is your actual biological goal? I imagine you're trying to get counts or something like that. In that case, use featureCounts or htseq-count and call it done; there is rarely a reason to use anything else.
For unstranded data you can't ever discern the strand of the original fragment.
I am not interested in differential expression. My goal is to identify post-transcriptional RNA modification (A to I, which is read as G by the sequencer). To achieve that, I first need to separate the reads based on where they map. For reads mapped to genes on the plus strand, I will look for A to G; for genes on the reverse strand, I will look for T to C.
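The rule in this last post can be written down as a small sketch. It assumes that both the reference base and the read base are reported relative to the forward reference strand, as they are in a BAM file; the function names are hypothetical.

```python
def expected_editing_mismatch(gene_strand):
    """Return (reference_base, read_base) expected for A-to-I editing.

    Bases are given relative to the forward reference strand, so an
    edit in a minus-strand gene shows up as T to C.
    """
    if gene_strand == "+":
        return ("A", "G")
    if gene_strand == "-":
        return ("T", "C")
    raise ValueError("gene_strand must be '+' or '-'")


def is_candidate_edit(ref_base, read_base, gene_strand):
    """True if a mismatch is consistent with A-to-I editing on this strand."""
    expected = expected_editing_mismatch(gene_strand)
    return (ref_base.upper(), read_base.upper()) == expected


print(is_candidate_edit("A", "G", "+"))
print(is_candidate_edit("T", "C", "-"))
print(is_candidate_edit("T", "C", "+"))
```

This only classifies a single mismatch; telling real editing sites apart from SNPs and sequencing errors needs further filtering that is outside the scope of this sketch.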