I am new to bioinformatics and I want to identify antisense RNAs in RNA-seq data. I did the following steps on my raw RNA-seq data (5 human and 5 mouse tissues):
1- I checked my raw data using FastQC.
2- I filtered/cleaned my data using FASTX-Toolkit.
3- I aligned/mapped my data using TopHat.
4- I used Cufflinks to assemble transcripts and estimate their abundances.
So my question is: how do I identify the antisense RNAs in my data?
You can only detect antisense RNA if your sequencing data is stranded (the common protocol is unstranded, i.e. reads come in random orientation). If your data is stranded, you need to set the library type (--library-type) in both TopHat and Cufflinks; check the manuals.
I got the following information about the data: The ribosomal RNA (rRNA) was removed with Ribo-Zero Human/Mouse from Epicentre. The strand-specific RNA-seq libraries were prepared with the NEXTflex™ Directional RNA-Seq Kit, dUTP method (Bioo Scientific, Austin, TX). Each library was quantitated by qPCR and sequenced on one lane for 101 cycles on a HiSeq2000 using a TruSeq SBS sequencing kit version 3, and analyzed with Casava 1.8.2.

So which TopHat library type should I use to identify antisense transcripts? (Note: I used the fr-firststrand library type in TopHat because the description above mentions "dUTP".) However, the TopHat FAQ (http://tophat.cbcb.umd.edu/faq.shtml#library_type) says that the library type that gives more junctions in the junctions.bed file of the TopHat output is the best one to use, and when I tried this, the unstranded library type gave me more junctions.
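The FAQ heuristic mentioned above can be sketched in Python as a simple junction count per TopHat run. This is a minimal sketch, assuming you ran TopHat once per --library-type on the same reads (or the same subset of reads) and kept each run's junctions.bed; the function names and paths are hypothetical. It skips the "track" header line that junctions.bed files usually start with.

```python
from __future__ import annotations


def count_junctions(junctions_bed: str) -> int:
    """Count junction records in a TopHat junctions.bed file.

    Skips the 'track' header line and any blank lines.
    """
    count = 0
    with open(junctions_bed) as handle:
        for line in handle:
            if not line.strip() or line.startswith("track"):
                continue
            count += 1
    return count


def compare_library_types(runs: dict[str, str]) -> list[tuple[str, int]]:
    """Return (library_type, junction_count) pairs, most junctions first.

    `runs` maps a library type name (e.g. "fr-firststrand") to the
    junctions.bed produced by a TopHat run with that --library-type.
    """
    counts = [(lib, count_junctions(path)) for lib, path in runs.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

For example, compare_library_types({"fr-firststrand": "run_first/junctions.bed", "fr-unstranded": "run_unstr/junctions.bed"}) returns the library types ordered by junction count; the directory names here are only placeholders.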
Then your first steps are correct. Now you need to find regions where you expect a gene/transcript on one strand but you got matches on the other strand. You can use Cufflinks to do this with the "-g" option and "--library-type fr-firststrand", then check for the "x" class (exonic overlap with reference on the opposite strand) in the *.tracking files.
BTW, if you are getting more junctions/mapped reads with unstranded mapping, maybe your library isn't stranded at all.
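Pulling the "x" entries out of a tracking file can be sketched in Python. This sketch assumes a tab-separated, Cuffcompare-style .tracking file with no header, where the 4th column holds the class code; if the file instead has a header line containing "class_code" (as in Cufflinks' *.fpkm_tracking files), that column is used. The function name antisense_rows is hypothetical.

```python
from __future__ import annotations

import csv


def antisense_rows(tracking_path: str, class_col: int = 3) -> list[list[str]]:
    """Return rows whose class code is "x" (exonic overlap with a
    reference transcript on the opposite strand).

    By default the class code is assumed to be in column 4 (index 3),
    as in Cuffcompare .tracking files. If a header line containing
    "class_code" is found, that column is used instead.
    """
    hits = []
    with open(tracking_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if "class_code" in row:  # header line, e.g. *.fpkm_tracking
                class_col = row.index("class_code")
                continue
            if len(row) > class_col and row[class_col] == "x":
                hits.append(row)
    return hits
```

The returned rows still carry the transcript, locus and reference IDs, so they can be written out or joined with expression values for follow-up.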
Dear JC, thanks for helping me, but I didn't understand what you mean by finding regions where you expect a gene/transcript on one strand but get matches on the other strand using Cufflinks with the "-g" option and "--library-type fr-firststrand", and then checking for the "x" class (exonic overlap with reference on the opposite strand) in the *.tracking files.
Also, is there any way to know whether my library is stranded or not?
I mean running Cufflinks in strand-aware mode (--library-type fr-firststrand) and looking for new unannotated transcripts (-g). If a sequence matches an exon on the opposite strand, Cufflinks classifies it as "x" (check the Cufflinks class codes).
You can verify your library type by checking how many sequences are correctly mapped when you change the library type in TopHat.
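The idea behind the "x" class can be illustrated with a deliberately simplified Python check: at least one exon of the assembled transcript overlaps an exon of a reference transcript, and the two are on opposite strands. This is a hypothetical illustration, not Cufflinks' or Cuffcompare's actual code, which applies more rules and a precedence order among class codes. Coordinates are assumed to be 1-based and inclusive, as in GTF.

```python
from __future__ import annotations

from typing import NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two exons share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def is_opposite_strand_exonic_overlap(transcript: list[Exon],
                                      reference: list[Exon]) -> bool:
    """Simplified 'x'-like test: some exon of the assembled transcript
    overlaps some exon of the reference on the opposite strand."""
    for t in transcript:
        for r in reference:
            stranded = t.strand in ("+", "-") and r.strand in ("+", "-")
            if stranded and t.strand != r.strand and overlaps(t, r):
                return True
    return False
```

Note that this only works if the assembled transcripts have a reliable strand, which is why a stranded library and the matching --library-type are needed in the first place.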
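A different, complementary check (the same idea used by tools such as RSeQC's infer_experiment.py) is to look at which strand single-end reads map to relative to the annotated gene they fall in. With dUTP (fr-firststrand) libraries, reads should mostly map to the strand opposite the gene; with fr-secondstrand libraries, mostly to the same strand; unstranded libraries give roughly 50/50. The Python sketch below assumes you have already extracted (read_strand, gene_strand) pairs from your alignments; the 0.8 threshold and the function names are hypothetical choices, not a standard.

```python
from __future__ import annotations

from collections import Counter


def strand_agreement(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of single-end reads on the same vs. opposite strand as the
    annotated gene they overlap. `pairs` holds (read_strand, gene_strand)."""
    counts = Counter("same" if read == gene else "opposite" for read, gene in pairs)
    total = sum(counts.values())
    if total == 0:
        return {"same": 0.0, "opposite": 0.0}
    return {key: counts[key] / total for key in ("same", "opposite")}


def guess_library_type(pairs: list[tuple[str, str]], threshold: float = 0.8) -> str:
    """Map the strand fractions onto TopHat/Cufflinks library-type names.

    The threshold is an arbitrary cut-off chosen for illustration.
    """
    frac = strand_agreement(pairs)
    if frac["opposite"] >= threshold:
        return "fr-firststrand"   # dUTP-like: reads antisense to the gene
    if frac["same"] >= threshold:
        return "fr-secondstrand"  # reads sense to the gene
    return "fr-unstranded"
```

If most reads come out on the opposite strand, that supports fr-firststrand regardless of which library type produced more junctions.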
Thanks again, JC. I ran TopHat and Cufflinks in my analysis as follows:
tophat --library-type fr-firststrand -p 10 -G /disk1/RNAseq/GeneModel/Hs_ensembl_37.gtf -o /disk1/RNAseq1/Alignment/Human_brain /disk1/RNAseq/Human_genome/Ensembl/GRCh37/Bowtie2Index/genome /disk1/TrimmedData/Human_brain_trimmed.fastq &
and Cufflinks as follows:
cufflinks -G /disk1/RNAseq/GeneModel/Hs_ensembl_37.gtf -o TranscriptCount/Human_brain -p 10 Alignment/Human_brain/accepted_hits.bam &
So are those steps right or not? Note: I didn't set the library type in Cufflinks, and I used -G instead of -g.
You need to set the library type and enable de novo discovery mode (-g instead of -G) in Cufflinks; as it is, you are only getting expression estimates for known genes.
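A corrected Cufflinks call can be sketched in Python by building the argument list: --library-type fr-firststrand for the dUTP library, and -g (--GTF-guide, reference-guided assembly that can report novel transcripts) instead of -G (--GTF, which only quantifies annotated transcripts). This is a minimal sketch, assuming cufflinks is on your PATH; the helper names are hypothetical, and the paths in the usage note are simply the ones from the command above.

```python
from __future__ import annotations

import subprocess


def cufflinks_cmd(gtf: str, out_dir: str, bam: str, threads: int = 10,
                  library_type: str = "fr-firststrand") -> list[str]:
    """Build a strand-aware, reference-guided Cufflinks command line.

    -g lets Cufflinks assemble novel transcripts guided by the reference;
    -G would restrict it to quantifying the annotated transcripts only.
    """
    return [
        "cufflinks",
        "--library-type", library_type,
        "-g", gtf,
        "-o", out_dir,
        "-p", str(threads),
        bam,
    ]


def run_cufflinks(cmd: list[str]) -> None:
    """Run the command, raising CalledProcessError if Cufflinks fails."""
    subprocess.run(cmd, check=True)
```

For example, run_cufflinks(cufflinks_cmd("/disk1/RNAseq/GeneModel/Hs_ensembl_37.gtf", "TranscriptCount/Human_brain", "Alignment/Human_brain/accepted_hits.bam")) corresponds to the original command with the two changes applied. Passing a list rather than a shell string avoids problems such as a stray space splitting a path into two arguments.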