Question

How to Identify Antisense RNA in RNA-seq Analysis?

1

10.6 years ago

M K ▴ 660

I am new to bioinformatics and I want to identify antisense transcripts in RNA-seq data. I applied the following steps to my raw RNA-seq data (5 human and 5 mouse tissues): 1. I checked the raw data with FastQC. 2. I filtered/cleaned the data with the FASTX-Toolkit. 3. I aligned the reads with TopHat. 4. I used Cufflinks to assemble transcripts and estimate their abundances.

So my question is: how do I identify antisense RNA in my data?

• 7.2k views

ADD COMMENT • link updated 10.6 years ago by Charles Warden 8.3k • written 10.6 years ago by M K ▴ 660

0

You can detect antisense RNA only if your sequencing library is strand-specific (the common protocol is unstranded, so reads come from either strand at random). If it is stranded, you need to set the corresponding options in TopHat and Cufflinks; check the manuals.

ADD REPLY • link 10.6 years ago by JC 13k

0

I got the following information about the data: ribosomal RNA (rRNA) was removed with Ribo-Zero Human/Mouse from Epicentre. The strand-specific RNA-seq libraries were prepared with the NEXTflex™ Directional RNA-Seq Kit, dUTP method (Bioo Scientific, Austin, TX). Each library was quantitated by qPCR, sequenced on one lane for 101 cycles on a HiSeq 2000 using a TruSeq SBS sequencing kit version 3, and analyzed with CASAVA 1.8.2. So which TopHat library type should I use to identify antisense transcripts? (Note: I used the fr-firststrand library type in TopHat because of the "dUTP" information above.) But the TopHat FAQ http://tophat.cbcb.umd.edu/faq.shtml#library_type says that the library type that gives more junctions in the junctions.bed file of the TopHat output is the best one to use. When I tried that, the unstranded library type gave me more junctions.
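For the junction comparison mentioned above, a minimal sketch of the count, assuming each TopHat run wrote its own junctions.bed and that the file starts with a `track` header line. The function name and the directory names in the usage note are hypothetical.

```shell
# Count junction records in a TopHat junctions.bed file.
# Lines starting with "track" are treated as header lines and skipped.
count_junctions() {
    grep -vc '^track' "$1"
}
```

Usage would be something like `count_junctions Human_brain_firststrand/junctions.bed` and `count_junctions Human_brain_unstranded/junctions.bed`, one call per library type tried.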

ADD REPLY • link 10.6 years ago by M K ▴ 660

1

Then your first steps are correct. Now you need to find regions where you expect a gene/transcript on one strand but get matches on the other strand. You can use Cufflinks to do this with the "-g" option and "--library-type fr-firststrand". Then just check for the "x" class code (exonic overlap with reference on the opposite strand) in the *.tracking files (produced by Cuffcompare).

BTW, if you are getting more junctions/mappings with unstranded mapping, maybe your library isn't stranded at all.

ADD REPLY • link 10.6 years ago by JC 13k

0

Dear JC, thanks for helping me, but I didn't understand what you mean by "find regions where you expect a gene/transcript on one strand but get matches on the other strand", using Cufflinks with the "-g" option and "--library-type fr-firststrand", and then checking for the "x" class (exonic overlap with reference on the opposite strand) in the *.tracking files.

Also, is there any way to know whether my library is stranded or not?

ADD REPLY • link 10.6 years ago by M K ▴ 660

0

I mean running Cufflinks in strand-aware mode (--library-type fr-firststrand) while also looking for new, unannotated transcripts (-g). If an assembled transcript overlaps an annotated exon on the opposite strand, it is classified as "x" (check the Cufflinks class codes).

You can verify your library by checking how many reads map correctly when you change the library type in TopHat.
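A minimal sketch of pulling out the "x" entries, assuming a Cuffcompare .tracking file with tab-separated columns in this order: transfrag id, locus id, reference gene|transcript, class code, then one column per sample. The function name is hypothetical, and the column layout should be checked against the manual for your version.

```shell
# Print transfrag id, locus id and reference gene|transcript for every
# row whose class code (assumed to be column 4) is "x".
antisense_candidates() {
    awk -F '\t' '$4 == "x" { print $1 "\t" $2 "\t" $3 }' "$1"
}
```

Called as `antisense_candidates yourprefix.tracking`, it lists the candidate antisense transfrags together with the reference gene they overlap on the opposite strand.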
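Another rough check, sketched here for single-end reads only: take the reads that overlap one well-expressed gene of known strand and tally how many align to the same strand as the gene versus the opposite one. With a dUTP (fr-firststrand) library most single-end reads should land on the opposite strand; a split near 50/50 suggests the data behave as unstranded. The function name is hypothetical, and it only looks at SAM FLAG bits 0x4 (unmapped) and 0x10 (reverse strand).

```shell
# Reads SAM records on stdin; $1 is the strand ("+" or "-") of the gene
# the reads were taken from. Prints how many mapped reads align to the
# same strand as the gene and how many to the opposite strand.
strand_tally() {
    awk -F '\t' -v gene="$1" '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }      # FLAG 0x4 set: read is unmapped
        {
            # FLAG 0x10 set: read aligned to the reverse strand
            rs = (int($2 / 16) % 2 == 1) ? "-" : "+"
            if (rs == gene) same++; else opposite++
        }
        END { printf "same=%d opposite=%d\n", same, opposite }
    '
}
```

For example, `samtools view accepted_hits.bam chr1:1000-2000 | strand_tally +`, where the region is a placeholder for a locus with a single expressed gene on the + strand and the BAM file has been indexed with `samtools index`.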

ADD REPLY • link 10.6 years ago by JC 13k

0

Thanks again, JC. I ran TopHat and Cufflinks in my analysis as follows:

tophat --library-type fr-firststrand -p 10 -G /disk1/RNAseq/GeneModel/Hs_ensembl_37.gtf -o /disk1/RNAseq1/Alignment/Human_brain /disk1/RNAseq/Human_genome/Ensembl/GRCh37/Bowtie2Index/genome /disk1/TrimmedData/Human_brain_trimmed.fastq &

and Cufflinks as follows:

cufflinks -G /disk1/RNAseq/GeneModel/Hs_ensembl_37.gtf -o TranscriptCount/Human_brain -p 10 Alignment/Human_brain/accepted_hits.bam &

So are these steps right or not? Note: I did not set a library type in Cufflinks, and I used -G instead of -g.

ADD REPLY • link 10.6 years ago by M K ▴ 660

0

You need to set the library type and enable discovery of novel transcripts in Cufflinks (-g instead of -G); as it is, you are only getting expression estimates for known genes.
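As a sketch, the Cufflinks call from the question with those two changes applied could look like the following. The GTF and BAM paths are copied from the question, the output directory name is hypothetical, and the function only prints the command so it can be inspected before running.

```shell
# Paths copied from the commands in the question; adjust to your layout.
GTF=/disk1/RNAseq/GeneModel/Hs_ensembl_37.gtf
BAM=Alignment/Human_brain/accepted_hits.bam
OUT=TranscriptCount/Human_brain_stranded   # hypothetical output directory

# Print the command instead of running it (dry run).
# -g uses the annotation as a guide but still reports novel transcripts;
# --library-type fr-firststrand matches the dUTP protocol.
cufflinks_cmd() {
    echo cufflinks --library-type fr-firststrand -g "$GTF" -o "$OUT" -p 10 "$BAM"
}

cufflinks_cmd
```

Once the printed command looks right, it can be pasted into the shell to run it.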

ADD REPLY • link 10.6 years ago by JC 13k

score 0 · Answer 1 · 2014-04-07

0

10.6 years ago

Charles Warden 8.3k

To echo what JC said, this depends on your library. Unless a strand-specific protocol was used, the strand cannot be determined. Here are example protocols listed for each value of the relevant TopHat parameter (--library-type):

fr-unstranded: Standard Illumina (default setting: can't determine strand)

fr-firststrand: dUTP, NSR, NNSR

fr-secondstrand: Ligation, Standard SOLiD

I think dUTP is one of the most common strategies that preserve strand information.

Link: http://tophat.cbcb.umd.edu/manual.shtml

ADD COMMENT • link 10.6 years ago by Charles Warden 8.3k
