Question

Detect circular RNA using RNA-seq

8

10.3 years ago

zxoabc ▴ 110

I'm new to this field and want to identify circular RNAs from RNA-seq data. Which tools can detect circular RNAs with a few simple commands? Thanks!

RNA-Seq • 14k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 10.3 years ago by zxoabc ▴ 110

9

10.3 years ago

David Langenberger 11k

You could try segemehl:

Create index:

./segemehl.x -d hg19.fa -x hg19.idx

Map your reads:

./segemehl.x -d hg19.fa -i hg19.idx -q reads.fastq -S | samtools view -bS - | samtools sort -o - deleteme | samtools view -h - > mapped.sam

Call different splice junctions:

./testrealign.x -d hg19.fa -q mapped.sam -n

You will end up with three different BED files: normal.bed, trans.bed and circular.bed. The last holds the back-splicings (circularized RNAs).

If you don't understand the output format, you can find more information in the publication

Hoffmann S, Otto C, Doose G, Tanzer A, Langenberger D, Christ S, Kunz M, Holdt L, Teupser D, Hackermüller J, Stadler PF: 'A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and fusion detection', Genome Biology, 15:R34, doi:10.1186/gb-2014-15-2-r34 (2014)

and in the manual. These sources also explain how the algorithm works.

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 10.3 years ago by David Langenberger 11k

1

I have tried segemehl, but it consumed too much memory when building the genome index. Do you have any suggestions for solving this? Thanks!

ADD REPLY • link 10.3 years ago by zxoabc ▴ 110

1

Well, the only solution is to use a machine with at least 60GB of main memory. That's the main drawback of segemehl.

Its results are pretty good, but if you don't have a good machine, you should go for other tools. The overlap of all tools is very high anyways. All of them will find the most over-represented circularized RNAs.

If you are interested, here is a comparison of segemehl with other tools. But keep in mind that it is from the segemehl publication!

Performance of various read aligners on simulated data sets with different splice events. For simulated 454 reads (400 bp), segemehl performed significantly better in detecting conventional and 'non-conventional' (strand-reversing, long-range) splice junctions. segemehl was the only tool that consistently recalled more than 90% of conventional splice junctions. For 'non-conventional' splice events, segemehl extended its lead to 40% for recall without losing precision. Likewise, compared to three of the seven alternative tools, segemehl had a 30% increase in recall for irregularly spliced Illumina reads (100 bp). Compared to TopHat2, it had a slight increase while reporting significantly fewer false positives. At the same time, segemehl's performance with simulated, regularly spliced Illumina reads was comparable with the other seven tools tested. gs, GSNAP; ms, MapSplice; ru, RUM; se, segemehl; sm, SpliceMap; so, SOAPsplice; st, STAR; to, TopHat2.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 10.3 years ago by David Langenberger 11k

1

Thanks for your prompt reply. For a normal server, 60GB of memory is affordable, but if I want to run segemehl on many samples simultaneously, it does not seem efficient or convenient.

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by zxoabc ▴ 110

0

Well, 60GB is quite a lot!

ADD REPLY • link 10.3 years ago by Lars ★ 1.1k

1

Another concern about segemehl is that it only reports the positions of back-splicing junctions and provides no further information (such as the host gene or the number of back-spliced exons). In this respect, CIRCexplorer does better.

ADD REPLY • link 10.2 years ago by zxoabc ▴ 110

0

Just to keep it clear here:

segemehl is a mapping algorithm that allows split-read mapping (including back-splicing events). Unlike CIRCexplorer, segemehl is not a circularized RNA detection tool.

Just to make sure that we do not compare apples with oranges here: CIRCexplorer offers a lot of downstream information, which nobody would expect from a mapping tool like segemehl. :) Talking about the mapping: segemehl might be a bit more sensitive in finding back-splice junctions, but as I mentioned before, the overlap of the results between the different mapping algorithms (those able to detect back-splicing) is pretty high anyway.

If you are not a bioinformatician, CIRCexplorer might be the tool of your choice. If you are a bioinformatician, you can just use bedtools intersect to overlap the results with any annotation of your choice and get the host gene, etc. in less than a minute.
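
The overlap step described above could look roughly like the sketch below. With bedtools installed, something along the lines of `bedtools intersect -a circular.bed -b genes.bed -wa -wb` should do this directly; the awk function here is only a dependency-free stand-in for small files, not what bedtools does internally. The function name `host_genes` and the file names are made up for illustration, and both inputs are assumed to be tab-separated BED files with the gene name in column 4 of the annotation.

```shell
#!/bin/sh
# Hypothetical stand-in for "bedtools intersect -wa -wb":
# for each junction, print every annotated gene on the same chromosome
# whose interval overlaps it (BED half-open coordinates assumed).
host_genes() {
    # $1 = junction BED (chrom, start, end, ...)
    # $2 = gene annotation BED (chrom, start, end, gene name in column 4)
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                      # first file read: the annotation
            n++
            chrom[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; name[n] = $4
            next
        }
        {                                # second file read: one junction per line
            for (i = 1; i <= n; i++)
                if (chrom[i] == $1 && s[i] < $3 + 0 && e[i] > $2 + 0)
                    print $1, $2, $3, name[i]
        }
    ' "$2" "$1"
}

# Usage (file names are placeholders):
# host_genes circular.bed genes.bed > circular_with_host_gene.tsv
```

The loop compares every junction against every gene, which is fine for a quick check but slow on a full annotation; for real data bedtools is the better choice.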

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by David Langenberger 11k

0

Hi, just a hint here. I downloaded the latest version of segemehl (0.2.0) and tried the same commands as David. However, testrealign.x gave me only two output files, splicesites.bed and transrealigned.bed, and no circular.bed. After reading the manual, I found that the circular splicing events can be extracted from splicesites.bed:

grep C:P splicesites.bed
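
A slightly stricter variant of that filter, as a sketch: it looks for the `C:P` tag only in the name column (column 4) instead of anywhere on the line, so the hits can be written to their own file. The function name `extract_circular` and the output file name are hypothetical, and the assumption that the tag lives in column 4 should be checked against the manual for your segemehl version.

```shell
#!/bin/sh
# Hypothetical helper: keep only junctions whose name field (column 4)
# contains "C:P". Assumption: testrealign.x writes the tag into column 4
# of the tab-separated splicesites.bed.
extract_circular() {
    # $1 = splicesites.bed written by testrealign.x
    awk -F '\t' '$4 ~ /C:P/' "$1"
}

# Usage: write the circular junctions to their own BED file and count them.
# extract_circular splicesites.bed > circular.bed
# wc -l < circular.bed
```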

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.7 years ago by Yu ▴ 140

Ram · Accepted Answer · 2015-01-22

8

10.3 years ago

kepbod ▴ 90

You could try CIRCexplorer, a combined strategy to identify junction reads from back-spliced exons and intron lariats using TopHat and TopHat-Fusion. The commands are very simple. For more information, please refer to Zhang et al., Complementary Sequence-Mediated Exon Circularization, Cell (2014), 159:134-147.

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by kepbod ▴ 90

0

It is very efficient and convenient. Thanks!

ADD REPLY • link 10.3 years ago by zxoabc ▴ 110

0

Have you compared it to any of the other tools available? That is, CIRI or find_circ? I have tested find_circ, and will also run CIRI, but it would be interesting to know how CIRCexplorer performs.

Cheers.

ADD REPLY • link 10.2 years ago by A. Domingues ★ 2.7k

0

The most time-consuming step in circular RNA identification is the search for fusion junction reads. Most circular RNA identification tools (including CIRI, find_circ and CIRCexplorer) use other aligners (like bowtie or tophat-fusion) to map fusion junction reads. So the running performance relies on the aligner users used. I don't think the sensitivity of circular RNA identification differs much among these tools. However, CIRCexplorer now supports more aligners (tophat and STAR) than the others, and we also plan to develop more useful downstream analysis pipelines for circular RNA studies. If you have any suggestions, I would be very glad to hear them.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.1 years ago by kepbod ▴ 90

0

Cool. I have already tested CIRCexplorer with STAR and it is very very fast, as you say, mostly due to the aligner, but the standalone CIRCexplorer scripts also perform very well. I am testing a few tools at the moment, and might write a blog post with the results.

Let me just add something to "So the running performance relies on the aligner users used". That is indeed true for most of the tools you mention, but not for CIRI which takes several hours/days to complete (and several GB of memory for a large dataset, >100M PE reads), whilst in the same conditions, CIRCexplorer takes a few minutes and requires only minimal amounts of RAM (<500MB). So points for CIRCexplorer :) I am only persisting with CIRI because (i) I don't like giving up; and (ii) I would like to see how the results compare (sensitivity/specificity) amongst several tools.

(p.s.: I am the guy confused with ciRNA/circRNA).

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.1 years ago by A. Domingues ★ 2.7k

1

Circular RNAs could be classified into two groups. One type is circular intronic RNAs (ciRNAs), which are derived from spliced introns. Here is a paper about ciRNAs. Another is back spliced exons (circRNAs). Circular RNAs mentioned in most papers are circRNAs.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.1 years ago by kepbod ▴ 90

0

CIRCexplorer is fast and requires a minimal amount of RAM because its task is absolutely trivial! Sorry to say that, but computationally the problem is a one-liner: filter for the correct flags/tags in the SAM format, merge these into potential circular candidates, and overlap them with a set of annotations.
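
To make the "filter and merge" idea concrete, here is a rough sketch. It does not parse SAM flags; instead it assumes a tab-separated chimeric junction table laid out like STAR's Chimeric.out.junction (donor chromosome, position and strand in columns 1-3, acceptor chromosome, position and strand in columns 4-6, junction type in column 7). The function name `backsplice_counts` is made up, the coordinates are passed through without any off-by-one adjustment, and this is not CIRCexplorer's actual implementation.

```shell
#!/bin/sh
# Sketch only: tally candidate back-splice junctions from a chimeric
# junction table. Assumed columns (STAR Chimeric.out.junction layout):
#   1 donor chrom, 2 donor position, 3 donor strand,
#   4 acceptor chrom, 5 acceptor position, 6 acceptor strand, 7 junction type
backsplice_counts() {
    # $1 = chimeric junction file
    awk -F '\t' -v OFS='\t' '
        $7 < 0 { next }                  # skip junctions that lie between mates
        $1 != $4 || $3 != $6 { next }    # require same chromosome and strand
        {
            d = $2 + 0; a = $5 + 0
            # back-splice: the donor sits downstream of the acceptor
            # in transcript orientation
            if ($3 == "+" && d > a)      print $1, a, d, $3
            else if ($3 == "-" && d < a) print $1, d, a, $3
        }
    ' "$1" | sort | uniq -c | awk -v OFS='\t' '{ print $2, $3, $4, $5, $1 }'
}

# Usage: output columns are chrom, left position, right position,
# strand, and number of supporting reads.
# backsplice_counts Chimeric.out.junction > backsplice_candidates.tsv
```

The resulting candidates would still need to be overlapped with an annotation to recover host genes and exon boundaries.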

That does not mean that the tool is useless! It is very useful, but it is no surprise that it is fast and needs only a small amount of memory. Actually, ~500MB seems like quite a lot to me.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.1 years ago by JohnBlue81 ▴ 500

0

Less than 500MB. If we are being pedantic:

    CPU time :               45.24 sec.
    Max Memory :             222.06 MB
    Average Memory :         84.46 MB

for the largest dataset. But perhaps the surprise here is not that CIRCexplorer is fast, but rather that CIRI is not (at least in my hands).

ADD REPLY • link 10.1 years ago by A. Domingues ★ 2.7k

0

Ok, sorry. Normally one writes "less than 500MB" if it is close to 500MB. Otherwise you could write "less than 10GB", which would be just as true and carry no information. ;) Just kidding.

ADD REPLY • link 10.1 years ago by JohnBlue81 ▴ 500

0

It's all good. I could have been more specific but could not be arsed to look up the cluster logs ;)

ADD REPLY • link 10.1 years ago by A. Domingues ★ 2.7k