I'm new in this field. I want to identify circular RNAs from RNA-seq. I want to know which tools I could choose to detect circular RNAs using some simple commands. Thanks!
You could try CIRCexplorer, which uses a combined strategy to identify junction reads from back-spliced exons and intron lariats using TopHat and TopHat-Fusion. The commands are very simple. For more information, please refer to Zhang et al., Complementary Sequence-Mediated Exon Circularization, Cell (2014), 159:134-147.
You could try segemehl:
Create index:
./segemehl.x -d hg19.fa -x hg19.idx
Map your reads:
./segemehl.x -d hg19.fa -i hg19.idx -q reads.fastq -S | samtools view -bS - | samtools sort -o - deleteme | samtools view -h - > mapped.sam
Call different splice junctions:
./testrealign.x -d hg19.fa -q mapped.sam -n
You will end up with three different BED files: normal.bed, trans.bed and circular.bed. The last holds the back-splicings (circularized RNAs).
If you don't understand the output format, you can find more information in the publication and in the manual. These sources also explain how the algorithm works.
Well, the only solution is to use a machine with at least 60GB of main memory. That's the main drawback of segemehl.
Its results are pretty good, but if you don't have a good machine, you should go for other tools. The overlap of all tools is very high anyways. All of them will find the most over-represented circularized RNAs.
If you are interested, here is a comparison of segemehl with other tools. But keep in mind that it is from the segemehl publication!
Performance of various read aligners on simulated data sets with different splice events. For simulated 454 reads (400 bp), segemehl performed significantly better in detecting conventional and 'non-conventional' (strand-reversing, long-range) splice junctions. segemehl was the only tool that consistently recalled more than 90% of conventional splice junctions. For 'non-conventional' splice events, segemehl extended its lead to 40% for recall without losing precision. Likewise, compared to three of the seven alternative tools, segemehl had a 30% increase in recall for irregularly spliced Illumina reads (100 bp). Compared to TopHat2, it had a slight increase while reporting significantly fewer false positives. At the same time, segemehl's performance with simulated, regularly spliced Illumina reads was comparable with the other seven tools tested. gs, GSNAP; ms, MapSplice; ru, RUM; se, segemehl; sm, SpliceMap; so, SOAPsplice; st, STAR; to, TopHat2.
Just to keep it clear here:
segemehl is a mapping algorithm that allows split-read mapping (including back-splicing events). segemehl is not a circularized RNA detection tool like CIRCexplorer.
Just to make sure that we do not compare apples with oranges here. CIRCexplorer offers a lot of downstream information, which nobody would expect from a mapping tool like segemehl. :) Talking about the mapping: segemehl might be a bit more sensitive in finding back-splice junctions, but as I mentioned before, the overlap of the results between the different mapping algorithms (which are able to detect back-splicing) is pretty high anyway.
If you are not a bioinformatician, CIRCexplorer might be the tool of your choice. If you are a bioinformatician, you can just use bedtools intersect to overlap the results with any annotation of your choice and get the host gene, etc. in less than a minute.
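A minimal sketch of the bedtools step mentioned above, assuming bedtools is installed and that both the back-splice junctions and the gene annotation are 6-column BED files. The file names, coordinates and gene names are made-up demo data, not real output from any tool.

```shell
# Made-up demo data (6-column BED): one back-splice junction and two genes.
printf 'chr1\t2000\t2600\tcirc_demo\t0\t+\n' > demo_circ.bed
printf 'chr1\t1500\t9000\tGENE_A\t0\t+\n'  > demo_genes.bed
printf 'chr2\t100\t500\tGENE_B\t0\t-\n'   >> demo_genes.bed

# -wa/-wb print each junction followed by the overlapping annotation entry,
# so the host gene name ends up in column 10 (6 junction columns + 4).
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a demo_circ.bed -b demo_genes.bed -wa -wb > demo_hosts.tsv
    cut -f4,10 demo_hosts.tsv
else
    echo "bedtools not found; skipping the overlap" >&2
fi
```

If the strand column of your junction file is reliable, adding -s to bedtools intersect restricts the overlap to annotations on the same strand.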
Hi, just a hint here. I downloaded the latest version of segemehl (0.2.0) and tried the same commands as David's. However, the testrealign.x program only gave me two output files, splicesites.bed and transrealigned.bed; it does not produce a circular.bed file. I read the manual and found that the circular splicing events can be extracted from the splicesites.bed file:
cat splicesites.bed | grep C:P
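A slightly stricter variant of the filter above, as a sketch: it only looks at the name column instead of the whole line and writes the hits to their own BED file. It assumes the name column (4th field) looks like splits:3:3:3:C:P, with C marking a circular junction and P a passed one; please check the manual of your segemehl version. The demo lines are made up.

```shell
# Made-up demo lines in the assumed splicesites.bed layout
# (chrom, start, end, name, score, strand).
printf 'chr1\t1000\t5000\tsplits:3:3:3:N:P\t0\t+\n'  > demo_splicesites.bed
printf 'chr1\t2000\t2600\tsplits:5:5:5:C:P\t0\t+\n' >> demo_splicesites.bed
printf 'chr2\t7000\t7400\tsplits:1:1:1:C:F\t0\t-\n' >> demo_splicesites.bed

# Keep only rows whose name column carries the C:P tag.
awk -F'\t' '$4 ~ /:C:P(:|$)/' demo_splicesites.bed > demo_circular.bed

# Report how many circular junctions passed.
n_circ=$(wc -l < demo_circular.bed | tr -d ' ')
echo "$n_circ circular junction(s) written to demo_circular.bed"
```

For real data, replace demo_splicesites.bed with your own splicesites.bed.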
It is very efficient and convenient. Thanks!
Have you compared it to any of the other tools available? That is, CIRI or find_circ? I have tested find_circ, and will also run CIRI, but it would be interesting to know how CIRCexplorer performs.
Cheers.
The most time-consuming step in circular RNA identification is the search for fusion junction reads. Most circular RNA identification tools (including CIRI, find_circ and CIRCexplorer) use other aligners (like bowtie or tophat-fusion) to map fusion junction reads. So the running performance relies on the aligner users used. I think there are no big differences in the sensitivity of circular RNA identification among these tools. But CIRCexplorer now supports more aligners (tophat and STAR) than the others, and we also plan to develop more useful downstream analysis pipelines for circular RNA studies. If you have any suggestions, I would be very glad to hear them.
Cool. I have already tested CIRCexplorer with STAR and it is very very fast, as you say, mostly due to the aligner, but the standalone CIRCexplorer scripts also perform very well. I am testing a few tools at the moment, and might write a blog post with the results.
Let me just add something to "So the running performance relies on the aligner users used". That is indeed true for most of the tools you mention, but not for CIRI which takes several hours/days to complete (and several GB of memory for a large dataset, >100M PE reads), whilst in the same conditions, CIRCexplorer takes a few minutes and requires only minimal amounts of RAM (<500MB). So points for CIRCexplorer :) I am only persisting with CIRI because (i) I don't like giving up; and (ii) I would like to see how the results compare (sensitivity/specificity) amongst several tools.
(p.s.: I am the guy confused with ciRNA/circRNA).
Circular RNAs can be classified into two groups. One type is circular intronic RNAs (ciRNAs), which are derived from spliced introns. Here is a paper about ciRNAs. The other is back-spliced exons (circRNAs). The circular RNAs mentioned in most papers are circRNAs.
CIRCexplorer is fast and requires a minimal amount of RAM, since its task is absolutely trivial! Sorry to say that, but computationally the problem is a one-liner. Filter for the correct flags/tags in the SAM format, merge these into potential circular candidates and overlap them with a set of annotations.
That does not mean that the tool is useless! It is very useful, but it is not a surprise that it is fast and needs only a small amount of memory. Actually, ~500MB seems like quite a lot to me.
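To illustrate the "merge" step described above, here is a rough sketch and not how CIRCexplorer actually does it. It assumes the back-splice reads have already been filtered from the alignments into a per-read table (chrom, start, end, read name); that table, the file names and the cutoff of 2 supporting reads are all hypothetical.

```shell
# Hypothetical per-read back-splice junctions: chrom, start, end, read name.
printf 'chr1\t2000\t2600\tread1\n'  > demo_reads.bed
printf 'chr1\t2000\t2600\tread2\n' >> demo_reads.bed
printf 'chr3\t500\t900\tread3\n'   >> demo_reads.bed

# Collapse identical junction coordinates, count the supporting reads,
# and keep junctions backed by at least 2 reads (arbitrary demo cutoff).
cut -f1-3 demo_reads.bed | sort | uniq -c \
  | awk -v OFS='\t' '$1 >= 2 {print $2, $3, $4, $1}' > demo_candidates.bed

cat demo_candidates.bed
```

The resulting candidates (chrom, start, end, read count) are what you would then overlap with an annotation.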
Less than 500MB. If we are being pedantic:
for the largest dataset. But perhaps the surprise here is not that CIRCexplorer is fast, but rather that CIRI is not (at least in my hands).
Ok, sorry. Normally one writes less than 500MB if it is close to 500MB. Otherwise you could write less than 10GB, which would hold the same information, meaning no information. ;) Just kidding.
It's all good. I could have been more specific but could not be arsed to look up the cluster logs ;)