I'm comparing circRNAs obtained via microarray analysis and RNA-Seq for a group of human multiple sclerosis samples.
The microarray analysis categorizes the circRNAs as exonic, intronic or sense overlapping (if the circRNA is transcribed from the same gene locus).
For the RNA-Seq analysis I align the reads with STAR and annotate the circRNAs with CIRCexplorer2 (just as described on the CIRCexplorer2 home page: https://circexplorer2.readthedocs.io/en/latest/). I downloaded the hg19 reference and annotation files (FASTA, GTF and genePred) from UCSC.
When I compared both results, I found that CIRCexplorer2 detected none of the circRNAs that the microarray categorized as intronic or sense overlapping.
Has anyone ever had a problem like this? If so, does anyone know whether it is caused by CIRCexplorer2 itself, or whether different annotation files are needed? Also, suggestions for other circRNA prediction tools that avoid this problem would help me.
Thanks
Thanks! I was only comparing the microarray and RNA-Seq results and didn't notice that CIRCexplorer2 detects intronic circRNAs. Another related question: should I understand that CIRCexplorer2 compares the obtained back-spliced junctions with CIRCpedia?
Hi again :)
No. CIRCexplorer2 takes two inputs: the back-spliced junctions detected in your alignment and the gene annotation file.
In this step, what CIRCexplorer2 does is assign your raw signal (the back-spliced read coordinates) to the genes on the genome (the gene annotation file). To do so, it basically compares the alignment coordinates of your reads with the genome annotation. If they match, the number of back-spliced reads supporting that junction is assigned to the matched transcript.
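A minimal sketch of that matching idea, assuming a simplified genePred-like transcript record and BED-style 0-based, half-open coordinates. The `Transcript` and `BackSplice` classes, the `annotate` function and all coordinates are hypothetical; this is not CIRCexplorer2's actual code, which handles more cases (for example intronic circRNAs). It shows why exact matching against exon boundaries gives no hit for a junction whose ends fall inside an exon.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    name: str
    chrom: str
    strand: str
    exon_starts: List[int]  # 0-based exon starts (like genePred exonStarts)
    exon_ends: List[int]    # exclusive exon ends (like genePred exonEnds)


@dataclass
class BackSplice:
    chrom: str
    start: int   # 0-based start of the circle (BED-style)
    end: int     # exclusive end of the circle
    strand: str
    reads: int   # number of back-spliced reads supporting the junction


def annotate(bsj, transcripts):
    """Return names of transcripts whose exon boundaries match both ends of bsj."""
    hits = []
    for tx in transcripts:
        if tx.chrom != bsj.chrom or tx.strand != bsj.strand:
            continue
        # Exact match required: the circle must start at an annotated exon start
        # and end at an annotated exon end of the same transcript.
        if bsj.start in tx.exon_starts and bsj.end in tx.exon_ends:
            hits.append(tx.name)
    return hits


# Hypothetical three-exon transcript and two junctions.
tx = Transcript("GENE_A", "chr1", "+", [100, 500, 900], [200, 600, 1000])
exonic = BackSplice("chr1", 500, 1000, "+", reads=12)  # exon 2 start -> exon 3 end
inside = BackSplice("chr1", 520, 580, "+", reads=3)    # both ends inside exon 2

print(annotate(exonic, [tx]))
print(annotate(inside, [tx]))
```

The first call returns `['GENE_A']` and the second returns an empty list: the read signal for `inside` exists, but nothing in the annotation matches it.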
Let me know if this helps; if not, we can chat by other means.
cheers,
Hi! Yes, it helps.
Just one more question. Since my gene annotation file only stores exon start and end positions, I would never find a circRNA whose back-splice site splits an exon, would I?
For example, RPPH1 is a single-exon gene with many circRNAs annotated inside it (circBase lists many splice sites within it that form circRNAs). In fact, I found some of them in the back_spliced_junction file generated by CIRCexplorer2, but they are not annotated as circRNAs in the next step.
Is there a gene annotation file that could solve this problem? Or maybe the best solution is simply to compare those back-spliced junctions with circRNA databases and classify the matches as circRNAs.
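Comparing the junctions with a database could look roughly like this sketch. It assumes both lists are (chrom, start, end, strand) tuples on the same genome build (circBase uses hg19, like your reference) and the same coordinate convention; databases differ in 0- vs 1-based starts, so the hypothetical `tolerance` parameter allows a small offset at each end. The function name and all records are illustrative, not real circBase entries.

```python
def match_database(junctions, database, tolerance=0):
    """Split back-spliced junctions into database matches and unmatched ones.

    junctions, database: iterables of (chrom, start, end, strand) tuples that use
    the same coordinate convention and genome build.
    tolerance: allowed difference in bp at each end (0 means an exact match).
    """
    # Group database entries by chromosome and strand to limit comparisons.
    index = {}
    for chrom, start, end, strand in database:
        index.setdefault((chrom, strand), []).append((start, end))

    known, unmatched = [], []
    for chrom, start, end, strand in junctions:
        candidates = index.get((chrom, strand), [])
        if any(abs(start - s) <= tolerance and abs(end - e) <= tolerance
               for s, e in candidates):
            known.append((chrom, start, end, strand))
        else:
            unmatched.append((chrom, start, end, strand))
    return known, unmatched


# Hypothetical database entries and junctions.
db = [("chr1", 1000, 1200, "-"), ("chr1", 3000, 3400, "-")]
bsjs = [("chr1", 1000, 1200, "-"), ("chr1", 1001, 1200, "-"), ("chr2", 50, 900, "+")]

print(match_database(bsjs, db))               # exact matching
print(match_database(bsjs, db, tolerance=1))  # allows a 1-bp offset
```

The linear scan per chromosome is fine for a few thousand junctions; for much larger sets an interval index would be the usual choice.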
cheers,
In this case I would go for de novo identification of circRNAs: you scan the reads for back-spliced reads without using any information from the genome annotation. The pro is that you will identify circRNAs that do not follow the reference annotation (obviously). The con is that you will pick up a lot of noise and might go down a rabbit hole, eventually coming back to CIRCexplorer2. Whether to use these tools depends on which scientific question you want to answer.
As far as I am aware, the tools that do this are CIRI and segemehl, but there may be more recent software; you should check.
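The core signal that de novo tools scan for can be illustrated with a toy sketch. It assumes each read has already been split into two aligned segments, given in transcript 5'→3' order: in linear splicing the second segment maps downstream of the first, while a back-splice puts it upstream. The `Segment` class, both functions and the `min_reads=2` threshold are hypothetical; there is no splice-motif check, mismatch handling or the further filtering real de novo tools apply, which is exactly where the noise mentioned above comes from.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    strand: str     # strand of the transcript the read comes from
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive


def back_splice_from_split_read(first, second):
    """Return (chrom, start, end, strand) of a circle if two read segments,
    given in transcript 5'->3' order, show a back-splice signature; else None."""
    if first.chrom != second.chrom or first.strand != second.strand:
        return None  # different chromosome or strand: not a simple back-splice
    if first.strand == "+":
        # Linear splicing would put `second` downstream (higher coordinates).
        # A back-splice puts it upstream: the read runs off the circle's end
        # (end of `first`) straight into the circle's start (start of `second`).
        if second.ref_start < first.ref_start and second.ref_end <= first.ref_end:
            return (first.chrom, second.ref_start, first.ref_end, "+")
    else:
        # On the minus strand, transcript order runs from high to low
        # coordinates, so the signature is mirrored.
        if second.ref_end > first.ref_end and second.ref_start >= first.ref_start:
            return (first.chrom, first.ref_start, second.ref_end, "-")
    return None


def count_back_splices(split_reads, min_reads=2):
    """Count back-splice junctions over split reads; drop weakly supported ones."""
    counts = Counter()
    for first, second in split_reads:
        junction = back_splice_from_split_read(first, second)
        if junction is not None:
            counts[junction] += 1
    return {j: n for j, n in counts.items() if n >= min_reads}


# Hypothetical split reads: two support the same back-splice, one is linear.
reads = [
    (Segment("chr1", "+", 950, 1000), Segment("chr1", "+", 500, 540)),
    (Segment("chr1", "+", 970, 1000), Segment("chr1", "+", 500, 560)),
    (Segment("chr1", "+", 150, 200), Segment("chr1", "+", 500, 530)),
]
print(count_back_splices(reads))
```

Because nothing here consults an annotation, a junction inside a single exon is reported just like any other, which is the advantage and the risk of the de novo approach.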
I have never used CIRI, but in my experience segemehl has really good sensitivity at the cost of a high false-positive rate; see this paper.
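One way to put numbers on that sensitivity vs. false-positive trade-off is to compare a tool's calls against a reference set of validated circRNAs. This sketch assumes such a set exists (the `validated` and `predicted` tuples and the `benchmark` function are hypothetical) and measures the false-positive side as the false discovery rate, since true negatives are not well defined for junction calls.

```python
def benchmark(predicted, validated):
    """Sensitivity and false discovery rate of predicted circRNAs against a
    validated reference set; both are sets of (chrom, start, end, strand) tuples."""
    true_pos = len(predicted & validated)
    false_pos = len(predicted - validated)
    false_neg = len(validated - predicted)
    sensitivity = true_pos / (true_pos + false_neg) if validated else 0.0
    fdr = false_pos / (true_pos + false_pos) if predicted else 0.0
    return sensitivity, fdr


# Hypothetical calls from a sensitive tool: 3 of 4 validated circRNAs found,
# plus 3 extra junctions absent from the validated set.
validated = {("chr1", 100, 900, "+"), ("chr1", 2000, 2500, "+"),
             ("chr2", 300, 700, "-"), ("chr3", 10, 400, "+")}
predicted = {("chr1", 100, 900, "+"), ("chr1", 2000, 2500, "+"),
             ("chr2", 300, 700, "-"), ("chr2", 50, 60, "+"),
             ("chr4", 5, 90, "-"), ("chr5", 700, 800, "+")}

sensitivity, fdr = benchmark(predicted, validated)
print(f"sensitivity={sensitivity:.2f}, FDR={fdr:.2f}")
```

Here the tool recovers 75% of the validated circRNAs, but half of its calls are not in the reference set.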
Hope this helps,