Dear All,
I have used several tools (find_circ, CIRCexplorer, DCC) to detect circRNAs in mouse tissue, and I have generated a list of back-splice junctions for each detected circRNA. A few examples are given below:
Chr Start End Strand
chr1 5089009 5098133 -
chr1 5093363 5133262 -
chr1 7120194 7120615 -
chr1 8414203 8448132 +
chr1 8554725 8595542 +
chr1 8554725 8607152 +
chr1 8583205 8595542 +
chr1 8583205 8607152 +
chr1 8624779 8682029 +
I'm wondering if there is a script or tool that can extract the exon sequences of the detected circRNAs. These circRNA sequences will be used to scan for miRNA binding sites.
The reference genome is mm10 and the gene annotation I used is gencode.vM14.annotation.gtf.
Any advice would be much appreciated. Thank you very much!
Zhuofei
Are you sure that the output is from CIRCexplorer? CIRCexplorer outputs exon information. With that, you could write a little code or use existing utilities to extract the exon sequences.
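A minimal sketch of the "write a little code" route. It assumes that the first 12 columns of the CIRCexplorer output follow the BED12 layout (0-based start, comma-separated exon sizes in column 11, exon offsets relative to the start in column 12), and that `fasta` is any object with a pysam-style `fetch(chrom, start, end)` method. The names `spliced_circ_sequence` and `DictFasta` are hypothetical and only serve the illustration.

```python
# Sketch: build the spliced (exon-only) sequence of one circRNA from a
# BED12-like line. Assumption: columns 11 and 12 hold comma-separated
# exon sizes and exon offsets relative to the start (BED12 convention).

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_circ_sequence(bed12_line, fasta):
    """Concatenate the exon sequences of one BED12-like record.

    `fasta` must offer fetch(chrom, start, end) with 0-based,
    half-open coordinates, as pysam.FastaFile does.
    """
    fields = bed12_line.rstrip("\n").split("\t")
    chrom, start, strand = fields[0], int(fields[1]), fields[5]
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    exons = [
        fasta.fetch(chrom, start + off, start + off + size)
        for size, off in zip(sizes, offsets)
    ]
    seq = "".join(exons)
    # Exons are joined in genomic order, then flipped for minus-strand genes.
    return reverse_complement(seq) if strand == "-" else seq


class DictFasta:
    """Tiny in-memory stand-in for pysam.FastaFile (illustration only)."""

    def __init__(self, sequences):
        self.sequences = sequences

    def fetch(self, chrom, start, end):
        return self.sequences[chrom][start:end]
```

With a real genome, `fasta` could be `pysam.FastaFile("mm10.fa")`, where `mm10.fa` is a placeholder path to a FASTA file indexed with `samtools faidx`.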
The output is from DCC. Following your suggestion, I found the corresponding output from CIRCexplorer and used bedtools getfasta to get the circRNA sequences. Thanks a million!
I am new to circRNA research. Could you please help me extract circRNA sequences from the CIRCexplorer output?
Thanks in advance.
Do you know how to program in Python or use bash?
Thanks for your reply. Yes, I know Python.
Then you can use the pysam module, for example:
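A minimal sketch of such a call, not necessarily the snippet originally posted. It assumes a genome FASTA indexed with `samtools faidx` and uses 100 and 200 as placeholder coordinates; `region_sequence` and `StringFasta` are hypothetical names, and `fasta` is any object with a pysam-style `fetch(reference, start, end)` method such as `pysam.FastaFile`.

```python
def region_sequence(fasta, chrom, start, end):
    """Return the plus-strand genomic sequence of [start, end).

    Coordinates are 0-based and half-open, which is what
    pysam.FastaFile.fetch() expects.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return fasta.fetch(chrom, start, end).upper()


class StringFasta:
    """In-memory stand-in for pysam.FastaFile (illustration only)."""

    def __init__(self, sequences):
        self.sequences = sequences

    def fetch(self, reference, start, end):
        return self.sequences[reference][start:end]


# Placeholder coordinates: 100 = start of junction, 200 = end of junction.
demo = StringFasta({"chr1": "ACGT" * 100})
seq = region_sequence(demo, "chr1", 100, 200)
print(len(seq))  # 100
```

With pysam installed, the call would be `region_sequence(pysam.FastaFile("genome.fa"), "chr1", 100, 200)`, with `genome.fa` a placeholder path. Note that this returns the whole genomic span between the two coordinates, introns included.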
Of course, you can then take the reverse complement if the circRNA is on the minus strand, using Biopython or coding it from scratch.
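The from-scratch route can be sketched as follows; with Biopython, the equivalent would be `Seq(seq).reverse_complement()`. The function name `strand_aware` is hypothetical, and the table only covers A, C, G, T and N, which is an assumption about the input.

```python
# Complement table for DNA, including N and lower-case (soft-masked) bases.
_DNA_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def strand_aware(seq, strand):
    """Return seq unchanged for '+', reverse-complemented for '-'."""
    if strand == "+":
        return seq
    if strand == "-":
        return seq.translate(_DNA_COMPLEMENT)[::-1]
    raise ValueError(f"unexpected strand: {strand!r}")


print(strand_aware("AACGT", "-"))  # ACGTT
```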
The sample output looks like this:
Chr   Start of junction   End of junction   Circular RNA/Junction reads   score   Strand
chrY  150833   159885   circular_RNA/2  1  +
chrY  256250   258428   circular_RNA/1  0  -
chrY  272139   273067   circular_RNA/2  0  -
chrY  1455672  1456171  circular_RNA/1  1  -
chrY  1490550  1497014  circular_RNA/1  1  -
chrY  2111063  2111271  circular_RNA/4  0  -
chrY  2134780  2159644  circular_RNA/2  0  -
chrX  299512   302131   circular_RNA/1  0  -
chrX  322139   323067   circular_RNA/1  0  -
chrX  1505672  1506171  circular_RNA/1  1  -
So in your code, 100 corresponds to the start of the junction and 200 to the end of the junction, is that right? Thank you very much.
Hi, this is probably off topic, but I'm not familiar with Python. Do you know if there is a database with FASTA sequences of circRNA back-splice junctions? If one is available, could you kindly share the web address where I can retrieve these FASTA sequences? I would be grateful for your help. Best,
Davide
Hi Davide, I have no idea. I encourage you to create a new post for that question.