Question

How to extract circRNA sequences based on the detected back-spliced junctions?

2


8.0 years ago

zhuofei.xu ▴ 20

Dear All,

I have used several tools (findcirc, Circexplorer, DCC) to detect circRNAs from mouse tissue, and I have generated a list of back-spliced junctions for each circRNA detected. An example for some circRNAs is given below:

Chr   Start    End      Strand
chr1  5089009  5098133  -
chr1  5093363  5133262  -
chr1  7120194  7120615  -
chr1  8414203  8448132  +
chr1  8554725  8595542  +
chr1  8554725  8607152  +
chr1  8583205  8595542  +
chr1  8583205  8607152  +
chr1  8624779  8682029  +

I'm wondering if there is a script or tool that can be used to extract the exon sequences of circRNAs detected. These circRNA sequences will be used for scanning miRNA binding sites.

The reference genome is mm10 and the gene annotation I used is gencode.vM14.annotation.gtf.

Any advice is much appreciated. Thank you very much!

Zhuofei

rna-seq • 4.2k views

ADD COMMENT • link updated 5.6 years ago by davidebarbagallo • 0 • written 8.0 years ago by zhuofei.xu ▴ 20

0


Are you sure that the output is from circexplorer? circexplorer outputs exon information, so you could write a little code or use existing utilities to work with that.

ADD REPLY • link 7.9 years ago by IP ▴ 780

0


The output is from DCC. According to your suggestion, I have found the related output from circexplorer and used bedtools getfasta to get the circRNA sequence. Thanks a million!
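
For anyone landing here later, here is a rough sketch of the first half of that step: turning one circexplorer-style row into per-exon intervals, which can then be fetched and joined to give the spliced circRNA sequence. It assumes the row follows a BED12-like layout (exon count in column 10, comma-separated exon sizes in column 11 and exon offsets in column 12), so please check that against your own file. The function name `exon_intervals` and the example row are made up for illustration.

```python
def exon_intervals(row):
    """Split one BED12-like, tab-separated row into per-exon intervals.

    Assumed layout: chrom, start, end, name, score, strand, ...,
    exonCount (col 10), exonSizes (col 11), exonOffsets (col 12).
    Returns a list of (chrom, exon_start, exon_end, strand) tuples
    in 0-based, half-open coordinates.
    """
    fields = row.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1])
    strand = fields[5]
    # Sizes and offsets are comma-separated; a trailing comma is tolerated.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != len(offsets):
        raise ValueError("exon sizes and offsets differ in length")
    return [(chrom, start + off, start + off + size, strand)
            for off, size in zip(offsets, sizes)]


# Made-up example row: two exons of 5 and 8 bp on the minus strand.
example_row = "\t".join(
    ["chr1", "10", "30", "circular_RNA/2", "0", "-",
     "10", "10", "0,0,0", "2", "5,8", "0,12"]
)
for interval in exon_intervals(example_row):
    print(interval)
```

Each interval can then be written out as a BED line or passed to a FASTA reader. For a minus-strand circRNA, the joined exon sequence still needs to be reverse-complemented.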

ADD REPLY • link 7.9 years ago by zhuofei.xu ▴ 20

0


I am new to circRNA research. Could you please help me extract circRNA sequences from the circexplorer output?

Thanks in advance.

ADD REPLY • link 7.5 years ago by tofazzal.stat • 0

0


Do you know how to program in Python or use bash?

ADD REPLY • link 7.5 years ago by IP ▴ 780

0


Thanks for your reply. Yes, I know Python.

ADD REPLY • link 7.5 years ago by tofazzal.stat • 0

1


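Then you can use the pysam module, for example:

import pysam as ps
# path to the reference genome FASTA
genome_fa = '/path/to/genome/fasta'
fastafile = ps.FastaFile(genome_fa)
# the contig name must be a string; coordinates are 0-based, half-open
sequence = fastafile.fetch('chr1', 100, 200)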

Of course, you can then take the reverse complement if it is on the minus strand, using Biopython or coding it from scratch.
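
If you would rather not pull in Biopython for the minus-strand step, a from-scratch version is only a few lines. This is a minimal sketch that assumes the sequence contains only A/C/G/T/N (upper or lower case); the helper name `reverse_complement` is just for illustration.

```python
# Translation table for complementing bases; N stays N, case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_aware(seq, strand):
    """Return seq unchanged for '+', reverse-complemented for '-'."""
    if strand == "-":
        return reverse_complement(seq)
    return seq


print(strand_aware("ATGC", "+"))
print(strand_aware("ATGC", "-"))
```

Characters outside that alphabet (for example other IUPAC ambiguity codes) are left uncomplemented by this sketch, so extend the table if your genome contains them.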

ADD REPLY • link 7.5 years ago by IP ▴ 780

0


The sample output looks like this:

Chr   Start of junction  End of junction  Circular RNA/Junction reads  score  Strand
chrY  150833             159885           circular_RNA/2               1      +
chrY  256250             258428           circular_RNA/1               0      -
chrY  272139             273067           circular_RNA/2               0      -
chrY  1455672            1456171          circular_RNA/1               1      -
chrY  1490550            1497014          circular_RNA/1               1      -
chrY  2111063            2111271          circular_RNA/4               0      -
chrY  2134780            2159644          circular_RNA/2               0      -
chrX  299512             302131           circular_RNA/1               0      -
chrX  322139             323067           circular_RNA/1               0      -
chrX  1505672            1506171          circular_RNA/1               1      -

That is, in your code 100 is the Start of junction and 200 is the End of junction. Thank you very much.
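
A small sketch of how rows like these could be turned into arguments for the fetch call. pysam's fetch takes 0-based, half-open coordinates; whether the start column in this output is 0-based or 1-based depends on the tool, so that is left as a flag here and is an assumption to verify. The function name `parse_junction_row` is hypothetical.

```python
def parse_junction_row(row, one_based_start=False):
    """Parse 'chrom start end ... strand' into (chrom, start, end, strand).

    The returned start/end are meant for a 0-based, half-open fetch.
    Set one_based_start=True if the tool reports 1-based start positions.
    """
    fields = row.split()
    chrom = fields[0]
    start = int(fields[1])
    end = int(fields[2])
    strand = fields[-1]  # strand is the last column in this layout
    if strand not in ("+", "-"):
        raise ValueError("unexpected strand: " + strand)
    if one_based_start:
        start -= 1  # shift a 1-based start to 0-based
    return chrom, start, end, strand


rows = [
    "chrY 150833 159885 circular_RNA/2 1 +",
    "chrY 256250 258428 circular_RNA/1 0 -",
]
for row in rows:
    print(parse_junction_row(row))
```

Keep in mind that fetching the whole interval between the two junction coordinates returns the genomic sequence, including any introns inside it, rather than the spliced exons.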

ADD REPLY • link 7.5 years ago by tofazzal.stat • 0

0


Hi, this is probably off topic, but I'm not familiar with Python. Do you know if a database with FASTA sequences of circRNA backsplice junctions exists? If one is available, could you kindly give me the web address where I can retrieve these FASTA sequences? I would be grateful for your help. Best,

Davide

ADD REPLY • link 5.6 years ago by davidebarbagallo • 0

0


Hi Davide, I have no idea. I encourage you to create a new post with that question.

ADD REPLY • link 5.6 years ago by IP ▴ 780