Hi all, I am trying to run CIRCexplorer2 version 2.3.5 on STAR Chimeric.out.junction files. I have multiple samples.
The parsing step produces one BED file per sample, and the annotation step produces outputs with a different number of circRNAs and ciRNAs for each sample. How should I combine these into a count table for differential expression analysis? Ideally, shouldn't there be a single BED file? Am I missing something?
Here are my commands:
Parsing
for i in *Chimeric.out.junction; do time CIRCexplorer2 parse -t STAR "$i" -b "$i.new.back_spliced_junction.bed" > "$i.CIRCexplorer2_parse.log"; done
Annotating
for i in *.back_spliced_junction.bed; do CIRCexplorer2 annotate -r hg38_ref_all.txt -g hg38.fa -b "$i" -o "$i.Circexplorer2.txt"; done
cat Sample_145_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
19258 circRNA
612 ciRNA
cat Sample_146_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
17791 circRNA
729 ciRNA
Thank you!!!
Thank you for the help, Kevin!!
I want to go with option 2: produce a 'master' table of all identified RNAs across all samples.
However, because the number of circRNAs differs between samples, I am not sure what the best way to combine these tables is.
Thank you!!
Hey, that is just an analysis decision that you will have to make. You could include only the circRNAs common to all samples, or all identified circRNAs.
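Either choice can be scripted. Below is a minimal sketch in Python, not an official CIRCexplorer2 feature. It assumes the annotate output is tab-separated with no header, that columns 1-3 and 6 give chrom, start, end and strand, that column 13 is the read count, and that column 14 is the circType (as in the gawk commands above); please check this against your own files. The function names and the ID format chrom:start-end:strand are made up for illustration.

```python
import csv


def read_counts(path):
    """Return {circ_id: read_count} for one CIRCexplorer2 annotate output file.

    Assumed layout (verify on your files): tab-separated, no header,
    col 1-3 = chrom/start/end, col 6 = strand, col 13 = read count.
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 14:
                continue  # skip blank or truncated lines
            circ_id = f"{row[0]}:{row[1]}-{row[2]}:{row[5]}"
            reads = int(row[12])
            # If a junction is listed more than once, keep the largest
            # count rather than summing, to avoid double counting.
            counts[circ_id] = max(counts.get(circ_id, 0), reads)
    return counts


def build_count_table(sample_files, mode="union"):
    """Combine per-sample counts into one table.

    sample_files: {sample_name: path_to_annotate_output}
    mode: "union" keeps every circRNA seen in any sample (missing = 0),
          "intersection" keeps only circRNAs seen in all samples.
    Returns (sample_names, rows), each row being [circ_id, count, ...].
    """
    per_sample = {name: read_counts(path) for name, path in sample_files.items()}
    names = list(per_sample)
    id_sets = [set(counts) for counts in per_sample.values()]
    if not id_sets:
        ids = set()
    elif mode == "union":
        ids = set.union(*id_sets)
    elif mode == "intersection":
        ids = set.intersection(*id_sets)
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    rows = [
        [circ_id] + [per_sample[name].get(circ_id, 0) for name in names]
        for circ_id in sorted(ids)
    ]
    return names, rows


def write_count_table(names, rows, out_path):
    """Write the table as TSV with a header line of sample names."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["circRNA_id"] + names)
        writer.writerows(rows)
```

For example, build_count_table({"Sample_145": "Sample_145_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt", "Sample_146": "Sample_146_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt"}, mode="union") gives one row per circRNA and one column per sample, which write_count_table can save as a TSV for your differential expression tool. With mode="union", a circRNA not detected in a sample gets a count of 0.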