Hi there,
I would like to identify novel splicing events in 4 human paired-end RNA-seq samples. From the literature I gathered that TopHat and Cufflinks can do this, so I used TopHat to align the reads and Cufflinks to assemble the transcripts. Next, I used cuffcompare to separate novel transcripts from known ones (using gene.gtf downloaded from the UCSC Table Browser). Finally, I kept the transcripts with class_code "j", which according to the manual marks a potentially novel isoform that shares at least one splice junction with a reference transcript.
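The class_code filtering step can be sketched in Python as below. This is a minimal sketch, assuming the cuffcompare output is a standard tab-separated GTF whose ninth column holds `key "value";` attribute pairs. The function names are hypothetical helpers, not part of Cufflinks.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(attr_field: str) -> Dict[str, str]:
    """Turn 'gene_id "X"; class_code "j";' into {'gene_id': 'X', 'class_code': 'j'}."""
    return dict(ATTR_RE.findall(attr_field))


def filter_by_class_code(gtf_lines: Iterable[str], code: str = "j") -> Iterator[List[str]]:
    """Yield the tab-split fields of every exon line whose class_code equals `code`."""
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed exon records.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        if parse_attributes(fields[8]).get("class_code") == code:
            yield fields
```

Usage would look like `with open(path) as fh: hits = list(filter_by_class_code(fh))`, where `path` is a placeholder for the GTF that cuffcompare wrote.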
So now I am left with a list like this:
chr1 Cufflinks exon 885636 886043 . + . gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "1"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1 Cufflinks exon 886536 887714 . + . gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "2"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1 Cufflinks exon 887947 888496 . + . gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "3"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1 Cufflinks exon 888580 888747 . + . gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "4"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1 Cufflinks exon 889163 889251 . + . gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "5"; gene_name "ENST00000379410"; oId "CUF
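One way to read a listing like this is to turn each transcript's exons into the introns (splice junctions) between them, then check which junctions are absent from every reference transcript of the same gene. The sketch below does that, assuming GTF's 1-based inclusive coordinates and that all exons passed in belong to a single transcript. The function names are hypothetical, and the reference exons in the usage example are made up for illustration only; they are not taken from ENST00000379410.

```python
from typing import Iterable, List, Set, Tuple

# (chrom, start, end, strand), 1-based inclusive as in GTF
Exon = Tuple[str, int, int, str]
# (chrom, first intron base, last intron base, strand)
Junction = Tuple[str, int, int, str]


def junctions_from_exons(exons: Iterable[Exon]) -> List[Junction]:
    """Return the introns implied by consecutive exons of ONE transcript."""
    ordered = sorted(exons, key=lambda e: e[1])
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        chrom, _, left_end, strand = left
        # The intron runs from the base after one exon to the base before the next.
        junctions.append((chrom, left_end + 1, right[1] - 1, strand))
    return junctions


def novel_junctions(query: Iterable[Exon],
                    reference_transcripts: Iterable[Iterable[Exon]]) -> List[Junction]:
    """Junctions of `query` that appear in none of the reference transcripts."""
    known: Set[Junction] = set()
    for ref in reference_transcripts:
        known.update(junctions_from_exons(ref))
    return [j for j in junctions_from_exons(query) if j not in known]


# Exons of TCONS_00000033 as listed above.
tcons_33 = [("chr1", 885636, 886043, "+"), ("chr1", 886536, 887714, "+"),
            ("chr1", 887947, 888496, "+"), ("chr1", 888580, 888747, "+"),
            ("chr1", 889163, 889251, "+")]
# HYPOTHETICAL reference that skips the third exon, purely for illustration.
hypothetical_ref = [("chr1", 885636, 886043, "+"), ("chr1", 886536, 887714, "+"),
                    ("chr1", 888580, 888747, "+"), ("chr1", 889163, 889251, "+")]
print(novel_junctions(tcons_33, [hypothetical_ref]))
```

With that made-up reference, the two junctions flanking exon 3 come out as novel, which is the pattern an included (or skipped) exon would produce. Against the real annotation the result will of course differ.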
But I do not know how to interpret them. How can I get the sequences of the novel transcripts? Is there a way to figure out what caused a novel junction, e.g. whether it is an insertion or a deletion, and whether it causes a frameshift?
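A transcript's sequence is the concatenation of its exon sequences in genomic order, reverse-complemented for transcripts on the minus strand. The sketch below assumes the genome is already loaded as a plain `{chrom: sequence}` dict (for a whole GTF, a dedicated tool such as gffread, whose `-w` option writes spliced exon sequences, is probably more practical). The frameshift check is only a crude heuristic that is labeled as such: if the total exonic length of the novel transcript differs from the reference by a number of bases that is not a multiple of 3, the reading frame downstream of the change would shift, but only if the change lies inside the coding sequence, and an in-frame change can still introduce a premature stop codon. All function names here are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int, str]  # (chrom, start, end, strand), 1-based inclusive

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def spliced_sequence(genome: Dict[str, str], exons: List[Exon]) -> str:
    """Concatenate exon sequences; reverse-complement if the transcript is on '-'."""
    ordered = sorted(exons, key=lambda e: e[1])
    # GTF is 1-based inclusive; Python slices are 0-based half-open, hence [start-1:end].
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in ordered)
    if ordered and ordered[0][3] == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def exonic_length(exons: List[Exon]) -> int:
    """Total number of exonic bases (inclusive coordinates)."""
    return sum(end - start + 1 for _, start, end, _ in exons)


def shifts_frame(novel_exons: List[Exon], reference_exons: List[Exon]) -> bool:
    """Heuristic: True if the length difference is not a multiple of 3.

    Only meaningful when the difference falls inside the coding region.
    """
    return (exonic_length(novel_exons) - exonic_length(reference_exons)) % 3 != 0
```

For a definitive answer, translating both sequences from the annotated start codon and comparing the proteins would be the more reliable check.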
This is my first time doing this kind of analysis, so any help would be greatly appreciated :-)
Sahel