We performed deep sequencing on several samples and have been provided with the resulting analysis. I'm attempting to mine the data for specific transcript isoforms so I can do further wet-lab analysis (e.g., design primers and quantify each isoform). To do this, I need to use the merged.gtf file to "make" the transcript sequences in FASTA format. What is the best way to go about this? I'm familiar with the command line and with Galaxy. Below is a snippet of the data. Each line is clearly an exon, and I'm assuming that each potentially novel isoform is just the concatenation of its exons, but that seems too simple, and I still don't know how to automate this to get a FASTA sequence for any particular transcript. Any help or pointers to tutorials would be great. Thanks!
1 Cufflinks exon 1189228 1189283 . + . gene_id "XLOC_000009"; transcript_id "TCONS_00000010"; exon_number "1"; gene_name "CRYZL1"; oId "CUFF.179.5"; nearest_ref "ENSBTAT00000049681"; class_code "j"; tss_id "TSS10";
1 Cufflinks exon 1197951 1198022 . + . gene_id "XLOC_000009"; transcript_id "TCONS_00000010"; exon_number "2"; gene_name "CRYZL1"; oId "CUFF.179.5"; nearest_ref "ENSBTAT00000049681"; class_code "j"; tss_id "TSS10";
1 Cufflinks exon 1199592 1199669 . + . gene_id "XLOC_000009"; transcript_id "TCONS_00000010"; exon_number "3"; gene_name "CRYZL1"; oId "CUFF.179.5"; nearest_ref "ENSBTAT00000049681"; class_code "j"; tss_id "TSS10";
The list of exons continues, and exon_number restarts at 1 multiple times within any one gene.
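To make the "concatenate the exons" idea concrete, here is a minimal Python sketch of what I have in mind. It rests on several assumptions: the GTF is tab-delimited (the snippet above lost its tabs when pasted), coordinates are 1-based and inclusive, exons should be grouped by transcript_id rather than gene_id, and the genome is already loaded into a plain dict mapping chromosome name to sequence. The function names (`parse_exons`, `transcript_sequence`, `to_fasta`) are my own and hypothetical, not from Cufflinks or any library.

```python
import re

# Translation table used to complement bases for minus-strand transcripts.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_exons(gtf_lines):
    """Group exon records by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}.
    Assumes tab-delimited GTF with 1-based, inclusive coordinates.
    """
    transcripts = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        entry = transcripts.setdefault(match.group(1), (chrom, strand, []))
        entry[2].append((start, end))
    return transcripts


def transcript_sequence(genome, chrom, strand, exons):
    """Concatenate exon sequences in genomic order.

    For minus-strand transcripts the joined sequence is reverse-complemented.
    `genome` is assumed to be a dict: chromosome name -> sequence string.
    """
    seq = "".join(genome[chrom][start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


def to_fasta(genome, transcripts):
    """Format every transcript as a FASTA record (sequence on one line)."""
    records = []
    for tid, (chrom, strand, exons) in transcripts.items():
        seq = transcript_sequence(genome, chrom, strand, exons)
        records.append(f">{tid}\n{seq}")
    return "\n".join(records)
```

With a real data set this would be called roughly as `to_fasta(genome, parse_exons(open("merged.gtf")))`, where `genome` comes from whatever FASTA reader is at hand. I believe the gffread utility that ships with Cufflinks does the same job with something like `gffread -w transcripts.fa -g genome.fa merged.gtf`, but I have not confirmed that it handles this particular merged.gtf correctly.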