Hello, I have an RNASeq.gtf file containing the splicing variants of a long series of genes. I would like to obtain:
- a) a text file listing all the spliced FASTA sequences for every variant;
- b) a text file listing all the common (between splicing variants) spliced FASTA sequences for every gene.
For point a), I fixed the input file format for the UCSC Table Browser, uploaded the file as a custom track, and downloaded all the subregions of the track listed as exons. Although the overall results look fine, some sequences (once BLATed at Ensembl) appear strongly 3'-truncated. Could this simply be due to inaccuracies in the RNASeq file?
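To make clear what I mean by a "spliced sequence" in point a), here is a minimal Python sketch. It is only an illustration, not what the Table Browser does internally: it assumes the .gtf has `exon` records with a `transcript_id` attribute, that the chromosome sequence is already loaded as a plain string, and the function names `parse_exons` and `spliced_sequence` are hypothetical.

```python
import re

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_exons(gtf_lines):
    """Group exon records by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])},
    with GTF coordinates kept as 1-based inclusive.
    """
    transcripts = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        entry = transcripts.setdefault(match.group(1), (chrom, strand, []))
        entry[2].append((int(fields[3]), int(fields[4])))
    return transcripts


def spliced_sequence(chrom_seq, strand, exons):
    """Concatenate exon sequences in genomic order.

    For minus-strand transcripts the result is reverse-complemented,
    so the output always reads 5' to 3' along the transcript.
    """
    # GTF is 1-based inclusive, Python slices are 0-based half-open.
    seq = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

With something like this, each variant yields one concatenated sequence rather than one sequence per exon, which makes it easier to check whether a 3' truncation comes from missing terminal exons in the .gtf or from the extraction step.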
For point b), I was thinking that extracting a consensus from the .gtf file would output a list of all the common (between splicing variants) unspliced FASTA sequences for every gene. One way would probably be to use SAMtools, but I currently do not know how to do this. Repeating the exon extraction as done for point a), if that approach is correct, would then give me the b) list.
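To illustrate what I mean by "common" in point b), here is a rough Python sketch working only on coordinates. It rests on my own assumption that "common" means the genomic intervals covered by an exon in every variant of a gene, and that exons within one transcript do not overlap each other. The function names `exons_by_gene`, `intersect`, and `common_regions` are hypothetical.

```python
import re
from collections import defaultdict
from functools import reduce


def exons_by_gene(gtf_lines):
    """Collect exon intervals as {gene_id: {transcript_id: [(start, end), ...]}}."""
    genes = defaultdict(lambda: defaultdict(list))
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        tx = re.search(r'transcript_id "([^"]+)"', fields[8])
        if gene and tx:
            interval = (int(fields[3]), int(fields[4]))
            genes[gene.group(1)][tx.group(1)].append(interval)
    return genes


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping, 1-based inclusive intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def common_regions(transcript_exons):
    """Intervals shared by all transcripts of one gene (at least one transcript expected)."""
    return reduce(intersect, (sorted(e) for e in transcript_exons.values()))
```

The resulting intervals per gene could then be written out and used for sequence extraction in the same way as the exons in point a).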
In summary, I am asking:
- Is the approach I am using valid? Are there better alternatives?
- How can I extract a consensus file from a .gtf file?
Thanks in advance.