Obtaining FASTA Sequences From a .gtf File For Splicing Variants
11.4 years ago
Anima Mundi ★ 2.9k

Hello, I have an RNASeq.gtf file containing splicing variants for a large number of genes. I would like to obtain:

  • a) a text file listing all the spliced FASTA sequences for every variant;
  • b) a text file listing all the common (between splicing variants) spliced FASTA sequences for every gene.

For point a), I adjusted the input file format for the UCSC Table Browser, uploaded it as a custom track, and downloaded all the subregions of the track listed as exons. Although the overall results look fine, some sequences (once BLATed at Ensembl) appear strongly 3'-truncated. Could this simply be due to inaccuracies in the RNASeq file?
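For reference, the per-variant splicing step in a) can be sketched in plain Python: group the `exon` records of the .gtf by `transcript_id`, join the exon sequences in genomic order, and reverse-complement transcripts on the minus strand. This is only a minimal sketch, assuming the genome is already loaded as a dict mapping chromosome name to sequence string and that coordinates are 1-based and inclusive, as in the GTF format. The function names `parse_exons` and `spliced_sequence` and the toy data are hypothetical and are not part of any tool mentioned here.

```python
import re

# Translation table used to complement a DNA string.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_exons(gtf_lines):
    """Group exon records by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}.
    """
    transcripts = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        entry = transcripts.setdefault(match.group(1), (fields[0], fields[6], []))
        entry[2].append((int(fields[3]), int(fields[4])))
    return transcripts


def spliced_sequence(genome, chrom, strand, exons):
    """Concatenate exon sequences in genomic order; reverse-complement on '-'."""
    # GTF is 1-based inclusive, Python slices are 0-based half-open.
    seq = "".join(genome[chrom][start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy data (assumption: a real run would read the genome and the .gtf from files).
genome = {"chr1": "AAACCCGGGTTTACGT"}
gtf = [
    'chr1\tdemo\texon\t1\t3\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tdemo\texon\t7\t9\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tdemo\texon\t4\t6\t.\t-\t.\tgene_id "g2"; transcript_id "tx2";',
    'chr1\tdemo\texon\t13\t16\t.\t-\t.\tgene_id "g2"; transcript_id "tx2";',
]

for tid, (chrom, strand, exons) in parse_exons(gtf).items():
    print(f">{tid}")
    print(spliced_sequence(genome, chrom, strand, exons))
```

Comparing the number and coordinates of exons grouped per transcript against the sequences downloaded from the Table Browser may help to check whether a 3' exon is missing from the .gtf itself or was lost during extraction.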

For point b), I was thinking that extracting a consensus from the .gtf file would output a list of the unspliced FASTA sequences common to all splicing variants of each gene (one way would probably be to use SamTools, but I do not currently know how to do this). Repeating the exon extraction as done for point a) would then, if this is correct, give me the b) list.
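One possible reading of "common between splicing variants" in b) is the set of genomic regions that are exonic in every variant of a gene, which can be computed from the .gtf coordinates alone, before any sequence is fetched. The sketch below takes that interpretation as an assumption; it is not a SamTools feature, and the function names `intersect` and `common_regions` are hypothetical. It also assumes 1-based inclusive coordinates and that the exons within a single variant do not overlap each other.

```python
from functools import reduce


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def common_regions(exon_lists):
    """Regions covered by an exon in every variant of one gene."""
    if not exon_lists:
        return []
    return reduce(intersect, [sorted(exons) for exons in exon_lists])


# Toy exon coordinates for three variants of one hypothetical gene.
variants = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (350, 400), (500, 650)],
    [(150, 200), (300, 400), (500, 600)],
]
print(common_regions(variants))
```

The resulting intervals could then be passed through the same sequence extraction used for a) to obtain the shared sequences for each gene.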

In summary, I am asking:

  • is the approach I am using valid? Are there better alternatives?
  • how can I extract a consensus file from a .gtf file?

Thanks in advance.

rna-seq gtf splicing samtools • 3.0k views