A recent publication investigated splice variants of a gene I am interested in (using SMRT sequencing), and the authors described different/additional exons compared to what I find in NCBI or Ensembl.
I want to analyze splice variants and exon counts of this gene in my RNA-seq data, using the exons described in this publication. I have a file with the exon numbers and sequences.
How do I align my RNA-seq data to this list of exons? I thought about taking the normal .gtf file from Ensembl and editing it to accommodate the exon changes. Is that the recommended way of doing so? And if so, how do I do it?
Thank you for your help!
What is the best way to edit the GTF file?
Would I remove the lines associated with the gene I am interested in and then add my custom lines?
I will probably need the exact start and end of each exon on the genome I am using for indexing, right?
awk?
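The remove-then-add idea could be sketched in Python roughly as below. This is only a minimal sketch under assumptions: the helper names `replace_gene_records` and `exon_line` are made up for illustration, the GTF is assumed to have the usual nine tab-separated columns with a `gene_id "..."` attribute in the last one, and the gene ID and coordinates in the usage note are placeholders. The custom lines are simply appended at the end, so the file may need re-sorting if a downstream tool expects sorted input.

```python
import re

# Matches the gene_id attribute in column 9 of a GTF record.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def replace_gene_records(gtf_lines, gene_id, custom_lines):
    """Drop every record belonging to gene_id, then append custom_lines.

    Header/comment lines (starting with '#') are kept unchanged.
    """
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        match = GENE_ID_RE.search(fields[8]) if len(fields) == 9 else None
        if match and match.group(1) == gene_id:
            continue  # skip the original annotation of this gene
        kept.append(line)
    return kept + list(custom_lines)


def exon_line(chrom, start, end, strand, gene_id, transcript_id,
              exon_number, source="custom"):
    """Build one GTF exon record (1-based, inclusive coordinates)."""
    attrs = (f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
             f'exon_number "{exon_number}";')
    cols = [chrom, source, "exon", str(start), str(end), ".", strand, ".", attrs]
    return "\t".join(cols) + "\n"
```

For a real file this could be used along the lines of `out = replace_gene_records(open("annotation.gtf"), "MY_GENE_ID", [exon_line("chr1", 100, 180, "+", "MY_GENE_ID", "MY_GENE_custom1", 1)])`, followed by writing `out` to a new file. Depending on the aligner or counting tool, matching `gene` and `transcript` records may be needed in addition to the `exon` records.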
I agree with Nicolas. Adding custom transcript annotations to the GTF is the correct way forward. One thing to remember is to add the FASTA sequence of each custom exon at its specific start position in the chromosome/scaffold of interest in the genome .fa file.
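Since the GTF needs genomic start and end positions while the file from the publication holds exon sequences, one way to check whether an exon is already present in the reference, and where, is an exact string search on both strands. The following is only a sketch under assumptions: `locate_exon` and `reverse_complement` are illustrative names, the chromosome is assumed to be already loaded as a plain string, and only perfect matches are reported. For exons with mismatches relative to the reference, an aligner such as BLAT or minimap2 would likely be the more robust choice.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N; case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def locate_exon(chrom_seq, exon_seq):
    """Find exact matches of exon_seq in chrom_seq on both strands.

    Returns a list of (start, end, strand) tuples using 1-based,
    inclusive coordinates on the forward strand, as GTF expects.
    """
    chrom = chrom_seq.upper()
    exon = exon_seq.upper()
    hits = []
    for strand, query in (("+", exon), ("-", reverse_complement(exon))):
        pos = chrom.find(query)
        while pos != -1:
            hits.append((pos + 1, pos + len(query), strand))
            pos = chrom.find(query, pos + 1)  # also catch overlapping hits
    return hits
```

An empty result would suggest the exon sequence is not present verbatim in that chromosome, and more than one hit means the placement is ambiguous and needs checking against the neighbouring exons.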