Question

Align RNA-seq data to a custom list of exons?

0

6.1 years ago

Cumol ▴ 40

A recent publication investigated splice variants of a gene I am interested in (using SMRT seq) and they described different/additional exons compared to what I find in NCBI or ENSEMBL.

I want to analyze splice variants and exon counts of this gene in my RNA-seq data, using the exons described in that publication. I have a file listing each exon's number and sequence.

How do I align my RNA-seq data to this list of exons? I thought about taking the standard .gtf file from Ensembl and editing it to accommodate the exon changes. Is that the recommended way of doing it? And if so, how do I do it?

Thank you for your help!

RNA-Seq alignment exon • 1.5k views

score 1 · Answer 1 · 2018-11-12

1

6.1 years ago

Nicolas Rosewick 11k

Yes, you can edit the GTF file directly to add the exons of interest. Then use featureCounts to count the number of reads per exon, using -t exon -g exon_id (provided the GTF file has an exon_id attribute).
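As a rough illustration of what an added exon record could look like, here is a minimal Python sketch that builds one tab-separated GTF "exon" line carrying an exon_id attribute. The function name, the "custom" source label, the exon_id naming scheme, and all IDs and coordinates in the usage example are assumptions made for illustration; they are not taken from the publication or from any particular annotation.

```python
def gtf_exon_line(chrom, start, end, strand, gene_id, transcript_id,
                  exon_number, source="custom"):
    """Return one GTF 'exon' record (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("GTF start must be <= end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Hypothetical naming scheme so that every exon gets a unique exon_id.
    exon_id = f"{gene_id}.exon{exon_number}"
    attributes = (
        f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
        f'exon_number "{exon_number}"; exon_id "{exon_id}";'
    )
    # Nine GTF columns: seqname, source, feature, start, end,
    # score, strand, frame, attributes.
    fields = [chrom, source, "exon", str(start), str(end),
              ".", strand, ".", attributes]
    return "\t".join(fields)


# Example with made-up coordinates and IDs.
line = gtf_exon_line("chr1", 1000, 1200, "+", "MYGENE", "MYGENE.t1", 1)
print(line)
```

With such lines appended to the annotation, a call along the lines of `featureCounts -a custom.gtf -t exon -g exon_id -o exon_counts.txt aligned.bam` would then summarize reads per exon_id (the file names here are placeholders).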

0

What is the best way to edit the GTF file?

Would I remove the lines associated with the gene I am interested in and then add my custom lines?

I will probably need the exact start and end of each exon on the genome I am using for indexing, right?
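One possible way to do the "remove, then add" step is sketched below in Python: it drops every record whose attributes carry the given gene_id and appends the custom records. This is only a sketch under assumptions: the function name is hypothetical, the gene is assumed to be identified by a `gene_id "..."` attribute as in Ensembl GTF files, and the custom lines are assumed to already hold the exact genomic start and end of each exon.

```python
import re


def replace_gene_records(gtf_lines, gene_id, custom_lines):
    """Drop all records of `gene_id` and append the custom records.

    Header/comment lines (starting with '#') are always kept.
    """
    # \b keeps attributes such as 'parent_gene_id' from matching.
    pattern = re.compile(r'\bgene_id "' + re.escape(gene_id) + r'"')
    kept = []
    for raw in gtf_lines:
        ln = raw.rstrip("\n")
        if ln.startswith("#") or not pattern.search(ln):
            kept.append(ln)
    return kept + [c.rstrip("\n") for c in custom_lines]


# Tiny in-memory example with made-up records.
original = [
    "#!genome-build example",
    'chr1\tensembl\texon\t1\t10\t.\t+\t.\tgene_id "A"; transcript_id "A.1";',
    'chr1\tensembl\texon\t20\t30\t.\t+\t.\tgene_id "B"; transcript_id "B.1";',
]
custom = [
    'chr1\tcustom\texon\t1\t12\t.\t+\t.\tgene_id "A"; transcript_id "A.new"; exon_id "A.exon1";',
]
edited = replace_gene_records(original, "A", custom)
print("\n".join(edited))
```

For a real file, the same function could be fed `open("annotation.gtf")` and the result written back out line by line (the path is a placeholder). The appended records end up at the bottom of the file, so if a downstream tool expects a coordinate-sorted annotation, sort it afterwards.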

Reply • 6.1 years ago by Cumol ▴ 40

0

awk?

Reply • 6.1 years ago by cpad0112 21k

0

I agree with Nicolas. Adding custom transcript annotations to the GTF is the correct way forward. One thing to remember is to add the FASTA sequences of the custom exons at their specific start positions in the chromosome/scaffold of interest in the genome .fa file.

Reply • 6.0 years ago by Praneet Chaturvedi ▴ 120

score 0 · Answer 2 · 2018-11-12

0

6.0 years ago

Cumol ▴ 40

How do I turn my exon sequences into FASTA format? The only idea I had was to BLAST them against the target genome (using Ensembl) and then convert the output into GTF. But it doesn't seem to be so straightforward.

Is there a better option?
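For what it is worth, both steps described above can be sketched in a few lines of Python. The sketch assumes the exon file can be read as (name, sequence) pairs and that BLAST was run with the standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function names, the "blast" source label, and the gene and transcript IDs are hypothetical, and keeping only the top-scoring hit per exon is a simplification that should be checked by eye, especially for short exons.

```python
def to_fasta(exons, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    out = []
    for name, seq in exons:
        seq = "".join(seq.split()).upper()  # drop whitespace, uppercase
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def best_hits_to_gtf(blast_tsv, gene_id, transcript_id):
    """Turn the top-scoring tabular BLAST hit per query into GTF exon lines."""
    best = {}
    for row in blast_tsv.strip().splitlines():
        if not row or row.startswith("#"):
            continue
        f = row.split("\t")
        query, subject = f[0], f[1]
        sstart, send, bitscore = int(f[8]), int(f[9]), float(f[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject, sstart, send)
    lines = []
    for query, (_, subject, sstart, send) in best.items():
        # In tabular BLAST output, sstart > send marks a minus-strand hit.
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send), max(sstart, send)
        attrs = (f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
                 f'exon_id "{query}";')
        lines.append("\t".join([subject, "blast", "exon", str(start),
                                str(end), ".", strand, ".", attrs]))
    return lines


# Made-up example data.
fasta_text = to_fasta([("exon1", "acgt acgt")], width=4)
blast_text = (
    "exon1\tchr2\t100.0\t50\t0\t0\t1\t50\t2000\t1951\t1e-20\t93.5\n"
    "exon1\tchr5\t90.0\t40\t4\t0\t1\t40\t10\t49\t1e-5\t50.0\n"
)
gtf_lines = best_hits_to_gtf(blast_text, "MYGENE", "MYGENE.t1")
print(fasta_text)
print("\n".join(gtf_lines))
```

The resulting lines give the exact start, end, and strand of each exon on the genome, which is what the edited GTF needs. Hits that cover only part of an exon would produce truncated coordinates, so comparing the alignment length against the exon length is a sensible check.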
