Question

Finding isoform sequences for Differentially spliced genes

0

Entering edit mode

9.3 years ago

kandoigaurav ▴ 150

I've performed RNA-Seq analysis for A. thaliana using the TopHat/Cufflinks pipeline for 2 conditions. How can I extract the sequences of all isoforms of the genes which are differentially alternatively spliced b/w the two conditions? (Significant genes as output by cuffdiff in the splicing.diff file)

While browsing through the splicing.diff output file, I found that the coordinates for genes mentioned in the file are nowhere to be found in the transcript.gtf file produced by Cufflinks.

Sample splicing.diff file:

TSS1 XLOC_000001 AT1G01010 Chr1:3630-5899 WT20 WT37 NOTEST 0 0 0 0 1 1 yes
TSS10 XLOC_000007 AT1G01180 Chr1:75582-76758 WT20 WT37 NOTEST 0 0 0 0 1 1 yes

Sample transcript.gtf file:

Chr1 Cufflinks transcript 11903 12897 1000 - . gene_id "WT_37.1"; transcript_id "WT_37.1.1"; FPKM "0.6185344748"; frac "1.000000"; conf_lo "0.341917"; conf_hi "0.895152"; cov "2.561616";
Chr1 Cufflinks exon 11903 12897 1000 - . gene_id "WT_37.1"; transcript_id "WT_37.1.1"; exon_number "1"; FPKM "0.6185344748"; frac "1.000000"; conf_lo "0.341917"; conf_hi "0.895152"; cov "2.561616";

As you can see, the gene_id in the transcript.gtf file is also not same as that in the splicing.diff file, which further complicates extraction. The coordinates in splicing.diff files are usually not present in the transcript.gtf file. Either of the start or end coordinates is few always ~100nt apart while the other end is much further away.

I've tried gffreads, but it only extracts sequences for the transcript.gtf file for both conditions and not differentially spliced between the two conditions. The GeneID's in gtf and splicing.diff files are different, so it gets more confusing.

Cufflinks RNA-Seq Splicing TopHat • 2.4k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by kandoigaurav ▴ 150

0

Entering edit mode

Convert the splicing.diff to bed format and use bedtools getfasta?

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by GouthamAtla 12k

1

Entering edit mode

Thanks for the suggestion. I can do that, but it'll give me the entire gene sequence for all differentially spliced genes in splicing.diff and not all the isoforms of these genes per se. Any way to get the isoforms from both conditions for these differentially spliced genes?

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by kandoigaurav ▴ 150