Can I use my own transcriptome without a GTF file?
9.4 years ago
jhkim1972 • 0

I am a novice in RNA-Seq analysis. For differential expression analysis, I am trying to run the Bowtie/TopHat and Cufflinks pipeline with my own transcriptome as the reference. Can I run it with only the transcriptome and no GTF file? My species is salmon, for which there is no genome to provide a GTF file. Is the GTF file essential for this pipeline?

Thank you in advance for any ideas and suggestions on this fundamental question.

9.4 years ago

Bowtie will work fine here, but Cufflinks will not. If all you have to align against is a transcriptome, I would recommend the following:

  1. Align against the transcriptome with HISAT, Bowtie2, or one of the many other aligners.
  2. Use eXpress, RSEM, or one of the many similar tools to get estimated per-transcript counts. Alternatively, you could use kallisto or Salmon for both this and the previous step (I'd recommend Salmon simply because using software with the same name as the organism you're working on should get you a "+1 nicely played" from reviewers).
  3. Use these estimated counts in limma, edgeR, or a similar program (not DESeq2, which won't allow this) to get differentially expressed transcripts. If you know (or can at least make a good guess) which transcripts belong together as a gene, you can sum their estimated counts and do gene-level differential expression.
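As a rough illustration of the gene-level summing in step 3, here is a minimal Python sketch. It assumes Trinity-style transcript IDs (e.g. `TRINITY_DN1000_c115_g5_i1`), where dropping the trailing `_i<N>` isoform suffix gives the Trinity "gene"; that grouping is only Trinity's own guess, and the counts below are hypothetical (in practice they might come from a per-sample column such as `NumReads` in Salmon's `quant.sf`).

```python
import re
from collections import defaultdict

# Assumption: Trinity-style IDs such as "TRINITY_DN1000_c115_g5_i1", where
# everything before the final "_i<N>" isoform suffix identifies the gene.
TRINITY_ISOFORM = re.compile(r"^(?P<gene>.+)_i\d+$")


def trinity_gene_id(transcript_id: str) -> str:
    """Return the Trinity 'gene' ID (unchanged if the ID does not match)."""
    match = TRINITY_ISOFORM.match(transcript_id)
    return match.group("gene") if match else transcript_id


def sum_to_genes(transcript_counts: dict[str, float]) -> dict[str, float]:
    """Sum per-transcript estimated counts into per-gene counts."""
    gene_counts: dict[str, float] = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_counts[trinity_gene_id(transcript_id)] += count
    return dict(gene_counts)


if __name__ == "__main__":
    # Hypothetical estimated counts for one sample.
    counts = {
        "TRINITY_DN1000_c115_g5_i1": 10.4,
        "TRINITY_DN1000_c115_g5_i2": 3.1,
        "TRINITY_DN2000_c0_g1_i1": 7.0,
    }
    for gene, total in sorted(sum_to_genes(counts).items()):
        print(gene, round(total, 2))
```

You would apply this per sample and then feed the resulting gene-by-sample table to limma or edgeR; if your assembly uses a different naming scheme, swap `trinity_gene_id` for your own transcript-to-gene mapping.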

I'm wondering why DESeq2 cannot be used for this purpose. Can you explain? Is it because you don't have gene names?


It won't accept non-integer counts, and the estimated counts from these tools are generally non-integer. That's the only reason.
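Why are the estimated counts non-integer? Tools like RSEM, eXpress, and Salmon use expectation-maximization (EM) to split reads that align equally well to several transcripts according to the estimated abundances. The Python sketch below is a deliberately simplified toy EM (it ignores transcript length, fragment-length distributions, and the bias models the real tools use), plus an integer check of the kind a strictly count-based tool applies; the function names are hypothetical.

```python
def em_counts(read_sets: list[set[str]], n_iter: int = 200) -> dict[str, float]:
    """Toy EM: estimate per-transcript read counts when some reads multi-map.

    Each element of read_sets is the set of transcripts one read aligns to
    equally well. Transcript lengths and all bias models are ignored.
    """
    if not read_sets:
        return {}
    transcripts = sorted(set().union(*read_sets))
    # Start from equal abundances for every transcript.
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_sets)
    counts: dict[str, float] = {}
    for _ in range(n_iter):
        # E-step: split each read across its transcripts in proportion to abundance.
        counts = {t: 0.0 for t in transcripts}
        for hits in read_sets:
            total = sum(abundance[t] for t in hits)
            for t in hits:
                counts[t] += abundance[t] / total
        # M-step: abundances become the expected fraction of reads per transcript.
        abundance = {t: counts[t] / n_reads for t in transcripts}
    return counts


def all_integer(values, tol: float = 1e-9) -> bool:
    """True if every value is (numerically) a whole number."""
    return all(abs(v - round(v)) < tol for v in values)


if __name__ == "__main__":
    # Hypothetical reads: 1 unique to A, 2 unique to B, 2 shared by A and B.
    reads = [{"A"}, {"B"}, {"B"}, {"A", "B"}, {"A", "B"}]
    est = em_counts(reads)
    print({t: round(c, 2) for t, c in est.items()})
    print("integer counts?", all_integer(est.values()))
```

In this toy case the two shared reads are split 1:2 in line with the unique reads, so the estimates converge to about 1.67 for A and 3.33 for B: the total is still 5 reads, but neither count is a whole number.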


Dear Devon Ryan, thank you for your brilliant suggestion. I will do that.


Thank you. Does RSEM accept a HISAT2 index?

9.4 years ago
cyril-cros ▴ 950

I just read up on the salmon genome sequencing efforts. You are correct that there is no annotated genome yet, and this will be annoying.

You may need to use a de novo transcriptome assembled from your RNA-Seq reads (with Trinity, Oases, or Trans-ABySS). I also believe Cufflinks can work without an annotation (can anyone confirm?).

Either way, this step should yield a transcripts.fa file containing the sequences of the transcripts you want to quantify.

Follow Devon Ryan's advice for finding differentially expressed transcripts. You will then need to identify (annotate) those differentially expressed transcripts in order to match them to metabolic processes.

This article is about the rainbow trout, which is also a member of the Salmonidae. It might be of interest to you for its materials and methods, and as a relatively close species.


While Cufflinks can run without a reference annotation, it wouldn't produce meaningful results when run on data aligned against a transcriptome. Cufflinks (and the same holds for StringTie) is only useful when you feed it alignments in genomic coordinates. With transcriptomic coordinates, you already know where all of the transcripts are: each one is an entry in the FASTA file.
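To make the "each entry is a transcript" point concrete: with a transcriptome as the reference, the coordinate system is simply each FASTA record from position 1 to its length, so there is nothing left for Cufflinks to discover. A minimal Python sketch that lists those records (the rule "ID = header text up to the first whitespace" is an assumption that matches typical Trinity headers; the two-record assembly is hypothetical):

```python
import io
from typing import Iterator, TextIO


def fasta_records(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (transcript_id, sequence) pairs from a FASTA file handle.

    The ID is taken as the header text up to the first whitespace.
    """
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical two-transcript assembly with Trinity-like headers.
    fasta = io.StringIO(
        ">TRINITY_DN1000_c115_g5_i1 len=8\nACGTAC\nGT\n"
        ">TRINITY_DN1000_c115_g5_i2 len=5\nACGGT\n"
    )
    for transcript_id, seq in fasta_records(fasta):
        # Each transcript is its own reference sequence: positions 1..len(seq).
        print(f"{transcript_id}\t1-{len(seq)}")
```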


Thanks for the clarification. I have one last question: once you have aligned your reads, you don't have immediate access to the transcript sequences (the multi-FASTA file required by eXpress or Salmon; +1 indeed for the pun), do you?
I don't see how you can get from the alignment step to the quantification step without a de novo assembly, but I am no expert...


jhkim1972 mentioned aligning against a transcriptome, so I assume he/she either downloaded an assembled transcriptome or already assembled one (e.g., with Trinity). Otherwise, yeah, the first step would be assembly. I think Trinity comes with some instructions for the whole assembly->DE transcripts process.


My bad, I missed the 'own transcriptome' part. I am now unsure what the author means by 'no genome for GTF file'.


What OP meant to write was, "there's not yet a reference or assembled genome and, thus, also no good annotation (e.g., GTF file) yet."


Exactly: I have already built a de novo transcriptome with Trinity, and I want to use it as the reference for my RNA-Seq data. However, as Devon Ryan said, Cufflinks seems to need an assembled genome (and its GTF file). Studying the salmonid genome is difficult because of a whole-genome duplication (WGD). Anyway, thanks all for the useful comments.

