Hi all,
I'm not sure what the best way to do this is, so any help would be highly appreciated.
I have a merged GTF file that I created by running an RNA-seq > STAR > StringTie pipeline with a reference genome.
I also have other GTFs and files, such as RFAM DB results, an exonerate (protein alignment) result, and an ab initio result from AUGUSTUS.
I then used the TransDecoder tool to predict ORFs, but only on the StringTie merged GTF file.
Is it better to merge all of my different inputs into a larger, more descriptive GTF and then run TransDecoder on it for the final results, or should I run TransDecoder on the RNA-seq pipeline results and then merge the resulting GFF with the other GTFs I got from the different types of evidence?
The goal is to create gene prediction models based on all this evidence and the input genome.
Thanks a lot.
Not sure I follow: it seems you want to do gene prediction on a genome, but if I remember correctly, TransDecoder is used for ORF finding on transcripts.
You are correct, the StringTie pipeline results in an assembled transcriptome, on which I use TransDecoder.
I thought that finding ORFs within these transcripts could help me gather information about the genes, but perhaps I am wrong.
I'm trying to create gene models with what I have, but I lack knowledge in this field.
If you have any recommendations on how to proceed with what I have (assembled transcriptome, exonerate protein2genome output, ab initio gene prediction, RFAM results and TransDecoder), I'd love to hear them. Thanks!
OK. No, you're not wrong here, at worst a bit sub-optimal :)