I am working on saffron. My goal is to find candidate resistance genes in saffron. Since the saffron genome has not been sequenced, I used RNA-seq data instead. I assembled the RNA-seq data de novo using Trinity in Galaxy. Then, again in Galaxy, I ran tblastn with an E-value cutoff of 0.00000000001 and a minimum query coverage per HSP of 70%, and found contigs that were similar to 112 reference plant resistance proteins.
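To make the filtering step concrete, here is a minimal Python sketch of what the two cutoffs do. It is not what Galaxy runs internally. It assumes the tblastn hits were exported as tab-separated text with four columns in the order qseqid, sseqid, evalue, qcovhsp (a custom tabular layout I am assuming here), and the function name `filter_hits` is only illustrative.

```python
from collections import defaultdict


def filter_hits(lines, max_evalue=0.00000000001, min_qcov=70.0):
    """Map each contig ID to the reference proteins that hit it.

    Assumed column order per line: qseqid, sseqid, evalue, qcovhsp.
    With tblastn the query is the reference protein and the subject
    is the assembled contig.
    """
    contig_to_refs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, evalue, qcovhsp = line.split("\t")[:4]
        if float(evalue) <= max_evalue and float(qcovhsp) >= min_qcov:
            contig_to_refs[sseqid].add(qseqid)
    return dict(contig_to_refs)


if __name__ == "__main__":
    # Toy rows, invented purely to show the behaviour.
    example = [
        "R1\tc1\t1e-30\t85",
        "R1\tc2\t1e-5\t90",
        "R2\tc3\t1e-40\t50",
    ]
    print(filter_hits(example))
```

Only contigs with at least one hit passing both thresholds are kept, together with the reference proteins they matched.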
First question: What do you think about my approach? Do you have better ideas for finding these genes in saffron RNA-seq data?
I extracted the longest ORFs from the hit contigs and, using Pfam, compared the domains in those ORFs with the domains in the reference resistance genes. Some ORFs have more domains than the reference genes they are similar to.
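In case the definition of "longest ORF" matters for the answers, this is roughly what I mean, written as a small Python sketch rather than the exact tool I used. It assumes the standard genetic code, searches all six reading frames, and counts an ORF as running from the first ATG to the next stop codon or to the end of the contig (so partial ORFs at contig ends are included). The function names are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Return the longest Met-initiated peptide over all six frames."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Split the translated frame at stop codons.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTGGGTAACC"))
```

If a stop-to-stop definition is more appropriate for fragmented transcripts, the `segment.find("M")` step would be dropped.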
Second question: How can I search for domains in all 700 ORFs and 112 reference genes in one step, rather than one by one?
Third question: How can I be sure about my annotations when some ORFs have additional domains that the similar reference proteins do not have?
Thank you all.