14 hours ago
Biodee
Hi,
I want to analyze the expression of stress-response-related genes in a non-model organism. How should I select the best transcript for each gene?
Thank you!
There is no such thing as the best transcript. What you can do is select the longest transcript per gene from a transcriptome assembly. But why not include all transcripts from the transcriptome and use Salmon or Kallisto to estimate transcript abundance?
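As a rough illustration of the "longest transcript per gene" option, here is a minimal Python sketch. It assumes you already have a gene ID, transcript ID, and length for each assembled transcript; the function name, the record layout, and the example IDs are all hypothetical, and ties are resolved by keeping the first transcript seen.

```python
from typing import Dict, Iterable, Tuple


def longest_transcript_per_gene(
    records: Iterable[Tuple[str, str, int]]
) -> Dict[str, str]:
    """Return {gene_id: transcript_id} keeping the longest transcript.

    `records` yields (gene_id, transcript_id, length) tuples.
    This layout is an assumption made for the example, not a standard format.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for gene_id, tx_id, length in records:
        current = best.get(gene_id)
        # Strict '>' means the first transcript seen wins on ties.
        if current is None or length > current[1]:
            best[gene_id] = (tx_id, length)
    return {gene_id: tx_id for gene_id, (tx_id, _) in best.items()}


# Made-up example data (hypothetical IDs and lengths).
records = [
    ("geneA", "geneA.t1", 1200),
    ("geneA", "geneA.t2", 2100),
    ("geneB", "geneB.t1", 900),
]
selected = longest_transcript_per_gene(records)
print(selected)
```

Keep in mind that the longest transcript is only a convenient representative, not necessarily the one that is expressed under your stress conditions.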
What Michael said. You have a few options: you can pick the "canonical" = longest transcript, or you can use something like MANE to pick the most commonly used (in papers, etc.) transcript.
For a way to pick the "best" transcript per gene (the algorithm that VEP uses), see: https://useast.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick_options
The canonical transcript is not always the longest one. I support the suggestion to do a gene-level analysis, meaning: use something like tximport from Bioconductor to sum transcript counts per gene into a single gene-level value.
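To make the idea of a gene-level value concrete, here is a minimal Python sketch of the summing step only. It is not tximport and does not reproduce its behaviour (tximport does more, for example it can account for transcript length); the function name, the dictionaries, and the example IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def sum_counts_per_gene(
    tx_counts: Dict[str, float], tx2gene: Dict[str, str]
) -> Dict[str, float]:
    """Sum transcript-level counts into one value per gene.

    `tx_counts` maps transcript ID -> estimated count (e.g. from a quantifier).
    `tx2gene` maps transcript ID -> gene ID.
    Transcripts missing from `tx2gene` are skipped (a choice made for this sketch).
    """
    gene_counts: Dict[str, float] = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # no gene assignment for this transcript
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Made-up example data (hypothetical IDs and counts).
tx_counts = {"geneA.t1": 10.0, "geneA.t2": 32.5, "geneB.t1": 7.0, "orphan.t1": 3.0}
tx2gene = {"geneA.t1": "geneA", "geneA.t2": "geneA", "geneB.t1": "geneB"}
gene_level = sum_counts_per_gene(tx_counts, tx2gene)
print(gene_level)
```

For a real analysis you would use tximport itself rather than this sketch, since it reads Salmon/Kallisto output directly.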
Maybe later versions have corrected this, but canonical defaults to the longest transcript, at least in snpEff. VEP seems to handle this better. I always go for MANE/RefSeq-MANE, as that's always accurate. ESR1 and BRCA1 are examples where the canonical transcript chosen by snpEff is definitely wrong. I think older VEP versions used to get this wrong too, but v100+ (at least) gets this right.