Choosing the best representative transcript for a gene
6 months ago
ZuelTech • 0

Hi,

I want to analyze the expression of stress-response-related genes in a non-model organism. How should I select the best transcript for each gene?

Thank you!


There is no such thing as the best transcript. One option is to select the longest transcript per gene from a transcriptome assembly. But why not include all transcripts from the transcriptome and use Salmon or kallisto to estimate transcript abundance?
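
For illustration, here is a minimal sketch of the longest-transcript-per-gene idea. It assumes Trinity-style IDs, where the gene name is the transcript ID minus the trailing "_i<N>" isoform suffix; the `gene_of` helper and the example records are hypothetical, so adapt the mapping to whatever your assembler or annotation provides.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gene_of(transcript_id):
    # ASSUMPTION: Trinity-style naming, e.g. TRINITY_DN10_c0_g1_i2 -> TRINITY_DN10_c0_g1
    return transcript_id.rsplit("_i", 1)[0]


def longest_per_gene(records):
    """Return {gene: (transcript_id, sequence)} keeping the longest isoform."""
    best = {}
    for tid, seq in records:
        gene = gene_of(tid)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tid, seq)
    return best


# Hypothetical toy assembly, inlined so the sketch runs on its own.
example = """>TRINITY_DN10_c0_g1_i1 len=8
ACGTACGT
>TRINITY_DN10_c0_g1_i2 len=12
ACGTACGTACGT
>TRINITY_DN11_c0_g1_i1 len=4
ACGT
""".splitlines()

longest = longest_per_gene(read_fasta(example))
for gene, (tid, seq) in longest.items():
    print(gene, tid, len(seq))
```

For a real assembly you would pass an open file handle to `read_fasta` instead of the inlined lines. Keep in mind that the longest isoform is only a convention, not necessarily the most expressed or most biologically relevant one.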


Thank you. I actually have a CSV file containing TPM values produced by Salmon. How can I use it in the way you suggest?


What Michael said. You have a few options: You can pick the "canonical" = longest transcript, or you can use something like MANE to pick the most used (in papers, etc.) transcript.

For a way to pick the "best" transcript per gene (the algorithm that VEP uses), see: https://useast.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick_options


The canonical transcript is not always the longest one. I support the suggestion to do gene-level analysis: use something like tximport from Bioconductor to sum transcript counts per gene into a single gene-level value.
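
To make the "sum per gene" idea concrete, here is a rough sketch in plain Python. It is not tximport itself (tximport also handles things like effective transcript lengths for count-based models); it only shows the aggregation step, using a hypothetical transcript-level TPM table and a hypothetical transcript-to-gene mapping called `tx2gene`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical transcript-level TPM table (columns and IDs are made up).
tx_csv = """transcript_id,sample_A,sample_B
geneX_i1,10.0,4.0
geneX_i2,5.0,6.0
geneY_i1,2.5,0.0
"""

# Hypothetical transcript-to-gene mapping.
tx2gene = {"geneX_i1": "geneX", "geneX_i2": "geneX", "geneY_i1": "geneY"}


def sum_tpm_per_gene(csv_text, tx2gene):
    """Collapse transcript-level TPM into gene-level TPM by summing isoforms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = [c for c in reader.fieldnames if c != "transcript_id"]
    totals = defaultdict(lambda: dict.fromkeys(samples, 0.0))
    for row in reader:
        gene = tx2gene[row["transcript_id"]]
        for sample in samples:
            totals[gene][sample] += float(row[sample])
    return dict(totals)


gene_tpm = sum_tpm_per_gene(tx_csv, tx2gene)
for gene, values in gene_tpm.items():
    print(gene, values)
```

With this approach no single "representative" transcript has to be chosen: every isoform contributes to its gene's value.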


Maybe later versions have corrected this, but canonical defaults to the longest transcript, at least in snpEff. VEP seems to handle this better. I always go for the MANE/RefSeq-MANE transcript, as that's always accurate. ESR1 and BRCA1 are examples where the canonical transcript chosen by snpEff is definitely wrong. I think older VEP versions used to get this wrong too, but v100+ (at least) gets it right.


Thanks! @ATpoint I actually have two CSV files containing TPM values produced by Salmon. One has the transcript IDs, and the other has the gene IDs. Given your suggestion to do gene-level analysis, how can I use these files to create a heatmap of the expression of stress-related genes?


"Gene level analysis" and "create a heatmap" are two different goals. To analyze Salmon output (say, for differential expression analysis), you can import it using tximport and use DESeq2. You can create heatmaps of any metric using ComplexHeatmap, but comparing values across samples is not straightforward.
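
If the goal is only a visual overview, one common way to prepare values for a heatmap is to log-transform the gene-level TPMs and then z-score each gene across samples. The sketch below shows that preparation step in plain Python; the gene names and TPM values are hypothetical, and the result is a display matrix, not a statistical test.

```python
import math
from statistics import mean, pstdev


def heatmap_matrix(gene_tpm, genes_of_interest):
    """Return per-gene z-scores of log2(TPM + 1) across samples."""
    rows = {}
    for gene in genes_of_interest:
        logged = [math.log2(v + 1.0) for v in gene_tpm[gene]]
        mu, sd = mean(logged), pstdev(logged)
        # A gene with identical values in every sample has no variance;
        # show it as a flat row of zeros instead of dividing by zero.
        rows[gene] = [0.0 if sd == 0 else (x - mu) / sd for x in logged]
    return rows


# Hypothetical gene-level TPMs for three samples (names are placeholders).
gene_tpm = {
    "HSP70": [3.0, 15.0, 63.0],
    "DREB2": [7.0, 7.0, 7.0],
    "ACT1": [100.0, 90.0, 110.0],
}

z = heatmap_matrix(gene_tpm, ["HSP70", "DREB2"])
for gene, row in z.items():
    print(gene, [round(x, 2) for x in row])
```

Row z-scores show each gene's relative pattern across samples, so absolute expression levels of different genes are no longer comparable in the plot.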


How about the gene level analysis?


tximport + DESeq2. The former has options that let you choose between transcript-level and gene-level output.


Thanks. Can I use the gene-level results as the basis for a heatmap in expression analysis?


Heatmaps are not useful for any kind of robust analysis. They can provide a good visual. Please don't use heatmaps for anything but eyeballing to spot potential areas of investigation.


Maybe I misled you with my question. I just wanted to know how I can use gene-level analysis instead of transcript-level analysis, especially for expression analysis. Is this a way for me to choose a representative (or best) transcript?


@Ram Thanks. Is VEP applicable to non-model organisms? I only have a de novo assembly. Also, I would appreciate it if you could share a video tutorial on this.


You should find tutorials if you google "VEP tutorial video". I don't think it works with non-model organisms, but check their website: they have an extensive list of supported organisms.
