Hi, I have performed a de novo annotation of some genes of interest using https://github.com/shenkers/isoscm. I can generate a few different isoforms for about a dozen genes that are of particular interest to me, and I end up with a GTF file of my various transcripts.
Could you advise me on how to quantify the usage of these isoforms in a given alignment? I have already done some Northern blots and RT-qPCR to identify isoforms and estimate their relative abundance for a few specific genes.
What I would now like to get for each of my genes is a relative abundance along these lines: in a given gene, transcript x represents y% of all isoforms, plus or minus z%.
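One way to turn per-isoform estimates into a "y% ± z%" figure is to divide each isoform's abundance by the gene total within each replicate, then summarise those shares across replicates. Below is a minimal Python sketch, assuming you already have per-transcript abundances (e.g. TPM) for one gene in several replicates; the transcript names and values are hypothetical. Reporting the standard deviation as "± z%" is one choice among several (a percentile interval is another).

```python
from statistics import mean, stdev


def isoform_fractions(abundance):
    """Share of each isoform within one gene, from a {transcript: abundance} dict."""
    total = sum(abundance.values())
    if total == 0:
        raise ValueError("gene has zero total abundance")
    return {tx: value / total for tx, value in abundance.items()}


def fraction_summary(replicates):
    """Mean and standard deviation of each isoform's share across replicates.

    `replicates` is a list of {transcript: abundance} dicts for the same gene,
    one per replicate (biological or bootstrap; an assumption of this sketch).
    """
    per_rep = [isoform_fractions(rep) for rep in replicates]
    summary = {}
    for tx in per_rep[0]:
        values = [fr[tx] for fr in per_rep]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[tx] = (mean(values), spread)
    return summary


# Hypothetical TPM values for two isoforms of one gene in three replicates.
reps = [
    {"tx_short_utr": 30.0, "tx_long_utr": 70.0},
    {"tx_short_utr": 25.0, "tx_long_utr": 75.0},
    {"tx_short_utr": 35.0, "tx_long_utr": 65.0},
]
for tx, (m, sd) in fraction_summary(reps).items():
    print(f"{tx}: {100 * m:.1f}% +/- {100 * sd:.1f}%")
# tx_short_utr: 30.0% +/- 5.0%
# tx_long_utr: 70.0% +/- 5.0%
```

When replicate libraries have been merged, the replicates fed in here would have to be inferential ones, such as the bootstrap draws Salmon produces when run with `--numBootstraps`; these reflect quantification uncertainty rather than biological variability.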
I came across this list of quantification software (http://omictools.com/quantification-c354-p1.html), and it is a bit overwhelming, to say the least.
I won't use the reference annotation (GTF) or gene regulation (GFF) files, because I work on olfactory genes, which are poorly annotated. I know Cufflinks is a possible solution, but I want to use my own transcripts (which I checked in IGV).
It seems to me that Salmon might be the easiest solution...
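If Salmon is used, per-gene isoform shares can be read from its `quant.sf` output (tab-separated columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). The sketch below assumes the transcript IDs in `quant.sf` match the `transcript_id` attributes of the custom GTF, and that the GTF also carries `gene_id`; the gene and transcript names are hypothetical. It uses TPM rather than raw `NumReads`, because an isoform with a longer 3'UTR collects more reads per molecule; within one gene, TPM shares equal the shares of `NumReads / EffectiveLength`.

```python
import csv
import io
import re
from collections import defaultdict

# GTF attributes look like: gene_id "X"; transcript_id "Y";
ATTR = re.compile(r'(\S+) "([^"]*)"')


def tx2gene_from_gtf(gtf_text):
    """Map transcript_id -> gene_id using the attribute column of a GTF."""
    mapping = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def gene_isoform_shares(quant_sf_text, tx2gene):
    """Per-gene isoform shares computed from the TPM column of a quant.sf file."""
    tpm_by_gene = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is not None:  # ignore transcripts outside the custom GTF
            tpm_by_gene[gene][row["Name"]] = float(row["TPM"])
    shares = {}
    for gene, tpms in tpm_by_gene.items():
        total = sum(tpms.values())
        shares[gene] = {tx: (t / total if total else 0.0) for tx, t in tpms.items()}
    return shares


# Hypothetical inputs: one gene with a short-UTR and a long-UTR isoform.
gtf_text = (
    'chr7\tIsoSCM\texon\t100\t1599\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";\n'
    'chr7\tIsoSCM\texon\t100\t2599\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.2";\n'
)
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "geneA.1\t1500\t1300.0\t20.0\t260\n"
    "geneA.2\t2500\t2300.0\t60.0\t1380\n"
)
print(gene_isoform_shares(quant_sf, tx2gene_from_gtf(gtf_text)))
# {'geneA': {'geneA.1': 0.25, 'geneA.2': 0.75}}
```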
PS: I have masked most of my alignment file, since only a few genes interest me, so computational requirements won't be an issue here.
Note that Cufflinks allows you to supply your own custom GTF when run in ref-only or ref-guided mode.
Yes, it does. However, one of my issues is that I mainly want to study the UTRs (the reason: there are virtually no introns in the CDS of olfactory genes). Cufflinks sometimes does weird things with the 3'UTR, and I am not sure its isoform quantification method is very good...
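For reference, the two Cufflinks modes mentioned above correspond to different flags: `-G`/`--GTF` quantifies only the transcripts in the supplied GTF, while `-g`/`--GTF-guide` uses the GTF to guide assembly and may add novel isoforms. The small Python helper below only builds the command line; the helper name and file names are hypothetical, and the options should be checked against `cufflinks --help` for the installed version.

```python
def cufflinks_cmd(bam, gtf, mode="ref-only", outdir="cufflinks_out",
                  threads=4, library_type="fr-unstranded"):
    """Build (but do not run) a Cufflinks command that uses a custom GTF."""
    flags = {
        "ref-only": "-G",    # --GTF: quantify only the transcripts in the GTF
        "ref-guided": "-g",  # --GTF-guide: GTF guides assembly, novel isoforms allowed
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["cufflinks", "-p", str(threads), "-o", outdir,
            "--library-type", library_type, flags[mode], gtf, bam]


print(" ".join(cufflinks_cmd("masked.bam", "isoscm_isoforms.gtf")))
```

The list can be passed to `subprocess.run(cmd, check=True)` to actually execute it.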
I had a similar problem quantifying 3'UTR isoforms. My solution was the following:
Poly-d(T)-primed RT to make cDNA
Size selection on 8% PAGE for 75-125 bp fragments
Stranded library prep kit
Paired-end 2x50 reads
Filter for reads that have a seed matching AAAAAAAA and any 15-mer matching the 3'UTR region of my gene of interest (zero mismatches allowed to AAAAAAAA, one mismatch allowed to the 15-mer)
The resulting file contained all candidate reads.
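The filtering step above can be sketched in plain Python. This is a minimal illustration, not the poster's actual script: it assumes reads are already available as strings (FASTQ parsing omitted), and the UTR sequence is hypothetical. Precomputing every UTR 15-mer plus all of its one-mismatch variants turns the mismatch search into set lookups; for a UTR of a few kb this is on the order of 10^5 entries. Which mate carries the poly(A) stretch, and whether it appears as poly(T), depends on the library kit, so a `revcomp` helper is included for testing the other strand.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_index(utr, k=15):
    """All k-mers of the UTR plus every variant with exactly one substitution."""
    index = set()
    for i in range(len(utr) - k + 1):
        kmer = utr[i:i + k]
        index.add(kmer)
        for pos in range(k):
            for base in BASES:
                if base != kmer[pos]:
                    index.add(kmer[:pos] + base + kmer[pos + 1:])
    return index


def is_candidate(read, utr_index, k=15, seed="AAAAAAAA"):
    """True if the read contains the exact seed and a UTR k-mer (<=1 mismatch)."""
    if seed not in read:
        return False
    return any(read[i:i + k] in utr_index for i in range(len(read) - k + 1))


# Hypothetical 3'UTR fragment and reads.
utr = "GATTACAGGCTTCCATGGCAAGT"
idx = kmer_index(utr)
read_ok = "GATGACAGGCTTCCA" + "A" * 10   # one mismatch to the UTR, then poly(A)
print(is_candidate(read_ok, idx))             # True
print(is_candidate("GATTACAGGCTTCCA", idx))   # False: no poly(A) seed
```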
I am using published RNA-seq data as a first approach, so I can't easily switch to 3'UTR-specific sequencing methods. My reads are 75 bp, paired-end, unstranded, and of good quality. I am working on mouse olfactory epithelia, merging 3 biological replicates for each sex. I end up with 140M reads per sex, ~120M of which align uniquely (using STAR). The unstranded protocol is a bit annoying...
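For unstranded paired-end reads like these, Salmon's library-type string is `IU` (inward-facing mates, unstranded); `-l A` asks Salmon to infer it instead. Salmon's mapping mode takes reads rather than a genome BAM, so the reads would first have to be pulled back out of the (masked) alignment. The Python sketch below only builds the command lines; every file and directory name is a hypothetical placeholder, and the options should be checked against the installed samtools, gffread and Salmon versions. Note that an index built from only these custom isoforms cannot account for reads that really come from other genes.

```python
def salmon_pipeline(bam, genome_fa, gtf, outdir="salmon_quant", bootstraps=100):
    """Return the commands (as argv lists) for a Salmon run on custom isoforms."""
    r1, r2 = "reads_1.fq", "reads_2.fq"  # hypothetical file names
    return [
        # bring mates together so samtools fastq can write them as pairs
        ["samtools", "collate", "-o", "collated.bam", bam],
        # mates lost to masking go to a separate singleton file
        ["samtools", "fastq", "-1", r1, "-2", r2, "-s", "singletons.fq",
         "collated.bam"],
        # extract spliced transcript sequences for the custom isoforms
        ["gffread", "-w", "isoforms.fa", "-g", genome_fa, gtf],
        ["salmon", "index", "-t", "isoforms.fa", "-i", "salmon_idx"],
        # IU = inward-facing paired-end, unstranded library
        ["salmon", "quant", "-i", "salmon_idx", "-l", "IU",
         "-1", r1, "-2", r2, "--numBootstraps", str(bootstraps), "-o", outdir],
    ]


for cmd in salmon_pipeline("masked.bam", "genome.fa", "isoscm_isoforms.gtf"):
    print(" ".join(cmd))
```

Each list can be run with `subprocess.run(cmd, check=True)`; the bootstrap draws written by `--numBootstraps` are one possible source for a "± z%" estimate.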
Thanks for your suggestion though.