Relative transcript expression
3
0
6.5 years ago

Hi,

I have been trying to understand relative expression of two transcripts from a gene.

Let's say I have a gene with 6 exons that produces two transcripts: isoform 1 with all 6 exons, and isoform 2 with exons 1, 2, 3, 4 and 6.

I have BAM files from STAR and I don't want to redo the alignment, so I would really appreciate it if anyone could suggest a tool that will quantify these two isoforms.

Thanks in advance.

RNA-Seq transcript expression isoform • 3.0k views
ADD COMMENT
0

Try miso as well.

ADD REPLY
0

I have MISO results but, as you know, MISO only considers the alternative exon along with its upstream and downstream exons, not the entire transcript.

ADD REPLY
1
6.5 years ago

Hi Govardhan

Basically, there are 2 steps

  1. The identification of the transcripts.
  2. Estimating the "relative" abundance of those transcripts in your sample.

When you say you already have the isoforms in hand, I take it that you are done with step 1.

So, if you have the BAM files and the corresponding reference annotation, you can run stringtie to estimate the abundances (step 2).

If you are not yet done with step 1, you will have to run stringtie twice, as described below:

  • first, with the BAM files and the reference annotation, to perform a "reference guided" transcriptome assembly.
  • second, taking the consensus set of transcripts from all samples as the reference, to estimate their abundance.

By abundance, I mean the FPKM or TPM values (or your favourite metric) which stringtie will generate for you.

NOTE: StringTie is part of the new tuxedo protocol.
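
The two scenarios above can be sketched as shell functions. This is a minimal sketch rather than a definitive pipeline: `-G`, `-o`, `-e` and `--merge` are StringTie options, while the function names, the `STRINGTIE` variable and all file names are placeholders introduced here for illustration. The `-e` option restricts StringTie to the transcripts supplied with `-G`, which is the relevant mode when the isoforms are already known.

```shell
# Assumption: stringtie is on PATH; set STRINGTIE to use a different binary.
STRINGTIE="${STRINGTIE:-stringtie}"

# Step 1 (only if the transcripts are not yet known):
# reference-guided assembly of a single sample.
assemble() {  # usage: assemble SAMPLE.bam REFERENCE.gtf ASSEMBLED.gtf
  "$STRINGTIE" "$1" -G "$2" -o "$3"
}

# Merge the per-sample assemblies into one consensus annotation.
merge_assemblies() {  # usage: merge_assemblies REFERENCE.gtf MERGED.gtf A.gtf [B.gtf ...]
  ref="$1"; out="$2"; shift 2
  "$STRINGTIE" --merge -G "$ref" -o "$out" "$@"
}

# Step 2: estimate abundance of the transcripts listed in a GTF.
# -e tells StringTie to quantify only those transcripts (no novel assembly).
quantify() {  # usage: quantify SAMPLE.bam TRANSCRIPTS.gtf QUANT.gtf
  "$STRINGTIE" "$1" -e -G "$2" -o "$3"
}
```

For example, `quantify Aligned.sortedByCoord.out.bam Demo.gtf Demo_quant.gtf` (the output name is hypothetical) would estimate abundances only for the transcripts described in Demo.gtf.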

ADD COMMENT
0

Hi Vijay,

Thank you.

Yes, I have identified the transcripts and I have generated a GTF file of the two transcripts. Now I am trying to get the relative abundance, but I am getting: "Error: could not any valid reference transcripts in Demo.gtf (invalid GTF/GFF file?)"

My GTF looks like:

chrX protein_coding exon XXX507 XXX637 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
chrX protein_coding CDS XXX507 XXX637 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
chrX protein_coding exon XXX612 XXX724 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
chrX protein_coding CDS XXX612 XXX724 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
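
An error like this means StringTie could not parse any transcript from the file. One possible cause, stated here as an assumption because the pasted snippet does not show the delimiters, is that the columns are separated by spaces instead of tabs (GTF expects nine tab-separated columns). A quick sanity check with awk might look like the sketch below; `check_gtf` is a hypothetical helper name, not part of StringTie.

```shell
# check_gtf FILE: report data lines that do not have exactly 9 tab-separated
# fields, or whose start/end columns are not integers with start <= end.
# Exit status is 0 when no problem is found and 1 otherwise.
check_gtf() {
  awk -F'\t' '
    /^#/ || /^[ \t]*$/ { next }   # skip comment and blank lines
    NF != 9 {
      printf "line %d: %d tab-separated fields (expected 9)\n", NR, NF
      bad++; next
    }
    ($4 !~ /^[0-9]+$/) || ($5 !~ /^[0-9]+$/) || (($4 + 0) > ($5 + 0)) {
      printf "line %d: invalid coordinates %s-%s\n", NR, $4, $5
      bad++
    }
    END { exit (bad > 0 ? 1 : 0) }
  ' "$1"
}
```

Running `check_gtf Demo.gtf` prints one message per problematic line and stays silent if the basic layout looks fine. It does not validate the attribute column or the feature types.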

ADD REPLY
0

Share the exact command for

  • mapping

  • and for this step (abundance)

ADD REPLY
0

Alignment

STAR --runMode alignReads --outSAMtype BAM SortedByCoordinate --runThreadN 10 --genomeDir $FastaIndex --readFilesIn $R1 $R2

For abundance, I just started with a basic command:

~/stringtie-1.3.4d.Linux_x86_64/stringtie Aligned.sortedByCoord.out.bam -G Demo.gtf
ADD REPLY
0

What is the output of this?

stringtie -G reference.gtf -o out.gtf sample.sorted.bam

reference.gtf = GTF file for the corresponding reference genome you are using

out.gtf = output file that stringtie will generate for you

sample.sorted.bam = coordinate sorted bam file

This step is the assembly step. The out.gtf will have the information of the assembled transcripts.

Once you are done with this, the next step is abundance estimation, which I'll share later.

ADD REPLY
0

Why do I need to use reference GTF when I can use gtf of two transcripts??

Is it something StringTie requires? The output of the above command is a GTF, e.g.:

chrM StringTie transcript 1 16571 1000 . . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "20872.708984";
chrM StringTie exon 1 16571 1000 . . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "20872.708984";

ADD REPLY
1
6.5 years ago

For quantification of transcripts you could also look at fast alignment-free approaches such as Salmon.

ADD COMMENT
1

It's also worth adding here that Salmon needs a tool called Wasabi to convert its output into an h5 structure, ready for differential isoform modelling in Sleuth.

ADD REPLY
0
6.5 years ago

Why do I need to use reference GTF when I can use gtf of two transcripts??

A reference is required when you are performing a "reference guided assembly": the genomic feature information is taken from the reference GTF file. Are you trying to do a de novo assembly?

Is it something StringTie requires??

It's optional; stringtie can also perform de novo assembly.

ADD COMMENT
0

I am working on human samples, so I just need the expression of each transcript from one gene.

Thanks, Govardhan
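
Once per-transcript abundances are available, the relative expression of each isoform can be taken as its TPM divided by the summed TPM of all isoforms of the same gene. The sketch below assumes a StringTie-style output GTF in which each `transcript` line carries `gene_id`, `transcript_id` and `TPM` attributes; `isoform_fractions` is a hypothetical helper name and the approach is only one simple way to express relative abundance.

```shell
# isoform_fractions QUANT.gtf: for every transcript line print
# gene_id, transcript_id, TPM and the share of the total TPM of that gene.
isoform_fractions() {
  awk -F'\t' '
    # Return the quoted value of one attribute from the GTF attribute column.
    function attr(s, key,    re, m) {
      re = "(^|[; ])" key " \"[^\"]*\""
      if (match(s, re) == 0) return ""
      m = substr(s, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", m)   # drop everything up to the opening quote
      sub(/"$/, "", m)        # drop the closing quote
      return m
    }
    $3 == "transcript" {
      g = attr($9, "gene_id"); t = attr($9, "transcript_id")
      v = attr($9, "TPM") + 0
      n++; gene[n] = g; tx[n] = t; tpm[n] = v
      total[g] += v           # running TPM sum per gene
    }
    END {
      for (i = 1; i <= n; i++) {
        frac = (total[gene[i]] > 0) ? tpm[i] / total[gene[i]] : 0
        printf "%s\t%s\t%.2f\t%.3f\n", gene[i], tx[i], tpm[i], frac
      }
    }
  ' "$1"
}
```

With two isoforms at TPM 30 and 10 (made-up numbers), the last column would read 0.750 and 0.250 respectively.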

ADD REPLY
0

Did you try RSEM?

ADD REPLY
0

Again, the problem with RSEM is the alignment. I have BAM files from STAR and they are not compatible with RSEM, and the same goes for cufflinks. Honestly, I can't afford realignment, so I am trying to find a way to use what I have at the moment. Anyway, thank you for your help and time.

ADD REPLY
0

If "time" is the concern, then you can try HISAT2 for alignment, but the call is yours! You're welcome. I'll be glad if you share what finally helped.

ADD REPLY
0

Govardhan, STAR is compatible with cufflinks. Please paste an error snippet if you get any so that I may help with it

ADD REPLY
0

Jeffin, you are right: cufflinks accepts the STAR BAM files, but the results are different. I tried feeding it BAM files for one sample from both TopHat and STAR. Anyway, I got splicing information from various tools and now I am planning to use those values.

I posted this question here because I wish to compare the expression of entire transcripts rather than just the alternative exon.

ADD REPLY
