Question

STAR RNA-seq aligned output ReadsPerGene.out.tab


6.1 years ago

Boboboe ▴ 40

Hi,

I'm trying to perform differential expression analysis across different conditions in my RNA-seq data. I used --quant GeneCount (or something similar; I forgot the exact option) to get the output ReadsPerGene.out.tab. I was wondering whether I could use the read counts from the ReadsPerGene table generated by STAR for differential expression analysis, or whether the read counts in that output are not reliable enough for this. Also, would it be necessary to do transcript assembly for the purpose of performing differential expression analysis? Thanks!!

RNA-Seq STAR differential expression analysis • 6.2k views

ADD COMMENT • link updated 6.1 years ago by h.mon 35k • written 6.1 years ago by Boboboe ▴ 40

score 0 · Answer 1 · 2018-10-24


6.1 years ago

h.mon 35k

Yes, --quantMode GeneCounts is reliable and fast, and you can use its counts for differential gene expression. However, because it summarizes counts per gene, it is not appropriate for differential transcript expression.

Also, would it be necessary to do transcript assembly for the purpose of performing differential expression analysis?

No, it is not necessary to perform transcript assembly, as it seems you have an annotated genome.
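For context, differential gene expression tools such as DESeq2 or edgeR typically take a gene-by-sample matrix of raw counts. Below is a minimal sketch of assembling one, assuming each sample's counts have already been read into a dictionary from the same annotation. The function names, sample names, and gene IDs are hypothetical and are not part of STAR.

```python
import csv
import io


def build_count_matrix(per_sample):
    """Build a gene-by-sample table.

    per_sample maps a sample name to a {gene_id: raw_count} dictionary.
    All samples are assumed to come from the same annotation, so they
    must contain exactly the same gene IDs.
    """
    samples = sorted(per_sample)
    if not samples:
        raise ValueError("no samples given")
    genes = sorted(per_sample[samples[0]])
    for sample in samples[1:]:
        if sorted(per_sample[sample]) != genes:
            raise ValueError(f"gene IDs differ in sample {sample!r}")
    header = ["gene_id"] + samples
    rows = [[gene] + [per_sample[s][gene] for s in samples] for gene in genes]
    return header, rows


def matrix_to_tsv(header, rows):
    """Serialize the matrix as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical counts for two samples and two genes.
example = {
    "control_1": {"geneA": 10, "geneB": 0},
    "treated_1": {"geneA": 25, "geneB": 3},
}
header, rows = build_count_matrix(example)
tsv_text = matrix_to_tsv(header, rows)
```

The sketch raises an error when samples disagree on gene IDs rather than filling in zeros, since a mismatch usually means the samples were counted against different annotations.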

ADD COMMENT • link 6.1 years ago by h.mon 35k


Thank you for your response. I am pretty sure my data is unstranded. In that case, should I use the second column of the output for the analysis?

ADD REPLY • link 6.1 years ago by Boboboe ▴ 40


Yes, for unstranded data, use the 2nd column of the ReadsPerGene.out.tab output as the raw count.

For details, see section 7, "Counting number of reads per gene", in the STAR manual.
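As an illustration, here is a minimal Python sketch of reading the file and picking the unstranded column. It assumes the layout described in the STAR manual: column 1 is the gene ID, column 2 holds unstranded counts, columns 3 and 4 hold counts for the two stranded protocols, and the leading N_ rows are summary statistics. The function names and the example numbers are hypothetical. Comparing the totals of columns 3 and 4 is a common sanity check rather than something from the manual: for an unstranded library they should be roughly similar.

```python
import csv
import io


def parse_reads_per_gene(handle):
    """Return (summary, genes) parsed from ReadsPerGene.out.tab content.

    Each value is a list of the three count columns (file columns 2-4).
    """
    summary, genes = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        name, counts = row[0], [int(value) for value in row[1:4]]
        # Rows such as N_unmapped or N_noFeature are summary statistics,
        # not genes, so keep them out of the gene counts.
        if name.startswith("N_"):
            summary[name] = counts
        else:
            genes[name] = counts
    return summary, genes


def select_counts(genes, file_column=2):
    """Pick one count column by its position in the file (2, 3, or 4)."""
    if file_column not in (2, 3, 4):
        raise ValueError("file_column must be 2, 3, or 4")
    return {gene: counts[file_column - 2] for gene, counts in genes.items()}


def strand_totals(genes):
    """Total gene counts in file columns 3 and 4, for a strandedness check."""
    column3 = sum(counts[1] for counts in genes.values())
    column4 = sum(counts[2] for counts in genes.values())
    return column3, column4


# Hypothetical file content, for illustration only.
EXAMPLE_TEXT = "\n".join([
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t50\t50\t50",
    "N_noFeature\t20\t300\t310",
    "N_ambiguous\t5\t1\t1",
    "ENSG00000223972.5_2\t10\t6\t4",
    "ENSG00000227232.5_2\t30\t14\t16",
])

summary, genes = parse_reads_per_gene(io.StringIO(EXAMPLE_TEXT))
unstranded = select_counts(genes, file_column=2)
totals = strand_totals(genes)
```

For a real file, pass an open file handle instead of the io.StringIO wrapper.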

ADD REPLY • link 6.1 years ago by sangram_keshari ▴ 260


I used the GENCODE annotation, and when I looked at the gene_id values, all the IDs have the form "ENSG00000223972.5_2", with an underscore and a number attached. What is the number after the underscore? Is it the exon (e.g., in my example, exon 2 of the ENSG00000223972.5 gene)? If that's the case, should I pool the reads for all the exons together?

ADD REPLY • link 6.1 years ago by Boboboe ▴ 40


The number after the underscore might also indicate isoforms. Check carefully which type of annotation you used and which parameters you provided when running the tool.

ADD REPLY • link 6.1 years ago by sangram_keshari ▴ 260


Please read the STAR manual; it explains in detail the output of the --quantMode GeneCounts option.

ADD REPLY • link 6.1 years ago by h.mon 35k