Hi,
I'm trying to perform differential expression analysis across different conditions in my RNA-seq data. I used --quant GeneCount (or something similar; I forgot the exact option) to get the ReadsPerGene.out.tab output. Can I use the read counts from the ReadsPerGene table generated by STAR for differential expression analysis, or are those counts not reliable enough for that? Also, is transcript assembly necessary for differential expression analysis? Thanks!!
Thank you for your response. I am pretty sure my data is unstranded. In that case, should I use the second column of the output for the analysis?
Yes. For unstranded data, use the 2nd column of ReadsPerGene.out.tab as the raw counts.
For details, see section 7, "Counting number of reads per gene", of the STAR manual.
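As a rough illustration of pulling the unstranded counts out of that file, here is a minimal Python sketch. It assumes the usual layout of ReadsPerGene.out.tab: tab-separated, with four summary rows at the top (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), then one row per gene with the unstranded count in the 2nd column. The function name and the example rows are made up for illustration; check the layout against your own file.

```python
import csv
import io

# Summary rows at the top of ReadsPerGene.out.tab; these are not genes.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_unstranded_counts(handle):
    """Return {gene_id: count} taken from column 2 (unstranded counts)."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the summary rows.
        if not row or row[0] in SUMMARY_ROWS:
            continue
        counts[row[0]] = int(row[1])
    return counts


# Made-up example content, for illustration only (not real data).
example = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t30\t400\t35\n"
    "N_ambiguous\t10\t2\t3\n"
    "GENE_A\t12\t1\t11\n"
    "GENE_B\t0\t0\t0\n"
)

counts = read_unstranded_counts(io.StringIO(example))
print(counts)
```

For a real file, pass `open("ReadsPerGene.out.tab")` instead of the `io.StringIO` example. The per-sample dictionaries can then be combined into one gene-by-sample count matrix for the differential expression tool of your choice.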
I used the GENCODE annotation. When I looked at the gene_id values, all the IDs are of the form "ENSG00000223972.5_2", with an underscore and a number attached. What is the number after the underscore? Is it the exon (e.g. in my example, exon 2 of the ENSG00000223972.5 gene)? If that's the case, should I pool the reads for all the exons together?
The number after the underscore might also indicate the isoform. Check carefully which type of annotation you used and which parameters you provided when running the tool.
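One way to check this on your own data is to split the IDs and see whether the same gene appears more than once with different suffixes. The sketch below is only an inspection aid: it does not tell you what the suffix means, and the helper names are made up for illustration. It assumes IDs of the form shown in the question (stable ID, a dot, a version, an underscore, a number).

```python
from collections import defaultdict


def split_gene_id(gene_id):
    """Split 'ENSG00000223972.5_2' into (stable_id, version, suffix)."""
    base, _, suffix = gene_id.partition("_")
    stable_id, _, version = base.partition(".")
    return stable_id, version, suffix


def genes_with_several_suffixes(gene_ids):
    """Return {stable_id: sorted suffixes} for genes seen with more than one suffix."""
    seen = defaultdict(set)
    for gene_id in gene_ids:
        stable_id, _, suffix = split_gene_id(gene_id)
        seen[stable_id].add(suffix)
    return {g: sorted(s) for g, s in seen.items() if len(s) > 1}


parts = split_gene_id("ENSG00000223972.5_2")
print(parts)

# Hypothetical ID list: each gene appears once, so nothing is reported.
repeated = genes_with_several_suffixes(["GENE1.5_2", "GENE2.1_1"])
print(repeated)
```

If the returned dictionary is empty for the gene IDs in your count table, each gene has a single row and there is nothing to pool.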
Please read the STAR manual; it explains in detail the output of the
--quantMode GeneCounts
option.