I have 2x75 bp TruSeq RNA-Seq data (paired end, stranded) collected on an Illumina instrument, aligned with STAR, and counted with htseq-count (which agreed with STAR's --quantMode GeneCounts option). These are rat samples (2 conditions, three biological replicates); rat has ~32,754 genes. For one of my samples, here is a binned list of raw (htseq-count) counts by gene:
Counts                 Genes
0                      19,136
1-10                    3,699
10-100                  3,722
100-1,000               3,784
1,000-10,000            2,089
10,000-100,000            309
100,000-1,000,000          15
1,000,000+                  0
Total: 20,399,575      32,754
As you can see, only ~2,400 genes have counts of 1,000 or more. Do I have enough data to perform differential expression analysis with confidence? What count cutoff, if any, do you use for DE analysis?
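For anyone who wants to reproduce this kind of binned summary, here is a minimal Python sketch. The bin edges mirror the table; treating each bin as half-open (so a count of 10 lands in "10-100") is an assumption, since the table does not say which way the boundaries go. The function names and the toy gene dictionary are hypothetical and only for illustration.

```python
from collections import Counter

# Bin edges mirror the table above. Treating each bin as half-open,
# [low, high), is an assumption: the table does not say where 10 falls.
EDGES = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]


def bin_label(count):
    """Return the table-style bin label for one gene's raw count."""
    if count == 0:
        return "0"
    for low, high in zip(EDGES, EDGES[1:]):
        if low <= count < high:
            return f"{low:,}-{high:,}"
    # Anything at or above the last edge goes in the open-ended bin.
    return f"{EDGES[-1]:,}+"


def bin_counts(counts_by_gene):
    """Tally how many genes fall into each bin."""
    return Counter(bin_label(c) for c in counts_by_gene.values())


# Hypothetical toy input; real input would be parsed from the
# per-gene counts file for one sample.
toy = {"geneA": 0, "geneB": 7, "geneC": 10, "geneD": 1500, "geneE": 2_000_000}
for label, n_genes in bin_counts(toy).items():
    print(label, n_genes)
```

Summing the per-bin gene tallies should give back the total number of genes, which is a quick sanity check on the table.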
As an example, for one specific gene across six samples (first three control, next three test condition) I have counts of 4, 8, 6, 53, 78, 216 and, after normalization, an adjusted p value of 1.28E-06 indicating differential expression at a log2FC of 3.703.
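As a rough sanity check on that example, the sketch below computes a naive log2 fold change directly from the six raw counts. This is only a back-of-the-envelope calculation: it ignores library-size normalization and the statistical model used by DESeq2, so it is not expected to match the reported 3.703 exactly. The function name is hypothetical.

```python
import math

# Raw counts for the example gene, taken from the post.
control = [4, 8, 6]
test = [53, 78, 216]


def naive_log2fc(group_a, group_b):
    """log2 of the ratio of group means (group_b over group_a).

    Uses raw counts with no normalization; fails if group_a's mean is 0.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_b / mean_a)


raw_lfc = naive_log2fc(control, test)
print(f"naive log2FC on raw counts: {raw_lfc:.2f}")
```

The raw-count estimate comes out at about 4.27, the same direction and order of magnitude as the normalized value in the post.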
Can you tell us how many reads are in each sample? What was the alignment percentage?
Generally for eukaryotic data 25-30 M reads per sample are enough for doing DE analysis.
I have nine samples covering three conditions. Here are the raw counts from htseq-count, as well as normalized counts from SARTools' DESeq2 pipeline:
The details in my original post are for sample c1.
Raw_Counts = Aligned reads or Aligned Fragments? (Does that number include multi-mappers, which are about 9% for most?)
You have enough data to try the DE analysis.
Here is a more detailed summary:
If I have done what I think I have done (I used htseq-count's --mode intersection-strict), then Raw_Counts = Aligned reads and does not include multi mappers (multi mappers would be in the --ambiguous column).
Not directly related to your question, but I would think there are more (protein coding) exons in the rat genome. How do you get to that number?
Here's a quick check I just performed:
32,754 refers to genes I am looking at, not exons. I've adjusted my question.
OK, that sounds more like it indeed. thx